AI voice generation is a huge opportunity for podcasting — and a big risk
Photo: cottonbro studio
Artificial intelligence is revolutionising entertainment, including podcasts. Perhaps the most contentious part of this revolution is AI voice generation, which is stoking legitimate fears among creators that their voices will be used without permission. However, podcast creators have a particular opportunity to monetise their voices themselves. New tools allow them to generate host-read ads, introductions, and even entire episodes from text, thus eliminating time spent in the studio and the need for expensive recording equipment. But AI voice generation is a slippery slope, and it will be essential to protect creators’ rights and prevent unauthorised uses, which risk spreading misinformation on a mass scale.
AI in audio ads
Advertisers already use AI to fine-tune podcast advertising. For instance, iHeartMedia’s partnership with Sounder uses AI to evaluate the context of a podcast episode and ensure a brand is the right fit. Moreover, Acast’s conversational targeting tool places an ad in the most relevant part of an episode, by focusing on keywords or sentences that amplify a brand’s message.
Now, companies are exploring how AI can be used to create entire audio ads. Audio ad tech company Adswizz is developing AI voiceover tools for advertisers that eliminate the costs of hiring voice actors. However, host-read ads are viewed as more effective, since podcast listeners place trust in their hosts and are more sensitive to ad relevance than the average consumer. This opens an opportunity for hosts to leverage AI voice models themselves. Spotify executive and podcast host Bill Simmons recently claimed that the platform is testing AI technology that would empower podcasters to instantly generate ads read in their voice. Creators could therefore make host-read ads in much less time and, in turn, make more ads and generate more revenue. This technology could also help geo-target listeners and translate ads into a variety of languages, provided, of course, that the host gives permission for their voice to be replicated. Such a tool could attract more advertisers and creators to Spotify, which aligns with its recent pivot towards serving independent creators.
Spotify is not the only platform (allegedly) working on these types of features. For instance, Podcastle allows creators to make a digital copy of their voice to generate not just ads but also podcast intros and entire episodes, based on scripts. However, Spotify and the wider industry must ensure that podcasters are properly compensated for the ads generated in their voice, and must guard against misuse.
The risk of misinformation
A podcaster's voice is perhaps the most valuable tool in their arsenal. For many creators, especially those who do not create podcast videos, their voice is their only identifier. Recently, The Joe Rogan AI Experience podcast replicated Rogan’s voice for a fake conversation that, ironically, revolved around concerns over AI. Moreover, TikTok removed a false viral ad that was created using Rogan’s voice. News is a top genre for podcasts, meaning that false episodes can have serious consequences, propelling misinformation on a mass scale. AI voice-generation tools are evolving at lightning speed, but regulation to protect creators’ voices will take time to develop.
However, if platforms can figure out how to prevent misuse and flag unauthorised work, podcasters could even use AI to take their voices to other entertainment forms. Podcasters could license their voices for film and TV, like actor James Earl Jones, who cut a deal with AI cloning start-up Respeecher to replicate his voice for Disney+’s Obi-Wan Kenobi. Unlike artists, actors and other entertainment creators, podcasters rely on their speaking voice as their single most valuable asset, and this gives them all the more reason to figure out how to work with AI voice technology rather than against it.