It’s hard to imagine an election promise more unpopular than raising the price of beer. But that’s exactly what the leader of the Progressive Slovakia party, Michal Šimečka, was recorded saying he would do just three days before Slovakia’s elections on Sept. 30, 2023. There was just one catch: Šimečka never made such a promise. Instead, his “recorded” proposal was generated by a machine learning model trained to impersonate his voice. While fact-checkers quickly flagged the recording as fake, just one day later, another false recording of Šimečka’s voice began making the rounds in Slovak online spaces. In this new audio clip, which was shared widely and even reposted by a former member of parliament, Šimečka appears to scheme with a prominent journalist to buy votes and rig the upcoming election.
Voice cloning, the practice of training an artificial intelligence (AI) model to mimic an individual’s voice and speech patterns, has become staggeringly advanced in the past few years. A recent study by University College London found that participants could correctly identify AI-generated speech only 73% of the time. In addition, thanks to tech companies like ElevenLabs and Speechify, the technology for copying voices is becoming increasingly affordable and accessible. And that’s a problem: Our ability to differentiate between the real and the synthetic forms the basis for our worldviews, political beliefs, and decision-making. What happens when that ability is lost?
With widespread use and greater sophistication just around the corner, it’s high time we prepared ourselves for the coming era of voice cloning. While the technology has important practical applications, especially in entertainment and translation, a new research brief by Democracy Reporting International warns that it also has a high ceiling for misuse, one that tech companies and policymakers alike need to address swiftly.
Creating More Convincing Disinformation
As demonstrated in the Slovak elections, the increasing sophistication of voice cloning presents a serious challenge for democracies. High-quality disinformation can be especially difficult to tackle in smaller democracies like Slovakia, where fact-checking organizations and content moderation teams familiar with the local language consist of only a few individuals each. Such dangers are, of course, particularly acute on X — formerly Twitter — where entire moderation teams have been dismissed. When the Slovak government and the European Commission summoned social media executives to Bratislava to discuss election disinformation, representatives from X didn’t even bother to show up.
It should come as no surprise that some of the loudest voices speaking out against this technology are those with popular and recognizable voices. AI-generated music has created particular copyright headaches for big-name artists like Drake, whose cloned voice was used to create a viral hit on TikTok in April 2023. The song’s popularity has spawned an online wave of music built on clones of popular artists’ voices. The rich and famous aside, the current lack of legal protections against AI-generated mimicry leaves smaller artists — who lack large legal teams — especially vulnerable to exploitation.
And then there’s the threat of misrepresentation. Thanks to voice cloning technology, famous voices can be heard endorsing whichever individual, product, scam, or ideology a creator chooses. Want your transphobic Make-America-Great-Again music video to feature iconic rappers? Just add voice clones of Drake and 21 Savage and make them sing about patriotism and voting for Donald Trump in 2024 (neither is a US citizen). The possibilities for misuse were best summarized by British actor Stephen Fry upon discovering that an AI copy of his voice, trained on his narration of all seven Harry Potter audiobooks, was circulating online: “It could therefore have me read anything from a call to storm Parliament to hard porn, all without my knowledge and without my permission.”
Swindling Banks and Grandparents Alike
But AI-generated voices aren’t just a threat to creative talent. Voice cloning also has a variety of criminal applications, particularly in identity theft and fraud. Voice clones have already been used to carry out high-profile bank heists, mimicking the voices of trusted managers to authorize transfers of millions of dollars. While the best voice clones are trained on hours of audio, less sophisticated copies can be created from just a few seconds of someone’s recorded voice. If you’ve ever uploaded a video to social media in which you speak even briefly, scammers can create a clone of your voice that, while crude, could pass as you to bank tellers or even loved ones.
In light of these threats, many banks are already stepping up their cybersecurity by requiring multi-factor authentication, including the use of AI to verify face and voice biometrics. But while large financial institutions like banks can and will take steps to secure themselves, individuals remain at significant risk. In March 2023, several elderly Canadians were tricked into wiring thousands of dollars to scammers who had used voice clones to imitate their grandchildren begging for bail money. These so-called “grandparent scams” are an effective racket for fraudsters because they target a demographic largely unaware that voice cloning technology exists.
A Healthy Dose of Skepticism, and Then Some
Voice cloning and generative AI are here to stay, and we must adapt accordingly. We are living through the “wild west” moment of AI, in which regulation and legal definitions still lag behind the technology’s rapid advances. Lawmakers have yet to clearly define legal limits on the use of another person’s voice, while AI companies lack verification processes to ensure users have the right to use a voice before they clone it.
More pressing is the need for greater awareness of AI’s growing capabilities and a more critical eye toward what we see and hear. The age of watching a video or hearing a recording and taking it at face value is over — and has been for some time. If you get a call from a loved one asking for thousands of dollars in gift cards, call them back on their personal cell. If you see a video of a politician saying they want to raise the price of a pint, check whether any trusted news sources have reported the same.
In this new era of AI disinformation, social media content policies, moderation teams, and fact-checkers will be more important than ever. It is therefore vital that debunkers be equipped with the know-how and technology to identify synthetic media. Several tech companies have already announced plans to build automated AI-detection tools. These tools, themselves powered by models trained on large datasets of AI-generated images, videos, and audio, would detect and flag synthetic content as soon as it is posted. While these investments matter, social media companies should also focus on rebuilding geographically diverse trust and safety teams that can adapt to emerging disinformation narratives.
In the meantime, however, the rest of us will have to make do with a healthy dose of skepticism, and then some. Technology marches ever onward; we just need to make sure we can keep up. So spread the word, and maybe check in on your grandma.