Voice cloning: Potential, peril and protection

Voice cloning is technology that uses artificial intelligence to create a replica of a specific person’s voice.


The media reports extensively on “deepfakes” and, more recently, “cheap fakes”: video and still images that do not reflect reality but are, in essence, AI-generated facsimiles that look and sound frighteningly like the real thing.

The rise of synthetic voices

Of course, synthetic media isn’t limited to images and video. Voices can be, and are, faked as well. Voice cloning is technology that uses artificial intelligence to create a replica of a specific person’s voice. In one real-life example, rampant online speculation held that actress Scarlett Johansson’s voice had been replicated, without her permission, for a new voice assistant from the makers of ChatGPT. The matter was left to the lawyers, but it raises questions about copyright, privacy and emotional connections with AI assistants.

To be sure, voice cloning isn’t inherently evil. There are plenty of positive applications for both businesses and individuals. It can be used to create personalized digital assistants, to narrate audiobooks, or to give people who have lost their voices the ability to speak again. After a stroke left him unable to sing, for example, country musician Randy Travis used voice cloning technology to release new music. And companies are marketing the ability for individuals to “listen to” the voices of deceased family members and friends.

Corporate communications and marketing professionals are also finding useful applications for voice cloning, everything from creating audio for use in chatbots to sending voicemail updates in employees’ native languages across multinational offices.

But voice cloning also holds the potential for peril, and that potential has not been lost on cybercriminals. Deepfake audio can be used to spread disinformation or to impersonate others without their consent for nefarious purposes.

Simple, inexpensive and very accessible

The ability to clone and use someone’s voice, for good or ill, is accessible to just about anyone. For example, a CBS reporter discovered a five-dollar voice cloning service that, once given a random 30-second audio clip of someone’s voice, can generate a deepfake in just a few minutes.

Considering how simple it is to obtain a recording of someone’s voice, whether that someone is a celebrity, a politician running for office or a member of your company’s C-suite, the potential for harm and illicit activity is all too real.

And that ability to use synthetic media for fraud and manipulation has certainly not gone unnoticed by bad actors. Here are just a few examples:

These fakes are real, and their implications are scary. As cyber attackers find new ways to clone voices to further their deceptions, others are working to detect their misdeeds… with varying degrees of success.

Detecting the deepfakes

The University at Buffalo’s Media Forensic Lab launched the DeepFake-O-Meter, an open platform that allows users to upload an image, video or audio file to determine whether it’s real or fake.

The FTC also launched a voice cloning challenge to encourage people to develop algorithms that can detect AI-generated voices. The top three submissions were selected by a panel of preeminent judges and will share $35,000 in prize money.
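To make the idea of a voice-detection algorithm more concrete, here is a minimal, purely illustrative sketch of one common approach: summarize each audio clip with spectral features (MFCCs) and train a simple classifier on clips labeled as genuine or cloned. The folder layout, feature choices and model below are assumptions for illustration only; this is not the method used by the DeepFake-O-Meter or the FTC challenge winners, and real detectors are considerably more sophisticated.

```python
# Illustrative sketch of a toy synthetic-voice detector.
# Assumes a hypothetical dataset of labeled WAV clips in data/real/ and data/cloned/.
import glob
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def clip_features(path, sr=16000, n_mfcc=20):
    """Summarize a clip as the mean and variance of its MFCC coefficients."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.var(axis=1)])

# Label 0 = genuine recording, 1 = AI-cloned voice (hypothetical folder layout).
X, y = [], []
for label, pattern in [(0, "data/real/*.wav"), (1, "data/cloned/*.wav")]:
    for path in glob.glob(pattern):
        X.append(clip_features(path))
        y.append(label)

# Train a simple classifier and report accuracy on a held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    np.array(X), np.array(y), test_size=0.2, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Holdout accuracy: {clf.score(X_test, y_test):.2f}")
```

Even this toy example hints at the core problem defenders face: a classifier is only as good as the synthetic voices it was trained on, and new cloning models can produce audio the detector has never seen.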

But attackers and defenders are engaged in a never-ending arms race of tech-powered one-upmanship. As fast as detection technology emerges, threat actors develop new ways to thwart it. And no detection method is 100% effective, which is why creators on TikTok and YouTube are able to get around AI-powered censorship and moderation with simple, low-tech tricks like so-called ‘algospeak.’ Technology is just one tool in the arsenal for protecting our organizations and ourselves from voice cloning and other frauds.

As with cybersecurity risks in general, the human element is a critical first and last line of defense.

Establishing a strong security culture

Creating a healthy security culture is a continuous process, not a one-time event. A concerted effort to educate employees about their role in mitigating the risks of voice cloning is vital to protecting your company, your employees and your customers.

While technology-based voice detection cannot be relied on alone, it still plays an important role when combined with a positive security culture and well-educated, well-informed employees. A combination of high tech and high touch will provide the best protection against nefarious voice cloning efforts and other security risks.

Perry Carpenter is the author of the upcoming book “FAIK: A Practical Guide to Living in a World of Deepfakes, Disinformation, and AI-Generated Deceptions” (Wiley, 2024), his second Wiley book on the subject. He is chief human risk management strategist for KnowBe4, provider of security awareness training and simulated phishing platforms used by more than 65,000 organizations and 60 million users worldwide.
