On the website Infinite Conversation, German filmmaker Werner Herzog and Slovenian philosopher Slavoj Žižek are having an unending, freewheeling chat about anything and everything. Their discussions are convincing, in part because both intellectuals speak English with pronounced accents, not to mention a penchant for eccentric word choices. But they have something else in common: both voices are deepfakes, and the text they speak in those distinctive accents is being generated by artificial intelligence.
I created this dialogue as a warning. Improvements in machine learning have made deepfakes (extremely realistic but false images, videos or voices) too easy to create, and their quality too good. Meanwhile, language-generating AI can produce large amounts of text quickly and cheaply. Together, these technologies can do far more than sustain an endless conversation: they have the potential to drown us in an ocean of disinformation.
Machine learning, an artificial intelligence technique that uses vast amounts of data to “train” algorithms to get better at a specific task through repetition, is going through a phase of rapid growth. This has pushed entire areas of information technology to a new level, including speech synthesis: systems that produce speech humans can understand. As someone interested in the boundary between humans and machines, I have always found it a fascinating application. So when advances in machine learning allowed speech synthesis and voice cloning technologies to make giant leaps in the past few years, after a long period of small, incremental improvements, I took notice.
It all began when I stumbled across an open-source speech synthesis program called Coqui TTS. Many projects in the digital realm begin with the discovery of a previously unknown software library or open-source program. When I found this toolkit, along with its thriving user community and extensive documentation, I knew I had all the ingredients necessary to clone famous voices.
As an admirer of Werner Herzog’s work, persona and worldview, I have always been fascinated by his voice and the way he speaks. I am hardly alone, as pop culture has turned Herzog into something of a caricature: his cameos and collaborations include The Simpsons, Rick and Morty and Penguins of Madagascar. So when it came to picking someone’s voice to tinker with, there was no better choice, especially since I knew I would have to listen to that voice for hours on end. His dry delivery and thick German accent, which convey a gravitas that cannot be ignored, almost never grow tiresome.
Building the training set for cloning Herzog’s voice was the easiest part of the process. Between his interviews, voice-overs and audiobooks, there are literally hundreds of hours of speech that could be used to train a machine-learning model or, in my case, to fine-tune an existing one. The output of a machine-learning algorithm typically improves over “epochs,” each of which is one complete pass in which the neural network is trained on all of the training data. The algorithm can then produce a sample at the end of each epoch, giving researchers material to review so they can assess how training is progressing. With the synthesized voice of Werner Herzog, hearing the model improve with each passing epoch felt like witnessing a metaphorical birth, as his voice gradually came to life in the digital realm.
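To make the idea of an epoch concrete, here is a minimal Python sketch. It is a toy, not a voice model: the function name `train`, the one-parameter linear model and the learning rate are illustrative assumptions. Each epoch is one full pass over every training example, and at the end of each epoch the loop records a sample (the current weight and its error) for review, loosely analogous to rendering a test sentence after each epoch of voice training.

```python
from typing import List, Tuple


def train(data: List[Tuple[float, float]], epochs: int, lr: float = 0.01):
    """Toy example: fit y = w * x by stochastic gradient descent, sampling after each epoch."""
    w = 0.0
    samples = []  # (epoch number, current weight, mean squared error)
    for epoch in range(1, epochs + 1):
        # One epoch: a single pass over every training example
        for x, y in data:
            grad = 2 * (w * x - y) * x  # derivative of (w*x - y)**2 with respect to w
            w -= lr * grad
        # End of epoch: record a "sample" so progress can be reviewed
        mse = sum((w * x - y) ** 2 for x, y in data) / len(data)
        samples.append((epoch, w, mse))
    return w, samples


if __name__ == "__main__":
    data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # toy data generated by y = 3x
    w, samples = train(data, epochs=5)
    for epoch, weight, mse in samples:
        print(f"epoch {epoch}: w={weight:.3f} mse={mse:.4f}")
```

Reading the printed samples epoch by epoch shows the error shrinking, which is the same kind of progress check, in miniature, as listening to a cloned voice become more convincing after each pass.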
Once I had a satisfying Herzog voice, I started working on a second voice and intuitively chose Slavoj Žižek. Like Herzog, Žižek has an amusing, quirky accent, a notable presence in intellectual circles and ties to film. He has also achieved a degree of popular fame, in part because of his polemical zeal and sometimes controversial ideas.
At that point, I still was not sure what final form my project would take, but I was amazed at how easy and smooth the entire voice-cloning process had been, and I knew that was a warning to anyone paying attention. Deepfakes have become too good and too easy to make: just this month, Microsoft announced a new speech synthesis tool called VALL-E that, its researchers claim, can imitate any voice from just a three-second recording. We are about to face a crisis of trust, and we are not prepared for it.
To underscore the technology’s capacity to generate disinformation at massive scale, I decided to create a never-ending conversation. All I needed was a large language model, fine-tuned on text written by each of the two participants, and a simple program to manage the back-and-forth of the conversation so that its flow felt natural and believable.
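How the turn-taking program actually works is not described here, so the following Python sketch is purely hypothetical. It alternates between the two speakers indefinitely, hands each one a sliding window of the most recent lines as context, and leaves text generation to a caller-supplied function. `generate` and `placeholder_generate` are made-up stand-ins for a call to a fine-tuned language model.

```python
from collections import deque
from itertools import cycle, islice
from typing import Callable, Deque, Iterator, List, Tuple

# Hypothetical signature: (speaker name, recent dialogue lines) -> next line of text
GenerateFn = Callable[[str, List[str]], str]


def conversation(
    speakers: List[str], generate: GenerateFn, max_context: int = 6
) -> Iterator[Tuple[str, str]]:
    """Yield (speaker, line) pairs forever, alternating between the speakers."""
    # A bounded deque keeps memory constant even though the dialogue never ends
    history: Deque[str] = deque(maxlen=max_context)
    for speaker in cycle(speakers):
        line = generate(speaker, list(history))
        history.append(f"{speaker}: {line}")
        yield speaker, line


def placeholder_generate(speaker: str, context: List[str]) -> str:
    # Stand-in for a language model call; reports how much context it received
    return f"turn {len(context)}"


if __name__ == "__main__":
    dialogue = conversation(["Herzog", "Žižek"], placeholder_generate)
    for speaker, line in islice(dialogue, 4):
        print(f"{speaker}: {line}")
```

Because `conversation` is a generator, the dialogue is produced lazily, one turn at a time, which fits an endless stream: each line can be synthesized and played while the next one is being written.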
At their core, language models predict the next word in a sequence based on the words that precede it. By fine-tuning a language model, it is possible to replicate the style and the concepts a particular person tends to talk about, provided you have a large body of transcripts of that person speaking. I decided to use one of the leading commercial language models. That is when it dawned on me that it was already possible to generate a fake dialogue, including its synthetic speech, in less time than it takes to listen to it. This gave me an obvious name for the project: Infinite Conversation. After several months of work, I released it online last October. Infinite Conversation will also open on February 11 as an art installation at the Museum of Misplacement in San Francisco.
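A commercial language model is vastly more sophisticated, but the core idea of next-word prediction can be illustrated with a toy bigram model. This is a deliberately simple stand-in, not the technique used in the project. The Python sketch below counts which word follows which in a transcript, then predicts the most frequent follower; the sample transcript and all function names are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigrams(transcript: str) -> Dict[str, Counter]:
    """Count, for every word, which words follow it in the transcript."""
    words = transcript.lower().split()
    model: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model


def predict_next(model: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if nothing ever followed it."""
    followers = model.get(word.lower())  # .get() does not create empty entries
    if not followers:
        return None
    # Ties go to the follower seen first in the transcript
    return followers.most_common(1)[0][0]


def continue_text(model: Dict[str, Counter], start: str, length: int) -> str:
    """Greedily extend `start` by up to `length` predicted words."""
    words = [start.lower()]
    for _ in range(length):
        nxt = predict_next(model, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)


if __name__ == "__main__":
    transcript = "the voice of the jungle is the voice of chaos and the voice of silence"
    model = train_bigrams(transcript)
    print(continue_text(model, "jungle", 4))
```

Trained on a different person’s transcripts, the same counting would yield different favorite word sequences, a crude echo of how fine-tuning shifts a large model toward one speaker’s style.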
Once all the pieces were in place, I was surprised by things I had not anticipated when I started the project. Just like their real-life counterparts, my chatbot versions of Herzog and Žižek often converse about philosophical and aesthetic topics. Because of the esoteric nature of these topics, listeners can temporarily overlook the occasional nonsense the models produce. For example, AI Žižek’s take on Alfred Hitchcock alternates between seeing the famous director as a genius and as a cynical manipulator. In another contradiction, the real Herzog notoriously hates chickens, but his AI impersonator sometimes speaks sympathetically about the fowl. Because actual postmodern philosophy can read as confusing, a problem Žižek himself has pointed out, the lack of clarity in the infinite dialogue can be interpreted as deep ambiguity rather than impossible contradiction.
This may contribute to the project’s overall success. Hundreds of visitors to Infinite Conversation have listened for more than an hour, and in some cases longer. As I state on the site, my hope is that visitors will not pay too much attention to what the chatbots say but will instead understand the technology and its consequences, and then imagine how such authentic-sounding speech could be used to tarnish the reputations of politicians, defraud business leaders or simply distract people with misinformation that sounds like human reporting.
But there is also a bright side. Visitors to Infinite Conversation can join a growing number of listeners who report using the soothing voices of Werner Herzog and Slavoj Žižek as a kind of white noise to fall asleep to. That is one use of this new technology I can get behind.
This is an opinion and analysis article, and the views expressed by the author or authors are not necessarily those of Scientific American.