Yesterday an artificial intelligence firm in Canada called Lyrebird announced that it had developed algorithms that can mimic anyone’s voice using only 60 seconds of audio. It is not the only company working on human-like voice synthesis. The Verge points out that both Adobe and Google have voice synthesis divisions working on realistic computer-generated speech.
Using machine learning, Google has developed lifelike voice synthesis that even incorporates the breath sounds heard in human speech. Adobe has been working on Project VoCo, an audio-editing tool analogous to Photoshop. Its software can make just about any voice say anything, but it needs a 20-minute speech sample to accomplish this.
The fact that Lyrebird needs only 60 seconds of audio is a significant advantage, but the resulting voice still sounds robotic. However, the software does capture the pitch, accent, and some of the speaking mannerisms, as this sample of a conversation between Donald Trump, Barack Obama, and Hillary Clinton shows.
It will be some time before the quality is good enough to fool someone, but that raises the question: What happens when this or any other software can? Judging from some of the samples already out there, a successful Turing test for voice synthesis is not far away. So what happens when a computer can fool a human into thinking they are hearing another human? The potential for abuse is obvious.
According to The Verge, “We already know that synthetic voice generators can trick biometric software used to verify identity.”