AI chatbots and art generators are popping up everywhere, so the big players in the biz are trying to keep up with their own tools. Meta just unveiled Voicebox, a text-guided AI speech generator they say outperforms all existing models. Though it’s not available for public use yet, Meta has made demos available for anyone who wants to learn more.
Voicebox could be a handy audio-editing tool for content creators and editors, since its voice generation produces natural-sounding clips. It can even edit out noise like a dog barking from a voice clip and regenerate the affected speech without a hitch. Plus, visually impaired users can give Voicebox an audio clip of a friend as short as two seconds and it’ll read that friend’s written messages aloud in their voice.
Meta AI’s new generative AI tool can solve tasks via in-context learning, so it can process text it’s never seen before and generate context and inflections just like a person would, using existing knowledge to learn and tackle new challenges. But the ethical and legal implications of this groundbreaking tool are no joke: anyone could use recordings of someone’s voice without permission to make them say whatever they want.
In their paper, Meta claims a binary classification model can tell real-world speech from Voicebox-generated speech. But since the system isn’t publicly available yet, we won’t know for sure until that claim gets put to the test.
To get optimal performance, Meta trained Voicebox on 60,000 hours of English audiobooks and 50,000 hours of multilingual audiobooks in six languages. This training gives it the power to do multilingual text-to-speech with no extra training, speech denoising, styling, editing…you name it.
Meta claims Voicebox is faster and makes fewer errors than its competitors. Plus, it can convert written text into spoken words in one or multiple languages without needing to be trained for each language separately. Compared to YourTTS, Voicebox reduced the average word error rate from 10.9% to 5.2%, and increased audio similarity from 0.335 to 0.481.
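For a sense of what those error numbers mean: word error rate is conventionally the number of word-level substitutions, insertions and deletions needed to turn a transcript of the generated speech into the reference text, divided by the number of words in the reference. Here’s a minimal Python sketch of that standard definition. It is not Meta’s evaluation code, and the function names `word_error_rate` and `relative_reduction` are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), not Meta's evaluation pipeline.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a metric dropped, e.g. 0.5 for a 50% cut."""
    return (before - after) / before


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))

# The reported drop from 10.9% to 5.2%, expressed as a relative reduction.
print(round(relative_reduction(10.9, 5.2), 3))
```

Run on the reported figures, the second helper shows that going from 10.9% to 5.2% means cutting the error rate roughly in half.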