Meta, formerly known as Facebook, has unveiled its latest AI tool: Voicebox AI. The tool generates speech from text prompts, and Meta presents it as a significant step forward for voice assistants and speech synthesis.
Like OpenAI’s ChatGPT and DALL-E, Voicebox AI is a generative model, but it produces spoken language rather than text or images. The system was trained on 50,000 hours of unfiltered audio, consisting of recorded speech and transcripts from publicly available audiobooks in English, French, Spanish, German, Polish, and Portuguese.
According to Meta, the diversity of this dataset allows Voicebox AI to generate “more conversational speech” regardless of which of these languages the speakers use.
Meta asserts that speech recognition models trained on synthetic speech generated by Voicebox perform almost as well as models trained on real speech. The company also claims that Voicebox surpasses Microsoft’s VALL-E in text-to-speech conversion in both intelligibility and audio similarity, while running up to 20 times faster.
Beyond speech generation, Voicebox AI offers audio editing features. Users can apply the tool to edit audio, remove noise, and correct mispronounced words. A user can identify a segment of speech marred by noise, such as a barking dog, crop that segment, and instruct the model to regenerate it.
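The editing workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Voicebox itself is not publicly available, so the function names here (`build_infill_mask`, `edit_audio`, `silence_infill`) are hypothetical, and the stand-in `silence_infill` simply writes silence where a real generative model would synthesize replacement speech.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: takes the samples and a mask, returns regenerated samples.
Infill = Callable[[Sequence[float], Sequence[bool]], List[float]]


def build_infill_mask(num_samples: int, sample_rate: int,
                      noisy_spans: Sequence[Tuple[float, float]]) -> List[bool]:
    """Return a per-sample mask that is True where audio should be regenerated."""
    mask = [False] * num_samples
    for start_s, end_s in noisy_spans:
        if end_s <= start_s:
            raise ValueError("each span must end after it starts")
        # Convert seconds to sample indices and clamp to the clip boundaries.
        start = max(0, int(round(start_s * sample_rate)))
        end = min(num_samples, int(round(end_s * sample_rate)))
        for i in range(start, end):
            mask[i] = True
    return mask


def edit_audio(samples: Sequence[float], mask: Sequence[bool],
               infill: Infill) -> List[float]:
    """Keep unmasked samples untouched; take masked samples from the infill model."""
    generated = infill(samples, mask)
    if len(generated) != len(samples):
        raise ValueError("infill output must match the input length")
    return [g if m else s for s, g, m in zip(samples, generated, mask)]


def silence_infill(samples: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Placeholder for a generative model: returns silence for every sample."""
    return [0.0] * len(samples)


# Example: a 4-second clip at 4 samples per second, with noise from 1.0 s to 2.0 s.
clip = [0.5] * 16
noise_mask = build_infill_mask(len(clip), 4, [(1.0, 2.0)])
cleaned = edit_audio(clip, noise_mask, silence_infill)
```

The point of the sketch is the division of labor: the human marks where the noise is, and only that marked region is handed to the model, so the surrounding speech is left exactly as recorded.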
However, Meta has not yet made the Voicebox program or its source code publicly available, citing concerns about potential misuse. The company is committed to ensuring the responsible deployment of this technology.
Researchers behind Voicebox AI envision a future where this technology finds applications in various domains, including assisting individuals with damaged vocal cords, enhancing gaming non-player characters (NPCs), and empowering digital assistants to provide more engaging and human-like interactions.
Meta’s approach to sharing its AI technologies has evolved over time. In February, the company released its LLaMA AI language model to the research community under a noncommercial license. However, concerns about misuse arose when the model’s weights leaked and appeared on unauthorized platforms.
In addition to Voicebox AI, Meta has been actively developing other AI models, such as SAM, an AI image segmentation model capable of identifying specific objects in images or videos based on user cues. The company also collaborates with developers by providing open-source code and datasets for projects like Animated Drawings AI.
Meta’s unveiling of Voicebox AI signals a significant advancement in the field of speech generation and underscores the company’s commitment to driving innovation and responsible AI development.