Highlights:
- Artificial intelligence (AI) voice cloning technology can create and duplicate voices.
- Projects, a new product from ElevenLabs, enables users to create comprehensive dialogue segments for extensive projects.
ElevenLabs Inc., a platform for artificial intelligence voice synthesis, recently announced that it had raised USD 19 million in a Series A funding round to introduce a new set of tools, including an AI speech detection module.
Anyone can create lifelike speech from text on the ElevenLabs platform using synthetic voices, cloned voices, or entirely artificial voices that can be customized to customer needs. Generated voices can be tailored by gender, age, and accent. To create highly realistic content, voices can also be programmed to reflect tone, intonation, and emotion.
Andreessen Horowitz, Nat Friedman, and Daniel Gross co-led the funding round. Credo Ventures, Concept Ventures, and business angel investors such as Mike Krieger, Co-founder of Instagram, Brendan Iribe, Co-founder of Oculus VR, Mustafa Suleyman, Co-founder of DeepMind, and Tim O’Reilly, Founder of O’Reilly Media, also participated in the round.
The ability of AI voice cloning technology to create and duplicate voices has made it a viral sensation and a powerful tool for producing speech on demand. ElevenLabs spent 2022 researching voice synthesis and launched its technology in beta in 2023. According to the company, it has since registered over one million users and produced more than 10 years’ worth of audio content.
This has raised questions about potential unethical uses of the technology, such as AI deepfakes. In one instance, users of the anonymous English-language image board 4chan shared instructions on how to use the technology to create hate-filled memes with famous voices.
To combat the spread of deepfake AI voices, ElevenLabs is releasing what it calls an AI Speech Classifier. Anyone can use this tool to determine whether the audio they are listening to was produced entirely or in part using ElevenLabs’ speech synthesis models. The service is publicly accessible: anyone can upload a clip and test voices, and it is also available to select partners as an API.
According to Mati Staniszewski, Co-founder and CEO of ElevenLabs, the release of this tool reflects the company’s commitment to transparency and ethical generative media. He added, “Our mission is to be the ultimate tool for storytelling, dissolving language barriers and putting all audiences in the reach of all content creators in a safe and responsible way.”
The tool will let people who are suspicious of a voice test it, although it won’t necessarily stop bad actors from using voice cloning or abusing AI voice technology. Because it can only identify voices created by ElevenLabs, voices created by other AI systems may go undetected. Still, it may encourage other makers of AI synthesis models to develop detection tools of their own.
In a separate announcement, the company introduced Projects, a production workflow platform for long-form speech synthesis and audio “infill” that enables users to create entire dialogue segments for large projects. This includes listening to audiobooks and reading news articles without ever leaving the platform.
When using Projects to direct and edit audio for large projects, creators will have complete control over pacing, pausing, and assigning specific speakers to individual text fragments. According to the company, the main goal is for Projects to become, in essence, the “Google Docs” of speech audio creation and editing.
Projects will join ElevenLabs’ current line of products. That line already includes VoiceLab, which enables users to clone voices or develop original artificial voices for use in their own work, and Speech Synthesis, the company’s text-to-speech generation technology.