Undergraduate Innovators Launch AI Speech Model to Compete with Industry Leaders


Two undergraduates, Toby Kim and his co-founder, have entered the fast-moving field of artificial intelligence with a new AI speech model named Dia. The model offers functionality comparable to major players in AI speech generation, such as Google's NotebookLM, and can produce engaging podcast-style audio clips. Notably, neither founder had deep prior expertise in AI. With the global market for synthetic speech technology expanding rapidly, their startup, Nari Labs, is positioning itself alongside established competitors such as ElevenLabs and PlayAI.

The co-founders were inspired by their observations of industry trends and innovations; after roughly three months of research, they set out to give users more nuanced control over voice generation, a crucial capability for content creators. To develop Dia, they leveraged Google's TPU Research Cloud program, which grants researchers access to TPU AI chips for training models. At 1.6 billion parameters, Dia is designed to produce realistic dialogue from user-supplied scripts, with control over tone and the ability to insert nonverbal cues such as pauses, coughs, and laughter.
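To illustrate the script-driven workflow described above, here is a minimal sketch of how a short dialogue clip might be generated with the released model. The `dia` package layout, the `Dia.from_pretrained` loader, the `generate` method, the `[S1]`/`[S2]` speaker tags with parenthesized nonverbal cues, the sample rate, and the `nari-labs/Dia-1.6B` checkpoint name are assumptions based on the project's public release and should be verified against Nari Labs' repository.

```python
# Minimal sketch of generating a two-speaker clip with Dia.
# Assumptions: the open-source `dia` package exposes Dia.from_pretrained(...)
# and a generate(...) method, and scripts use [S1]/[S2] speaker tags plus
# parenthesized nonverbal cues such as (laughs). Check the repo for the
# current API before relying on these names.
import soundfile as sf          # writes the generated waveform to disk
from dia.model import Dia       # assumed package layout

model = Dia.from_pretrained("nari-labs/Dia-1.6B")

script = (
    "[S1] Welcome back to the show. Today we're talking about open speech models. "
    "[S2] Right, and honestly I didn't expect the quality to be this good. (laughs) "
    "[S1] Same here. Let's get into it."
)

# Generate a waveform for the scripted dialogue and save it as a WAV file.
audio = model.generate(script)
sf.write("dialogue.wav", audio, 44100)  # assumed 44.1 kHz output
```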

Unlike many existing applications, Dia gives users fine-grained control over voices: it can clone a specific voice from a sample or generate new speaking styles based on input descriptions. Initial tests conducted by TechCrunch showed Dia producing smooth, engaging two-way conversations on a range of topics, with voice quality competitive with other industry models.
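The voice-cloning capability is typically driven by conditioning generation on a short reference recording. The sketch below reuses the assumed interface from the previous example and adds a hypothetical `audio_prompt` keyword argument; the actual parameter name and prompt format should be confirmed against Nari Labs' documentation.

```python
# Sketch of conditioning generation on a reference voice.
# Assumption: generate() accepts a reference recording (passed here via a
# hypothetical `audio_prompt` argument) and the transcript of that recording
# is prepended to the new script so the model continues in the same voice.
import soundfile as sf
from dia.model import Dia   # assumed package layout, as in the previous sketch

model = Dia.from_pretrained("nari-labs/Dia-1.6B")

prompt_transcript = "[S1] This is a short sample of the voice I want to clone."
new_script = "[S1] And this is a brand-new line spoken in that same voice."

audio = model.generate(
    prompt_transcript + " " + new_script,
    audio_prompt="reference_voice.wav",   # hypothetical parameter name
)
sf.write("cloned_voice.wav", audio, 44100)
```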

However, like many voice generation models, Dia lacks robust safeguards against potential misuse, raising concerns about the spread of disinformation. Nari Labs advises users against illicit use of the technology, but the model's capabilities have sparked discussions about ethical constraints in AI development. The company has not disclosed Dia's training data, and there is speculation that it may include copyrighted material, a contested legal gray area in AI development.

Looking ahead, Nari Labs plans to build Dia into a comprehensive synthetic voice platform with social networking features, release supporting documentation detailing its methodology, and extend the model to additional languages for a wider global reach.
