AI company Sesame has open-sourced the base model behind Maya, its highly realistic voice assistant.
Named CSM-1B, the model has 1 billion parameters and is available under an Apache 2.0 license, allowing commercial use with minimal restrictions. According to Sesame, it generates “RVQ audio codes” (residual vector quantization, a technique for compressing audio into discrete tokens) from text and audio inputs—a method also used in AI audio technologies like Google’s SoundStream and Meta’s Encodec.
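To give a flavor of what RVQ codes are, here is a minimal toy sketch of residual vector quantization in NumPy. This is an illustration of the general technique, not Sesame’s actual implementation: the codebook sizes, dimensions, and random codebooks below are made up for demonstration. Each quantization stage encodes whatever residual the previous stages left behind, so a vector is represented as a short list of code indices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative numbers only): 3 codebooks of 8 code
# vectors each, in a 4-dimensional embedding space.
num_codebooks, codebook_size, dim = 3, 8, 4
codebooks = rng.normal(size=(num_codebooks, codebook_size, dim))

def rvq_encode(x, codebooks):
    """Residual vector quantization: each codebook quantizes what the
    previous stages left over, yielding one code index per stage."""
    residual = x.copy()
    codes = []
    for cb in codebooks:
        # Pick the code vector nearest to the current residual.
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]
    return codes, residual

def rvq_decode(codes, codebooks):
    # Reconstruction is simply the sum of the chosen code vectors.
    return sum(cb[i] for cb, i in zip(codebooks, codes))

x = rng.normal(size=dim)
codes, residual = rvq_encode(x, codebooks)
x_hat = rvq_decode(codes, codebooks)
# By construction, the reconstruction error equals the final residual.
assert np.allclose(x - x_hat, residual)
```

In a real codec like SoundStream or Encodec, the codebooks are learned, the vectors are frames of an audio encoder’s output, and the resulting discrete codes are what a language-model-style decoder such as CSM-1B predicts.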
CSM-1B is built on a Meta Llama model and paired with an audio decoder. While a fine-tuned version powers Maya, the open-source release is a general-purpose model capable of producing various voices. However, it hasn’t been optimized for specific voices or languages beyond English.
Sesame hasn’t disclosed the training data used for CSM-1B, and the model lacks built-in safeguards. The company simply asks users not to misuse it for voice impersonation, fake news, or malicious activities.
I tried the demo on Hugging Face, and cloning my voice took less than a minute. From there, it was easy to generate whatever speech I wanted, including on controversial topics like the election and Russian propaganda.
Consumer Reports recently cautioned that many popular AI-powered voice cloning tools lack strong safeguards against fraud and misuse.
Sesame, the startup co-founded by Oculus co-creator Brendan Iribe, gained viral attention in late February for its lifelike voice assistant technology. Its assistants, Maya and Miles, mimic natural speech patterns with pauses, breaths, and interruptions—similar to OpenAI’s Voice Mode—making them feel remarkably real.
Backed by investors like Andreessen Horowitz, Spark Capital, and Matrix Partners, Sesame has raised an undisclosed amount in funding. Beyond voice assistants, the company is also developing AI-powered smart glasses designed for all-day wear, featuring its custom AI models.