Zonos-v0.1: A New Era in Voice Cloning

Zonos-v0.1 is an open-weight text-to-speech (TTS) model from Zyphra that, according to its developers, rivals or even surpasses top TTS providers in expressiveness and quality.

With just 5 to 30 seconds of reference audio, Zonos can perform high-fidelity voice cloning. It allows fine-grained control over speaking rate, pitch, audio quality, and emotions such as happiness, fear, and anger. The model outputs speech natively at 44 kHz and was trained on approximately 200,000 hours of primarily English speech data.
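As a rough illustration of the cloning workflow described above, here is a minimal Python sketch. The helper `reference_duration_ok` and the wrapper `clone_voice` are hypothetical names introduced for this example. The calls into the `zonos` package (`Zonos.from_pretrained`, `make_speaker_embedding`, `make_cond_dict`, `prepare_conditioning`, `generate`, `autoencoder.decode`) follow the usage pattern in the project's README as best recalled, so treat them as assumptions and check them against the current repository before relying on them. A CUDA GPU and the `zonos` and `torchaudio` packages are also assumed.

```python
MIN_REF_SECONDS = 5.0   # lower bound quoted in the article
MAX_REF_SECONDS = 30.0  # upper bound quoted in the article


def reference_duration_ok(num_samples: int, sample_rate: int) -> bool:
    """Return True if a reference clip falls in the 5 to 30 second range."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    seconds = num_samples / sample_rate
    return MIN_REF_SECONDS <= seconds <= MAX_REF_SECONDS


def clone_voice(text: str, reference_path: str, output_path: str) -> None:
    """Synthesize `text` in the voice of the clip at `reference_path`.

    Hypothetical wrapper; the zonos calls below are assumed from the README.
    """
    # Imported lazily so the duration helper works without heavy dependencies.
    import torchaudio
    from zonos.conditioning import make_cond_dict
    from zonos.model import Zonos

    # torchaudio.load returns a (channels, frames) tensor and the sample rate.
    wav, sample_rate = torchaudio.load(reference_path)
    if not reference_duration_ok(wav.shape[-1], sample_rate):
        raise ValueError("reference clip should be 5 to 30 seconds long")

    model = Zonos.from_pretrained("Zyphra/Zonos-v0.1-hybrid", device="cuda")
    speaker = model.make_speaker_embedding(wav, sample_rate)

    # Condition generation on the text, the speaker embedding, and a language code.
    cond = make_cond_dict(text=text, speaker=speaker, language="en-us")
    codes = model.generate(model.prepare_conditioning(cond))

    # Decode the generated codes to a waveform and save it at the model's rate.
    audio = model.autoencoder.decode(codes).cpu()
    torchaudio.save(output_path, audio[0], model.autoencoder.sampling_rate)
```

A call such as `clone_voice("Hello, world!", "speaker.wav", "out.wav")` would then write the cloned speech to `out.wav`, where both file names are placeholders.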

Key features:

🔹 Zero-shot TTS: Generate high-quality speech with a short voice sample

🔹 Audio prefixes: Use audio prompts for more realistic speech synthesis

🔹 Multilingual support: English, Japanese, Chinese, French, and German

🔹 Fast performance: Runs at ~2x real-time speed on an RTX 4090 GPU

🔹 Easy setup and usage: Simple deployment with Docker and an intuitive Gradio interface

Zonos is a powerful tool for anyone looking to generate lifelike and expressive speech.
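To put the "~2x real-time" figure in concrete terms, the small Python sketch below estimates how long synthesis might take for a clip of a given length. It assumes that "2x real-time" means roughly two seconds of audio produced per second of compute. The function name and the default factor are illustrative, and actual throughput will vary with hardware, text, and settings.

```python
def estimated_generation_seconds(audio_seconds: float,
                                 realtime_factor: float = 2.0) -> float:
    """Estimate compute time needed to synthesize `audio_seconds` of speech.

    `realtime_factor` is seconds of audio produced per second of compute
    (assumed to be about 2.0 on an RTX 4090, per the figure quoted above).
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# One minute of speech at 2x real-time takes about half a minute to generate.
print(estimated_generation_seconds(60.0))  # 30.0
```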

Source: https://x.com/ZyphraAI/status/1888996367923888341

https://huggingface.co/Zyphra/Zonos-v0.1-hybrid
