Qwen/Qwen3-TTS-12Hz-1.7B-Base

🧠 AI Model · Qwen

Open-source 1.7B-parameter TTS model by Qwen that generates speech tokens at a low 12Hz frame rate.

Qwen3-TTS-12Hz-1.7B-Base is an open-source text-to-speech model developed by the Qwen team. It uses a transformer architecture with 1.7 billion parameters to convert text into speech tokens at a low frame rate of 12Hz. Generating fewer tokens per second of audio reduces computational overhead while maintaining natural prosody and voice clarity, which makes the model suitable for edge deployment and real-time systems. It was trained on diverse multilingual data and supports zero-shot voice cloning. With over 1.3 million downloads on Hugging Face, it is a popular choice among developers and researchers. The Apache 2.0 license permits free use, modification, and distribution.
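To make the frame-rate saving concrete, here is a small arithmetic sketch in Python. It assumes that "12Hz" means exactly 12 token frames per second of audio. The 25Hz and 50Hz comparison rates are hypothetical values chosen only for illustration, and the helper `frames_for` is not part of any Qwen API. The number of tokens emitted per frame is not stated here, so the sketch counts frames only.

```python
import math

# Assumption: "12Hz" in the model name means 12 token frames per second of audio.
FRAME_RATE_HZ = 12


def frames_for(duration_s: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Return the number of token frames needed to cover duration_s seconds."""
    return math.ceil(duration_s * frame_rate_hz)


if __name__ == "__main__":
    # 25 Hz and 50 Hz are hypothetical comparison rates, not figures from the model card.
    for rate in (12, 25, 50):
        print(f"{rate:>2} Hz: {frames_for(10, rate)} frames for 10 s of audio")
```

Under these assumptions, a 10-second utterance needs 120 frames at 12Hz, compared with 500 at the hypothetical 50Hz rate, so the model has fewer sequence positions to generate for the same audio length.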

💡Highlights

├─1.7B transformer parameters
├─12Hz low-frame-rate tokens
└─Apache 2.0 license

🎯For

├─AI developers
├─voice interface engineers
└─speech synthesis researchers

🔗Links

└─Hugging Face model page