Qwen/Qwen3-TTS-12Hz-1.7B-Base
AI Model · Qwen
Open-source 1.7B-parameter TTS model by Qwen that generates speech from efficient 12 Hz low-frame-rate tokens.
Qwen3-TTS-12Hz-1.7B-Base is an open-source text-to-speech model from the Qwen team. It is a 1.7-billion-parameter transformer that converts text into low-frame-rate speech tokens at 12 Hz. Because fewer tokens are generated per second of audio, the 12 Hz tokenization reduces computational overhead while maintaining natural prosody and voice clarity, which makes the model suitable for edge deployment and real-time systems. It was trained on diverse multilingual data and supports zero-shot voice cloning. The model has over 1.3 million downloads on Hugging Face and is released under the Apache 2.0 license, which allows free use, modification, and distribution.
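To make the frame-rate argument concrete, here is a minimal arithmetic sketch of how many token frames are needed for a clip of a given length. The 12 Hz figure is the nominal rate from the model name. The 25 Hz and 50 Hz rates are hypothetical comparison points chosen for illustration, not measurements of any specific model, and the function names are made up for this example. The sketch counts frames only and says nothing about how many tokens each frame carries.

```python
import math

# Nominal frame rate taken from the model name; the other two rates are
# hypothetical comparison points, not properties of any particular model.
RATES_HZ = {
    "12 Hz (nominal, this model)": 12.0,
    "25 Hz (hypothetical)": 25.0,
    "50 Hz (hypothetical)": 50.0,
}


def frame_count(duration_s: float, frame_rate_hz: float) -> int:
    """Return the number of token frames needed to cover duration_s seconds."""
    return math.ceil(duration_s * frame_rate_hz)


def compare(duration_s: float) -> dict[str, int]:
    """Frame counts for the same clip length at each rate in RATES_HZ."""
    return {name: frame_count(duration_s, hz) for name, hz in RATES_HZ.items()}


if __name__ == "__main__":
    for name, frames in compare(10.0).items():
        print(f"{name}: {frames} frames for 10 s of audio")
```

Under these assumptions, a lower frame rate means proportionally fewer autoregressive generation steps for the same clip, which is the source of the reduced overhead described above.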
Highlights
- 1.7B transformer parameters
- 12 Hz low-frame-rate tokens
- Apache 2.0 license
For
- AI developers
- Voice interface engineers
- Speech synthesis researchers
Links
- Hugging Face model page