charactr/vocos-mel-24khz
🧠 AI Model · charactr
Open-source, high-quality neural vocoder that converts mel-spectrograms to 24 kHz audio.
Vocos is a neural vocoder that synthesizes 24 kHz audio from mel-spectrograms. It is described in arXiv:2306.00814, which sets out to close the gap between time-domain and Fourier-based vocoders. Rather than generating waveform samples directly in the time domain, the model predicts Fourier spectral coefficients and converts them to audio with an inverse Fourier transform, which keeps inference fast. It is a convolutional network with residual blocks, trained adversarially, and it produces a waveform in a single forward pass. That speed makes it a candidate for real-time applications and for the final stage of a TTS pipeline. The PyTorch model is available on Hugging Face, where it has over 1.36 million downloads and 41 likes.
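To make the Fourier-based side of this description concrete, the following is a minimal NumPy sketch of the final synthesis step only: turning per-frame magnitude and phase into a waveform with an inverse short-time Fourier transform. It is not Vocos code. The helper names `stft` and `istft`, the Hann window, and the small `n_fft` and `hop` values are assumptions chosen for illustration, and the magnitude and phase are computed from a known sine wave rather than predicted by a network.

```python
import numpy as np

def stft(x, n_fft, hop, window):
    # Slice the signal into overlapping windowed frames, then FFT each frame.
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, n_fft // 2 + 1)

def istft(spec, n_fft, hop, window):
    # Inverse FFT each frame, window it again, and overlap-add the frames.
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (len(frames) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += window ** 2
    # Divide by the summed squared window to undo the double windowing.
    return out / np.maximum(norm, 1e-8)

# Illustrative settings (assumed, not the model's actual configuration).
sample_rate = 24000
n_fft, hop = 64, 16
t = np.arange(n_fft + hop * 20) / sample_rate
x = np.sin(2 * np.pi * 440.0 * t)  # 440 Hz test tone
window = np.hanning(n_fft)

# Stand-in for the network output: magnitude and phase per frequency bin.
spec = stft(x, n_fft, hop, window)
magnitude, phase = np.abs(spec), np.angle(spec)

# Recombine into complex coefficients and invert back to a waveform.
rebuilt = istft(magnitude * np.exp(1j * phase), n_fft, hop, window)

# Compare away from the edges, where the window taper leaves little overlap.
max_error = np.max(np.abs(rebuilt[n_fft:-n_fft] - x[n_fft:-n_fft]))
print(f"max reconstruction error: {max_error:.2e}")
```

Because the inverse transform is a fixed operation, all of the learning in such a design sits in predicting the coefficients; the conversion to audio costs one inverse FFT per frame plus an overlap-add.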
💡Highlights
- 1.36M downloads
- arXiv:2306.00814 paper
- MIT license
🎯For
- TTS developers
- Audio researchers
- AI enthusiasts