mesolitica/wav2vec2-xls-r-300m-mixed
🧠 AI Model · mesolitica
Multilingual speech recognition model based on Wav2Vec2 XLS-R 300M for mixed-language ASR tasks.
This model is a fine-tuned version of Wav2Vec2 XLS-R 300M, originally developed by Facebook AI and adapted by mesolitica for mixed-language automatic speech recognition. Wav2Vec2 XLS-R combines a convolutional feature encoder with a Transformer network and learns cross-lingual speech representations through self-supervised pretraining on roughly 436,000 hours of unlabeled speech in 128 languages.
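Wav2Vec2 checkpoints fine-tuned for ASR are usually trained with a CTC (connectionist temporal classification) output layer that scores every vocabulary symbol for each audio frame. Assuming this checkpoint follows that convention (the description does not state it), the sketch below shows greedy CTC decoding, the step that turns frame-level scores into text. The five-symbol vocabulary and blank id 0 are hypothetical illustration values, and the "|" word delimiter mirrors the convention of wav2vec2 CTC tokenizers; in practice the checkpoint's own tokenizer defines these, and the transformers processor performs this step for you.

```python
from itertools import groupby
from typing import Dict, List, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    id_to_token: Dict[int, str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> str:
    """Greedy CTC decoding: best token per frame, collapse repeats, drop blanks."""
    # 1. Pick the highest-scoring vocabulary entry for every audio frame.
    best_ids: List[int] = [
        max(range(len(scores)), key=lambda i: scores[i]) for scores in frame_scores
    ]
    # 2. Merge runs of identical ids (a blank between two ids keeps them separate).
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # 3. Remove the blank token, which CTC uses for "no new character".
    chars = "".join(id_to_token[t] for t in collapsed if t != blank_id)
    # 4. Turn the word-delimiter symbol into spaces, skipping empty pieces.
    return " ".join(word for word in chars.split(word_delimiter) if word)


if __name__ == "__main__":
    # Hypothetical 5-symbol vocabulary; id 0 plays the role of the CTC blank.
    vocab = {0: "<pad>", 1: "h", 2: "i", 3: "|", 4: "a"}
    frame_ids = [1, 1, 0, 2, 3, 3, 4, 0, 4]
    # One-hot "logits" so the argmax of each frame is the id listed above.
    frames = [[1.0 if j == i else 0.0 for j in range(len(vocab))] for i in frame_ids]
    print(ctc_greedy_decode(frames, vocab))  # prints "hi aa"
```

Note how the blank between the two "a" frames keeps them as separate characters, while the repeated "h" frames collapse into one.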
Key features include:
- Based on the XLS-R 300M architecture, with roughly 300 million parameters
- Fine-tuned on mixed-language datasets to handle code-switching and multilingual speech
- Compatible with both PyTorch and TensorFlow frameworks
- Supports HuggingFace Transformers, Inference Endpoints, and Azure deployment
- Generated using a Keras callback training pipeline
- Open-source with permissive licensing for research and commercial use
The model is particularly useful for transcribing speech that mixes several languages, a common scenario in multilingual regions such as Malaysia and elsewhere in Southeast Asia. It is tagged for the transformers automatic-speech-recognition pipeline and can be integrated into production workflows through the HuggingFace ecosystem.
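As a minimal sketch of that pipeline integration, the code below wraps the transformers automatic-speech-recognition pipeline. It assumes transformers with a PyTorch backend is installed, that the checkpoint ships PyTorch weights, and that ffmpeg is available for decoding audio files. The file name speech.wav is hypothetical, and the 30-second chunk length is an illustrative choice, not a documented setting for this model.

```python
# Minimal sketch: transcribe audio files with the Hugging Face ASR pipeline.
# Assumptions: transformers + PyTorch installed, PyTorch weights available for
# the checkpoint, ffmpeg on PATH for decoding file paths.

MODEL_ID = "mesolitica/wav2vec2-xls-r-300m-mixed"


def load_asr(device: int = -1):
    """Build the ASR pipeline; device=-1 runs on CPU, 0 on the first GPU."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=MODEL_ID, device=device)


def transcribe(path: str, asr=None) -> str:
    """Return the transcript text for one audio file."""
    if asr is None:
        asr = load_asr()
    # chunk_length_s splits long recordings into windows that the pipeline
    # transcribes separately and stitches back together (illustrative value).
    result = asr(path, chunk_length_s=30)
    return result["text"]


if __name__ == "__main__":
    print(transcribe("speech.wav"))  # hypothetical input file
```

Passing a prebuilt pipeline through the asr argument avoids reloading the model for every file when transcribing a batch of recordings.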
💡 Highlights
- Wav2Vec2 XLS-R 300M fine-tuned for mixed-language ASR
- 1M+ downloads on HuggingFace
- Supports PyTorch and TensorFlow
🎯 For
- ASR researchers
- Multilingual application developers
- Speech technology engineers