mesolitica/wav2vec2-xls-r-300m-mixed

🧠 AI Model · mesolitica

Multilingual speech recognition model based on Wav2Vec2 XLS-R 300M for mixed-language ASR tasks.

This model is a fine-tuned version of Wav2Vec2 XLS-R 300M, originally developed by Facebook AI. mesolitica adapted it for mixed-language automatic speech recognition (ASR). Wav2Vec2 XLS-R feeds a convolutional feature encoder into a Transformer, and it learns cross-lingual speech representations through self-supervised pretraining on large amounts of multilingual audio. Key features include:

- Based on the XLS-R 300M architecture, with roughly 300 million parameters
- Fine-tuned on mixed-language data to handle code-switching and multilingual speech
- Compatible with both PyTorch and TensorFlow
- Supports HuggingFace Transformers, Inference Endpoints, and Azure deployment
- Produced with a Keras callback training pipeline
- Open source under a permissive license that allows both research and commercial use

The model is particularly useful for transcribing speech that mixes several languages. This is common in multilingual regions such as Malaysia and the rest of Southeast Asia. It carries the Transformers automatic-speech-recognition pipeline tag, so it can be integrated into production workflows through the HuggingFace ecosystem.
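Fine-tuned Wav2Vec2 checkpoints for ASR usually put a Connectionist Temporal Classification (CTC) head on the encoder. This card does not state the head type, so treating this checkpoint as CTC is an assumption. A CTC head emits one token prediction per audio frame, which is about 20 ms of 16 kHz audio. To recover the transcript, you merge repeated predictions and then drop the blank token.

The sketch below shows that greedy decoding step in plain Python. Several details are hypothetical:
- the toy vocabulary
- blank id 0
- the "|" word delimiter

The real vocabulary ships with the checkpoint's tokenizer files. The commented lines at the end show the usual Transformers pipeline call, which runs feature extraction, the forward pass and decoding in one step.

```python
from itertools import groupby
from typing import Dict, Sequence

# Hypothetical toy vocabulary (NOT the checkpoint's real one): id 0 plays the
# CTC blank role and "|" marks word boundaries.
BLANK_ID = 0
TOY_VOCAB: Dict[int, str] = {
    0: "<pad>", 1: "|", 2: "s", 3: "a", 4: "y", 5: "g", 6: "o", 7: "d",
}


def ctc_greedy_decode(
    frame_ids: Sequence[int],
    vocab: Dict[int, str],
    blank_id: int = BLANK_ID,
    word_delimiter: str = "|",
) -> str:
    """Greedy CTC decoding of per-frame argmax ids into text."""
    # 1. Merge consecutive duplicate predictions: [6, 6, 0, 6] -> [6, 0, 6].
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop blanks. A blank between two equal ids keeps them as separate letters.
    kept = [token_id for token_id in collapsed if token_id != blank_id]
    # 3. Map ids to characters, then turn word delimiters into single spaces.
    text = "".join(vocab[token_id] for token_id in kept)
    return " ".join(word for word in text.split(word_delimiter) if word)


# Frame predictions for a short code-switched phrase (toy data).
frames = [2, 2, 0, 3, 3, 4, 0, 3, 1, 1, 0, 5, 6, 6, 0]
print(ctc_greedy_decode(frames, TOY_VOCAB))  # prints: saya go

# Typical end-to-end usage. This needs the transformers package, a PyTorch
# backend and network access to download the checkpoint. The audio path is
# hypothetical. It is commented out so this sketch runs offline:
# from transformers import pipeline
# asr = pipeline("automatic-speech-recognition",
#                model="mesolitica/wav2vec2-xls-r-300m-mixed")
# print(asr("speech_16khz.wav")["text"])
```

Greedy decoding picks the most likely token per frame. Beam search with a language model can improve accuracy on code-switched speech, but it adds external dependencies.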

💡Highlights

├─Wav2Vec2 XLS-R 300M fine-tuned for mixed-language ASR
├─1M+ downloads on HuggingFace
└─Supports PyTorch and TensorFlow

🎯For

├─ASR researchers
├─multilingual application developers
└─speech technology engineers

🔗Links

└─HuggingFace Model Page