kingabzpro/wav2vec2-large-xls-r-300m-Urdu
🧠 AI Model · kingabzpro
Urdu speech recognition model fine-tuned from Facebook's wav2vec2-xls-r-300m on Common Voice 8.0.
The kingabzpro/wav2vec2-large-xls-r-300m-Urdu model is a fine-tuned variant of Facebook's wav2vec2-xls-r-300m, built specifically for Urdu speech recognition. It builds on the self-supervised representations of XLS-R, which was pre-trained on speech in 128 languages, and adapts them to Urdu using the Common Voice 8.0 dataset. The model uses the Wav2Vec2ForCTC architecture: the wav2vec 2.0 encoder with a linear layer on top, trained and decoded with connectionist temporal classification (CTC). Training was performed with the HuggingFace Transformers library, likely using mixed precision and gradient accumulation. Key features include a 300M-parameter model, weights in the safetensors format for efficient loading, and compatibility with the HuggingFace ASR pipeline. The model targets Urdu's phonetic characteristics and script. It can be used to transcribe audio files or real-time speech, with applications in voice assistants, transcription services, and language learning tools.
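As a minimal sketch of the ASR pipeline compatibility described above, the snippet below wraps the Transformers `pipeline("automatic-speech-recognition", ...)` call in two small helpers. The helper names `load_asr` and `transcribe` are illustrative assumptions rather than anything the model card defines, and the sketch assumes `transformers`, a backend such as PyTorch, and an audio decoder such as ffmpeg are installed.

```python
from typing import Callable, Optional

# Model identifier as given on the listing.
MODEL_ID = "kingabzpro/wav2vec2-large-xls-r-300m-Urdu"


def load_asr() -> Callable:
    """Build a HuggingFace ASR pipeline for the model.

    The weights are downloaded from the Hub on first use.
    """
    # Imported lazily so this module can be imported without transformers.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=MODEL_ID)


def transcribe(audio_path: str, asr: Optional[Callable] = None) -> str:
    """Transcribe one audio file and return the text, stripped of whitespace.

    `asr` is any callable that maps a file path to a dict with a "text" key,
    which is the shape the ASR pipeline returns. Passing it in explicitly
    lets one loaded pipeline be reused across many files.
    """
    if asr is None:
        asr = load_asr()
    result = asr(audio_path)
    return result["text"].strip()
```

A typical call would be `transcribe("sample_urdu.wav")`, where `sample_urdu.wav` is a hypothetical local recording. When transcribing many files, it is usually better to call `load_asr()` once and pass the result as `asr`, so the model is not reloaded each time.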
💡 Highlights
- Fine-tuned on Common Voice Urdu
- Based on XLS-R 300M
- Over 1.3M HuggingFace downloads
🎯 For
- Urdu speakers
- ASR researchers
- NLP engineers