distil-whisper/distil-large-v3

🧠 AI Model · distil-whisper

A faster, lighter, and highly accurate distilled version of OpenAI's Whisper large-v3 model for English speech recognition.

distil-whisper/distil-large-v3 applies knowledge distillation to OpenAI's Whisper large-v3. The student model keeps the teacher's encoder but reduces the decoder from 32 layers to 2, and is trained to reproduce the teacher's outputs. According to the model card, the result is about 6.3x faster than large-v3 with roughly half the parameters (756M vs. 1,550M), while staying within about 1% word error rate (WER) of the teacher on long-form audio.

Like the other Distil-Whisper checkpoints, it is trained for English speech recognition only; it does not carry over Whisper's multilingual transcription. Weights are published in several formats, including Safetensors, JAX, and ONNX, so the model fits into different inference stacks. Its lower compute cost makes high-quality transcription practical in latency-sensitive applications such as live captioning and voice-controlled interfaces, with only a small accuracy gap relative to large-v3.
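To make "trained to reproduce the teacher's outputs" concrete, here is a minimal, framework-free sketch of a common knowledge-distillation objective: a weighted sum of cross-entropy on the teacher's pseudo-label tokens and KL divergence from the teacher's token distribution to the student's. This is an illustrative assumption, not the Distil-Whisper training code: logits are plain Python lists (one row per decoded token), there is no temperature scaling, and the weights `alpha_kl` and `alpha_pl` are placeholder values.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def distillation_loss(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[float]],
    pseudo_labels: Sequence[int],
    alpha_kl: float = 1.0,  # placeholder weight for the KL term (assumption)
    alpha_pl: float = 1.0,  # placeholder weight for the pseudo-label term (assumption)
) -> float:
    """Mean per-token loss: alpha_pl * CE(pseudo-label) + alpha_kl * KL(teacher || student)."""
    n = len(pseudo_labels)
    if n == 0 or not (len(student_logits) == len(teacher_logits) == n):
        raise ValueError("student, teacher and labels must cover the same non-empty token sequence")
    total = 0.0
    for s_row, t_row, label in zip(student_logits, teacher_logits, pseudo_labels):
        log_p_s = log_softmax(s_row)
        log_p_t = log_softmax(t_row)
        # Cross-entropy of the student against the teacher's pseudo-label token.
        ce = -log_p_s[label]
        # KL(teacher || student) summed over the vocabulary.
        kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
        total += alpha_pl * ce + alpha_kl * kl
    return total / n


if __name__ == "__main__":
    teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.0]]
    student = [[1.0, 0.8, -0.5], [0.3, 1.0, 0.2]]
    # Pseudo-labels are the teacher's argmax tokens at each position.
    print(distillation_loss(student, teacher, pseudo_labels=[0, 1]))
```

The loss is zero-KL when the student's distribution matches the teacher's, so minimizing it pushes the small 2-layer decoder toward the large decoder's predictions while the pseudo-label term anchors it to the teacher's chosen transcript.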

💡Highlights

├─About 6.3x faster than Whisper large-v3
├─About 49% fewer parameters (756M vs. 1,550M)
└─Drop-in replacement for Whisper large-v3 on English speech recognition
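Because the model keeps Whisper's interface, switching from large-v3 usually means changing only the checkpoint ID. The sketch below assumes the Hugging Face `transformers` and `torch` packages are installed; `load_asr` is a hypothetical helper name, and the imports are deferred into the function so the module itself loads without those dependencies.

```python
def load_asr(model_id="distil-whisper/distil-large-v3"):
    """Build a Hugging Face ASR pipeline; pass "openai/whisper-large-v3" to use the teacher instead."""
    # Deferred imports: heavy optional dependencies are only needed when the pipeline is built.
    import torch
    from transformers import pipeline

    use_cuda = torch.cuda.is_available()
    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        # Half precision on GPU to cut memory and latency; full precision on CPU.
        torch_dtype=torch.float16 if use_cuda else torch.float32,
        device="cuda:0" if use_cuda else "cpu",
    )
```

Usage, with `sample.wav` standing in for any English audio file: `asr = load_asr(); print(asr("sample.wav")["text"])`. Keeping the model ID as the only parameter makes A/B comparisons against the original Whisper checkpoint a one-argument change.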

🎯For

├─AI Engineers
├─Software Developers
└─Audio Processing Researchers

🔗Links

└─Hugging Face Repository