Voxtral Small 24B 2507
🧠 AI Model · mistralai
24B audio-text model built on Mistral Small 3, with state-of-the-art speech transcription and audio understanding.
Voxtral Small 24B extends the Mistral Small 3 architecture with state-of-the-art audio understanding. It processes speech and text together and supports tasks such as real-time transcription, language translation, and audio-based reasoning. The model has a 32,000-token context window, supports structured outputs, and exposes standard inference controls (frequency penalty, temperature, seed). It accepts text, audio, and file inputs and produces text outputs. At $0.10 per million input tokens and $0.30 per million output tokens, it is a cost-effective option for audio-centric workflows that also maintains high accuracy on text benchmarks.
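The listed prices make per-request cost a simple calculation. The following is a minimal sketch in Python that uses only the figures quoted above ($0.10 and $0.30 per million tokens, 32,000-token window). The function names are hypothetical, and two points are assumptions rather than facts from this listing: that the context window is shared between input and output tokens, and that you already know how many tokens your audio consumes (the listing does not say how audio is converted to tokens).

```python
# Figures taken from the listing above.
INPUT_USD_PER_MILLION = 0.10
OUTPUT_USD_PER_MILLION = 0.30
CONTEXT_WINDOW_TOKENS = 32_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the price of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the 32,000-token window.

    Assumption: input and output tokens share one window.
    """
    return input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Hypothetical request: 20,000 input tokens, 1,000 output tokens.
    cost = estimate_cost_usd(20_000, 1_000)
    print(f"estimated cost: ${cost:.4f}")
    print(f"fits in context: {fits_context(20_000, 1_000)}")
```

For the hypothetical request shown, the arithmetic is (20,000 × 0.10 + 1,000 × 0.30) / 1,000,000, or roughly a quarter of a cent. Treat the result as an estimate; actual billing depends on how the provider counts audio and file inputs.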
💡 Highlights
- 24B parameters
- Audio input + text output
- 32K context window
🎯 For
- AI developers
- Speech recognition researchers
- Product teams building voice interfaces