Voxtral Small 24B 2507

🧠 AI Model · mistralai

24B audio-text model with state-of-the-art speech transcription and understanding, enhanced from Mistral Small 3.

Voxtral Small 24B builds on the Mistral Small 3 architecture and adds state-of-the-art audio understanding. It processes speech and text together, supporting tasks such as real-time transcription, language translation, and audio-based reasoning. The model has a 32,000-token context window and supports structured outputs and standard inference controls (frequency penalty, temperature, seed). It accepts text, audio, and file inputs and produces text outputs. At $0.10 per million input tokens and $0.30 per million output tokens, it is a cost-effective option for audio-centric workflows while maintaining high accuracy on text benchmarks.
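As a rough illustration of the listed pricing and context limit, the sketch below estimates the cost of a single request and checks whether it fits the window. The function names are hypothetical, and two assumptions are made that the listing does not confirm: that input and output tokens share the 32,000-token window, and that audio is billed at the same per-token input rate as text.

```python
# Figures taken from the description above.
INPUT_USD_PER_MILLION = 0.10
OUTPUT_USD_PER_MILLION = 0.30
CONTEXT_WINDOW_TOKENS = 32_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed per-token rates.

    Assumption: audio input is billed at the same token rate as text.
    """
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the context window.

    Assumption: input and output tokens count against one shared window.
    """
    return input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Example: a 20,000-token prompt with a 1,000-token reply.
    print(f"${estimate_cost_usd(20_000, 1_000):.4f}")
    print(fits_context(20_000, 1_000))
```

Actual billing depends on how the provider tokenizes audio, so treat the result as an order-of-magnitude estimate.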

💡Highlights

├─24B parameters
├─Audio input + text output
└─32K context window

🎯For

├─AI developers
├─speech recognition researchers
└─product teams building voice interfaces

🔗Links

└─OpenRouter Model Page