GPT Audio
🧠 AI Modelopenai
OpenAI's first generally available audio model with 128K context and natural voice output.
GPT Audio is a multimodal AI model from OpenAI that processes both text and audio inputs to generate text or audio outputs. It has a context length of 128,000 tokens, allowing for extended interactions. The model includes an upgraded audio decoder that produces more natural-sounding voices and maintains better voice consistency across outputs. Input pricing is $2.50 per million tokens, output pricing is $10.00 per million tokens. Available via OpenRouter, the model supports features such as frequency penalty, logit bias, logprobs, max tokens, presence penalty, response format, seed, and stop. This model bridges the gap between text and audio modalities, enabling applications like voice assistants, transcription, and audio generation.
💡Highlights
- ├─128K token context length
- ├─Upgraded natural-sounding audio decoder
- └─Text and audio input/output modalities
🎯For
- ├─Developers building voice-enabled apps
- ├─Researchers in multimodal AI
- └─Conversational AI engineers