OpenAI Adds Three Voice Models to Its Realtime API
OpenAI announced three additions to its Realtime API on May 7, 2026: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. Each handles a different slice of the voice stack.
The Models
GPT-Realtime-2 is the upgraded reasoning model. OpenAI says it runs on GPT-5-class reasoning and handles more complex requests than its predecessor, GPT-Realtime-1.5. It bills by token consumption.
GPT-Realtime-Translate handles real-time language translation. It supports more than 70 input languages and 13 output languages. It bills by the minute.
GPT-Realtime-Whisper adds live speech-to-text transcription to the API. It is also billed by the minute.
Billing Structure
The three models do not share a billing unit. Translate and Whisper use per-minute billing. GPT-Realtime-2 uses per-token billing. This likely reflects their different computational profiles. Translation and transcription are streaming tasks whose cost scales with audio duration, so a per-minute rate fits. A reasoning model can spend very different amounts of compute on a simple question and a complex one, so token counting tracks its cost more closely. The practical cost difference will depend heavily on request volume and complexity.
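To make the two billing units concrete, here is a minimal Python sketch of how each would be estimated. The rates are made-up placeholders, since the announcement coverage gives no prices, and the function names are hypothetical rather than part of any OpenAI SDK.

```python
# Hypothetical rates for illustration only; the article quotes no prices.
ASSUMED_RATE_PER_MINUTE_USD = 0.01     # placeholder for a per-minute model
ASSUMED_RATE_PER_1K_TOKENS_USD = 0.02  # placeholder for a per-token model


def per_minute_cost(audio_seconds: float,
                    rate_per_minute: float = ASSUMED_RATE_PER_MINUTE_USD) -> float:
    """Per-minute billing: cost depends only on how long the audio runs."""
    return (audio_seconds / 60.0) * rate_per_minute


def per_token_cost(tokens: int,
                   rate_per_1k: float = ASSUMED_RATE_PER_1K_TOKENS_USD) -> float:
    """Per-token billing: cost depends on how many tokens the request consumes."""
    return (tokens / 1000.0) * rate_per_1k


if __name__ == "__main__":
    # Two minutes of streamed audio costs the same regardless of content.
    print(f"2 min of audio:        ${per_minute_cost(120):.4f}")

    # Two requests of similar spoken length can differ widely in token use.
    print(f"simple request (500):  ${per_token_cost(500):.4f}")
    print(f"complex request (5000): ${per_token_cost(5000):.4f}")
```

Under these assumed numbers, the per-minute cost is fixed by duration, while the per-token cost varies tenfold between the two example requests. Real figures would have to come from OpenAI's published pricing.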
Modular by Design
Packaging three distinct capabilities into one API lets developers pull only what they need: transcription alone, translation alone, or full GPT-5-class reasoning with voice input. Whether the reasoning upgrade meaningfully improves on GPT-Realtime-1.5 in practice is a separate question from what the model spec claims. That answer will come from production use.
Source: TechCrunch