OpenAI Adds Three Voice Models to Realtime API
OpenAI expanded its Realtime API on May 7, 2026, with three new voice models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. Each handles a distinct part of the voice pipeline.
GPT-Realtime-2
GPT-Realtime-2 replaces GPT-Realtime-1.5 and, according to OpenAI, brings GPT-5-class reasoning to voice interactions. That is the company's claim, not a verified benchmark. The model is billed by token consumption, the same billing approach used for the rest of the GPT-5 family.
Translation and Transcription
GPT-Realtime-Translate handles real-time translation across more than 70 input languages and 13 output languages. GPT-Realtime-Whisper handles live speech-to-text transcription.
Both are billed by the minute rather than by token. That billing structure is worth noting: minute-based pricing fits streaming workloads better than token counting, since a pause in speech still costs processing time.
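To make the billing difference concrete, here is a minimal Python sketch comparing the two schemes. Every rate, token count, and name in it (`Session`, `per_minute_cost`, `per_token_cost`) is a hypothetical placeholder chosen for illustration. The announcement as reported here gives no prices, and none of these figures come from OpenAI.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One streaming voice session (illustrative fields, not an OpenAI type)."""
    duration_minutes: float  # wall-clock length of the stream, pauses included
    tokens: int              # tokens consumed over the session


def per_minute_cost(session: Session, rate_per_minute: float) -> float:
    """Minute-based billing: cost depends only on elapsed time."""
    return session.duration_minutes * rate_per_minute


def per_token_cost(session: Session, rate_per_1k_tokens: float) -> float:
    """Token-based billing: cost depends only on tokens consumed."""
    return session.tokens / 1000 * rate_per_1k_tokens


# Two hypothetical 10-minute calls: one with long pauses, one with dense speech.
sparse = Session(duration_minutes=10.0, tokens=2000)
dense = Session(duration_minutes=10.0, tokens=8000)

# Hypothetical rates, NOT published prices.
RATE_PER_MINUTE = 0.5
RATE_PER_1K_TOKENS = 0.25

for name, s in (("sparse", sparse), ("dense", dense)):
    print(name,
          per_minute_cost(s, RATE_PER_MINUTE),
          per_token_cost(s, RATE_PER_1K_TOKENS))
```

Under these made-up numbers, both calls cost the same per minute, while the per-token cost differs fourfold. That shows the predictability argument for time-based pricing on streams, not which scheme is cheaper in practice.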
One API, Three Models
All three models sit inside the existing Realtime API. Developers already integrated with GPT-Realtime-1.5 get access without a new endpoint. Whether the upgrade from 1.5 to 2 is a drop-in replacement or requires parameter changes was not specified in the announcement.
The combination of translation, transcription, and reasoning in a single API suggests OpenAI is positioning the Realtime offering as a full voice stack rather than a collection of point solutions. Whether latency holds up across more than 70 input languages in production is a question the benchmarks haven't answered yet.
Source: TechCrunch