OpenAI Adds Three Voice Models to Realtime API

May 9, 2026

OpenAI expanded its Realtime API on May 7, 2026, with three new voice models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. Each handles a distinct part of the voice pipeline.

GPT-Realtime-2

GPT-Realtime-2 replaces GPT-Realtime-1.5 and, according to OpenAI, brings GPT-5-class reasoning to voice interactions. That is the company's claim, not a verified benchmark. The model is billed by token consumption, the same billing model used by the rest of the GPT-5 family.

Translation and Transcription

GPT-Realtime-Translate handles real-time translation across more than 70 input languages and 13 output languages. GPT-Realtime-Whisper handles live speech-to-text transcription.

Both are billed by the minute rather than by token. That billing structure is worth noting: minute-based pricing fits streaming workloads better than token counting, since a pause in speech still costs processing time.
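The difference between the two billing schemes can be sketched in a few lines of Python. This is an illustration only: the rates below are hypothetical placeholders (no prices are given here), the Session type and both cost functions are invented for the example, and it assumes that token usage tracks how much is actually said while per-minute usage tracks the wall-clock length of the stream.

```python
from dataclasses import dataclass

# Hypothetical placeholder rates, NOT OpenAI's prices.
PER_MINUTE_RATE = 0.03    # assumed dollars per minute of streamed audio
PER_TOKEN_RATE = 0.00004  # assumed dollars per token


@dataclass
class Session:
    duration_seconds: float  # wall-clock length of the stream, pauses included
    tokens: int              # tokens consumed (assumed to track actual speech)


def minute_cost(session: Session, rate: float = PER_MINUTE_RATE) -> float:
    """Cost under per-minute billing: depends only on stream duration."""
    return session.duration_seconds / 60 * rate


def token_cost(session: Session, rate: float = PER_TOKEN_RATE) -> float:
    """Cost under per-token billing: depends only on tokens consumed."""
    return session.tokens * rate


# Two ten-minute streams: one with continuous speech, one with long pauses.
talkative = Session(duration_seconds=600, tokens=9000)
sparse = Session(duration_seconds=600, tokens=3000)

for name, s in [("talkative", talkative), ("sparse", sparse)]:
    print(f"{name}: per-minute ${minute_cost(s):.2f}, per-token ${token_cost(s):.2f}")
```

Under these assumptions, both streams cost the same per minute because they occupy the connection for the same length of time, while the per-token figure varies with how much was said.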

One API, Three Models

All three models sit inside the existing Realtime API. Developers already integrated with GPT-Realtime-1.5 get access without a new endpoint. Whether the upgrade from 1.5 to 2 is a drop-in replacement or requires parameter changes was not specified in the announcement.

The combination of translation, transcription, and reasoning in a single API suggests OpenAI is positioning the Realtime offering as a full voice stack rather than a collection of point solutions. Whether latency holds up across more than 70 input languages in production is a question no published benchmark has answered yet.

Source: TechCrunch
