kurt.news

Clean, fast AI news without the hype or doom.

OpenAI Adds Three Voice Models to Realtime API

OpenAI expanded its Realtime API on May 7, 2026 with three new voice models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. Each handles a distinct part of the voice pipeline.

GPT-Realtime-2

GPT-Realtime-2 replaces GPT-Realtime-1.5 and, according to OpenAI, brings GPT-5-class reasoning to voice interactions. That is the company's claim, not a verified benchmark. The model is billed by token consumption, the same billing approach as the rest of the GPT-5 family.

Translation and Transcription

GPT-Realtime-Translate handles real-time translation across more than 70 input languages and 13 output languages. GPT-Realtime-Whisper handles live speech-to-text transcription.

Both are billed by the minute rather than by token. That billing structure is worth noting: minute-based pricing fits streaming workloads better than token counting, since a pause in speech still costs processing time.
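To make the difference between the two billing structures concrete, here is a minimal sketch in Python. The rates, the session figures, and the names (Session, per_minute_cost, per_token_cost) are all hypothetical placeholders, not OpenAI's published prices or API. The sketch also assumes, purely for illustration, that a quieter session consumes fewer tokens than a talkative one of the same length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """One streaming voice session. All values are illustrative."""
    duration_seconds: float  # wall-clock length of the stream, pauses included
    tokens_processed: int    # tokens consumed, relevant only to token billing


def per_minute_cost(session: Session, rate_per_minute: float) -> float:
    """Minute-based billing: cost depends only on how long the stream ran."""
    return session.duration_seconds / 60.0 * rate_per_minute


def per_token_cost(session: Session, rate_per_million_tokens: float) -> float:
    """Token-based billing: cost depends only on how many tokens were consumed."""
    return session.tokens_processed / 1_000_000 * rate_per_million_tokens


# Placeholder rates chosen for easy arithmetic. NOT real prices.
RATE_PER_MINUTE = 0.01
RATE_PER_MILLION_TOKENS = 10.0

# Two hypothetical ten-minute sessions: one with steady speech, one mostly silent.
talkative = Session(duration_seconds=600, tokens_processed=12_000)
quiet = Session(duration_seconds=600, tokens_processed=3_000)

for name, s in (("talkative", talkative), ("quiet", quiet)):
    print(
        f"{name}: per-minute={per_minute_cost(s, RATE_PER_MINUTE):.4f} "
        f"per-token={per_token_cost(s, RATE_PER_MILLION_TOKENS):.4f}"
    )
```

Under these assumptions both sessions cost the same when billed by the minute, because the stream stayed open for the same duration, while the token-billed cost moves with how much was actually said.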

One API, Three Models

All three models sit inside the existing Realtime API. Developers already integrated with GPT-Realtime-1.5 get access without a new endpoint. Whether the upgrade from 1.5 to 2 is a drop-in replacement or requires parameter changes was not specified in the announcement.

The combination of translation, transcription, and reasoning in a single API suggests OpenAI is positioning the Realtime offering as a full voice stack rather than a collection of point solutions. Whether latency holds up across more than 70 input languages in production is a question the benchmarks haven't answered yet.

Source: TechCrunch