
On May 7, 2026, OpenAI introduced three new realtime audio models through its Realtime API: GPT-Realtime-2 for voice agents with reasoning, GPT-Realtime-Translate for live speech translation, and GPT-Realtime-Whisper for streaming transcription. Alongside the launch, the company moved the Realtime API out of beta and into general availability. The release pushes voice applications past the basic question-and-answer loop toward systems that can listen, reason, translate, transcribe, and act within a single live conversation.
GPT-Realtime-2 is the first voice model OpenAI describes as having GPT-5-class reasoning. Its context window expands from 32K to 128K tokens, and developers can now set reasoning effort to one of five levels (minimal, low, medium, high, and xhigh), with low as the default. New conversational features include preambles such as "let me check that," parallel tool calls with audio narration, and stronger recovery behavior when something goes wrong mid-conversation. The model is priced at $32 per million audio input tokens and $64 per million audio output tokens, with cached input at $0.40 per million tokens. GPT-Realtime-Translate, which supports more than 70 input languages and 13 output languages, is billed at $0.034 per minute, while GPT-Realtime-Whisper costs $0.017 per minute. OpenAI also added two new voices, Cedar and Marin, with this release.
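To make the pricing above concrete, here is a minimal cost-estimation sketch in Python. It uses only the rates quoted in this article. The function names and the token and minute counts in the example are hypothetical, chosen for illustration. The article does not say how many tokens a minute of audio consumes, so real usage figures would have to come from actual API usage reports.

```python
from decimal import Decimal

# Rates quoted in the article, in USD.
PRICE_PER_MILLION_TOKENS = {
    "audio_input": Decimal("32"),
    "audio_output": Decimal("64"),
    "cached_input": Decimal("0.40"),
}
PRICE_PER_MINUTE = {
    "translate": Decimal("0.034"),  # GPT-Realtime-Translate
    "whisper": Decimal("0.017"),    # GPT-Realtime-Whisper
}
MILLION = Decimal(1_000_000)


def token_cost(audio_input=0, audio_output=0, cached_input=0):
    """Estimate the USD cost of GPT-Realtime-2 usage from token counts."""
    counts = {
        "audio_input": audio_input,
        "audio_output": audio_output,
        "cached_input": cached_input,
    }
    total = Decimal(0)
    for kind, tokens in counts.items():
        total += PRICE_PER_MILLION_TOKENS[kind] * Decimal(tokens) / MILLION
    return total


def minute_cost(service, minutes):
    """Estimate the USD cost of a per-minute service ("translate" or "whisper")."""
    return PRICE_PER_MINUTE[service] * Decimal(str(minutes))


if __name__ == "__main__":
    # Hypothetical usage figures, for illustration only.
    session = token_cost(audio_input=50_000, audio_output=20_000, cached_input=100_000)
    print(f"GPT-Realtime-2 session: ${session:.2f}")
    print(f"30 min of translation:  ${minute_cost('translate', 30):.2f}")
    print(f"30 min of transcription: ${minute_cost('whisper', 30):.2f}")
```

With these made-up counts, the session comes to $2.92, of which the 20,000 output tokens account for $1.28, because output audio costs twice as much per token as input audio.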
On audio benchmarks, GPT-Realtime-2 at high reasoning effort scored 96.6 percent on Big Bench Audio, a 15.2-point improvement over GPT-Realtime-1.5's 81.4 percent. On the Audio MultiChallenge instruction-following benchmark, the score rose from 34.7 percent to 48.5 percent at xhigh effort, a 13.8-point gain. Independent rankings from Artificial Analysis place GPT-Realtime-2 at high effort on par with Google's Gemini 3.1 Flash Live Preview High, while Step-Audio R1.1 Realtime and Grok Voice Think Fast 1.0 remain slightly ahead on the same test. At minimal effort, GPT-Realtime-2 leads the Full Duplex Bench conversational dynamics ranking at 96.1 percent.
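The gains quoted above are percentage-point differences, not relative improvements, and the two measures can look very different. The short Python sketch below computes both from the benchmark scores given in this article. The helper names are hypothetical and the rounding to one decimal place is a presentation choice.

```python
def point_gain(before, after):
    """Absolute change between two percentage scores, in percentage points."""
    return round(after - before, 1)


def relative_gain(before, after):
    """Change between two percentage scores, as a percent of the starting score."""
    return round((after - before) / before * 100, 1)


# Scores as reported in the article: (benchmark, earlier score, new score).
RESULTS = [
    ("Big Bench Audio", 81.4, 96.6),
    ("Audio MultiChallenge", 34.7, 48.5),
]

if __name__ == "__main__":
    for name, before, after in RESULTS:
        print(
            f"{name}: +{point_gain(before, after)} points, "
            f"+{relative_gain(before, after)}% relative"
        )
```

By this calculation, the smaller point gain on Audio MultiChallenge (13.8 points) is the larger relative improvement, roughly 40 percent against roughly 19 percent for Big Bench Audio, because it starts from a much lower base.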
Early enterprise deployments report measurable gains. Zillow said GPT-Realtime-2 lifted call-success rate from 69 percent to 95 percent on its hardest adversarial benchmark, a 26-point swing. Enterprise search company Glean recorded a 42.9 percent relative increase in helpfulness in internal evaluations, and Genspark reported a 26 percent improvement in effective conversation rate on its Call for Me agent after upgrading. Deutsche Telekom, Priceline, and Vimeo are running the models in multilingual customer support, travel assistance, and live video dubbing scenarios. Microsoft Foundry began rolling out the same models the same day. The new lineup is currently available only through the Realtime API; OpenAI has said ChatGPT voice mode upgrades are still pending.