OpenAI unveils GPT-5.4 Pro and Thinking models

OpenAI released GPT-5.4 on Thursday, introducing a new foundation model available in standard, Thinking, and Pro versions.

The model offers a 1-million-token context window and improved token efficiency, and it targets professional workloads. The release also brings new benchmark records and a system for managing tool calling within the API.

GPT-5.4 is available in three versions: standard, a reasoning model (GPT-5.4 Thinking), and an optimized high-performance version (GPT-5.4 Pro). The API version supports context windows as large as 1 million tokens, the largest available from OpenAI. OpenAI stated GPT-5.4 solves the same problems with significantly fewer tokens than its predecessor.

The model achieved record scores in computer-use benchmarks OSWorld-Verified and WebArena Verified. It scored a record 83% on OpenAI’s GDPval test for knowledge work tasks. GPT-5.4 also took the lead on Mercor’s APEX-Agents benchmark, which tests professional skills in law and finance.

Mercor CEO Brendan Foody stated that GPT-5.4 excels at creating long-horizon deliverables such as slide decks, financial models, and legal analysis. Foody said the model delivers top performance while running faster and at lower cost than competing frontier models.

OpenAI reported that individual claims from GPT-5.4 are 33% less likely to contain errors than those from GPT-5.2, and that overall responses are 18% less likely to contain errors.

OpenAI also introduced Tool Search, a new system for managing tool calling in the API that allows models to look up tool definitions as needed. Tool Search reduces token use and improves speed and cost in systems with many tools.

OpenAI added a new safety evaluation to test chain-of-thought (CoT) monitoring, addressing concerns that reasoning models could misrepresent their reasoning process. The new evaluation shows deception is less likely in the GPT-5.4 Thinking version. OpenAI stated this suggests the model lacks the ability to hide its reasoning and that CoT monitoring remains an effective safety tool.
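
The Tool Search idea reported above, looking up tool definitions only when they are needed instead of sending every definition with each request, can be illustrated with a small sketch. This is not OpenAI's API or implementation: the `ToolRegistry` class, its `search` method, the keyword-overlap scoring, and the three example tools are all hypothetical, and character count is used only as a rough stand-in for token count.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToolRegistry:
    """Hypothetical registry that keeps full tool definitions outside the prompt."""
    tools: dict = field(default_factory=dict)

    def register(self, name, description, parameters):
        self.tools[name] = {
            "name": name,
            "description": description,
            "parameters": parameters,
        }

    def all_definitions(self):
        # Baseline approach: every definition is sent with every request.
        return list(self.tools.values())

    def search(self, query, limit=2):
        # Assumed lookup strategy: rank tools by how many query words
        # appear in the tool's name or description.
        words = set(query.lower().split())
        scored = []
        for tool in self.tools.values():
            text = f"{tool['name']} {tool['description']}".lower().replace("_", " ")
            score = len(words & set(text.split()))
            if score > 0:
                scored.append((score, tool["name"]))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [self.tools[name] for _, name in scored[:limit]]


def prompt_size(definitions):
    """Crude proxy for token use: characters in the serialized definitions."""
    return len(json.dumps(definitions))


registry = ToolRegistry()
registry.register("get_weather", "Look up the weather forecast for a city",
                  {"city": "string"})
registry.register("send_email", "Send an email message to a recipient",
                  {"to": "string", "body": "string"})
registry.register("convert_currency", "Convert an amount between two currencies",
                  {"amount": "number", "from": "string", "to": "string"})

upfront = registry.all_definitions()             # all tools sent every time
on_demand = registry.search("weather forecast")  # only the matching tools

print("upfront size:  ", prompt_size(upfront))
print("on-demand size:", prompt_size(on_demand))
```

In this toy setup only the `get_weather` definition is returned for the query, so the serialized payload is smaller than sending all three definitions. The gap would be expected to grow with the number of registered tools, which is consistent with the article's statement that the benefit shows up in systems with many tools.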
