AI

OpenAI ships GPT-5.4 — 75% on OSWorld-V, above the 72.4% human baseline

OpenAI shipped GPT-5.4 on April 6 with a 1M-token context window, sub-200ms time to first token (TTFT) on short prompts, and autonomous multi-step workflow execution across software environments. On OSWorld-V, a benchmark that has the model operate a real desktop end-to-end, it scored 75%, which is 2.6 percentage points above the 72.4% human baseline. Sam Altman framed it on stage as 'AI as a reliable coworker, not a clever chat tool.' It is available via API and ChatGPT Pro; a 'GPT-5.4 Mini' tier reaches free users on April 20 with the same agentic scaffolding.

OpenAI · GPT-5 · Agentic AI · OSWorld-V · Benchmark

Why it matters

Crossing the human baseline on an end-to-end desktop-agent benchmark is the symbolic tipping point from 'AI as chat tool' to 'AI as autonomous coworker'. Enterprise buying decisions, which had been stalling on reliability concerns, now have empirical cover. Expect agentic workflows to become the default integration pattern within 12 months, and expect competitive pressure on Anthropic and Google to match or exceed the 75% OSWorld-V score.

Impact scorecard

Overall: 9/10
Stakes: 9.0
Novelty: 9.0
Authority: 9.0
Coverage: 9.5
Concreteness: 9.0
Social: 9.5
FUD risk: 2.0
Coverage: 60 outlets · 12 tier-1
New York Times, Wall Street Journal, Financial Times, Bloomberg, Reuters, The Verge, …
X / Twitter: 45,000 mentions
@sama · 62,000 likes
@karpathy · 21,000 likes
Reddit: 14,200 upvotes
r/OpenAI, r/MachineLearning, r/singularity, r/technology

Trust check

high

Three signals support this rating: OpenAI's primary announcement, independent benchmark replication by the SWE-Bench team, and broad tier-1 coverage with concrete numbers. FUD risk is low, with one mild caveat: OSWorld-V was partially designed by OpenAI contributors, so treat the 75% score as optimistic.
