GPT-5.4 Pro jumps to 150 IQ on Mensa Norway test as OpenAI breaks its own record


OpenAI’s latest GPT-5.4 Pro model has now posted an IQ score higher than that of 99.96% of the human population, giving markets a fresh signal that AI capability gains are starting to outpace the usual product-cycle noise.

OpenAI’s GPT-5.4 Pro touches 150 on public IQ benchmark as markets enter another macro-heavy week

TrackingAI’s public leaderboard now places OpenAI GPT-5.4 Pro at an IQ score of 150, a sharp step up from the 136 score that OpenAI’s o3 posted on the Mensa Norway test last year.

The jump arrives at a moment when market attention has narrowed around Iran, energy, labor softness, and the next inflation print. That creates a different question for the week ahead: how quickly is machine intelligence compounding, and when will that acceleration begin to overlap with economic positioning?

Why this matters: A move from 136 to 150 on a widely understood benchmark compresses a complex capability shift into a simple signal. For businesses, that signal feeds directly into decisions around automation, software budgets, and headcount planning. For markets, it adds another variable alongside rates, inflation, and growth expectations.

OpenAI introduced GPT-5.4 as its most capable and efficient frontier model for professional work, with stronger coding, tool use, and computer use, and a context window of up to 1 million tokens. In the same release, OpenAI said GPT-5.4 achieved a new state of the art on GDPval and exceeded human performance on OSWorld-Verified.

Those benchmarks are separate from a public IQ test, yet they point in the same direction. Capability is rising across independent measurement systems, and the rise is now fast enough to influence budgeting, hiring plans, workflow design, and software spend.

Part of the score’s weight comes from its portability: a single number is easy to understand even before anyone debates the methodology.

The earlier o3 Mensa result established the benchmark and its limits. GPT-4.1’s one-million-token context window showed how OpenAI was extending model utility across long-horizon code and document tasks, while our analysis of OpenAI’s expanding capital loop linked model progress to hardware expansion, financing loops, and infrastructure demand.

Taken together, those developments place the latest IQ score within a broader commercial and economic context. A move from 136 to 150 on a public benchmark is striking on its own. A move from 136 to 150 while OpenAI is pushing deeper into tool use, computer use, enterprise productivity, and capital-intensive infrastructure carries broader implications.


Public IQ benchmarks are limited, but the capability curve is still moving higher

Public IQ-style tests remain imperfect instruments for measuring frontier models. TrackingAI runs a public Mensa-style benchmark and also maintains a harder private offline test.

IQ-style tests compress a narrow slice of cognitive performance into a single number, obscuring variation across reasoning types, context handling, creativity, and real-world problem-solving.

For AI and humans alike, scores are sensitive to test design, training exposure, and pattern familiarity, which makes them a noisy proxy for general capability.

An IQ of 150 sits at the extreme upper tail of the human distribution, a range popularly associated with figures such as Albert Einstein or Richard Feynman, although neither has a verified score at that level. In human test-takers, it typically reflects very fast abstraction, strong pattern recognition, and the ability to navigate complex, multi-step problems with limited guidance.
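For readers who want to check the percentile arithmetic behind figures like 99.96%, the sketch below converts an IQ score into a population percentile. It assumes scores follow a normal distribution with mean 100 and standard deviation 15, the convention on most modern IQ scales. The function name `iq_percentile` is illustrative, and the sketch says nothing about how TrackingAI itself scores models.

```python
import math

# Assumption: IQ scores are normally distributed with mean 100 and
# standard deviation 15 (the common convention for modern IQ scales).
MEAN = 100.0
SD = 15.0


def iq_percentile(score: float, mean: float = MEAN, sd: float = SD) -> float:
    """Return the share of the population (0-100) scoring below `score`."""
    z = (score - mean) / sd
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 100.0 * cdf


# Compare the earlier o3 result with the new GPT-5.4 Pro result.
for score in (136, 150):
    print(f"IQ {score}: higher than {iq_percentile(score):.2f}% of people")
```

Under these assumptions, a score of 150 lands about 3.3 standard deviations above the mean, which works out to roughly 99.96%, consistent with the figure in the opening paragraph.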

The platform reports scores as rolling averages across recent completions, and the methodology raises familiar questions around prompt structure, reproducibility, training-set contamination, and format familiarity. Those concerns were already visible when o3 reached 136, and they remain active now that GPT-5.4 Pro sits at 150.
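To illustrate why a rolling average smooths run-to-run noise, here is a minimal sketch of a mean over the most recent test completions. The class name `RollingScore`, the window size, and the sample scores are hypothetical; the article does not specify TrackingAI’s actual window or weighting.

```python
from collections import deque


class RollingScore:
    """Hypothetical tracker: mean of the most recent `window` test scores."""

    def __init__(self, window: int = 5) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        # A deque with maxlen drops the oldest score once the window is full.
        self.scores = deque(maxlen=window)

    def add(self, score: float) -> float:
        """Record a new completion and return the current rolling mean."""
        self.scores.append(score)
        return sum(self.scores) / len(self.scores)


# Illustrative scores only, not real TrackingAI data.
tracker = RollingScore(window=3)
for s in (140, 152, 148, 155):
    print(round(tracker.add(s), 2))
```

Because only the latest completions count, a single outlier run moves the reported figure less than it would move a one-shot result, though a short window still leaves the score sensitive to prompt and format changes.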


Even with those limits, the broader pattern has become harder to dismiss. One isolated benchmark result can be explained away as a quirk. A cluster of gains across public IQ-style testing, coding, browser use, desktop navigation, and knowledge-work performance carries more analytical weight.

TrackingAI’s latest leaderboard places GPT-5.4 Pro at the top of its public IQ board, ahead of every Claude, Gemini, Qwen, and Grok model, offering an external, legible benchmark that maps quickly onto the broader capability debate.

Few people need a detailed understanding of benchmark design to grasp that 150 sits in a rare range. Likewise, investors do not need to accept every premise behind an IQ-style test to recognize that a jump of this size suggests acceleration rather than drift.

Chart titled “AI IQ Test Results” showing average Mensa Norway IQ scores for major AI models on a bell curve, with OpenAI’s GPT-5.4 variants plotted near the top end of the range.

Enterprise buyers also do not need to believe that IQ equals general intelligence to see that systems with stronger pattern recognition, tool use, and long-horizon task handling are moving into economically useful territory well beyond puzzle-solving.


This points toward systems that can search, plan, verify, navigate, and produce real work across extended contexts. In that setting, the IQ score functions less as a novelty number and more as a signal of the density of frontier reasoning.

There is also competitive value in the leaderboard itself. A leadership position on a public benchmark reinforces OpenAI’s standing in the race for visible capability leadership, especially at a moment when model differentiation is becoming harder to discern from architecture notes alone.

Benchmark leadership compresses complexity into a simple hierarchy. It offers developers a signal, enterprise buyers a narrative handle, and investors another proxy for where the capability frontier currently sits.
