GPT-5.5 officially released: six-week iteration, focus on scientific research applications, record-high coding benchmark score
動區 BlockTempo2026-04-24 02:35:34

OpenAI released GPT-5.5 on the 23rd, just six weeks after the launch of GPT-5.4. The standard version scored 82.7% on the programming benchmark Terminal-Bench 2.0, while the Pro version achieved 39.6% on the postdoctoral-level math test FrontierMath Tier 4. (Previous coverage: OpenAI's most powerful GPT-5 launch summary: free access, feature highlights, gpt-5, gpt-5-mini, and gpt-5-nano API pricing) (Background: Five minutes to understand GPT-5: How is it different from ChatGPT 4o? Fewer hallucinations, more obedient, and API pricing summary)

GPT-5.5 went live on April 23, exactly six weeks after its predecessor, GPT-5.4. OpenAI positions the release as its "smartest and most intuitive model to date," emphasizing that, compared to GPT-5.4, it "thinks faster and more accurately with fewer tokens."

On the programming benchmark Terminal-Bench 2.0, the standard GPT-5.5 scored 82.7%, versus 69.4% for Claude Opus 4.7, a gap of about 13 percentage points. In infrastructure-optimization tasks, GPT-5.5's token generation speed increased by over 20%, improving cost-efficiency for both long-context processing and multi-step workflows.

The Pro version's differentiation centers on mathematical reasoning. FrontierMath Tier 4 is widely regarded as the most difficult math evaluation set: its problems sit at a postdoctoral research level and can take human experts several days to solve. GPT-5.5 Pro scored 39.6% on this test, versus 22.9% for Claude Opus 4.7, a gap of nearly 17 percentage points.

Another notable figure: on the GDPval economic-task benchmark, the standard GPT-5.5 scored 84.9%, actually higher than the Pro version. This suggests that for general knowledge work the standard version is sufficient and cost-effective, while the Pro version's differentiated value is concentrated in high-intensity reasoning tasks rather than breadth of coverage.
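The quoted gaps are simple differences between the two models' scores. As a quick sanity check on the figures cited above (all numbers taken from the article; "percentage points" means the plain subtraction of the two scores):

```python
# Score pairs quoted in the article: (GPT-5.5, Claude Opus 4.7), in percent.
benchmarks = {
    "Terminal-Bench 2.0 (standard)": (82.7, 69.4),
    "FrontierMath Tier 4 (Pro)": (39.6, 22.9),
}

for name, (gpt_score, claude_score) in benchmarks.items():
    gap = gpt_score - claude_score  # difference in percentage points
    print(f"{name}: {gpt_score}% vs {claude_score}% -> {gap:.1f} pp gap")
```

This reproduces the article's "about 13 percentage points" on Terminal-Bench 2.0 (13.3 pp) and "nearly 17 percentage points" on FrontierMath Tier 4 (16.7 pp).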
OpenAI also highlighted significant improvements in GPT-5.5's "computer use" capabilities: it can autonomously operate software interfaces, handle multi-step workflows, and requires less user intervention on agentic tasks.

The narrative focus of this release is somewhat unusual. The official announcement states that "substantial progress has been made in scientific and technical research workflows," specifically naming drug-discovery scenarios and claiming that GPT-5.5 can help expert scientists make breakthroughs. One case is cited: a customized version of GPT-5.5 helped researchers find a new combinatorial proof related to Ramsey numbers, a problem that has stood as a hard challenge in pure mathematics for decades. That OpenAI mentioned this case directly in its official release suggests it is not a fringe demonstration but a signal of future commercialization direction.

Why the emphasis on scientific research? The business logic is clear: pharmaceutical companies, materials laboratories, and research institutions are paying customers that can absorb high compute costs. The Pro version is priced significantly above the standard version but is currently available only to Business and Enterprise subscribers. Through differentiated pricing, OpenAI is effectively treating research scenarios as a high-end SKU rather than a universal offering.

GPT-5.5 is available starting today for Plus, Pro, Business, and Enterprise users, with GPT-5.5 Pro limited to Business and Enterprise and API access "coming soon." The standard version's GDPval performance shows it is already sufficient for most knowledge work; the Pro version is more clearly aimed at enterprise scenarios requiring high-intensity mathematical reasoning.

The six-week iteration cadence is itself a structural pressure. When competitors can ship eight major versions a year, any window of technological lag is extremely short; release speed has become part of competitiveness in its own right.