OpenAI launches GPT-Image-2: Visual generation achieves crushing superiority, designers are truly going to be unemployed this time
動區 BlockTempo · 2026-04-22 03:36:30


Original title: OpenAI 祭出 GPT-Image-2:視覺生成迎來碾壓,設計師這次真的要失業了
GPT-Image-2 has landed at the top of the leaderboard with an ultra-high score, marking a leap from simple image generation to a model equipped with business strategy and layout logic through its Thinking Mode.

(Context: OpenAI has launched the cybersecurity-specific model GPT-5.4-Cyber, which has patched 3,000 high-risk vulnerabilities, competing with Claude Mythos.)

(Background: GPT-5 is delayed! OpenAI released o3 and o4-mini first; Sam Altman revealed that integration was harder than expected.)

If one were to review OpenAI's 2025, many would likely describe it as lackluster or even slightly passive. Over the past year or so, the company methodically paved its path in logical reasoning, releasing a dense series of reasoning models from o3-pro to o4-mini alongside brand-new base models such as GPT-4.5 and GPT-5. In visual generation, however, the field where general users most readily notice progress and spread it spontaneously, OpenAI's influence had been shrinking. After the initial shock of Sora's debut, the company seemed to fall into a long silence on this track.

Meanwhile, the other players at the table were not idle. In the open-source ecosystem, models like Flux shattered the barrier to high-quality local image generation. On the commercial side, established rivals held on to their extreme aesthetic edge, while newcomers such as Nano-banana, with built-in web search, also emerged. By comparison, OpenAI's previous flagship image model, GPT-Image-1.5, had long looked inadequate: image quality was poor, layouts were rigid, and it often broke down on complex text. A consensus gradually formed in the industry: OpenAI had hit a technical bottleneck in visual generation and was struggling under siege from its various competitors.

Until a few weeks ago, when the turning point appeared in a very subtle way.
On the well-known large-model blind-test platform LM Arena, a mysterious image model codenamed Duct Tape was quietly added. Users participating in blind tests soon noticed something was off: the model controlled extreme aspect ratios with precision, output poster layouts containing large amounts of multilingual text without flaws, and even seemed to run an invisible logical planning process before generating the image. For a time, technical communities speculated about which company had secretly made a major move, but OpenAI stayed silent.

Early this morning, the truth was finally revealed. Without a lengthy launch event or overwhelming marketing hype, OpenAI officially confirmed the Duct Tape codename as GPT-Image-2 and rolled it out to the market in full. With it came a Text-to-Image arena leaderboard that felt somewhat suffocating: GPT-Image-2 took the championship outright with an ultra-high score of 1512, leading the runner-up (Nano-banana-2, the one with web search capabilities) by a full 242 points. In large-model benchmarking, people usually make a big deal out of leads of a few decimal points or single digits, because scores between top models are extremely tight; a 242-point lead is unprecedented in the history of the arena. This is not some minor version iteration; it is a technical chasm.

I spent the better part of the day carefully reviewing its extreme-case capabilities and the latest API interface documentation, and came away with one major takeaway: OpenAI is still that same OpenAI. When it decides to reclaim lost ground, it does so by directly reshaping the rules of the game. In front of this model, the visual design jobs we thought would take another two or three years to be completely replaced by AI can, as of today, basically be said to have reached their end.
To understand why GPT-Image-2 can open such an exaggerated score gap, you first have to discard the traditional notion of a text-to-image model. In the past, drawing with AI was essentially a game of luck: you threw in a few prompts and waited for the model to arrange pixels into the shape you wanted. GPT-Image-2 is more like an agent with a built-in visual engine.

The most obvious change is that, at the mechanism level, it splits into two completely different modes. The first is Instant Mode, open to all users. It focuses on ultra-fast response and seamless integration into everyday life and work: give it a command on your phone and it returns a structurally complete image within seconds. Its underlying visual understanding is extremely strong, but it mainly serves high-frequency, one-off visual conversion needs. The second is Thinking Mode, open to paid users. Before it
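The two-mode split described above could be surfaced to developers as a simple request parameter. The following is a minimal, purely illustrative sketch of how such a request payload might be assembled. The model identifier "gpt-image-2", the "mode" field, and its two values are assumptions inferred from the article, not a published API or SDK.

```python
# Hypothetical sketch: "gpt-image-2" and the "mode" parameter are assumptions
# drawn from the article's description, not from any real OpenAI endpoint.
from typing import Literal


def build_image_request(prompt: str,
                        mode: Literal["instant", "thinking"] = "instant",
                        size: str = "1024x1024") -> dict:
    """Assemble a JSON-serializable payload for a hypothetical image call.

    "instant" maps to the fast, general-access mode the article describes;
    "thinking" maps to the paid mode that plans layout before rendering.
    Both names come from the article, not from a real SDK.
    """
    return {
        "model": "gpt-image-2",   # assumed model identifier
        "prompt": prompt,
        "size": size,
        "mode": mode,             # assumed mode-selection parameter
    }


# A poster brief like this would exercise the multilingual layout planning
# the article attributes to Thinking Mode.
req = build_image_request(
    "A trilingual conference poster with a schedule table",
    mode="thinking",
    size="1536x1024",
)
```

The point of the sketch is only the shape of the choice: a latency-first default ("instant") versus an opt-in planning step ("thinking"), selected per request rather than per model.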