動區 BlockTempo · 2026-04-22 13:30:29

DeepSeek V4 specs leaked early? AI scholar Yifan Zhang reveals: 1.6 trillion parameters, million-token context, but "no multimodal"

DeepSeek V4 technical specifications leaked early? Yifan Zhang, an AI scholar at Princeton University, dropped a bombshell on X today (the 22nd), claiming that the V4 model will feature a massive 1.6 trillion parameters and support an ultra-long context window of 1 million tokens, and that a 285B "Lite" version will be introduced for the first time. Yet in an era dominated by multimodality, the leak says V4 will "only support text," sparking heated debate in the community. (Previous coverage: DeepSeek valuation exceeds $20 billion! Reports suggest Tencent and Alibaba are competing to lead the first round of financing) (Background: Anthropic's one trillion vs. DeepSeek's 10 billion)

The mystery surrounding V4, the next-generation flagship model from Chinese AI giant DeepSeek, appears to have been ruthlessly unveiled ahead of schedule by an academic. Today (the 22nd), Yifan Zhang (@yifan_zhang_), a researcher at the Princeton University AI Lab and a PhD student focusing on LLM reasoning and reinforcement learning (RL), posted an extremely detailed technical specification sheet for the model on X. Combined with his teaser from last week (the 19th), "V4, next week," the industry generally believes this is inside information about the upcoming DeepSeek V4. The post read:

V4 1.6T, V4-Lite 285B
Attention: DSA2 (NSA + DSA), head-dim 512, Sparse MQA + SWA
MoE: Fused MoE Mega-Kernel with 6 active in 384 experts
Residual: Hyper-Connections
Optimizer: Muon
Pretrain context length: 32K
RL: GRPO with corrected KL
Final Context Length: 1M
Modality:… https://t.co/CC2Nof0OHy
— Yifan Zhang (@yifan_zhang_) April 22, 2026

V4 Technical Specifications Decrypted: 1.6 Trillion Parameters and a New Lite Version

Although Yifan Zhang is not employed by DeepSeek (he previously worked with the ByteDance Seed team), he is regarded as having reliable channels inside the circle, and this hardcore technical list immediately triggered discussion in the community. According to the leak, the V4 family will gain two members and several underlying architectural upgrades:

- Model scale: the flagship V4 totals 1.6T (1.6 trillion) parameters, and a lightweight V4-Lite with 285B (285 billion) parameters is revealed for the first time.
- MoE architecture: 384 experts in total, with 6 active per token (roughly 25B active parameters), implemented with a Fused MoE Mega-Kernel for computational efficiency (a routing sketch follows this list).
- Attention mechanism: DSA2 (a combination of NSA + DSA), head-dim 512, and Sparse MQA paired with SWA (Sliding Window Attention; a mask sketch follows).
- Training: residual connections are replaced by Hyper-Connections, and the optimizer is switched to Muon, a more advanced matrix-level optimizer (sketches of both follow).
- Context and reinforcement learning: the pre-training context length is 32K, extended after an RL phase using GRPO with corrected KL to an ultra-long context of up to 1M (1 million) tokens (a GRPO sketch follows).
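The leak does not describe V4's router, but the "6 active in 384 experts" figure maps directly onto standard top-k MoE routing. Below is a minimal sketch, assuming softmax gating over the selected experts; all shapes and values are illustrative, not leaked numbers:

```python
import numpy as np

def moe_route(x, router_w, top_k=6):
    """Minimal top-k MoE routing sketch (shapes illustrative).

    x        : (d_model,) token hidden state
    router_w : (d_model, num_experts) router weights
    Returns the indices of the experts this token activates and
    their normalized gate weights.
    """
    logits = x @ router_w                       # (num_experts,) router scores
    top = np.argsort(logits)[-top_k:]           # pick the 6 highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                        # softmax over the selected experts
    return top, gates

# Toy usage: only 6 of 384 expert FFNs run per token, which is why a
# 1.6T-parameter model can have only ~25B active parameters.
rng = np.random.default_rng(0)
d_model = 512                                   # illustrative, not a leaked value
x = rng.standard_normal(d_model)
router_w = rng.standard_normal((d_model, 384))
experts, gates = moe_route(x, router_w)
print(experts, gates.round(3))
```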
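SWA (Sliding Window Attention) limits each token to attending over a fixed window of recent tokens, which is how long-context models keep attention cost roughly linear rather than quadratic. A minimal mask sketch (the window size here is hypothetical; the leak gives none):

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Causal sliding-window attention mask: token i may attend to
    tokens j with i - window < j <= i. True = attention allowed."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

# Each row has at most `window` True entries, so per-token attention
# cost stays constant as the sequence grows.
print(sliding_window_mask(6, 3).astype(int))
```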
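Hyper-Connections is a technique published by ByteDance researchers that widens the single residual stream into several parallel streams with learnable mixing weights; whether V4 uses exactly that recipe is unconfirmed. Below is a heavily simplified sketch with static weights (the published version makes them learnable and optionally input-dependent):

```python
import numpy as np

def hyper_connection_block(streams, layer_fn, width_w, depth_w):
    """Simplified hyper-connection around one sublayer.

    streams : (n, d) parallel residual streams; n = 1 with unit weights
              recovers an ordinary residual connection
    layer_fn: the transformer sublayer, mapping a d-vector to a d-vector
    width_w : (n,) weights mixing the streams into the layer input
    depth_w : (n,) weights distributing the layer output back per stream
    """
    h = width_w @ streams                       # (d,) mixed layer input
    out = layer_fn(h)                           # sublayer output
    return streams + np.outer(depth_w, out)     # update all n streams

# Toy usage: n=2 streams, identity "sublayer".
s = np.ones((2, 4))
print(hyper_connection_block(s, lambda h: h,
                             np.array([0.5, 0.5]), np.array([1.0, 0.0])))
```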
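Muon is a publicly documented matrix-level optimizer (Jordan et al.): it orthogonalizes the momentum of 2-D weight matrices via a Newton-Schulz iteration before applying the update. The sketch below follows the public reference recipe; whether DeepSeek's variant differs is unknown:

```python
import numpy as np

def newton_schulz_orth(G, steps=5, eps=1e-7):
    """Approximately orthogonalize a 2-D matrix G with the quintic
    Newton-Schulz iteration used in the public Muon implementation."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)    # Frobenius norm keeps spectral norm <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T                          # iterate in the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_step(W, grad, momentum, lr=0.02, beta=0.95):
    """One Muon update for a single 2-D weight matrix (in place)."""
    momentum[:] = beta * momentum + grad        # classic momentum buffer
    W -= lr * newton_schulz_orth(momentum)      # orthogonalized direction
    return W, momentum

# Toy usage on a random 4x8 "weight matrix".
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
m = np.zeros_like(W)
W, m = muon_step(W, rng.standard_normal((4, 8)), m)
```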
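GRPO comes from DeepSeek's own DeepSeekMath work: instead of a learned value network, each response's advantage is computed relative to a group of responses sampled for the same prompt. The leak's "corrected KL" is not spelled out, so the sketch below pairs GRPO advantages with the non-negative k3 KL estimator as one plausible reading:

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: each sampled response is scored
    against its own group's mean/std, so no value network is needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def kl_k3(logp_policy, logp_ref):
    """Schulman's k3 KL estimator: always non-negative and lower-variance
    than the naive log-prob difference (one reading of 'corrected KL')."""
    log_ratio = logp_ref - logp_policy
    return np.exp(log_ratio) - log_ratio - 1.0

# Toy usage: 4 responses sampled for one prompt, rewarded 0/1 by a verifier.
print(grpo_advantages([1.0, 0.0, 0.5, 0.0]))
# Per-token KL penalty between policy and reference log-probs.
print(kl_k3(np.array([-1.2, -0.3]), np.array([-1.0, -0.4])))
```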
"Text-only" Against the Trend? Mixed Community Reactions

On this extremely packed specification sheet, the item that most surprised the industry is that V4's modality is listed as "Text only." At a time when competitors such as GPT-4o and Gemini are aggressively integrating voice, vision, and image into multimodal systems, the decision to keep V4 on a text-only track has drawn polarized reactions.

Under the tweet, some netizens marveled that the numbers "look invincible, definitely SOTA (State of the Art) level," while many others pushed back, asking "Why still do text-only in this day and age?" and questioning the absence of visual capabilities. Because the specification sheet is so detailed and DeepSeek has neither confirmed nor denied it, some developers remain skeptical of its authenticity. For AI researchers, however, hardcore details such as the use of the Muon optimizer and the KL-divergence correction are consistent with DeepSeek's long-standing technical preference for extreme algorithmic cost reduction and efficiency. Will V4 really launch in a flash next week? The global tech community is holding its breath.