{"slug": "deepseek-v4-towards-highly-efficient-million-token-context-intelligence", "title": "DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence", "summary": "DeepSeek AI released preview versions of its DeepSeek-V4 series, including two Mixture-of-Experts language models with up to 1.6 trillion parameters and support for one-million-token contexts. The models feature innovations such as a hybrid attention architecture and the Muon optimizer, achieving state-of-the-art performance among open models while significantly reducing inference costs for long-context tasks.", "body_md": "arXiv:2606.19348v1 Announce Type: new\nAbstract: We present a preview version of the DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models -- DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) -- both supporting a context length of one million tokens. The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization: (1) a hybrid attention architecture that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to improve long-context efficiency; (2) Manifold-Constrained Hyper-Connections (mHC) that enhance conventional residual connections; and (3) the Muon optimizer for faster convergence and greater training stability. We pre-train both models on more than 32T diverse and high-quality tokens, followed by a comprehensive post-training pipeline that unlocks and further enhances their capabilities. DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, redefines the state of the art for open models, outperforming its predecessors in core tasks. Meanwhile, the DeepSeek-V4 series is highly efficient in long-context scenarios. In the one-million-token context setting, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2. This enables us to routinely support one-million-token contexts, thereby making long-horizon tasks and further test-time scaling more feasible. The model checkpoints are available at https://huggingface.co/collections/deepseek-ai/deepseek-v4.", "url": "https://wpnews.pro/news/deepseek-v4-towards-highly-efficient-million-token-context-intelligence", "canonical_source": "https://arxiv.org/abs/2606.19348", "published_at": "2026-06-19 04:00:00+00:00", "updated_at": "2026-06-19 04:04:52.308106+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "ai-research", "ai-infrastructure"], "entities": ["DeepSeek", "DeepSeek-V4-Pro", "DeepSeek-V4-Flash", "DeepSeek-V4-Pro-Max", "DeepSeek-V3.2", "Hugging Face"], "alternates": {"html": "https://wpnews.pro/news/deepseek-v4-towards-highly-efficient-million-token-context-intelligence", "markdown": "https://wpnews.pro/news/deepseek-v4-towards-highly-efficient-million-token-context-intelligence.md", "text": "https://wpnews.pro/news/deepseek-v4-towards-highly-efficient-million-token-context-intelligence.txt", "jsonld": "https://wpnews.pro/news/deepseek-v4-towards-highly-efficient-million-token-context-intelligence.jsonld"}}