{"slug": "ai-deepseek-machinelearning", "title": "ai, deepseek, machinelearning", "summary": "Chinese AI labs have progressed from early BERT-era models to trillion-parameter systems like Wu Dao 2.0 (1.75T parameters) and cost-efficient architectures such as DeepSeek V3 (trained for $5.6M), according to a developer tracking the field since 2017. The timeline shows Chinese researchers building on the foundational Transformer paper to create models that now compete globally despite GPU export restrictions. By 2023, over 200 Chinese companies had announced LLM projects following ChatGPT's launch, with firms like Baidu, Alibaba, and Huawei releasing models that narrowed the performance gap with Western counterparts.", "body_md": "title: The Rise of China's LLMs: A Complete History from 2017 to 2026\npublished: true\ndescription: From Wu Dao 2.0 (1.75T params) to DeepSeek V3 ($5.6M training cost) — the full story of how Chinese AI labs went from \"cheap copy\" to genuine competitors.\ntags: ai, deepseek, machinelearning, llm, china\ncover_image:\n\nI've been following the AI space since the GPT-2 days, and one thing that consistently surprises people is how far Chinese AI labs have come in just a few years. Most Western developers still think \"Chinese AI\" = \"cheap copy.\" The reality is far more interesting.\n\nLet me walk through the full timeline.\n\n2017–2020: The Foundation Years\n\nThe story starts, as most LLM stories do, with the Transformer paper (\"Attention Is All You Need\") from Google Brain in 2017. Chinese researchers and tech companies were quick to build on it.\n\n2018: BERT-era awakening\n\nWhen Google released BERT in late 2018, Chinese tech giants jumped in immediately:\n\nBaidu released ERNIE 1.0 in March 2019, beating Google's own BERT on several Chinese NLP benchmarks. 
ERNIE went beyond BERT's token-level masking by masking whole entities and phrases during pretraining, a way of injecting knowledge that BERT didn't have.\n\nAlibaba released its own pretrained models for e-commerce NLP.\n\nTencent followed with its own family of pretrained models.\n\nBut none of these were \"large\" by today's standards. Their parameter counts were in the hundreds of millions, not billions.\n\n2019: GPT-2 triggers the race\n\nOpenAI's GPT-2 (1.5B parameters) made it clear that scaling worked. Chinese labs realized they needed to think bigger. But a problem was looming: access to cutting-edge NVIDIA GPUs. US export controls on advanced AI chips would arrive in October 2022 and tighten from there.\n\nThis constraint would later become a feature, not a bug — but we'll get to that.\n\n2021: The Year Everything Changed\n\nJune 2021 — Beijing Academy of Artificial Intelligence (BAAI) releases Wu Dao 2.0\n\nThis was the moment that shocked the global AI community. Wu Dao 2.0 had 1.75 trillion parameters, ten times as many as GPT-3 (175B) at the time. It was trained on a Chinese-made supercomputer and could generate text, images, and even write poetry.\n\nThe Western press mostly ignored it. Those who paid attention dismissed it as \"impressive but not practical.\" In hindsight, this was the first major signal that China was serious about foundation models.\n\nKey Wu Dao 2.0 stats:\n\n1.75T parameters (sparse MoE architecture)\n\nTrained on 4.9 TB of text and image data\n\n1,000+ GPUs (NVIDIA A100, obtained before export restrictions tightened)\n\nCould generate text, images, and video\n\n2022: The Calm Before the Storm\n\nWhile OpenAI was quietly training GPT-4 and Anthropic was working on Claude, Chinese labs were making incremental progress.\n\nAugust 2022 — Zhipu AI releases GLM-130B\n\nTsinghua University spin-off Zhipu AI released GLM-130B, a 130B-parameter General Language Model (GLM). It was significant because it was one of the first Chinese LLMs built to perform well in both English and Chinese.\n\nOther large Chinese models from this period landed on either side of 2022:\n\nBaidu announced ERNIE 3.0 Titan (260B parameters) in December 2021\n\nHuawei published PanGu-Σ (1.085T parameters, MoE) in March 2023\n\nAlibaba and Tencent were building the model families that would later ship as Qwen and Hunyuan\n\nNone of these made global headlines. The performance gap with GPT-3.5 was real — Chinese models were roughly 6-12 months behind in benchmark scores.\n\nThen ChatGPT launched in November 2022.\n\n2023: China's \"200 Models\" Era\n\nChatGPT's launch sent shockwaves through China. Over the following year, more than 200 Chinese companies announced LLM projects. The government fast-tracked approval for commercial LLM deployment.\n\nKey events in 2023:\n\nMarch — Baidu ERNIE Bot\n\nBaidu launched ERNIE Bot, China's first public-facing ChatGPT competitor. The launch was rough: the demo was pre-recorded and the actual product had obvious quality issues. Critics called it \"embarrassing.\" But Baidu iterated fast.\n\nApril — Alibaba Tongyi Qianwen (Qwen)\n\nAlibaba unveiled its Tongyi Qianwen chatbot, then surprised everyone by open-sourcing Qwen-7B in August and Qwen-14B in September under a license that allowed commercial use. The global open-source community took notice.\n\nAugust — China approves commercial LLMs\n\nThe Chinese government approved 8 LLMs for public commercial use, including Baidu ERNIE, Alibaba Qwen, and Zhipu GLM. This was the starting gun for the AI application boom.\n\nNovember — DeepSeek enters the chat\n\nDeepSeek, a hedge-fund-backed AI lab, followed its DeepSeek Coder release earlier that month with DeepSeek LLM 67B, its first general-purpose model. 
It was trained on a relatively modest budget ($12M estimated) and achieved performance comparable to LLaMA 2 70B.\n\n2024: The Open-Source Revolution\n\nThis was the year Chinese models stopped being \"behind.\"\n\nMay — DeepSeek V2\n\nDeepSeek V2 paired a Mixture-of-Experts (MoE) design with a game-changing innovation: Multi-head Latent Attention (MLA). MLA compresses keys and values into a small low-rank latent vector, which cut the KV cache by 93.3% relative to DeepSeek 67B and made inference dramatically cheaper.\n\n236B total parameters, 21B active per token\n\nTraining cost: ~$10M (vs GPT-4's estimated $100M+)\n\nAPI pricing: $0.14/M input tokens\n\nJune — Qwen2 series\n\nAlibaba released Qwen2, in sizes from 0.5B to 72B. The 72B model was competitive with LLaMA 3 70B. All sizes shipped as open weights.\n\nDecember — DeepSeek V3\n\nThis was the bombshell. DeepSeek V3:\n\n671B total parameters (37B active)\n\nTrained on 2,048 NVIDIA H800 GPUs for 2.788M GPU hours\n\nTotal training cost: $5.576M (2.788M GPU hours at an assumed rental price of $2 per GPU hour, covering the final training run only)\n\nPerformance: Comparable to GPT-4o and Claude 3.5 Sonnet\n\nAPI pricing: $0.40/M input, $1.60/M output\n\nTo put that training cost in perspective:\n\n| Model | Estimated Training Cost |\n| --- | --- |\n| GPT-4 | $100M+ |\n| Gemini Ultra | $200M+ |\n| Llama 3 405B | ~$30M |\n| DeepSeek V3 | $5.6M |\n\n2025: The Chinese Model Explosion\n\nJanuary — DeepSeek R1\n\nDeepSeek released R1, an open reasoning model rivaling OpenAI o1. Cost: $1.10/M input vs o1's $15/M. That's 93% cheaper.\n\nApril — Qwen3 (235B)\n\nAlibaba released Qwen3, whose flagship is a 235B MoE model with 128K context. It matched GPT-4o on MMLU, HumanEval, and multilingual benchmarks.\n\nJuly — Kimi K2\n\nMoonshot AI released K2, a 1T-parameter MoE model. 
It ranked among the top models on the Chatbot Arena leaderboard and was particularly strong at long-context tasks, with a 128K-token context window.\n\nWhere We Are Today (May 2026)\n\n| Model | Params (Active) | Input $/M tokens | Output $/M tokens | MMLU | HumanEval |\n| --- | --- | --- | --- | --- | --- |\n| GPT-4o | Undisclosed | $10.00 | $30.00 | 88.7 | 90.2 |\n| Claude 3.5 Sonnet | Undisclosed | $3.00 | $15.00 | 88.3 | 92.0 |\n| DeepSeek V3 | 671B (37B) | $0.40 | $1.60 | 88.5 | 90.5 |\n| Qwen3-235B | 235B (22B) | $0.50 | $2.00 | 88.0 | 89.8 |\n| Kimi K2 | 1T (32B) | $0.50 | $2.00 | 89.1 | 91.2 |\n\nThe benchmark gap has essentially closed. On some tasks (math, coding, long context), Chinese models actually lead.\n\nWhat Drove This?\n\nThree factors Western developers should understand:\n\nThe compute constraint became an innovation driver. When US chip restrictions limited access to NVIDIA H100/B200, Chinese labs had to optimize every last FLOP. They developed more efficient architectures (MoE, MLA), better training algorithms (FP8 mixed precision), and clever infrastructure hacks (DeepSeek's \"DualPipe\" algorithm).\n\nMassive domestic talent pool. China produces roughly 500,000 engineering graduates per year. Top labs (DeepSeek, Zhipu, Moonshot) recruit from Tsinghua, PKU, and Zhejiang University, all world-class CS programs.\n\nGovernment + VC funding. Chinese AI labs received over $50B in total funding between 2021 and 2025. The government designated AI as a strategic priority and provided subsidies, data center access, and fast-track regulatory approval.\n\nThe Takeaway\n\nChinese LLMs are no longer \"just catching up.\" They've become the cost-effective option in a market where Western models keep getting more expensive. DeepSeek V3 matches GPT-4o's MMLU and HumanEval scores (88.5 vs 88.7, 90.5 vs 90.2) at about 4-5% of its listed per-token price ($0.40 vs $10.00 input, $1.60 vs $30.00 output).\n\nThe Chinese AI story isn't about geopolitical competition. 
It's about what happens when brilliant engineers face resource constraints and decide to innovate their way out instead of throwing money at the problem.\n\nSources: DeepSeek technical reports (arXiv), BAAI publications, Alibaba Qwen papers, Moonshot AI blog, Artificial Analysis, llm-stats, public API pricing pages. All data current as of May 2026.\n\nNext up in this series: Why Western AI models are so expensive — and whether the pricing is justified.", "url": "https://wpnews.pro/news/ai-deepseek-machinelearning", "canonical_source": "https://dev.to/_e85ab4244368a4a168fd64/ai-deepseek-machinelearning-1odk", "published_at": "2026-05-30 14:15:57+00:00", "updated_at": "2026-05-30 14:42:20.835290+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "ai-research", "natural-language-processing"], "entities": ["Google Brain", "Baidu", "Alibaba", "Tencent", "ERNIE", "BERT", "GPT-2", "DeepSeek V3"], "alternates": {"html": "https://wpnews.pro/news/ai-deepseek-machinelearning", "markdown": "https://wpnews.pro/news/ai-deepseek-machinelearning.md", "text": "https://wpnews.pro/news/ai-deepseek-machinelearning.txt", "jsonld": "https://wpnews.pro/news/ai-deepseek-machinelearning.jsonld"}}