Faster AI, lower costs: DSpark eases inference bottlenecks and chip strain, says DeepSeek

Start-up unveils speculative decoding framework that speeds up inference by up to 85 per cent amid China’s push to overcome US AI curbs

Ben Jiang in Beijing

Chinese AI start-up DeepSeek unveiled DSpark, a speculative decoding framework that speeds up AI inference by up to 85 per cent while reducing costs and chip strain. The module uses a lightweight draft model and semi-autoregressive generation to accelerate response generation, addressing GPU underutilisation and high latency in AI serving.

AI models’ conventional token-by-token output often slowed when responses were lengthy, leading to low utilisation of graphics processing units (GPUs) and high user-perceived waiting time, which was a “primary bottleneck in serving AI”, the company said in research published on Saturday.

DeepSeek said the DSpark module accelerated AI response generation (also known as AI inference, which refers to serving a trained model to respond to user queries) by using a lightweight draft model to propose candidate responses and then verifying them in batches with a larger model, speeding up output.

DSpark further refined the approach with a semi-autoregressive generation method, allowing the model to produce small chunks of tokens rather than strictly one at a time.
It also introduced a confidence-based scheduling system that dynamically adjusted how much verification was applied based on computing demand, helping balance speed and output quality.
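
For readers who want to see the general idea in code, the draft-then-verify loop can be sketched in a few lines of Python. This is a toy illustration of speculative decoding in general, not DeepSeek's implementation: `target_next`, `draft_chunk`, `speculative_decode`, the `chunk_size` knob and the `min_confidence` threshold are all hypothetical names, the "models" are simple arithmetic stand-ins, and the fixed confidence cut-off is only a rough stand-in for the scheduling system, whose details the research summary above does not spell out.

```python
from typing import List, Tuple

Token = int


def target_next(context: List[Token]) -> Token:
    """Toy stand-in for the large model: deterministic greedy next token."""
    return (context[-1] * 3 + len(context)) % 10


def draft_chunk(context: List[Token], size: int) -> List[Tuple[Token, float]]:
    """Toy stand-in for the lightweight draft model.

    Proposes `size` tokens in one call, each with a confidence score.
    It is deliberately wrong (and unsure) at every fourth position.
    """
    ctx = list(context)
    proposals = []
    for _ in range(size):
        guess, confidence = target_next(ctx), 0.9
        if len(ctx) % 4 == 0:
            guess, confidence = (guess + 1) % 10, 0.4
        proposals.append((guess, confidence))
        ctx.append(guess)
    return proposals


def greedy_decode(prompt: List[Token], n_tokens: int) -> List[Token]:
    """Baseline: one large-model call per generated token."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target_next(out))
    return out[len(prompt):]


def speculative_decode(
    prompt: List[Token],
    n_tokens: int,
    chunk_size: int = 4,          # assumed knob: tokens drafted per round
    min_confidence: float = 0.5,  # assumed knob: cut the draft when unsure
) -> Tuple[List[Token], int]:
    """Return the generated tokens and the number of verification passes."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < n_tokens:
        # 1. Draft: keep proposed tokens until confidence drops too low.
        chunk = []
        for token, confidence in draft_chunk(out, chunk_size):
            if confidence < min_confidence:
                break
            chunk.append(token)

        # 2. Verify: a real system scores the whole chunk in one batched
        #    forward pass of the large model; here we count it as one pass.
        passes += 1
        ctx = list(out)
        for token in chunk:
            if token != target_next(ctx):
                break  # first mismatch: discard the rest of the draft
            ctx.append(token)

        # 3. The large model always contributes one token itself:
        #    a correction after a mismatch, or a bonus token otherwise.
        ctx.append(target_next(ctx))
        out = ctx
    return out[len(prompt):len(prompt) + n_tokens], passes


if __name__ == "__main__":
    tokens, passes = speculative_decode([1], 20)
    print(tokens == greedy_decode([1], 20), passes)
```

In this toy setup the speculative output matches plain token-by-token decoding exactly, while 20 tokens need 5 verification passes rather than 20 large-model calls. Raising or lowering `chunk_size` and `min_confidence` trades drafting effort against wasted verification, which is the kind of balance a scheduler would have to strike.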