DeepSeek's DSpark speeds up AI generation by 60-85% without new models

DeepSeek has taken a fresh run at one of the most stubborn bottlenecks in AI serving: making a large language model generate text faster without swapping in a new model. The company’s new framework, DSpark, is a serving-side optimization, not a model release. It attaches a lightweight “draft module” to the existing DeepSeek-V4 weights and, in production tests, cut per-user generation time by 60–85% compared with the company’s MTP-1 baseline while keeping output lossless, meaning the generated text follows the same distribution as the unmodified model. The team also open-sourced DeepSpec, an MIT-licensed codebase for training and evaluating speculative decoding drafters.

Behind the speed-up: three levers, not a new architecture

Speculative decoding has been around for a while, but DSpark refines the classic two-stage process (draft, then verify) by pulling three levers at once. First, it drafts faster by using a parallel backbone (in their setup, DFlash) that produces base logits for every draft position in one forward pass. Second, it drafts better by adding a tiny sequential head that biases each token toward locally coherent continuations; the default is a Markov head with a rank-256 low-rank factorization. Finally, it verifies smarter: a confidence head and a load-aware scheduler decide how many draft tokens to submit for verification based on current GPU utilization, avoiding wasted compute during peak loads.
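For readers who want to see the draft-then-verify step concretely, here is a minimal Python sketch of the textbook speculative-sampling acceptance rule. It is not DeepSeek's code: the function names, the list-of-probabilities representation, and the toy inputs are all assumptions made for illustration. The number of draft tokens passed in is the quantity a scheduler like the one described above would presumably vary.

```python
import random


def sample(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off


def verify_draft(draft_tokens, draft_probs, target_probs, rng):
    """Textbook speculative-sampling verification (hypothetical helper).

    draft_tokens: k token ids proposed by the drafter
    draft_probs:  k drafter distributions q_i, one per draft position
    target_probs: k + 1 target-model distributions p_i; the last one is
                  used for a bonus token when every draft token is accepted
    Returns the accepted prefix plus exactly one token chosen by the target.
    """
    emitted = []
    for i, token in enumerate(draft_tokens):
        p, q = target_probs[i], draft_probs[i]
        # Accept the draft token with probability min(1, p(token) / q(token)).
        if rng.random() < min(1.0, p[token] / q[token]):
            emitted.append(token)
            continue
        # Rejected: resample from the normalized residual max(p - q, 0)
        # and discard the remaining draft tokens.
        residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
        total = sum(residual)
        emitted.append(sample([r / total for r in residual], rng))
        return emitted
    # Every draft token was accepted, so the target adds one bonus token.
    emitted.append(sample(target_probs[len(draft_tokens)], rng))
    return emitted
```

When the drafter agrees with the target, all k draft tokens are accepted and k + 1 tokens come out of a single verification pass, which is where the speed-up comes from. When the drafter is wrong at the first position, only one token is emitted and the drafting work is wasted.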

Open weights, open training code

The new checkpoints, DeepSeek-V4-Pro-DSpark and DeepSeek-V4-Flash-DSpark, reuse the original V4 weights and simply attach the draft module. DeepSpec, the accompanying training framework, gives teams a ready-made path to train custom drafters or reproduce the results. Offline benchmarks show DSpark’s accepted token length (the number of draft tokens that survive each verification step) rising 26–31% over Eagle3 and 16–18% over DFlash, while production logs on DeepSeek-V4 confirm the 60–85% speed-up over the MTP-1 baseline. Because the verification step preserves the target-model distribution exactly, quality remains unchanged.
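The claim that verification preserves the target distribution can be checked by hand for the standard speculative-sampling rule. The sketch below is an illustration under assumptions, not DSpark's implementation: the function name and the toy distributions are made up. It computes the exact probability of each token being emitted after one draft-and-verify step. A token x is drafted and accepted with probability min(p[x], q[x]), and the rejected mass is redistributed in proportion to max(p[x] - q[x], 0), so the two parts sum back to p[x].

```python
def speculative_output_distribution(p, q):
    """Exact distribution of one token emitted by draft-then-verify.

    p: target-model distribution, q: drafter distribution (same vocabulary).
    Hypothetical helper for illustration only.
    """
    # Drafting x (probability q[x]) and accepting it (probability
    # min(1, p[x] / q[x])) happens with probability min(p[x], q[x]).
    accepted = [min(pi, qi) for pi, qi in zip(p, q)]
    rejected_mass = 1.0 - sum(accepted)
    # On rejection, the replacement token is drawn from max(p - q, 0), normalized.
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    total = sum(residual)
    if total == 0.0:  # p and q are identical, so nothing is ever rejected
        return accepted
    return [a + rejected_mass * r / total for a, r in zip(accepted, residual)]


target = [0.7, 0.2, 0.1]   # toy target distribution
drafter = [0.1, 0.3, 0.6]  # deliberately poor toy drafter
emitted = speculative_output_distribution(target, drafter)
# Even with a poor drafter, the emitted distribution matches the target.
assert all(abs(e - t) < 1e-12 for e, t in zip(emitted, target))
```

A poor drafter therefore costs speed, because more tokens are rejected, but under this rule it does not change what the model says.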

Source: MarkTechPost. AI-assisted editorial synthesis — TechnoExpress.
