DeepSeek’s Advances and the Question of Peak AI Compute
Examining DeepSeek’s Impact on AI Compute Scaling, Synthetic Data Costs, and Market Disruption.
There’s been justifiable buzz around the recent releases from DeepSeek, with their DeepSeek v3 LLM and R1 reasoning model challenging the leadership of the top American AI labs. Several factors have contributed to this disruption, primarily:
Performance comparable to models from OpenAI and Anthropic.
Reportedly much lower costs for training these models.
Substantially reduced inference costs.
Achieving all this despite China facing AI chip restrictions imposed by the US.
Financial markets have reacted strongly to this development. The prevailing narrative is that since DeepSeek managed to train their models with significantly less compute, future demand for GPUs might decline as training new state-of-the-art LLMs could require fewer resources. As of January 27th, 2025, NVIDIA’s stock price has dropped 17.4%, wiping out $600 billion in market capitalization.
However, I believe this narrative is being overstated. Several key considerations are being overlooked: the historical trajectory of AI development, the nuances of current AI research, and the strategic decisions now facing AI labs. In this post, I’ll explore each of these aspects and argue that while DeepSeek’s advancements are undeniably impressive, they don’t necessarily signal a seismic shift in the demand for GPUs.
Don’t Bet Against Scaling Compute
Throughout AI’s history, scaling compute and data has consistently outperformed efforts to embed handcrafted “intelligence” into models. Take Convolutional Neural Networks (CNNs): introduced by Yann LeCun in the 1980s, their potential remained untapped until AlexNet in 2012 leveraged GPUs and the ImageNet dataset, revolutionizing computer vision by scaling compute and data.
This pattern has repeated in Chess, Go, and large language models (LLMs), where scaling has outpaced systems relying on human-crafted rules. As Richard Sutton’s “The Bitter Lesson” reminds us, breakthroughs come from letting data and compute drive learning, not from designing domain-specific shortcuts.
Applying this lesson to DeepSeek: their success with modest compute is remarkable, but imagine pairing their approach with significantly more resources. Leading AI labs are almost certainly already dissecting their methods to adapt and scale them. History suggests compute budgets won’t shrink; the same levels of compute will simply be spent far more efficiently, paving the way for the next major leap.
Hidden Compute Costs in Synthetic Data and Training Efficiency
Everyone in the market has focused on the headline figure from DeepSeek’s v3 paper: that they trained the model with around $6M worth of compute. However, this glosses over a significant cost of building these models, namely the compute required to generate the synthetic data used to train DeepSeek v3 and R1.
Major labs are increasingly using synthetic data to train their models, and a large share of compute spending has shifted from pre-training and supervised fine-tuning toward reinforcement learning driven by synthetic data. SemiAnalysis even suggests that compute spent on post-training (synthetic data generation and reinforcement learning) may now exceed what has traditionally consumed the majority of LLM compute: pre-training and fine-tuning.
Take the R1 paper as an example: DeepSeek mentions using “about 800k samples” of synthetic data to teach their models to reason. To get to this curated dataset, they had to generate many times more, discarding incorrect samples and keeping only accurate ones. This filtering process adds a significant hidden compute cost that isn’t included in the top-line figure.
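To make that hidden cost concrete, here is a minimal sketch of the kind of rejection-sampling loop such curation implies. The pass rate, sample counts, and the generate/verify functions are hypothetical placeholders, not DeepSeek’s actual pipeline; the point is only that tokens spent on discarded generations never show up in the size of the curated dataset.

```python
import random

# Hypothetical stand-ins for an expensive LLM call and an automatic checker;
# this is NOT DeepSeek's pipeline, just the shape of rejection-sampling curation.
def generate_candidate(problem):
    """Pretend LLM call: returns a (reasoning_trace, answer) pair."""
    return "chain-of-thought ...", random.random()

def is_correct(problem, answer, assumed_pass_rate=0.25):
    """Pretend verifier (exact match, unit tests, ...); ~25% acceptance assumed."""
    return answer < assumed_pass_rate

def curate(problems):
    kept, attempts = [], 0
    for problem in problems:
        while True:
            attempts += 1                      # every attempt burns inference compute
            trace, answer = generate_candidate(problem)
            if is_correct(problem, answer):
                kept.append((problem, trace))  # only verified samples enter the dataset
                break
    return kept, attempts

kept, attempts = curate(range(1_000))          # stand-in for real prompts
print(f"kept {len(kept)} samples after {attempts} generations "
      f"({attempts / len(kept):.1f}x the curated set)")
```

With an assumed 25% pass rate, roughly four generations are paid for per kept sample; scaled to a curated set of 800k samples, that discarded compute is far from negligible.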
It’s also highly likely that DeepSeek relied heavily on synthetic training data generated by OpenAI’s models to train their own, reducing the compute required for their total training run. For instance, when prompted with questions like “Which model are you?”, DeepSeek v3 often responds as though it were ChatGPT. This raises broader strategic questions for AI labs: should we publish our latest research? How do we ensure our models aren’t used to train competitors? These questions may prove more consequential for lab strategy than the actual compute costs.
While DeepSeek may have trained their model at a discount, claiming they achieved this with “substantially lower compute requirements” feels like a stretch—particularly for the R1 model.
Reassessing Peak AI Compute: Inference and Post-Training Costs
DeepSeek’s contributions to AI research are undeniably significant. Their work pushes the field forward and opens new avenues for exploration, particularly for those doing open-source LLM research and those building on top of open-source models. However, calling this the peak of compute scaling, and claiming that further scaling won’t be worthwhile, overlooks key aspects of DeepSeek’s approach. Their R1 model, for instance, can be highly compute-intensive during post-training and even more so during inference: as benchmarked by SemiAnalysis, a reasoning model required approximately 10x more tokens per answer than a comparable non-reasoning model. With such demands in post-training and inference, does it really make sense to declare that compute scaling has peaked?
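A quick back-of-envelope calculation makes the inference-side demand concrete. The per-answer token count and query volume below are illustrative assumptions, not measured figures; only the roughly 10x multiplier comes from the comparison cited above.

```python
# Illustrative assumptions, not measured figures; only the ~10x multiplier
# echoes the SemiAnalysis comparison cited above.
standard_tokens_per_answer = 500        # hypothetical non-reasoning completion length
reasoning_multiplier = 10               # ~10x more tokens per answer for a reasoning model
queries_per_day = 1_000_000             # hypothetical serving workload

standard_daily_tokens = standard_tokens_per_answer * queries_per_day
reasoning_daily_tokens = standard_daily_tokens * reasoning_multiplier

print(f"standard model:  {standard_daily_tokens:,} output tokens/day")
print(f"reasoning model: {reasoning_daily_tokens:,} output tokens/day")
# Serving the same workload with a reasoning model multiplies the daily
# inference compute bill by roughly the same factor.
```

Whatever the exact numbers, a fixed workload served by a reasoning model consumes roughly an order of magnitude more output tokens, and therefore more GPU time, than the same workload served by a standard model.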
The more pressing takeaway from what DeepSeek has achieved lies, in my view, in the strategic implications for leading AI labs. DeepSeek has completely reshaped the competitive landscape by demonstrating that cheaper models can viably be trained by poaching synthetic data from a leading closed LLM. Moreover, by offering their solution at a massive discount, 25x cheaper than OpenAI and Anthropic, they have drastically undercut their competitors' pricing power and disrupted the market dynamics.
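To see how those two numbers interact, here is the same style of sketch applied to cost per answer. The per-token prices below are hypothetical placeholders chosen only to reflect the roughly 25x gap mentioned above, not actual list prices, and the token counts reuse the illustrative figures from the previous sketch.

```python
# Hypothetical per-token prices chosen only to reflect a ~25x gap; these are
# not actual list prices from any provider.
incumbent_price_per_1k = 0.050    # $ per 1k output tokens, incumbent API
challenger_price_per_1k = 0.002   # ~25x cheaper

tokens_standard_answer = 500      # illustrative non-reasoning completion
tokens_reasoning_answer = 5_000   # ~10x more tokens for a reasoning answer

incumbent_cost = incumbent_price_per_1k * tokens_standard_answer / 1_000
challenger_cost = challenger_price_per_1k * tokens_reasoning_answer / 1_000

print(f"incumbent, standard answer:   ${incumbent_cost:.4f}")
print(f"challenger, reasoning answer: ${challenger_cost:.4f}")
# Even when each answer consumes 10x the tokens, the ~25x lower per-token
# price still undercuts the incumbent, while the total number of tokens
# that must be served goes up, not down.
```

The numbers are invented, but the structure of the calculation is the point: cheaper answers that each consume more tokens squeeze competitors' pricing power without reducing, and quite possibly while increasing, the aggregate demand for inference compute.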