Just a week or so ago, a little-known Chinese technology company called DeepSeek quietly debuted an artificial intelligence app. What happened next was anything but quiet.
U.S. technology stocks reeled, losing billions of dollars in value. Why? DeepSeek’s AI was developed and trained on the cheap – just pennies on the dollar compared to the vast sums of money American AI companies have poured into research and development. And experts say DeepSeek appears to be just as good as household names like ChatGPT and Microsoft Copilot.
Is this a technology fluke? A shot across the computing bow? An AI anomaly?
UVA Today chatted with Michael Albert, an AI and computing expert in the University of Virginia’s Darden School of Business.
Q. First of all, what is DeepSeek?
A. DeepSeek is a Chinese AI research lab, similar to OpenAI, founded by a Chinese hedge fund, High-Flyer. Unlike most commercial research labs, with the partial exception of Meta, DeepSeek has primarily been open-sourcing its models. And unlike even Meta, it is truly open-sourcing them, allowing anyone to use them for commercial purposes. It has released several families of models, each with the name DeepSeek followed by a version number.
The recent excitement has been about the release of a new model called DeepSeek-R1. DeepSeek-R1 is a modified version of the DeepSeek-V3 model that has been trained to reason using “chain-of-thought.” This approach teaches a model to, in simple terms, show its work by explicitly reasoning out, in natural language, about the prompt before answering. This chain-of-thought approach is also what powers OpenAI’s o1, the current best model for mathematical, scientific and programming questions. DeepSeek-R1 is so exciting because it is a fully open-source model that compares quite favorably to o1.
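To make the idea concrete, here is a minimal sketch of the difference between a direct prompt and a chain-of-thought-style prompt. The client setup and model name are illustrative assumptions, not DeepSeek’s or OpenAI’s actual reasoning pipeline; any OpenAI-compatible chat endpoint would work the same way.

```python
# A minimal sketch contrasting a direct prompt with a chain-of-thought prompt.
# The model name below is a placeholder assumption, not a recommendation.
from openai import OpenAI

client = OpenAI()  # assumes an API key is set in the environment

question = "A train travels 120 miles in 2 hours. How far does it go in 5 hours?"

# Direct prompting: ask for the answer alone.
direct = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": question}],
)

# Chain-of-thought prompting: ask the model to reason step by step,
# in natural language, before committing to a final answer.
cot = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{
        "role": "user",
        "content": "Think through this step by step, showing your reasoning "
                   "in plain language before giving a final answer.\n" + question,
    }],
)

print(cot.choices[0].message.content)
```

Models like o1 and DeepSeek-R1 go further: rather than relying on the prompt, they are trained so that this step-by-step reasoning happens by default.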
Q. Why have so many in the tech world taken notice of a company that, until this week, almost no one in the U.S. had heard of?
A. The excitement around DeepSeek-R1 this week is twofold. First, a Chinese company working with a much smaller compute budget (an alleged $6 million, versus roughly $100 million for OpenAI’s GPT-4) was able to produce a state-of-the-art model, which is seen as a potential threat to U.S. dominance in AI.
However, the alleged training efficiency seems to have come more from the application of good model-engineering practices than from fundamental advances in AI technology. There does not seem to be any major new insight behind the more efficient training, just a collection of small ones.
The second cause of excitement is that this model is open source, which means that, if deployed efficiently on your own hardware, it can cost much, much less to use than accessing o1 directly from OpenAI. This opens up uses for these models that were not possible with closed-weight models, like OpenAI’s, because of terms of use or generation costs.
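For a sense of what “deployed on your own hardware” looks like in practice, here is a minimal sketch using the Hugging Face transformers library. The checkpoint name is an assumption for illustration; the full DeepSeek-R1 model is far too large for a single consumer GPU, so a small distilled variant is shown instead.

```python
# A minimal sketch of running an open-weight model locally with Hugging Face
# transformers. The model identifier below is an assumed distilled checkpoint,
# used only because the full R1 model will not fit on consumer hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain why the sky is blue."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generation happens locally: no per-token API charges, only hardware cost.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the weights are openly licensed, this kind of local deployment sidesteps both the per-token generation costs and the terms-of-use restrictions of a closed API.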
Q. Investors have been a little cautious about U.S.-based AI because of the enormous expense required, in terms of chips and computing power. Is DeepSeek’s AI model mostly hype or a game-changer?
A. DeepSeek-R1 is not a fundamental advance in AI technology. It is an interesting incremental advance in training efficiency. However, it was always going to be more efficient to recreate something like o1 than to train it the first time. The reality is that the major expense for these models is incurred when they are generating new text for users, not during training. DeepSeek-R1 seems to be only a small advance as far as efficiency of generation goes. The real seismic shift is that this model is fully open source.
Also, this does not mean that China will automatically dominate the U.S. in AI technology. In December 2023, a French company named Mistral AI released a model, Mixtral 8x7B, that was fully open source and thought to rival closed-source models. However, closed-source models adopted many of the insights from Mixtral 8x7B and got better. Since then, Mistral AI has been a relatively minor player in the foundation model space.
Q. The U.S. has been trying to control AI by limiting the availability of powerful computing chips to countries like China. If AI can be done cheaply and without the expensive chips, what does that mean for America’s dominance in the technology?
A. I don’t think that DeepSeek-R1 means that AI can be trained cheaply and without expensive chips. What they have allegedly demonstrated is that previous training methods were somewhat inefficient. This just means that the next round of models from U.S. companies will be trained more efficiently and achieve even better performance, assuming that models are not plateauing.