How DeepSeek revolutionized AI’s cost calculus – Asia Times

State-of-the-art artificial intelligence systems like OpenAI’s ChatGPT, Google’s Gemini and Anthropic’s Claude have captured the public imagination by producing fluent text in many languages in response to user prompts. These companies have also made headlines with the huge sums they have invested to build ever more powerful models.

DeepSeek, a Chinese AI company, has upended expectations about how much money is needed to build the latest and greatest AIs. In the process, they have cast doubt on the billions of dollars of investment by the big AI players.

I study machine learning. DeepSeek’s disruptive debut comes down not to any stunning technological breakthrough but to a time-honored practice: finding efficiencies. In a field that consumes vast computing resources, that has proved to be significant.

Where the costs are

Developing such powerful AI systems begins with building a large language model. A large language model predicts the next word given the previous words. For example, if the beginning of a sentence is “The theory of relativity was discovered by Albert,” a large language model might predict that the next word is “Einstein.” Large language models are trained to become good at such predictions in a process called pretraining.
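To make next-word prediction concrete, here is a minimal sketch using the Hugging Face transformers library with the small, public GPT-2 model as a stand-in (not DeepSeek’s model); the prompt and model choice are illustrative assumptions only.

```python
# A minimal sketch of next-word prediction with a small public model (GPT-2),
# shown only to illustrate the pretraining objective, not DeepSeek's system.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The theory of relativity was discovered by Albert"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits      # a score for every word in the vocabulary

next_token_id = logits[0, -1].argmax()   # pick the highest-scoring next token
print(tokenizer.decode(next_token_id))   # very likely " Einstein"
```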

Pretraining requires a lot of data and computing power. The companies collect the data by crawling the web and scanning books. Computing is usually powered by graphics processing units, or GPUs.

Why graphics? It turns out that both computer graphics and the artificial neural networks that underlie large language models rely on the same branch of mathematics: linear algebra. Large language models internally store hundreds of billions of numbers called parameters or weights. It is these weights that are modified during pretraining.
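As a rough illustration of why this is linear algebra, the sketch below treats one neural network layer as a single matrix multiplication; the sizes are made up and far smaller than in any real large language model.

```python
# Tiny sketch: a single neural-network layer is essentially one matrix multiply.
# Sizes are illustrative; real large language models hold hundreds of billions
# of such weight values spread across many layers.
import numpy as np

hidden_size = 4096                                 # illustrative layer width
W = np.random.randn(hidden_size, hidden_size)      # this layer's weights (parameters)
x = np.random.randn(hidden_size)                   # activations flowing into the layer

y = W @ x                                          # the core linear-algebra operation GPUs accelerate

print(f"Parameters in this one layer: {W.size:,}")  # 16,777,216 for 4096 x 4096
```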


Large language models use a lot of computing power, which translates into a lot of energy.

Pretraining is, however, not enough to yield a consumer product like ChatGPT. A pretrained large language model is usually not good at following human instructions. It might also not be aligned with human preferences. For example, it might output harmful or abusive language, both of which are present in text on the web.

The pretrained model therefore typically goes through additional stages of training. One such stage is instruction tuning, in which the model is shown examples of human instructions and expected responses.
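Instruction tuning data usually consists of simple instruction-response pairs. The sketch below shows what one such record might look like; the field names and contents are hypothetical, not DeepSeek’s actual data format.

```python
# Hypothetical instruction-tuning record: a human-written instruction paired
# with the response the model should learn to produce. Field names are assumed.
instruction_example = {
    "instruction": "Summarize the theory of relativity in one sentence.",
    "response": (
        "Einstein's theory of relativity describes how space and time are "
        "interwoven and how gravity arises from the curvature of spacetime."
    ),
}

# During instruction tuning, the model is trained to continue the instruction
# with the expected response, using the same next-word prediction objective.
training_text = instruction_example["instruction"] + "\n" + instruction_example["response"]
```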

After instruction tuning comes a stage called reinforcement learning from human feedback. In this stage, human annotators are shown multiple large language model responses to the same prompt. The annotators are then asked to indicate which response they prefer.
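A common way to use these preference judgments, in the Bradley-Terry-style reward modeling popularized by RLHF work, is to train a reward model so the preferred response scores higher than the rejected one. The sketch below shows that loss on made-up reward values; it illustrates the general technique, not DeepSeek’s or OpenAI’s exact recipe.

```python
# Sketch of the standard preference loss used when training a reward model from
# human comparisons: push the reward of the preferred response above the other.
# The reward values below are made up for illustration.
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    # -log(sigmoid(r_chosen - r_rejected)): small when the chosen response
    # already scores clearly higher than the rejected one.
    diff = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

print(preference_loss(2.0, -1.0))  # ~0.049: annotator preference already respected
print(preference_loss(-1.0, 2.0))  # ~3.049: model disagrees with annotators, large loss
```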

It is easy to see how costs add up when building an AI model: hiring top-quality AI talent, building a data center with thousands of GPUs, collecting data for pretraining, and running pretraining on GPUs. Additionally, there are costs involved in collecting data and computing during the instruction tuning and reinforcement learning from human feedback stages.

All included, the cost of building a cutting-edge AI model can soar up to US$100 million. GPU training is a significant component of the total cost.

The costs do not end when the model is built. When the model is deployed and responds to user prompts, it uses more computation, known as test time or inference time compute.

Test time compute also requires GPUs. In December 2024, OpenAI announced a new phenomenon they saw with their latest model: as test time compute increased, the model got better at logical reasoning tasks such as math olympiad and competitive coding problems.
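One simple way to spend more computation at test time is to sample several candidate answers and take a majority vote, often called self-consistency; the sketch below illustrates that general strategy, not necessarily what OpenAI’s model does, and generate_answer is a hypothetical stand-in for a real model call.

```python
# Illustrative test-time compute strategy: sample many answers and majority-vote.
# More samples means more inference compute, often buying higher accuracy.
# `generate_answer` is a hypothetical placeholder for a call to a language model.
import random
from collections import Counter

def generate_answer(prompt: str) -> str:
    # Stand-in for a real model call that samples a (possibly wrong) answer.
    return random.choice(["42", "42", "42", "41"])

def answer_with_majority_vote(prompt: str, num_samples: int = 16) -> str:
    samples = [generate_answer(prompt) for _ in range(num_samples)]
    return Counter(samples).most_common(1)[0][0]

print(answer_with_majority_vote("What is 6 x 7?"))
```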

Slimming down resource consumption

Thus it seemed that the path to building the best AI models in the world was to invest in more computation during both training and inference. But then DeepSeek entered the fray and bucked this trend.


DeepSeek sent shockwaves through the tech sector of the financial markets.

Their V-series models, culminating in the V3 model, used a series of optimizations to make training cutting-edge AI models significantly more economical. According to their technical report, it cost less than $6 million to train V3.

They admit that this cost does not include the costs of hiring the team, doing the research, trying out various ideas and collecting data. But $6 million is still an impressively small figure for training a model that rivals leading AI models developed at much higher cost.

The reduction in costs was not the result of a single magic bullet. It came from a combination of many smart engineering choices, including using fewer bits to represent model weights and improving the neural network architecture to reduce communication overhead as data travels between GPUs.
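To see why using fewer bits per weight matters, here is a small PyTorch sketch comparing the memory footprint of the same weight matrix in 32-bit and 16-bit floating point; DeepSeek’s report describes going even lower (8-bit) for parts of training, which this sketch does not attempt to reproduce.

```python
# Sketch: storing the same weights with fewer bits roughly halves the memory
# they occupy, which also reduces how much data must move between GPUs.
# Sizes are illustrative; this does not reproduce DeepSeek's training setup.
import torch

weights_fp32 = torch.randn(4096, 4096, dtype=torch.float32)
weights_bf16 = weights_fp32.to(torch.bfloat16)   # 16-bit representation

def megabytes(t: torch.Tensor) -> float:
    return t.element_size() * t.nelement() / 1e6

print(f"32-bit: {megabytes(weights_fp32):.1f} MB")  # ~67.1 MB
print(f"16-bit: {megabytes(weights_bf16):.1f} MB")  # ~33.6 MB
```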

The DeepSeek team did not have access to high-performance GPUs like the Nvidia H100 because of U.S. export restrictions on China. Instead they used Nvidia H800 GPUs, which Nvidia designed to have lower performance so that they comply with U.S. export restrictions. Working with this limitation appears to have unleashed even more ingenuity from the DeepSeek team.

Additionally, DeepSeek made improvements that make inference cheaper, reducing the cost of running the model. Moreover, they released a model called R1 that is comparable to OpenAI’s o1 model on reasoning tasks.

Resetting expectations

They released all the model weights for V3 and R1 publicly. Anyone can download and modify their models to improve or customize them. Furthermore, DeepSeek released their models under the permissive MIT license, which allows others to use the models for personal, academic or commercial purposes with minimal restrictions.
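Because the weights are public, fetching them takes only a few lines with standard tooling; the sketch below assumes the R1 weights are hosted on Hugging Face under the deepseek-ai organization, so verify the exact repository name before relying on it, and note that the full download is very large.

```python
# Sketch: fetching openly released model weights with the huggingface_hub library.
# Assumes the R1 weights are published as "deepseek-ai/DeepSeek-R1"; check the
# current listing on Hugging Face before running.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="deepseek-ai/DeepSeek-R1")
print(f"Model weights downloaded to: {local_dir}")
```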

The landscape of large AI models has been fundamentally altered by DeepSeek. An economically trained open weights model is now comparable to more expensive and closed models that demand paid subscription plans.

The stock market and the research community will need some time to adjust to this new reality.

Ambuj Tewari is professor of statistics, University of Michigan

This article was republished from The Conversation under a Creative Commons license. Read the original article.