How DeepSeek did it
Chinese artificial intelligence (AI) company DeepSeek has shocked the tech industry with the release of highly efficient AI models that can compete with cutting-edge products from US companies such as OpenAI and Anthropic.
DeepSeek, which was founded in 2023, has managed this with a fraction of the cash and computing power of its competitors.
DeepSeek's "reasoning" R1 model, released last week, provoked excitement among researchers, shock among investors, and responses from AI heavyweights. The company followed up on January 28 with a model that can work with both images and text.
But what has DeepSeek done, and how did it do it?
In December, DeepSeek released its V3 model. This is a highly efficient "standard" large language model that performs at a similar level to OpenAI's GPT-4o and Anthropic's Claude 3.5.
These models can perform tasks such as writing essays, writing computer code and fixing errors, though they are prone to making up facts of their own. On some tests of problem-solving and mathematical reasoning, they score better than the average human.
V3 was trained at a reported cost of about US$5.58 million. This is dramatically cheaper than GPT-4, for example, which cost more than US$100 million to develop.
DeepSeek also claims to have trained V3 using around 2,000 specialised computer chips, specifically H800 GPUs made by Nvidia. This is again far fewer than other companies, which may have used up to 16,000 of the more powerful H100 chips.
On January 20, DeepSeek released another model, called R1. This is a so-called "reasoning" model, which tries to work through difficult problems step by step. These models seem to be better at many tasks that require context and have multiple interrelated parts, such as reading comprehension and strategic planning.
The R1 model was created by modifying V3 with a technique called reinforcement learning. R1 appears to work at a similar level to OpenAI's o1, released last year.
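The core idea of reinforcement learning is to reward the model when its sampled outputs score well, nudging it toward answers that earn high rewards. The toy Python sketch below illustrates that loop under heavy simplification: the "model" is just a probability distribution over three canned answers and the reward checks correctness directly. It is not DeepSeek's actual training recipe, only the basic mechanism.

```python
# Toy sketch of reinforcement learning (REINFORCE-style update).
# Assumption: the "policy" is a distribution over three fixed answers,
# and the reward is 1 for the correct answer, 0 otherwise. Real systems
# like R1 apply this idea to a full language model.
import numpy as np

rng = np.random.default_rng(0)

answers = ["wrong A", "wrong B", "correct"]  # candidate outputs
logits = np.zeros(3)                         # the policy's parameters
learning_rate = 0.5

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(200):
    probs = softmax(logits)
    choice = rng.choice(3, p=probs)          # sample an answer
    reward = 1.0 if choice == 2 else 0.0     # score it

    # Gradient of log-probability of the chosen answer w.r.t. the logits:
    # one-hot(choice) - probs. Scale by reward so only good answers
    # reinforce themselves.
    grad = -probs
    grad[choice] += 1.0
    logits += learning_rate * reward * grad

print(softmax(logits))  # probability mass concentrates on "correct"
```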
DeepSeek also used the same technique to make "reasoning" versions of small open-source models that can run on home computers.
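As a rough illustration, one of these distilled models can be loaded locally with a few lines of Python, assuming the Hugging Face transformers library is installed and the machine has enough memory for the weights; the checkpoint name below is one of the distilled models DeepSeek has published.

```python
# Sketch: running a small distilled "reasoning" model on a home computer.
# Assumes `pip install transformers torch` and enough RAM for the weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What is 17 * 24? Think step by step.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```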
This announcement has sparked an enormous surge of interest in DeepSeek, driving the popularity of its V3-powered chatbot app and triggering a massive price drop in tech stocks as investors re-evaluate the AI industry. At the time of writing, chipmaker Nvidia has lost around US$600 billion in value.
DeepSeek's breakthroughs have been in achieving greater efficiency: getting good results with fewer resources. In particular, DeepSeek's engineers have pioneered two techniques that may be adopted by AI researchers more broadly.
The first involves a mathematical idea called "sparsity". AI models have a lot of parameters that determine their responses to inputs (V3 has around 671 billion), but only a small fraction of these parameters is used for any given input.
However, predicting which parameters will be needed isn't easy. DeepSeek used a new technique to do this, and then trained only those parameters. As a result, its models needed far less training than a conventional approach.
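One common way to realise this kind of sparsity is a "mixture of experts" layer, where a small router picks a few expert sub-networks for each input and the rest are skipped entirely. The numpy sketch below shows the idea in miniature; DeepSeek's actual routing is more elaborate, and the expert and router weights here are random stand-ins for learned parameters.

```python
# Minimal sketch of sparse activation, mixture-of-experts style.
# Assumption: 8 experts, of which only the top 2 are computed per input.
import numpy as np

rng = np.random.default_rng(0)

num_experts, dim, top_k = 8, 16, 2
experts = rng.normal(size=(num_experts, dim, dim))  # each expert is a weight matrix
router = rng.normal(size=(dim, num_experts))        # decides which experts to use

def sparse_layer(x):
    scores = x @ router                     # relevance of each expert to this input
    chosen = np.argsort(scores)[-top_k:]    # keep only the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()
    # Only the chosen experts are actually computed; the other 6 are skipped,
    # so most of the layer's parameters never touch this input.
    return sum(w * (experts[i] @ x) for w, i in zip(weights, chosen))

x = rng.normal(size=dim)
print(sparse_layer(x).shape)  # (16,) -- output built from 2 of 8 experts
```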
The other trick relates to how V3 stores information in memory. DeepSeek has found a clever way to compress the relevant data, so it is easier to store and access quickly.
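A simple way to picture this is projecting the cached attention data into a small "latent" space and expanding it again on demand. The numpy sketch below shows the memory saving under that assumption; the projection matrices are random stand-ins for learned ones, and DeepSeek's actual technique is considerably more sophisticated.

```python
# Sketch of low-rank compression of cached data.
# Assumption: a 512-dimensional cache entry is stored as a 64-dimensional
# latent vector and reconstructed when needed. In a real model the two
# projections are learned so the reconstruction stays accurate.
import numpy as np

rng = np.random.default_rng(0)

seq_len, dim, latent_dim = 1024, 512, 64
kv = rng.normal(size=(seq_len, dim))        # full cached data

down = rng.normal(size=(dim, latent_dim))   # compression projection
up = rng.normal(size=(latent_dim, dim))     # decompression projection

cache = kv @ down                           # stored: 1024 x 64 instead of 1024 x 512
restored = cache @ up                       # expanded again on demand

print(kv.nbytes, cache.nbytes)              # the cache uses 8x less memory
```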
DeepSeek's models and techniques have been released under the free MIT License, meaning anyone can download and modify them.
While this may be bad news for some AI companies, whose profits could be eroded by the existence of freely available, efficient models, it is great news for the broader AI research community.
At present, a lot of AI research requires access to enormous amounts of computing resources. Researchers like me who are based at universities (or anywhere except big tech companies) have had limited ability to carry out tests and experiments.
More efficient models and techniques change the situation. Experimentation and development may now be significantly easier for us.
For consumers, access to AI may also become cheaper. More AI models may be run on users' own devices, such as laptops or phones, rather than running "in the cloud" for a subscription fee.
For researchers who already have plenty of resources, more efficiency may have less of an effect. Whether DeepSeek's approach will help make models with better overall performance, or simply models that are more efficient, remains to be seen.
Tongliang Liu is director of the Sydney AI Centre and associate professor of machine learning at the University of Sydney.
This article is republished from The Conversation under a Creative Commons license. Read the original article.