Alibaba’s AI model Qwen3: A smart kid prone to hallucinations

Qwen3, Alibaba Group’s recently released large language model, shows stronger code-writing and mathematical-reasoning skills than some of its American competitors, putting it at the top of several standard benchmarks.

Qwen3 comes in two mixture-of-experts (MoE) models (Qwen3-235B-A22B and Qwen3-30B-A3B) and six dense models.

An MoE model assigns a specific “expert” to answer questions on a particular topic, an approach also used by OpenAI’s ChatGPT and Anthropic’s Claude. By learning intricate patterns in data, a relatively compact model can handle a wide range of tasks, including natural language processing and image classification.
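To make the routing idea concrete, here is a minimal sketch of top-k expert selection in Python. It is illustrative only: the expert count, hidden size and top-k value are hypothetical toy numbers, not Qwen3’s actual architecture.

```python
import numpy as np

# Toy mixture-of-experts routing (hypothetical sizes, not Qwen3's design).
rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # pool of small "expert" feed-forward layers
TOP_K = 2         # experts actually activated per token
D_MODEL = 16      # hidden size of the toy model

# Each expert is a tiny weight matrix; the router scores experts per token.
experts = [rng.normal(size=(D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
router = rng.normal(size=(D_MODEL, NUM_EXPERTS))

def moe_layer(token: np.ndarray) -> np.ndarray:
    scores = token @ router                       # one routing logit per expert
    top = np.argsort(scores)[-TOP_K:]             # indices of the top-k experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                      # softmax over chosen experts only
    # Only the selected experts run; all other parameters stay inactive,
    # which is why activated parameters can be far fewer than total parameters.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

print(moe_layer(rng.normal(size=D_MODEL)).shape)  # (16,)
```

The key point is that the router picks only a couple of experts per token, so most of the model’s weights sit idle on any given input.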

Alibaba, a Hangzhou-based company, trained Qwen3 on 36 trillion tokens, double the amount used for its predecessor, Qwen2.5. Another Hangzhou-based company, DeepSeek, trained its R1 model on 14.8 trillion tokens. In general, the more tokens a model is trained on, the more knowledgeable it becomes.

Users can run Qwen3 with lower operating costs and less energy consumption because its deployment requirements are lower than those of DeepSeek V3.

Qwen3-235B-A22B has 235 billion parameters, but only 22 billion of them are activated for any given token. DeepSeek R1 has 671 billion parameters and activates 37 billion. Fewer activated parameters mean lower operating costs.
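As a rough illustration of why fewer activated parameters mean cheaper inference, the sketch below applies the common rule of thumb that a transformer decoder spends about two FLOPs per activated parameter per generated token; the rule itself is an approximation assumed here, while the parameter counts come from the figures above.

```python
# Back-of-the-envelope inference cost, assuming a decoder spends roughly
# 2 FLOPs per activated parameter per generated token (a common rule of
# thumb, not a figure from the article).
QWEN3_ACTIVE = 22e9   # Qwen3-235B-A22B: 22B of 235B parameters activated
R1_ACTIVE = 37e9      # DeepSeek R1: 37B of 671B parameters activated

print(f"Qwen3: ~{2 * QWEN3_ACTIVE:.1e} FLOPs per token")
print(f"R1:    ~{2 * R1_ACTIVE:.1e} FLOPs per token")
print(f"Qwen3 needs ~{QWEN3_ACTIVE / R1_ACTIVE:.0%} of R1's per-token compute")
```

On this crude estimate, Qwen3 would need roughly 59% of R1’s per-token compute, which is the mechanism behind the lower operating costs described above.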

After DeepSeek released its R1 model on January 20, US technology stocks tumbled. DeepSeek R1’s high efficiency and low training costs shocked AI investors.

According to reports, DeepSeek will release its R2 model in May. Some AI enthusiasts expect DeepSeek R2 to match OpenAI’s o4-mini in reasoning ability and to outperform R1.

Nonsensical benchmark hacking

AI enthusiasts have run a number of tests to evaluate Qwen3’s performance since Alibaba first released it on April 29.

Qwen3 scored 70.7 on LiveCodeBench v5, a benchmark that evaluates AI models’ coding ability, beating DeepSeek R1 (64.3), OpenAI o3-mini (66.3), Gemini 2.5 Pro (70.4) and Grok 3 Beta (70.6).

On AIME’24, which tests AI models’ mathematical reasoning, Qwen3 scored 85.7, better than DeepSeek R1 (79.8), OpenAI o3-mini (79.6) and Grok 3 Beta (83.9). However, it trailed Gemini 2.5 Pro, which scored 92.

Still, this writer found that Qwen3 struggles with difficult reasoning problems and lacks knowledge in some areas, leading to “hallucinations,” a common situation in which an AI model delivers false information.

We asked Qwen3 to write some Chinese-language stories. The resulting stories are more emotionally nuanced and fluent than those created by earlier AI models, but their pacing and scene transitions are illogical. The model appears to piece everything together without genuine understanding.

According to Artificial Analysis, an independent AI benchmarking and analysis firm, Qwen3 scored 70% in scientific reasoning, trailing Gemini 2.5 Pro (84%), OpenAI o3-mini (83%), Grok 3 mini (79%) and DeepSeek R1 (71%).

On Humanity’s Last Exam, which measures reasoning and knowledge, Qwen3 scored 11.7%, beating Grok 3 mini (11.1%), Claude 3.7 (10.3%) and DeepSeek R1 (9.3%). However, it still lagged behind OpenAI o3-mini (20%) and Gemini 2.5 Pro (17.1%).

Satya Nadella, the CEO of Microsoft, said in February this year that focusing on self-proclaimed milestones such as achieving artificial general intelligence (AGI) is just “nonsensical benchmark hacking.”

He said an AI model can claim a real win only when it helps drive a 10% annual increase in GDP.

Chip shortage

Meanwhile, Chinese AI companies must compete with US players in the AI race while having access to fewer of the AI chips they need.

Chinese media reported that ByteDance, Alibaba and Tencent purchased more than 100,000 H20 chips from Nvidia for 16 billion yuan (US$2.2 billion) in early April.

Nvidia announced on April 15 that the US government had informed the company that it would require a license to export its H20 AI chips to China. The government cited the risk that Chinese companies would incorporate the H20 chips into supercomputers.

On May 2, The Information reported that Nvidia had told some of its biggest Chinese customers that it is redesigning its AI chips so that it can keep shipping them to China. A sample of the new chip could be available as early as June.

Nvidia has already modified its AI chips for the Chinese market several times. It created the A800 and H800 chips after Washington banned exports of the A100 and H100 to China in October 2022. When the US government tightened its export controls in October 2023 to cover those chips as well, Nvidia responded by unveiling the H20.

Chinese companies are still rushing to buy the H20, even though it delivers only about 15% of the H100’s performance, rather than Huawei’s Ascend 910B chip, which is in short supply due to a low production yield.

A Chinese IT journalist said that while the Ascend 910B has more raw computing power than the H20, the H20’s data transfer speed is ten times the 910B’s. He said higher transfer speed in an AI chip, like a better engine in a sports car, delivers stronger overall performance.

China’s AI companies may try to use homegrown chips, such as Cambricon’s Siyuan 590, Hygon Information Technology’s DCU line, Moore Threads’ MTT S80, Biren Technology’s BR104, or Huawei’s forthcoming Ascend 910C, according to Application of Electronic Technique, a Chinese electronics journal.

Read more: After DeepSeek, China’s Manus is the hottest new AI in the spotlight