Highlights:
- Reasoning models differ from standard LLMs in their ability to “fact-check” their own responses.
- The model’s reasoning process is fully transparent, enabling users to track each step it takes to reach an answer.
The Chinese AI startup DeepSeek has introduced a new “reasoning” model that it claims rivals OpenAI’s o1, a large language model designed to answer math and science questions more accurately than traditional LLMs.
The startup, a spinoff of the quantitative hedge fund High-Flyer Capital Management Ltd., recently announced on X the preview launch of its first reasoning model, DeepSeek-R1.
Reasoning models differ from standard LLMs in their ability to “fact-check” their own responses. They achieve this by spending significantly more time evaluating how to respond to a prompt, which helps them avoid “hallucinations,” a common failure mode in chatbots like ChatGPT.
When OpenAI introduced the o1 model in September, it highlighted the model’s superior ability to handle queries and questions requiring reasoning skills. This advantage stems from a machine learning technique called “chain of thought” (CoT), which enables the model to decompose complex tasks into smaller steps and process them sequentially, resulting in greater accuracy.
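To make the chain-of-thought idea concrete, the sketch below shows how a developer might ask any chat-style LLM to reason in explicit steps before answering. The `build_cot_prompt` helper and `call_llm` stub are hypothetical illustrations, not part of DeepSeek’s or OpenAI’s documented interfaces.

```python
# A minimal sketch of chain-of-thought prompting, assuming a generic
# chat-completion interface. build_cot_prompt and call_llm are illustrative
# stand-ins, not any published DeepSeek or OpenAI API.

def build_cot_prompt(question: str) -> list[dict]:
    """Wrap a question in instructions that ask the model to reason in steps."""
    return [
        {
            "role": "system",
            "content": (
                "Break the problem into numbered steps, solve each step in "
                "order, then state the final answer on its own line."
            ),
        },
        {"role": "user", "content": question},
    ]


def call_llm(messages: list[dict]) -> str:
    """Stub for a chat-completion call; wire this to your provider's endpoint."""
    raise NotImplementedError


if __name__ == "__main__":
    messages = build_cot_prompt(
        "A train leaves at 3:40 pm and arrives at 6:05 pm. How long was the trip?"
    )
    for message in messages:
        print(f"{message['role']}: {message['content']}")
```

Reasoning models such as o1 are understood to build this step-by-step behavior into the model itself during training rather than relying on prompt wording, which is part of why they spend longer “thinking” per query.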
DeepSeek-R1 operates similarly, planning its approach to a complex problem in advance and solving it step by step to ensure an accurate response. This deliberation takes time: like o1, the model may need up to 10 seconds to “think” before generating an answer.
The model’s reasoning process is fully transparent, enabling users to observe each step it takes to reach an answer.
The startup claims that DeepSeek-R1 outperforms OpenAI’s o1 on two key benchmarks: AIME, a competition-level mathematics exam, and MATH, a collection of challenging word problems. Additionally, the model has reportedly answered several “trick” questions that previously stumped existing models like GPT-4o and Anthropic PBC’s Claude.
Despite its strengths, DeepSeek-R1 has its shortcomings, with some commenters on X noting that it seems to struggle with logic-based problems like Tic-Tac-Toe. However, o1 has faced similar challenges with these types of tasks.
Users have noted that DeepSeek avoids responding to queries likely considered sensitive by the Chinese government. When asked about topics such as the Tiananmen Square massacre, Chinese President Xi Jinping’s relationship with Donald Trump, or the possibility of China invading Taiwan, it consistently responded with statements like, “not sure how to approach this type of question.”
DeepSeek’s avoidance of politically sensitive queries is likely due to the requirement for Chinese developers to ensure their models align with “core socialist values.”
However, some users have discovered that it’s relatively easy to jailbreak DeepSeek-R1 and prompt it to bypass its guardrails. For instance, one user found a way to get the model to provide a detailed recipe for methamphetamine, the production of which is illegal in most countries.
DeepSeek is an unconventional AI startup, mainly because it is backed by a quantitative hedge fund that aims to use LLMs to enhance its trading strategies. The company is not new to the AI field, having previously launched DeepSeek-V2, an LLM for general-purpose text and image generation and analysis. DeepSeek, founded by computer science graduate Liang Wenfeng, has the stated goal of developing “superintelligent” AI.
DeepSeek-R1 is accessible through the DeepSeek Chat application on the company’s website. While it’s free to use, non-paying users are restricted to 50 messages per day. The company also plans to offer DeepSeek-R1 through an application programming interface (API) in the future.