Highlights:
- DeepSeek-V3, an open-source LLM launched in December, serves as the foundation for DeepSeek-R1, the reasoning model that brought the Chinese AI lab into the spotlight earlier this year.
- Sources indicate that the latest DeepSeek-V3 version outperforms the original in programming tasks.
DeepSeek has released an enhanced version of its DeepSeek-V3 large language model under a newly adopted open-source license.
Software developer and blogger Simon Willison was the first to report the update, as DeepSeek made no official announcement. The new model's readme file, which code repositories typically use for explanatory notes, is currently empty.
Released in December, DeepSeek-V3 is an open-source LLM that serves as the foundation for DeepSeek-R1, the reasoning model that brought the Chinese AI lab into the spotlight earlier this year. While DeepSeek-V3 is a general-purpose model rather than one specifically optimized for reasoning, it is capable of solving some math problems and generating code.
Previously, the LLM was available under a custom open-source license. With the recent release, DeepSeek has transitioned to the widely adopted MIT License, allowing developers to use and modify the updated model for commercial projects with virtually no restrictions.
Notably, the latest DeepSeek-V3 release seems to offer improved capabilities and greater hardware efficiency compared to the original.
Most advanced LLMs require data-center-grade GPUs to operate. However, Awni Hannun, a research scientist at Apple Inc.'s machine learning research group, successfully ran the new DeepSeek-V3 release on a Mac Studio, where it generated output at approximately 20 tokens per second.
The Mac Studio used for testing had a high-end configuration, priced at USD 9,499. Running DeepSeek-V3 on the device required applying four-bit quantization, an optimization technique for LLMs that reduces memory usage and latency at the cost of some output accuracy.
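For illustration, the snippet below sketches how group-wise four-bit quantization works in general terms: each group of weights is mapped to 16 integer levels plus a per-group scale and offset, cutting memory roughly fourfold while introducing a small rounding error. This is a standalone NumPy toy, not DeepSeek's or MLX's actual implementation, and the group size is an arbitrary assumption.

```python
# Toy sketch of group-wise 4-bit weight quantization (illustrative only).
import numpy as np

def quantize_4bit(weights: np.ndarray, group_size: int = 64):
    """Map float weights to 4-bit integers (0..15), one scale/offset per group."""
    w = weights.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    scale = (w.max(axis=1, keepdims=True) - w_min) / 15.0  # 2**4 - 1 levels
    q = np.clip(np.round((w - w_min) / scale), 0, 15).astype(np.uint8)
    return q, scale, w_min

def dequantize_4bit(q, scale, w_min, shape):
    """Recover approximate float weights; the rounding error is the accuracy cost."""
    return (q.astype(np.float32) * scale + w_min).reshape(shape)

weights = np.random.randn(4096, 4096).astype(np.float32)
q, scale, w_min = quantize_4bit(weights)
restored = dequantize_4bit(q, scale, w_min, weights.shape)
print("max abs error:", np.abs(weights - restored).max())  # small but nonzero
```

Storing 4-bit codes instead of 32-bit floats is what makes a 671-billion-parameter model fit in a workstation's memory; the rounding error printed above is the "some output accuracy" being traded away.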
According to a benchmark shared in an X post, the latest DeepSeek-V3 demonstrates improved programming capabilities over the original release. The benchmark assesses a model's ability to generate Python and Bash code; the updated version scored around 60%, several percentage points ahead of the original DeepSeek-V3.
The model still lags behind DeepSeek-R1, the AI lab’s flagship LLM optimized for reasoning. Additionally, the latest DeepSeek-V3 release scored lower than Qwen-32B, another model designed for reasoning tasks.
Despite having 671 billion parameters, DeepSeek-V3 activates only about 37 billion of them per token thanks to its mixture-of-experts architecture. This design allows the model to run on less infrastructure than traditional LLMs that engage all of their parameters for every prompt. DeepSeek also claims the model is more efficient than DeepSeek-R1, reducing inference costs.
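The mechanism behind this selective activation is mixture-of-experts routing: a lightweight router scores a set of expert networks for each token, and only the top-scoring few are actually computed. The sketch below is a hypothetical toy illustration of that idea; the expert count, dimensions and top-k value are made up and do not reflect DeepSeek-V3's real configuration.

```python
# Toy mixture-of-experts routing: only TOP_K of NUM_EXPERTS run per token.
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K, DIM = 16, 2, 512  # illustrative values, not DeepSeek-V3's

# Each "expert" is a small feed-forward weight matrix; the router picks
# which few of them to apply for a given token.
experts = [rng.standard_normal((DIM, DIM)) * 0.02 for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((DIM, NUM_EXPERTS)) * 0.02

def moe_forward(token: np.ndarray) -> np.ndarray:
    logits = token @ router
    top = np.argsort(logits)[-TOP_K:]      # indices of the best-scoring experts
    exp_scores = np.exp(logits[top])
    gates = exp_scores / exp_scores.sum()  # softmax over the chosen experts only
    # Compute cost scales with TOP_K, not NUM_EXPERTS: most parameters stay idle.
    return sum(g * (token @ experts[i]) for g, i in zip(gates, top))

out = moe_forward(rng.standard_normal(DIM))
print(out.shape, f"active experts: {TOP_K}/{NUM_EXPERTS}")
```

Because compute grows with the number of experts selected rather than the total parameter count, a model of this kind can be far larger on disk than the slice of it that runs for any single token.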
The initial DeepSeek-V3 model was trained on a dataset of 14.8 trillion tokens using approximately 2.8 million GPU hours, considerably less than what cutting-edge LLMs typically require. To enhance its output quality, DeepSeek engineers fine-tuned the model on prompt responses generated by DeepSeek-R1.
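That fine-tuning step is a form of distillation: responses from a stronger teacher model become supervised targets for the base model. The toy sketch below illustrates the shape of such a pipeline; every name in it is a hypothetical placeholder rather than DeepSeek's actual training code.

```python
# Hypothetical sketch of distillation-style fine-tuning: a teacher model
# (DeepSeek-R1 in DeepSeek's case) answers prompts, and those answers
# become supervised targets for the student. All names are placeholders.
from dataclasses import dataclass

@dataclass
class Example:
    prompt: str
    response: str  # target text produced by the teacher

def teacher_generate(prompt: str) -> str:
    # Placeholder: a real pipeline would query the teacher model here.
    return f"[reasoned answer to: {prompt}]"

class ToyStudent:
    def train_step(self, prompt: str, target: str) -> None:
        # Placeholder: a real step minimizes cross-entropy over target tokens.
        print(f"fitting a {len(target)}-char response for: {prompt!r}")

prompts = ["Sum the first 100 integers.", "Write a Bash one-liner to count files."]
dataset = [Example(p, teacher_generate(p)) for p in prompts]

student = ToyStudent()
for ex in dataset:
    student.train_step(ex.prompt, ex.response)
```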