Highlights:
- DeepSeek-V3 is built on a mixture-of-experts (MoE) architecture, consisting of several neural networks, each specialized in a distinct set of tasks.
- The MoE architecture reduces hardware costs by activating only the relevant neural network for a given prompt, rather than the entire LLM.
Recently, Chinese AI developer DeepSeek released DeepSeek-V3, a new open-source large language model with 671 billion parameters.
The LLM is capable of generating text, writing software code, and performing related tasks. According to DeepSeek, it surpasses two of the most advanced open-source LLMs available on more than half a dozen benchmark tests.
DeepSeek-V3 uses a mixture-of-experts (MoE) architecture, with multiple neural networks, each focused on a particular set of tasks. Upon receiving a prompt, a routing component directs the request to the neural network most suited to handle it.
The primary advantage of the MoE architecture is its ability to lower hardware costs. When a prompt is sent to DeepSeek-V3, only the specific neural networks assigned to handle the request are activated, rather than the entire LLM. Only about 37 billion of the model's 671 billion parameters are active for any given token, so relatively modest infrastructure is needed to serve a request.
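For intuition, here is a minimal sketch of how sparse expert routing works in general; it is not DeepSeek-V3's exact design, and the expert count, layer sizes, and top-k value are illustrative assumptions. Only the experts selected by the router for a given token do any work.

```python
# Minimal sketch of a sparse mixture-of-experts layer (illustrative only;
# not DeepSeek-V3's exact design). Sizes and the top-k value are assumptions.
import torch
import torch.nn as nn


class SparseMoELayer(nn.Module):
    def __init__(self, hidden_dim=512, num_experts=8, expert_dim=1024, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_dim, num_experts)  # routing component
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_dim, expert_dim),
                nn.GELU(),
                nn.Linear(expert_dim, hidden_dim),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (num_tokens, hidden_dim)
        scores = torch.softmax(self.router(x), dim=-1)        # token-to-expert affinity
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)  # keep only the best matches
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (top_idx == e).any(dim=-1)                  # tokens routed to expert e
            if mask.any():                                     # unselected experts stay idle
                weight = top_scores[mask][top_idx[mask] == e].unsqueeze(-1)
                out[mask] += weight * expert(x[mask])
        return out


tokens = torch.randn(4, 512)           # four token representations
print(SparseMoELayer()(tokens).shape)  # torch.Size([4, 512])
```

Because each token passes through only `top_k` of the experts, the compute per request scales with the small activated subset rather than with the model's full parameter count.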
While the MoE architecture offers advantages, it also presents challenges. During training, some neural networks in an MoE model may receive more data than others, potentially leading to inconsistencies in the LLM’s output quality. DeepSeek claims to have developed and implemented a new method in DeepSeek-V3 to address this issue.
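DeepSeek has not published its balancing method in this article's scope, so the sketch below shows the conventional auxiliary load-balancing loss used in earlier MoE models instead; it illustrates why imbalance matters by penalizing routers that send most tokens to a few experts. The shapes and expert count are assumptions.

```python
# Sketch of a conventional auxiliary load-balancing loss for MoE training
# (the widely used prior approach, not DeepSeek-V3's newer balancing method).
import torch


def load_balancing_loss(router_probs, expert_assignments, num_experts):
    """router_probs: (num_tokens, num_experts) softmax router outputs.
    expert_assignments: (num_tokens,) index of the expert chosen per token."""
    # Fraction of tokens actually routed to each expert.
    counts = torch.bincount(expert_assignments, minlength=num_experts).float()
    load_fraction = counts / counts.sum()
    # Average routing probability assigned to each expert.
    prob_fraction = router_probs.mean(dim=0)
    # The sum is smallest when both distributions are uniform, so the loss
    # grows when a few experts absorb most of the traffic.
    return num_experts * torch.sum(load_fraction * prob_fraction)


probs = torch.softmax(torch.randn(16, 8), dim=-1)
assignments = probs.argmax(dim=-1)
print(load_balancing_loss(probs, assignments, num_experts=8))
```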
The LLM was trained on 14.8 trillion tokens, with each token representing a few letters or numbers. The training process required 2.788 million GPU hours, a relatively modest amount of compute for a model of this scale. Advanced AI clusters in the industry, equipped with tens of thousands of GPUs or more, can complete a comparable amount of work within a few days.
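As a rough back-of-the-envelope check, the reported GPU-hour figure converts into wall-clock time as follows; the cluster sizes are illustrative assumptions, not DeepSeek's actual setup.

```python
# Back-of-the-envelope conversion of the reported 2.788 million GPU hours
# into wall-clock training time; cluster sizes are illustrative assumptions.
TOTAL_GPU_HOURS = 2_788_000

for num_gpus in (2_048, 10_000, 50_000):
    days = TOTAL_GPU_HOURS / num_gpus / 24
    print(f"{num_gpus:>6,} GPUs -> about {days:.1f} days of training")

# Example output:
#  2,048 GPUs -> about 56.7 days of training
# 10,000 GPUs -> about 11.6 days of training
# 50,000 GPUs -> about 2.3 days of training
```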
In addition to its MoE architecture, DeepSeek-V3 incorporates several optimizations aimed at enhancing its output quality.
LLMs employ a technique called attention to pinpoint the most important details in a sentence. DeepSeek-V3 uses multi-head latent attention (MLA), an enhanced version of this method that enables it to extract critical details from a text snippet multiple times instead of just once. This reduces the likelihood of overlooking important information.
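For a sense of the mechanics, the sketch below shows a heavily simplified attention layer in which keys and values are reconstructed from a small latent vector, the compression idea that gives latent attention its name. It is not DeepSeek-V3's exact implementation; the dimensions and the omission of details such as positional encodings and query compression are assumptions made for brevity.

```python
# Heavily simplified sketch of multi-head attention with a low-rank latent
# for keys and values; dimensions and omitted details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimplifiedLatentAttention(nn.Module):
    def __init__(self, hidden_dim=512, num_heads=8, latent_dim=64):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_dim // num_heads
        self.q_proj = nn.Linear(hidden_dim, hidden_dim)
        # Compress each token into a small latent; only this needs to be cached.
        self.kv_down = nn.Linear(hidden_dim, latent_dim)
        # Reconstruct full-size keys and values from the latent on the fly.
        self.k_up = nn.Linear(latent_dim, hidden_dim)
        self.v_up = nn.Linear(latent_dim, hidden_dim)
        self.out_proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x):  # x: (batch, seq_len, hidden_dim)
        b, t, d = x.shape
        latent = self.kv_down(x)  # (b, t, latent_dim): the small cached representation
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_up(latent).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_up(latent).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(attn.transpose(1, 2).reshape(b, t, d))


x = torch.randn(1, 16, 512)
print(SimplifiedLatentAttention()(x).shape)  # torch.Size([1, 16, 512])
```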
DeepSeek-V3 also includes a multi-token prediction feature. Unlike traditional language models that generate text one token at a time, DeepSeek-V3 produces multiple tokens simultaneously, accelerating the inference process.
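The sketch below illustrates the general idea with independent output heads that each predict a different future position; it is a simplified stand-in, not necessarily DeepSeek-V3's exact module design, and the head count, vocabulary size, and dimensions are assumptions.

```python
# Simplified sketch of multi-token prediction: extra output heads predict
# several future tokens from the same hidden state instead of just the next
# one. Head count and sizes are illustrative, not DeepSeek-V3's exact setup.
import torch
import torch.nn as nn


class MultiTokenHead(nn.Module):
    def __init__(self, hidden_dim=512, vocab_size=32_000, num_future_tokens=3):
        super().__init__()
        # One output head per predicted future position (t+1, t+2, t+3, ...).
        self.heads = nn.ModuleList([
            nn.Linear(hidden_dim, vocab_size) for _ in range(num_future_tokens)
        ])

    def forward(self, hidden):  # hidden: (batch, seq_len, hidden_dim)
        # Returns one logit tensor per predicted future position.
        return [head(hidden) for head in self.heads]


hidden = torch.randn(1, 16, 512)               # hidden states from a backbone model
logits_per_position = MultiTokenHead()(hidden)
print([t.shape for t in logits_per_position])  # three (1, 16, 32000) tensors
```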
DeepSeek compared the model against three other open-source LLMs: its earlier DeepSeek-V2, Llama 3.1 405B, and Qwen2.5 72B. DeepSeek-V3 surpassed them on all nine of the coding and math benchmarks in the evaluation and excelled in a range of text-processing tasks.
The DeepSeek-V3 code is available on Hugging Face.