Highlights:
- Musk explained that Colossus is outfitted with 100,000 of Nvidia’s H100 GPUs.
- Grok-2, xAI’s leading LLM, was trained on 15,000 GPUs. With Colossus’ 100,000 chips, the development of much more advanced language models is possible.
Elon Musk’s xAI Corp. has unveiled ‘Colossus,’ an AI training system outfitted with 100,000 graphics cards.
Musk announced the milestone in a post on X. The system, which xAI calls Colossus, was brought online over the weekend.
The CEO founded xAI last year to compete with OpenAI, which he is currently suing for alleged breach of contract. The startup is developing a series of large language models known as Grok. In May, xAI raised USD 6 billion to fund its AI development efforts, reaching a valuation of USD 24 billion.
In the post, Musk described the newly launched Colossus as the “most powerful AI training system in the world.” That claim implies it surpasses the U.S. Energy Department’s Aurora system, currently the world’s fastest AI supercomputer. In a benchmark test conducted in May, Aurora achieved a peak speed of 10.6 exaflops with 87% of its hardware operational.
Musk revealed that Colossus is equipped with 100,000 Nvidia H100 graphics cards. Launched in 2022, the H100 was Nvidia’s most advanced AI processor for more than a year, and it can run large language models up to 30 times faster than the company’s previous-generation GPUs.
A key factor in the H100’s performance is its Transformer Engine, a set of circuits designed specifically to run AI models built on the Transformer neural network architecture. That architecture underpins GPT-4o, Meta Platforms Inc.’s Llama 3.1 405B, and numerous other advanced LLMs.
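The sketch below shows, at a very high level, how Transformer-based code is typically pointed at that hardware. It uses Nvidia’s open-source transformer_engine library for PyTorch, whose fp8_autocast context routes a Transformer block’s matrix multiplies through the H100’s FP8 tensor cores. The layer dimensions and input shapes are hypothetical, and the snippet assumes an H100-class GPU with the transformer_engine package installed; it is an illustrative sketch, not a description of xAI’s setup.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Hypothetical model dimensions, chosen only for illustration.
hidden_size = 4096
ffn_hidden_size = 16384

# A single drop-in Transformer block from Nvidia's transformer_engine library.
layer = te.TransformerLayer(
    hidden_size=hidden_size,
    ffn_hidden_size=ffn_hidden_size,
    num_attention_heads=32,
    params_dtype=torch.bfloat16,
).cuda()

# Delayed scaling is the library's standard recipe for picking FP8 scaling factors.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# Dummy activations in (sequence, batch, hidden) layout.
x = torch.randn(128, 2, hidden_size, device="cuda", dtype=torch.bfloat16)

# Inside fp8_autocast, the block's matrix multiplies run on the H100's
# FP8 tensor cores -- the hardware behind the "Transformer Engine" branding.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
```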
Musk disclosed that xAI plans to double Colossus’ chip count to 200,000 in the coming months. He noted that 50,000 of the new processors will be H200s, an upgraded and significantly faster version of the H100 that Nvidia introduced last November.
AI models frequently move information between the chip’s logic circuits and its memory, more so than many other types of workloads. As a result, increasing the speed of data transfer between the memory and logic modules can boost AI model performance. The H200 performs these data transfers much faster than the H100.
The H200’s speed boost comes from two key architectural enhancements. First, Nvidia replaced the H100’s HBM3 memory with a newer RAM type, HBM3e, which enables quicker data transfers to and from the chip’s logic circuits. Second, the company nearly doubled the onboard memory to 141 gigabytes, allowing the H200 to store more of an AI model’s data close to its logic circuits.
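A quick back-of-the-envelope calculation shows why that matters: during token-by-token generation, an LLM typically has to stream essentially all of its weights from memory for every token it produces, so memory bandwidth sets a hard floor on latency. The sketch below uses approximate published spec figures (roughly 3.35 TB/s for an H100 and 4.8 TB/s for an H200) and a purely hypothetical 70-billion-parameter model; it is an illustration of the bandwidth argument, not a benchmark.

```python
# Rough lower bound on decode latency for a memory-bound LLM:
# time per token >= (bytes of weights) / (memory bandwidth).

params = 70e9          # hypothetical 70B-parameter model
bytes_per_param = 2    # FP16/BF16 weights
weight_bytes = params * bytes_per_param

# Approximate spec-sheet bandwidth figures, used only for comparison.
bandwidth = {
    "H100 (HBM3, ~3.35 TB/s)": 3.35e12,
    "H200 (HBM3e, ~4.8 TB/s)": 4.8e12,
}

for name, bw in bandwidth.items():
    ms_per_token = weight_bytes / bw * 1e3
    print(f"{name}: >= {ms_per_token:.1f} ms per token (bandwidth-bound floor)")
```

Under these assumed numbers, the H200’s faster memory alone shaves the bandwidth-bound floor from roughly 42 ms to roughly 29 ms per token, before any other architectural differences are considered.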
Grok-2, the flagship LLM from xAI, was trained using 15,000 GPUs. With Colossus’ 100,000 chips, the development of language models with much greater capabilities could become a reality. The company is reportedly aiming to release Grok-2’s successor by the end of the year.
Some of Colossus’ servers may be running on chips originally earmarked for Tesla Inc. In January, CNBC reported that Musk asked Nvidia to redirect 12,000 H100s, valued at over USD 500 million, from Tesla to xAI and his other AI projects. That same month, Musk estimated that Tesla would spend between USD 3 billion and USD 4 billion on Nvidia hardware by year’s end.