Elon Musk’s xAI developed Colossus in just 122 days: “The most powerful AI training system in the world,” powered by 100K NVIDIA H100 GPUs. Could that make Grok ‘the most powerful AI by every metric’?

What you need to know

  • Elon Musk’s xAI team launched Colossus, powered by 100,000 NVIDIA H100 GPUs.
  • The company plans to double the system’s size to 200,000 GPUs, including 50,000 NVIDIA H200s, in the next few months.
  • Colossus could help train Grok 3 to become the most powerful AI by every metric.

As an observer with a keen interest in technology and AI, I find myself both amazed and slightly concerned by Elon Musk’s latest venture, the Colossus supercomputer. The speed at which this project was completed is remarkable given its scale. In just 122 days, the xAI team assembled a training cluster powered by 100,000 NVIDIA H100 GPUs, which Musk describes as the most powerful AI training system in the world.


Recently, Elon Musk disclosed that his team at xAI planned to train Grok on “the most powerful AI training system in the world,” with the aim of creating “the most powerful AI by every metric.”

As a tech enthusiast, I’m thrilled about Elon Musk’s latest announcement: the Colossus supercomputer, which he calls the world’s most powerful AI training system, is now live! The machine runs on 100,000 NVIDIA H100 GPUs. In the coming months, it is expected to double in size to 200,000 GPUs, 50,000 of them H200s.

In a post dated September 2, 2024, Elon Musk wrote:

“This weekend, the team brought our Colossus 100k H100 training cluster online. From start to finish, it was done in 122 days. Colossus is the most powerful AI training system in the world. Moreover, it will double in size to 200k (50k H200s) in a few months. Excellent work by the team, Nvidia and our many partners/suppliers.”

The cost of the Memphis, Tennessee project is still unclear, but NVIDIA’s H100 chip carries a price tag of approximately $30,000, which puts the 100,000 GPUs alone at roughly $3 billion. That figure is consistent with the tech giant’s projected spending of around $3-4 billion on GPUs. AI projects are also energy-intensive: they require substantial amounts of electricity for power and water for cooling.
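For readers who want to check the arithmetic, here is a rough back-of-the-envelope sketch in Python. It uses only the two figures quoted above (100,000 GPUs and roughly $30,000 per H100). The flat per-unit price, with no volume discount and no networking, power, or facility costs, is an assumption made for illustration, and the function name is hypothetical.

```python
# Back-of-the-envelope GPU hardware cost, using figures quoted in the article.
# Assumption: a flat per-GPU price with no volume discount; networking,
# power, cooling, and building costs are deliberately left out.

GPU_COUNT = 100_000          # H100 GPUs in the cluster at launch
PRICE_PER_GPU_USD = 30_000   # approximate H100 price cited in the article


def gpu_hardware_cost(count: int, unit_price_usd: int) -> int:
    """Return the total GPU hardware cost in US dollars."""
    return count * unit_price_usd


launch_cost = gpu_hardware_cost(GPU_COUNT, PRICE_PER_GPU_USD)
print(f"Estimated GPU cost at launch: ${launch_cost / 1e9:.1f} billion")
```

Under these assumptions the estimate lands at the low end of the $3-4 billion range; the real bill could differ considerably depending on actual pricing and everything else a data center needs.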

In a post dated June 4, 2024, Musk said that about $5 billion of the roughly $10 billion Tesla will spend this year on AI-related expenses is internal, going mainly toward designing the AI inference computer and sensors used in all of its vehicles, as well as Dojo. Regarding the construction of the AI training superclusters, he said the company plans to use Nvidia hardware for around…

Recently, Grok 2 was released exclusively to X Premium and Premium+ subscribers, with the ability to generate both images and text. The large language model (LLM) behind Grok 2, which is known for its uncensored nature, was trained on about 15,000 GPUs. With Colossus online, Grok 2’s successors will be able to draw on more than 100,000 NVIDIA H100 GPUs during training, a sizable jump in available compute.

Elon Musk may ship Grok 2’s successor, Grok 3, by December, touting it as the most powerful AI by every metric. However, Grok’s training has sparked controversy following reports that users’ X data is being used to train the model without their explicit permission, raising privacy concerns. Authorities are currently examining the issue, and if X fails to provide a valid legal justification for its actions, it could face fines of up to 4% of its annual global revenue.

2024-09-05 13:39