A brief look at Elon Musk’s Cortex AI supercluster project reveals a heavy reliance on the world’s most profitable chipmaker for GPUs, along with steep demands for cooling water and power

What you need to know

  • Elon Musk shared the progress of Tesla’s supercluster, Cortex AI, in Austin, Texas.
  • The project will ship with 50,000 NVIDIA H100s and an additional 20,000 of the company’s custom Dojo AI hardware to foster autonomous driving, energy management, and more.
  • Cortex AI will require up to 130 megawatts (MW) of cooling and power to launch, with projections of 500 MW by 2026.

As a seasoned researcher who has watched technology advance rapidly over the past few decades, I find myself both intrigued and concerned by Tesla’s ambitious Cortex AI project. Its scale is unprecedented: roughly 70,000 AI servers, with an energy demand that could eventually rival that of small countries.


Tech visionary Elon Musk recently shared updates on Tesla’s Cortex AI supercomputer project at the company’s base in Austin, Texas. The ambitious build, housed at Tesla’s headquarters, comprises approximately 70,000 AI servers. Getting it up and running will initially require around 130 megawatts (MW) of power and cooling, a figure expected to rise to 500 MW by 2026.

Cortex AI is set to boost and refine Tesla’s artificial intelligence systems, fostering advancement. In essence, this technology will be instrumental in honing Tesla’s AI capabilities used in areas such as self-driving cars, energy optimization, and beyond.

Tesla’s supercluster may be the largest AI training system ever assembled, packing 50,000 NVIDIA H100 enterprise GPUs alongside an additional 20,000 units of the company’s own hardware. Earlier, however, Elon Musk had hinted that Cortex AI would ship with 50,000 of Tesla’s custom Dojo AI units.

For now, it appears that Elon Musk is focusing on upgrading Tesla’s self-designed Dojo supercomputer. His goal is to boost its potential by achieving a training capacity equivalent to 8,000 H100 units before the year ends.

From the video shown and as reported by Electrek, it’s clear that there are ongoing efforts needed before the supercluster can function at full capacity. Currently, the cluster is using a temporary cooling system, and Tesla needs additional network feeders. Taking everything into account, there’s a possibility that the cluster might be ready by October, which coincidentally lines up with the highly anticipated launch of the Robotaxi.

Based on a post Musk made on X earlier this year, it’s expected that Tesla might invest as much as $10 billion in total this year towards Artificial Intelligence, specifically for training and inference. Notably, emails between Musk and NVIDIA suggest he asked them to prioritize the delivery of processors for X and xAI over Tesla’s needs (as reported by CNBC).

As a result, Musk’s insistence that X jump the queue ahead of Tesla delayed more than $400 million worth of processors, pushing the delivery schedule back by several months and slowing Tesla’s push to become a major player in artificial intelligence.

AI projects are becoming a tad expensive 

Tesla’s Cortex initiative mirrors growing apprehension about generative AI and its outsized resource consumption. Skeptics argue that AI is a passing trend that has already peaked, with forecasts suggesting 30% of AI projects will be abandoned after proof-of-concept by 2025, and investors in the sector have voiced discontent over the heavy water usage required for cooling (roughly one bottle of water per query).

That concern is compounded by a steep rise in energy requirements. Forecasts suggest that, even with the next major leap in AI on the horizon, there may not be enough electrical power to sustain AI advancements by 2025. Google and Microsoft each already consume more electricity than more than 100 individual nations.

2024-08-28 12:39