AI's Elephant in the Room: Energy Constraints

Ignacio de Gregorio Noblejas
June 22, 2024

Although all my sources are cited, I want to highlight one specifically: SemiAnalysis. It has really helped me go as deep as I needed into the matter.

With NVIDIA finally claiming the most important throne in capitalism as the most valuable company in the world for a few days, we seriously need to discuss AI’s elephant in the room.

It’s not NVIDIA’s insane run;
it’s not the Hyperscalers’ pathetic AI revenues;
and it’s not a potential Chinese blockade in the Taiwan Strait.

Despite all these being obvious issues that must be addressed, the biggest looming threat in AI, even acknowledged by Sam Altman, Bill Gates, and Mark Zuckerberg, is energy constraints.

Today, we are exploring this topic nose-deep.

By the end of this piece, not only will you be much more aware of the biggest threat in AI, but you will also have a much richer understanding of the functioning of some of the biggest corporations in the world (and prominent players in the space) and some numbers and facts that will become a central topic of discussion soon and a key driver in your future investments.

AI Unit Economics

First, I want to give a high-level overview of AI's unit economics to set the stage.

As you probably know by now (please read my deep dive on NVIDIA otherwise), the world’s AI runs on GPUs.

GPUs excel at matrix multiplication, the crucial operation required to run Large Language Models (LLMs). This allows the entire process to be parallelized, ensuring great performance and competitive latency.

However, GPUs are extremely energy-hungry. But how hungry are they, and how unprepared are we for what’s to come?

A Worrying Comparison

Today, standard web and file storage server nodes require an average of 1 KiloWatt (KW) of power. On the other hand, a state-of-the-art GPU node packed with 8 NVIDIA H100s requires upwards of 11 KW, a ten KW increase.

Although NVIDIA claims the consumption of a node is 10.2 KW maximum, it grows larger when you factor in inefficiencies.

These numbers probably don’t say much right now, but they will in a minute.

Scarily, unlike traditional IT equipment, running costs are a measly part of the overall picture due to the insane capital costs of owning GPUs, at an average beyond $20k for NVIDIA H100s, the state-of-the-art.

Naturally, this tenfold increase translates into a much higher average energy consumption per request, with a Google query consuming around 0.3 Wh and a ChatGPT request around 2.9 Wh, according to the Electric Power Research Institute (EPRI).

Concerningly, according to estimates, Google’s AI-powered search, called AI overviews, may increase the average consumption to almost 9 Wh, 30 times that of a standard query.

Long story short, traditional data centers are about to receive many requests 10-30 times more energy-demanding than those they were originally designed for.
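
To make the "10-30 times" range concrete, here is a minimal sketch that simply divides the per-request figures quoted above. The dictionary keys and variable names are my own labels, not anything from EPRI.

```python
# Per-request energy figures quoted above, in watt-hours (Wh).
WH_PER_REQUEST = {
    "google_search": 0.3,   # standard Google query
    "chatgpt": 2.9,         # ChatGPT request (EPRI figure)
    "ai_overviews": 9.0,    # estimate for Google's AI-powered search
}

baseline = WH_PER_REQUEST["google_search"]

# Ratio of each request type to a standard search query.
ratios = {name: wh / baseline for name, wh in WH_PER_REQUEST.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x a standard query")
```

The ChatGPT request comes out at just under ten times a standard query, and the AI overviews estimate at thirty times.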

But what is the overall impact? Let’s see an example.

Already in the plans of all hyperscalers, the world’s first 100,000 H100 GPU cluster is very close.

Taking SemiAnalysis’ estimate of 80% actual net cluster usage and assuming a Power Usage Effectiveness (PUE) of 1.3, where PUE is the ratio of the total energy a data center draws to the energy that actually reaches the IT equipment, we get the following results:

Nº of GPUs: 100,000
Nº of DGX nodes: 100,000 / 8 = 12,500
Watts per node: 11,100 W
Required power: 11,100 W × 12,500 = 138.8 MegaWatts (MW)
Factoring in usage: 138.8 × 0.8 = 111 MW
Factoring in 1.3 PUE: 111 × 1.3 = 144 MW
Total yearly consumption = 144 MW × 24 × 365 hours ≈ 1,265 GWh/year, or 1.27 TWh
At the average US industrial tariff = 1.265×10⁹ KWh × 0.083 $/KWh ≈ 105 million US dollars/year.

And using US prices is very unfair as they are some of the cheapest. That very same data center in Europe, at the average industrial tariff of $0.184/KWh, would cost a whopping 232.8 million US dollars, more than double.
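
The calculation above can be reproduced in a few lines. This is a sketch under the same assumptions stated in the text (8 GPUs per node, 11,100 W per node, 80% usage, 1.3 PUE, constant load all year). The function and variable names are mine. Because the code carries unrounded intermediate values, it lands slightly below the rounded figures above (about 1,264 GWh rather than 1,265).

```python
HOURS_PER_YEAR = 24 * 365


def cluster_yearly_cost(n_gpus, gpus_per_node, watts_per_node,
                        usage, pue, tariff_usd_per_kwh):
    """Estimate the yearly energy use and electricity bill of a GPU cluster."""
    nodes = n_gpus / gpus_per_node
    peak_it_mw = nodes * watts_per_node / 1e6       # W -> MW
    average_it_mw = peak_it_mw * usage              # net cluster usage
    facility_mw = average_it_mw * pue               # add cooling and other overhead
    yearly_gwh = facility_mw * HOURS_PER_YEAR / 1e3  # MWh -> GWh
    yearly_kwh = yearly_gwh * 1e6                    # GWh -> kWh
    return {
        "nodes": nodes,
        "peak_it_mw": peak_it_mw,
        "facility_mw": facility_mw,
        "yearly_gwh": yearly_gwh,
        "yearly_cost_usd": yearly_kwh * tariff_usd_per_kwh,
    }


# Tariffs quoted in the text, in US dollars per kWh.
us = cluster_yearly_cost(100_000, 8, 11_100, 0.8, 1.3, 0.083)
eu = cluster_yearly_cost(100_000, 8, 11_100, 0.8, 1.3, 0.184)

print(f"Facility load: {us['facility_mw']:.1f} MW")
print(f"Yearly energy: {us['yearly_gwh']:.0f} GWh")
print(f"US bill: ${us['yearly_cost_usd'] / 1e6:.1f}M per year")
print(f"EU bill: ${eu['yearly_cost_usd'] / 1e6:.1f}M per year")
```

Changing `usage` or `pue` shows how sensitive the bill is to those two assumptions: every 0.1 of PUE adds roughly 8 million dollars a year at US prices.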

Again, that’s every year. But how much demand are we expecting globally?

A Cambrian Explosion

In fact, the above numbers could soon look modest, as Jensen Huang, NVIDIA’s CEO, takes for granted that 1 GW data centers, roughly seven times the size of the 144 MW one we just discussed, are already in the pipeline of most hyperscalers.

Such a data center would consume almost 9 TWh a year, almost the total electricity consumption of Kenya (53 million people), and would cost 727.1 million US dollars every year at current US prices (remember, well below global average).

The Hyperscalers are more than prepared, as Microsoft would take a measly 53 hours of free cash flow to rack up that amount of money, signaling how insanely rich these corporations are.

It would cost Microsoft just around 0.6% of its yearly free cash flow (money it can spend at will) to run this absolute unit of a data center.
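
As a quick sanity check on the 1 GW figures, here is a minimal sketch that assumes the facility draws a constant 1 GW all year and pays the US industrial tariff used earlier. The names are illustrative. The last step only converts the 53 hours quoted above into a share of a year; it does not use any figure for Microsoft's actual cash flow.

```python
HOURS_PER_YEAR = 24 * 365
US_TARIFF_USD_PER_KWH = 0.083  # average US industrial tariff used in the text


def yearly_energy_twh(power_gw):
    """Energy drawn in a year at a constant load, in TWh."""
    return power_gw * HOURS_PER_YEAR / 1e3  # GWh -> TWh


def yearly_cost_musd(power_gw, tariff_usd_per_kwh):
    """Yearly electricity bill at a constant load, in millions of US dollars."""
    yearly_kwh = power_gw * 1e6 * HOURS_PER_YEAR  # 1 GW = 1e6 kW
    return yearly_kwh * tariff_usd_per_kwh / 1e6


energy_twh = yearly_energy_twh(1.0)
cost_musd = yearly_cost_musd(1.0, US_TARIFF_USD_PER_KWH)
share_of_year = 53 / HOURS_PER_YEAR  # 53 hours as a fraction of a year

print(f"Energy: {energy_twh:.2f} TWh per year")
print(f"Cost: ${cost_musd:.1f}M per year")
print(f"53 hours is {share_of_year:.1%} of a year")
```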

But why do we need these huge data centers? How large will the demand for them grow?

According to the International Energy Agency (IEA), data center demand in the US, the EU, and China is expected to grow to approximately 710 TWh annually by 2026.

For reference, that’s almost as large as France and Italy’s combined energy consumption in 2022 (720 TWh).

Source

These numbers account for AI, crypto and traditional web and enterprise workloads all combined.

Other references, like SemiAnalysis, estimate a world demand for data centers of 96GW for 2026, or 830 TWh.

Source

Not surprisingly, most of this compute is being consolidated among just a few players that not only have the cash to do so but also have a considerable advantage efficiency-wise:

With scale and cash, they can reduce the PUE we mentioned earlier, achieving much higher efficiencies; this is especially true as most AI data centers have to be built from scratch due to high power demands per server rack (where the GPUs are located), which current data centers aren’t prepared for.
They also consolidate most AI engineering talent, which has extensive experience running the highly distributed workloads that AI requires, effectively maximizing the MFU (Model FLOPs Utilization).

For reference, NVIDIA’s Nemotron model we discussed yesterday didn’t even reach 50% MFU, meaning even NVIDIA can’t maximize GPU utilization for LLM workloads.
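
MFU is commonly defined as the FLOPs per second a model actually achieves divided by the theoretical peak of the hardware it runs on. Below is a minimal sketch of that ratio for training. It assumes the widely used rule of thumb of about 6 FLOPs per parameter per token for a dense transformer. The model size, throughput, GPU count, and the 989 TFLOPS peak per GPU are all hypothetical inputs I picked for illustration, not figures from NVIDIA's Nemotron report.

```python
def model_flops_utilization(n_params, tokens_per_second, n_gpus, peak_flops_per_gpu):
    """Achieved training FLOPs per second divided by the hardware's peak.

    Uses the approximation of 6 FLOPs per parameter per token
    (forward plus backward pass) for a dense transformer.
    """
    achieved_flops_per_second = 6 * n_params * tokens_per_second
    peak_flops_per_second = n_gpus * peak_flops_per_gpu
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical example: a 70B-parameter model on one 8-GPU node.
mfu = model_flops_utilization(
    n_params=70e9,
    tokens_per_second=8_000,     # assumed training throughput for the whole node
    n_gpus=8,
    peak_flops_per_gpu=989e12,   # assumed dense BF16 peak per GPU
)

print(f"MFU: {mfu:.1%}")
```

With these made-up inputs the node sits a little above 42% MFU, which means more than half of the compute being paid for (and powered) is idle at any given moment.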

What's more, the hyperscalers’ superiority will widen, as they are all accumulating huge amounts of compute and are projected to continue doing so for the foreseeable future, with the top four representing up to 80% of global demand:

Source

If those numbers are accurate, by 2026, just over half of the world’s demand for AI workloads (40 GW) will be delivered by only four players (20.8 GW).

Through their investments in OpenAI, Anthropic, or Mistral, they also own the model layer. If the trend continues, the words AI and US Big Tech companies will be interchangeable.

But this raises the question: Will they be able to do this? And I’m not referring to regulatory pressures or antitrust laws: I'm referring to energy constraints.

Lead Times and Bureaucracy

There’s a growing risk that the speed at which demand for AI services will grow will considerably outpace the speed at which countries can accelerate energy generation and transmission.

Building Power

Sending power to a GPU rack is easier said than done, as you need electricity generators and transmission lines.

The problem is that demand will grow not only for AI, but also for cryptocurrencies and general household energy consumption. In particular, while regions like Europe have reduced their energy usage (probably due to high prices), emerging markets are showing strong growth:

Source

Importantly, the temptation to increase generation in every way possible could induce regions to fall behind their carbon emission promises.

This, considering that emerging market economies have a much dirtier power mix, could cause AI to become one of the biggest polluters in the world, as economies like China or India desperately increase coal consumption to meet demand.

While 61% of China’s energy consumption comes from coal, India’s fossil fuels account for almost 80% of its power mix.

Source

All things considered, all roads seem to be leading to the same place: nuclear energy.

Luckily for the environment, both China and India are leading the charge with regard to new nuclear capacity, to the point that Asia is expected to surpass North America in installed capacity by 2026.

However, AI’s approach will be different from what you might expect. But more on that in a minute.

But before that, we need to talk about the biggest issue: the grid.

Unreachable power

As Elon Musk pointed out jokingly, “Transformers need transformers to run.”

The former refers to the underlying architecture behind ChatGPT or Gemini, the Transformer architecture, and the latter refers to the components that step down the voltage coming out of the power plants to one that can be fed into data centers and homes.

The problem is that besides being unappealing to invest in and build, transformers are mostly custom-made and take time to build.

Worse still, while lead times for electricity generation and data center buildout are a maximum of 24 months, deploying transmission lines can take a decade or more due to bureaucracy and disputes.

In the US specifically, approval processes alone can take 4 years or more, as the grid lines required to carry new wind and solar power from remote areas to where the demand is cross multiple states and the territories of local companies that purposely slow the process.

Nevertheless, as recently as August 2023, the US had 1,350 GW of clean energy capacity ready to be deployed… but awaiting approval.

Source

This huge amount of power could alleviate most of the problems we are discussing today, but it will take years to unlock, especially at the grid level.

Nevertheless, the gap between the present grid and the required one is daunting, as mapped by the National Renewable Energy Laboratory and echoed by the New York Times:

Source

Moreover, this study was meant to address how to reach 100% renewable energy by 2035 and predates the AI CAPEX boom, so it’s safe to say that this picture understates the true extent of the need.

Luckily, the current US government is trying to make the whole process more agile, even suggesting a possible capacity to override state-level blockades if it considers the blocked line of national importance.

In parallel, the US Department of Energy is trying to optimize current grids using sensors and advanced controls, but the impact of these measures has not been made explicit.

But if the situation looks dire in the US, Europe’s position is terrible. Falling energy demand and high prices make the idea of building new power, transmission lines, or data centers more like a funny joke than a serious idea.

All things considered, it’s no wonder that all AI tycoons, from Elon Musk to Altman to Zuckerberg to Jassy, are taking matters into their own hands. In fact, they are no longer waiting for solutions.

They are taking action.
