MARKETS
Big Tech’s Big Plan Undercovered

You’ve surely seen the numbers. The largest tech companies in the world are all investing billions in AI and will invest a combined multiple trillions over this decade.

But what you haven’t seen is the ‘why.’ Journalists will tell you, ‘It is to build AGI.’ However, that only scratches the surface. I mean, what does that even mean?

And what they can’t even articulate is why they need to invest so much money and, crucially, what ‘thing’ they are trying to build. Today, we are answering just that by diving into:

  • The radical religion installed in Silicon Valley,

  • The unprecedented phenomenon of The Great Convergence among Big Tech companies,

  • A detailed and intuitive grounding of the ambitions of mega-corporations to build supercomputers with more compute power than all humankind (yes, I’m being serious),

  • The remaining challenges that could turn GPUs into glorified, overpriced calculators and the AI industry into the biggest money waste in the history of capitalism,

  • And we’ll even have time to proclaim the stock market's ‘Renaissance’ of a forgotten industry that could rise from the ashes thanks to AI.

If you’re an investor, you will now finally understand what companies representing a great part of your holdings are doing with their spare money and why, as well as possible new market opportunities. And if you’re an executive, strategist, or analyst, you will gain the capacity to decide whether this is pure madness or not.

While this article leaves no room for hype, there is certainly much to take away from it. Let’s dive in.

Religion or Virus?

I’ve discussed a couple of times how AI, at least in Silicon Valley, is starting to look more like a religion than a science. And the object of worship is computation.

I’ve read countless blogs from AI start-ups and research papers, and one reference that continues to pop up extensively is The Bitter Lesson by Rich Sutton.

This very short piece boils down to one conclusion: Every advance in AI has been driven by one thing only: computation directed toward learning and search. It’s not algorithmic breakthroughs; it’s not data; what determines the next step-function improvement is more computation being made available at scale.

If that sounds familiar, that’s because, as we discussed last week, OpenAI's o1 models embody this vision.

So, what are companies with deep pockets doing? Pursuing that vision, of course. But let me tell you, your brain can’t even fathom the extremes to which they will go to attain it.

The Hardware Lottery

As you probably guessed, the largest share of this capital (approximately 50%) goes to GPUs. And despite the huge numbers (around $50 billion a quarter), the trend is only accelerating.

For instance, here are the staggering numbers of some of NVIDIA’s latest purchase orders:

How much $ is that?

Assuming $25,000 per H100, Meta’s order of roughly 350,000 H100s alone comes to around $8.75 billion. As for the rest, they are buying NVIDIA’s new Blackwell GPUs, with systems retailing at between $1.8 and $3 million (each GB200 system has 72 GPUs; the difference is how many you stack on a single rack, 36 or 72).

Assuming 80% of those orders are in the cheaper GB200 NVL36 form factor (two racks), which seems to be the case based on preliminary insights, that gives an average price of $2.04 million per 72 GPUs, or around $30k per GPU, which sounds about right given B100 retail prices.

Assuming Microsoft’s purchase order is around 800,000 chips, these Blackwell POs combined would amount to roughly $43 billion.

That means the table above represents at least (and probably more than) $52 billion in future NVIDIA revenues.
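To make the back-of-the-envelope math transparent, here is a minimal sketch that reproduces it; every price, split, and order size is one of the assumptions stated above, not a confirmed figure.

```python
# Back-of-the-envelope math for the purchase orders above.
# Every price, split, and order size is an assumption from the text, not a confirmed figure.

H100_UNIT_PRICE = 25_000       # assumed price per H100 (USD)
META_H100_ORDER = 350_000      # Meta's assumed H100 order

NVL36_PRICE = 1.8e6            # assumed price per 72 GPUs in the GB200 NVL36 form factor (2 racks)
NVL72_PRICE = 3.0e6            # assumed price per 72-GPU GB200 NVL72 rack
NVL36_SHARE = 0.80             # assumed share of orders in the cheaper form factor

meta_spend = META_H100_ORDER * H100_UNIT_PRICE                                   # ~$8.75B
avg_price_per_72 = NVL36_SHARE * NVL36_PRICE + (1 - NVL36_SHARE) * NVL72_PRICE   # ~$2.04M
price_per_gpu = avg_price_per_72 / 72                                            # ~$28k

msft_spend = 800_000 * price_per_gpu        # Microsoft's assumed order alone, ~$22.7B
blackwell_total = 43e9                      # combined Blackwell POs estimated above
implied_blackwell_gpus = blackwell_total / price_per_gpu   # ~1.5 million GPUs

print(f"Meta H100 spend:           ${meta_spend / 1e9:.2f}B")
print(f"Implied Blackwell GPUs:    {implied_blackwell_gpus / 1e6:.1f}M")
print(f"Total NVIDIA revenue here: ${(meta_spend + blackwell_total) / 1e9:.0f}B")  # ~$52B
```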

But are these investments based on convincing demand that justifies them? Well, no.

We recently discussed the AI bubble; this year’s total Generative AI revenues (excluding NVIDIA) will be around $30-$50 billion, which is 12 to 20 times smaller than the global AI CapEx hole in these companies’ balance sheets, estimated by Sequoia to be around $600 billion.
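A quick sanity check on that gap, using the figures just cited:

```python
# Gap between this year's estimated GenAI revenues (excluding NVIDIA)
# and Sequoia's ~$600B AI CapEx estimate.
capex_hole = 600e9
revenue_low, revenue_high = 30e9, 50e9

print(f"Revenue shortfall: {capex_hole / revenue_high:.0f}x to {capex_hole / revenue_low:.0f}x")  # 12x to 20x
```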

And the short-term prospects don’t look any better.

But despite the odds, companies are decisively doubling down on this bet. The rationale? “If our current models aren’t good enough, more compute will do the trick.”

But at this point, is this science, blind faith in the ‘compute religion,’ or a tremendously infectious disease called FOMO (fear of missing out) that has spread across Silicon Valley?

To me, it’s the latter. You can tell they are doubtful (you could hear it in the words of Sundar or Zuck), but they all seem to agree this is the next multi-trillion-dollar business, and no one wants to be left out.

The fear of missing out is so extreme that these CEOs are approving the construction of ‘one-off’ data centers. But what does that mean?

In the words of Sequoia partner David Cahn, “Nobody is going to train a frontier model in the same data center twice because, by the time you’ve trained it, the GPUs will be outdated.” Billion-dollar one-off investments. What could go wrong?

Which leads us to The Great Convergence.

The Great Convergence

Despite having totally different business models today, the reality is that all Big Tech companies are converging into the same company: they are all transitioning into AI infrastructure companies.

Even NVIDIA, the leading hardware company, is venturing into the AI-serving arena through NIMs, its inference microservice product line that lets you run AI models in pre-optimized containers provided by NVIDIA.
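To give a concrete sense of what ‘serving AI’ through NIMs looks like, here is a hypothetical sketch; NIM containers expose an OpenAI-compatible API, but the endpoint, API key, and model name below are placeholders you would swap for your own deployment.

```python
# Hypothetical example: querying a locally deployed NVIDIA NIM container.
# NIMs expose an OpenAI-compatible API, so the standard OpenAI client works;
# the base_url, api_key, and model name are placeholders for your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",   # placeholder: local NIM endpoint
    api_key="not-needed-locally",          # placeholder: local containers typically ignore the key
)

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",    # placeholder model name
    messages=[{"role": "user", "content": "Summarize The Bitter Lesson in one sentence."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```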

In other words, 30-ish% of the S&P500’s valuation is a handful of companies all converging into the same business. But what proof do we have of this?

FLOPs, GigaWatts, & Governors

We are on a direct trajectory to a world filled with GigaWatt data centers, a world where, quite frankly, things stop making sense.

Giga-Scale Data Centers

According to SemiAnalysis, demand for AI data centers could reach around 40GW as soon as 2026. In other words, the AI industry will require 40GW of power to operate, which translates into 350.4 TWh of energy consumed per year. For reference, that would make AI the 11th-largest electricity consumer in the world if it were a country.

That’s a lot of GW-scale data centers. However, to date, the largest one is Elon Musk’s Colossus, at 100,000 H100 equivalents. It requires an average of ‘just’ 140MW (still enough to serve electricity to 116,800 homes at average US consumption). A 1GW data center, by contrast, would require as much power as the entire city of San Francisco, which is an entirely different story.
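To ground these figures, here is a small sketch converting sustained power draw into yearly energy and household equivalents; the ~10,500 kWh/year figure for an average US home is an assumption chosen to be consistent with the 116,800-home comparison above.

```python
# Convert sustained power draw into yearly energy and average-US-home equivalents.
HOURS_PER_YEAR = 24 * 365            # 8,760 hours
US_HOME_KWH_PER_YEAR = 10_500        # assumed average US household consumption

def annual_energy_twh(power_gw: float) -> float:
    """Sustained power draw in GW -> energy consumed per year in TWh."""
    return power_gw * HOURS_PER_YEAR / 1_000

def homes_served(power_mw: float) -> float:
    """How many average US homes a sustained power draw in MW could supply."""
    return power_mw * 1_000 * HOURS_PER_YEAR / US_HOME_KWH_PER_YEAR

print(annual_energy_twh(40))   # ~350.4 TWh/year for 40 GW of AI data centers
print(homes_served(140))       # ~116,800 homes for Colossus's 140 MW
```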

And even that is small compared to the numbers we are delving into today. But first, how will Hyperscalers make this vision come to fruition?

In summary, we are just starting to take The Bitter Lesson to the extreme. Which leads us to the great question: What world are we moving toward, and is this sustainable?

From One Scaling Law to Two

The advent of models like OpenAI’s o1 family will only worsen matters regarding power requirements.

And what’s the point I’m trying to make?

In layman’s terms, our power grids and IT infrastructure are about to be flooded with requests that are 30 to several hundred times more energy-intensive than what they were designed for. And data center providers are already suffering the pains of this ‘new world.’

For instance, data centers usually allocate 20kW at most per server rack. With Blackwell GPUs, that number increases to 60kW or even 120kW, depending on the form factor, which is totally unprecedented.
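Putting those rack densities side by side (a rough sketch; mapping 60 kW to the NVL36 rack and 120 kW to the NVL72 rack is my assumption):

```python
# Rack power densities cited above (rough figures).
STANDARD_RACK_KW = 20                                          # typical maximum allocation per rack today
BLACKWELL_RACK_KW = {"GB200 NVL36": 60, "GB200 NVL72": 120}    # assumed mapping of form factor to draw

for form_factor, kw in BLACKWELL_RACK_KW.items():
    print(f"{form_factor}: {kw} kW, i.e. {kw / STANDARD_RACK_KW:.0f}x a standard rack's power budget")
```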

This order-of-magnitude increase in energy requirements is a huge risk in itself. I highly recommend my other article on that topic.


However, we can’t forget about training scaling either. While the timeline is unclear, we should expect much larger models over time. And models aren’t getting bigger for no reason; they are growing to absorb the insane compute throughput that Giga-scale data centers will offer.

And while journalists will just give you a number and compare it to a city, today you will learn to make real sense of these figures. What humongous beast could emerge from a 10 GW data center? And what would the total costs be?

Let’s do a simple exercise.

The Models of the Future

First, we need to make a set of assumptions:

  • The model can’t be trained for longer than six months, as suggested by Epoch AI’s research (train any longer and the model risks being obsolete by the time it’s finished).

  • Based on that research, we also assume training data to be around 1000 trillion tokens by 2030, a mixture of text, images, video, and licensed private data, a dataset 66 times larger than the one used to train Llama 3.1.

  • We’ll assume future models will be trained at FP8 precision (we’ll make sense of this in a minute).

  • We assume models will continue to be Transformers. Thus, the total compute needed to train a model follows the approximation popularized by OpenAI: total training FLOPs ≈ 6 × N × D, where N is the number of non-embedding parameters and D is the number of training tokens (in distributed settings, the number of batches times the batch size). See the short sketch right after this list.
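Putting that last assumption into code, here is a minimal sketch of the training-compute approximation; the 10-trillion-parameter model size in the example call is an arbitrary illustration, not a forecast.

```python
def training_flops(non_embedding_params: float, training_tokens: float) -> float:
    """Approximate total Transformer training compute: C ≈ 6 * N * D."""
    return 6 * non_embedding_params * training_tokens

# Example: the article's assumed 2030-era dataset of ~1,000 trillion tokens (1e15),
# paired with a hypothetical 10-trillion-parameter model (purely illustrative).
print(training_flops(non_embedding_params=10e12, training_tokens=1e15))   # 6e28 FLOPs
```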

Now, how much processing power does a 10GW data center, a beast requiring more electrical power than the entire country of Finland, have?

As we are going to deal with huge numbers, I’ll use the US system (short scale). In the short scale, every new “-illion” adds three more zeros, while in the long scale, it adds six.

Using H100 equivalents, the best available GPU (Blackwell is still unproven in the wild) and one whose real-world power consumption we know well, roughly 1.4 kW per GPU all-in (Colossus’s ratio of 140MW for 100,000 GPUs), a 10GW data center would hold, brace yourself, around 7.2 million NVIDIA H100s, 72 times more GPUs than Elon Musk’s Colossus.

In turn, knowing each H100 delivers a peak of 1,979 TFLOPS (1.979e15 floating-point operations per second) at FP8, this would give us a peak processing capacity of 1.979e15 × 7.2e6 ≈ 1.425e22 FLOPS, that is, roughly 14.25 ZettaFLOPS, or 14.3 sextillion (21 zeros) mathematical operations per second.

And how much compute power is that?

In that data center, GPT-4, at 2.1e25 total training FLOPs, could have been trained in 2.1e25 / 1.43e22 ≈ 1,470 seconds, or roughly 24.5 minutes. Remember that this same model was trained just two years ago over roughly 100 full days, which shows how quickly training budgets are about to explode.
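Spelling out that cluster-level arithmetic (a sketch assuming peak throughput with no utilization losses, as the text does):

```python
# Peak compute of the hypothetical 10GW cluster and how fast it would retrain GPT-4.
H100_PEAK_FLOPS = 1.979e15       # per-H100 peak FP8 throughput cited above (1,979 TFLOPS)
NUM_GPUS = 7.2e6                 # ~7.2 million H100 equivalents in a 10GW data center
GPT4_TRAINING_FLOPS = 2.1e25     # GPT-4's total training compute cited above

cluster_flops = H100_PEAK_FLOPS * NUM_GPUS              # ~1.425e22 FLOPS, i.e. ~14.25 ZettaFLOPS
gpt4_minutes = GPT4_TRAINING_FLOPS / cluster_flops / 60

print(f"Cluster peak: {cluster_flops:.3e} FLOPS")
print(f"GPT-4 training time: {gpt4_minutes:.1f} minutes")   # ~24.6 minutes vs ~100 days in 2022
```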

Now, what if we stretch training on this cluster for six months and apply the rest of the previous assumptions?

In that case, we enter a totally new dimension, both in terms of total processing power and, importantly, costs: we are talking about models with training budgets larger than entire economies, and compute figures that pull us closer to the total compute available to all of humankind.

In other words, supercomputers with as much compute power as all of humankind.
