
Killer Applications Coming to AI Soon Round 2


MARKET-SAVING DEMAND
Killer AI Applications Coming Soon, Round 2

Last week, we saw the first round of killer applications. But today, things get serious. We are looking at two killer applications tackling massive problems in trillion-dollar industries.

One will dramatically pop one of the largest bubbles in existence today (I’m not referring to the AI bubble), changing the future of upcoming generations, and the other could put into question the jobs of thousands, if not millions, of people.

I’m referring to the corporate job AI tycoons are betting will be the first to fall. I will also provide an easy way to know if that’s you, how to prevent it, and how to assess other job types that could be impacted.

Naturally, I will point out which companies are most likely to eat the cake in each case and who will get their cake eaten, with one company uniquely positioned for success.

But first of all, I’ll prove that the industry is lying to you about the true intelligence of current AI, offer a no-hype filter on the crude reality of the industry today, and explain why this is finally about to change.

The Conquest of Reasoning

The last two years of the AI industry can be summarized in one word: compression.

Glorified databases

With the industry ‘stuck’ at the million-ExaFLOP range of compute used to train GPT-4 back in 2022 (it was released in March 2023 but trained the previous year), and with Llama 3.1 405B consuming only about 1.5 times more compute, the entire field has sat at the ‘GPT-4 intelligence’ level for two years.

Whether this is due to an insufficient GPU install base or simply because we are hitting an intelligence wall, the reality is that the industry, especially in 2024, has focused more on efficiency.

In other words, we haven’t made AI more ‘intelligent’ (in fact, as we saw in a previous analysis, current models aren’t that smart at all), but ‘greater in value per pound of weight’ (no pun intended).

Cynically, one could make a strong case that all AI has conquered in the last two years is memorization. In fact, ChatGPT can be reduced to a glorified database you can chat with (and, more often than not, be deceived by).

But this doesn’t mean we have wasted our time. This was a necessary step in our path to solving the real problem that AI needs to solve (and is finally solving):

Its poor intelligence.

On the Measure of Intelligence

If you’ll excuse my boldness, after years of reading research of all types, I have a pretty clear idea of how AI will conquer human intelligence and how we can place current models on the ‘intelligence spectrum.’

There are two levels of ‘intelligence’ AI must conquer:

  1. Compression = Memorization + Regularization:

    1. Memorization: Humans have a memory to recall important facts or experiences, shaping our expectations and behavior. (e.g., whether I’ve tested it in the past or heard about it, my memory tells me that punching a wall will hurt and, thus, it’s intelligent to avoid it.)

    2. Local generalization (implicit reasoning): Humans, based on their memory, can also reason over it. (e.g., I know how cats look, so if I stumble across a new cat breed, I’ll still infer it’s a cat.)

  2. Out-of-distribution generalization (explicit reasoning): Humans, despite not having any experience or memory on a specific matter, can still find a way to solve a problem they’ve never encountered before. (e.g., Grigori Perelman solved the Poincaré Conjecture, a century-old problem in mathematics, proving that every simply connected, closed 3D space is topologically equivalent to the 3-sphere.)

For proper research on the matter, I recommend ‘On the Measure of Intelligence,’ by François Chollet.

However, today, AI has only conquered compression, and I would argue it’s a ‘weak conquest’: models still confabulate far more than they should, and as we saw last Thursday, their capacity to reason over their knowledge is extremely poor (most of the time, they aren’t reasoning, but memorizing the reasoning processes).

As for explicit reasoning, the most coveted type of intelligence, it’s entirely missing in current AI models, as famous benchmarks like ARC-AGI have proven: benchmarks where models barely score above 0%.

One great way to summarize this particular type of intelligence is through Jean Piaget’s definition:

❝

“Intelligence is what you use when you don't know what to do: when neither innateness nor learning has prepared you for the particular situation.”

Jean Piaget

Despite this provable reality, we’ve been told that AIs will be ‘at Ph.D.-level intelligence by 2026’ or that we’ll see ‘human-level AI in two years.’ These types of statements fall into two categories depending on who’s saying them:

  • Gaslighting: {Insert start-up founder name} has billions of dollars on the line, and needs to convince society that her/his AI is super intelligent.

  • Ignorance show-off: I am confusing being ‘textbook smart’ with ‘smart.’ LLMs have memorized the entire public Internet. Reciting Shakespeare’s poems down to the comma doesn’t make you an expert poet.

You don’t have to believe me; just take a look at the example we analyzed on Thursday. Is this thing “as smart as a teenager”?

Luckily, many industry researchers are finally openly discussing this fallacy, and the discussion has finally centered around solving this problem.

So, how are we going to solve it?

A New Breed of Models

Over the next months, we are going to see huge progress in three directions that will improve reasoning capabilities considerably:

  1. Data augmentation with synthetic data

  2. Over-extended training

  3. Search algorithms

The first one is obvious: increasing the quality of the data. Reasoning data (data where you clearly see the thought process) is a rare sight, and thus, researchers are creating the data themselves.
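
To make the idea tangible, here is a minimal, hypothetical sketch of such a pipeline: a strong ‘teacher’ model is prompted to write out its full thought process, and each (problem, reasoning) pair becomes a synthetic training example. The prompt, model choice, and function names are my own illustrative assumptions, not a documented recipe:

```python
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical prompt template: ask the teacher model to expose every
# intermediate reasoning step so the trace itself becomes training data.
PROMPT = (
    "Solve the following problem, writing out every intermediate "
    "reasoning step explicitly before giving the final answer:\n\n{problem}"
)

def synthesize_reasoning_example(problem: str) -> dict:
    # One (problem, reasoning) pair of synthetic 'reasoning data'.
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice of teacher model
        messages=[{"role": "user", "content": PROMPT.format(problem=problem)}],
    )
    return {"problem": problem, "reasoning": response.choices[0].message.content}

# A synthetic dataset is just many such pairs, later used for fine-tuning.
dataset = [synthesize_reasoning_example(
    "A train travels at 60 km/h. How long does it take to cover 150 km?"
)]
```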


Moving on, we have also realized that, beyond creating ‘reasoning data,’ we don’t necessarily need more of it; we need to train on it for longer.

As we saw on Thursday through grokking (extreme over-training) and as Meta hinted several times, training models for longer improves local generalization (implicit reasoning, or reasoning over “known knowns”).

But what do I mean by that?

As models are all designed according to Occam’s Razor (through machine-learning regularization techniques like weight decay or learning-rate decay), they are incentivized to find simpler solutions to problems.

“Entia non sunt multiplicanda praeter necessitatem,” translating to “Don’t complicate the description of something beyond what is necessary to explain it.”

The simpler the solution, the better. That’s Occam’s Razor.

Therefore, running a model through the data several times naturally ‘prunes’ its reasoning circuits.

For instance, if I memorize how a hairy cat looks, I will remember all the minor details of its body (including traits other cats may not share, like its hair). But if I understand what a cat is, I just need 2 or 3 cues (whiskers, slit-shaped eyes, and a tail, for instance).

Naturally, the second approach is simpler and more succinct, yielding better generalization (I’m less likely to be fooled by the particular traits of a given cat, such as mislabeling a Sphynx as not a cat just because it’s hairless).
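
To make the mechanics concrete, here is a minimal PyTorch sketch of the two regularization knobs mentioned above; the model and data are placeholders, but weight decay and a decaying learning rate are the standard levers that bias training toward simpler, more general solutions:

```python
import torch
import torch.nn as nn

# Placeholder model and data; only the regularization settings matter here.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
inputs = torch.randn(512, 64)
targets = torch.randint(0, 10, (512,))

# Weight decay penalizes large weights, nudging training toward simpler
# (lower-norm) solutions: Occam's Razor expressed as a training bias.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)

# A decaying learning rate progressively shrinks updates, helping the model
# settle into those simpler solutions instead of hopping between complex ones.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5000)
loss_fn = nn.CrossEntropyLoss()

# 'Over-extended training': many passes over the same data. In grokking,
# generalization keeps improving long after the training loss has flattened.
for step in range(5000):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    scheduler.step()
```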

Therefore, improvements in data augmentation and over-extended training will largely influence the next generation of AI models, which is the perfect summary of how Meta trained Llama 3.1.

And Mark Zuckerberg recently stated that Llama 4 will require “ten times more compute than Llama 3,” which would be around 15 times the compute OpenAI used for GPT-4.

Thus, knowing that Meta is quite obsessed with allowing frontier models to run on a single 8-GPU node, it’s pretty clear they will explore over-training rather than simply making the model arbitrarily bigger.

Finally, we have the third method, combining search algorithms with LLMs, summarized as letting the model generate multiple solutions to a problem and choosing the best one.

Commonly referred to as ‘long-inference models,’ these are something I’ve discussed several times in detail, and I expect competitors’ upcoming models (GPT-5, Claude 4, or Gemini 3.0) to be members of this model family.
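
As a toy illustration, here is the simplest form of this search strategy, best-of-N sampling; `generate` and `score` are hypothetical stand-ins for an LLM sampler and a verifier or reward model:

```python
import random

def generate(problem: str, temperature: float = 0.8) -> str:
    # Hypothetical stand-in for sampling one candidate solution from an LLM.
    return f"candidate solution #{random.randint(0, 10**6)} for: {problem}"

def score(solution: str) -> float:
    # Hypothetical stand-in for a verifier or reward model rating a solution.
    return random.random()

def best_of_n(problem: str, n: int = 16) -> str:
    # Long inference in its simplest form: spend extra compute at inference
    # time to generate many candidates, then keep only the highest-scoring one.
    candidates = [generate(problem) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("Prove that the square root of 2 is irrational."))
```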

All things considered, we have finally stopped pretending to have highly intelligent models, and we now have a solid intuition on how to build them through the three techniques above. That’s why I’m convinced the new generation (GPT-5, Llama 4, etc.) will see the emergence of actually smart models, not ones pretending to be.

But improved reasoning isn’t the only leap we are about to see.

Agents and Emotion

We are seeing the emergence of emotionally aware, collaborative, and adaptable agents.

Multimodality has gone an extra step.

A handful of minutes. That’s all I need to convince you that truly multimodal models are beautiful and extremely powerful. This video (the link goes straight to the interesting parts) will show you what OpenAI’s new Advanced Voice Mode is capable of.

It will blow your mind.

With these capabilities, AIs will feel much more human, and killer applications like AI companions will become a new experience as AI becomes capable of understanding and imitating emotions.

But in terms of actual utility, few features are more exciting than the recent focus on improved function calling.

Efficient Tool Use

Models are becoming much more capable of structuring their outputs (generating data in a specific, fixed way). This may seem irrelevant, but it’s a crucial development for agents.

Agents are AIs that can interact with third-party software. Besides some bold exceptions like Rabbit (borderline scam) and Adept (acquired by Amazon), this interaction mainly occurs through APIs.

In a nutshell, APIs allow the AI to perform actions on other software without using its user interface but by making structured ‘calls.’ This is amazing as interfaces change continuously, while APIs don’t.

On the other hand, APIs are rigid, requiring a specific and often ad hoc structure. As you may foresee, LLMs have not fared well in these situations, making syntax or schema errors that, while subtle, make the whole process fail.

Luckily, models are becoming much better at this procedure (referred to as function calling in the AI world), to the point that, this very week, OpenAI announced ‘Structured Outputs,’ which guarantees a perfectly structured response on demand.

This is so huge that Sam Altman himself published the news.
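
For illustration, here is a sketch of what a Structured Outputs call looks like in the OpenAI Python SDK as announced; the prompt and schema are my own illustrative assumptions, and the API surface may evolve:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# With "strict": True, the API constrains generation so the reply always
# matches this JSON Schema, removing the subtle formatting errors that
# used to break agent pipelines. The schema itself is illustrative.
response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Book a table for two at 7pm on Friday."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "reservation",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "party_size": {"type": "integer"},
                    "day": {"type": "string"},
                    "time": {"type": "string"},
                },
                "required": ["party_size", "day", "time"],
                "additionalProperties": False,
            },
        },
    },
)

print(response.choices[0].message.content)  # valid JSON matching the schema
```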

Long story short, smarter and emotionally-aware reasoners with efficient tool use are coming soon. And knowing that, what are the trillion-dollar applications unlocked by these new capabilities?

Deflating the Education Bubble

According to Morgan Stanley, the global education industry is worth more than $6 trillion today and could be worth $8 trillion by 2030.

But are students better taught than they were a decade ago? Most probably not. In fact, I believe that the education system should be turned upside down.

AI Deflationary Pressures

Despite not preparing our future generations as it should, college tuition has grown almost twice as fast (68%) as the Consumer Price Index (39%) over the past twenty years. College tuition costs have even outpaced the rise in median home prices.

With top business schools looking more like private hedge funds with education businesses on the side, and with all rich countries seeing a steady increase in inequality, education is becoming a rich people’s game.

Richest 10% income share of the economy.

Knowing this, AI Tutors are the perfect killer application to deflate the education bubble.

Ever Helpful

With real examples like Khanmigo, a GPT-4-powered AI tutor by Khan Academy, AI tutors offer the perfect mix of economic incentives and technology.

  • For starters, it’s an entirely conversational use case, ideal for LLMs. And if there’s one thing LLMs excel at, it's idea generation. Thus, as the kid engages, the agent can generate millions of examples on a specific topic so the student can learn and internalize topics.

  • They are also great at summarizing and simplifying things with examples and illustrations, allowing kids to see things in a simpler light.

  • They are eternally patient, undeterred by the student’s difficulties. And now that they can convey emotion, the tutor’s voice can be comforting, non-judgmental, and encouraging.

  • Moreover, their improved function-calling capabilities allow these models to call powerful tools like Wolfram Alpha or Napkin AI to perform calculations without errors (thereby reducing hallucinations) and to present results in a more appealing, visual form, as sketched in the example after this list.
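
As referenced in the last bullet, here is a minimal sketch of tool use in a tutoring loop with the OpenAI function-calling API; the `calculator` tool is a hypothetical stand-in for an integration like Wolfram Alpha, and the prompt is illustrative:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def calculator(expression: str) -> str:
    # Hypothetical stand-in for a real math backend like Wolfram Alpha.
    return str(eval(expression))  # toy evaluator; never eval untrusted input

tools = [{
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate a math expression exactly.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}]

messages = [{"role": "user", "content": "What is 37 * 24? Walk me through it."}]
first = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
call = first.choices[0].message.tool_calls[0]  # assumes the model chose the tool

# Run the tool and hand the exact result back, so the tutor never has to
# hallucinate arithmetic: it only explains a verified answer.
result = calculator(**json.loads(call.function.arguments))
messages.append(first.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": result})

final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
print(final.choices[0].message.content)
```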

All this, coupled with the insane unit economics already delivered by models like GPT-4o mini and the steady rise of hybrid architectures that will unlock infinite-length context windows (which we discussed last week), will allow an AI Tutor to sustain a life-long interaction with its user, setting the stage for a Cambrian explosion of AI tutors.

Winners & Losers

Regarding losers, it’s pretty straightforward. AI will have a commoditizing effect on every industry it disrupts. Thus, unless you have a powerful brand (Harvard in the US or Eton College in the UK), you will have to drop prices as many students skip college tuition altogether.

As for the winners, it’s a similar outcome to AI companions. AI Tutors require advanced intelligence, both implicit (the model must reason over the student’s solutions and mistakes) and explicit (it must be able to search for variations to a problem), which demands the most powerful models, all of which run in the cloud.

Therefore, Hyperscalers have a real advantage. However, the biggest winner here will be OpenAI, as I feel they have been preparing for this precise scenario for quite some time.

For starters, Sam Altman has openly discussed AI Tutors as one of the most powerful applications. They’ve also established partnerships with educational start-ups like Khan Academy and have a history of LLM integration with math tools like Wolfram Alpha.

They have a native code interpreter in the ChatGPT app so that the tutor can write and execute code, which is essential for this use case, and, as mentioned earlier, they are putting a lot of effort into structured responses, key for ad hoc integration with other educational tools.

Importantly, education is a historic opportunity for OpenAI. It’s a huge economic use case, a great story to tell (OpenAI democratizes education for the poor), and the literal embodiment of a use case built for conversational assistants.

And then there’s the most critical differentiator of all: it’s a use case that requires native multimodality. Unlike rivals such as Claude 3.5 Sonnet or Llama 3.1, GPT-4o is natively multimodal-in/multimodal-out, something not even Google Gemini can claim.

What this means is that, unlike the rest, GPT-4o is not a text-only LLM: it doesn’t require everything to be translated into text and back whenever it receives audio or video.

GPT-4o doesn’t care because, just as for humans, an audio clip of a bark and a text describing a dog barking are the same thing: a dog barking.

But if democratizing education is worth fighting for, this next use case is as spicy as they get. AI is coming for a very particular corporate job, and it’s coming fast.
