Killer Applications Coming to AI Soon


MARKET-SAVING DEMAND
Killer Applications Coming to AI

With markets showing serious signs that investors are concerned (any minor inconvenience now triggers a massive sell-off of big tech stocks), AI faces a real challenge:

How do we justify the hype? And the answer… is killer applications.

For that reason, today we are looking at what research and the numbers point to as AI’s first-to-come super applications: a list packed with surprises, including what I consider the first real job from which AI will retire humans, and the winners and losers in each category.

Let’s dive in.

Current Technological Limitations

To justify which killer applications will come first, we need to understand the limitations that have prevented their emergence so far, and the technologies that will soon allow them to finally arrive.

Fighting hallucinations

First, we have the most common problem: the model confabulating reality or facts, a phenomenon commonly known as ‘hallucination.’

The term ‘hallucination’ is misleading; ‘confabulation’ is more accurate. For a detailed discussion, read my Medium blog article.

Worse still, the model is tremendously eloquent, almost arrogant, increasing the chances that the user gets deceived. But here’s the thing: this is all they do.

On first principles, every single prediction is a ‘hallucination.’ But why?

Although everyone describes their inner workings as ‘predicting the next word in a sequence,’ that is, in fact, a poor description of how they work.

In reality, they predict a probability distribution over the next word: a fancy way of saying that the model ranks every word it knows according to how adequate it is as a continuation of the sequence.

But here’s where things get tricky. They don’t always pick the single most likely word; instead, they randomly sample one of the most probable.

While this enhances the model’s creativity, it increases the likelihood of choosing a word that distorts the truth. In other words, they don’t confabulate only when they get things wrong; it’s all they do.
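
To make that concrete, here is a minimal sketch of top-k sampling over a made-up five-word vocabulary with hypothetical scores (the general technique, not any specific model’s decoder):

```python
import numpy as np

# Toy next-word scores for "The kids went to the ..."
vocab = ["playground", "park", "school", "moon", "fridge"]
logits = np.array([3.1, 2.8, 2.4, 0.2, -1.5])  # hypothetical scores

def sample_next_word(logits, temperature=0.8, top_k=3):
    top = np.argsort(logits)[-top_k:]        # keep the k best candidates
    scaled = logits[top] / temperature       # temperature reshapes the odds
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()                     # normalized distribution
    return vocab[np.random.choice(top, p=probs)]

# Greedy decoding would always return "playground", the most likely word;
# sampling sometimes returns "park" or "school" instead. Creativity and
# risk in one knob.
print(sample_next_word(logits))
```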

Needless to say, hallucinations are the biggest issue with Large Language Models (LLMs) today. But why do I think this is about to change? 

Augmented LLMs.

Although I covered them extensively in a previous newsletter issue, the idea is that we can retain the model’s creativity (its ability to say the same thing in many ways, which is also what makes hallucinations more frequent) while, for specific instances like facts, using memory adapters to collapse the distribution onto the exact word.

But what does that mean?

In the example below, a standard LLM (middle) would consider both ‘1981’ and ‘1970’ probable next continuations. However, while both are semantically valid, only one is true.

But as I was saying, the LLM doesn’t care, as it isn’t optimized to seek the truth but to provide reasonable continuations, which can be true… or not.

However, by adding the memory adapter, the model is forced to choose ‘1981’ (right below).

Source: Lamini.ai
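
For intuition, here’s a rough sketch of that collapse (the fact store and override logic below are hypothetical illustrations of the idea, not Lamini’s actual implementation):

```python
import numpy as np

vocab = ["1970", "1981", "1999"]

# Hypothetical fact the memory adapter has memorized during tuning
FACTS = {"ExampleCorp was founded in": "1981"}

def decode(prompt, logits, temperature=0.8):
    if prompt in FACTS:
        # Memory adapter: collapse the distribution onto the exact token,
        # the equivalent of sampling this one fact at temperature ~0
        return FACTS[prompt]
    probs = np.exp(logits / temperature)     # elsewhere, sample as usual
    probs /= probs.sum()                     # to preserve creativity
    return np.random.choice(vocab, p=probs)

# The base model finds '1970' and '1981' almost equally plausible...
base_logits = np.array([2.9, 3.0, 0.5])
# ...but the adapter forces the true answer
print(decode("ExampleCorp was founded in", base_logits))  # -> "1981"
```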

Long story short, with memory adapters, I predict we will soon see much more accurate LLMs across the board. But nothing has the industry more excited than our next leap, which will make the models much smarter.

Data Augmentation to Battle Poor Reasoning

As also covered in detail previously, researchers are figuring out ways to obtain new data that allows models to ‘think better’ because, unbeknownst to many, current frontier models are terrible reasoners.

And the biggest reason is data.

The public Internet, the main training data source, is largely devoid of reasoning procedures (people state their conclusions, not the thought process they went through). To fill that gap, we are now using other LLMs to generate multi-step reasoning data like the example below, helping teach models ‘how to think’:

Source: OpenAI
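
In code, the pipeline looks roughly like this (a minimal sketch: the prompt wording and the `generate` callable are placeholders for any LLM API, and the answer-checking filter is in the spirit of the rejection sampling the Llama 3.1 paper describes):

```python
STEP_PROMPT = (
    "Solve the problem below, showing every intermediate step. "
    "End with 'Answer:' followed by the final answer.\n\nProblem: {q}"
)

def build_reasoning_data(problems, generate, tries=4):
    """problems: (question, known_answer) pairs; generate: any LLM call."""
    dataset = []
    for question, answer in problems:
        for _ in range(tries):
            trace = generate(STEP_PROMPT.format(q=question))
            # Keep only traces that land on the right answer, so models
            # are fine-tuned on verified step-by-step reasoning
            if trace.strip().endswith(f"Answer: {answer}"):
                dataset.append({"prompt": question, "completion": trace})
    return dataset
```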

Although purists will argue that this still isn’t real reasoning, if models learn to approach problem-solving in a multi-step manner, we will have much more powerful models. In fact, recent examples like Llama 3.1 405B (page 23 of its paper) already use this technique with great success.

This, combined with larger sizes (GPT-5, Grok-3), will enable a new range of models with enhanced reasoning capabilities.

Neurosymbolic systems like AlphaGeometry 2, which we discussed on Thursday, are also a promising avenue for enhanced reasoning.

However, it’s the next set of efficiencies that could really move the needle for AI.

Dealing with Huge Costs

With every passing month, we learn to train more compressed models, with state-of-the-art models like Llama 3.1 405B now capable of running on a single node (‘just’ 8 H100 GPUs at FP8 precision, where the high bandwidth between GPUs guarantees low latency).
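
The rough arithmetic behind that single-node claim:

```python
# Do 405B FP8 weights fit on one 8x H100 node? Back-of-the-envelope:
params = 405e9                       # Llama 3.1 405B parameters
weights_gb = params * 1 / 1e9        # FP8 = 1 byte/weight -> ~405 GB
node_gb = 8 * 80                     # 8 H100s x 80 GB each = 640 GB
print(f"{weights_gb:.0f} GB of weights vs. {node_gb} GB per node")
# The ~235 GB left over goes to the KV cache and activations
```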

In other words, we are getting the same results while making our models smaller and, thus, cheaper to run.

OpenAI’s GPT-4o mini, whose unit economics we covered recently, is the latest example. This matters because all the applications we will see today will generate huge demand and, thus, require great efficiency.

Crucially, we are seeing the emergence of hybrid architectures that depart from the current quadratic complexity. The reason is that models like ChatGPT or Llama 3.1 are Transformers, which are very expensive for two reasons:

  • They are very large (in the terabyte range for frontier AI models), requiring multiple GPUs working together to host the model and an ungodly number of FLOPs (floating-point operations).

  • Their attention mechanism scales quadratically with input sequence length (doubling the sequence implies a 4-fold increase in computation, tripling a 9-fold one). Meanwhile, their cache, known as the KV Cache and used to avoid redundant computation, grows with every processed token, exploding the memory requirements and, hence, requiring a much larger number of GPUs.

In fact, when the model has to process very long sequences, the cache becomes the main memory bottleneck.
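
To put rough numbers on it, here is a back-of-the-envelope estimate using Llama 3.1 405B’s published configuration (126 layers, 8 key-value heads of dimension 128 thanks to grouped-query attention):

```python
# KV-cache memory grows with sequence length and quickly dominates
def kv_cache_gb(seq_len, layers=126, kv_heads=8, head_dim=128,
                bytes_per_val=1):    # FP8 = 1 byte per cached value
    # 2x accounts for storing both keys and values at every layer
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_val / 1e9

for seq in (8_192, 32_768, 128_000):
    print(f"{seq:>7} tokens -> {kv_cache_gb(seq):5.1f} GB per sequence")
# ~2.1 GB at 8k tokens but ~33 GB at 128k, and that is per concurrent user
```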

For quite some time, new, less demanding architectures have been emerging, but none has really challenged the status quo despite the Transformer’s inefficient nature.

However, by combining them with Transformers into hybrids, we can create models where the less demanding architecture does the heavy lifting while the Transformer ensures maximum quality.

To dive deeper into how hybrids work, read my Microsoft model Samba analysis.

Additionally, complementary approaches to increasing efficiency include:

  • Mixture-of-Experts, where a model is partitioned into smaller expert models to reduce the computation required per prediction (see the sketch after this list). We recently covered Google’s breakthrough, a million-expert model.

  • Speculative decoding, rumored to be used by GPT-4, where a small ‘draft’ model proposes several words at once and the large model verifies them in a single pass, reducing the number of expensive sequential predictions.
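
For intuition, here’s a toy sketch of Mixture-of-Experts routing (illustrative only: real MoE layers use learned feed-forward experts, and Google’s million-expert design is far more elaborate):

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_model, top_k = 8, 16, 2
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
router = rng.normal(size=(d_model, n_experts))

def moe_layer(token):
    scores = token @ router               # router scores each expert
    chosen = np.argsort(scores)[-top_k:]  # route to the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()              # normalize over chosen experts
    # Only 2 of 8 experts run: ~4x fewer FLOPs than a dense layer
    return sum(w * (token @ experts[i]) for w, i in zip(weights, chosen))

print(moe_layer(rng.normal(size=d_model)).shape)  # (16,)
```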

However, neither of these tackles the biggest problem in Transformers, memory, which is why I’m so bullish on hybrids.

In summary, a lower tendency to confabulate, coupled with better training data and higher efficiency, sets the stage perfectly for a new wave of applications that should finally let AI live up to its name: use cases ranging from potentially creating a trillion dollars in savings to eliminating the need for humans in a job altogether.

AI Companions

Although you have probably heard about this use case, you surely aren’t aware of how prominent these companions are becoming. They have something irresistible, even scary, about them.

For those unaware, AI companions are models meant to be their user’s best friend, trained to be patient, attentive, encouraging, and joyful in conversation.

And with OpenAI readying its voice feature, they are, quite literally, the embodiment of Samantha, the AI companion in the movie ‘Her.’

Why are they a killer app?

You may think this is still science fiction, but they are already being used at scale.

According to venture capital firm a16z, users of Character.ai, a start-up focused on chatbots that imitate historical figures or are tailored to your particular needs, talk to them for an average of two hours a day.

Interestingly, user engagement is off the charts: at least an order of magnitude better than any other conversational AI use case, including ChatGPT (general assistant):

Source: a16z

But a16z isn’t the only one claiming this. Sequoia Capital had a similar take last year, with Character.ai showing almost three times ChatGPT’s ratio of daily active users to monthly users:

Source: Sequoia

For whatever reason, people love talking to machines. Indeed, there are real examples of people entering relationships with their Replika.ai friends, which can create pretty dangerous situations.

What tech will enable them at scale?

AI companions have experienced failure, too.

Inflection, acquihired by Microsoft (a deal currently being investigated), was born out of this precise idea: creating an emotionally intelligent AI they called Pi.

The problem? Besides not being particularly smart, it had a really small context window (the number of words it could process at any given time).

Another company that rushed this vision too quickly was Meta, which recently scrapped its ‘celebrity chatbots.’

So what is different this time?

With advances like hybrid architectures, we will soon see an explosion in context window lengths, even nearing infinite length. Thus, given their uncanny attractiveness and near-infinite memory, companions are not just probable but a matter of time, as they will be capable of remembering every interaction with you.

And who are the key winners and losers?

As these models require great intelligence and long sequences, they were born to run in the cloud; there’s no chance the models powering these companions will be stored on our smartphones or computers.

Whenever you read about AI in the cloud, always remember that the winners of such use cases are the Hyperscalers (Amazon, Microsoft, Google, Meta, xAI, and their Chinese counterparts like Alibaba), aka the companies that are investing a decent chunk of their hard-earned cash flows to build the infrastructure to run Generative AI at scale.

There will probably be room for sector-specific start-ups like Character.ai with proven traction, but the question will be whether they can meet demand, which could lure them into a never-ending spree of investment rounds.

In fact, Character.ai is already in talks with GPU-rich companies like Meta or Elon Musk’s xAI. And when that happens, it’s clear what the result will be: an acquihire like Inflection’s or, more recently, Adept’s (by Amazon).

Thus, all comes back to Big Tech. And what about losers?

In this particular case, I fear society will be the biggest loser in the proliferation of these technologies. The US Surgeon General has been very outspoken about the growing loneliness of many people, especially younger generations, describing it as being as damaging to health as smoking 15 cigarettes a day.

While some claim this technology could precisely help reduce loneliness, I don’t think a soulless computer can fill the void for an animal that craves tribal connection as much as humans do.

Now, moving on, we have the use case for what I predict will be the first human job that disappears because of AI.
