Killer Applications Coming to AI Soon Round 2
PREMIUM NOTION SITE
Additional Premium Content
The AI That Has Come Back from the Dead. An exciting look at a classical architecture revived thanks to a new twist that might have solved one of the hardest problems in AI: life-long learning. (Only Full Premium)
Trend of the week: Grokking (with additional technical insights on how it works). (Full and Basic Premium)
MARKET-SAVING DEMAND
Killer AI Applications Coming Soon, Round 2
Last week, we saw the first round of killer applications. But today, things get serious: we are looking into two killer applications tackling major problems in trillion-dollar industries.
One will dramatically pop one of the largest bubbles in existence today (I'm not referring to the AI bubble), changing the future of upcoming generations, and the other could put into question the jobs of thousands, if not millions, of people.
I'm referring to the corporate job AI tycoons are betting will be the first to fall. I will also provide an easy way to know if that's you, how to prevent it, and how to assess other job types that could be impacted.
Naturally, I will point out which companies are most likely to eat the cake in each case and who will get their cake eaten, with one company uniquely positioned for success.
But first of all, I'll prove how the industry is lying to you regarding the true intelligence of current AI, offer a no-hype look at the crude reality of the industry today, and explain why this is finally about to change.
The Conquest of Reasoning
The last two years of the AI industry can be summarized in one word: compression.
Glorified databases
With the industry "stuck" at the million-ExaFLOP range of compute used to train GPT-4 back in 2022 (it was released in March 2023 but trained the previous year), and with Llama 3.1 405B consuming only 1.5 times more compute, the entire industry has sat at the "GPT-4 intelligence" level for two years.
Whether this is due to an insufficient installed GPU base or because we are hitting an intelligence wall, the reality is that the industry, especially in 2024, has focused more on efficiency.
In other words, we haven't made AI more "intelligent" (in fact, as we saw in a previous analysis, they aren't that smart at all), but "greater in value per pound of weight" (no pun intended).
Cynically, one could make a strong case that all AI has conquered in the last two years is memorization. In fact, ChatGPT can be reduced to a glorified database you can chat with (and, more often than not, be deceived by).
But this doesn't mean we have wasted our time. This was a necessary step on our path to solving the real problem that AI needs to solve (and is finally solving):
Its poor intelligence.
On the Measure of Intelligence
If you'll excuse my boldness, after years of reading research of all types, I have a pretty good idea of how AI will conquer human intelligence and of how we can place current models on the "intelligence spectrum."
There are two levels of "intelligence" AI must conquer:
1. Compression = Memorization + Regularization (which yields local generalization):
Memorization: Humans have a memory to recall important facts or experiences, shaping our expectations and behavior. (e.g., whether I've tested it in the past or heard about it, my memory tells me that punching a wall will hurt and, thus, it's intelligent to avoid it.)
Local generalization (implicit reasoning): Humans, based on their memory, can also reason over it. (e.g., I know what cats look like, so if I stumble across a new cat breed, I'll still infer it's a cat.)
2. Out-of-distribution generalization (explicit reasoning): Humans, despite having no experience or memory of a specific matter, can still find a way to solve a problem they've never encountered before. (e.g., Grigori Perelman solved the Poincaré Conjecture, a century-old problem in mathematics, proving that every simply connected, closed 3D space is topologically equivalent to the 3-sphere.)
For proper research on the matter, I recommend "On the Measure of Intelligence," by François Chollet.
However, today, AI has only conquered compression, and I would argue it's a "weak conquest," as models still confabulate far more than they should and, as we saw last Thursday, their capacity to reason over their knowledge is extremely poor (most of the time, they aren't reasoning but memorizing the reasoning processes).
As for explicit reasoning, the most coveted type of intelligence, it's entirely missing in current AI models, as famous benchmarks like ARC-AGI have proven: benchmarks where models barely go beyond 0%.
One great way to summarize this particular type of intelligence is through Jean Piagetâs definition:
"Intelligence is what you use when you don't know what to do: when neither innateness nor learning has prepared you for the particular situation."
Jean Piaget
Despite this provable reality, we've been told that AIs will be "at Ph.D.-level intelligence by 2026" or that we'll have "human-level AI in two years." These statements fall into two categories, depending on who's making them:
Gaslighting: {Insert start-up founder name} has billions of dollars on the line and needs to convince society that her/his AI is super intelligent.
Ignorance show-off: I am confusing being "textbook smart" with "smart." LLMs have memorized the entire public Internet. Reciting Shakespeare's poems down to the last comma doesn't make you an expert poet.
You don't have to take my word for it; just look at the example we analyzed on Thursday. Is this thing "as smart as a teenager"?
Luckily, many industry researchers are now openly discussing this fallacy, and the conversation has finally centered on solving the problem.
So, how are we going to solve it?
A New Breed of Models
Over the coming months, we will see huge progress in three directions that will considerably improve reasoning capabilities:
Data augmentation with synthetic data
Over-extended training
Search algorithms
The first one is obvious: increasing the quality of the data. Reasoning data (data where you clearly see the thought process) is a rare sight, and thus, researchers are creating the data themselves.
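To make that concrete, here is a minimal sketch of what such a pipeline can look like. Everything in it is an assumption for illustration: `ask_llm` stands in for whatever chat model you use, and the prompt template, seed problems, and answer filter are simplistic placeholders, not OpenAI's actual recipe.

```python
# A minimal sketch of a synthetic reasoning-data pipeline (illustrative only).
# `ask_llm(prompt) -> str` is a stand-in for any chat model you have access to.

COT_TEMPLATE = (
    "Solve the following problem. Think step by step, writing out every "
    "intermediate step, and end with a line of the form 'Answer: <answer>'.\n\n"
    "Problem: {problem}"
)

# Hypothetical seed problems paired with known reference answers.
SEED_PROBLEMS = [
    ("If a train travels 120 km in 1.5 hours, what is its average speed?", "80"),
    ("A shirt costs $40 after a 20% discount. What was the original price?", "50"),
]

def build_dataset(ask_llm, problems=SEED_PROBLEMS, samples_per_problem=4):
    """Sample several chain-of-thought traces per problem and keep only
    those whose final line contains the known reference answer."""
    dataset = []
    for problem, reference in problems:
        for _ in range(samples_per_problem):
            trace = ask_llm(COT_TEMPLATE.format(problem=problem))
            # Crude correctness filter: the last line must contain the answer.
            if reference in trace.strip().splitlines()[-1]:
                dataset.append({"prompt": problem, "completion": trace})
    return dataset
```

The key idea is the filter at the end: you generate many candidate thought processes and keep only the ones that land on a verifiably correct answer, turning cheap samples into reasoning data.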
Moving on, we have also realized that, besides creating "reasoning data," we don't necessarily need more of it but should train on it for longer.
As we saw on Thursday through grokking (extreme over-training) and as Meta hinted several times, training models for longer improves local generalization (implicit reasoning, or reasoning over "known knowns").
But what do I mean by that?
As models are all designed according to Occam's Razor (through machine-learning regularization techniques like weight decay or learning-rate decay), they are incentivized to find simpler solutions to problems.
"Entia non sunt multiplicanda praeter necessitatem," translating to "Don't complicate the description of something beyond what is necessary to explain it."
The simpler the solution, the better. That's Occam's Razor.
Therefore, running a model through the data several times naturally "prunes" its reasoning circuits.
For instance, if I merely memorize what a hairy cat looks like, I will remember all the minor details of its body (including ones that other cats won't have, like its hair). But if I understand what a cat is, I just need two or three cues (whiskers, slit-shaped eyes, and a tail, for instance).
Naturally, the second approach is simpler and more succinct, yielding better generalization: I'm less likely to be fooled by the particular traits of a given cat, which could make me mislabel a Sphynx as not a cat just because it's hairless.
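For intuition, here is what that Occam's Razor pressure looks like in a minimal PyTorch sketch: the `weight_decay` term in AdamW penalizes large weights (complex solutions) while the loop revisits the same data far past the usual stopping point. The toy model, hyperparameters, and epoch count are placeholders, not Meta's or anyone else's actual recipe.

```python
import torch
from torch import nn

# Toy model standing in for a transformer in this sketch.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

# Weight decay nudges parameters toward smaller norms, i.e., simpler
# solutions: the regularization pressure described above.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)
loss_fn = nn.CrossEntropyLoss()

def overtrain(dataloader, epochs=10_000):
    """Grokking-style over-training: many passes over the *same* data.
    Memorization happens early; the regularizer keeps pruning toward a
    simpler, better-generalizing solution long after that point."""
    for _ in range(epochs):
        for x, y in dataloader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
```

Nothing exotic is happening here; the point is that generalization comes from the tension between fitting the data and the regularizer's constant pull toward simplicity, given enough passes for the simple solution to win.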
Therefore, improvements in data augmentation and over-extended training will largely shape the next generation of AI models, and that is also the perfect summary of how Meta trained Llama 3.1.
Consider that Mark Zuckerberg recently stated that Llama 4 will require "ten times more compute than Llama 3," which works out to around 15 times the compute OpenAI used for GPT-4 (recall that Llama 3.1 consumed roughly 1.5 times GPT-4's compute).
Thus, knowing that Meta is quite obsessed with allowing frontier models to run on a single 8-GPU node, it's pretty clear they will explore over-training rather than simply making the model arbitrarily bigger.
Finally, we have the third method: combining search algorithms with LLMs, which can be summarized as letting the model generate multiple solutions to a problem and choosing the best one.
Commonly referred to as "long-inference models," I've discussed them several times in detail, and I expect competitors' upcoming models (GPT-5, Claude 4, or Gemini 3.0) to be members of this model family.
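The simplest instance of this idea is best-of-N sampling, sketched below under stated assumptions: `ask_llm` samples one candidate solution (with temperature above zero so candidates differ), and `score` is a stand-in for a verifier or reward model. Real long-inference systems use more elaborate search, but the principle is the same.

```python
from collections import Counter

def best_of_n(ask_llm, score, problem, n=16):
    """Sample n candidate solutions and return the highest-scoring one.
    ask_llm(prompt) -> str samples one solution; score(problem, sol) -> float
    is a verifier or reward model's quality estimate."""
    candidates = [ask_llm(problem) for _ in range(n)]
    return max(candidates, key=lambda sol: score(problem, sol))

def self_consistency(ask_llm, extract_answer, problem, n=16):
    """Scorer-free variant: sample n reasoning chains and majority-vote
    on the extracted final answers."""
    answers = [extract_answer(ask_llm(problem)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

The trade-off is explicit: you spend N times the inference compute to buy accuracy, which is why these are "long-inference" models.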
All things considered, now that we have finally stopped pretending to have high-intelligence models, and now that we have a decent intuition for how to build them through the three techniques mentioned above, I'm convinced that the new generation (GPT-5, Llama 4, etc.) will see the emergence of actually smart models, not ones pretending to be.
But improved reasoning isn't the only leap we are about to see.
Agents and Emotion
We are seeing the emergence of emotionally aware, collaborative, and adaptable agents.
Multimodality has gone a step further.
A handful of minutes. That's all I need to convince you that truly multimodal models are beautiful and extremely powerful. This video (the link goes straight to the interesting parts) will show you what OpenAI's new Advanced Voice Mode is capable of.
It will blow your mind.
With these capabilities, AIs will feel much more human, and killer applications like AI companions will become a new experience as AI becomes capable of understanding and imitating emotions.
But in terms of actual utility, few features are more exciting than the recent focus on improved function calling.
Efficient Tool Use
Models are becoming much more capable of structuring their outputs (generating data in a specific, fixed format). This may seem irrelevant, but it's a crucial development for agents.
Agents are AIs that can interact with third-party software. Besides some bold exceptions like Rabbit (borderline scam) and Adept (acquired by Amazon), this interaction mainly occurs through APIs.
In a nutshell, APIs allow the AI to perform actions on other software without using its user interface, by making structured "calls." This is great because interfaces change continuously, while APIs don't.
On the flip side, APIs are rigid, requiring a specific and often ad hoc structure. As you may foresee, LLMs do not fare well in these situations, making syntax or schema errors that, while subtle, make the process fail.
Luckily, models are becoming much better at this procedure (referred to as function calling in the AI world), to the point that OpenAI just announced "Structured Outputs" this very week, guaranteeing perfectly structured output on demand.
This is so huge that Sam Altman himself published the news.
by very popular demand, structured outputs in the API:
Sam Altman (@sama)
5:56 PM · Aug 6, 2024
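For the curious, below is a minimal sketch of what the feature looks like in code, based on OpenAI's announcement. The schema itself (a made-up `math_step` format for a tutoring-style answer) is my own illustration; `"strict": True` is what guarantees the output matches it exactly.

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# A made-up schema for a worked answer; "strict": True guarantees the
# model's output matches it exactly.
schema = {
    "name": "math_step",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "explanation": {"type": "string"},
            "final_answer": {"type": "string"},
        },
        "required": ["explanation", "final_answer"],
        "additionalProperties": False,
    },
}

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # the first model supporting Structured Outputs
    messages=[{"role": "user", "content": "Solve 8x + 31 = 2. Show your work."}],
    response_format={"type": "json_schema", "json_schema": schema},
)

print(response.choices[0].message.content)  # valid JSON matching the schema
```

Before this, you had to prompt the model to "answer in JSON" and hope; now the structure is enforced at generation time, which is exactly what brittle API integrations need.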
Long story short: smarter, emotionally aware reasoners with efficient tool use are coming soon. Knowing that, what are the trillion-dollar applications these new capabilities unlock?
Deflating the Education Bubble
According to Morgan Stanley, the global education industry is worth more than $6 trillion today and could be worth $8 trillion by 2030.
But are students better taught than they were a decade ago? Most probably not. In fact, I believe that the education system should be turned upside down.
AI Deflationary Pressures
Despite the system not preparing our future generations as it should, college tuition has grown almost twice as fast (68%) as the Consumer Price Index (39%) over the past twenty years. College tuition costs have even outpaced the increase in median home prices.
With top business schools looking more like private hedge funds with education businesses on the side, and with all rich countries seeing a steady increase in inequality, education is becoming a rich people's game.
Figure: Richest 10% income share of the economy.
Knowing this, AI Tutors are the perfect killer application to deflate the education bubble.
Ever Helpful
With real examples like Khanmigo, a GPT-4-powered AI tutor by Khan Academy, AI tutors offer the perfect mix of economic incentives and technology.
For starters, it's an entirely conversational use case, ideal for LLMs. And if there's one thing LLMs excel at, it's idea generation. Thus, as the kid engages, the agent can generate millions of examples on a specific topic so the student can learn and internalize topics.
They are also great at summarizing and simplifying things with examples and illustrations, allowing kids to see things in a simpler light.
They are eternally patient, undeterred by the student's difficulties. And now they can also convey emotion, ensuring that the tutor's voice is comforting, non-judgmental, and encouraging.
Moreover, their improved function calling capabilities allow these models to call powerful tools like Wolfram Alpha or Napkin AI to perform calculations without errors (thereby reducing hallucinations) and present results in a more appealing, visual form.
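To illustrate, here is a hedged sketch of how a tutor agent might expose a calculator tool through OpenAI-style function calling. The `query_wolfram` tool, its schema, and the model choice are hypothetical; the actual Wolfram Alpha integration is left out.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical tool definition; the model never runs it, it only *requests* it.
tools = [{
    "type": "function",
    "function": {
        "name": "query_wolfram",  # hypothetical name for a Wolfram Alpha bridge
        "description": "Evaluate a math expression exactly, without arithmetic errors.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the integral of x^2 from 0 to 3?"}],
    tools=tools,
)

# If the model decides the tool is needed, it returns a structured call
# instead of free text; the agent executes it and feeds the result back.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```

The division of labor is the point: the LLM handles the conversation and decides when to delegate, while the deterministic tool handles the math it would otherwise hallucinate.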
All this, coupled with the insane unit economics of models like GPT-4o mini, and with the steady rise of hybrid architectures that could unlock the infinite-length context windows we discussed last week, will allow an AI tutor to sustain a life-long interaction with its user, setting the stage for a Cambrian explosion of AI tutors.
Winners & Losers
Regarding losers, it's pretty straightforward. AI will have a commoditizing effect on every industry it disrupts. Thus, unless you have a powerful brand (Harvard in the US or Eton College in the UK), you will have to drop prices as many skip college altogether.
As for the winners, it's a similar outcome to AI companions. AI tutors require advanced intelligence, both implicit (they must reason over the student's solutions and mistakes) and explicit (they must be able to search for variations of a problem), which demands the most powerful models, all of which run in the cloud.
Therefore, hyperscalers have a real advantage. However, the biggest winner here will be OpenAI, as I feel they have been preparing for this precise scenario for quite some time.
For starters, Sam Altman has openly discussed AI tutors as one of the most powerful applications. They've also established partnerships with educational start-ups like Khan Academy and have a history of integrating LLMs with math tools like Wolfram Alpha.
They have a native code interpreter in the ChatGPT app so that the tutor can write and execute code, which is essential for this use case, and, as mentioned earlier, they are putting a lot of effort into structured responses, key for more ad hoc integration with other educational tools.
Importantly, education is a historic opportunity for OpenAI. It's a huge economic use case, a great story to tell (OpenAI democratizes education for the poor), and the literal embodiment of a use case built for conversational assistants.
And the most critical differentiator of all: it's a use case that requires native multimodality. Unlike other rivals (Claude 3.5 Sonnet or Llama 3.1), GPT-4o is natively multimodal-in/multimodal-out, something not even Google Gemini can claim.
What this means is that, unlike the rest, GPT-4o is not a text-only LLM: it doesn't require everything to be translated into text and back whenever it receives audio or video.
GPT-4o doesn't care because, just as for humans, an audio clip of a bark and a text describing a dog barking are the same thing: a dog barking.
But if democratizing education is worth fighting for, this next use case is as spicy as they get. AI is coming for a very particular corporate job, and it's coming fast.
Subscribe to the Full Premium package to read the rest.