Is AI Truly Intelligent?

šŸ TheTechOasis šŸ


In AI, learning is winning.

This post is part of the Premium Subscription, in which we analyze technology trends, companies, products, and markets to answer AI's most pressing questions.

You will also find this deep dive in TheWhiteBox's 'Key Future Trends in AI', for future reference and to ask me any questions you may have.

Over the last year, I've grown tired of all the hyperbole, moonshot takes, and unrealized promises around AI, so I've decided it's time to call bullshit.

After we studied the markets a few weeks ago, we realized that the hype around AI was largely unmatched by actual demand, which is confusing considering how smart AI allegedly is.

So, it got me thinking: Is the 'I' in 'AI' actually real? Is AI truly 'intelligent'?

To answer that, I've spent weeks feasting on the opinions of both sides, from LLM enthusiasts to those who think current AI is anything but 'intelligent' and that the entire world is wrong, very wrong.

So, what should you expect? This article will:

  • Give you a reality check on current state-of-the-art AI, so that you don't take for granted what many do: AI may not be what it seems; 'known knowns' can be deceiving… or a straight lie.

  • Give you a clear idea of what's coming in AI to breach the next frontier of intelligence, reasoning, and of the most likely form the next generation of models will take, based on the latest research.

Finally, we will reflect on the question that could send our markets into the abyss and shatter the hype around AI for years:

What if LLMs aren't the way to intelligence?

Let's dive in.

When AI Became a Religion

If you follow the AI industry, you will realize that its following is driven more by faith than by proof, which makes it similar to a religion.

AI is a tremendously inductive field, meaning that most of the 'breakthroughs' we see in the industry are not deduced from first principles by humans, but observed empirically in the AIs themselves.

As researchers stumble upon breakthroughs instead of actively deducing them, all those claims that 'x or y will lead to Artificial General Intelligence (AGI)' are little better than guesses, even when they come from people at the forefront of research.

Bluntly put, while the world, and the markets, take these statements from experts as facts, they are, in fact, beliefs.

Yet when was the last time you heard an undeniable fact from any of the incumbents? You've probably grown tired of hearing unsubstantiated claims that 'AI is already as smart as high schoolers' or even 'as smart as undergraduates.'

The markets, extremely high on 'AI cope,' instantly buy these statements, but what are they based on? What does it mean to be 'as smart as x'?

This leads to one of humanity's greatest unanswered questions: what is intelligence, in non-esoteric terms?

General Statements and an Unknown Known

While I've read countless definitions of intelligence, the one that stuck with me is by far the most intuitive and simple:

"Intelligence is what you use when you don't know what to do."

Jean Piaget

In other words, intelligence is the cognitive action you perform when the solution to a problem can't be retrieved from your past experiences, your knowledge, or your memory.

However, this is not the definition being promoted by the incumbents.

LLMs Are Data Compressors

A more popular (and, quite frankly, convenient) conception of intelligence is compression: the capacity of a 'being' to find the key patterns in data. And the best example of this is none other than Large Language Models (LLMs).

Like any other generative AI model, they are taught to replicate data; their performance is explicitly evaluated by how well they imitate the original data.
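In standard terms, and this is the textbook formulation rather than anything specific to a particular lab, that objective is next-token prediction: the model is trained to minimize the cross-entropy between the next word it predicts and the next word that actually appears, summed over the training corpus:

$$
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
$$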

Crucially, this corpus represents upwards of 99% of the publicly available data, amounting to multiple trillions of words once curated.

But how large is that? Using the LLaMa models as an example, we know they were fed roughly 15 trillion tokens. At two bytes per token, that equals about 30 TB of data.

Models are much smaller:

  • an order of magnitude smaller for frontier AI models,

  • almost 2000 times smaller for models like LLaMa 3-8B (16 GB),

  • and around 96,000 times smaller for Meta's new MobileLLM.

Thus, models can't simply memorize the dataset by rote. So how can such small models learn so much data?

You guessed it: compression.

Simply put, for such a small model to consistently replicate the original corpus, it must learn the key patterns in the data (syntax, grammar, etc.) and use those priors to generalize across the overall corpus.
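To make the scale gap concrete, here is a quick back-of-the-envelope sketch. The frontier-model parameter count and the two-bytes-per-token and two-bytes-per-weight figures are rough assumptions of mine, only meant to illustrate the orders of magnitude involved:

```python
# Rough compression-ratio arithmetic; every figure below is an approximation.
GB = 1e9

corpus_bytes = 15e12 * 2  # ~15T tokens at ~2 bytes each -> ~30 TB

model_bytes = {
    "frontier-scale model (assumed ~1.8T params, 2 bytes each)": 1.8e12 * 2,
    "LLaMa 3-8B (8B params, 2 bytes each)": 8e9 * 2,  # the ~16 GB cited above
}

for name, size in model_bytes.items():
    ratio = corpus_bytes / size
    print(f"{name}: ~{size / GB:,.0f} GB, corpus is ~{ratio:,.0f}x larger")
```

The same arithmetic is what puts a sub-gigabyte model like MobileLLM tens of thousands of times below the size of its training data.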

Now, would that count as intelligence? Specifically, are memorization and intelligence compatible, or is memorizing a way to deceive us into believing LLMs are intelligent?

Does Memorization Count?

While recapping how LLMs learn, you might have realized something: they work like a database; by learning to predict the next word, they are essentially memorizing the data.

Again, it's clearly not rote memorization (compression proves otherwise), but it is memorization nonetheless.

This raises the question: when these models perform reasoning, which they clearly do, is it an act of novel reasoning, or are they simply regurgitating a reasoning chain they saw hundreds of times during training?

Do they understand how to solve a problem, or are they simply memorizing the thought process?

Answering this question is crucial because, in our current LLM benchmarks, memorization plays a huge role and, in some cases, is all you need.

If you take a naive look at the results of these models on some of the most popular benchmarks, like MMLU, you will think they are smarter than most humans.

Indeed, we have already seen some of these models confidently pass the bar exam or the SAT.

But here's the thing: all these tests, or at least a large portion of them, can be memorized. In other words, if models amass enough knowledge, they can simply memorize their way through every task in those exams and benchmarks without truly understanding them.

Of course, this isn't so different from how most humans proceed in life. Most of our actions are unconscious, based on experiences and knowledge we've gathered over our lifetimes… which is why incumbents may deceive us into thinking AI is intelligent when, in reality, it might not be intelligent at all.

But how can we prove that? The answer lies in psychology.

Systems of Thinking

Most of our daily actions are unconscious.

Often referred to as System 1 thinking, these are actions we perform instinctively, with no conscious thought whatsoever. As System 1 is fundamental to our survival (it frees up 'thought space' for non-intuitive tasks), we can certainly make the case that these actions are intelligent.

But you will agree with me that humans also perform well in situations where we don't really know what to do, situations that demand novelty. In those cases, our prefrontal cortex kicks in and we engage in conscious thought, System 2 thinking, to solve problems our instincts can't.

And how does AI fare in those scenarios?

Memorization-resistant Benchmarks

Absolutely awful.

When evaluated on ARC-AGI, a fairly novel benchmark created by the legendary François Chollet, LLMs perform horribly.

Specifically, frontier LLMs reach only a measly 9% (GPT-4o), while humans consistently average around 80% without much prior training. Each ARC task presents a handful of input-output grid pairs and asks the solver to infer the underlying transformation rule and apply it to a new input.
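To get a feel for the format, here is a toy, ARC-style puzzle of my own invention; it is not taken from the actual ARC-AGI dataset, and real tasks are far more varied and harder to reduce to a one-line rule:

```python
# A made-up, ARC-style task: the hidden rule is "flip each grid horizontally".
# A solver sees the demonstration pairs and must infer and apply the rule.

train_pairs = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 3, 0], [0, 4, 0]], [[0, 3, 3], [0, 4, 0]]),
]
test_input = [[5, 0, 0], [0, 6, 0]]

def apply_rule(grid):
    # The rule a human infers almost instantly from the demonstrations.
    return [list(reversed(row)) for row in grid]

# Check the inferred rule against the demonstrations, then solve the test case.
assert all(apply_rule(x) == y for x, y in train_pairs)
print(apply_rule(test_input))  # [[0, 0, 5], [0, 6, 0]]
```

The point is that nothing in a training corpus spells out this exact rule; the solver has to induce it on the spot.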

According to Chollet himself, this is because the benchmark is deliberately 'memorization resistant': a set of problems that LLMs couldn't have seen beforehand.

In other words, when confronted with genuine novelty, all frontier AI models crumble like wet paper.

In summary, while LLMs have conquered memorization and pattern matching (System 1), they have yet to conquer proper reasoning (System 2).

So, are LLMs fooling us? Here's the dark secret: most people in the industry will agree AI is not intelligent, although not publicly, to avoid scaring the markets.

Therefore, the question becomes more about whether the world has thrown a trillion dollars into the wrong place: LLMs. As you may imagine, many think that is the case.

Conquering Reasoning

Although no one knows the outcome, people in the industry are taking very strong positions on how reasoning will be conquered.

All We Need is Compute

In the first camp, we have those who argue that all we need is more compute. A prime example is Leopold Aschenbrenner, a former OpenAI researcher.

He argues that by throwing more compute at the problem, we will create researcher/engineer-level AI by 2027-2028, perhaps even reaching AGI by then.

To achieve this, he expects an eight-order-of-magnitude increase in compute relative to what was used to train GPT-4. In other words, such a model would require one hundred million times more compute than GPT-4, itself an already massive endeavor.

Now, beyond the number sounding completely outrageous, I don't understand the utter disregard for the fact that, under current constraints, we will never have sufficient electrical power to train that model, at least on the timescale he envisions.

And another important thing: if we need 10^32 FLOPs, an unfathomable amount of learning, to reach human-researcher-level intelligence, isn't that telling us that, maybe, our current algorithms and heuristics aren't good enough?

That, maybe, LLMs ain't it?
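To put that figure in perspective, here is a rough, purely illustrative feasibility check. Every number below (per-GPU throughput, utilization, fleet size) is an assumption of mine, not a figure from Aschenbrenner or any lab:

```python
# How long would ~1e32 FLOPs of training take on an enormous GPU fleet?
target_flops = 1e32
flops_per_gpu = 1e15      # roughly the order of a modern accelerator's peak
utilization = 0.4         # optimistic sustained utilization
num_gpus = 10_000_000     # a fleet far larger than anything deployed today

seconds = target_flops / (flops_per_gpu * utilization * num_gpus)
years = seconds / (3600 * 24 * 365)
print(f"~{years:,.0f} years of continuous training")  # roughly 800 years
```

Even with generous assumptions, the hardware and power implications remain staggering, which is precisely the point.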

Noam Brown, reasoning lead at OpenAI, best summarized my stance on this when asked whether scaling alone would be enough to reach AGI from our current models.

Although he believes in LLMs (then again, working at OpenAI, saying otherwise could be a PR catastrophe), it's clear he is skeptical about whether we are going about it the right way.

And the fact that generative AI models seem to be in doubt, owing to their terrible learning curves, leads us to the next stop: the greatest LLM skeptic of our time, Yann LeCun.

GenAI Is Not the Answer

Despite being Chief AI Scientist at Meta, one of the companies building the best LLMs, Yann LeCun is notoriously skeptical of LLMs, and of generative AI models in general, as a way to reach general intelligence.

In his view, the quality of the representations these models learn (the measure of how well they understand the world) is extremely poor, explaining why they are terrible learners.

Here, Yann isn't criticizing how well LLMs compress data; that would be dumb.

What he's implying is that we can't expect to build AGI from a model that sees the world through the lens of text. LLMs essentially build a representation of the world on top of another representation of the world (text), and embodiment and grounding in reality are still required.

In his view, the conquest of true intelligence will come with JEPAs, or Joint-Embedding Predictive Architectures.

The main takeaway is that, according to Yann, these models learn the key aspects of the world and ignore the rest, while a generative model has to learn every single minor detail of real life to work, since it must generate every single word, image, or video.
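To illustrate the difference in objectives, here is a heavily simplified sketch, in no way Meta's actual JEPA code (real JEPAs also rely on tricks like a separate target encoder with stop-gradients to avoid collapse): the generative path is scored on reconstructing the raw target, while the JEPA-style path only predicts a compact embedding of it.

```python
import torch
import torch.nn.functional as F

x_context = torch.randn(1, 256)  # e.g., the visible part of an input
x_target = torch.randn(1, 256)   # e.g., the masked part to be predicted

encoder = torch.nn.Linear(256, 32)    # stand-in for a learned encoder
predictor = torch.nn.Linear(32, 32)   # predicts target *embeddings*
decoder = torch.nn.Linear(32, 256)    # only the generative path needs this

# Generative objective: reconstruct the raw target, detail by detail.
generative_loss = F.mse_loss(decoder(encoder(x_context)), x_target)

# JEPA-style objective: predict the target's embedding and ignore the rest.
jepa_loss = F.mse_loss(predictor(encoder(x_context)), encoder(x_target))
```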

That said, I don't want to spend that much time on JEPAs because we don't have actual applications based on them, and, importantly, the next proposal is the one that asks the right questions.

Active Inference

One of the hardest things to come to terms with about current models is that they have a knowledge cutoff. In other words, their existence can be divided into two phases: learning and inference.

Once the learning phase ends, the model no longer learns anything else (defining learning as the model adapting to new skills and knowledge).

In-context learning, the capacity of these models to use exogenous context to solve new tasks (and the primary driver behind RAG), isn't really learning but 'learning on the go'; as soon as the model loses access to that new context, it automatically forgets it.
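A minimal sketch of that point, using a hypothetical interface rather than any real library: at inference time the weights are frozen, so retrieved context shapes a single answer and then vanishes.

```python
class FrozenLLM:
    def __init__(self, checkpoint: str):
        self.weights = checkpoint  # fixed the moment training ended

    def generate(self, prompt: str, context: str = "") -> str:
        # `context` (e.g., RAG retrievals) shapes this one response only;
        # nothing in here ever updates self.weights.
        return f"answer to {prompt!r} using {len(context)} chars of context"

model = FrozenLLM("checkpoint-2023")
todays_docs = "documents retrieved today"
print(model.generate("What changed this week?", context=todays_docs))
print(model.generate("What changed this week?"))  # the new 'knowledge' is gone
```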

That doesn't make that much sense, right?

In an ever-changing world like ours, if we ever expect these models to coexist with us physically, their inability to learn from new experiences doesn't sound like the most promising path toward AGI, does it?

Therefore, those who fall into this camp assume we need new algorithmic breakthroughs, discoveries that go beyond LLMs, to unlock reasoning through continuous adaptation, just as a human would; an idea known as active learning (or active inference, as Karl Friston, the world-famous neuroscientist, would put it).

Models in a never-ending state of learning, like humans.

All sides considered, we are finally ready to analyze and understand what the world's most brilliant minds point to as the next reasoning frontier, including OpenAI's latest leaked intentions.

Subscribe to Leaders to read the rest.
