Is AI Truly Intelligent?
This post is part of the Premium Subscription, in which we analyze technology trends, companies, products, and markets to answer AI's most pressing questions.
You will also find this deep dive in TheWhiteBox's "Key Future Trends in AI" section, for future reference and to ask me any questions you may have.
Over the last year, I've grown tired of all this hyperbole, moonshot takes, and unrealized promises around AI, so I've decided it's time to call bullshit.
After we studied the markets a few weeks ago, we realized that the hype around AI was largely unmatched by real demand, which is confusing considering how smart AI allegedly is.
So, it got me thinking: Is the "I" in "AI" actually real? Is AI actually "intelligent"?
To answer that, I've spent weeks poring over the opinions of both sides, from LLM enthusiasts to those who think current AI is anything but "intelligent" and that the entire world is wrong, very wrong.
So, what should you expect? This article will:
Give you a reality check on current state-of-the-art AI so that you don't take for granted what many do: AI may not be what it seems; "known knowns" can be deceiving... or an outright lie.
Give you a clear idea of what's to come in AI to breach the next frontier of intelligence, reasoning, and of the most likely form the next generation of models will take, based on the latest research.
Finally, we will reflect on the question that could send our markets into the abyss and shatter the hype around AI for years:
What if LLMs aren't the way to intelligence?
Let's dive in.
When AI Became a Religion
If you follow the AI industry, you will realize that its following is driven more by faith than by proof, which makes it resemble a religion.
AI is tremendously inductive, meaning that most of the "breakthroughs" we see in the industry are not deduced by humans from theory but stumbled upon while observing what the AIs themselves do.
As researchers stumble upon breakthroughs instead of actively deducing them, all those claims that "x or y will lead to Artificial General Intelligence (AGI)" are guesses, as good as yours or mine, even when they come from people at the forefront of research.
Bluntly put, while the world (and the markets) takes these statements from experts as facts, they are, at bottom, beliefs.
Indeed, when was the last time you heard an undeniable fact from any of the incumbents? Yet you've probably grown tired of hearing unsubstantiated claims that AI is already "as smart as high schoolers" or even "as smart as undergraduates."
The markets, extremely high on "AI cope," instantly buy these statements, but what are these based on? What does it mean to be "as smart as x"?
This leads to one of humanity's greatest unanswered questions: what is intelligence, defined in a non-esoteric way?
General Statements and an Unknown Known
While I've read countless definitions of intelligence, the one that stuck is by far the most intuitive and simple:
Intelligence is what you use when you don't know what to do
In other words, intelligence is the cognitive action you perform when the solution to a problem can't be retrieved from your past experience, your knowledge, or your memory.
However, this is not the definition being propelled by incumbents.
LLMs are Data Compressors
A more popular (and, quite frankly, convenient) conception of intelligence is compression, which is the capacity of a "being" to find the key patterns in data. And the best example of this is none other than Large Language Models (LLMs).
Like any other generative AI model, they are taught to replicate data; their performance is explicitly evaluated by how well they imitate the original data.
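To make that concrete, here is a toy sketch of the training objective (the miniature "model" and its probabilities are invented for illustration; a real LLM computes these probabilities with a neural network over tens of thousands of tokens). The model is scored on how much probability it assigns to the actual next word in the corpus:

```python
# Toy sketch (not any real model): LLM pre-training boils down to
# "predict the next token and get scored on how well you imitate the data".
import math

corpus = ["the", "cat", "sat", "on", "the", "mat"]

# A hypothetical "model": for each context word, the probability it assigns
# to every possible next word. A real LLM produces these with a neural net.
model = {
    "the": {"cat": 0.5, "mat": 0.4, "sat": 0.05, "on": 0.05},
    "cat": {"sat": 0.9, "the": 0.05, "on": 0.03, "mat": 0.02},
    "sat": {"on": 0.9, "the": 0.05, "cat": 0.03, "mat": 0.02},
    "on":  {"the": 0.9, "cat": 0.05, "sat": 0.03, "mat": 0.02},
}

# Training loss = average negative log-likelihood of the true next token.
# The lower it is, the more faithfully the model reproduces the corpus.
pairs = list(zip(corpus[:-1], corpus[1:]))
nll = 0.0
for context, next_word in pairs:
    nll -= math.log(model[context][next_word])
print(f"average next-token loss: {nll / len(pairs):.3f}")
```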
And the corpus they imitate represents over 99% of the publicly available data: multiple trillions of words once curated.
But how large? Using the LLaMa models as an example, we know they were fed 15 trillion words. At two bytes per word, that equals 30 TB of data.
Models are much smaller:
an order of magnitude smaller for frontier AI models,
almost 2000 times smaller for models like LLaMa 3-8B (16 GB),
and around 96,000 times smaller for Meta's new MobileLLM.
Thus, models can't simply memorize the dataset by rote. So how can such comparatively tiny models capture so much data?
You guessed: compression.
Simply put, for such a small model to consistently replicate the original corpus, it must learn the key data patterns (syntax, grammar, etc.) and use these priors to generalize across the overall corpus.
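For the curious, the arithmetic behind those ratios fits in a few lines. Note that the byte sizes for the frontier-scale model and the MobileLLM-class model are my own assumptions, reverse-engineered from the ratios quoted above:

```python
# Back-of-the-envelope compression ratios (two of the sizes are assumptions).
dataset_bytes = 15e12 * 2        # 15 trillion words at ~2 bytes/word ≈ 30 TB

model_sizes = {
    "frontier-scale model (~3 TB, assumed)": 3e12,   # "an order of magnitude smaller"
    "LLaMa 3-8B (16 GB)": 16e9,
    "MobileLLM-class model (~312 MB, assumed)": 312e6,
}

for name, size_bytes in model_sizes.items():
    ratio = dataset_bytes / size_bytes
    print(f"{name}: ~{ratio:,.0f}x smaller than its training data")
```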
Now, would that count as intelligence? Specifically, are memorization and intelligence compatible, or is memorizing a way to deceive us into believing LLMs are intelligent?
Does Memorization Count?
As we recapped how LLMs learn, you might have realized something: they are like a database; by learning to predict the next word, they are essentially memorizing the data.
Again, it's clear it isn't rote memorization (compression proves otherwise) but memorization nonetheless.
This raises the question: when these models perform reasoning, which they clearly do, is it an act of novel reasoning, or are they simply regurgitating the same reasoning chains they saw hundreds of times during training?
Do they understand how to solve a problem, or are they simply memorizing the thought process?
Answering this question is crucial because, in our current LLM benchmarks, memorization plays a huge role and, in some cases, is all you need.
If you take a naive look at these models' results on some of the most popular benchmarks, like MMLU, you will think they are smarter than most humans.
Indeed, we have already seen some of these models confidently pass the bar exam or the SAT.
But here's the thing: all these tests, or at least a large portion of them, can be memorized. In other words, if models amass enough knowledge, they can simply memorize their way through every task in those exams and benchmarks without truly understanding them.
Of course, this isn't that different from how most humans proceed in life. Most of our actions are unconscious, based on experiences and knowledge we've gathered over our lifetimes... which is why incumbents may deceive us into thinking AI is intelligent when, in reality, it might not be. Like, at all.
But how can we prove that? The answer lies in psychology.
Systems of thinking
Most of our daily actions are unconscious.
These actions, often referred to as System 1 thinking, are performed instinctively, with no conscious thought whatsoever. And since System 1 is still fundamental to our survival (it frees up "thought space" for non-intuitive tasks), we can certainly make the case that these actions are intelligent.
But you will agree with me that humans also perform well in situations where we don't really know what to do, situations that require novelty. In those situations, our prefrontal cortex kicks in, and we engage in conscious thought, System 2 thinking, to solve a problem our instincts can't solve.
And how does AI fare in those scenarios?
Memorization-resistant Benchmarks
Absolutely awful.
When evaluated on the ARC-AGI benchmark, a fairly novel evaluation created by the legendary François Chollet, LLMs perform horribly.
Specifically, frontier LLMs reach a measly 9% (GPT-4o), while humans consistently average around 80% without much prior training. The image below shows an example from the dataset.
According to the man himself, this is because the benchmark is deliberately "memorization resistant": a set of problems that LLMs couldn't have seen beforehand.
In other words, when confronted by novelty for the first time, all frontier AI models crumble like wet paper.
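To give a feel for what "memorization resistant" means, here is an invented ARC-style task sketched in Python. The grids below are made up for illustration and are not from the actual dataset, but real tasks follow the same input/output-pairs format: the solver sees a handful of example transformations and must infer the rule well enough to complete a new grid it has never seen before.

```python
# Invented ARC-style task (illustrative only, not from the real dataset).
# Grids are small matrices of color codes (0 = black, 2 = red).
# The hidden rule in this toy example: mirror the grid left-to-right.
task = {
    "train": [
        {"input":  [[2, 0, 0],
                    [0, 0, 0],
                    [2, 0, 0]],
         "output": [[0, 0, 2],
                    [0, 0, 0],
                    [0, 0, 2]]},
        {"input":  [[0, 2, 2],
                    [0, 0, 0],
                    [0, 0, 0]],
         "output": [[2, 2, 0],
                    [0, 0, 0],
                    [0, 0, 0]]},
    ],
    "test": [
        # The solver must produce the mirrored grid by inferring the rule
        # from the two examples above, with no prior exposure to this task.
        {"input": [[0, 0, 2],
                   [2, 0, 0],
                   [0, 0, 0]]},
    ],
}
```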
In summary, while LLMs have conquered memorization and pattern matching (System 1), they must still conquer proper reasoning.
So, are LLMs fooling us? Here's the dark secret: most people in the industry, although not publicly (to avoid scaring the markets), will agree AI is not intelligent.
Therefore, the question becomes more about whether the world has thrown a trillion dollars into the wrong place: LLMs. As you may imagine, many think that is the case.
Conquering Reasoning
Although no one knows the outcome, people in the industry are taking very strong positions on how reasoning will be conquered.
All We Need is Compute
In the first camp, we have those who argue that all we need is more compute. A prime example is Leopold Aschenbrenner, an ex-OpenAI researcher.
He argues that by throwing more compute at the problem, we will create researcher/engineer-level AI by 2027-2028, possibly even reaching AGI by then.
To achieve this, he expects an eight-order-of-magnitude increase in compute relative to what was used to train GPT-4. In other words, that model would require one hundred million times more compute than GPT-4, itself an already massive endeavor.
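To get a feel for that number, here is a quick back-of-the-envelope. The hardware figures are my own assumptions for illustration, and the compute target uses the ~10³² FLOPs ballpark discussed just below:

```python
# Illustrative only: the hardware numbers below are assumptions, not official figures.
ooms = 8
multiplier = 10 ** ooms                     # eight orders of magnitude = 100,000,000x
print(f"{multiplier:,}x GPT-4's training compute")

target_flops = 1e32                         # the ballpark figure discussed below
gpu_flops_per_s = 1e15                      # ~1 PFLOP/s per modern accelerator (assumed)
cluster_gpus = 100_000                      # a very large training cluster (assumed)

seconds = target_flops / (gpu_flops_per_s * cluster_gpus)
print(f"time on that cluster: ~{seconds / 3.15e7:,.0f} years")   # ≈ 31,700 years
```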
Now, besides the fact that the number already sounds completely outrageous, I don't understand the utter disregard for the fact that, under current constraints, we will never have sufficient electrical power to train that model, at least on the timescale he envisions.
And another important thing: if we need 10³² FLOPs, an unfathomable amount of learning, to reach human-researcher-level intelligence, isn't that telling us that, maybe, our current algorithms and heuristics aren't good enough?
That, maybe, LLMs ain't it?
Noam Brown, reasoning lead at OpenAI, best summarizes my stance on this view when asked whether simply scaling our current models would be enough to reach AGI:
I think scaling existing techniques would get us there. But if these models can't even play tic tac toe competently, how much would we have to scale them to do even more complex tasks?
– Noam Brown (@polynoamial)
5:48 PM · Jun 20, 2024
Although he believes in LLMs (then again, working at OpenAI, saying otherwise could be a PR catastrophe), it's clear that he's skeptical about whether we are doing it the right way.
And the fact that Generative AI models are being called into question due to their terrible learning curves leads us to the next stop: the greatest LLM skeptic of our time, Yann LeCun.
GenAI Is Not the Answer
Despite being the Chief AI Scientist at Meta, one of the companies building the best LLMs, Yann LeCun is notoriously skeptical of LLMs, and of Generative AI models in general, as a way to reach true intelligence.
In his view, the quality of the representations these models learn (the measure of how well they understand the world) is extremely poor, explaining why they are terrible learners.
Here, Yann isn't criticizing how well LLMs compress data; that would be dumb.
What he's implying is that we can't pretend to build AGI from a model that sees the world through the lens of text: LLMs essentially build a representation of the world on top of another representation of the world (text), and embodiment and grounding in reality are still required.
In his view, the conquest of true intelligence will come with JEPAs, or Joint-Embedding Predictive Architectures.
The main takeaway is that, according to Yann, these models learn the key aspects of the world and ignore the rest, whereas a generative model must learn every single minor detail of real life to work, since it has to generate every single word, image, or video.
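For intuition only, here is a toy numpy sketch of that structural difference (this is not Meta's actual JEPA code; the "encoders" are random linear maps and every dimension is made up). The point is where the prediction error is measured: a JEPA-style objective compares predictions in a compact embedding space, while a generative objective must reconstruct the raw signal, detail and all.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "views" of the world, e.g. one video frame and the next one,
# represented here as raw 10,000-dimensional signals (a stand-in for pixels).
x = rng.normal(size=10_000)
y = rng.normal(size=10_000)

# Stand-in encoder/predictor/decoder: random linear maps, purely illustrative.
W_enc  = rng.normal(size=(64, 10_000)) / 100   # raw signal -> 64-d embedding
W_pred = rng.normal(size=(64, 64)) / 8         # embedding  -> predicted embedding
W_dec  = rng.normal(size=(10_000, 64)) / 8     # embedding  -> reconstructed raw signal

# JEPA-style objective: predict y's *embedding* from x's embedding.
# The error lives in a compact 64-d abstract space; pixel-level detail is ignored.
jepa_error = W_pred @ (W_enc @ x) - W_enc @ y
jepa_loss = np.mean(jepa_error ** 2)

# Generative objective: reconstruct y itself from x's embedding.
# The error spans all 10,000 raw dimensions, every minor detail included.
gen_error = W_dec @ (W_enc @ x) - y
gen_loss = np.mean(gen_error ** 2)

print(f"JEPA-style loss ({jepa_error.size} dims): {jepa_loss:.3f}")
print(f"Generative loss ({gen_error.size} dims): {gen_loss:.3f}")
```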
That said, I don't want to spend that much time on JEPAs because we don't have actual applications based on them, and, importantly, the next proposal is the one that asks the right questions.
Active Inference
One of the hardest things to come to terms with about current models is that they have a knowledge cutoff. In other words, their existence can be divided into two phases: learning and inference.
Once the learning phase ends, the model no longer learns anything else (defining learning as the model adapting to new skills and knowledge).
In-context learning, the capacity of these models to use exogenous context to solve new tasks, and the primary driver behind RAG, isn't really learning but "learning on the go": as soon as the model loses access to that new context, it forgets it.
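A toy sketch of the distinction (purely hypothetical objects, not any real API): context supplied at inference time exists only for that single call, whereas learning would mean the model's internal state changes and the knowledge persists.

```python
# Hypothetical illustration: "learning on the go" vs. actual learning.
class ToyModel:
    def __init__(self):
        # What the model permanently "knows" (a stand-in for its weights).
        self.weights = {"capital of France": "Paris"}

    def generate(self, question, context=None):
        # In-context / RAG path: the retrieved context is only visible
        # during this single call and is discarded right after it.
        if context and question in context:
            return context[question]
        return self.weights.get(question, "I don't know")

    def train(self, new_facts):
        # Actual learning: the weights themselves change and persist.
        self.weights.update(new_facts)

model = ToyModel()
retrieved = {"capital of Atlantis": "Poseidonis"}   # made-up retrieved document

print(model.generate("capital of Atlantis", context=retrieved))  # Poseidonis
print(model.generate("capital of Atlantis"))                     # I don't know (forgot)

model.train({"capital of Atlantis": "Poseidonis"})
print(model.generate("capital of Atlantis"))                     # Poseidonis (persists)
```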
That doesn't make that much sense, right?
In an ever-changing world like ours, if we ever expect these models to coexist with us physically, their inability to learn from new experiences doesn't sound like the most promising path toward AGI, does it?
Therefore, those who fall into this category argue that we need new algorithmic breakthroughs, discoveries that go beyond LLMs, to unlock reasoning through continuous adaptation, just as a human does; this is known as active learning (or active inference, as the world-famous neuroscientist Karl Friston would put it).
Models in a never-ending state of learning, like humans.
All sides considered, we are finally ready to analyze where the world's most brilliant minds are pointing as the next reasoning frontier, including OpenAI's latest leaked intentions.