Apple's AI Breakthrough Will Transform Your iPhone in 2024

Ignacio de Gregorio Noblejas
December 25, 2023

🏝 TheTechOasis 🏝

Breaking down the most advanced AI systems in the world to prepare you for your future.

5-minute weekly reads.

TLDR:

AI Research of the Week: Apple’s Discovery That Will Change the iPhone
Leaders: What does it take to Make an AI Superhuman?

🤯 AI Research of the week 🤯

Out of all the big-tech titans, one of them slept through the AI revolution of 2023.

Or so we thought.

Apple, the most valuable company and brand in the history of capitalism, has finally given real details of its intentions with Generative AI with the presentation of Flash-memory Large Language Models (LLMs), a discovery that can change how we deploy AI and make it completely ubiquitous in our lives.

And the main outcome?

Transforming the likes of the iPhone and all other Apple hardware into entirely new digital products.

The Year of Efficiency

Earlier this year, Mark Zuckerberg declared that 2023 would be the “year of efficiency.”

Of course, he meant cutting costs. But in our case, Apple is thinking about AI disruption.

The memory problem

If 2023 is the year humanity discovered it could train huge models, 2024 will be the year we will learn to run these huge models efficiently.

But why is this so important and why is Apple putting its focus here?

Basically, Apple’s entire AI value proposition is achieving what is considered the holy grail of Generative AI model serving: running huge LLMs on smartphones.

To understand how seemingly impossible this is, let’s see an example.

If we take LLaMa 2 7B, a minuscule model by today’s standards, the weights file alone occupies 14 GB of DRAM when stored in half-precision floating-point format.

Half-precision means that each parameter occupies 16 bits, or 2 bytes of memory. As we have 7 billion of them, that means you need 14 billion bytes, or 14 GB, just to host the model.

That is almost twice the RAM capacity of the iPhone 15 Pro (8 GB), meaning that even a very small LLM can’t be loaded whole on a smartphone.
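The same arithmetic works for any model size and precision. Here is a minimal sketch; the function name `weights_size_gb` and the use of decimal gigabytes (1 GB = 10^9 bytes) are assumptions made for illustration.

```python
def weights_size_gb(num_params: int, bits_per_param: int = 16) -> float:
    """Approximate size of a model's weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    return num_params * bytes_per_param / 1e9


IPHONE_15_PRO_RAM_GB = 8  # RAM figure quoted in the text

size_7b_fp16 = weights_size_gb(7_000_000_000, bits_per_param=16)
ratio_to_ram = size_7b_fp16 / IPHONE_15_PRO_RAM_GB

print(f"7B parameters at 16 bits: {size_7b_fp16:.0f} GB")
print(f"Ratio to iPhone 15 Pro RAM: {ratio_to_ram:.2f}x")
```

Note that this only counts the weights; anything else the model needs at run time comes on top.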

And for larger models, you need a state-of-the-art GPU cluster, requiring hundreds of thousands of US dollars in investment.

What’s more, that is pocket change if you want to run models like GPT-4, as companies like OpenAI literally burn that money, and more, on the daily.

By the looks of it, even though LLMs could have the potential to disrupt the smartphone market entirely, it’s an impossible task unless we figure out how to increase smartphones’ RAM capacity by orders of magnitude, which is not happening any time soon.

That, or you can leverage Apple’s recent breakthrough.

Flash LLMs

To understand how important Apple’s leap forward is, we need to understand how LLMs are run in the first place.

The Access Issue

Generally speaking, computers leverage two types of memory: Flash and RAM.

The former is used mainly for storage, while the latter serves as the computer’s ‘workspace’ at any given time, because it can be accessed much more quickly.

Adding to this, by default, almost all (if not all) Large Language Models today are Transformers, a seminal AI architecture to build sequence-to-sequence models.

But why does that matter?

Put simply, the transformer model is entirely run for every prediction (for every word in ChatGPT’s case).

Transformers yield amazing results, but in turn require the entire model to be stored in quick access memory, aka RAM, as you need to access it continuously with minimal latency.

But what if we could store them in flash memory without the increased latency?

That would change everything as, for example, an iPhone 15 Pro only has 8 GB of RAM, but 128 GB of flash storage.

In that scenario, we could deploy gargantuan LLMs inside our smartphones, unlocking the power of today’s AI in models that fit in our hands.

Sparsity and large chunks

In layperson’s terms, what Apple has managed is to devise a way to run an LLM efficiently while its weights are stored in flash memory.

Although the model still has to be loaded into RAM as any other program, Apple’s researchers leverage FeedForward layer sparsity to only load the important parts of the model.

As shown by multiple papers like the one that presented the Falcon LLM, and covered by one of my recent Medium articles, feedforward layers, an essential piece in Transformer architectures, are notoriously sparse, meaning that most neurons in their layers don’t fire most of the time.

The Transformer. FeedForward Layers in blue

Thus, the Apple researchers’ system loads into RAM only the parts of the model that are required.

This gives us the best of both worlds: a model that is much bigger than the RAM limit, stored in flash memory, with little latency impact.
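To make the idea concrete, here is a minimal sketch in Python. It is not Apple’s implementation. It assumes a ReLU feedforward layer, toy random weights, and a stand-in `predict_active` oracle that simply checks which neurons fire; all the names (`flash`, `ffn_dense`, `ffn_sparse`, `predict_active`) are hypothetical. The point it illustrates: a neuron whose ReLU output is zero contributes nothing, so loading only the firing neurons gives the same output as loading all of them.

```python
import random

random.seed(0)
D_MODEL, D_FF = 4, 16  # toy sizes, chosen only for illustration

# "Flash": each feedforward neuron is stored as a bundle of
# (up-projection row, down-projection column).
flash = [
    ([random.gauss(0, 1) for _ in range(D_MODEL)],
     [random.gauss(0, 1) for _ in range(D_MODEL)])
    for _ in range(D_FF)
]


def pre_activation(up_row, x):
    return sum(w * xi for w, xi in zip(up_row, x))


def ffn_dense(x):
    """Reference computation that touches every neuron."""
    out = [0.0] * D_MODEL
    for up_row, down_col in flash:
        h = max(0.0, pre_activation(up_row, x))  # ReLU
        for j in range(D_MODEL):
            out[j] += h * down_col[j]
    return out


def predict_active(x):
    """Stand-in oracle: returns the ids of neurons that fire for input x.
    Assumption: it reads every up-projection row, which a real system
    would have to replace with a much cheaper guess."""
    return [i for i, (up_row, _) in enumerate(flash)
            if pre_activation(up_row, x) > 0]


def ffn_sparse(x):
    """Copy only the predicted-active bundles into 'RAM', then compute."""
    ram = {i: flash[i] for i in predict_active(x)}
    out = [0.0] * D_MODEL
    for up_row, down_col in ram.values():
        h = max(0.0, pre_activation(up_row, x))
        for j in range(D_MODEL):
            out[j] += h * down_col[j]
    return out, len(ram)


x = [0.5, -1.0, 0.25, 2.0]
sparse_out, loaded = ffn_sparse(x)
print(f"Loaded {loaded} of {D_FF} neurons")
print("Same output as dense:",
      all(abs(a - b) < 1e-9 for a, b in zip(ffn_dense(x), sparse_out)))
```

With random weights like these, roughly half the neurons fire, so the saving is modest. The sparser the layer, the fewer bundles need to leave flash.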

To improve efficiency, they introduced the concept of windowing, where they store the neuron activations of only a certain number of previous predictions.

Besides weights, neuron activations are also usually stored because they are mostly redundant across the word-by-word prediction of a text sequence, which saves computational resources.

Leveraging the fact that for long sequences the number of new neuron activations gradually decreases (as shown below), just retaining the activations of the most recent predictions doesn’t impact performance while considerably reducing memory requirements.
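One way to picture windowing is as a sliding window over the sets of neurons that fired for the most recent tokens. The sketch below is my own reading, not Apple’s code: the class name `NeuronWindow`, the window size, and the neuron ids are all hypothetical. It keeps in RAM whatever was active for the last few tokens, loads only what is new, and evicts what has fallen out of the window.

```python
from collections import deque


class NeuronWindow:
    """Tracks which neurons stay in RAM for the last `window` tokens."""

    def __init__(self, window: int):
        self.recent = deque(maxlen=window)  # one set of active neuron ids per token
        self.in_ram = set()

    def step(self, active: set) -> tuple:
        """Process one token; return (loaded from flash, evicted from RAM)."""
        to_load = active - self.in_ram      # only the difference is read from flash
        self.recent.append(set(active))     # the deque drops the oldest token's set
        needed = set().union(*self.recent)  # everything used inside the window
        evicted = self.in_ram - needed
        self.in_ram = needed
        return to_load, evicted


window = NeuronWindow(window=2)
for token_id, active in enumerate([{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]):
    loaded, evicted = window.step(active)
    print(f"token {token_id}: load {sorted(loaded)}, evict {sorted(evicted)}")
```

In this toy run the first token has to load everything it needs, while the later tokens each load a single new neuron, which is the effect the technique relies on.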

Additionally, they include other innovations like increasing data loading size, to optimize GB/s retrieval from flash memory, and memory preallocation, to optimize data reload at the RAM level.

But the real question is, what does this mean for all of us?

The AI iPhone

It’s no secret by now that 2024 will be the year Apple brings Generative AI at scale to its products.

With its secretive ‘Ajax’ project underway, here are 4 ways I see the iPhone evolving next year.

1. Advanced Personal Assistants: Long story short, Siri on steroids. It will have access to all your apps, notes, and data. What’s more, Siri could evolve into a true AI companion that leverages your health apps, social media, and other sorts of data to provide more value to you.
2. Real-Time Universal Translation: Instant translation for texts and conversations, making communication seamless across different languages.
3. Improved shortcuts: Instead of using the crappy Shortcut builder, just create iPhone Shortcuts on-demand with a simple text instruction.
4. Enhanced Creative Tools: Collaborative tools for music, art, and writing that assist and enhance the creative process.

And many more.

For point 4, Google is already applying LLMs to photo editing on the Google Pixel 8, as shown in this video.

Moreover, over time we could see the interaction between humans and machines becoming less dependent on screens and more dependent on voice, in a similar fashion to Humane’s AI Pin, although I don’t think that will happen anytime soon.

In the meantime, Apple has finally made clear what it is working on, beyond the release of the Ferret open-source model last October. It’s much bigger than what meets the eye, meaning that my predictions could be dwarfed by what Apple eventually delivers.

🫡 Key contributions 🫡

Flash-memory LLMs: Apple's introduction of flash-memory Large Language Models enables running vast AI models on devices with limited RAM.
Efficiency Innovations: Apple's advancements in sparsity and windowing optimize memory use, allowing large models to run on consumer hardware without latency.

👾 Best news of the week 👾

🧐 AI’s 10 biggest stories of 2023

😍 DNNs show promise for human hearing, according to MIT

🤩 The World’s First Transformer Supercomputer

🫡 The Future of Generative AI according to The Alan Turing Institute

🥇 Leaders 🥇

What does it take to make AI superhuman?

Throughout 2023, AI frontier models have gotten pretty good at imitating us.

Although it still can’t be considered AGI, AI is undeniably at our level in many, many tasks, especially when it comes to text.

Until now, ‘superhuman AIs’ have been very, very scarce, limited to very specific tasks.

But 2024 could see the arrival of the first general-purpose self-improving model.

In other words, an AI that, combining humanity’s greatest achievement of 2023, multimodal generalization, and enough self-improvement, becomes superhuman in hundreds or thousands of tasks.

Seems like wishful thinking, but there are growing rumors that we might already be creating it.

Compression Meets Synthetic Data

When you ask one of the godfathers of the modern era of AI and one of the creators of ChatGPT, Ilya Sutskever, “What is ChatGPT?”, this is what he answers:

“Unsupervised compression.”

A lossy zip file

The most common definition of ChatGPT is: “a model that, given a sequence of tokens, usually text, predicts the next token (word) in the sequence.”

A more advanced enthusiast will respond with “a model that, given a sequence of tokens, usually text but not exclusively, predicts the probability distribution over the next token (word) in the sequence.”

This is a far more accurate definition, as indeed ChatGPT, given a sequence of words, will give you the full list of possible next words to that sequence with assigned probabilities.
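To see what a “probability distribution over the next token” looks like in practice, here is a minimal sketch. The prompt, the four candidate tokens, and their scores are made up for illustration; a real model scores every token in its vocabulary. The softmax function is the standard way of turning raw scores (logits) into probabilities.

```python
import math


def softmax(logits: dict) -> dict:
    """Turn raw scores (logits) into a probability distribution over tokens."""
    peak = max(logits.values())  # subtracting the max keeps exp() numerically stable
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical scores for the prompt "The capital of France is"
logits = {" Paris": 6.0, " Lyon": 3.0, " the": 2.0, " pizza": -1.0}
probs = softmax(logits)

for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok!r}: {p:.3f}")
```

Picking the next word is then a matter of sampling from this distribution, or simply taking the most likely token.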

But if you ask geniuses like Ilya or Andrej Karpathy, their answer will mention the word “compression” one way or another.

Because that’s precisely the key intuition behind GPTs like ChatGPT or Gemini; they are AI models that have ingested humongous amounts of data and compressed them into a weights file that represents that data.

In other words, using statistical correlations present in text, the model learns an approximate representation of all the data it has seen earlier, compressing Terabytes of data into a file about two orders of magnitude (roughly 100x) smaller.

In layman’s terms, you can think of ChatGPT as a “lossy zip file” of the Internet, as Andrej describes it. It is “lossy” because, unlike a zip file, which compresses without losing data, ChatGPT does lose some of it.

Consequently, by compressing much of humanity’s data present on the Internet into a single file, we achieve what was considered the holy grail of AI a few years ago: general-purpose models.

But let’s not forget that these models, no matter how impressive they are, are trained on human data, leading us to an obvious conclusion: general-purpose superhuman AIs can’t be achieved with human data alone.

But scientists at labs like OpenAI or Google think they have the answer to this problem.

And to understand it, we need to take a trip down memory lane, to the Lee Sedol incident of 2016.

The Day Humanity was Defeated for the First Time

In 2016, the world witnessed a historic moment in artificial intelligence when AlphaGo, a program developed by Google DeepMind, faced off against Lee Sedol, one of the greatest Go players of all time.

In a shocking and highly publicized battle, AlphaGo defeated Lee Sedol 4-1, marking a monumental milestone in AI development.

Lee Sedol playing against AlphaGo

In particular, during the second game of the match, AlphaGo made a move that stunned both Lee Sedol and the Go community, “Move 37”.

On the 37th move, AlphaGo placed a stone in what was considered a highly unconventional and unexpected position (the 5th row from the edge, a strategy rarely used by professional Go players).

Initially met with skepticism, it eventually helped the AI defeat Lee. Put simply, Move 37 was the first time AIs went beyond human knowledge and capacity to defeat us.

For a more detailed account of the events, I highly recommend Google DeepMind’s documentary on the story.

Technically speaking, AlphaGo was trained using deep learning to understand Go strategies from professional games and reinforcement learning to improve by playing against lesser versions of itself.

This, along with a technique called Monte Carlo Tree Search used to explore and evaluate possible moves, allowed it to eventually exceed human capabilities.

Since this landmark event, however, AI has proven capable of going superhuman many other times.

Most recently, it has done so with the likes of the CyberRunner.

The CyberRunner

A few days ago, a YouTube channel with just 56 subscribers released an awe-inspiring video that, a few days later, had tens of thousands of views.

The video shows the CyberRunner, an AI model that plays the Labyrinth marble game by learning through Model-Based Reinforcement Learning (RL).

Model-based RL is a type of reinforcement learning where an agent builds a model of the environment from past experience and uses it to predict the outcomes of its future actions.
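The two-step pattern, learn a model and then plan inside it, can be sketched in a few lines. This is a toy example and not the CyberRunner’s method: the five-cell corridor, the reward, the discount factor, and every name in the code are assumptions made for illustration, and the planning step uses plain value iteration.

```python
N_STATES, GOAL, GAMMA = 5, 4, 0.9  # toy corridor: positions 0..4, goal at the right end
ACTIONS = (-1, +1)                 # step left or step right


def real_env(state, action):
    """The true environment, which the agent can only try out, not inspect."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)


# 1) Learn a model: try every action in every state once and record what happens.
#    (This toy world is deterministic, so one sample per pair is enough.)
model = {(s, a): real_env(s, a) for s in range(N_STATES) for a in ACTIONS}


def q_value(s, a, values):
    """Predicted return of taking action a in state s, according to the model."""
    nxt, reward = model[(s, a)]
    return reward + GAMMA * values[nxt]


# 2) Plan inside the learned model, without touching the real environment again.
values = [0.0] * N_STATES  # the goal is terminal, so its value stays at 0
for _ in range(50):
    for s in range(N_STATES):
        if s != GOAL:
            values[s] = max(q_value(s, a, values) for a in ACTIONS)

policy = {s: max(ACTIONS, key=lambda a: q_value(s, a, values))
          for s in range(N_STATES) if s != GOAL}
print(policy)
```

The agent ends up preferring the step to the right in every cell, having worked that out purely by “imagining” outcomes with its learned model.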

But no matter what method is used, the intuition is always the same: to create superhuman AIs we need to go beyond human data.

And the truth is… we are already doing so.
