Apple's AI Breakthrough Will Transform Your iPhone in 2024
TheTechOasis
Breaking down the most advanced AI systems in the world to prepare you for your future.
5-minute weekly reads.
TLDR:
AI Research of the Week: Apple's Discovery That Will Change the iPhone
Leaders: What does it take to make an AI superhuman?
AI Research of the Week
Out of all the big-tech titans, one of them slept through the AI revolution of 2023.
Or so we thought.
Apple, the most valuable company and brand in the history of capitalism, has finally shared real details of its Generative AI plans with the presentation of flash-memory Large Language Models (LLMs), a discovery that could change how we deploy AI and make it truly ubiquitous in our lives.
And the main outcome?
Transforming the likes of the iPhone and all other Apple hardware into entirely new digital products.
The Year of Efficiency
Last year, Mark Zuckerberg predicted that "2023 would be the year of efficiency."
Of course, he meant cutting costs. In Apple's case, efficiency means AI disruption.
The memory problem
If 2023 was the year humanity discovered it could train huge models, 2024 will be the year we learn to run those huge models efficiently.
But why is this so important and why is Apple putting its focus here?
Basically, Apple's entire AI value proposition hinges on achieving what is considered the holy grail of Generative AI model serving: running huge LLMs on smartphones.
To understand how seemingly impossible this is, let's see an example.
If we take LLaMa 2 7B, a minuscule model by today's standards, the weights file alone occupies 14 GB of DRAM, given its half-precision floating-point format.
Half-precision means that each parameter occupies 16 bits, or 2 bytes of memory. As we have 7 billion of them, that means you need 14 billion bytes, or 14 GB, just to host the model.
That is almost twice the RAM capacity of the iPhone 15 Pro (8 GB), meaning that even a very small LLM can't fit on a smartphone.
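To put numbers on that claim, here is a quick back-of-the-envelope sketch (my own illustration, not anything from Apple or Meta) of the DRAM needed just to hold a model's weights at different precisions:

```python
# Minimal sketch: estimating the memory needed just to store a model's weights.
# The only inputs are the parameter count and the bytes used per parameter.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Gigabytes required to hold the weights at the given precision."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

print(weight_memory_gb(7e9, "fp16"))  # ~14 GB, vs. the iPhone 15 Pro's 8 GB of RAM
print(weight_memory_gb(7e9, "int4"))  # ~3.5 GB: quantization helps, but isn't the whole story
```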
And for larger models, you need a state-of-the-art GPU cluster, requiring hundreds of thousands of US dollars in investment.
What's more, that is pocket change compared to running models like GPT-4, as companies like OpenAI literally burn that much money, and more, every day.
By the looks of it, even though LLMs have the potential to disrupt the smartphone market entirely, running them on-device is impossible unless we increase smartphones' RAM capacity by orders of magnitude, which is not happening any time soon.
That, or you can leverage Appleâs recent breakthrough.
Flash LLMs
To understand how important Apple's leap forward is, we need to understand how LLMs are run in the first place.
The Access Issue
Generally speaking, computers leverage two types of memory: Flash and RAM.
The former is used mainly for storage, while the latter serves as the computer's "workspace" at any given time, since it can be accessed much more quickly.
Adding to this, almost all (if not all) Large Language Models today are Transformers, the seminal AI architecture for building sequence-to-sequence models.
But why does that matter?
Put simply, the entire Transformer model is run for every prediction (for every word, in ChatGPT's case).
Transformers yield amazing results, but in turn they require the entire model to sit in quick-access memory, aka RAM, because it needs to be accessed continuously with minimal latency.
But what if we could store them in flash memory without the increased latency?
That would change everything as, for example, an iPhone 15 Pro only has 8 GB of RAM, but 128 GB of flash storage.
In that scenario, we could deploy gargantuan LLMs inside our smartphones, unleashing the power of today's AI in devices that fit in our hands.
Sparsity and large chunks
In layperson's terms, what Apple has managed is to create the first LLM that can run efficiently while being stored in flash memory.
Although the model still has to be loaded into RAM like any other program, Apple's researchers leverage FeedForward-layer sparsity to load only the important parts of it.
As proven by multiple papers, like the one that presented the Falcon LLM (and covered in one of my recent Medium articles), feedforward layers, an essential piece of the Transformer architecture, are notoriously sparse, meaning that most of their neurons don't fire most of the time.
The Transformer architecture, with FeedForward layers shown in blue.
Thus, what the Apple researchers' system does is load into RAM only the parts of the model that are required.
This gives us the best of both worlds: a model much bigger than the RAM limit, stored in flash memory, with no meaningful latency impact.
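As an illustration of the general idea (a simplified sketch under my own assumptions, not Apple's actual implementation), the hot path looks roughly like this: a cheap predictor guesses which feed-forward neurons will fire for the current input, and only those rows of the weight matrix are read from flash into RAM.

```python
import numpy as np

# Simplified sketch (not Apple's code): load from flash only the feed-forward rows
# whose neurons are predicted to activate for the current input; the rest stays on disk.

def predicted_active_neurons(x: np.ndarray, predictor: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """A cheap predictor guessing which FFN neurons will fire for input x."""
    scores = x @ predictor                      # (hidden,) @ (hidden, ffn_dim) -> (ffn_dim,) scores
    return np.flatnonzero(scores > threshold)

def load_rows_from_flash(weights_on_flash: np.memmap, rows: np.ndarray) -> np.ndarray:
    """Read only the selected rows of the FFN weight matrix from a memory-mapped file on flash."""
    return np.asarray(weights_on_flash[rows])   # touches only the pages holding those rows

# Usage sketch (shapes and file names are hypothetical):
# w_up = np.memmap("ffn_up.bin", dtype=np.float16, mode="r", shape=(ffn_dim, hidden))
# active = predicted_active_neurons(x, predictor)
# w_active = load_rows_from_flash(w_up, active)    # a small fraction of the full matrix
# y = np.maximum(x @ w_active.T, 0)                # compute only the active neurons (ReLU)
```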
To improve efficiency further, they introduce the concept of windowing, storing the neuron activations of only a certain number of previous predictions.
Besides weights, neuron activations are usually cached too, because they are mostly redundant across the word-by-word predictions of a text sequence, which saves computational resources.
Leveraging the fact that, for long sequences, the number of new neuron activations gradually decreases (as the paper shows), retaining only the activations of the most recent predictions doesn't hurt performance while considerably reducing memory requirements.
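Here is a minimal sketch of the windowing idea under my own simplifying assumptions (illustrative only, not the paper's code): neuron data is kept in RAM only for the last k predictions, so the working set stays bounded and only genuinely new neurons trigger a read from flash.

```python
from collections import deque

# Illustrative sketch of windowing: track which neurons were active over the last
# `window_size` predictions, keep only those resident in RAM, and report which
# newly active neurons still need to be fetched from flash.

class ActivationWindow:
    def __init__(self, window_size: int):
        self.window = deque(maxlen=window_size)   # oldest step drops out automatically
        self.in_ram: set[int] = set()             # neuron ids currently resident in RAM

    def update(self, active_neurons: set[int]) -> set[int]:
        """Record this step's active neurons and return the ones that must be loaded."""
        to_load = active_neurons - self.in_ram    # only genuinely new neurons hit flash
        self.window.append(active_neurons)
        self.in_ram = set().union(*self.window)   # evict anything outside the window
        return to_load

# window = ActivationWindow(window_size=5)
# new_reads = window.update({12, 87, 1024})       # neurons to fetch from flash this step
```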
Additionally, they include other innovations, such as reading data from flash in larger chunks, to maximize GB/s throughput, and preallocating memory, to optimize data reloads at the RAM level.
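The intuition behind larger reads is simple: a handful of big, contiguous reads gets far closer to flash's peak throughput than thousands of tiny scattered ones. A rough illustration of that pattern (my own sketch, not from the paper):

```python
import numpy as np

# Rough illustration: fetch whole contiguous row groups with one slice each,
# turning many small random reads into a few large sequential ones.

def read_bundled(weights_on_flash: np.memmap, row_groups: list[range]) -> list[np.ndarray]:
    """Fetch each contiguous group of rows in a single large read."""
    return [np.asarray(weights_on_flash[g.start:g.stop]) for g in row_groups]

# Preallocating a reusable RAM buffer avoids repeated allocations as rows are swapped in:
# buffer = np.empty((max_rows, hidden_dim), dtype=np.float16)
```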
But the real question is, what does this mean for all of us?
The AI iPhone
It's no secret by now that 2024 will be the year Apple brings Generative AI at scale to its products.
Through its secretive "Ajax" project, here are four ways I see the iPhone evolving next year.
Advanced Personal Assistants: Long story short, Siri on steroids. It will have access to all your apps, notes, and data. What's more, Siri could evolve into a true AI companion that leverages your health apps, social media, and other kinds of data to provide more value to you.
Real-Time Universal Translation: Instant translation for texts and conversations, making communication seamless across different languages.
Improved Shortcuts: Instead of using the crappy Shortcuts builder, just create iPhone Shortcuts on demand with a simple text instruction.
Enhanced Creative Tools: Collaborative tools for music, art, and writing that assist and enhance the creative process.
And many more.
For point 4, Google is already applying LLMs to photo editing on the Google Pixel 8, as shown in this video.
Moreover, over time we could see human-machine interaction become less dependent on screens and more dependent on voice, in a similar fashion to Humane's AI Pin, although I don't think that will happen anytime soon.
In the meantime, Apple has finally made clear what it is working on (besides the release of the Ferret open-source model last October), and it's much bigger than meets the eye, meaning my predictions could be dwarfed by what Apple eventually delivers.
Key contributions
Flash-memory LLMs: Apple's introduction of flash-memory Large Language Models enables running vast AI models on devices with limited RAM.
Efficiency Innovations: Apple's advancements in sparsity and windowing optimize memory use, allowing large models to run on consumer hardware without a latency penalty.
Best news of the week
DNNs show promise for human hearing, according to MIT
The World's First Transformer Supercomputer
The Future of Generative AI according to The Alan Turing Institute
Leaders
What does it take to make AI superhuman?
Throughout 2023, AI frontier models have gotten pretty good at imitating us.
Although it still can't be considered AGI, AI is undeniably at our level in many, many tasks, especially when it comes to text.
Until now, "superhuman AIs" have been very scarce, limited to very specific tasks.
But 2024 could see the arrival of the first general-purpose self-improving model.
In other words, an AI that, by combining humanity's greatest achievement of 2023, multimodal generalization, with enough self-improvement, becomes superhuman in hundreds or thousands of tasks.
It seems like wishful thinking, but there are growing rumors that we might already be creating it.
Compression Meets Synthetic Data
When you ask Ilya Sutskever, one of the godfathers of the modern era of AI and one of the creators of ChatGPT, "What is ChatGPT?", this is what he answers:
"Unsupervised compression."
A lossy zip file
The most common definition of ChatGPT is: "a model that, given a sequence of tokens, usually text, predicts the next token (word) in the sequence."
A more advanced enthusiast will respond with: "a model that, given a sequence of tokens, usually text but not exclusively, predicts the probability distribution over the next token (word) in the sequence."
This is a far more accurate definition, as ChatGPT, given a sequence of words, will indeed give you the full list of possible next words for that sequence, each with an assigned probability.
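To make that concrete, here is a toy illustration (invented words and numbers, nothing to do with ChatGPT's real vocabulary) of what "a probability distribution over the next token" looks like:

```python
import numpy as np

# Toy example: given "The cat sat on the", the model assigns a probability to each candidate.
candidates = ["mat", "sofa", "moon", "table"]
logits = np.array([4.1, 2.3, 0.2, 1.8])        # assumed raw scores from the model

probs = np.exp(logits) / np.exp(logits).sum()   # softmax turns scores into probabilities
for token, p in zip(candidates, probs):
    print(f"{token}: {p:.2f}")                  # e.g. mat ~0.78, sofa ~0.13, ...
```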
But if you ask geniuses like Ilya or Andrej Karpathy, their answer will mention the word âcompressionâ one way or another.
Because that's precisely the key intuition behind GPTs like ChatGPT or Gemini: they are AI models that have ingested humongous amounts of data and compressed it into a weights file that represents that data.
In other words, using the statistical correlations present in text, the model learns an approximate representation of all the data it has seen, compressing terabytes of data into a file orders of magnitude (~100x) smaller.
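To get a feel for those numbers, here is a back-of-the-envelope calculation with assumed, Llama-2-scale figures (illustrative only):

```python
# Assumed figures: a 70B-parameter model stored at half precision, trained on ~2 trillion tokens.

params = 70e9
weights_gb = params * 2 / 1e9                  # fp16 -> 2 bytes per parameter = 140 GB

tokens = 2e12                                   # rough training-corpus size in tokens
bytes_per_token = 4                             # a token is roughly 4 characters of text
corpus_gb = tokens * bytes_per_token / 1e9      # ~8,000 GB of raw text

print(f"~{corpus_gb / weights_gb:.0f}x smaller")  # on the order of 50-100x
```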
In layman's terms, you can think of ChatGPT as a "lossy zip file" of the Internet, as Andrej describes it; "lossy" because, unlike zip files, which compress without losing data, ChatGPT does lose some data along the way.
Consequently, by compressing much of humanity's data present on the Internet into one single file, we achieve what was considered the holy grail of AI just a few years ago: general-purpose models.
But let's not forget that these models, no matter how impressive they are, are trained on human data, which leads us to an obvious conclusion: on our path to creating general-purpose superhuman AIs, human data alone won't get us there.
But scientists at labs like OpenAI or Google think they have the answer to this problem.
And to understand it, we need to take a trip down memory lane, to the Lee Sedol incident of 2016.
The Day Humanity was Defeated for the First Time
In 2016, the world witnessed a historic moment in artificial intelligence when AlphaGo, a program developed by Google DeepMind, faced off against Lee Sedol, one of the greatest Go players of all time.
In a shocking and highly publicized battle, AlphaGo defeated Lee Sedol 4-1, marking a monumental milestone in AI development.
Lee Sedol playing against AlphaGo
In particular, during the second game of the match, AlphaGo made a move that stunned both Lee Sedol and the Go community: "Move 37".
On its 37th move, AlphaGo placed a stone in what was considered a highly unconventional and unexpected position (the fifth line from the edge, a placement rarely used by professional Go players).
Initially met with skepticism, it eventually helped the AI defeat Lee. Put simply, Move 37 was the first time an AI went beyond human knowledge and capacity to defeat us.
For a more detailed account of the events, I highly recommend Google DeepMind's documentary on the story.
Technically speaking, AlphaGo was trained using deep learning to understand Go strategies from professional games and reinforcement learning to improve by playing against lesser versions of itself.
This, along with a technique called Monte Carlo Tree Search, used to explore and evaluate possible moves, combined with the fact that it was trained by playing against itself, allowed it to eventually exceed human capabilities.
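To make the self-play principle tangible, here is a toy, runnable loop (my own illustration using the game of Nim, not DeepMind's code or algorithm): the agent generates all of its training data by playing against itself, with no human games involved.

```python
import random
from collections import defaultdict

def self_play_game(values, epsilon=0.1):
    """One game of Nim: players alternate taking 1-3 sticks from a pile of 10; taking the last stick wins."""
    pile, history, player, winner = 10, [], 0, None
    while pile > 0:
        moves = [m for m in (1, 2, 3) if m <= pile]
        if random.random() < epsilon:
            move = random.choice(moves)                                   # explore
        else:
            move = max(moves, key=lambda m: values[(pile - m, player)])   # exploit learned values
        history.append((pile, move, player))
        pile -= move
        if pile == 0:
            winner = player                                               # took the last stick
        player = 1 - player
    return history, winner

def train(iterations=20000):
    values = defaultdict(float)        # value of (resulting pile, mover) from the mover's point of view
    for _ in range(iterations):
        history, winner = self_play_game(values)
        for pile, move, player in history:                 # learn only from self-generated games
            outcome = 1.0 if player == winner else -1.0
            key = (pile - move, player)
            values[key] += 0.1 * (outcome - values[key])   # simple Monte Carlo value update
    return values

# After training, the policy should have rediscovered the winning strategy (leave a multiple of 4)
# without ever seeing a human play.
values = train()
print(max((1, 2, 3), key=lambda m: values[(10 - m, 0)]))   # likely 2, leaving a pile of 8
```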
Since this landmark event, however, AI has proven capable of going superhuman many other times.
Recently, with the likes of the CyberRunner.
The CyberRunner
A few days ago, a YouTube channel with just 56 subscribers released an awe-inspiring video that, within days, had racked up tens of thousands of views.
The video shows the CyberRunner, an AI model that plays the Labyrinth marble game by learning through Model-Based Reinforcement Learning (RL).
Model-based RL is a type of reinforcement learning in which the agent builds a model of its environment and uses it to predict the outcomes of its actions before taking them.
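As a minimal, toy illustration of that loop (my own sketch, unrelated to CyberRunner's actual system): the agent first learns a model of a tiny one-dimensional environment from random experience, then plans its moves by simulating candidate actions inside that learned model.

```python
import random

class LearnedModel:
    """A learned model of the environment: maps (state, action) to the observed next state."""
    def __init__(self):
        self.transitions = {}

    def update(self, state, action, next_state):
        self.transitions[(state, action)] = next_state

    def predict(self, state, action):
        return self.transitions.get((state, action), state)   # unseen transition -> assume no change

def plan(model, state, goal, actions=(-1, +1)):
    """Choose the action whose *predicted* outcome lands closest to the goal."""
    return min(actions, key=lambda a: abs(model.predict(state, a) - goal))

# 1) Exploration: act randomly in the real environment (a position on a line) and record what happens.
model, state = LearnedModel(), 0
for _ in range(200):
    action = random.choice((-1, +1))
    next_state = state + action            # the "real" dynamics, unknown to the agent a priori
    model.update(state, action, next_state)
    state = next_state

# 2) Planning: reach the goal by simulating actions inside the learned model, not by trial and error.
state, goal = 0, 5
for _ in range(5):
    state += plan(model, state, goal)
print(state)                               # should print 5: the agent planned its way to the goal
```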
But no matter what method is used, the intuition is always the same: to create superhuman AIs we need to go beyond human data.
And the truth is… we are already doing so.