Apple Against its Own Demons
TheTechOasis
Breaking down the most advanced AI systems in the world to prepare you for the future.
5-minute weekly reads.
TLDR:
AI Research of the Week: Apple Might Have a Real AI Problem
Leaders: Is The Future of AI Becoming a Deadlier Ad Weapon?
AI Research of the Week
For months, everyone in the AI industry has been asking the same question: "What is Apple doing?"
We now know, and it's not looking good. Or, put bluntly, it's outright embarrassing.
But while some are, in pure sensationalist fashion, proclaiming this is the beginning of the end for Apple, I argue there's room for hope: the company of the bitten apple has just released what's probably the biggest contribution to open-source AI in months, its MM1 model family.
Let's start with the positive stuff before moving on to the very ugly part.
The MM1 Family, Apple's Commitment to Open-Source
In its usual low-key way of publishing Generative AI research, Apple has silently presented the MM1 family, a set of very capable Multimodal Large Language Models (MLLMs) that achieve state-of-the-art results in many tasks.
But more importantly, they are the greatest contribution to open-source we have ever seen from a big-tech company.
Committed to openness
MLLM training pipelines are some of the most well-guarded secrets in the world today.
Granted, the broad guidelines are well known to pretty much everyone:
the use of the Transformer and the attention mechanism as the backbone for data distribution learning,
the global training steps (pre-training and fine-tuning),
the alignment methods (both Reinforcement Learning-based like RLHF, or non-RL like DPO),
sparsity techniques like Mixture-of-Experts,
And so on.
Not only are all of the previous methods known but, more importantly, they are common across all LLMs.
But with such large models, the problem is much less a theoretical one than an engineering and cost one.
In those situations, the devil lies in the details, and simple mistakes can mean losses in the millions of dollars.
Therefore, the small tweaks that some of the brightest minds of our time figure out behind closed doors in Silicon Valley are what set apart some LLMs from others.
Yet the moat is not only based on human talent, but also on compute. Some tweaks can only be figured out by rote experimentation, something only the richest companies in the world can afford.
Consequently, some claim these companies have a moral duty to share their discoveries.
But, you guessed it, most don't, which is why Apple's recent research is so important.
The essence of MLLMs
In this case, Apple has taken the opposite approach. They have released a paper where not only do they explain all the different ablations they performed (testing and retrying different alternatives to see what performs best), but they are also completely open about the data used, both in pre-training and fine-tuning, as well as about several conclusions they reached during testing that could help open-source researchers enhance their work.
But before moving into the discoveries made by the team, a quick reminder of what a simple MLLM looks like.
In simple terms, they are models consisting of one or more encoders (typically an image encoder) and a decoder-only LLM that take both image and text tokens and generate a text sequence, usually describing or reasoning about the information contained in the provided inputs.
But first, what is a decoder-only LLM?
ChatGPT, Gemini, and so on are what we define as 'decoder-only'. In standard Transformers, you first transform the input data, be that text or images, into an embedding: a numerical representation of the data in a form that machines can process.
This process is usually performed by a text encoder, a neural network that is specifically meant for that task. Thus, for every input, the encoder will take it and embed the data in real-time.
The original components of the Transformer used on images.
Instead, ChatGPT and others learn those embeddings as part of training. Consequently, whenever data is received by the model, this data is first tokenized, or 'chunked', into parts known as tokens, which the model recognizes and transforms into their respective embeddings.
Sounds much more complicated than it is.
In decoder-only LLMs, those tokens are used to index a 'look-up' matrix that stores one embedding row per token, instead of having to compute the embedding with an encoder, which makes the whole process much less computationally demanding.
These embeddings are then inserted into the decoder (the actual LLM, hence the name decoder-only) that uses them to predict the next word in the sequence.
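To make the look-up idea concrete, here is a minimal sketch in PyTorch. Everything in it (the vocabulary size, the embedding dimension, the token ids themselves) is made up for illustration and does not correspond to any particular model:

```python
import torch
import torch.nn as nn

vocab_size, d_model = 32_000, 4_096   # hypothetical, roughly LLaMA-sized values

# The "look-up" matrix: one learned embedding row per token id.
token_embedding = nn.Embedding(vocab_size, d_model)

# Suppose the tokenizer turned a short prompt into these token ids.
token_ids = torch.tensor([[312, 9041, 1177]])   # shape: (batch, sequence)

# No encoder pass needed: embedding is just a row selection.
embeddings = token_embedding(token_ids)          # shape: (1, 3, 4096)

# The decoder (the actual LLM) would take these embeddings and
# predict a probability distribution over the vocabulary for the next token.
```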
And how do we then allow these text-only models to work with images too, to transform an LLM into an MLLM?
Simple: you take a pre-trained open-source LLM like Vicuna and stitch an image encoder like CLIP to it so that it can process images.
The image below depicts a complete representation of an MLLM, where the 'Text Embedder' is what we have just described as the embedding look-up matrix.
Source: SmileGate AI
The importance of grafting
Apple, as in most open-source cases, hasn't trained the entire MLLM from scratch, but has used a pre-trained image encoder alongside their LLM to create the MM1 models, a process called 'grafting'.
However, this stitching process won't work as-is, because you are starting from an LLM and an image encoder that were trained separately.
Thus, you need to add an adapter that takes the output of the image encoder and projects it (transforms it) into the vector space of the LLM.
You can think of this adapter as another small model that turns image tokens into word tokens so that the LLM, a text-only model, can process them.
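As a rough sketch of what this grafting step can look like in code, here is a minimal, hedged example in PyTorch. The module name, the simple MLP design, and the dimensions (1024 for the image encoder, 4096 for the LLM) are placeholders of mine, not Apple's actual MM1 architecture:

```python
import torch
import torch.nn as nn

class VisionLanguageAdapter(nn.Module):
    """Projects image-encoder features into the LLM's token embedding space."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # A small MLP projector; per the paper, the exact design of this block
        # matters far less than image resolution or the number of image tokens.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_image_tokens, vision_dim)
        return self.proj(image_features)   # (batch, num_image_tokens, llm_dim)

# Toy usage: pretend the image encoder produced 144 image tokens.
image_features = torch.randn(1, 144, 1024)   # stand-in for CLIP-style output
text_embeddings = torch.randn(1, 32, 4096)   # stand-in for the text look-up

adapter = VisionLanguageAdapter()
image_tokens = adapter(image_features)

# The "stitched" input: image tokens and text tokens in one sequence,
# which the decoder-only LLM then processes like any other sequence.
llm_input = torch.cat([image_tokens, text_embeddings], dim=1)   # (1, 176, 4096)
```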
So, now that we understand how MLLMs are put together, let's move on to the notable contributions of the paper.
Thanks, Apple
Apple has really dropped several gems in this paper, which include:
Design best practices: Through detailed ablation studies, the research identifies several crucial design lessons. Importantly, regarding data, they found that a mix of image-caption, interleaved image-text, and text-only data is essential for achieving state-of-the-art few-shot results across multiple benchmarks (a toy version of such a mix is sketched just below this list).
Impact of Image Encoder and Resolution: The study emphasizes the significant impact of the image encoder, the image resolution, and the number of image tokens on the model's performance. By contrast, the design of the vision-language adapter was found to be of comparatively negligible importance. You still need it, but its exact design isn't that relevant.
Scaling the Model: By scaling up the model size and exploring mixture-of-experts (MoE) models, a family of performant models was achieved. These models not only achieved state-of-the-art pre-training metrics but also competitive performance after fine-tuning on established multimodal benchmarks.
Superior Few-Shot Performance: The configuration-optimized MM1 models demonstrated superior few-shot performance across a range of benchmarks compared to other pre-training approaches.
This list is not exhaustive. I deeply encourage you to check the paper for a complete set of insights.
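To make the first finding, the data mix, a bit more tangible, here is a hedged sketch of how such a pre-training mixture could be written down as a config. The category names follow the paper's description, but the proportions are illustrative placeholders I chose, not necessarily the exact ratios reported by Apple:

```python
# Illustrative pre-training data mixture for an MLLM (placeholder weights).
pretraining_mixture = {
    "image_caption_pairs": 0.45,     # web-scale captioned images
    "interleaved_image_text": 0.45,  # documents mixing images with surrounding text
    "text_only": 0.10,               # preserves pure language and few-shot ability
}

# Sanity check: the sampling weights should sum to 1.
assert abs(sum(pretraining_mixture.values()) - 1.0) < 1e-9
```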
But besides the benefits to society for being so open about how to create a state-of-the-art model, the MM1 family looks really good overall, with some very interesting capabilities as shown below:
Not revolutionary, but on par with what's expected at the SOTA level for the model sizes they trained.
But enough of the good things regarding Apple and AI, as this paper alone doesn't come even close to reversing the narrative drift we are seeing around the company.
The Pressure is Mounting
The pressure on 'AI' companies to live up to the hype, and to the value they accrued over 2023, is growing fast.
2024, confirmation or debacle
Throughout 2023, the flame ignited by ChatGPT at the end of 2022 caused any public tech company with the slightest scent of AI to soar in value.
Generative AI presented itself not only as a really powerful technology, but as a foundation on which companies could base their entire AI strategy.
Through the use of foundation models (like the ones Apple has just presented), models that perform well across many different downstream tasks, companies finally saw that this 'AI' buzzword was, in fact, legit.
Or so we thought.
Still to this day, the adoption curve of Generative AI products remains at the very low end of the spectrum.
Therefore, if tech companies want to retain the confidence of investors, 2024 should confirm that AI is indeed what it has been claimed to be, and not yet another Silicon Valley-led flop.
It's no surprise, then, that most of the value accrued by AI companies during 2024 has flowed into NVIDIA, the only company that is really turning this 'AI thing' into tangible value, with soaring revenues that consistently beat analyst predictions.
But among the Magnificent Seven, Microsoft, Apple, Nvidia, Alphabet, Meta, Tesla, and Amazon, there's no denying that one has much more to prove than the rest.
Apple.
Against the curve
In pure Apple style, the Cupertino-based company has let others take the risks for them in terms of leading the charge in Generative AI.
It's understandable, considering that Apple has fared pretty damn well by coming in as a second player and stealing the whole pie.
They did it with the smartphone,
they did it with the headphone market,
and they might do it with virtual-reality headsets.
But is this the right strategy this time around? Too soon to tell, but the pressure is nevertheless mounting.
Out of the aforementioned companies, leaving aside the honorable exception of Tesla and the problems it's having in the EV market, Apple is the only one with a negative year-to-date stock return, down 4% at the time of writing.
It surely isn't catastrophic, but while they are losing value, Nvidia has almost doubled its value since January 1st, Meta is up around 40%, and Microsoft, Google, and Netflix are all putting up double digits too.
Even Amazon, which isn't precisely leading the charge in GenAI either, is rocking a solid 17% increase.
Also, the US economy is pretty strong, with unemployment at an astonishingly low rate of around 4%, and consumer spending, a critical metric for a consumer-facing company like Apple, is coming off last year's all-time highs.
Looking at the numbers and the macro environment, we would expect Apple HQ to be in a never-ending 'champagne and cocaine' fest, but it's far from that.
What is wrong with Apple? To me, it's as clear as a sunny day that the issue is their lagging AI strategy.
It's actually quite simple
Probably the easiest, yet most naive, way of justifying Apple's poor results is the poor performance of its flagship product, the iPhone.
But let's be real, that can't explain the short-term negativity around the stock, because iPhone sales have been stagnant for years now.
iPhone Sales until 2022. Source: Statista
Global revenues follow an almost identical trend:
Source: Statista
And AirPods unit sales peaked in 2020. Thus, none of these metrics comes even close to explaining what is going on with Apple.
Yes, Apple's car flop doesn't help, but to me it sounds more like a PR 'L' for Apple than something that actually makes investors nervous.
They also recently got fined around $2 billion by the European Union for breaking antitrust rules in the music-streaming market, following a complaint from Spotify.
'What a tragedy' might have been Tim Cook's thoughts after losing the equivalent of 2 days of revenue or 10 days of free cash flow.
I can answer that for you: neither Tim nor investors care much about this fine.
But then, what is going on? Simple, they are slowly but steadily losing the AI narrative.
While everyone agrees Microsoft, Meta, and Google are leading the AI revolution, Apple is seen as the laggard.
Everyone seemed to be OK with this, as Apple hinted it was working on, and investing in, Generative AI 'quite a bit', as confirmed by Tim Cook in a November 2023 statement.
But more recent announcements paint a picture that is, bluntly speaking, embarrassing.
Straight to the point: Apple has reportedly been in talks with both OpenAI and Google to license their LLMs, ChatGPT and Gemini respectively, and use them as part of Apple's products.
Translated into investor terms: 'We are quite-a-lot-well-actually-incredibly behind the other big companies in terms of our AI initiatives.'
What many people feared is now confirmed: Apple isn't getting things straight with Generative AI, and that is a huge, huge problem.
They Better Get Their Act Together. Fast.
There's no way around it: Apple has a real problem with AI.
Their overly zealous commitment to the Apple Vision Pro, an immensely risky bet, has probably left the company with little margin to focus on what was really moving the needle, AI.
They sure as hell know this by now, and have consequently shifted their strategy dramatically. But, to borrow Warren Buffett's analogy, the tide is going out and they seem to be the only ones swimming naked.
In my humble opinion, their decision to embed OpenAI's and Google's models into the iPhone is surely amazing news for their consumers, including yours truly, as tools like Siri feel almost prehistoric at this point.
But it is really not great news for Apple shareholders, who are watching their golden egg rot by the day.
Luckily for Apple, they are a huge and immensely cash-rich company, so maybe it's about time they reconsidered their allergy to acquisitions and turned it into some M&A goodness that lifts the spirits of the many investors who, today, have much less faith in Apple's capacity to stay afloat in the AI age than they did yesterday.
This is not financial advice. I do not directly own Apple stock (nor am I long or short on it). All my public investing is done through index funds, which should signal to you that I am no financial advisor. Please perform the necessary due diligence on your side when making financial decisions of this caliber.
This Week's Sponsor
If you're looking to level up your AI game, the Growth School might be the place for you. Top Start-up in India 2023, with high praise across TrustPilot and Google Reviews.
Learn from AI experts from Google and others.
Become an AI & ChatGPT Genius in just 3 hours for FREE! (Early Easter Sale)
Join ChatGPT & AI Workshop (worth $199) at no cost (Offer valid for first 100 people only)
Best news of the week
First video of a Neuralink chip being used by a real human. Must watch!
NVIDIA announces its next moonshot, GR00T
New possible leak of OpenAI's Q*, the introduction of energy-based models (not confirmed)
This week on Leaders...
This week in the Leaders segment, we take a turn toward the dark side of AI. Don't worry, this is not one of those 'AI is going to kill us' speeches that will make you roll your eyes.
In fact, sometimes I feel that this 'AI as an existential threat' framing is just a tactic by AI companies to distract society from the real threats of AI, like the AI-based, hyper-specialized ad-targeting weapons that many of the industry's incumbents will soon have at their disposal, thanks to video-capable models like Gemini 1.5 or Sora.
Thus, my objective is to put into perspective how real this threat is for you and your family, and how we are going to be gaslit into oblivion so that these systems are allowed to exist in the first place.
Click upgrade if you care about the future that is upon us.
Do you have any feelings, questions, or intuitions you want to share with me? Reach me at [email protected]