How Google is planning on destroying its own business model
🏝 TheTechOasis 🏝
Breaking down the most advanced AI systems in easy-to-understand, 5-minute reads
🤯 AI Research of the week 🤯
“If you can’t beat it, join it.”
That’s how the saying goes, and that’s precisely what our dear friend Google is doing regarding AI.
In order to remain king of search, Google is purposely contributing to the demise of its ad-based revenue model, one of the most successful businesses in the history of capitalism.
But you shouldn’t fear for Google, as we’ll see today… this has been the plan all along.
Now, they have released their newest research on ‘AI-enhanced search’: AVIS, a powerful new framework for efficient image-based question answering.
The wake-up call
Anyone who follows the AI industry will probably agree on the fact that we’re approaching the death of Internet search as we know it.
And Google knows it too.
Using ChatGPT or Claude is much quicker and more convenient than doing link-based searches.
Thus, Google had two options: Milk the dying cash cow while arguing that AI-search is stupid, or acknowledge the obvious and ride the AI wave so that they stay on top once it becomes a thing.
As Google’s execs aren’t dumb, they’ve gone for the second option.
In fact, there’s no company in the world right now more heavily focused on disrupting search with AI than Google.
For instance, a few weeks ago Google DeepMind released WebAgent, an autonomous AI agent capable of searching the web.
Now, as you’ll see, AVIS is another nail in the coffin for standard Internet search.
Of course, this is horrifying for Google’s ad-based model, which relies heavily on the ads displayed to users while they browse the web.
Although this breakdown is from 2022, it’s safe to say that the picture hasn’t changed at all.
Google’s ‘Search advertising’ represents 60% of the revenue! And that figure grows to a staggering 71% if you consider Google AdMob.
But with solutions like AVIS, the ad-model is suddenly completely broken.
The AVIS framework
Autonomous Visual Information Seeking, or AVIS, is a framework developed by Google that leverages Large Language Models (LLMs) to answer complicated questions about an image: given the image and the question, it dynamically plans, executes tools that provide the required information, and evaluates its own outputs autonomously until it finds the right answer.
Thus, the AVIS framework comprises three distinct elements:
LLM planner: An LLM that, based on the current state and observations, decides which tool to call next and what query to send it to obtain the required information.
Working Memory: A log that retains the information from past executions.
LLM reasoner: An LLM that, based on the output from the tools, evaluates the obtained information and decides if it needs to progress further or if it’s ready to answer the question.
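To make the interplay between these three components concrete, here is a minimal sketch of the plan-execute-evaluate loop in Python. The function and attribute names (`planner`, `reasoner`, `Verdict`, the tool registry) are illustrative assumptions, not Google’s actual API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Verdict:
    """The reasoner's judgment after inspecting the working memory."""
    is_answerable: bool
    answer: Optional[str] = None

def run_avis(question, image, planner, reasoner, tools, max_steps=10):
    memory = []  # working memory: a log of (tool, query, output) records
    for _ in range(max_steps):
        # 1. The planner picks the next tool and query from the current state.
        tool_name, query = planner(question, memory)
        output = tools[tool_name](image, query)
        memory.append({"tool": tool_name, "query": query, "output": output})
        # 2. The reasoner decides: answer now, or keep gathering information.
        verdict = reasoner(question, memory)
        if verdict.is_answerable:
            return verdict.answer
    return None  # gave up after max_steps
```

With stub planner/reasoner functions standing in for the LLM calls, the loop runs one tool, checks the memory, and returns an answer once the reasoner is satisfied.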
To better understand, let’s see an example:
Given the image at the top left and the question “When was the drum first used for this event?”, the LLM faces a very complicated task.
For starters, it has no additional information: much of the context needed to provide an answer isn’t present in the image itself.
What type of event is the image actually depicting?
What’s the date of the event?
What type of drum is that?
And, finally, when was that drum first used in the event?
In other words, to answer that question, the LLM needs to search for that additional context.
The first challenge is drafting the initial plan: the LLM planner must decide which tool to call first. In this case, the process is as follows:
It starts by leveraging an object detection tool to scrutinize the complete image and crop it into distinct objects.
Then, it decides to run an image captioning model, such as PaLI, to obtain a description of each cropped image.
In parallel, the model selects the crop most relevant to the question: the image of the drum.
Next, it runs the Google Lens image search API to retrieve the most similar images from a dataset of images paired with descriptive texts. This way, it now knows that the drum is a ‘Taiko’, usually used in the ‘Aoi Festival’.
Now, the LLM reasoner evaluates the information and decides if it’s enough to answer the question.
As it still requires the date, it performs an additional search using Google’s Search API, this time including much more relevant information, such as the name of the drum and the festival.
With the retrieved information, it’s now finally able to answer, “7th century”.
And the impressive thing is that humans didn’t intervene a single time in this whole process.
Additionally, to guide the model, the researchers performed a user study identifying the most common actions humans take in such situations, and defined a transition graph for the model to follow.
This way, whenever the LLM planner had to decide the next action, it chose among the actions users typically take in that situation.
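One way such a transition graph could look in code: a mapping from the current state to the actions the planner is allowed to pick next. The states and action names below are hypothetical, chosen to mirror the tools in the worked example, not the graph from the actual user study:

```python
# Hypothetical transition graph: from each state, the planner may only
# choose among the actions humans typically took in that situation.
TRANSITIONS = {
    "start":            ["object_detection"],
    "object_detection": ["image_caption", "image_search"],
    "image_caption":    ["image_search", "text_search"],
    "image_search":     ["text_search", "answer"],
    "text_search":      ["text_search", "answer"],
}

def allowed_actions(state):
    """Actions the LLM planner is permitted to pick in the given state."""
    return TRANSITIONS.get(state, ["answer"])
```

Constraining the planner’s choices this way prunes the search space: instead of considering every tool at every step, the LLM only ranks the handful of actions that made sense to human searchers.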
But the question that comes to mind is pretty obvious.
Why is Google so adamant about pursuing the disruption of Internet search?
From an ad-based to an API-based model
In my humble opinion, Google’s plan is pretty clear.
If AI is clearly going to destroy our business model, let’s at least make sure it is us who do it.
Considering the priceless search data that Google holds, it’s almost a guarantee to me that Google will eventually deploy priced search APIs for AI agents to consume.
That way, instead of killing their business model, they are simply changing the way they charge for their data.
From Ads to APIs. Pretty genius if you ask me.
AVIS achieves new state-of-the-art results in complex visual question-answering benchmarks
It’s the first autonomous dynamic planner, allowing it to pivot if necessary
It seamlessly integrates with multiple tools such as image search systems, text search, object detection, image captioning, and with other LLMs, all in one.
🔮 Practical applications 🔮
Allows for what I believe to be Google’s future, a new range of search products providing data to AI agents through priced APIs
It enables automatic, autonomous web searches, freeing users from having to perform these searches manually
👾 Best news of the week 👾
🧐 Study proves that LLaMa 2 is as factually accurate as GPT-4
🥳 We’re now really close to having the first functional watermarking AI detector
🥸 Google’s AI agent Duet will cost as much as Microsoft’s Copilot, $30
🥇 Leaders 🥇
Last week, I sent a poll to gather opinions regarding who will win the AI race, open-source or private AI.
Knowing the answer to this seems like the trillion-dollar question for everyone, as foreseeing the victory of one or the other will dictate future investments in the AI markets and influence all strategic decisions regarding AI, be those made by the CEO of a Fortune 500 company, the Secretary of State of the US, or you.
Unsurprisingly, the poll was met with quite divided views, which got me thinking that this has no clear-cut answer.
Or does it?
In fact, after many hours of reflecting on the topic, I’m convinced one of the two stands to win, and it may not be the answer you expect.
Today, we’re deep-diving into both the proprietary and the open-source claims to the AI throne, to elucidate which of the two is more likely to eventually win.
The case for Open-source
Open-source AI is when the weights and design of an AI model are released to the public.
Famous cases include Meta’s LLaMa models, initially leaked and now not only publicly available but also licensed for commercial use.
The romanticism associated with the open-source movement is clear: By making the model openly available, you’re democratizing knowledge regarding the most powerful technology ever invented (probably).
But what does it take for open-source to win?
Well, four factors play an important part in this decision: Data security, Model Control, Training methodology, and Regulation.
Everything is about data today. It’s the most precious asset for companies who gather it.
Companies have become extremely zealous about protecting their data, having realized that their unique data is a critical asset for training the most powerful technology of them all: Artificial Intelligence.
In fact, according to a recent survey by PwC, CEOs are scared straight about getting their data stolen, both in the short and long term.
Now, with the surge of Generative AI, companies must also face data leakage, as employees with access to confidential data may, mistakenly or not, feed it to models like ChatGPT to help with their work, only for that data to then be used to train future versions.
Then, with the proper prompt injection mechanisms, one can easily lure these conversational systems into leaking that data to third parties, something that Samsung already knows a thing or two about.
With open-source models, however, that’s not an issue because the model is stored in your own servers, meaning that your precious data never actually leaves the realms of your company.
And that leads us to the next appetizing thing about going open-source: control.
Having all under control
When using a model like ChatGPT, your only way to access it is through an API endpoint.
In other words, you have no visibility into what the model looks like, how many parameters it has, or anything else.
Naturally, this entails several issues:
You’re entirely subject to the API pricing set by OpenAI
Fine-tuning is basically limited to style changes and quality improvements, not cost reduction. Quantization, efficient fine-tuning methods like QLoRA, and the rest are out of the question.
You have almost no control over what data is used to train the model, meaning that OpenAI is completely free to alter its behavior
With open-source, it’s the other way around: everything is under your control.
Furthermore, another great reason to believe in open-source is the actual training methodologies.
We’re simply getting better
Over the course of the last few years, we have gotten immensely better at training LLMs.
In fact, current “small” LLMs are already far superior to older, large models, and open-source models are already giving proprietary models a run for their money. Orca, for instance, beats the almighty GPT-3.5 in average accuracy on reasoning benchmarks despite being 10 times smaller.
The reason for this is clear.
As LLMs tend to be very underfitted, researchers are getting better at maximizing the quality-to-size ratio, squeezing more performance out of every parameter and even beating larger models.
For that reason, distillation, the process of using a larger teacher model to teach its representations to a smaller student model (basically training it to imitate the teacher’s responses), will remain a very active field of research in the coming years.
But the biggest elephant in the room is, without a doubt, regulation… or the lack of it.
European winds bring a smelly future for OpenAI
It’s no secret how zealous the European Union is about protecting users’ privacy on the Internet and their right to withhold consent for the use of their data.
The fear of the EU cracking down on models like ChatGPT and Claude, trained on basically “all” the Internet’s data, no questions asked, has prompted a surge of investment in companies promising LLMs that aren’t based on private data.
The investment mania has led to situations like that of French start-up Mistral, which raised $113 million at just four weeks old (and with no product, obviously) on the promise of models trained solely on public data.
Considering all the elements discussed, seems like a pretty good case for open-source, right?
Well, I suggest you hold your horses for one second.