How Google is planning on destroying its own business model
TheTechOasis
Breaking down the most advanced AI systems in easy-to-understand, 5-minute reads
AI Research of the week
"If you can't beat it, join it."
That's how the saying goes, and that's precisely what our dear friend Google is doing regarding AI.
In order to remain king of search, Google is purposely contributing to the demise of its ad-based revenue model, one of the most successful businesses in the history of capitalism.
But you shouldn't fear for Google's integrity because, as we'll see today… this was the plan all along.
Now, they have released their newest research on "AI-enhanced search" with AVIS, a powerful new framework for efficient, image-based question answering.
The wake-up call
Anyone who follows the AI industry will probably agree that we're approaching the death of Internet search as we know it.
And Google knows it too.
Using ChatGPT or Claude is much quicker and more convenient than doing link-based searches.
Thus, Google had two options: milk the dying cash cow while arguing that AI search is stupid, or acknowledge the obvious and ride the AI wave to stay on top once it becomes mainstream.
As Google's execs aren't dumb, they've gone for the second option.
In fact, there's no company in the world right now more heavily focused on disrupting search with AI than Google.
For instance, a few weeks ago Google DeepMind released WebAgent, an autonomous AI agent capable of searching the web.
Now, as you'll see, AVIS is another nail in the coffin for standard Internet search.
Of course, this is horrifying for Google's ad-based model, which relies heavily on the ads displayed to users while they browse the web.
Although this breakdown is from 2022, it's safe to say that the picture hasn't changed at all.
Google's "Search advertising" represents 60% of the revenue! And that figure grows to a staggering 71% if you consider Google AdMob.
But with solutions like AVIS, the ad model is suddenly completely broken.
The AVIS framework
Autonomous Visual Information Seeking, or AVIS, is a framework developed by Google that leverages Large Language Models (LLMs) so that, given an image and a complicated question, the system can dynamically plan, execute tools that provide the required information, and evaluate its own outputs autonomously until it finds the right answer.
Thus, the AVIS framework comprises three distinct elements:
LLM planner: An LLM that, based on the current state and observations, plans which tool to call and which query to send it in order to obtain the required information.
Working Memory: A log that retains the information from past executions.
LLM reasoner: An LLM that, based on the output from the tools, evaluates the obtained information and decides whether it needs to progress further or is ready to answer the question.
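The planner / memory / reasoner loop described above can be sketched in a few lines. This is a minimal sketch of the control flow, not Google's implementation: the `plan_next_step`, `run_tool`, and `evaluate` callables are hypothetical placeholders standing in for the LLM and tool calls.

```python
def avis_loop(image, question, tools, plan_next_step, run_tool, evaluate, max_steps=10):
    """Sketch of AVIS: plan -> execute tool -> log -> reason, until answered."""
    working_memory = []  # log retaining (tool, query, output) from past executions
    for _ in range(max_steps):
        # LLM planner: pick the next tool and query from the current state
        tool, query = plan_next_step(question, working_memory, tools)
        output = run_tool(tool, query, image)
        working_memory.append((tool, query, output))
        # LLM reasoner: decide whether the gathered info answers the question
        answer = evaluate(question, working_memory)
        if answer is not None:
            return answer
    return None  # step budget exhausted without an answer
```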
To better understand, let's see an example:
Source: Google
Given the image at the top left and the question "When was the drum first used for this event?", the LLM faces a very complicated task.
For starters, it has no additional information, meaning it's missing a lot of context not present in the actual image that it needs in order to provide an answer.
For instance:
What type of event is the image actually depicting?
What's the date of the event?
What type of drum is that?
And, finally, when was that drum first used in the event?
In other words, to answer that question, the LLM needs to search for that additional context.
The first challenge involves drafting the first plan. The LLM planner decides which tool it needs first to start the process. In this case, that process is as follows:
It starts by leveraging an object detection tool to scrutinize the complete image and crop it into distinct objects.
Then, it decides to run an image captioning model such as PALI, to obtain a description of each cropped image.
In parallel, the model selects the most relevant cropped image based on the question, in this case the image of the drum.
Next, it runs the Google Lens Image Search API to retrieve the most similar images from a dataset of images paired with descriptive texts. This way, it now knows that the drum is a "Taiko", usually used in the "Aoi Festival".
To retrieve images similar to a given one, the current state of the art is semantic search: the vector embedding most similar to your image's embedding is retrieved based on some similarity measure (usually cosine similarity).
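That retrieval step can be sketched as follows, assuming the embeddings are already computed; this is generic semantic search over a small array, not the Google Lens internals:

```python
import numpy as np

def cosine_retrieve(query_emb, index_embs, k=3):
    """Return the indices of the k indexed embeddings most similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    idx = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    sims = idx @ q  # cosine similarity of every indexed image to the query
    return np.argsort(-sims)[:k]  # highest similarity first
```

In practice this runs over millions of vectors through an approximate nearest-neighbor index rather than a brute-force scan, but the similarity computation is the same.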
Now, the LLM reasoner evaluates the information and decides if it's enough to answer the question.
As it still requires the date, it performs an additional search using Google's Search API, this time including more relevant details such as the name of the drum and the festival.
With the retrieved information, it's now finally able to answer: "7th century".
And the impressive thing is that humans didn't intervene once in this whole process.
Additionally, to help the models learn, they performed a user study where they identified the most common actions humans take in such situations, and defined a transition graph for the model to use.
This way, when the LLM planner had to decide the next action, it chose among the actions that users usually take in that situation.
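Conceptually, that transition graph just constrains which actions the planner may pick in each state. A toy illustration, where the states and transitions are made up for the example rather than taken from the actual user study:

```python
# Toy transition graph: from each state, only the actions observed in the
# user study are legal choices for the LLM planner.
TRANSITIONS = {
    "start": ["object_detection"],
    "object_detection": ["image_captioning", "image_search"],
    "image_captioning": ["image_search"],
    "image_search": ["text_search", "answer"],
    "text_search": ["answer"],
}

def legal_actions(state):
    """Actions the planner is allowed to pick from in the current state."""
    return TRANSITIONS.get(state, [])
```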
But the question that comes to mind is pretty obvious.
Why is Google so adamant about pursuing the disruption of Internet search?
From an ad-based model to an API-based one
In my humble opinion, Google's plan is pretty clear:
"If AI is clearly going to destroy our business model, let's at least make sure it is us who do it."
Considering the priceless search data that Google holds, it's almost a guarantee to me that Google will eventually deploy priced search APIs for AI agents to consume.
That way, instead of killing their business model, they are simply changing the way they charge for their data.
From Ads to APIs. Pretty genius if you ask me.
Key contributions
AVIS achieves new state-of-the-art results on complex visual question-answering benchmarks
It's the first autonomous dynamic planner, allowing it to pivot if necessary
It seamlessly integrates multiple tools such as image search systems, text search, object detection, image captioning, and other LLMs, all in one.
Practical applications
Allows for what I believe to be Google's future: a new range of search products providing data to AI agents through priced APIs
It allows performing automatic, autonomous web searches, freeing users from having to perform them manually
Best news of the week
Study proves that LLaMa 2 is as factually accurate as GPT-4
We're now really close to having the first functional watermarking AI detector
Google's AI agent Duet will cost as much as Microsoft's Copilot, $30
Leaders
Last week, I sent a poll to gather opinions regarding who will win the AI race, open-source or private AI.
Knowing the answer to this seems like the trillion-dollar question for everyone, as foreseeing the victory of one or the other will dictate future investments in the AI markets and influence all strategic decisions regarding AI, be those made by the CEO of a Fortune 500 company, the Secretary of State of the US, or you.
Unsurprisingly, the poll was met with quite divided views, which got me thinking that this question has no clear-cut answer.
Or does it?
In fact, after many hours of reflecting on the topic, I'm convinced one of the two stands to win, and it may not be the answer you expect.
Today, we're deep-diving into the case for the proprietary and open-source claims to the AI throne, in order to elucidate which of the two is more likely to eventually win.
The case for Open-source
Open-source AI is when the weights and design of an AI model are released to the public.
Famous cases include Meta's LLaMa models, initially leaked and now not only publicly available but also licensed for commercial use.
The romanticism associated with the open-source movement is clear: by making the model openly available, you're democratizing knowledge regarding the most powerful technology ever invented (probably).
But what does it take for open-source to win?
Well, four factors play an important part here: data security, model control, training methodology, and regulation.
Data security
Everything is about data today. It's the most precious asset for the companies that gather it.
Companies have become extremely zealous about protecting their data, having realized that their unique data is a critical resource for training the most powerful technology of them all: Artificial Intelligence.
It started with Twitter, and soon moved on to other companies like Stack Overflow or, notoriously, The New York Times.
In fact, according to a recent survey by PwC, CEOs are scared straight about having their data stolen, both in the short and the long term.
Now, with the surge of Generative AI, companies also need to face data leakage, as employees with access to confidential data may, mistakenly or not, provide this data to models like ChatGPT to help them with their work, only for that data to then be used to train future versions.
Then, with the proper prompt injection mechanisms, one can easily lure these conversational systems into leaking that data to third parties, something that Samsung already knows a thing or two about.
With open-source models, however, that's not an issue, because the model is stored on your own servers, meaning that your precious data never actually leaves the realms of your company.
And that leads us to the next appetizing thing about going open-source: control.
Having everything under control
When using a model like ChatGPT, your only way to access it is through an API endpoint.
In other words, you have no visibility into what the model looks like, how many parameters it has, or anything else.
Naturally, this entails several issues:
You're entirely beholden to the API pricing set by OpenAI
Fine-tuning is basically limited to style changes and quality improvements, not cost reduction: quantization, efficient fine-tuning methods like QLoRA, and others are out of the question.
You have almost no control over what data is used in the model, meaning that OpenAI is completely free to alter the behavior of its model
In fact, any recurrent user of ChatGPT over the last six months may have noticed what Stanford and Berkeley researchers pointed out: GPT-4 is actually worse now than it was six months ago.
With open-source, it's basically the other way around: everything is under your control.
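As a concrete illustration of that control, here is a toy symmetric int8 weight quantizer, the kind of size-and-cost trick (alongside methods like QLoRA) you can only apply when you actually hold the weights. This is a didactic sketch, not a production quantization scheme:

```python
import numpy as np

def quantize_int8(weights):
    """Toy symmetric quantization: map float weights onto int8 in [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale  # store one byte per weight plus a single scale factor

def dequantize(q, scale):
    """Recover approximate float weights for use at inference time."""
    return q.astype(np.float32) * scale
```

Real schemes quantize per-channel or per-block and handle outliers carefully, but the storage saving (4 bytes down to 1 per weight) comes from exactly this idea.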
Furthermore, another great reason to believe in open-source is the actual training methodologies.
We're simply getting better
Over the course of the last few years, we have gotten immensely better at training LLMs.
In fact, current "small" LLMs are already far superior to older, larger models, and we're already seeing open-source models give proprietary models a run for their money, as examples like Orca prove by beating the almighty ChatGPT-3.5 in average accuracy on reasoning benchmarks, despite being 10 times smaller.
The reason for this is clear.
As LLMs tend to be very underfitted, researchers are getting better at maximizing the quality-to-size ratio, meaning that we're maximizing a model's performance per unit of size, even beating larger models.
For that reason, distillation, the process of using a larger teacher model to teach its representations to a smaller student model (basically teaching it to imitate its responses), will remain a very active field of research over the next few years.
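At its core, distillation trains the student to match the teacher's output distribution. A minimal sketch of the soft-target loss, using the KL divergence between temperature-softened distributions (the temperature value here is an arbitrary illustrative choice):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened output distributions."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

The loss is zero when the student exactly matches the teacher and grows as their distributions diverge; training minimizes it across the dataset, usually mixed with the standard cross-entropy loss on the true labels.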
But the biggest elephant in the room is, without a doubt, regulation… or the lack of it.
European winds bring a smelly future for OpenAI
It's no secret how zealous the European Union is regarding the protection of users' privacy on the Internet and people's right to refuse the use of their data without their consent.
The fear of the EU cracking down on models like ChatGPT and Claude, trained on basically "all" the Internet's data, no questions asked, has prompted a surge of investments in companies promising LLMs that aren't based on private data.
The investment mania has led to situations like that of French start-up Mistral, a company training models solely on public data that raised $113 million while being just four weeks old (and with no product, obviously).
Crazy valuations are mostly justified by the gold rush of the 21st century, GPU hoarding, but more on that later.
Considering all the elements discussed, it seems like a pretty good case for open-source, right?
Well, I suggest you hold your horses for one second.