
The Golden Gate Model, AI's First Real Scam?

🏝 TheTechOasis 🏝

Welcome to the newsletter that keeps you updated on the latest developments at the cutting edge of AI by breaking down the most advanced systems in the world & the hottest news in the industry.

10-minute weekly reads.

💎 Big Announcement 💎

Finally, I am officially launching my community, TheWhiteBox, tomorrow!

TheWhiteBox is the place for high-quality, highly curated AI content without unnecessary hype or ads across research, models, investing & markets, and AI products and companies. With TheWhiteBox, we guarantee you won’t need anything else.

  • If you signed up for the waitlist, you have either already received the invitation or will receive it by today at the latest. You must accept the invitation in your inbox for it to take effect.

  • If you were already a Premium subscriber, you have been added automatically, with full access to all spaces and content. You, too, must accept the email invite you received.

If you haven’t joined yet, click below for a 14-day free trial on the monthly subscription.

🚨 Week’s Update 🚨

This week, we have exciting news from Anthropic, Cohere, OpenAI, Google, & more.

One of the worst pieces of news from OpenAI last week was the departure of their superalignment lead, Jan Leike. Leike then published a thread stating the reasons for his departure, mainly that OpenAI was no longer a safety-first company.

Well, on Tuesday, he announced he was joining Anthropic to do the exact same thing, but hopefully with real commitment from leadership.

Superalignment efforts hope to find a way to control future models that will be smarter than humans, something we need to figure out before we actually build them to prevent rogue models.

In NBA terms, this is the research-scientist equivalent of LeBron James joining the Celtics.

Similarly, OpenAI announced the creation of a new Safety Board and, in the same press release, announced they had started training their new frontier model (GPT-5 or whatever it’s called).

It’s great news that the new frontier model is finally in training. And based on a tweet by Mark Chen, OpenAI’s frontier models lead, we should expect the model to be released around November this year.

In parallel, a few days ago Helen Toner, the ex-board member who precipitated Sam Altman’s firing back in November 2023, finally broke her silence.

Continuing with AI research labs, Cohere announced Aya 23, a new multilingual LLM that will help bridge the gap between commonly used languages like English or Spanish and lesser-resourced ones, a problem that today prevents a huge part of society from leveraging GenAI.

The Canadian firm seems to be very involved in open-source, which is laudable. That said, it’s reportedly valued at 227 times projected earnings, meaning they should get their shit together before it’s too late.

Moving on to the hyperscalers, Microsoft announced some interesting new features for Copilot, a product that, as discussed last week, is far from delivering on its great promise.

In particular, a new ‘Team Copilot’ feature will help the AI act as a team member, a similar concept we saw with Google and its AI Teammate last week.

Will we need project managers in a few years? Dunno, but certainly not in their standard form.

It’s funny to see how Microsoft, Google, and Amazon are slowly converging into the same company: a cloud and AI B2B provider.

Moving on, did you know you’re not allowed to turn right on a red light in New York City? I might have saved you an unpleasant surprise next time you go to the Big Apple.

But fear not, as NVIDIA has announced LLaDA, an LLM-based driving assistant that analyzes the car’s surroundings and helps the driver navigate countries and cities with unfamiliar signs and laws.

The convergence between cars and MLLMs is becoming popular, with Tesla openly studying it, Wayve already handing control of the car to an MLLM, and now NVIDIA taking a more ‘down-to-Earth’ approach that could be extremely useful in unfamiliar driving scenarios.

Finally, we are ending with a bang.

I want to highlight this fascinating research by Fei-Fei Li’s Stanford Lab and Google Research, ‘Going from Anywhere to Everywhere,’ which takes an image and generates an infinite set of 3D scenes departing from it. The link includes several examples that are worth the click.

Fei-Fei Li is a powerhouse figure in AI. Her team built ImageNet, one of the most important datasets in AI’s history: it was the dataset on which AlexNet, the model that kick-started the Deep Learning revolution, was trained.

Text-to-3D is still in a very nascent phase, but creating models that better understand the spatial features of our world is thought to be crucial to taking AI to the next level. I highly recommend this TED talk by the woman herself on this particular topic (she uses very understandable language).

🧐 You Should Pay Attention to 🧐

  • AI’s First Big Scam?

  • Anthropic’s Great Breakthrough

💀 AI’s First Big Scam? 💀

Last week, I discussed Coffeezilla, a famous YouTuber focused on uncovering scams, and his two-part series about the Rabbit R1, an AI-based hardware conceived as a ‘smartphone killer.’

Four months ago, the product had a stellar demo launch that impressed many, including Satya Nadella, Microsoft’s CEO, who called it the ‘most impressive demo since Steve Jobs’ iPhone Unveiling.’

However, Coffeezilla claims it’s all a lie in what could be the first grand scam in AI, based on the fact that, according to the YouTuber, the AI behind it ‘is not AI,’ which, funnily enough, as you’ll see in a minute, is not true either.

Since Coffeezilla is clearly unaware of the existence of neurosymbolic systems, which is precisely what the R1 is (or is claimed to be), I thought I could shed some light on this whole thing and clear up the confusion.

What is the R1?

The Rabbit R1 is a $200 piece of hardware that communicates with its user through voice commands. Although it has a screen, its main mode of use is clicking a side button and speaking your request.

The tool allegedly lets you order an Uber, place a DoorDash order, or pick your next song or playlist on Spotify, all by voice, which is a highly appealing idea indeed.

But how is this possible? Well, behind this very orange-looking hardware is a large action model, or LAM.

Unlike standard Large Language Models (LLMs), the LAM has a unique way of executing commands.

  1. First, the user sends a voice command to the R1.

  2. The R1 interprets its meaning similarly to a standard LLM.

  3. However, next, instead of crafting an API request to send to the aforementioned applications, it interacts with their interfaces directly, just like a human would, but in a virtual environment.

In other words, it has a ‘super host’ cloud-based system you can’t see that displays the apps’ interfaces to the LAM so that the model interacts with the virtual screens just like a human would, executing the order.

  4. Then, the LAM displays the summary on the screen, allowing the user to verify the execution's correctness.

Long story short, the R1 promises to automate mundane tasks that would require a smartphone by simply using your voice.
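As a sketch, the claimed pipeline might look something like this. All names and components below are hypothetical stand-ins; Rabbit has not published its implementation:

```python
# Hypothetical sketch of the LAM pipeline described above.
# None of these components are Rabbit's actual code; they only
# illustrate the claimed architecture.

def transcribe(audio: bytes) -> str:
    """Steps 1-2: speech-to-text plus intent parsing (stand-in)."""
    return "play my discover weekly on spotify"

def plan_ui_actions(intent: str) -> list[dict]:
    """Neural part: map the parsed intent to UI-level actions."""
    return [
        {"action": "open_app", "target": "spotify"},
        {"action": "click", "target": "search_box"},
        {"action": "type", "text": "Discover Weekly"},
        {"action": "click", "target": "first_result"},
    ]

def execute_in_virtual_env(actions: list[dict]) -> str:
    """Step 3: drive the app's real interface in a cloud VM,
    the way a human would, instead of calling an API."""
    log = [f"{a['action']} -> {a.get('target', a.get('text'))}" for a in actions]
    return "; ".join(log)

def lam_request(audio: bytes) -> str:
    """Full round trip: voice in, on-screen summary out."""
    intent = transcribe(audio)
    summary = execute_in_virtual_env(plan_ui_actions(intent))
    return f"Done: {summary}"  # Step 4: summary shown on the R1's screen
```

The key architectural claim is in `execute_in_virtual_env`: the model operates app interfaces directly in a hidden cloud environment rather than calling official APIs.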

Seeing all this, people went crazy, to the point that the company has generated, according to Coffeezilla, $10 million in revenue since launch.

Of course, the entire product and value proposition depend on the LAM. So, how does it work?

The Elegant Promise of Neurosymbolic AI

As confirmed by Rabbit’s CEO, the LAM is not an LLM, but a neurosymbolic AI model.

In other words, it’s a combination of a neural network, used to interpret the voice command and the model’s response back to the user, and human-crafted, pre-fixed scripts that guide the actions of the LAM over the app screens.

And that’s what Coffeezilla got wrong. Because part of the model is human-written code, he claimed the approach ‘wasn’t AI’, which is false.

It is.

In fact, neurosymbolic AI is becoming increasingly popular, with examples such as AlphaGeometry by Google DeepMind. In this case, they trained an AI that reached the 85th percentile, a gold-medalist level, in geometry theorem proving among Olympiad-level humans.

Succinctly speaking, a neural model suggested different auxiliary constructs to simplify the problem. Then, human-crafted reasoning engines (human-written code) executed the calculations.

Simply put, why use a neural network to compute Pythagoras’ theorem given two sides of a triangle when you can use a calculator that does it perfectly, with zero chance of a mistake?

Thus, the neural network suggests the solution path, narrowing the problem from infinite possibilities down to a handful of candidates, and the fully scripted math engines are executed to check whether each candidate is correct.

According to Coffeezilla’s thought process, this wouldn’t count as ‘AI,’ which is false.

What’s more, as the Rabbit team acknowledged, other AI companies like Adept use a similar approach: a neural prior for interpretation and a client-level agent for execution.

Consequently, calling the Rabbit team scammers on the grounds that ‘this isn’t AI’ is simply wrong. But that doesn’t mean the product isn’t a scam.

Overpromising and underdelivering

The Rabbit R1 might be a scam because, as Coffeezilla also mentioned, this LAM model doesn’t even exist.

Despite being allegedly deployed already, the product works horribly, and, as developers who got access to the code mentioned, the LAM is just a bunch of ‘if/else statements’ built on Playwright, a browser-automation framework.

If that’s the case, the problem isn’t that they are following a neurosymbolic approach. This isn’t neurosymbolic AI; this is a glorified Robotic Process Automation (RPA).

In other words, even the slightest change to the app’s UI confuses the model, which follows a very specific and inflexible execution process. Genuine neurosymbolic AI systems are flexible and ‘smart’; this is not.
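To see why hardcoded automation is so brittle, consider this hypothetical sketch (not Rabbit’s actual code): a script that operates UI elements by fixed identifiers works until the app renames a single button.

```python
# Hypothetical sketch of why hardcoded RPA-style scripts are brittle.
# The automation finds elements by fixed identifiers, in a fixed order;
# rename one button in the app's UI and the whole flow breaks.

def scripted_order(ui: dict) -> str:
    """Glorified if/else automation: fixed selectors, fixed sequence."""
    for selector in ["search_box", "first_result", "order_button"]:
        if selector not in ui:
            raise RuntimeError(f"UI changed: '{selector}' not found")
    return "order placed"

# Yesterday's UI: the script works.
ui_v1 = {"search_box": ..., "first_result": ..., "order_button": ...}

# Today the app renames 'order_button' to 'checkout_button': it breaks.
ui_v2 = {"search_box": ..., "first_result": ..., "checkout_button": ...}
```

A genuinely flexible agent would instead reason about what it sees on screen and adapt; this script cannot.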

And here is where the potential scam lies: If the LAM isn’t what the company says it is, it’s a scam.

But let me be clear: calling this a scam because it’s a neurosymbolic system with some ‘dumb’ parts means you do not understand how AI works.

What We Think

The reality today is that the R1 product works horribly, period. It’s buggy, slow, and it simply doesn’t work.

Based on the results of the Humane AI Pin, it’s clear that AI hardware founders are falling into the same traps as the rest of Silicon Valley: they confuse moving fast and selling half-baked products that more or less work, on the promise of future greatness, with shipping crappy, overpromised, underdelivered products.

Honestly, considering people have paid $200 for a product that doesn’t work, I would feel scammed, too.

But let’s not use the R1’s limitations to tarnish the reputation of the very promising field of neurosymbolic AI: systems that work wonderfully, bring a great deal of transparency and interpretability to the reasoning processes of AI systems, and learn much, much faster.

😦 Anthropic’s Big Discovery 😦

This week, Anthropic has published the biggest leap in frontier AI model understanding we have ever seen.

But why? Well, we run into a weird conundrum with current frontier AI models: we know they work, but we don’t know why, and worse, we don’t know how they think.

Indeed, we have great intuition about how neural networks learn, a topic I recently wrote about in my blog. However, as models have grown into architectures with billions of parameters, they remain a complete black box, to the point that the name of my company, TheWhiteBox, is inspired by this problem.

However, over the last year, an AI field known as Mechanistic Interpretability has grown heavily in interest and has a clear goal: demystifying the models that could, one day, give us AGI… before it’s too late.

Now, Anthropic, OpenAI’s main rival, has released a paper generating a lot of enthusiasm in the space. It is a beautiful paper that gives us new ways of understanding Large Language Models (LLMs) and sheds light on how we could soon steer their behavior to prevent unsafe practices.

The Ultimate Black Box

So, firstly, what is mechanistic interpretability?

In simple terms, this field aims to identify the key patterns of knowledge a network possesses and how they are related to its parameters, in order to predict its behavior.

As you probably know by now, neural networks like ChatGPT are made out of neurons (although the appropriate name is ‘hidden units’) arranged in what are called ‘hidden layers.’

When prompted, these neurons fire (or not), and combining these elements helps the model generate data, such as a sewing learning guide or a narration of the battle of Gettysburg.

Simply put, the model's output depends on how these neurons combine. However, demystifying these combinations, a seemingly harmless activity, is one of the great unsolved problems in AI today.

But why?

At their core, Generative AI models are data compressors. In other words, they are trained to compress data.

They are fed training data orders of magnitude larger than the model itself (often 1,000× or more) and tasked with learning and regenerating it. Thus, they must learn a compression of that data that allows them to, when prompted, ‘recover’ the original data despite being much smaller.

As the model is much smaller, rote memorization is not an option. In fact, that would turn the model into a database the same size as the original training data, which is a pointless exercise; what we want is a small generative model that still represents the entire data and can be queried about it.

Thus, it’s forced to absorb only the key patterns in the training data. As I’ve mentioned many times, this is the reason researchers around the world think LLMs are the holy grail of AI; the impressive act of compression (just keeping the essential) these models achieve is a clear sign of their intelligence.
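A quick back-of-the-envelope calculation, using illustrative numbers rather than any lab’s official figures, shows the scale of this compression:

```python
# Illustrative compression arithmetic; these are assumed, round numbers,
# not any lab's official figures.
# Assume a model with 8 billion 2-byte parameters trained on
# 15 trillion tokens of text (~4 bytes of raw text per token).

params = 8e9
model_bytes = params * 2           # fp16/bf16 weights: ~16 GB
training_tokens = 15e12
data_bytes = training_tokens * 4   # ~60 TB of raw text

ratio = data_bytes / model_bytes   # how much larger the data is
print(f"model: {model_bytes/1e9:.0f} GB, data: {data_bytes/1e12:.0f} TB")
print(f"data is ~{ratio:.0f}x the model size")  # ~3750x, over 3 orders of magnitude
```

Whatever the exact figures for a given model, the data-to-weights ratio is what forces the network to keep only the essential patterns.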

But what do I mean by learning the essentials?

For example, instead of learning every sentence by heart, humans learn grammar and syntax, aka how words are formed and how they commonly follow each other, and extrapolate that knowledge to new sentences without requiring rote memorization.

This is precisely what LLMs do too, but we can’t really explain why or how, which is the point of mechanistic interpretability.

But why am I telling you all this?

Simple. This amazing compression capability means that their smallest information compression and storage units, the neurons, are polysemantic.

In other words, each neuron becomes knowledgeable in a wide range of semantically unrelated topics. For instance, the same neuron may fire whenever the model generates Shakespeare's poetry and when writing about tropical frogs.

Sadly, this makes examining each neuron individually to uncover ‘what it knows’, and thus mapping the entire network’s knowledge, an impossible task.

Luckily, in October 2023, Anthropic made a huge discovery: while neurons are unequivocally polysemantic, certain combinations are monosemantic.

In layman’s terms, although the outcome of a model was unpredictable based solely on one neuron’s behavior, whenever a set of neurons fired together, the outcome was always the same. In other words, the same neurons fired together when the model generated text from a particular topic.

However, the issue was that the discovery was limited, as the neural network they managed to study was very, very small, meaning that monosemantic neuron combinations were a field of promise but not reality.

However, after Anthropic’s new research, it’s a reality now.

A Sonnet of Hope

A few days ago, the same team that discovered monosemantic neuron combinations released a paper in which they applied the same principle to Claude 3 Sonnet, their midrange production frontier model, which is currently the ninth-best LLM in the world.

But what did they do specifically?

They analyzed the model’s activations and trained a parallel model that transformed them into interpretable features.

In layman’s terms, they trained a model that looked at how neurons in Sonnet fired and predicted what key abstract features the combination of these neurons represented, creating a ‘feature map’ that researchers could interpret. In other words, they found a set of world concepts that occurred for a certain combination of neurons.

For instance, they discovered a feature specifically for San Francisco’s “Golden Gate Bridge”.

Importantly, these features were multimodal (reacting to both words and images related to that monument) and multilingual (reacting to the same concepts in other languages).

Additionally, they found features about famous people, monuments, arts, etc.

To achieve this, they trained a sparse autoencoder, a model that took the activations and turned them into real-life features (aka a model that finds patterns such as ‘if neurons fire in a certain way, they are usually referring to LeBron James’) and then tried to reconstruct the activations back.

Autoencoders work so well because, by forcing models to reconstruct the original data, they must learn how that data is distributed originally, which is equivalent to truly understanding the data.

Crucially, they forced this model to be sparse, meaning that a combination of neurons should only yield a handful of features. That way, they could pinpoint the exact real-life concept these neurons were ‘referring’ to.
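Here is a minimal toy sketch of the idea in Python/NumPy, vastly simplified compared to Anthropic’s actual setup: an overcomplete feature layer with hard top-k sparsity, plus a decoder that reconstructs the activations. All dimensions and weights are invented for illustration.

```python
import numpy as np

# Toy sparse autoencoder, far smaller and simpler than Anthropic's:
# it maps model activations into an overcomplete feature space, keeps
# only a few active features (sparsity), and reconstructs the input.

rng = np.random.default_rng(0)
d_act, d_feat = 16, 64             # few activations -> many more features

W_enc = rng.normal(0, 0.1, (d_act, d_feat))
W_dec = rng.normal(0, 0.1, (d_feat, d_act))

def encode(activations: np.ndarray, k: int = 4) -> np.ndarray:
    """ReLU features, then keep only the top-k (hard sparsity)."""
    f = np.maximum(activations @ W_enc, 0.0)
    cutoff = np.sort(f)[-k]                 # k-th largest value
    return np.where(f >= cutoff, f, 0.0)

def decode(features: np.ndarray) -> np.ndarray:
    """Reconstruct the original activations from the sparse features."""
    return features @ W_dec

acts = rng.normal(0, 1, d_act)
feats = encode(acts)
recon = decode(feats)
# Training would adjust W_enc/W_dec to minimize ||acts - recon||^2 plus
# a sparsity penalty; this sketch only shows the forward pass.
```

The sparsity constraint is what makes the features interpretable: with only a handful active at a time, each one can be matched to a human-recognizable concept.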

But why does all this matter?

Simple: by knowing how neurons combine to generate data related to a specific topic, we can predict the model’s behavior or, crucially, steer it.

Using the same Golden Gate Bridge example, they clamped that specific combination (forcing these specific neurons to activate more intensely), which forced the model to think it was the Golden Gate Bridge.

Of course, that is simply an interesting response, but the key is that they proved we could actually “clamp up” or “clamp down” specific topics as we please.
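Conceptually, clamping is just rescaling one feature before decoding back into the model’s activation space. A hypothetical sketch, with invented feature indices and weights, not Anthropic’s code:

```python
import numpy as np

# Hypothetical sketch of feature clamping: amplify (or zero out) one
# interpretable feature in the sparse-feature space, then decode back
# to activations, steering what the model talks about.

rng = np.random.default_rng(1)
n_features, d_act = 8, 16
W_dec = rng.normal(0, 0.1, (n_features, d_act))  # feature -> activation map

GOLDEN_GATE = 3  # index of the (hypothetical) bridge feature

def steer(features: np.ndarray, idx: int, scale: float) -> np.ndarray:
    """Clamp one feature up (scale > 1) or down (scale < 1)."""
    clamped = features.copy()
    clamped[idx] *= scale
    return clamped @ W_dec  # decoded activations written back to the model

features = np.abs(rng.normal(0, 1, n_features))
boosted = steer(features, GOLDEN_GATE, scale=10.0)  # "clamp up" the bridge
muted = steer(features, GOLDEN_GATE, scale=0.0)     # fully suppress it
```

The same mechanism that makes the model obsess over a bridge could dial down features for deception or violence.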

But what does all this mean to the industry?

Predictable yet Censorable?

Unsurprisingly, researchers found numerous undesirable features (lying, deception, power-seeking behavior, or even violent responses).

Therefore, we could eventually ‘dial down’ or even prohibit such neuron combinations so that the model never generates that sort of data, no matter how insistent the user is. This is crucial considering how easy today’s models are to jailbreak.

For instance, while a model may reject a harmful request in English, it may comply in languages that haven’t been ‘aligned’. As features are multilingual, prohibiting certain neuron combinations means the model will refuse the harmful prompt regardless of the language.

In the more futuristic event that we manage to create superhuman models, knowing how they process data could help us discover new breakthroughs, and it could also help us learn how to control these models and prevent them from going rogue.

However, as with any breakthrough in AI these days, we also have the other side of the coin.

If our most powerful models are controlled by for-profit, agenda-driven corporations, these companies could use these capabilities to censor certain ways of thinking, manipulating or gaslighting society to see the world in a way that benefits the interests of such corporations or governments.

🧐 Closing Thoughts 🧐

The importance of safety in AI is gaining ground, despite OpenAI. Anthropic’s discovery is a critical breakthrough for the industry, and signing Jan Leike reinforces the view that Anthropic cares about safety far more than OpenAI does.

However, AI’s recent flops in consumer hardware, even reaching the level of potential scams, signal how early and unprepared society is for what’s coming but also highlight how much room for improvement these systems have before becoming ubiquitous.

Long story short, if you’re reading this, you are still very early to the party, and that’s great news for you.

Do you have any feelings, questions, or intuitions you want to share with me? Reach me at [email protected]