
Wayve Announces Lingo, a Car that Explains what it Does and Why

šŸ TheTechOasis šŸ

Breaking down the most advanced AI systems in the world to prepare you for your future.

5-minute weekly reads.

TLDR:

  • AI Research of the Week: Wayve's LINGO-1, the Model that Speaks Back to You

  • Leaders: The Truth about AI's Silent Killer

AI Research of the Week

In the near future, you might be able to speak with your car.

Huh?

Let me digress.

One of the coolest, if not the coolest, applications of AI is autonomous driving.

Until last year, autonomous driving systems were considered the most advanced AI models in the world.

But, despite this fact, they are by far the least trusted models in the world. And according to Forbes, this trust problem isn't shrinking as models get better; it has actually grown by up to 55% from last year alone, to the point that only 9% of Americans trust them.

Out of the many reasons for this distrust, one of the most commonly mentioned is the lack of transparency in the model's decisions.

Now, Wayve, a UK company that specializes in this sector, has announced its LINGO-1 model, a Vision-Language-Action (VLA) model that can explain and reason about the decisions behind the driving actions taken by the autonomous driving system.

As you'll see below, the model reacts to real-life observations and is able to predict what the actions of the car will be based on those observations.

And not only that… you can actively chat with it.

But first and foremost, just why?

The Transparency Problem

Put simply, current autonomous driving systems are far away from what most people would think of when talking about autonomous vehicles.

According to the Society of Automotive Engineers, or SAE, there are six levels of autonomous driving, ranging from 0 to 5.

Level 5 means that the car is in complete control, and fully responsible, regardless of the situation. From Levels 0 to 2, the human is still responsible.

And when thinking about autonomous driving vehicles, almost everyone thinks about Tesla.

However, to this day Tesla is still stuck at Level 2, and only a small group of cars has managed to move on to Level 3, a feat recently achieved by Mercedes' EQS sedan and S-Class models.

Level 3 means that, in some very specific situations, the car can be put in fully autonomous mode. Still far away from the goal.

But most of the time, the trust issues humans show are far less rational than we may think. It's most often pure irrational fear, just as people fear airplanes even though they are by far the safest mode of transportation.

The numbers speak for themselves.

There is a 1 in 1.2 million chance of dying in a plane crash. This number drops to 1 in 5,000 when talking about cars.

Consequently, the idea behind a "talking car" isn't only about reducing car accidents; it's also about giving humans a sense of control by making the car's decisions more transparent.

But how does it work?

A Language and Vision Mo… and Driving Model Too

LINGO-1 is an AI model trained on three types of data:

  • Language data

  • Image data

  • Driving data

What's more, these types of data have been aligned using the roadcraft method, the same method driving instructors have used for decades: saying out loud what they are about to do while driving, so the student can learn by imitation.

Thus, Wayve has assembled a dataset built by expert drivers commenting on their travels, synchronizing the language, the images, and the embodiment data (sensors, actuators, and so on) so that the model understands the relationships between them.

The result is a model that takes in embodiment data in real time and can perform visual question answering on tasks such as perception, planning, counterfactuals (reasoning over multiple possible ā€œwhat ifā€ scenarios), and so on.
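
To make this concrete, here is a minimal, hypothetical Python sketch of what one of those synchronized samples and the question-answering interface might look like. The DrivingFrame fields, the ask helper, and the model.answer call are illustrative assumptions, not Wayve's published schema or API.

from dataclasses import dataclass
from typing import List

@dataclass
class DrivingFrame:
    """One synchronized slice of a drive: vision, embodiment data, and commentary."""
    timestamp: float        # seconds since the start of the drive
    camera_image: bytes     # encoded front-camera frame
    speed_mps: float        # vehicle speed from the sensors
    steering_angle: float   # commanded steering angle, in radians
    commentary: str         # the expert driver's spoken explanation

def ask(model, frames: List[DrivingFrame], question: str) -> str:
    """Hypothetical interface: return the model's natural-language answer
    about perception, planning, or a counterfactual, given recent frames."""
    return model.answer(frames=frames, question=question)

# Example usage (assuming a loaded `lingo` model and a buffer of recent frames):
# ask(lingo, recent_frames, "Why are you slowing down?")
# ask(lingo, recent_frames, "What would you do if the traffic light turned red?")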

Source: Wayve

In other words, LINGO-1 is a model that can interpret driving situations and reason about them.

All this is great, but does it actually work?

Commenting and Chatting

Luckily for us, the model has already been tested in real-life scenarios, where it cleverly interprets what's ahead of it and explains the decisions behind what it's doing.

But as I said earlier, it can also describe what it's seeing.

These capabilities not only give it the capacity to explain its decisions, they also have the potential to improve the model's understanding of unseen situations.

Because language can capture the nuances of open scenery better, the model can use its language representations to generalize to new driving situations.

In fact, the end game is to build a model that enhances the capabilities of the autonomous driving foundation model, in proposed architectures such as the one below, where the explanations of the language model are incorporated into the driving policy (the decision-making system) to enrich the context and improve the outcome.

Though they haven't published how they built the model, it's clear that it's based on a combination of an LLM and an image encoder.

Yes, we are already seeing multimodal models with no image encoder, like Adept's Fuyu, but that is a very new, and somewhat unproven, architecture.

Also, considering that this language model (in red) isn't Wayve's core model (blue), and considering how expensive these models are to build (they have 'only' raised $258 million, a small amount if you want to compete with the OpenAIs and Anthropics of the world, which casually raise billions every other month), they have probably gone with the grafting method, using a pre-trained image encoder and a pre-trained LLM to build this model.
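
As a rough illustration of what such a grafted setup could look like (a speculative PyTorch sketch, not Wayve's published design; the GraftedVLM class, the dimensions, and the assumption that the LLM accepts embeddings directly are all mine), a frozen pre-trained image encoder and a frozen LLM can be joined by a small trainable projection layer:

import torch
import torch.nn as nn

class GraftedVLM(nn.Module):
    """Speculative sketch: graft a frozen image encoder onto a frozen LLM
    with a single trainable projection ('adapter') layer."""

    def __init__(self, image_encoder: nn.Module, llm: nn.Module,
                 vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.image_encoder = image_encoder   # pre-trained, kept frozen
        self.llm = llm                       # pre-trained, kept frozen
        # The only trainable part: maps image features into the LLM's
        # embedding space so they can be read as "visual tokens".
        self.projection = nn.Linear(vision_dim, llm_dim)
        for p in self.image_encoder.parameters():
            p.requires_grad = False
        for p in self.llm.parameters():
            p.requires_grad = False

    def forward(self, images: torch.Tensor, text_embeddings: torch.Tensor):
        # Assumes the encoder returns (batch, num_patches, vision_dim)
        # and the LLM accepts a sequence of embeddings directly.
        visual_tokens = self.projection(self.image_encoder(images))
        sequence = torch.cat([visual_tokens, text_embeddings], dim=1)
        return self.llm(sequence)

The appeal of this kind of setup is cost: only the small projection layer (and perhaps a few adapters) needs training on the driving commentary data, while the expensive vision and language backbones are reused as they are.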

All in all, AI has proven once again that the limits of what we can build with it are bound only by our imagination.

But what do you think?

Does this 'talking car' idea appeal to you or not?


Key contributions

  • LINGO-1 is a first-of-its-kind VLA model trained to reason and plan driving actions in real time

  • It's built on a carefully curated driving dataset that allows it to explain driving decisions in complex situations

Practical implications

  • The cars of the future could incorporate these 'talking systems' to provide an extra layer of trust to the model driving autonomously

  • It could also be used in other autonomous systems, like robots, to improve explainability in an industry that is in dire need of it, especially when the model's actions impact the physical world

Best news of the week

  • OpenAI announces it will release an AI image detector that is 99% accurate

  • Adept revolutionizes multimodal architectures with Fuyu

  • A ChatGPT bug shows that models shout at each other

  • This robot painter will make painters two times more efficient

Leaders

The Truth About AI's Silent Killer

The AI industry looks unstoppable.

Many hail it as the most important technology ever created by humans, and people will look back at the end of 2022 as the moment humanity finally realized the potential this tech, or should I rather say science, has to dramatically change society.

But will it?

Many are convinced that no regulation or arbitrary decision can cripple the growth of the technology, but the reality is that AI is walking on shaky ground.

OpenAI is facing lawsuit after lawsuit, with a diverse set of plaintiffs that includes George R.R. Martin, author of Game of Thrones, and Sarah Silverman, while big tech companies are lobbying the hell out of Washington and Brussels to tame the lion before it bites.

In particular, the EU looks poised to be a profound pain in the butt for these companies, with its AI Act spearheading the efforts to regulate AI.

The fears are such that Microsoft's Copilot, a $30-per-license product that embeds a distilled version of ChatGPT into every Microsoft product known to man, carries a hefty price that Microsoft partially justifies as covering the potential fines from using such technology on behalf of its customers.

Additionally, although later denied, rumors had it that OpenAI could leave the EU were regulations to bite.

But any set of countries that were to cripple AI would surely suffer from that decision as other countries embrace it, right?

No politician would do that! They would get stomped by the people, right?

… Right?

Well, actually, they may even be incentivized to do so.

Fear Drives Humans

A recent study by IE University, one of the top business schools in the world, revealed that 68% of Europeans want their governments to introduce rules to safeguard jobs from AI advancement.

In other words, fuck progress, I just simply want to protect my job.

Despite AI already being capable of detecting Alzheimer's disease in brain scans or Parkinson's disease simply by looking at a patient's eyes, people first and foremost want to protect themselves, and that includes their jobs.

And what is the AI industry doing to calm down society?

Well, besides making it worse, not much.

OpenAI released a report stating that around 80% of the US workforce is affected by AI in some way, with 19% having more than 50% of their tasks exposed.

In layman's terms, roughly one in every five people in the US labor market could soon see half of their activities taken over by AI.

McKinsey jumped onto the sad train and claimed that by 2030, 30% of current work activities would already be automated, causing more than 12 million people in the US alone to change jobs.

To make matters worse, many voices in Silicon Valley, including the likes of Sam Altman, CEO of OpenAI, and Elon Musk, have openly called for the deployment of Universal Basic Income.

In other words, many tech tycoons who are completely in the line of fire not only agree with McKinsey, they also feel that most people won't be able to switch jobs and will be, unequivocally and permanently, out of the market.

Something's off if some of the most capitalist people in the world are actively requesting basic income.

And considering they are the guys and gals behind the steering wheel of this technology, it's no surprise when we see laypeople get scared.

But, honestly, I donā€™t think AI is going to be stopped based on these terms.

Geopolitics and AI

If we take Europe as an example, as a European I can only watch in sadness as the continent becomes more and more irrelevant.

For instance, as Scott Galloway explained, a European start-up is much more prone to fail than an American one, despite Europe having some of the best schools and talent in the world.

Europe is losing ground by the day as the US looks stronger than ever despite the political turmoil and the BRIC countries are developing at insane speeds despite some fears of collapse.

It's no surprise then that despite the extremely hawkish regulation that Europe intends to impose on AI, public leaders like Macron are actively demanding 'European AI champions' and launching a $500 million AI fund.

Three weeks later, a three-week-old French AI company, Mistral, raised $113 million at a quarter-of-a-billion-dollar valuation.

Coincidence? You tell me.

But even if Europe's stance remained hawkish, it would only stick if the US backed it to the hilt.

And today, that's not the case. As outlined by this article, the US posture will most probably focus not on regulation but on litigation, as it always has.

But if thatā€™s the case, OpenAI and others are in a far, far worse position, as the problem here is much harder to solve.

In fact, it hasnā€™t been proven to be solvable.

The Transparency Problem

As mentioned, experts think that in the US the most probable outcome is that AI issues will be settled not by regulation but by litigation.

If someone feels negatively impacted by an AI, they sue the company behind it. Simple.

The problem?

With today's understanding of the AI algorithms running behind the likes of ChatGPT, Claude, or DALL-E, the companies behind these models have lost even before the litigation starts.

'Your black-box model discriminates against me'

Just picture yourself as a judge.

A plaintiff has sued an insurance company that doesn't want to cover the accident this person had with his insured car.

The reason?

A super fancy and advanced AI algorithm saw the image of the impact and concluded that the plaintiff was at fault and that, thus, the insurance company wasn't going to back him.

Naturally, as a judge, you will ask the insurance company to explain how the model reached that conclusion.

The answer? "We don't know; the model is a black box, so we really can't thoroughly explain how it made that decision."

Now, what would you do? Of course, the insurance company is paying that man.

Put simply, we have no fricking clue, today, how current state-of-the-art AI models behave.

They are unpredictable most of the time, and they even manage to develop new capabilities we can't see coming (which is why we describe these as 'emergent capabilities').

Hence, without answering the 'great question', most AI companies would soon run out of money as they dealt with guaranteed-to-lose litigation across the board.

But why canā€™t we explain models?

From statistics to decisions

Although we understand perfectly how they compute every calculation they perform, we can't predict why neurons activate or in what situations.

But what is a neuron?

Most of the current top systems at the center of these discussions are neural networks: networks of millions or even billions of interconnected elements we call neurons which, ironically, are simple weighted linear calculations that only coarsely resemble human neurons.

Each of these neurons has an activation function, a formula (classically a step function) that determines whether that neuron activates for a given prediction or not.

This activation function is a required element for neural networks to learn non-linear tasks. As neurons are weighted linear combinations, without this activation function neural networks would only be able to approximate simple, linear tasks.
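
A tiny numerical example makes the point (a sketch using NumPy; the sizes are arbitrary): stacking two linear layers without an activation collapses into a single linear layer, while inserting a simple ReLU activation breaks that equivalence.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))    # 5 inputs with 3 features each
W1 = rng.normal(size=(3, 4))   # first layer's weights
W2 = rng.normal(size=(4, 2))   # second layer's weights

two_linear_layers = (x @ W1) @ W2     # no activation in between...
one_linear_layer = x @ (W1 @ W2)      # ...is just one linear map in disguise
print(np.allclose(two_linear_layers, one_linear_layer))   # True

relu = lambda z: np.maximum(z, 0)     # the "firing" rule: keep only positive signals
with_activation = relu(x @ W1) @ W2   # now the composition is genuinely non-linear
print(np.allclose(with_activation, one_linear_layer))     # False (in general)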

In other words, some neurons fire in certain situations, like when ChatGPT writes in German, and others will activate for ChatGPT to write in Shakespeare's style.

Therefore, the solution seems clear, right?

To explain how a neural network makes a decision, we simply need to see which neurons were activated for that prediction and, since we know what makes these neurons 'fire' (that's literally the term, as in neuroscience), we will know the reasoning process.

Taking the car crash example: if the neurons that fired only ever fire in cases where the crashed car shows signs of being at high speed when crashing, the insurance company can now not only explain the thought process of the model, it has also gained incredibly valuable insight from the image, i.e. that the insured man was traveling at a very high speed.
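
In code, the idea would look something like the following sketch (a toy PyTorch network with a forward hook; the model, layer, and input are placeholders, not any production driving or insurance system):

import torch
import torch.nn as nn

# Toy network standing in for a real model.
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 2),
)

activations = {}

def record(name):
    # A forward hook stores the layer's output every time the model runs.
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

model[1].register_forward_hook(record("hidden_relu"))

x = torch.randn(1, 8)           # one example input
_ = model(x)                    # run a prediction, triggering the hook

# Neurons with a non-zero activation are the ones that "fired" for this input.
fired = (activations["hidden_relu"] > 0).nonzero(as_tuple=True)[1]
print("Neurons that fired:", fired.tolist())

If each of those indices mapped cleanly to a single human-interpretable concept, this would be all the explanation a judge could ask for.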

But, as it turns out, this is easier said than done.

As research has proven, neurons are polysemantic, meaning that they fire for multiple, unrelated situations.

In layman's terms, the same neuron can fire when writing a poem in Russian and when predicting the impact of the Ukrainian war on oil prices.

Unsurprisingly, most researchers who have tried to understand how models make decisions by reasoning at the neuron level have failed miserably.

But a recent study might hold the key to uncovering the holy grail of AI, model explainability, by introducing a new concept that might well completely change how humans understand AI.
