Neuralink Deep Dive, How Does it Work?

🏝 TheTechOasis 🏝

Breaking down the most advanced AI systems in the world to prepare you for the future.

5-minute weekly reads.

TLDR:

  • AI Research of the Week: Answering the Real Question, How Does Neuralink Work?

  • Leaders: EVO, the First Biological Foundation Model that Could Change the World

💎 This Week's Sponsor 💎

In the age of AI, I see a future where everyone builds their own AI systems, be that your personal LLMs or your own personal developer like Devin.

Nonetheless, you'll desperately need information on your competitors to drive better decisions. Or you might need data to evolve your skills and stay competitive in the labor market.

One way or another, you are going to need data.

Bright Data is the leading web platform for optimizing your public data gathering and processing, from scraping data from the Internet at scale, to harnessing insights about your competitors, to identifying key market trends.

Whatever your data use case, Bright Data has got you covered. Start your free trial today by clicking below!

Web Intelligence, Unlocked

With Bright Data's proxies, you can unlock and analyze extensive datasets at scale for ML/AI innovations in Ecommerce, Travel, and Finance. Leverage our flexible, scalable services to drive data-driven decisions.

🤩 AI Research of the Week 🤩

The moment we all saw Trevor playing chess with his mind, we knew something had changed.

A few days ago, Neuralink presented its first human patient testing its brain chip. The user played chess and even Mario Kart.

Overall, it was an impressive, real-life demonstration of the hugely optimistic future that awaits people with severe disabilities like Trevor, who is quadriplegic.

And while the exact intricacies of the system aren't fully disclosed, what we know, combined with cutting-edge AI research in Brain-Computer Interfaces (BCIs), is more than enough to give us a solid understanding of how Neuralink works.

Welcome to Pattern World

Whenever you see an AI product, no matter how complex or secretive it is, it all boils down to the same principle: finding patterns in data.

There are two ways of doing so: generative or discriminative.

Generative AI models

Although the definition is in the name, it's not quite that obvious. Yes, generative models generate data, but I'm not here to state the evident; I'm here to help you understand how they truly work.

Simply put, generative models learn the distribution of the training data and sample from it. In other words, they figure out the statistical correlations in the data so they can maximize the likelihood of replicating it.

In essence, this is what ChatGPT is: a set of parameters that maximizes the likelihood of generating data similar to the training data.

In practice, for a given sequence, they give you a statistically reasonable continuation of that sequence. In other words: "based on my training data, which next words are most likely to be 'similar' to what I saw during training?"

At this point, you would be right to ask, 'But then, wouldn't that make ChatGPT a simple autocomplete incapable of generating anything new?'

And more importantly, wouldn't that lead the model to always write the same texts? That would not be ideal, considering that languages let you send the same message using different words.

And you would be totally right. Consequently, you make the process stochastic, or random, to a certain degree.

For every new word it predicts, the model has to choose among all the words in its vocabulary.

For that, it builds a probability distribution, assigning a probability to each word, no matter how small, so that every word in its vocabulary (upwards of 50k) is ranked.

At this point, the obvious thing would be to choose the word with the highest probability, but as you want the process to have some variability, you use algorithms that sample among the top-k most likely words instead.

As there is a very high chance that the top 5 words are reasonable continuations, the model still generates accurate sequences of text while showing variability, allowing for creative processes to flourish.

You can tune this randomness with the 'temperature' parameter, which is available for most generative models, to control the 'creativity' of the model's responses.
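To make this tangible, here is a quick Python sketch of temperature plus top-k sampling over a toy five-word vocabulary; the scores are made up and merely stand in for the logits a real language model would produce.

```python
# A toy sketch of temperature + top-k sampling (not ChatGPT's actual code).
import numpy as np

vocab = ["dog", "cat", "car", "banana", "the"]        # toy vocabulary
logits = np.array([3.2, 3.0, 1.5, 0.2, 2.8])          # made-up model scores for the next word

def sample_next_word(logits, temperature=1.0, k=3):
    scaled = logits / temperature                     # higher temperature flattens the distribution
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()                              # softmax: a probability for every word
    top_k = np.argsort(probs)[-k:]                    # keep only the k most likely words
    top_probs = probs[top_k] / probs[top_k].sum()     # renormalize within the top-k
    return vocab[np.random.choice(top_k, p=top_probs)]

print(sample_next_word(logits, temperature=0.7))      # low temperature: conservative picks
print(sample_next_word(logits, temperature=1.5))      # high temperature: more variability
```

Run it a few times: at low temperature the same few words keep winning, while higher temperatures let less likely words slip through.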

Although I've used ChatGPT as the example, this idea of learning the patterns in data to generate similar data applies to every single generative model, be it an image generator like Stable Diffusion or a video world simulator like OpenAI's Sora.

But data replication isn't always the objective. Sometimes, as in Neuralink's case, you simply want to discriminate.

Mapping the World

Funnily enough, generative models are fairly new in the grand scheme of AI. Initially, most models were discriminative, specifically classification systems.

The idea is pretty simple to understand, too: your objective is to figure out the patterns in the training data that allow you to classify it.

Thus, instead of learning how to sample from a learned data distribution, as ChatGPT does, here the objective is still to learn a data distribution, but to classify its data points rather than sample new content from it.

For instance, if you have a dataset of animal images, you might want to build an AI classifier that tells you, for any new image, which animal it shows.

Of course, you can apply this to humans, which gives you all the face recognition systems that are becoming more common these days.

And you can also learn brain patterns and classify them into actions, which is precisely what Neuralink does.
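As a toy illustration of that idea (not Neuralink's actual pipeline, just the general recipe of a discriminative model), the sketch below trains a plain scikit-learn classifier on made-up feature vectors standing in for processed brain signals and maps them to two hypothetical action labels.

```python
# A toy discriminative model (nothing to do with Neuralink's real pipeline):
# it learns a boundary that maps feature vectors, standing in here for processed
# brain-signal features, to two hypothetical action labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (100, 16)),      # fake features for action 0 ("cursor left")
               rng.normal(+1.0, 1.0, (100, 16))])     # fake features for action 1 ("cursor right")
y = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression().fit(X, y)                  # learn the decision boundary
new_signal = rng.normal(+1.0, 1.0, (1, 16))           # a new, unseen "brain signal"
print(clf.predict(new_signal))                        # -> [1], i.e. "cursor right"
```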

From Thoughts to Action

What I am about to tell you might sound too futuristic, but it is what it is.

Succinctly put, what Neuralink's brain chip does is transform human thoughts into action.

Wait, what?

It sounds completely impossible, I know, but it's not that counterintuitive. And it's not just possible; it's already a reality.

Specifically, the chip measures the electrical impulses generated in the brain by a given thought and uses AI to map that thought to an action.

Seen through the lens of the discriminative/generative framework we discussed earlier, it's not that different from an AI that tells you whether an image contains a dog or not; the only thing that changes here is the data.

But how is this possible?

Based on recent research by the likes of Stability AI and, more importantly, Stanford, we can get a good intuition of how Neuralink works.

As for the former, the idea was to build MindEye, a model capable of reconstructing previously seen images from brain data.

In other words, we are capable of reading people's minds by measuring changes in blood flow to brain areas with increased neuronal activity.

Source: Stability AI

To do so, you record fMRI scans of people while they are being shown several images. That way, you capture their brain's response to each specific image.

Then, when asking those same people to think about those images, we can use the newly recorded brain data and decode it back into actual images, which turn out to be very similar to the originals.
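The general recipe behind such brain-to-image decoders can be sketched in a few lines. To be clear, this is a heavily simplified illustration with placeholder data, not MindEye's actual architecture: learn a mapping from fMRI voxel patterns into a pretrained image-embedding space, then retrieve (or reconstruct) the image whose embedding is closest.

```python
# A heavily simplified sketch of a brain-to-image decoder (not MindEye's actual
# architecture): learn a linear map from fMRI voxel patterns to a pretrained
# image-embedding space, then retrieve the closest image in a gallery.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_images, n_voxels, emb_dim = 500, 2000, 256
voxels = rng.normal(size=(n_images, n_voxels))        # fMRI response per seen image (placeholder)
image_embs = rng.normal(size=(n_images, emb_dim))     # embeddings of those images, e.g. from CLIP (placeholder)

decoder = Ridge(alpha=10.0).fit(voxels, image_embs)   # voxels -> embedding space

new_scan = rng.normal(size=(1, n_voxels))             # brain activity recorded for a recalled image
pred_emb = decoder.predict(new_scan)
scores = image_embs @ pred_emb.T                      # similarity against the candidate gallery
print(int(scores.argmax()))                           # index of the best-matching image
```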

Although MindEye is a really fun (and impressive) brain decoding model, if we really want to understand how Neuralink works, we need to focus on Stanford's research.

Non-invasive brain decoding

Going by the name of NOIR, a group of Stanford researchers recently built a Brain-Robot Interface (BRI) AI model that records brain signals in real time and lets the human control a robot arm to perform a variety of actions. The outcome is very similar to what we are seeing with Neuralink, except it moves a robot arm instead of a cursor on a computer.

For a visual example, check the following short video.

But how does it work?

Unlike MindEye, which relies on fMRI data (unsuitable for brain chips like Neuralink's because of its low temporal resolution: each new action would take seconds to decode), NOIR relies on electroencephalography (EEG), a non-invasive method that measures the electrical activity produced along the scalp by the firing of neurons within the brain.

Source: Top Doctors

Specifically, it focuses on two types of EEG data:

  • SSVEP (Steady-State Visually Evoked Potentials): Represents the brain's exogenous response to an external visual stimulus. In this case, the brain reacts by generating periodic electrical activity at the same frequency as a flickering visual stimulus.

In other words, if you watch an LED light flickering at 10 Hz (ten times per second), your brain will produce SSVEP signals at that same 10 Hz frequency, matching the external stimulus.

  • Motor Imagery (MI): Unlike SSVEP, which reacts to external factors, MI is purely endogenous, requiring the user to mentally simulate specific actions, such as imagining themselves manipulating an object.

That may sound very abstract, so let's simplify what it means. NOIR works as follows:

Source: Stanford

To perform an action, NOIR breaks any human intention into a three-question framework: 'What?', 'How?', and 'Where?'

For the 'what', NOIR uses SSVEP signals to identify which object the user is focusing on.

Specifically, they use a vision model (OWL-ViT in this case) that takes visual data of the environment and segments it into the different objects present.

Even though the image below comes from Meta's Segment Anything model, the output is very similar: a group of masks segmenting the objects in a given frame.

Turning images into a set of masks. Source: Meta

OWL-ViT is a transformer-based, open-vocabulary object detection model. In other words, just like ChatGPT, it uses the attention mechanism to learn the patterns in its data, in this case images instead of text, so that it can locate and label the objects in a scene.

But why do we do this?

As we recall, the SSVEP signal's frequency will match the external flickering frequency of the object the user is focusing on.

So what do we do? Well, we assign a different flickering frequency to each mask the segmentation model gives us, like the ones above.

What is the flickering frequency of an object's mask?

It is simply the rate at which that mask blinks on screen: if an object's mask has a frequency of 5 Hz, that mask will 'flicker' five times per second.

Then, as the user focuses their attention on a particular object, the recorded SSVEP signals will adopt the frequency of that object's mask.

This way, we now know which object the user is paying attention to.

The traditional method was to attach an LED light to each object, which is of course totally unfeasible in real-life environments. Here, they simply have the segmentation model's masks flicker at different frequencies, avoiding the need for physical flickering LEDs.
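Here is a minimal sketch of how that 'what' stage could work in practice. The sampling rate, object names, and mask frequencies are assumptions for illustration, not NOIR's actual values: each mask flickers at its own frequency, and a Fourier transform of the EEG reveals which frequency, and therefore which object, dominates the user's attention.

```python
# A minimal sketch of the SSVEP-based "what" stage (assumed values, not NOIR's code):
# each object's mask flickers at its own frequency, and the dominant frequency in
# the recorded EEG tells us which object the user is attending to.
import numpy as np

fs = 250                                              # assumed EEG sampling rate (Hz)
t = np.arange(0, 4, 1 / fs)                           # 4 seconds of signal
mask_freqs = {"cup": 7.0, "plate": 10.0, "kettle": 13.0}   # hypothetical mask flicker frequencies

# Fake EEG: the user is staring at the mask flickering at 10 Hz, plus noise
eeg = np.sin(2 * np.pi * 10.0 * t) + 0.5 * np.random.randn(t.size)

spectrum = np.abs(np.fft.rfft(eeg))                   # magnitude spectrum of the EEG
freqs = np.fft.rfftfreq(t.size, 1 / fs)

def power_at(f):
    return spectrum[np.argmin(np.abs(freqs - f))]     # spectral power at the bin closest to f

attended = max(mask_freqs, key=lambda obj: power_at(mask_freqs[obj]))
print(attended)                                       # -> "plate"
```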

Moving on to the 'how' and the 'where', we now need the user to imagine the actual action, which requires Motor Imagery EEG signals.

As the user imagines the action, we record these signals and, using several techniques like Common Spatial Pattern (CSP) and Quadratic Discriminant Analysis (QDA), we discriminate the data to find the desired action.

Without going into the details, MI data is very noisy and highly specific to each person, which makes decoding which movement the user is imagining very hard.

Thus, you use different techniques to discriminate the recorded data and determine the actual action.

In essence, CSP applies a transformation to the data so that we can discriminate each action from the recorded signals much more clearly, and QDA then finds the boundaries between the different classes in the data to perform the actual discrimination.

For more depth, this video gives a nice intuition of what CSP is. For QDA, check the previous link.

An example of the effect of applying CSP filters to data.

As you can see above, the CSP-filtered channels separate the variance of each movement (blue and red) much more clearly.
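For intuition, here is a minimal sketch of how a CSP-plus-QDA motor-imagery decoder can be wired together, assuming the MNE and scikit-learn libraries and random placeholder EEG epochs; NOIR's real implementation certainly differs.

```python
# A minimal sketch of a CSP + QDA motor-imagery decoder using the MNE and
# scikit-learn libraries; the EEG here is random placeholder data, and NOIR's
# actual implementation certainly differs.
import numpy as np
from mne.decoding import CSP                                   # spatial filters for EEG
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

X = np.random.randn(80, 8, 250)       # placeholder: 80 trials, 8 channels, 1 s at 250 Hz
y = np.random.randint(0, 2, 80)       # placeholder labels: two imagined actions

clf = Pipeline([
    ("csp", CSP(n_components=4, log=True)),       # project EEG onto discriminative spatial filters
    ("qda", QuadraticDiscriminantAnalysis()),     # quadratic boundaries between the action classes
])
print(cross_val_score(clf, X, y, cv=5).mean())    # rough accuracy (chance-level here, since the data is random)
```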

Going back to NOIR: as shown in the different videos, it allows the user to think of certain actions that the robot arm then accurately executes.

These systems allow their users to perform up to 20 different actions common to our daily lives, like picking up objects or even petting a dog.

Understanding this, you will now have a much clearer idea of how Neuralink works.

Just like any BCI model, Neuralink's chips learn the underlying patterns in human brain signals and decode (map) them into specific actions.

However, there's a twist.

Invasive BCIs

Unlike MindEye or NOIR, which are non-invasive methods, meaning they do not interact with the actual brain tissue, Neuralink's brain chips do.

Thus, Neuralink can be regarded as the first truly successful and, more importantly, potentially scalable invasive method for brain signal-action decoding.

Best of both worlds

When you work with non-invasive methods, there's always a trade-off.

Regarding fMRI, the spatial resolution is impressive, to the point that measurements reach millimeter precision.

On the flip side, the temporal resolution is terrible: it takes seconds to decode each signal.

In the case of EEG, the method used in NOIR, the temporal resolution is almost instant, meaning you can decode virtually every movement as it happens.

Yet the spatial resolution is not great, as there's a lot of noise between the actual brain signals and the electrodes: between the two sit the user's hair, the scalp, and ultimately the skull.

With invasive methods, you get both: high spatial resolution and high temporal resolution.

The reason is that the electrodes are surgically implanted into the actual brain tissue.

In Neuralink's case, these are ultra-fine threads, thinner than a human hair, which are inserted into the brain to monitor neural activity with minimal damage to brain tissue.

As a neurosurgeon once told me, in any brain surgery, the moment you interact with brain tissue, even minimally, the patient changes forever. For that reason, Neuralink's objective is to interact as little as possible with the tissue, as any mistake can have disastrous consequences.

In this case, the threads are so thin that they have to be inserted by a robot, not by humans.

So how does it work?

The actual implant receives the brain signals generated by the user's thoughts, and these signals are decoded into actions using AI systems (that much we can be sure of), in a very similar way to NOIR.

Although we don't know the exact techniques, the essence is the same: the AI model learns to classify brain signals into actions.

However, invasive methods add one extra critical feature that makes Neuralink really stand out.

Brainwriting

Unlike the two previous research methods, Neuralink's implant can also write data onto the brain, thanks to the fact that it's directly implanted into the brain tissue.

This feature alone has huge implications, for several reasons:

  1. Restoring Sensory Inputs: Invasive BCIs can be used to restore sensory inputs that have been lost due to injury or disease, such as vision or hearing.

  2. Prosthetics Control: In the videos we've seen so far, the user simply interacts with computer interfaces, so no feedback is required. But in the future, we might want these systems to control prosthetics, and in that case, sending sensory feedback back into the brain is crucial.

For example, if the prosthetic arm touches a very hot surface, you might want the implant to send that sensory information back into the brain to avoid touching it again.

  3. Treating Neurological Conditions: Electrical stimulation of specific brain regions can be used to treat various neurological conditions, including Parkinson's disease, epilepsy, or even anxiety and depression.

  4. And many others, such as enhancing brain functions like memory and decision-making.

And we can get even more futuristic. In the future, we could even have telepathy, as people with implants could communicate data with each other through a closed network.

A Gargantuan Leap for Humans

At this point, it's tempting to think that in a few decades most disabilities won't exist anymore. Even blindness could eventually be treated, something Elon Musk has described as the next big step for the company.

Naturally, these systems also raise ethical considerations, like the possibility of telepathic communication with others, considerations we are completely unprepared for as a society.

But you can't help but feel extremely happy for people like Trevor, people who lost everything and who, thanks to AI and companies like Neuralink, now have a real reason to wake up every morning looking forward to their future.

👾 Best news of the week 👾

🥇 This week on Leaders… 🥇

This week we are discussing one of the most exciting breakthroughs I've seen in AI for a while: EVO, the first biological foundation model that might change the course of history.

It sounds far-fetched, but I am not the only one who is this excited. When the researchers announced they wanted to hire people for this project, this is what the legendary investor and Y Combinator founder Paul Graham had to say:

Comparing a nascent research lab with two of the most important companies of the twentieth century is quite a statement.

But why are some of the brightest minds of our time so excited about EVO?

Besides the fact that it might be the precursor of a complete revolution, ranging from gene editing to cure diseases to predicting gene essentiality, EVO could also be a generational leap at the technical level, being the first model to truly break the Transformer monopoly.

For a more detailed yet easily understandable deep dive, click below.

Do you have any feelings, questions, or intuitions you want to share with me? Reach me at [email protected]