
AI as a Tool to Understand Our World & Science


🏝 TheTechOasis 🏝


Breaking down the most advanced AI systems in the world to prepare you for the future.

10-minute weekly reads.


  • Leaders: Neural Networks as the Secret for Scientific Discovery

  • Did you know about… One photo & one audio is all you need to create deepfakes now

💎 Sponsored content 💎

Free SOC 2 Compliance Checklist from Vanta

Are you building a business? Achieving SOC 2 compliance can help you win bigger deals, enter new markets, and deepen trust with your customers — but it can also cost you real time and money.

Vanta automates up to 90% of the work for SOC 2 (along with other in-demand frameworks like ISO 27001, HIPAA, and GDPR), getting you audit-ready in weeks instead of months and saving you up to 85% of associated costs.

🥇 This week on Leaders… 🥇

Regarding AI, today’s narrative is all about productivity.

It’s what every single product pitch is telling us: “Here’s your AI companion”, “Look at your brand new work accelerator”, “Become a 10x programmer”, and so forth.

But one exciting trend has been emerging for some time and is picking up speed lately: AI as a tool for scientific discovery. In other words, using the power of Neural Networks (NNs) to uncover the mysteries of the world.

Sounds too good to be true, but there are amazing pieces of research to back it up.

And the very same people pushing for this vision have another great plan, a foundation model for science that could change the world’s rate of innovation like nothing we’ve ever seen before.

The Great Algorithm

Although the first signs of neural networks date back to 1943 and the perceptron was invented in 1958, it wasn’t until the 2010s, with the publication of AlexNet in 2012, that we managed to prove at scale that they actually worked in practice.

But first, what is a Neural Net?

Mimicking the Brain… Kind of

Neural networks are Machine Learning algorithms that were conceived as a way to teach machines to learn by mimicking the functioning of the human brain.

NNs are formed by a set of interconnected units, unsurprisingly named ‘neurons’, whose parameters (weights and biases) are learned from the data.

In broad terms, by activating and deactivating themselves for a given input, they create different combinations that generate a map between the input data and the predictions we actually want the machine to perform.

Thus, the objective of this learning process is to find this mapping between the inputs and the outputs.
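To make the "activating and deactivating" idea concrete, here is a minimal sketch of a forward pass through a tiny network, assuming ReLU activations and made-up (hypothetical) weights rather than trained ones. Different inputs switch on different neurons, and the output is built only from the ones that fired.

```python
def relu(z: float) -> float:
    # ReLU "deactivates" a neuron (outputs 0) whenever its input is negative
    return max(0.0, z)

# Hypothetical weights: 2 inputs -> 3 hidden neurons -> 1 output
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
b1 = [0.0, -1.0, 0.5]
W2 = [1.0, -2.0, 0.5]
b2 = 0.1

def forward(x):
    # Pre-activation of each hidden neuron: weighted sum of inputs plus bias
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    hidden = [relu(z) for z in pre]
    active = [h > 0.0 for h in hidden]  # which neurons "fired" for this input
    output = sum(w * h for w, h in zip(W2, hidden)) + b2
    return output, active

for x in ([1.0, 0.0], [0.0, 1.0]):
    y, active = forward(x)
    print(x, "->", round(y, 3), "active neurons:", active)
```

Training is the process of adjusting `W1`, `b1`, `W2`, and `b2` so that these input-dependent combinations produce the predictions we want.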

This is done with variants of stochastic gradient descent (SGD): we measure the loss (the difference between the ground truth and the model’s prediction), compute the gradient of the loss with respect to each parameter (how much the loss changes when that parameter changes), and ‘tune’ every parameter a small step in the direction that reduces the loss, repeating the process until, over time, the loss is minimized.

Broadly speaking, it’s a trial-and-error exercise.
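Here is a minimal sketch of that loop, assuming the simplest possible "network" (a single neuron with no activation, i.e. a line y = w·x + b), a squared-error loss, and toy noiseless data generated from y = 2x + 1. The function name `sgd_linear_fit` is hypothetical.

```python
import random

def sgd_linear_fit(xs, ys, lr=0.1, epochs=200, seed=0):
    rng = random.Random(seed)  # fixed seed so the shuffling is reproducible
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # "stochastic": visit examples in random order
        for i in order:
            err = (w * xs[i] + b) - ys[i]  # prediction minus ground truth
            # Loss = err**2, so dLoss/dw = 2*err*x and dLoss/db = 2*err
            w -= lr * 2 * err * xs[i]
            b -= lr * 2 * err
    return w, b

xs = [i / 10 for i in range(-10, 11)]
ys = [2 * x + 1 for x in xs]
w, b = sgd_linear_fit(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

Each update looks at one example, measures the error, and nudges `w` and `b` against the gradient; after enough passes they settle very close to the true values of 2 and 1.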

In practice, the most common optimization algorithm is not plain SGD but Adam, an adaptive variant of it.
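For reference, here is a sketch of Adam’s update rule on a one-parameter toy loss, (θ - 3)². Adam keeps running averages of the gradient and of its square, corrects them for their zero initialization, and scales each step by their ratio. The β₁, β₂, and ε values are the defaults proposed in the original Adam paper; the learning rate of 0.1 is an assumption chosen for this toy problem, and `adam_minimize` is a hypothetical helper.

```python
import math

def adam_minimize(grad, theta0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients (momentum)
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)  # adaptive step size
    return theta

# Toy loss (theta - 3)^2 has gradient 2 * (theta - 3) and its minimum at 3
theta = adam_minimize(lambda th: 2.0 * (th - 3.0), theta0=0.0)
print(theta)
```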

Today, NNs represent the absolute state-of-the-art of the industry, fueling almost every breakthrough that has taken place over the last 10 years.

But why are they so powerful? 

The Universal Approximation Theorem

The reason behind the success of NNs is the Universal Approximation Theorem, which is "a formal proof that, with enough hidden units, a shallow neural network can describe any continuous function to arbitrary precision."

I won’t get into the technical details (for that, I highly recommend the following book) but the key intuition is that we can use NNs to model basically anything.
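The theorem itself is an existence proof, but a one-dimensional special case is easy to see in code. The sketch below is a constructive illustration, not the general proof: it builds a shallow ReLU network by hand, where each hidden unit bends the output at one point, so with n hidden units the network traces a piecewise-linear approximation of sin(x). The helper names are hypothetical, and the weights are computed directly rather than learned.

```python
import math

def relu(z: float) -> float:
    return max(0.0, z)

def build_shallow_relu_net(f, a, b, n_hidden):
    """Hypothetical helper: one-hidden-layer ReLU net that linearly
    interpolates f at n_hidden + 1 evenly spaced knots on [a, b]."""
    knots = [a + (b - a) * i / n_hidden for i in range(n_hidden + 1)]
    values = [f(x) for x in knots]
    slopes = [(values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
              for i in range(n_hidden)]
    # Hidden unit i computes relu(x - knots[i]); its output weight is the
    # change of slope at that knot, so the weighted sum is piecewise linear.
    weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_hidden)]
    return values[0], weights, knots[:-1]  # (output bias, output weights, offsets)

def net_forward(net, x):
    bias, weights, offsets = net
    return bias + sum(w * relu(x - t) for w, t in zip(weights, offsets))

def max_error(f, net, a, b, n_points=2001):
    xs = [a + (b - a) * i / (n_points - 1) for i in range(n_points)]
    return max(abs(f(x) - net_forward(net, x)) for x in xs)

lo, hi = 0.0, 2 * math.pi
for n_hidden in (5, 50):
    net = build_shallow_relu_net(math.sin, lo, hi, n_hidden)
    print(n_hidden, "hidden units -> max error:", max_error(math.sin, net, lo, hi))
```

Going from 5 to 50 hidden units makes the maximum error drop sharply (for this kind of interpolation it scales roughly with the square of the knot spacing), which is the "with enough hidden units" part of the theorem in action.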

In other words, for any given dataset and a set of predictions we want to perform (like using housing data to predict prices), there exists a neural network that can represent the map, or function, behind that prediction, and training is how we go looking for it.

This works as long as the data is of good quality and informative enough (in more technical terms, the input features must actually explain the variance of what we want to predict).

For example, if you try to predict housing prices with a dataset about the vaquita (Phocoena sinus), the rarest marine mammal in the world, your model is going to perform terribly.
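A quick way to see what "explaining the variance" means is the coefficient of determination, R². The sketch below uses made-up toy data: prices that depend exactly on house size, plus a hypothetical unrelated feature standing in for the vaquita dataset. For a single-feature linear fit, R² equals the squared correlation between feature and target.

```python
def r_squared(feature, target):
    """R^2 of the least-squares line target ~ w * feature + b.
    For one feature this equals the squared Pearson correlation."""
    n = len(feature)
    mx, my = sum(feature) / n, sum(target) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(feature, target))
    var_x = sum((x - mx) ** 2 for x in feature)
    var_y = sum((y - my) ** 2 for y in target)
    return cov * cov / (var_x * var_y)

# Toy, made-up data: prices that depend exactly on house size...
sizes = [50 + 10 * i for i in range(11)]  # square metres
prices = [3000 * s for s in sizes]
# ...and a hypothetical feature with no real link to prices
unrelated = [(7 * i) % 11 for i in range(11)]

print("R^2 using size:     ", r_squared(sizes, prices))
print("R^2 using unrelated:", r_squared(unrelated, prices))
```

Size explains essentially all of the variance (R² close to 1), while the unrelated feature explains almost none (R² close to 0). No model, neural or otherwise, can recover a relationship that isn’t in the data.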

Overall, NNs work great because they are capable of finding key patterns and relationships in data and using them to make useful predictions.

This has allowed these algorithms to model some of the most complex phenomena in the world, from human language, as Large Language Models (LLMs) do, to reconstructing images a person has previously seen from their recorded brain activity.

But as powerful and useful as models like ChatGPT are, the most fascinating feature of NNs is their capacity to discover.

Our Best Discovery Tool

Currently, we are using NNs to:

Among plenty of other examples. But things get even crazier.

In the previous cases, we can actually deduce why this is happening, meaning that NNs serve as an enhancer, an accelerator.

Other times, however, we can’t find an explanation for the model’s discoveries, which suggests that NNs come across patterns that humans simply can’t see.

So, could demystifying these NNs allow us to discover new scientific theories that give us a deeper understanding of our world?

The answer, based on recent discoveries, is absolutely yes, and that’s a complete game changer.

From Deductive to Purely Inductive Discovery

In traditional science, we use data, usually in the form of statistics about certain populations or observations, to infer a set of theories.

From Kepler to Newton

Then, we use those theories to describe the world (and empirically prove them in the process).

We can take the example of Kepler’s Third Law, stipulating that “the square of the orbital period (T) of a planet is directly proportional to the cube of the semi-major axis (a) of its orbit.”
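To see how a law like this can be read straight off the data, here is a small sketch using approximate, rounded textbook values for six planets (semi-major axis a in AU, period T in Earth years). If T² ∝ a³, then log T = 1.5 · log a + constant, so a straight-line fit in log-log space should recover an exponent of about 1.5, and T²/a³ should come out roughly the same (about 1 in these units) for every planet.

```python
import math

# Approximate orbital data: (semi-major axis in AU, period in Earth years)
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}

# Least-squares line through (log a, log T); its slope is the exponent
xs = [math.log(a) for a, _ in PLANETS.values()]
ys = [math.log(t) for _, t in PLANETS.values()]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

print(f"fitted exponent: {slope:.3f}")
for name, (a, t) in PLANETS.items():
    print(f"{name:8s} T^2 / a^3 = {t ** 2 / a ** 3:.3f}")
```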

In layman’s terms, the law showed that the further a planet is from the Sun, the longer it takes to complete one orbit, in a precise and predictable proportion. Later, building on how well Kepler’s laws fit the observational data, Isaac Newton formulated his law of universal gravitation, which explains Kepler’s observations and from which Kepler’s laws can be derived.
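Under the simplifying assumption of a circular orbit of radius a around the Sun (mass M), with the planet’s mass m much smaller than M, Newton’s law yields Kepler’s Third Law in a couple of lines by equating gravity with the centripetal force:

```latex
\frac{G M m}{a^{2}} = m\,\omega^{2} a, \qquad \omega = \frac{2\pi}{T}
\;\Longrightarrow\; \frac{G M}{a^{2}} = \frac{4\pi^{2}}{T^{2}}\,a
\;\Longrightarrow\; T^{2} = \frac{4\pi^{2}}{G M}\,a^{3}
```

For elliptical orbits the same result holds with a as the semi-major axis and M replaced by M + m, which is why T² ∝ a³ fits the planetary data so well.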

But what if we could use NNs to enhance this process and, importantly, uncover theories that would otherwise remain hidden?

This is what a prominent group of researchers, from institutions ranging from the Flatiron Institute to the University of Cambridge, is betting on.

But to understand how NNs help here, we first need to understand their key role as knowledge compressors.

NNs as a means of compression

One of the reasons NNs seem to work so well is that they are great at compressing their input data. In other words, they can take huge amounts of data and generate a global, compressed and, importantly, queryable understanding of it.

For instance, the very recent Llama 3 models compressed the knowledge from 15 trillion tokens (around 12 trillion words, which amounts to terabytes of text) into a ‘weights file’ (the file where the network’s parameter values are stored) that can be as small as 16 GB in the case of the 8B model, which we can then query about that data.

Llama 3 8B has 8 billion parameters stored at 16-bit precision, so every parameter occupies 2 bytes in memory. Thus, the weights total 8 billion × 2 bytes = 16 GB.

Simply put, that is at least three orders of magnitude of compression: the knowledge derived from the training data is stored in a file more than 1,000 times smaller.
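The back-of-the-envelope arithmetic can be reproduced in a few lines. The parameter count and token count come from the text; the figure of 4 bytes of raw text per token is an assumption for illustration (a rough average for English text), so treat the resulting ratio as an order-of-magnitude estimate rather than a reported number.

```python
N_PARAMS = 8_000_000_000            # Llama 3 8B (rounded)
BYTES_PER_PARAM = 2                 # 16-bit weights
TRAIN_TOKENS = 15_000_000_000_000   # ~15 trillion training tokens
BYTES_PER_TOKEN = 4                 # ASSUMPTION: rough bytes of raw text per token

weights_bytes = N_PARAMS * BYTES_PER_PARAM
text_bytes = TRAIN_TOKENS * BYTES_PER_TOKEN
ratio = text_bytes / weights_bytes

print(f"weights:           {weights_bytes / 1e9:.0f} GB")
print(f"training text:     {text_bytes / 1e12:.0f} TB")
print(f"compression ratio: ~{ratio:,.0f}x")
```

Even if the true bytes-per-token figure were half as large, the ratio would still sit comfortably above 1,000x.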

Of course, the compression is not perfect (it is lossy), which is one reason bigger models, with more parameters to store what they learn, can retain more knowledge and, thus, perform better.

But at the end of the day, every neuron in the model captures a lot of knowledge that, in the case of generative models like Llama or ChatGPT, can be regenerated.

Naturally, because the knowledge is so compressed, many neurons in the model end up polysemantic, meaning that each one captures knowledge from several semantically unrelated topics.

This makes deciphering models, a field known as mechanistic interpretability, a real challenge. Anthropic has published great research on this.

Consequently, considering that we can use NNs to capture and compress an enormous amount of insight from data, how can we ‘unroll’ these networks to decipher this knowledge? 

The Next Great Scientific Theory is Hiding Inside a NN

Knowing that Neural Nets seem to stumble upon new insights from data that humans had not detected earlier, our objective is to use them to find those insights and then distill their knowledge into tangible new theories.

In other words, we use the insights that Neural Nets discover to build new theories that we can then use to describe our world.

But how can we do that?

One of the most promising ways is symbolic regression, where we turn a neural network into an analytical expression we can understand by learning a surrogate expression that mimics the network’s outputs, as depicted below:

This allows us to take a very complex and opaque model such as this:

And turn it into something we humans can understand:

Therefore, as NNs give us the map between the inputs and the outputs, if we can represent this map with a mathematical expression, we can derive new theories from those expressions.
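To give a feel for the idea, here is a deliberately tiny sketch of symbolic regression. A hypothetical function stands in for the trained network (it secretly computes Newton’s gravitational force, G·m₁·m₂/r²); we query it on sample inputs, try each expression from a small hand-written candidate library, fit its constant by least squares, and keep the one that best reproduces the outputs. Real tools such as PySR use an evolutionary search over a vastly larger space of expressions; the brute-force search here is a simplification for brevity.

```python
from typing import Callable, Dict, List, Tuple

G = 6.674e-11  # gravitational constant (SI units)

def trained_network_stand_in(m1: float, m2: float, r: float) -> float:
    """Hypothetical stand-in for a trained NN's forward pass.
    A real network would only approximate this; here it is exact."""
    return G * m1 * m2 / r ** 2

# Hypothetical candidate library; each expression is fitted up to a constant c
CANDIDATES: Dict[str, Callable[[float, float, float], float]] = {
    "c * m1 * m2 / r": lambda m1, m2, r: m1 * m2 / r,
    "c * m1 * m2 / r**2": lambda m1, m2, r: m1 * m2 / r ** 2,
    "c * (m1 + m2) / r**2": lambda m1, m2, r: (m1 + m2) / r ** 2,
    "c * m1 * m2 / r**3": lambda m1, m2, r: m1 * m2 / r ** 3,
}

def symbolic_regression(samples: List[Tuple[float, float, float]]) -> Tuple[str, float, float]:
    # Query the black box to get the outputs the formula must explain
    ys = [trained_network_stand_in(*s) for s in samples]
    best = ("", 0.0, float("inf"))
    for name, expr in CANDIDATES.items():
        gs = [expr(*s) for s in samples]
        # Least-squares constant for y ~ c * g
        c = sum(y * g for y, g in zip(ys, gs)) / sum(g * g for g in gs)
        # Worst-case relative error of the fitted candidate
        err = max(abs(y - c * g) / abs(y) for y, g in zip(ys, gs))
        if err < best[2]:
            best = (name, c, err)
    return best

samples = [(m1, m2, r) for m1 in (1.0, 2.0, 5.0) for m2 in (1.0, 3.0) for r in (1.0, 2.0, 4.0)]
name, c, err = symbolic_regression(samples)
print(f"best expression: {name}, c = {c:.4g}, max relative error = {err:.1e}")
```

The search should return `c * m1 * m2 / r**2` with c ≈ 6.674e-11, recovering both the form of the law and the constant from input-output pairs alone.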

This sounds too good to be true, but we already have precedents. In the following research, Pablo Lemos et al. “rediscovered” Newton’s law of gravitation simply by looking at data with a neural network that simulated the Solar System.

Importantly, the approach did not require any assumptions about the masses of planets and moons or about physical constants, which the neural net also inferred on its own.

In layman’s terms, the neural net had figured out Newton’s law and the concept of mass just by looking at the data.

That’s the key intuition in all of this: when we use NNs as discovery tools, they stumble upon crucial findings, and our job is to translate them into language we understand.

Of course, rediscovering Newton’s law is interesting but not new, since we already knew it. Still, it isn’t hard to imagine the genuinely new discoveries that may await us through these methods.

In my opinion, these are the types of breakthroughs researchers and experts were pointing toward when they predicted that AI would be one of the most important discoveries since fire, if not the most important.

Not productivity tools like ChatGPT. This.

Now, if NNs can learn sciences to the point of unlocking new theories for us, what are we waiting for to build such models?

In fact, a group of prominent researchers from all around the world, from Princeton to Cambridge, is doing just that: creating a multi-science foundation model to kickstart the era of AI-fueled scientific discovery.
