Bringing Images Back to Life, by Google
🏝 TheTechOasis 🏝
Breaking down the most advanced AI systems in the world to prepare you for your future.
5-minute weekly reads.
🤯 AI Research of the week 🤯
There are times when AI feels like magic.
And this is one of those times.
The gif you’re seeing below was actually an image that was brought “back to life” by Google Research.
And that dragging movement you see? That’s actually me: you can try the model for free and induce whatever motion you like.
In short, Google has presented a new method that predicts the movement of oscillating objects from still images, turning them into videos of that object moving.
Welcome to 2023, when AI meets magic.
The Universal Approximator
When humans try to predict movements around us, we leverage complex physics laws, mathematical equations, and so on.
But do we really need to provide those laws to AI to help it learn to predict them?
For one, that would be infeasible: when studying nature’s oscillations, you have to take into account too many variables, such as wind, water currents, respiration, or other natural rhythms.
Luckily, because neural networks are universal function approximators, they do not need to be given those laws and principles directly; they can induce them indirectly through observation, just like a human would.
In layman’s terms, you can teach a model to predict oscillations by simply looking at them.
Therefore, this model wasn’t trained with the laws of physics and mathematics, but by simply observing videos.
That’s what makes Deep Learning unique and so powerful: given the right data, it can learn anything.
The importance of frequency
Another insightful choice the researchers made was to train the model in the frequency domain.
In other words, instead of the model predicting the movement of the object in the time domain (in second 3 the object will be here, in second 6 here) the model did so in the frequency domain (the movement of the object at 0.2Hz is this, at 0.6Hz this).
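To make the two views concrete, here is a minimal sketch in plain Python. The setup (one pixel swaying at 0.5 Hz, recorded at 30 frames per second for 4 seconds) and the helper name `dft` are illustrative assumptions, not details taken from Google’s model.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


# Assumed toy setup: one pixel swaying horizontally at 0.5 Hz,
# sampled at 30 frames per second for 4 seconds.
FPS = 30
DURATION_S = 4
N = FPS * DURATION_S

# Time domain: one position per frame ("in frame t the pixel is here").
positions = [3.0 * math.sin(2 * math.pi * 0.5 * t / FPS) for t in range(N)]

# Frequency domain: one coefficient per frequency ("at f Hz the motion is this").
spectrum = dft(positions)

# Bin k corresponds to k / DURATION_S Hz; only bins below Nyquist are inspected.
magnitudes = [abs(c) for c in spectrum[: N // 2]]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = peak_bin / DURATION_S
print(f"time domain: {N} numbers; dominant frequency: {peak_hz} Hz")
```

In this toy case the whole 120-frame trajectory is described by a single non-zero frequency, which is the intuition behind working in the frequency domain.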
If we want to bring an image back to life with a moving object, we need to predict the position of that object at subsequent time steps, generate a new image for every time step, and concatenate them, thus creating a video where the object moves.
That means that, in the time domain, for every pixel in the first image, we need to predict 2T values: ‘T’ is the number of new frames we are generating (which is a lot if we want the video to be smooth), and the 2 accounts for the ‘x’ and ‘y’ coordinates of every pixel.
However, as this study proved 15 years ago, oscillations seen in nature are primarily composed of low-frequency components.
In other words, we just need to predict movements at low frequencies, meaning that the number of variables to predict falls dramatically because we can ignore high-frequency motion components.
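A rough back-of-the-envelope count shows the saving. The values of `T` and `K` below are assumed example numbers, and the function names are made up for illustration; the only facts used are that each frame needs an (x, y) pair and that each frequency needs one complex coefficient (a real and an imaginary part) per direction.

```python
def time_domain_values(num_frames: int) -> int:
    """Values per pixel in the time domain: one (x, y) position per frame."""
    return 2 * num_frames


def frequency_domain_values(num_frequencies: int) -> int:
    """Values per pixel in the frequency domain.

    One complex coefficient (real + imaginary part) per kept frequency,
    for each of the x and y directions.
    """
    return 2 * 2 * num_frequencies


T = 150  # assumed number of generated frames
K = 16   # assumed number of low-frequency terms kept

per_pixel_time = time_domain_values(T)
per_pixel_freq = frequency_domain_values(K)
print(f"time domain: {per_pixel_time} values per pixel")
print(f"frequency domain: {per_pixel_freq} values per pixel")
```

With these assumed numbers, the per-pixel output shrinks from 300 values to 64, and the count no longer grows with the length of the video.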
As you can see above, the primary frequencies for the motion in terms of amplitude are those between 0 and 3 Hz, meaning that the overall motion is heavily dependent on those.
So what does the model look like?
From Image to Video
As with any AI video model, the basis is always an image.
The model, for a given image, predicts the future positions of all pixels in the image across time (although modeled in the frequency domain, remember).
Once we have the positions of all pixels in each timestep, we feed this information to an image generation model that synthesizes each frame, creating the video.
It looks very complicated, but it’s actually a very similar process to how diffusion models for generating images work, with a Variational AutoEncoder (VAE) and a denoising module (in fact, they are using Stable Diffusion here).
But instead of generating an image that portrays what the person requested in the text, here the model predicts a series of motions across several frequencies.
Put simply, instead of generating a new image, the diffusion model predicts the positions of pixels in an image in future time steps, thus predicting the oscillation of the object.
Then, these ‘S’ motion textures expressed in the frequency domain are ‘reversed’ back into the time domain (my pixel ‘x’ is in position (‘y’, ‘z’) at time ‘t’).
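The ‘reversal’ is an inverse Fourier transform. Below is a minimal sketch for a single pixel and a single coordinate, assuming only the first few low-frequency coefficients are kept and everything above them is treated as zero. The function names and the toy signal are hypothetical, and the sketch only holds when the number of kept coefficients is below half the number of frames.

```python
import cmath
import math


def low_frequency_coefficients(samples, k_keep):
    """First k_keep DFT coefficients (frequency bins 0 .. k_keep - 1)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(k_keep)
    ]


def to_time_domain(coeffs, n):
    """Inverse transform of a real signal, treating all higher bins as zero.

    Assumes len(coeffs) < n / 2.
    """
    out = []
    for t in range(n):
        value = coeffs[0].real
        for k in range(1, len(coeffs)):
            # The factor 2 accounts for the mirrored negative-frequency term.
            value += 2 * (coeffs[k] * cmath.exp(2j * math.pi * k * t / n)).real
        out.append(value / n)
    return out


# Assumed toy displacement over 120 frames, made of two slow oscillations.
N = 120
displacement = [
    2.0 * math.sin(2 * math.pi * 1 * t / N) + 0.5 * math.cos(2 * math.pi * 3 * t / N)
    for t in range(N)
]

coeffs = low_frequency_coefficients(displacement, k_keep=4)
recovered = to_time_domain(coeffs, N)
max_error = max(abs(a - b) for a, b in zip(displacement, recovered))
print(f"kept {len(coeffs)} coefficients, max reconstruction error: {max_error:.2e}")
```

Because the toy motion contains only slow components, four coefficients are enough to recover all 120 per-frame displacements up to floating-point error.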
Finally, we use these motions to create ‘future’ frames based on them:
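As a very rough picture of this last step, the sketch below moves each pixel of a tiny grayscale image by its predicted displacement. This is a deliberately naive nearest-cell warp with made-up names and data: it leaves holes and lets later pixels overwrite earlier ones, whereas the actual system uses a learned image generation model to synthesize each frame.

```python
def warp_frame(image, displacement):
    """Move each pixel by its (dx, dy) displacement, rounded to the nearest cell.

    Cells that receive no pixel stay 0; pixels moved outside the image are dropped.
    """
    height, width = len(image), len(image[0])
    frame = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = displacement[y][x]
            nx, ny = x + round(dx), y + round(dy)
            if 0 <= nx < width and 0 <= ny < height:
                frame[ny][nx] = image[y][x]
    return frame


# Toy 2x4 image: the top row sways one pixel to the right, the bottom row is fixed.
image = [
    [0, 7, 0, 0],
    [0, 5, 0, 0],
]
displacement = [
    [(1.0, 0.0)] * 4,
    [(0.0, 0.0)] * 4,
]

next_frame = warp_frame(image, displacement)
for row in next_frame:
    print(row)
```

Repeating this for the displacement field of every time step, then stacking the resulting frames, gives the video.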
Taking giant leaps
Google once again shows us that the potential of AI is boundless.
Bringing an image back to life was nothing short of ‘impossible’ a few years ago, so how exciting is it to think about how many new things are going to be made available to us in the next few years?
Now, try it for yourself here!
Google takes a natural image and turns it into a video where the most salient objects move in AI-predicted patterns
The model works in the frequency domain, reducing the number of variables to model
Shows neural nets’ unique ability to approximate motion without being given the laws of physics, inducing them indirectly from data instead
🔮 Practical implications 🔮
Interactive dynamics. Create interactive videos that allow the user to move the image on command
Looping videos. Take natural images from the Internet and turn them into looping videos, ideal for marketing and personalized content
👾 Best news of the week 👾
😍 Introducing Copilot for Windows 11, a magical experience
🥇 Leaders 🥇
This week’s issue:
Copilots are Here: The Biggest Transformation in History for White Collar Jobs
When talking about AI today, the first thing that comes to mind is ChatGPT and the myriad of chatbots that are becoming the norm in our lives these days.
But as impressive as chatbots like ChatGPT are, their impact on our lives becomes a mere anecdote if we compare it with what Copilots will cause.
And Copilots are here. Finally.
This week we are embarking on a journey to decipher what Copilots are, how they look on the inside, the canonical principles of how they work, and why you should be going crazy for them.
Additionally, we will proceed to envision the future of these already-futuristic solutions, as they take the form of AI companions, the raison d’etre of what AI was meant to be all along.
If you understand Copilots, you will be much better prepared to understand how the lives of millions of people, including yours, are about to change in mere months.
ChatGPT is just the tip of the iceberg
With Generative AI (GenAI), the world was presented with the first truly functional series of foundation models, the technological achievement that underpins last year’s Cambrian explosion of AI solutions.
Supported by the universal adoption of Transformers as the ‘de facto’ architecture, humans figured out a way to train AI using billions upon trillions of data points (text in most cases).
This led to the creation of models that had ‘seen it all’ and, thus, were capable of performing multiple tasks, even those not previously trained for.
Consequently, a new breed of AI systems emerged: general-purpose models, models that, much like humans, perform brilliantly across a manifold of tasks.
This innovation took its ultimate form with ChatGPT, the first time in history that experts and laypersons alike were baffled by an AI product.
But it was only the beginning.
The Great Job Displacement
Suddenly, AI, mainly used for analytics purposes, became a general-purpose science that could support human users in multiple text-based tasks, becoming a productivity machine.
Of course, this has scared many, including those most unexpected.
What do you think the richest man on the planet, Elon Musk, and the CEO of OpenAI, Sam Altman, have in common?
Besides co-founding OpenAI, they also share a common view regarding one of the most communist-looking proposals ever: a universal basic income to combat the huge job displacement that AI will cause.
Wait, what? Getting paid no matter what for doing basically nothing?
If we analyze history, they both should be wrong. As proven by the graph below and by Harvard, technology doesn’t destroy jobs; it transforms them.
Net jobs aren’t destroyed; people move into newly created jobs adjusted to the changing times.
Source: US Bureau of Economic Analysis
But many fear that this time could be different.
The reason is simple: velocity.
The velocity at which AI could displace people from their jobs could outpace the speed at which new jobs are created.
And, as predicted by McKinsey and OpenAI, the outlook isn’t good.
The former estimates that 13 million people, in the US alone, will have to switch jobs by 2030.
The latter reports that 80% of currently existing jobs are in some way (directly, indirectly, or totally) exposed to general-purpose technologies (GPTs), aka foundation models. It also portrays a rude awakening for white-collar workers, claiming that they are, for the first time in history, more exposed to this technological shift than blue-collar workers.
And that statement was probably directed at you, my friend.
Elon and Sam’s proposal doesn’t seem that far-fetched now, huh?
However, the current state of AI is completely unprepared to achieve these predictions.
Put simply, this huge job displacement won’t be driven by ChatGPT.
That is because ChatGPT lacks two things, which keep it from being more than a knowledge-gathering assistant:
It lacks the capacity to take action,
and more importantly, it lacks the capacity to plan and iterate
These two issues alone have a considerable impact on the real utility of GenAI.
In fact, despite the huge hype AI is receiving this year, and despite ChatGPT being the fastest-growing software product in history in terms of customers, the user metrics for AI products are actually horrible.
According to Sequoia Capital, one of the leading venture capital firms in the world, the one-month retention of GenAI products isn’t great.
Actually, far from it.
Only ChatGPT manages to surpass 50%, and even it falls drastically behind social media products.
But things get worse if we look at the DAU/MAU ratio, the share of monthly active users who use the product daily.
Yes, we must take this with a pinch of salt, as it only shows mobile app usage (I personally think ChatGPT is mainly used at the desktop level), but the ratio is still horrible for ChatGPT, which only manages a rickety 14%.
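For reference, the ratio itself is simple arithmetic. The user counts below are made-up numbers chosen only to reproduce a 14% ratio; they are not ChatGPT’s real figures, and the function name is hypothetical.

```python
def dau_mau_ratio(daily_active_users: float, monthly_active_users: float) -> float:
    """Share of monthly active users who also use the product on a given day."""
    if monthly_active_users <= 0:
        raise ValueError("monthly_active_users must be positive")
    return daily_active_users / monthly_active_users


# Hypothetical counts: 140,000 daily users out of 1,000,000 monthly users.
ratio = dau_mau_ratio(140_000, 1_000_000)
print(f"DAU/MAU: {ratio:.0%}")
```

A ratio of 14% means that, on a typical day, only about one in seven of the month’s users opens the product.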
Amazingly, Copilots can easily close this utility gap.
And these solutions aren’t a thing of the future. They are here: Microsoft is launching Microsoft 365 Copilot as soon as next week.
We will delve into much more detail now, but at a glance…
Copilots have the power to change the way knowledge workers have worked during the last decades.
They are going to change our daily routines. They are going to influence your decision-making.
Ultimately, they are going to change you.
So, what really is an AI Copilot then?