OpenAI has its biggest week in 6 months
TheTechOasis
Breaking down the most advanced AI systems in the world to prepare you for your future.
5-minute weekly reads.
TLDR:
AI Research of the Week: OpenAI Releases DALL-E 3 and Changes Image Generation Forever
Leaders: Understanding ChatGPT's Biggest Upgrade Ever, Building the Sensorial Computer
AI Research of the week
It's pretty clear by now that OpenAI makes announcements in a big way, as this week has come packed with revelations from the star-studded AI company.
Among those, the most interesting by far is the release of DALL-E 3, the new version of their image-generating model.
However, the reason it's interesting is not really the fact it generates better images... but how it does it.
This time, DALL-E 3 is built on top of ChatGPT, and that changes everything.
In fact, they have explicitly claimed that DALL-E 3 intends to be the nail in the coffin of what could become the shortest-lived hot job in history, prompt engineering, raising the question:
Will AI eradicate AI jobs just as quickly as normal ones?
Source: OpenAI using DALL-E 3
Refining what state-of-the-art means
When working with current image generation models like MidJourney or Stable Diffusion, one has to deal with many problems.
Are you listening to me?
First and foremost, current image generators, models that take a text description and generate the image that portrays that description, are terrible at following instructions.
In most cases, they won't respect structure, meaning that if you require a certain object in a certain part of the image, you can simply forget about it.
Yes, models like ControlNet allow us to condition a model to generate an image that resembles a certain structure or, for instance, a given scribble:
Unlike plain text-to-image generation, where only the prompt guides the denoising process, ControlNet also conditions the model on a structural image such as the scribble (top-left), so the image generation model has a much better idea of what it needs to generate.
But ControlNet is only great if you're an artist. Those like me who aren't have to deal with the tendency of these models to omit critical parts of the description, which most often requires extensive tailoring of the prompt, a practice known as prompt engineering, to end up with the ideal image.
But lack of instruction following isn't the greatest problem.
The Great Impersonator
Just like Ferdinand Waldo Demara became a legendary impostor portrayed by Tony Curtis in The Great Impostor, image generation models have had their fair share of notoriety in the press for imitating several artists who, put mildly, were not happy about it.
But imitated artists aren't the only ones unhappy about these models, because unless you've been living under a rock, you have seen AI-generated photos of Donald Trump's arrest such as the one below:
These images are funny but can seriously hurt the people portrayed.
And considering how voice synthesizers are becoming scarily good at imitating the desired voice, one can easily create fake imagery at a scale that could seriously damage the reputation of the people portrayed.
And you don't have to be famous to get portrayed with these models. The targeted person could be you.
I tried generating images of famous people myself and was quite successful at it.
Luckily for those people, I don't do it in bad faith, but you know how the world works.
But if there's one thing that current image generators suck at, it's generating text.
What the... never mind
This is what MidJourney, my image generator of choice, gave me when I asked it for a logo displaying "I'm a 90s kid!":
If you look carefully, it has represented 90s imagery, giving the image a nostalgic aura, but the text... never mind.
Long story short, current models are really not that useful. But now, DALL-E 3 promises to erase all these problems in one go.
Leveraging synergies
OpenAI has decided that it wants to create the first truly useful image generator, and for that, they have used none other than the crown jewel of the AI industry, ChatGPT.
Put simply, DALL-E 3 has been built natively on top of ChatGPT.
This means that, from early October, you will be able to directly ask ChatGPT to generate images (if you're a paying subscriber).
Besides the UI appeal of this feature (you can now chat and generate images on the same screen), the real disruption comes when we read between the lines of OpenAI's announcement.
Really, they are not only seamlessly adding DALL-E 3 to the user interface, but they are actively engaging ChatGPT in the process, a monumental change.
But how?
The intelligent middleware
In layman's terms, the new image generation procedure will have ChatGPT as an active participant that suggests ideas and enhances your prompts behind closed doors... so that the end result fits your needs.
In other words, ChatGPT will act as your prompt engineer.
Add this to the fact that DALL-E 3 will already be great at following instructions, and you get this:
Source: OpenAI
Unlike previous image generators, DALL-E 3 will actually obey your commands and portray objects where they should be.
Sounds too good to be true, but it's here, guys and gals.
It's here.
On top of this, again leveraging ChatGPT's world representations, DALL-E 3 is better at judging which prompts it should accept and which it should not.
Specifically, DALL-E 3 is now trained to decline prompts that ask for public figures or for imitations of a particular artist, thanks to OpenAI's great efforts with Reinforcement Learning from Human Feedback.
You can think about this complete solution as one that leverages ChatGPT for two things:
Prepare the best prompts possible
Actively recognize when something shouldn't be generated, to protect artists or public figures
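To make those two roles concrete, here is a minimal Python sketch of a chat model sitting between the user and the image generator. Everything in it is an assumption for illustration: `BLOCKED_TERMS`, `should_refuse`, `rewrite_prompt`, and `prepare_image_request` are hypothetical names, and plain string matching plus a fixed suffix stand in for what would really be a trained language model. OpenAI has not published its pipeline in this form.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical blocklist. A real system would rely on a trained model,
# not on substring matching.
BLOCKED_TERMS = ("donald trump", "in the style of")


@dataclass
class ImageRequest:
    refused: bool
    prompt: Optional[str] = None  # rewritten prompt, None when refused
    reason: str = ""


def should_refuse(user_prompt: str) -> bool:
    """Role 2: flag requests that should not be generated at all."""
    lowered = user_prompt.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def rewrite_prompt(user_prompt: str) -> str:
    """Role 1: stand-in for the LLM that expands a short user prompt."""
    return (
        f"{user_prompt.strip()}. Place every object exactly where the "
        "description says, and render any text letter by letter."
    )


def prepare_image_request(user_prompt: str) -> ImageRequest:
    """Run the safety check first, then the prompt rewrite."""
    if should_refuse(user_prompt):
        return ImageRequest(
            refused=True,
            reason="request names a public figure or asks to imitate an artist",
        )
    return ImageRequest(refused=False, prompt=rewrite_prompt(user_prompt))


if __name__ == "__main__":
    for text in ("A logo that says 90s kid", "Donald Trump being arrested"):
        print(prepare_image_request(text))
```

The only design point worth taking from this sketch is the ordering: the refusal check runs before any rewriting, so a blocked request never reaches the image model.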
Overall, a home run for ChatGPT users.
But we still have to address the elephant in the room... is this the beginning of the end for prompt engineering?
A quick ride to death
For months, prompt engineering was hailed as the next big job for humans, AI whisperers who maximized the results when interacting with these models.
But as several research papers have suggested, LLMs can be better than humans at crafting prompts, so DALL-E 3 could signal the first time that humans are thrown out of the picture completely.
This makes a real case for AI being capable not only of eradicating traditional human jobs, but also of devouring jobs that were born thanks to AI... just as quickly.
Predicting the future of the human labor market has just gotten a lot more complex and scary, don't you think?
Key contributions
DALL-E 3 represents the first image generator that doesn't require prompt engineering to generate images that adhere to the user description
It leverages ChatGPT to be more "self-aware" of what it can generate or not, becoming much safer to use
Practical implications
Marketing campaigns have become something only limited by imagination, and logos, brands, and Internet imagery are now much easier to create, democratizing real art
We should see an explosion of self-service solutions from B2B and DTC companies, and human customer support is seeing its last days
Prompt engineering won't become a job in itself, as LLMs will close the expertise gap for us
Best news of the week
Legendary creative Jony Ive & OpenAI discussing creating the "iPhone of AI"
Future driverless cars will speak to occupants using GenAI
Meta's new Ray-Ban smart glasses are out of this world
Mistral releases its first model and makes it totally free
Leaders
This weekās issue:
ChatGPT gets its biggest enhancement ever and becomes a different beast
OpenAI has decided it really wants to take over the world.
Built upon the foundation of superstar talent, OpenAI continues to set the direction of the AI industry, the most advanced technology humans have ever created.
That is no small feat, and the speed at which they have managed to transform the world is hard to describe with words, but slightly easier with graphs.
Source: Statista
Five days to reach one million users. Nothing more to add, your honor.
But this week OpenAI has really shaken the foundations of the industry with an announcement that takes ChatGPT miles closer to Artificial General Intelligence, or AGI.
Standing on the shoulders of giants like Yoshua Bengio, Yann LeCun, Ashish Vaswani, and many others, OpenAI has just given us a glimpse of the world we're heading into.
Today we're going to go through all the recently added features, and how they have built this solution based on the hints given by the GPT-4V System Card.
By the end of this read, you will not only understand ChatGPT's impressive new form... but also how to leverage it.
Finally, we will reflect on the views of some of the brightest minds of our time like Andrej Karpathy to comprehend that nothing is what it seems, and that this is the precursor of a new computing paradigm.
Understanding markets
Despite what many people may think, what really sets OpenAI apart from other competitors in the Generative AI market isn't the technology.
Yes, GPT-4 still remains the most advanced model in the industry, but this isn't what allows them to lead.
The democratization of this technology into tangible value for customers does.
Don't underestimate the importance of UIs
What OpenAI understood from the get-go, thanks to the leadership of what might be one of the most influential venture capitalists in the last two decades, Sam Altman, was the importance of creating value.
In fact, the technology behind ChatGPT, the Transformer architecture with an added layer of Reinforcement Learning from Human Feedback (RLHF), was far from new.
For reference, the standard Transformer architecture was released in 2017, and so was RLHF.
But it wasn't until November 2022 that the world finally realized what Generative AI meant for our society.
However, was it really the technology that caused this disruption, or a clear, simple-to-use UI that put this technology in the hands of the masses?
The answer is pretty clear.
They gave everyone, for free, an interface to converse with an advanced AI... and that's it.
No ads. All word-of-mouth organic referrals.
And the world exploded... despite it being a "simple" chat interface!
But was it really only that?
Simplicity and time machines
One of the most influential people in social media in terms of understanding companies and marketing is Scott Galloway.
He has a great phrase that reads, "The easiest way to build billion-dollar companies is to build time machines".
For instance:
Amazon gets anything to your door in just one day
Klaviyo, one of the protagonists of the recent IPO frenzy, automates marketing
Uber gets you a car ride home in minutes
E-commerce companies get you your desired new clothes without leaving the house
See the pattern? These companies simply make our lives easier and more comfortable.
With ChatGPT it was the same thing:
Students wrote beautiful essays that fooled professors worldwide
Journalists created boilerplates for their articles in seconds
Lawyers summarized long texts into a few sentences
Good programmers became 10x programmers
But not everything was going to be sunshine and flowers.
User retention soon started to fall, and important metrics like the DAU/MAU ratio, the share of monthly active users who are also daily active users, were not that great for products like ChatGPT or Claude.
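For readers who have never computed it, the DAU/MAU ratio is simple arithmetic. The Python sketch below is only an illustration: the function name `stickiness` and the user counts are made up, and they are not real figures for ChatGPT, Claude, or any other product.

```python
def stickiness(daily_active_users: int, monthly_active_users: int) -> float:
    """Return DAU/MAU: the share of monthly users who show up on a given day."""
    if monthly_active_users <= 0:
        raise ValueError("monthly_active_users must be positive")
    return daily_active_users / monthly_active_users


# Made-up numbers, purely for illustration.
ratio = stickiness(daily_active_users=2_000_000, monthly_active_users=10_000_000)
print(f"DAU/MAU = {ratio:.0%}")  # prints "DAU/MAU = 20%"
```

The closer the ratio gets to 100%, the more the product is part of its users' daily routine; a low ratio means most people only drop by occasionally.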
At the end of the day, ChatGPT was only text and nothing more, right?
Well, that's no longer true.
ChatGPT is now multimodal. Or, to put it more explicitly, ChatGPT is not a chatbot anymore.
Now it is becoming a sensorial computer, and to understand why, first we need to understand how it works.