TheTechOasis
LLaMa 2's release changes the GenAI landscape
In the biggest news of recent months, Meta has launched the second version of its world-famous LLM, LLaMa, and in the process has also released its first chatbot, LLaMa-2-Chat.
But this isn't your ordinary "look how cool our new LLM is" type of release: Meta is actively trying to change the AI narrative.
Forever.
In fact, I would say that this release could indeed change AI permanently, and kickstart an era where AI access and knowledge are, finally, democratized.
The Open-Source era.
A new state-of-the-art for open-source
First things first: if anything becomes clear after reading the 70+ page paper, it is that LLaMa 2 is incredibly good.
Trained on a 40% larger dataset, it also doubles the context window to 4k tokens (roughly 3,000 words).
Quality-wise, as shown in the image below, the LLaMa 2-Chat 70-billion-parameter model beats basically every model it competes against, performing slightly better than ChatGPT despite being much smaller, and is unequivocally superior to any other open-source model.
Naturally, it's still inferior to GPT-4 (not shown in the image), but we're talking about a model that could easily be 20 times bigger, so that isn't surprising.
But the main protagonist of this research isn't how good the model is, but how much detail they've put into explaining how it was trained.
Meta has dropped several gems that are too good not to talk about.
Safety first
The first "different" thing Meta did here was to optimize for helpfulness and harmlessness separately, creating what is possibly the safest chatbot there is right now.
To do this, they trained LLaMa-2-Chat using two reward models instead of one.
Let's go over this beautifully drawn diagram:
Training a GenAI chatbot involves four steps:
First, we train a base model by optimizing it to predict the next token in a sequence of text. This is the most expensive part because it involves ingesting the "entire" Internet's text into the model.
This pretrained model is then fine-tuned with a curated dataset of {prompt, desired answer} pairs. Dubbed "behavior cloning" by OpenAI, this step teaches the model to behave in a desired way. This is the first version of LLaMa-2-Chat.
Next, we want to optimize the model against human preferences, like following instructions, while making it much less prone to harmful responses. Using a copy of the previous model, we "chop off" its prediction head so that, instead of predicting the next word in a sequence, it outputs a scalar value indicating how good a response to a certain prompt was, according to human preferences. This is called the Reward Model (RM).
Finally, we train LLaMa-2-Chat against this reward model with the objective of maximizing the reward. In other words, the chatbot learns to write responses to prompts that yield the highest value possible according to the RM.
And that gets you the final LLaMa-2-Chat model.
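The "chopped head" idea in step 3 is worth making concrete. Below is a toy sketch in plain NumPy, not Meta's actual code: the transformer body is faked with a deterministic random vector, and the hidden size is made up (the real 70B model uses 8192). The point is only the shape of the thing: the reward model reuses the chatbot's body but ends in a single linear projection to one scalar instead of a vocabulary-sized prediction layer.

```python
import numpy as np

HIDDEN = 16  # toy hidden size, purely illustrative

def transformer_body(text: str) -> np.ndarray:
    """Stand-in for the pretrained transformer: text -> final hidden state.

    Deterministic per input, so repeated calls agree; in the real pipeline
    this would be the full LLaMa 2 forward pass.
    """
    seed = sum(text.encode()) % (2**32)
    return np.random.default_rng(seed).standard_normal(HIDDEN)

# The "chopped off" head: one linear projection down to a single scalar,
# replacing the vocabulary-sized next-token prediction layer.
reward_head = np.random.default_rng(0).standard_normal(HIDDEN)

def reward(prompt: str, response: str) -> float:
    h = transformer_body(prompt + "\n" + response)
    return float(reward_head @ h)  # one number: how "good" the response is
```

That single scalar is what step 4 then maximizes: PPO nudges the chatbot toward responses the RM scores highly.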
But if you look at the image carefully, you'll realize that Meta created two RMs: one optimized for helpfulness, the other for safety.
This is a first in AI.
The reason is that optimizing for safety (making your model less likely to produce harmful output) normally comes at the cost of helpfulness.
Consequently, training the model with two RMs resulted in no statistically significant loss of helpfulness while the model became very safe to use.
A clear win-win.
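The two scores still have to be reconciled into a single reward for the RL step. The paper combines them piecewise: when a prompt is tagged as safety-sensitive, or the safety RM scores the response as risky (the paper uses a threshold of 0.15), the safety reward takes over; otherwise the helpfulness reward drives the update. A minimal sketch of that logic, with function and parameter names of my own choosing:

```python
SAFETY_THRESHOLD = 0.15  # threshold reported in the paper

def combined_reward(help_score: float, safety_score: float,
                    prompt_flagged_unsafe: bool) -> float:
    """Pick which RM's scalar feeds the RL update for this sample."""
    # Safety-sensitive prompt, or a response the safety RM dislikes:
    # optimize for safety alone.
    if prompt_flagged_unsafe or safety_score < SAFETY_THRESHOLD:
        return safety_score
    # Otherwise the helpfulness RM drives the update.
    return help_score

print(combined_reward(0.9, 0.05, False))  # 0.05 -- safety takes over
print(combined_reward(0.9, 0.80, False))  # 0.9  -- helpfulness drives
```

This is why the tradeoff mostly disappears: safe responses are never penalized for being helpful, and risky ones are never rewarded for it.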
But that's not all, as they introduced another innovation called Ghost Attention (GAtt).
GAtt makes your model remember
Attention is a critical element in LLMs. It's the way they understand the relationships between words, and the better this mechanism is, the better the model, period.
Sadly, the longer the text sequence, the harder it is for the model to remember its earliest parts.
Thus, if you ask the model to "act as Napoleon" in the first prompt, by the 20th turn it will most probably have forgotten that instruction.
With GAtt, this changes: they fine-tuned the model to pay specific attention to instructions and remember them across the complete conversation, as you can see in the image below:
The GAtt model (right) clearly remembers the initial instruction and keeps answering in emojis even though the user doesn't explicitly request it.
This is very exciting, as instruction following is a cornerstone of a useful chatbot, and enforcing those instructions across the conversation so consistently is something most chatbots can't do right now.
Thus, GAtt is here to stay.
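Mechanically, GAtt is a data trick. During synthetic data generation, the instruction is concatenated to every user turn so that the sampled answers all respect it; in the resulting fine-tuning data, the instruction is then kept only in the first turn, which teaches the model to carry it across the dialogue on its own. A rough sketch of the two phases (helper names are mine, and this glosses over details like masking the loss on earlier turns):

```python
def gatt_sampling_turns(instruction: str, user_turns: list) -> list:
    # Phase 1: glue the instruction onto every user message before
    # sampling, so each generated answer obeys it.
    return [f"{instruction} {u}" for u in user_turns]

def gatt_finetuning_turns(instruction: str, user_turns: list) -> list:
    # Phase 2: in the training data, the instruction survives only in
    # turn 1 -- the model must learn to remember it for later turns.
    return [f"{instruction} {user_turns[0]}"] + list(user_turns[1:])

turns = ["Who are you?", "Where were you born?"]
print(gatt_sampling_turns("Act as Napoleon.", turns))
# ['Act as Napoleon. Who are you?', 'Act as Napoleon. Where were you born?']
print(gatt_finetuning_turns("Act as Napoleon.", turns))
# ['Act as Napoleon. Who are you?', 'Where were you born?']
```

The answers in the fine-tuning set were all generated *with* the instruction present, so the model learns to produce instruction-following responses even when the instruction appeared many turns ago.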
But the biggest announcement of them all came a few days later.
Microsoft and Meta are friends now
In a follow-up press release, Meta announced that it was not only making LLaMa-2-Chat fit for commercial use, but also making it accessible through Microsoft's cloud, Azure.
And this is huge: enterprise customers can now access not only ChatGPT through the Azure cloud, but LLaMa too.
But the key point is that LLaMa 2 is actually downloadable, meaning customers can install it on their own private servers, eliminating for good the security risks entailed in sending your data to OpenAI's or Anthropic's servers, wherever those may be.
Consequently, LLaMa-2-Chat could become the first truly widely used chatbot for enterprise use cases, which essentially means that maybe, in the end, Yann LeCun, Chief Scientist at Meta, could be right:
"Open-source will eventually win the AI race."
Key AI concepts you've learned today:
- A new king in the open-source arena
- The Helpfulness-Safety tradeoff
- Ghost Attention
Top AI news for the week
Stanford argues ChatGPT has changed… for the worse