LLaMa 2's release changes the GenAI landscape

Ignacio de Gregorio Noblejas
July 23, 2023

🏝 TheTechOasis 🏝

In the biggest news of recent months, Meta has launched the second version of its world-famous LLM, LLaMa, and in the process has also released its first chatbot, LLaMa-2-Chat.

But this isn’t your ordinary “look how cool our new LLM is” type of release; Meta is actively trying to change the AI narrative.

Forever.

In fact, I would say that this release could indeed change AI permanently, and kickstart an era where AI access and knowledge are, finally, democratized.

The Open-Source era.

A new state-of-the-art for open-source

First things first: if one thing becomes clear after reading the 70+ page paper, it’s that LLaMa 2 is incredibly good.

Trained on a dataset 40% larger than its predecessor’s, it also doubles the context window to 4k tokens (approximately 3,000 words).
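A quick back-of-the-envelope check of those figures, assuming the common rule of thumb of about 0.75 English words per token (the real ratio depends on the tokenizer and the text), LLaMa 1’s 2,048-token context, and LLaMa 1’s largest pretraining run of 1.4 trillion tokens as the baseline. The last result lines up with the roughly 2 trillion pretraining tokens the LLaMa 2 paper reports:

```latex
2 \times 2048 = 4096 \ \text{tokens}, \qquad
4096 \times 0.75 = 3{,}072 \ \text{words}, \qquad
1.4\,\text{T} \times 1.4 = 1.96\,\text{T} \approx 2\,\text{T tokens}
```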

Quality-wise, as shown in the image below, the 70-billion-parameter LLaMa-2-Chat model beats nearly every model it is compared against: it comes out slightly ahead of ChatGPT in Meta’s human evaluations despite likely being much smaller, and it is unequivocally superior to every other open-source model.

Naturally, it’s still inferior to GPT-4 (not shown in the image), but GPT-4 could easily be 20 times bigger, so that isn’t surprising.

But the real star of this research isn’t how good the model is; it’s how much detail Meta put into explaining how it was trained.

Meta has dropped several gems that are too good not to talk about.

Safety first

The first “different” thing Meta did here is to optimize for helpfulness and harmlessness separately, creating what’s possibly the safest chatbot there is right now.

To do this, they trained LLaMa-2-Chat using two reward models instead of one.

Let’s go over this beautifully drawn diagram:

Training a GenAI chatbot involves four steps:

First, we train a base model by optimizing it to predict the next token in a sequence of text. This is the most expensive part, because it involves feeding the model the text of the “entire” Internet.
This pretrained model is then fine-tuned on a curated dataset of {prompt, desired answer} pairs. In this step, dubbed ‘behavior cloning’ by OpenAI, the model learns to behave in the desired way. The result is the first version of LLaMa-2-Chat.
Next, we want to optimize the model against human preferences, like following instructions, while making it much less prone to harmful responses. Taking a copy of the previous model, we ‘chop off’ its prediction head and replace it with one that, instead of predicting the next word in a sequence, outputs a single scalar scoring how good a response is to a given prompt, according to human preferences. This is called the Reward Model (RM).
Finally, we train LLaMa-2-Chat against this reward model with the objective of maximizing the reward. In other words, the chatbot learns to write responses that obtain the highest possible score from the RM.
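The reward-model step above can be sketched in plain Python. This is a toy under stated assumptions, not Meta’s code: the “hidden states” below are made-up three-number vectors standing in for a transformer’s final hidden state, and the new head is a single linear layer that outputs one score. The loss is the binary ranking loss the LLaMa 2 paper describes for training on human preference pairs (annotators pick the better of two answers); the optional `margin` mirrors the paper’s idea of demanding a wider score gap when annotators expressed a strong preference.

```python
import math
from typing import Sequence


def reward_head(hidden_state: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Toy scalar head: one linear projection from the last hidden state to a single score.
    It stands in for the head that replaces the 'chopped off' next-token predictor."""
    return sum(h * w for h, w in zip(hidden_state, weights)) + bias


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def ranking_loss(r_chosen: float, r_rejected: float, margin: float = 0.0) -> float:
    """Binary ranking loss -log(sigmoid(r_chosen - r_rejected - margin)).
    Small when the preferred answer outscores the rejected one by more than `margin`."""
    return softplus(-(r_chosen - r_rejected - margin))


# Hypothetical final hidden states for two answers to the same prompt.
weights = [0.5, -0.2, 0.1]
chosen = reward_head([1.0, 0.0, 2.0], weights)    # answer the annotator preferred
rejected = reward_head([0.0, 1.0, 0.0], weights)  # answer the annotator rejected
print(ranking_loss(chosen, rejected), ranking_loss(rejected, chosen))
```

Training nudges `weights` (and, in a real model, the whole network) so that the first loss shrinks, i.e. the preferred answer keeps scoring higher.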

And that gets you the final LLaMa-2-Chat model.
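In practice, “maximizing the reward” in the last step is not unconstrained. The LLaMa 2 paper, following earlier RLHF work, subtracts a penalty for drifting too far from the model as it was before RL, which discourages the chatbot from gaming the RM with odd text that happens to score well. A minimal sketch of that penalized reward, assuming we already have per-token log-probabilities of one sampled response under both the current model and a frozen copy of the starting model (all numbers, including `beta`, are illustrative):

```python
from typing import Sequence


def kl_estimate(policy_logprobs: Sequence[float], ref_logprobs: Sequence[float]) -> float:
    """Sample-based KL estimate for one generated response: the sum over its tokens
    of log pi_theta(token) - log pi_0(token)."""
    return sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))


def penalized_reward(
    rm_score: float,
    policy_logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    beta: float = 0.01,  # illustrative penalty strength
) -> float:
    """Reward the RL step actually maximizes: RM score minus beta * KL(policy || reference)."""
    return rm_score - beta * kl_estimate(policy_logprobs, ref_logprobs)


# Made-up per-token log-probabilities of one sampled response.
policy = [-0.1, -0.2, -0.3]     # the chatbot being trained now favors these tokens more...
reference = [-0.5, -0.6, -0.7]  # ...than the frozen pre-RL model did
print(penalized_reward(2.0, policy, reference))
```

The larger `beta` is, the more the chatbot is held close to its starting point; with `beta = 0` it would chase the RM score alone.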

But if you look at the image carefully, you’ll notice that Meta created two RMs: one optimized for helpfulness, the other for safety.

This is, to my knowledge, a first for a major open model.

The reason is that optimizing for safety (making your model safer to use) normally comes at the cost of helpfulness: a model pushed hard to refuse harmful requests also starts refusing perfectly benign ones.

Training the model with two separate RMs avoided that trap: it resulted in no statistically significant loss of helpfulness while making the model very safe to use.

A clear win-win.
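How do two reward models become one training signal? As I read the LLaMa 2 paper, Meta uses a simple piecewise rule: if a prompt is tagged as potentially unsafe, or the safety RM scores a response below a threshold (0.15 in the paper), the safety score is used; otherwise the helpfulness score is. The combined score is then passed through a logit (undoing the sigmoid) and whitened before RL. Below is a sketch of that rule, where `is_safety_prompt` is a hypothetical flag and the whitening statistics come from a small made-up batch:

```python
import math
from statistics import mean, pstdev
from typing import List

SAFETY_THRESHOLD = 0.15  # value reported in the LLaMa 2 paper


def combined_reward(r_safety: float, r_helpful: float, is_safety_prompt: bool) -> float:
    """Use the safety RM on risky prompts or when it flags a response; else the helpfulness RM."""
    if is_safety_prompt or r_safety < SAFETY_THRESHOLD:
        return r_safety
    return r_helpful


def logit(p: float, eps: float = 1e-6) -> float:
    """Inverse sigmoid, with p clamped away from 0 and 1 to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def whiten(xs: List[float]) -> List[float]:
    """Rescale a batch of scores to zero mean and unit variance."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / (sigma or 1.0) for x in xs]


# (safety score, helpfulness score, prompt tagged as risky?) for three responses
batch = [(0.90, 0.70, False), (0.10, 0.95, False), (0.60, 0.80, True)]
scores = [combined_reward(s, h, risky) for s, h, risky in batch]
rewards = whiten([logit(x) for x in scores])
print(scores, rewards)
```

Note how the second response gets the safety RM’s low score even though the helpfulness RM loved it: a flagged answer can’t buy its way back with helpfulness.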

But that’s not all: they also introduced another innovation called Ghost Attention (GAtt).

GAtt makes your model remember

Attention is a critical element in LLMs. It’s the way they understand the relationships between words, and the better this mechanism is, the better the model is, period.
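To make “understanding the relationships between words” a bit more concrete: in the standard transformer formulation, each token’s query vector is compared with the key vectors of the tokens it can see, the scores are turned into weights with a softmax, and the token’s output is the weighted average of those tokens’ value vectors. Below is a minimal pure-Python sketch of generic scaled dot-product attention with a causal mask, as used in decoder-only LLMs. It is not LLaMa 2’s actual multi-head, optimized implementation, and the tiny vectors are made up.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix, causal: bool = True) -> Matrix:
    """softmax(Q K^T / sqrt(d)) V, computed one query row (token) at a time."""
    d = len(q[0])
    out: Matrix = []
    for i, qi in enumerate(q):
        # With a causal mask, token i only sees tokens 0..i (no peeking ahead).
        visible = range(i + 1) if causal else range(len(k))
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d) for j in visible]
        weights = softmax(scores)
        out.append([sum(w * v[j][c] for w, j in zip(weights, visible)) for c in range(len(v[0]))])
    return out


# Three tokens with made-up 2-dimensional query/key/value vectors.
q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
for row in attention(q, k, v):
    print(row)
```

Because every token must squeeze its view of the whole context into one weighted average, instructions given many turns earlier can end up with little weight, which is the problem the next section describes.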

Sadly, the longer the text sequence, the harder it is for the model to remember its first parts.

Thus, if you ask the model to “act as Napoleon” in the first prompt, by the 20th turn it will most probably have forgotten that instruction.

GAtt changes this: Meta fine-tuned the model to pay specific attention to instructions and remember them throughout the conversation, as you can see in the image below:

The GAtt model (right) clearly remembers the initial instruction and keeps answering with emojis even though the user never repeats the request.

This is very exciting, as instruction following is a cornerstone of a useful chatbot, and enforcing instructions this well across a whole conversation is something most chatbots can’t do right now.

Thus, GAtt is here to stay.
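Under the hood, GAtt is a fine-tuning data trick rather than a change to the attention mechanism itself. As I understand the paper’s description: take a multi-turn dialogue, attach the instruction to every user message, sample the assistant’s replies from the current chat model (so each reply obeys the instruction), then build a training example in which the instruction appears only in the first turn and the loss is zeroed on earlier turns. A minimal sketch of that data construction, where `generate` is a hypothetical stand-in for the chat model and `toy_model` is a toy:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    role: str    # "user" or "assistant"
    text: str
    train: bool  # True if this turn's tokens count toward the training loss


def build_gatt_example(
    inst: str,
    user_msgs: List[str],
    generate: Callable[[List[Turn]], str],  # hypothetical: the current chat model
) -> List[Turn]:
    # Step 1: sample replies with the instruction attached to EVERY user message,
    # so that each reply follows it.
    context: List[Turn] = []
    replies: List[str] = []
    for msg in user_msgs:
        context.append(Turn("user", f"{inst}\n{msg}", train=False))
        reply = generate(context)
        context.append(Turn("assistant", reply, train=False))
        replies.append(reply)

    # Step 2: rebuild the dialogue with the instruction kept only in the first user
    # turn, so the model must carry it across later turns on its own.
    example: List[Turn] = []
    last = len(user_msgs) - 1
    for i, (msg, reply) in enumerate(zip(user_msgs, replies)):
        example.append(Turn("user", f"{inst}\n{msg}" if i == 0 else msg, train=False))
        # Step 3: zero the loss on earlier turns; only the final reply is trained on.
        example.append(Turn("assistant", reply, train=(i == last)))
    return example


def toy_model(context: List[Turn]) -> str:
    """Toy stand-in that 'obeys' the emoji instruction by prefixing an emoji."""
    question = context[-1].text.splitlines()[-1]
    return f"🙂 {question}"


dialogue = build_gatt_example("Always answer with emojis", ["Hi!", "Where is Paris?"], toy_model)
print([(t.role, t.train) for t in dialogue])
```

The model is thus trained to produce an instruction-following final reply even though, in its input, the instruction only appears many turns back.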

But the biggest announcement of them all came a few days later.

Microsoft and Meta are friends now

In a subsequent press release, Meta announced that it was not only making LLaMa 2 available for commercial use, but also making it accessible through Microsoft’s cloud, Azure.

This is huge: enterprise customers, who could already use ChatGPT through the Azure cloud, can now access LLaMa as well.

But the key point is that LLaMa 2 is actually downloadable, which means customers can install it on their own private servers, eliminating the security risks that come with sending your data to OpenAI’s or Anthropic’s servers, wherever they are.

Consequently, LLaMa-2-Chat could become the first truly widely used chatbot for enterprise use cases, which means that, in the end, Yann LeCun, Chief AI Scientist at Meta, may be right:

“Open-source will eventually win the AI race.”

Yann LeCun

LLaMa 2 Paper

Key AI concepts you’ve learned today:

- A new king in the open-source arena

- The Helpfulness-Safety tradeoff

- Ghost Attention