
Orca, an open-source chatbot that rivals the king, ChatGPT

🏝 TheTechOasis 🏝

Everyone loves the idea of community-built AI chatbots that match or exceed the performance of privately-owned models like ChatGPT.

The thought is almost romantic: the power of the people defeating the tech giants back in Silicon Valley.

However, as leaderboards like Chatbot Arena prove time and time again, the chatbots backed by billions remain the undisputed kings.

But now, ironically, one of the tech giants, Microsoft, has presented Orca: an open-source model much smaller than ChatGPT that, thanks to an innovative training method, is the first model ever to look the private models in the eye and say, “We’re in the same league now”, as shown in the table below:

Despite being dozens of times (most probably hundreds in the case of GPT-4) smaller than the models it’s competing against, Orca even defeats them in some cases, while completely obliterating what was until now considered the best open-source model, Vicuna.

This raises the question:

“Will open-source actually win in the end?”

A new way of teaching

In an effort to match the performance of private, much larger models like ChatGPT, researchers have focused on applying the concept of distillation, the teaching method.

In simple terms, it involves using the responses from one of the big guys (usually GPT-4) to teach a much smaller model to imitate the teacher.

The reason for this is simple.

Although a larger model can learn from a larger corpus of data and thus achieve greater performance, larger models are often considered heavily underfitted.

In other words, you can achieve equally good results with much smaller models.

Consequently, the ideal AI chatbot creation process among the open-source community tends to be as follows:

  • A very large model that has learned complex representations from huge amounts of data is used as the ‘teacher’. These teachers are models trained by private companies, like ChatGPT or Bard.

  • A much smaller model, of around 5 to 15 billion parameters, is chosen as the ‘student’

  • Then, the student is tasked with minimizing the difference between its outputs and the outputs of the teacher, thereby learning to imitate it (see the sketch right after this list)

  • This way, a much smaller model is capable of learning the style of the teacher and achieving ‘similar’ results, while being vastly cheaper to run
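
As a rough sketch of that imitation step, here is how the fine-tuning loop could look, assuming a Hugging Face-style causal language model as the student; the model name and the training pair are illustrative placeholders, not the actual setup of any of these projects:

```python
# Minimal sketch of the imitation (distillation) step: the student is trained
# with a standard causal-LM loss to reproduce the teacher's answers.
# Model name and data below are illustrative assumptions, not the real setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

student_name = "openlm-research/open_llama_7b"  # hypothetical student choice
tokenizer = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name)
optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)

# Each record pairs a user instruction with the teacher's (e.g. GPT-4's) answer.
teacher_data = [
    {"instruction": "Explain photosynthesis in one sentence.",
     "answer": "Plants turn sunlight, water and CO2 into sugars and oxygen."},
]

student.train()
for example in teacher_data:
    prompt = f"### Instruction:\n{example['instruction']}\n### Response:\n"
    tokens = tokenizer(prompt + example["answer"], return_tensors="pt")
    # Minimizing this loss pushes the student's outputs toward the teacher's.
    loss = student(**tokens, labels=tokens["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In practice the prompt tokens are usually masked out of the loss so the student is only graded on the answer, but the idea is the same: push the student’s outputs toward the teacher’s.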

However, this default procedure has a problem.

While these models learn the style and language coherence of the teacher effectively (Vicuna and Alpaca are examples), they fail miserably to capture its reasoning capabilities.

In other words, when evaluated against complex tasks, they greatly underperform the teacher.

But Orca did things a little differently and put an end to that problem.

An inexplicable result

One of the main points of the research is that most open-source models overstate their capabilities.

For instance, while Vicuna reaches around 85% of GPT-4’s output quality in style and language coherence, when measured against complex tasks the gap widens to an almost embarrassing 5400% in cases like seven-step logical deduction problems, as shown in the previous image.

In other words, GPT-4 performs 55 times better (its score is 100% + 5400% = 5500% of Vicuna’s).

Not looking great.

However, for that same task, Orca reduces that difference to just 83%, from 5400%.

And that’s not all: in some cases, like Web of Lies (example below), Orca actually beats GPT-4, scoring 3% better than a model that could potentially be 100 times larger.

Web of Lies example

In that case, Orca also beat Vicuna, scoring 24.3% better.

But how does Orca manage to obliterate other open-source models while even managing to equal or surpass its big brothers?

Explanation tuning and progressive learning

Orca’s researchers introduce two important innovations:

Explanatory teaching

Before Orca, models like Vicuna or Alpaca performed the distillation training by sampling simple {user instruction, answer} pairs from models like GPT-4 and training the new model to imitate the answer on similar tasks.
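
A made-up pair of that shape, purely illustrative and not taken from the paper, would look something like this:

```python
# Illustrative (made-up) Vicuna/Alpaca-style distillation record:
# just the user's instruction and the teacher's answer, nothing else.
example = {
    "user_instruction": "Summarize why the sky is blue in two sentences.",
    "answer": "Sunlight scatters off air molecules, and the shorter blue "
              "wavelengths scatter the most. Our eyes therefore see blue "
              "light arriving from every direction of the sky.",
}
```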

But with Orca, things were done in a different way.

Instead of simple pairs like the one before, they added a third component: system instructions.

That is, apart from the user instruction and the model’s answer, the Microsoft researchers added a series of instructions intended to shape the behavior and thought process of the student model.
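
Again as a made-up, illustrative record (the system instruction, question, and answer below are invented, not the paper’s data), an explanation-tuning example would look something like this:

```python
# Illustrative (made-up) explanation-tuning record: the same pair as before,
# plus a system instruction that asks the teacher to expose its reasoning,
# which the student then learns to imitate alongside the answer itself.
example = {
    "system_instruction": "You are a helpful assistant. Think step by step "
                          "and justify every step of your answer.",
    "user_instruction": "If Anna is older than Ben, and Ben is older than "
                        "Carl, who is the youngest?",
    "answer": "Anna is older than Ben, and Ben is older than Carl, so the "
              "ordering by age is Anna > Ben > Carl. Carl is therefore the "
              "youngest.",
}
```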

The intuition is clear: the student is not only forced to imitate the output quality of GPT-4, but also the thought process of its teacher, thereby reaching similar reasoning capabilities.

But the Microsoft team added one extra innovation.

Progressive Learning with Intermediate Teaching

Up until now, most open-source models simply used one {student, teacher} pair.

In Orca’s case, they used two teachers.

First, they used ChatGPT as the intermediate teacher that taught the student model to solve less complex queries.

Afterward, they used GPT-4 to teach the student more complex queries, now that it already had a good degree of knowledge from the previous teaching stage.

You can think about this process as similar to how we humans learn: before we learn to multiply or divide, we first must learn to add and subtract.
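
As a toy sketch of that two-stage schedule (the actual training step is elided, and the model name and records are illustrative placeholders rather than the paper’s data):

```python
# Toy sketch of progressive learning: the same imitation step is run twice,
# first on explanations from the intermediate teacher (ChatGPT), then on
# explanations from the final teacher (GPT-4). Everything here is illustrative.
def fine_tune(student, records, stage):
    # In the real pipeline this would run the causal-LM imitation step from
    # the earlier sketch over every record; here we only report the schedule.
    print(f"Stage '{stage}': tuning {student} on {len(records)} records")

student = "orca-student-13b"  # placeholder name for the smaller student model

# Stage 1: simpler queries, explained by ChatGPT.
chatgpt_records = [{"system_instruction": "…", "user_instruction": "…", "answer": "…"}] * 3
fine_tune(student, chatgpt_records, stage="ChatGPT")

# Stage 2: harder queries, explained by GPT-4, building on what stage 1 taught.
gpt4_records = [{"system_instruction": "…", "user_instruction": "…", "answer": "…"}] * 2
fine_tune(student, gpt4_records, stage="GPT-4")
```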

This method was compared with training directly on GPT-4 alone, and the results clearly indicated the effectiveness of progressive learning.

With Orca, a new era could be opening for open-source models, an era that Silicon Valley could be very afraid of.

Key AI concepts you’ve learned by reading this newsletter:

- Distillation, the teaching method

- Explanation tuning, the new standard of open-source model training

- Orca, the new king in open-source town that competes with huge models

👾Top AI news for the week👾

🔉 Google creates a new state-of-the-art for audio generation with Soundstorm

🤩 Meta (Facebook, Whatsapp, etc.) is planning on introducing AI “everywhere”

📚 Deeplearning.ai launches several free GenAI courses

🎥 The current state of text-to-video AI generation

🧐 Now is the ‘Twilio moment’ for AI, the AI API