Anthropic's Claude Can Now Ingest All Six Star Wars Films At Once

Ignacio de Gregorio Noblejas
May 14, 2023

🏝 TheTechOasis 🏝

🤖 This Week’s AI Insight 🤖

We’ve grown accustomed to continuous breakthroughs in AI over the last few months.

What we haven’t grown accustomed to are record-breaking announcements that set the new bar at 10 times the previous one, which is precisely what Anthropic, OpenAI’s biggest rival, has done with the newest version of Claude, its ChatGPT competitor.

Soon, you’ll be turning hours of text and information searches… into seconds.

A Chatbot focused on harmlessness

Despite the countless benefits Generative AI is bringing to the world, as with anything in technology, it comes with a trade-off.

With GenAI, we’ve opened a window for this technology to generate things like text or images, which is awesome.

But the problem is that GenAI models lack awareness of what’s ‘good’ or ‘bad’ and are trained with a humongous amount of raw data in almost every form you can imagine, data that carries in many cases debatable biases and dubious content.

Sadly, because these models get better as they get bigger, the temptation to simply feed them any text you can find, no matter the content, is particularly strong.

This has led to several cases where these models have acted in a sketchy, almost vile way towards their users, as we’ve seen with Bing, forcing Microsoft to act.

Robot in the style of Hebru Brantley, Diffusion model

To prevent this, these base models have been further trained with human feedback, a technique known as Reinforcement Learning from Human Feedback, or RLHF, to create instruction-following models that respond, almost every time, according to the guidelines those humans gave them.

Examples of such models include ChatGPT and Bard.

But as we saw with Bing (based on ChatGPT), this solution isn’t perfect.

For that reason, Anthropic decided to take it a step further with a concept described as Constitutional AI, a new training paradigm with one sole objective: creating the first truly harmless chatbot.

And this takes us to Claude.

Allegedly harmless and now super powerful

The biggest difference between Claude and other chatbots is that it was trained against a Constitution.

But what does that mean?

Drawing on documents like the Universal Declaration of Human Rights, the model was taught not only to predict the next word in a sentence very well (like any other language model), but also to take into account, in each and every response, a Constitution that determined what it could and could not say.

But what could really make all the difference for Claude is that, this week, Anthropic has announced that it has become 10 times more powerful.

Specifically, its context window has grown from 9k tokens to 100k, an unprecedented number with far-reaching implications.

Let me digress.

It’s all about tokens

Despite what many people may tell you, LLMs don’t predict the next word in a sequence… at least not literally.

They predict the next token, which usually represents between 3 and 4 characters. A token may be a whole word, or a word may be split into several tokens.

For reference, 100 tokens represent around 75 words.
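To make that rule of thumb concrete, here is a minimal sketch that converts token counts into rough word counts. The 0.75 ratio is the approximation quoted above, not an exact property of any tokenizer, and the function names are made up for illustration.

```python
# Rule of thumb from the text: 100 tokens ~ 75 words.
# This is an approximation; real ratios vary by tokenizer and language.
WORDS_PER_TOKEN = 75 / 100


def tokens_to_words(tokens: int) -> int:
    """Estimate how many words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def words_to_tokens(words: int) -> int:
    """Estimate how many tokens a text of the given word count needs."""
    return round(words / WORDS_PER_TOKEN)


# Compare Claude's old and new context windows mentioned earlier.
for budget in (9_000, 100_000):
    print(f"{budget:>7} tokens ~ {tokens_to_words(budget):>6} words")
```

Under this estimate, the old 9k window holds roughly 6,750 words, while the new 100k window holds roughly 75,000.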

To do so, the model breaks the text you give it into tokens and performs a series of matrix calculations, a mechanism known as self-attention, that combines all the different tokens in the text to learn how each token impacts all the rest.

That way, the model “learns” the meaning and context of the text and can then proceed to respond.
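For the curious, the idea of every token "looking at" every other token can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not how Claude or any production model is implemented: real models first turn each token into learned query, key and value vectors, whereas here the raw token vector plays all three roles, and the example vectors are invented.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Simplified self-attention: every token is compared with every token.

    Assumption: no learned query/key/value projections; the raw vector
    is used for all three roles to keep the sketch short.
    """
    dim = len(embeddings[0])
    outputs = []
    for query in embeddings:
        # Score this token against every token (scaled dot product).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        weights = softmax(scores)
        # New representation: a weighted mix of all token vectors.
        mixed = [
            sum(w * vec[i] for w, vec in zip(weights, embeddings))
            for i in range(dim)
        ]
        outputs.append(mixed)
    return outputs


# Three made-up 2-dimensional token vectors (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(tokens)
for vec in contextual:
    print([round(x, 3) for x in vec])
```

Notice the nested loops: each of the tokens is scored against all of the tokens, so the work grows with the square of the text length.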

The issue is that this process is computationally intensive for the model.

To be precise, the computation requirements are quadratic in the input length, so the longer the text you give it (the maximum length a model accepts is called its context window), the more expensive it is to run the model, both in training and at execution time.
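A quick back-of-the-envelope calculation shows what "quadratic" means in practice. The sketch below only counts token-to-token comparisons in a single attention pass and assumes plain, unoptimized self-attention; it ignores every other cost, and it says nothing about how Anthropic actually achieved the larger window.

```python
def attention_pairs(num_tokens: int) -> int:
    """Number of token-to-token scores in one self-attention pass: n * n."""
    return num_tokens * num_tokens


old_window, new_window = 9_000, 100_000
growth_in_length = new_window / old_window
growth_in_cost = attention_pairs(new_window) / attention_pairs(old_window)

print(f"{attention_pairs(old_window):,} scores at 9k tokens")
print(f"{attention_pairs(new_window):,} scores at 100k tokens")
print(
    f"input is {growth_in_length:.1f}x longer, "
    f"attention cost is {growth_in_cost:.0f}x larger"
)
```

Going from 9k to 100k tokens makes the input about 11 times longer, but the number of comparisons jumps from 81 million to 10 billion, roughly 123 times more.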

This forced researchers to considerably limit the allowed input size for these models to somewhere between 2k and 8k tokens, the latter of which is around 6,000 words.

This is okay for chatting, but what if you want to summarize an entire book?

Not a chance… until now.