Hacks for LLMs, Round 2
IMPROVE YOUR AI GAME
Hacks To Become an AI Pro, Round 2
Last week, we saw our first set of hacks, which alone will have considerably boosted your AI performance.
Today, we look at more advanced hacks that are also immediately applicable.
From unexpected gifts by Anthropic and Google, to prompting multimodal models effectively with frameworks like ROCC, all the way to synthetic grounding: the hack that, I predict, has been the missing piece in the majority of your failed interactions with LLMs.
As some hacks are more advanced, they include some rather technical passages you can skip if not interested (I will point them out).
Let’s dive in!
Squeezing Every Bit of Performance
In the previous episode of this series, we quickly concluded that the single best advice for talking to LLMs was to imitate how trainers talk to them. And just as I wrote that, Anthropic gave the world an unexpected gift.
Copying Never Felt So Good
As trainers decide the training data points, they dictate how and what the model learns. Therefore, imitating them increases the likelihood that the model elicits the desired behavior.
Now, some of these tips are being released to the public. This week, Anthropic released the system prompts for its Claude models in the web interface, giving a world of insight into how you should interact with Claude.
If you click the link, you’ll see that they apply many of the recommendations we gave last week, from the use of tags to clearly separate every segment of the prompt to the use of clear and concise instructions.
But what is the system prompt?
All LLMs have a prompt hierarchy (shown below in OpenAI’s case). This means that the model prioritizes certain parts of the prompt over others.
The system message is at the top of the list, defining the model's behavior. “Be concise” and “Act this way or that way” are examples of instructions provided there.
Source: OpenAI
If you use the model’s API, you can define your own system prompt, and in cases such as OpenAI’s, you can change it in the web and desktop versions, too.
This is an absolute must because it is an overarching instruction that the LLM will apply to every interaction, considerably reducing the need for you to repeat how you want it to behave every time.
OpenAI lets you update the system instructions in the web interface.
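For API users, here is a minimal sketch of setting a system prompt with the OpenAI Python SDK (the model name and instructions are just examples; it assumes an API key in the OPENAI_API_KEY environment variable):

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # any chat model works here
    messages=[
        # The system message sits at the top of the prompt hierarchy
        {"role": "system", "content": "Be concise. Answer in plain, simple language."},
        {"role": "user", "content": "Explain what a system prompt is."},
    ],
)
print(response.choices[0].message.content)

Define it once, and every subsequent turn of the conversation inherits the same behavior.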
Another tip you can follow comes from Google, which highly recommends avoiding unnecessary jargon and sticking to straightforward words and sentences.
If we look at Anthropic’s system prompt again, we see the same pattern: it completely avoids fancy words such as conjunctive adverbs like “therefore” or “furthermore” that we usually use to make our writing less boring.
Leave artistry to novelists. Instruction, instruction, instruction.
Another tip, not shown in Claude’s prompt but recommended by both OpenAI and Meta, is using capital letters to make the model more compliant.
But isn’t that a really rude way to talk to a model? Yes, but that’s the point: you aren’t talking to a human, and, fascinatingly, treating the model like one degrades your results. Instead, you have to be assertive.
Lesson learned: LLMs couldn’t care less about style, rhythm, or manners. Talk to LLMs assertively and in very simple terms, avoiding complicated elaborations.
The following section is for API users or people leveraging open-source models. If that’s not your case, move on to the next section: prompting multimodal models.
Use Tools Whenever Possible and Leverage Hyperparameters (API)
If you interact with open-source LLMs or through private provider APIs (i.e., you talk with ChatGPT using the API, not the chat interface), you can take your control over the model to the next level.
For starters, you can leverage the uber-powerful functionality of function calling. For instance, if you have to perform a complex math calculation, using next-word prediction has a high chance of failure because while LLMs are great with language, we can’t say the same for numbers.
Thus, one way to prevent these dumb mistakes is to define a function (like a calculator) that performs the calculation flawlessly, and have the LLM call that function and collect the response.
At first, this may seem like the typical script you must hardcode into the model. Far from it: LLMs can decide autonomously when to use the calculator.
The API you use will determine the exact implementation; both OpenAI and Anthropic expose a ‘tools’ parameter to declare the available functions, plus a ‘tool_choice’ parameter to control when they are used.
Either way, you are making the LLM aware that it has tools available (which can be described in plain English) by providing it with a JSON schema such as the one below:
[
  {
    "name": "get_stock_price",
    "description": "Get the current stock price for a given ticker symbol.",
    "input_schema": {
      "type": "object",
      "properties": {
        "ticker": {
          "type": "string",
          "description": "The stock ticker symbol, e.g. AAPL for Apple Inc."
        }
      },
      "required": ["ticker"]
    }
  }
]
Then, the LLM may actively choose to call them (you can also force it to if you wish). When it does, it generates a ‘stop reason’ in the payload of its API response; you can detect it, call the tool yourself, and embed the result in the following message.
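Putting the schema above into practice, a minimal sketch with the official Anthropic Python SDK might look like this (the ‘tools’ variable is assumed to hold the JSON list above; the model name is an example):

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # example model name
    max_tokens=1024,
    tools=tools,  # the get_stock_price schema defined above
    messages=[{"role": "user", "content": "What is Apple trading at right now?"}],
)

# If the model decided to call the tool, the stop reason is 'tool_use'
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    print(tool_call.name, tool_call.input)  # e.g. get_stock_price {'ticker': 'AAPL'}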
For example, below, you can find a simple implementation of Claude detecting the need to call the Brave Search API (the function ‘brave_search’ is not shown) to provide timely content to the user.
The stop reason is ‘tool_use’, which means the LLM has realized it needs information it does not have and thus identifies the need to call the Brave Search API to update its context.
response = client.messages.create(...)  # same call as above; arguments elided

# Check if the model wants to use the search function
if response.stop_reason == 'tool_use':
    search_results = brave_search(user_input)  # 'brave_search' not shown
    messages.append({"role": "assistant", "content": "I am now searching the web."})
    messages.append({"role": "user", "content": f"Here are the latest search results: {search_results}\n\nHere's the user's original question: {user_input}"})
If you want to use an open-source model, you will still access it through an API, but this time one provided by LLM providers that let you host the model in your VPC (Virtual Private Cloud). Here’s an example by Together.ai using Llama 3.1 405B.
Even if you have your own GPU cluster, managing it yourself makes no sense, as companies like Lamini offer serverless API interaction with your own cluster, abstracting away the deep pains of managing Generative AI workloads. It’s a no-brainer.
Additionally, you can tune the model’s behavior with hyperparameters (see the sketch after this list). Some of the most interesting ones include:
Temperature: As we discussed last week, you can make the model more/less creative by tuning the ‘temperature.’ The higher, the more creative (and more hallucination-prone) the model will be.
Setting it to 0 turns the LLM into a greedy decoder that always chooses the most likely token, reducing the chance of hallucinations (although not guaranteeing full determinism).
Sampling methods: As also discussed last week, you can control the random sampling method used (with min-p hopefully becoming widely available soon) to manage how random the model’s outputs are. It is therefore heavily used in conjunction with temperature.
Frequency penalty (OpenAI): This discourages the model from repeating specific tokens multiple times, making its outputs less repetitive.
Logit bias (OpenAI): You can add a bias term to certain tokens to make them more or less likely to appear (this requires accessing the tokenizer and choosing the token's ID, which is a bit more advanced than usual).
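As a reference, here is a minimal sketch of setting several of these at once with the OpenAI Python SDK (the token ID in logit_bias is a placeholder; you would look up real IDs with the tokenizer):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # example model
    messages=[{"role": "user", "content": "Write a tagline for a coffee shop."}],
    temperature=0,              # greedy decoding: always pick the most likely token
    frequency_penalty=0.5,      # penalize tokens that have already appeared
    logit_bias={"1234": -100},  # '1234' is a placeholder token ID to suppress
)
print(response.choices[0].message.content)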
All in all, the more you work with the APIs, the more control you have over the model. Circling back to prompt engineering, it isn’t unique to text models.
Prompting Multimodal Models
In our first round of hacks, we mainly focused on text-only LLMs, but prompting also requires specific techniques when working with fully multimodal models.
When working with images and text, like asking the model questions about a provided image, the most important thing to consider is that order matters. And it matters a lot.
Specifically, you want to structure your prompts in the following way:
prompt = "
image 1,
instruction 1,
image 2,
instruction 2
"
To increase your chances, you might also want to provide a role (just as we did with text hacks), but again, respecting the order:
Source: Google
If the task is more complex, you then have to take it up a notch and directly use Google’s ROCC framework:
Role: Define the expert the model has to impersonate: “You are a great painter…”
Objective: State the goal you want the model to achieve. This could be answering a question, summarizing a document, generating code, or providing insights.
Context: Provide any background information or relevant data the model needs to understand the task and generate accurate responses. This could include text, images, charts, or other data sources.
Context is absolutely crucial to get right and is usually the ‘weakest link’ in most prompts. Later in today’s newsletter we’ll learn how to guarantee good context.
Constraints: Specify any limitations or requirements you want the model to adhere to. This might include the length of the response, the format of the output, or restrictions on certain types of content.
LLMs are trained to be maximally helpful; their ‘worst nightmare’ is not complying with a user’s request. Therefore, they are highly susceptible to adversarial prompts that tell them what NOT to do.
In fact, adversarial prompts are one of the most common techniques attackers use to jailbreak models.
Hence, I recommend always following the ROCC framework with multimodal models; they can be trickier to use than text-only models and can introduce unexpected behaviors.
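To make the framework concrete, here is an illustrative ROCC prompt you could send alongside an image (the scenario is entirely hypothetical):

# An illustrative ROCC-structured prompt (all contents are hypothetical)
rocc_prompt = """
Role: You are an expert financial analyst.
Objective: Summarize the attached revenue chart in three bullet points.
Context: The chart shows quarterly revenue (2020-2024) for a mid-size retailer.
Constraints: Keep each bullet under 20 words. Use plain language, no jargon.
"""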
Crucially, as I said earlier, context matters greatly when applying ROCC or improving performance with any other model.
Thus, next, we’ll look into the two most essential hacks when dealing with more complex tasks like PDF sharing or complex reasoning using two techniques: grounding and chains.
Grounding
I can’t emphasize enough the importance of grounding. To me, it’s the most fundamental hack of all.
Essential to their being
LLMs are parametric curves that take a set of inputs, usually the words in a sequence, and predict the next one. But what do I mean by ‘parametric curve’?
It’s a fancy way of saying that LLMs, like any other neural networks, are mathematical functions with a set of parameters that have learned a mapping between inputs and outputs.
We give them the words in a sequence; they give us the next one by querying their parameters, which, combined, yield the most likely next word.
LLM -> f(x1, x2, x3, ...) = y, e.g., f("the capital of Spain is") = "Madrid".
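As a toy illustration only (a hardcoded lookup standing in for billions of learned parameters; this is not how a real LLM works internally):

def f(sequence: str) -> str:
    # The toy model's 'parameters': a hardcoded mapping from inputs to outputs
    learned_mapping = {"the capital of Spain is": "Madrid"}
    return learned_mapping.get(sequence, "<unknown>")

print(f("the capital of Spain is"))  # -> Madrid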
Consequently, I can’t stress enough that how well an LLM functions largely depends on the quality of your inputs. The better the input, the better the outcome.
A very simple law. But why is this?
LLMs’ greatest feature is ‘in-context learning,’ which means that, during training, the loss on the next token decreases the deeper you go into the sequence, as shown by Olsson et al.
In layman’s terms, LLM’s main feature is that the more context it’s given, the better the next prediction will be, even in situations where the context is new to the model.
Or, to be more specific, providing more context naturally correlates with better performance. This makes grounding an essential hack.
While the other hacks we have discussed are based more on how the models were trained, grounding feels as natural to an LLM as ‘1+1 = 2’ does to us because it taps into the model’s mathematical formulation. It’s an inductive bias of the architecture itself: the Transformer is naturally inclined to assume that it will be provided with good context.
They were built that way.
Consequently, it is a no-brainer hack: always take time to provide context. In fact, I would go as far as saying that if you had used grounding effectively, most of the failed interactions you’ve had with LLMs would never have happened.
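For instance, compare an ungrounded question with a grounded version of the same question (the scenario is hypothetical):

# The same question, without and with grounding
ungrounded = "Why did revenue drop?"

grounded = """
Context: Below is our Q3 sales summary. Revenue fell 12% quarter-over-quarter,
driven mostly by the EMEA region, while marketing spend stayed flat.

Question: Given this context, why might revenue have dropped, and what single
factor should we investigate first?
"""

The first forces the model to guess; the second lets it reason over facts you supplied.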
But grounding isn’t as simple as dumping the entire context and making the LLM figure it all out. Besides, that is a terribly tedious task.
Luckily, with our next hack, synthetic grounding, you won’t just be using the model to answer your questions—you’ll be unlocking its potential to autonomously refine and perfect the prompts themselves, turning it into an active partner in your creative and problem-solving processes.