5 Hacks to Make You a GenAI Pro Today

Over the years of interacting with frontier AI models, I’ve learned a lot about how to use this technology… and also how not to use it.

Because with stochastic models that have randomness baked into them by default, much of the game is knowing when you shouldn’t be using them.

For that, I’ve assembled a set of pro hacks. And no, I’m not referring to arduous coding workflows or tips that require $100k implementations. Today, you’ll learn things you can apply immediately (both as a desktop user and API user) with no previous knowledge required.

From choosing your ideal model to true unlocks that have been a real game changer for me, such as distribution adaptation or explanatory search, each hack discussed here builds on the foundation of the one before.

And by the end of this read, I guarantee you’ll be able to confidently call yourself an AI expert compared to 99% of the population.

Let’s go!

Hack nº1. Choose Your Model Wisely

The first step is to choose your base model. Although there are certainly benefits to combining more than one product (a pro hack we’ll cover in the future), let’s be honest; you will end up using one most of the time.

If you aren’t willing to run your own cloud space (you can rent one on Hugging Face Spaces, among many other providers), or run your base models locally with tools like Ollama, you will have to settle on one of four options:

  1. ChatGPT

  2. Gemini

  3. Claude

  4. Grok

When choosing between these options, price isn’t a differentiator, because they are all in the same range ($20/month). So how do you choose?

Private Models

  • ChatGPT offers the best all-around service. It’s good at almost any task. And once they fully roll out Advanced Voice Mode, it will become the default product if you prefer speaking over writing.

  • Claude is good at most things too, but particularly strong at coding. It also has Artifacts, an impressive capability that runs your code in real time. On the flip side, it has the highest refusal rate, meaning the model tends to refuse to answer your prompt much more often than the other providers’ models.

  • Gemini boasts the largest context window, up to 2 million tokens (roughly 1.5 million words). That’s more than ten times what OpenAI offers with ChatGPT, so it’s your best bet when you send extensive content to the model, even though a recently published benchmark suggests that long-context claims are heavily overestimated (across all providers)… and it also struggles with high refusal rates.

  • Grok is more of a question mark today. Benchmark-wise, Grok-2 is on par with the frontier (even beating Claude 3.5 Sonnet in places), but we need to see wider use to confirm. That said, the model has the lowest refusal rate; it’s far more willing to answer almost any question than the others, so it offers by far the most no-strings-attached experience.

Open-source

Regarding open source, the Llama 3.1 ecosystem (base models and excellent third-party fine-tunes like Hermes 3) is a great option. Roughly on par with private models, they offer unmatched data security, as the model can reside in your own cloud if needed.

You can try Hermes 3 through Lambda Chat today for free.

Importantly, you can also find task-specific models, like DeepSeek-Coder-v2 for coding, that offer excellent performance. Now, if you want to run your models locally on your desktop computer, please acknowledge the following:

Although models like Gemma2-2B already offer ChatGPT-3.5-level performance, you will still need cloud deployments to access the best overall models, even if you opt for open source.
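If you want a taste of local inference, here’s a minimal sketch of querying Gemma 2 2B through Ollama’s local REST API. This assumes you’ve installed Ollama and pulled the model first (`ollama pull gemma2:2b`); the prompt is just an example:

```python
# Minimal sketch: querying a locally running Gemma 2 2B through Ollama's REST API.
# Assumes Ollama is installed and running, and that you have already pulled
# the model with `ollama pull gemma2:2b`.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma2:2b",
        "prompt": "Summarize the difference between NVLink and InfiniBand in two sentences.",
        "stream": False,  # return the full completion in one JSON payload
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
```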

Luckily, almost all open-source models, including Llama 3.1 405B, are prepared so that you can run them on a single node (8 NVIDIA H100s) where the GPU-to-GPU connection runs over NVIDIA NVLink, so throughput will be great even in cloud deployments.

If the model you decide to run is multi-node, things get expensive and slow quickly: NVIDIA InfiniBand (the interconnect used to link two nodes) has 36 times less bandwidth than NVIDIA NVLink, the interconnect between GPUs within the same node (25 GB/s vs. 900 GB/s).

Pro tip: For the most advanced among you, if you intend to deploy open-source solutions to customers, multi-node implementations are a no-go zone; all models should scale on a single-node basis.

Finally, if you plan to use these models for coding tasks, it’s vital to find an IDE (a development environment) that supports AI-enhanced coding.

While GitHub Copilot’s VSCode extension is the most popular, Cursor AI’s IDE (a VSCode fork) has gained huge momentum over the last few months. It raised $60 million just a few days ago and has been praised by the AI gods, including Andrej Karpathy. From my own experience, once you go full AI-enhanced coding, you never go back.

Hack nº2. Be Humble

As I said at the beginning, playing the AI game correctly involves knowing when to use these models, but most of all, when to avoid them altogether.

Make Every Prediction Count

If you use these models in situations they aren’t meant for, you are asking to be disappointed. Thus, the first step is pretty simple: see which tasks providers recommend.

But why should you trust these recommendations so much?

Large Language Models (LLMs) are sequence-to-sequence models. In other words, they are just models that take an input sequence (usually text) and complete it. Thus, their prediction accuracy highly depends on your input sequence.

Additionally, they are biased toward things they have seen a lot (high-frequency bias), meaning they are much more accurate when the input sequence is familiar to them.

But why is this important? Because it means they are much more effective on the kinds of sequences researchers used to train them.

Thus, if the provider suggests using ChatGPT for summarization, that means the model has seen ‘summarization request sequences’ far more than other tasks, which directly correlates with much higher performance.

Tasks that are highly encouraged by providers include:

  • Summarization

  • Code Reformatting

  • Reframing writing style. “Rewrite my text as an informal email.”

  • Named Entity Recognition (NER). A technique used to identify and classify entities such as people, organizations, locations, and dates within a text.

  • Extracting. Like NER, LLMs excel at extracting key concepts or data from text.
    (For example, an assistant that classifies customer claim letters uses NER and extraction to pull critical fields like the customer, date, and reason for the dispute, reformat them into JSON, and send an API request to log the claim into your customer support platform; see the sketch right after this list.)

  • Idea brainstorming

  • Idea contrasting (more on that later)

  • Sentiment analysis. Analyzing a text to infer how the speaker/writer is feeling.

  • Translation. Although most LLMs do an ‘ok’ job at translation, they are decoder-only architectures, meaning they might miss important relationships in the data. For that reason, you are probably better off using application-specific products like DeepL for this use case.
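To make the claim-letter example above concrete, here is a minimal sketch using the OpenAI Python SDK. The field names and the claims-logging endpoint are hypothetical placeholders; treat this as an illustration of the NER-plus-extraction pattern, not a production integration:

```python
# Minimal sketch of the claim-letter example: extract key fields as JSON,
# then forward them to a (hypothetical) customer support endpoint.
# Assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set.
import json
import requests
from openai import OpenAI

client = OpenAI()

claim_letter = "Dear team, I am writing to dispute the charge of March 3rd... (full letter here)"

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},  # force valid JSON output
    messages=[
        {
            "role": "system",
            "content": (
                "Extract the customer name, the date, and the reason for the "
                "dispute from the claim letter. Respond only with a JSON object "
                "with the keys: customer, date, reason."
            ),
        },
        {"role": "user", "content": claim_letter},
    ],
)

fields = json.loads(completion.choices[0].message.content)

# Hypothetical endpoint: swap in your support platform's real API.
requests.post("https://support.example.com/api/claims", json=fields, timeout=30)
```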

But most importantly, what things should you avoid?

Please Avoid This.

As discussed in one of my knowledge base articles, the ChatGPT Fallacy, the biggest mistake (and sadly the most common) is testing models to impress you.

People want to be impressed by LLMs so badly that they resort to questions in fields they aren’t experts in, just to ‘be impressed.’

But here’s the thing: the response is almost guaranteed not to be impressive. You just can’t tell, because you aren’t knowledgeable in that field. Thus, you are easily fooled.

For instance, if you aren’t a poet (or an avid poetry reader) and you ask the model to generate a poem on phone chargers in the voice of Hamlet, you will be impressed by what the model gives you.

But here’s the thing: show the response to a Shakespeare expert, and you’ll realize that the answer probably sucked big time.

Reality check: Please be fully aware of what an LLM has been trained on. LLMs have been trained to have ‘low average error,’ keyword ‘average.’ They are designed to give ‘good enough’ responses, not perfect ones.

This leads us to the next thing to avoid. They are the hammer, not the nail. They are the brush, not the painting.

In other words, they should never be the result of your work, the product; they should be an assistive tool. This is why using ChatGPT to write for you always turns out badly; these models are the living embodiment of mediocrity.

To make matters worse, they were specially tuned to be extremely eloquent, which turns an already mediocre result into an overly pompous and flat-out obnoxious one. Overconfident yet bad, worst-case scenario.

LLMs are productivity tools; they were never meant to be the final result. Please don’t be mediocre.

Hack nº3. Templating

Once you have your model of choice and better intuition about where to use it, it’s time to structure your requests.

LLMs excel in order and fail in chaos. During instruction fine-tuning and behavior alignment, the last two steps before an LLM is deployed to users, the researchers opt for highly structured prompts to maximize the likelihood of success.

Therefore, as with hack 2, our job is to imitate the structures that researchers previously imposed on the model.

The rationale is simple: LLMs are pattern matchers. Therefore, in our quest to incentivize high-quality generation from the model, our job is to facilitate the elicitation of that content through the input sequence—nothing more, nothing less.

Consequently, to point the LLM toward success, we need to use the structures it was trained on. For instance, in ChatGPT’s case, OpenAI highly recommends the use of delimiters (such as triple quotes) to separate instructions from the content they act on.

In Claude’s case, Anthropic recommends XML tags to structure your inputs, grouping text in <instruction>{your text}</instruction>-style tags.
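To illustrate, here’s a minimal sketch of the same request structured both ways. The exact delimiters and tag names are my own choices, not official templates; the point is the consistent structure:

```python
# Minimal sketch: the same summarization request structured two ways.
# Delimiters (OpenAI-style) vs. XML tags (Anthropic-style).

article = "...your long article goes here..."

# OpenAI-style: delimiters (here, triple quotes) separate the instruction
# from the content it acts on.
openai_style_prompt = f'''
Summarize the text delimited by triple quotes in three bullet points.

"""
{article}
"""
'''

# Anthropic-style: XML tags group each part of the input explicitly.
anthropic_style_prompt = f"""
<instruction>Summarize the text inside the <document> tags in three bullet points.</instruction>
<document>
{article}
</document>
"""

print(openai_style_prompt)
print(anthropic_style_prompt)
```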

One way or another, all LLMs have been trained on structured inputs, so by not structuring yours, you are unequivocally leaving performance on the table.

Pro Tip: I know structuring is dreadful. But there’s an easy solution: just tell the LLM to do it for you.

In other words, you can be as vague as ever, but ask it to reformat the task before actually doing it:

Prompt:

I’m going to give you a task, but before doing it, please reformat my request using the same structure your trainers used during instruction tuning. If my task isn’t explicit enough, feel free to ask me any further questions to enrich the context.

"""
[Your vague request goes here]
"""
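If you’re an API user, the same hack takes seconds to wire up. A minimal sketch with the OpenAI Python SDK (the vague request is just an example, and any chat model works):

```python
# Minimal sketch: let the model restructure your vague request before answering it.
# Assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

META_PROMPT = (
    "I'm going to give you a task, but before doing it, please reformat my "
    "request using the same structure your trainers used during instruction "
    "tuning. If my task isn't explicit enough, feel free to ask me any further "
    "questions to enrich the context.\n\n"
    '"""\n{request}\n"""'
)

vague_request = "make my landing page copy better"  # example of a vague request

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": META_PROMPT.format(request=vague_request)}],
)
print(completion.choices[0].message.content)
```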

The difference between the raw and the reformatted input is remarkable, and so will be the improvement in your results.

Hack nº4. Behavior Tuning

With your chosen model, humble approach, and structured inputs in place, and before we move into reasoning hacks, it’s time to make models dance with behavior tuning.

One of the most straightforward power hacks to boost LLM performance immediately is behavior adaptation, which can be done with one simple five-word sentence.

It’s hysterical how such a minor, unreasonably effective change to the prompt can yield such powerful outcomes.
