
Few-Shot Prompting
Master the art of in-context learning. Learn how providing just a few examples can steer LLMs to handle complex tasks and specific output formats.
This content is adapted from Prompting Guide: Few-Shot Prompting. It has been curated and organized for educational purposes on this portfolio. No copyright infringement is intended.
In-Context Learning
While LLMs demonstrate remarkable zero-shot capabilities, they still fall short on more complex tasks. Few-shot prompting can be used as a technique to enable in-context learning, where we provide demonstrations in the prompt to steer the model toward better performance. The demonstrations serve as conditioning for subsequent examples where we would like the model to generate a response.
According to Touvron et al. (2023), few-shot properties first appeared when models were scaled to a sufficient size (Kaplan et al., 2020).
The "Whatpu" Example
Let's demonstrate few-shot prompting via an example that was presented in Brown et al. (2020). In the example, the task is to correctly use a new word in a sentence.
Prompt:
A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is:
We were traveling in Africa and we saw these very cute whatpus.
To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses the word farduddle is:
Output:
When we won the game, we all started to farduddle in celebration.
We can observe that the model has learned how to perform the task by being given just one example (i.e., 1-shot). For more difficult tasks, we can experiment with increasing the number of demonstrations (e.g., 3-shot, 5-shot, 10-shot).
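The structure above can be sketched in code. The following is a minimal, model-agnostic helper that assembles a few-shot prompt from (context, completion) demonstration pairs and a final query; the actual call to an LLM API is intentionally omitted, and the helper name is our own.

```python
def build_few_shot_prompt(demonstrations, query):
    """Concatenate demonstration (context, completion) pairs,
    then append the query, leaving its completion for the model."""
    blocks = [f"{context}\n{completion}" for context, completion in demonstrations]
    blocks.append(query)
    return "\n\n".join(blocks)

# The 1-shot "whatpu" demonstration from the example above.
demos = [
    ('A "whatpu" is a small, furry animal native to Tanzania. '
     'An example of a sentence that uses the word whatpu is:',
     'We were traveling in Africa and we saw these very cute whatpus.'),
]
query = ('To do a "farduddle" means to jump up and down really fast. '
         'An example of a sentence that uses the word farduddle is:')

prompt = build_few_shot_prompt(demos, query)
print(prompt)
```

Adding more demonstrations (3-shot, 5-shot, and so on) is then just a matter of extending the `demos` list.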
Robustness to Format
Following findings from Min et al. (2022), here are a few key insights:
- The label space and the distribution of the input text are both highly important.
- The format used also plays a key role in performance, even if you just use random labels.
Interestingly, newer GPT models are becoming more robust to even random formats.
Prompt:
Positive This is awesome!
This is bad! Negative
Wow that movie was rad!
Positive
What a horrible show! --
Output:
Negative
There is no consistency in the format above, but the model still predicted the correct label.
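The random-label experiment from Min et al. (2022) can be sketched as follows. This is a hypothetical setup of our own: gold labels are replaced with random draws from the label space while the demonstration format is kept intact, so only format and label space (not label correctness) are preserved.

```python
import random

LABEL_SPACE = ("Positive", "Negative")

def randomize_labels(demos, label_space=LABEL_SPACE, seed=0):
    """Replace gold labels with random labels drawn from the same
    label space, keeping the (input // label) format unchanged."""
    rng = random.Random(seed)
    return [(text, rng.choice(label_space)) for text, _ in demos]

demos = [
    ("This is awesome!", "Positive"),
    ("This is bad!", "Negative"),
    ("Wow that movie was rad!", "Positive"),
]

randomized = randomize_labels(demos)
prompt = "\n".join(f"{text} // {label}" for text, label in randomized)
prompt += "\nWhat a horrible show! //"
print(prompt)
```

Per the findings above, a model conditioned on such a prompt often still predicts a sensible label, because the demonstrations mainly convey the input distribution, label space, and format.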
Limitations of Few-Shot Prompting
Standard few-shot prompting works well for many tasks but is still not a perfect technique, especially when dealing with complex reasoning tasks. Let's demonstrate why this is the case.
Prompt:
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
Output:
Yes, the odd numbers in this group add up to 107, which is an even number.
This is incorrect: 107 is odd, and the odd numbers in the group actually sum to 41. Let's try adding some examples to see if few-shot prompting improves the results.
Prompt (4-Shot):
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: The answer is False.
The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 17, 9, 10, 12, 13, 4, 2.
A: The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
Output:
The answer is True.
That didn't work: the correct answer is False, since the odd numbers sum to 41, which is odd. It seems that few-shot prompting is not enough to get reliable responses for this type of reasoning problem.
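We can check the ground-truth labels for each group with a few lines of code. This sketch simply sums the odd numbers in each group and tests whether that sum is even, confirming the demonstration labels above and the correct answer for the final query.

```python
def odd_sum_is_even(numbers):
    """True if the odd numbers in the group sum to an even number."""
    return sum(n for n in numbers if n % 2 == 1) % 2 == 0

groups = {
    "demo 1": [4, 8, 9, 15, 12, 2, 1],     # odd sum 25 -> False
    "demo 2": [17, 10, 19, 4, 8, 12, 24],  # odd sum 36 -> True
    "demo 3": [16, 11, 14, 4, 8, 13, 24],  # odd sum 24 -> True
    "demo 4": [17, 9, 10, 12, 13, 4, 2],   # odd sum 39 -> False
    "query":  [15, 32, 5, 13, 82, 7, 1],   # odd sum 41 -> False
}

for name, nums in groups.items():
    print(name, odd_sum_is_even(nums))
```

The task requires filtering, summing, and a parity check; it is exactly this kind of multi-step computation that standard few-shot prompting struggles to elicit in one pass.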
Next Level: To solve these multi-step reasoning problems, we must transition to Chain-of-Thought (CoT) prompting, which breaks the problem down into steps.