AI
Mar 2025 · 7 min read

Explore the classic LLM limitation. Learn why tokenization makes simple character counting difficult for even the most advanced AI models.

Why Models Can't Count 'r's in Strawberry 🍓

Driptanil Datta
Software Developer


If you ask an LLM, "How many 'r's are in the word strawberry?", it often confidently answers "two." But there are clearly three. Is the model stupid? No—it's just "blind" to characters. This is the Strawberry Problem, and it's a direct consequence of tokenization.

🌍 References & Disclaimer

This content is adapted from "A deep understanding of AI language model mechanisms". It has been curated and organized for educational purposes on this portfolio. No copyright infringement is intended.


1. What the Model "Sees"

When we give the word "strawberry" to GPT-4, it doesn't see a sequence of 10 letters. It sees a sequence of 3 tokens.

import tiktoken
tokenizer = tiktoken.get_encoding('cl100k_base')
 
# Encode 'strawberry'
tokens = tokenizer.encode('strawberry')
 
for t in tokens:
  print(f"ID {t:5d} -> '{tokenizer.decode([t])}'")
OUTPUT
ID   496 -> 'str'
ID   675 -> 'aw'
ID 15717 -> 'berry'

The model's "reality" is the IDs [496, 675, 15717].


2. The Counting Disconnect

Now, let's look for the letter 'r'. In GPT-4's vocabulary, the standalone letter 'r' is ID 81.

r_token = tokenizer.encode('r')
print(f"ID for 'r': {r_token}")
 
# Is 'r' inside the 'strawberry' tokens?
print(f"Is 81 in [496, 675, 15717]? {81 in tokens}")
OUTPUT
ID for 'r': [81]
Is 81 in [496, 675, 15717]? False

To the model, there are zero 'r' tokens in "strawberry." It only knows that the concept of a strawberry is composed of three specific subword chunks. It doesn't inherently know that the chunk 'berry' contains two 'r's unless it was specifically trained on character-level relationships.
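
A quick way to see how little character structure survives is to encode a few surface variants of the same word. This is my own sketch, reusing the tokenizer from above; the exact splits depend on the vocabulary, so treat them as illustrative:

# Encode surface variants of the same word
for variant in ['strawberry', ' strawberry', 'Strawberry', 'STRAWBERRY']:
    ids = tokenizer.encode(variant)
    pieces = [tokenizer.decode([t]) for t in ids]
    print(f"{variant!r:>14} -> {ids} {pieces}")

A leading space or a change in casing typically yields a completely different token sequence, even though a human sees the "same" word with the same three 'r's.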


3. How to Fix It (in code)

To get the correct answer, our script has to drop back to the character level: decode the tokens into a plain string, then count.

# Decode the tokens back to a string
strawberry_str = tokenizer.decode(tokens)
 
# Count using standard string methods
count = strawberry_str.count('r')
print(f"Actual 'r' count: {count}")
OUTPUT
Actual 'r' count: 3

💡 The Lesson

This isn't just about fruit. This limitation affects:

  • Spelling: Models struggle with spelling tasks such as counting, reversing, or rearranging letters.
  • Math: Numbers are often tokenized in chunks (e.g., 123 might be one token, while 1234 is two), leading to arithmetic errors (see the quick check after this list).
  • Code: Indentation and variable names are sensitive to how BPE merges them.
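
Here's a minimal sketch of that number claim, again reusing the cl100k_base tokenizer. The chunking is vocabulary-specific, so the exact splits you see may differ:

# How digit strings get chunked (results vary by vocabulary)
for n in ['7', '123', '1234', '123456']:
    ids = tokenizer.encode(n)
    pieces = [tokenizer.decode([t]) for t in ids]
    print(f"{n:>7} -> {len(ids)} token(s): {pieces}")

If neighbouring numbers split into different numbers of chunks, digit-by-digit arithmetic becomes much harder for the model.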

This is why "Chain of Thought" prompting helps. By asking the model to "spell the word out letter by letter first," you force it to generate character tokens, which makes the counting task trivial!


💡 Summary

LLMs operate on subwords, not characters. The "Strawberry Problem" is a perfect reminder that AI models don't perceive the world (or text) the same way humans do.


Driptanil Datta

Software Developer

Building full-stack systems, one commit at a time. This blog is a centralized learning archive for developers.

Legal Notes
Disclaimer

The content provided on this blog is for educational and informational purposes only. While I strive for accuracy, all information is provided "as is" without any warranties of completeness, reliability, or accuracy. Any action you take upon the information found on this website is strictly at your own risk.

Copyright & IP

Certain technical content, interview questions, and datasets are curated from external educational sources to provide a centralized learning resource. Respect for original authorship is maintained; no copyright infringement is intended. All trademarks, logos, and brand names are the property of their respective owners.


© 2026 Driptanil Datta. All rights reserved.