
Making LLMs Remember Using This Prompting Technique

The article discusses the limitations of Large Language Models (LLMs) in maintaining context and memory, and proposes techniques like message numbering to enhance their performance during interactions.

Large Language Models (LLMs) are very powerful, but they are inherently stateless. They don’t have a continuous memory like humans do. This leads to challenges when you need them to consistently:

  • Remember rules
  • Maintain context during long conversations
  • Perform actions at specific intervals

When you’re coding or engaged in extended conversations with Claude (or other LLMs), one thing becomes clear: they forget.

And they don’t only forget deep into lengthy coding sessions; the forgetting can occur very early, sometimes right after a simple “Hello World” and a few small modifications.

---

The Problem: Inconsistent Memory in LLMs

I really enjoy coding with Claude Sonnet because it saves a significant amount of time. However, over time it tends to break things apart, forget important details, introduce unnecessary changes, and sometimes even overcomplicate code, degrading what was once clean, working code.

Like many developers, I’ve established coding guidelines and introduced them to Claude, instructing it to keep track of these rules throughout the conversation. Initially, it follows these guidelines, but eventually, it forgets.

A simple example:
When you let Claude code something and then ask a straightforward question about that piece of code, it might not only answer your question—it may also modify the code unnecessarily.

For instance, in the example below, Claude wrote a “Hello World” in React. I then asked why it used React, and as you can see, it started coding again instead of simply answering. This might seem like a trivial example, but it highlights one of many unexpected behaviors that must be managed through explicit prompting, especially with the current version of these models.

There are dozens of such scenarios that I have discussed in previous blog posts.

Note: To avoid this behavior, you have to explicitly instruct the LLM (Claude) not to modify code unless asked. However, that instruction itself becomes unreliable because of the model's tendency to forget it as the conversation lengthens.

---

First Workaround: Periodic Rule Reminders

Think of it like making a to-do note that you look at every 5 minutes or so to remind yourself of the rules.

I tried asking Claude to remind itself of the rules every couple of messages—focusing only on the most important ones. Here’s the setup I used:

I will now start a coding session with you. 

You are instructed to make sure that you output your message every other turn to remind everyone of the rules:

Rules:
* Never output decoded code.
* Never make assumptions about the user's intent. 
* If the user asks a question, just answer, never code.
* The code always has the necessary protocols.
* Never output code that has not been authorised by the user. 

If you forget to post these rules every 4th message, you are breaking the law.

Claude began the session by reminding me of the rules, even though I specified it should do so only every fourth message.

Then I asked it not to remind me so often. As I expected, it ended up forgetting the rules entirely, except when I pointed out that it had forgotten something.

In that case, it remembered the rules and output them again.

This is clearly problematic.

---

A New Approach: Using External Triggers with Message Numbering

Maybe the model doesn't need to "remember" the message number internally. What if it simply outputs the message number as an anchor each time? Then the count lives in the conversation itself rather than in the model's memory.

Instructing it to output the message number can be done with a prompt as simple as:

Output the message number after each message without explanation.

In the example below, it did just that. It outputs the message number after every message, and on every fourth message that number acts as the trigger: it automatically "remembers" the instruction to output the rule set.

We don't need to remind it of the rules after every single message; that would also be too expensive.

Message #7

I'll continue with the landing page design...

And as you can see, every time it reaches a fourth message (here messages #8 and #12), it outputs the rules itself, reminding itself how to behave. This keeps the rules fresh in the context.

Message #8
(Reminder of our rules:
...
)

...
Message #12

(Reminder of our rules:
...
)

This small change made a huge difference. The LLM now had an external trigger—the message number—to cue the rule reminder.

After applying this technique, I returned to one of Claude’s common flaws:

modifying code when I merely ask a question (assuming I want a change).

I provided it with a set of policies along with the numbering trick to remind itself periodically:

I will now start a coding session with you.

Output the message number after each message without explanation.

Output your message every other turn to remind everyone of the rules:

Rules:
* If the user asks a question, just answer, never code.
* Make sure you include the necessary logs.
If you forget to post these rules every 4th message, you are breaking the law.

When I tried asking again, it looked much better—it simply answered and even referenced the rules when needed.

We have seen the same behaviour in other models, such as OpenAI's GPT.
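
The same setup works outside the chat interface as well. If you drive Claude through the API, you can open the session with the sliding-memory prompt as the first user message and keep the full history in every request. Below is a minimal sketch using Anthropic's Python SDK; the model name, rule text, and example messages are my own placeholders, not part of the original experiment.

```python
# Minimal sketch: start a "sliding memory" coding session over the API.
# Assumes the anthropic package is installed and ANTHROPIC_API_KEY is set;
# the model name and rules are illustrative placeholders.
import anthropic

client = anthropic.Anthropic()

SLIDING_MEMORY_PROMPT = """I will now start a coding session with you.

Output the message number after each message without explanation.

Every 4th message, remind everyone of the rules:

Rules:
* If the user asks a question, just answer, never code.
* Make sure you include the necessary logs.
"""

history = []

def send(user_message: str) -> str:
    """Append a user turn, call the model, and keep the reply in the history."""
    history.append({"role": "user", "content": user_message})
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=1024,
        messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

send(SLIDING_MEMORY_PROMPT)                      # open the session with the rules
print(send("Write a Hello World page in React."))
print(send("Why did you use React?"))            # should be answered, not re-coded
```

Because the whole history is resent on every call, the numbered messages and the periodic rule reminders stay in the context the model sees.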

---

Why This Technique Matters

Using an external trigger like message numbering does more than just enforce rule reminders. It opens up possibilities for:

  • Periodic Summarization:
    – Instead of: “Summarize the last 10 messages.” (which is unreliable)
    – Use: “Message #20 (Summarize messages #11-#20)”
  • Scheduled API Calls:
    – Instead of: “Call the weather API every hour.” (which is impossible)
    – Use: “Message #30 (Call weather API: [timestamp])”
  • Contextual Reminders:
    – Instead of: “Remember the user's preference for dark mode.” (prone to forgetting)
    – Use: “Message #45 (User preference: dark mode. Apply to all future output.)”
  • Iterative Optimization:
    – Instead of: “Improve this code.” (vague)
    – Use: “Message #15 (Refactor code from Message #10 based on feedback in Message #12)”
  • Conditional Logic:
    – Instead of: “If there is an error, output this error message.”
    – Use: “Message #16 (Check Message #15. If there’s an error, then output: Error Message)”

In essence, you can think of this as a sliding memory rather than a true long-term memory.
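
If you are calling the model programmatically, you can push the same idea one step further and keep the counter on the client side, so the trigger no longer depends on the model at all. Here is a minimal sketch of that variation; the rule text, interval, and helper names are only examples:

```python
# Minimal sketch of a client-side "sliding memory" trigger:
# the calling code keeps the message counter and splices the rules back in.
RULES = (
    "Reminder of our rules:\n"
    "* If the user asks a question, just answer, never code.\n"
    "* Make sure you include the necessary logs."
)
REMIND_EVERY = 4  # re-inject the rules every 4th message

def build_turn(counter: int, user_message: str) -> str:
    """Prefix the message number and, every REMIND_EVERY turns, the rule reminder."""
    parts = [f"Message #{counter}"]
    if counter % REMIND_EVERY == 0:
        parts.append(RULES)
    parts.append(user_message)
    return "\n\n".join(parts)

# Quick demo: only turns 4, 8, 12, ... carry the reminder.
for counter in range(1, 9):
    print(build_turn(counter, f"(user message {counter})"))
    print("---")
```

The trade-off is that this only works when you control the client; in a plain chat interface you still have to rely on the model's own numbering.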

---

Other Potential Solutions

There are several advanced approaches to give LLMs a semblance of long-term memory. Here are a few options to consider:

Agentic Memory

There are great tools like Letta (previously known as MemGPT) that learn from conversations and insert the proper context or necessary information into the chat when it makes sense. For example, if you mention your country in an earlier conversation and later ask for a good restaurant nearby, it can include that detail automatically. This way, the LLM has more context without you having to repeat the information every time.

We could also provide the LLM with a list of rules and let the memory component inject them whenever we instruct the LLM to code according to those rules.
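
To make the pattern concrete, here is a deliberately simplified sketch of what such a memory layer does. This is not Letta's actual API, just the general shape of the idea: facts learned earlier are stored outside the chat and injected back into the prompt before the model sees it.

```python
# Simplified sketch of the agentic-memory pattern (not Letta's actual API):
# facts from earlier conversations are stored outside the chat and injected
# back into the prompt so the model sees them again.
memory: list[str] = []

def remember(fact: str) -> None:
    """Store a fact extracted from an earlier conversation."""
    memory.append(fact)

def build_prompt(user_message: str) -> str:
    """Prepend remembered facts to the next prompt.
    A real memory layer would also decide which facts are relevant right now."""
    context = "\n".join(f"(memory) {fact}" for fact in memory)
    return f"{context}\n\n{user_message}" if memory else user_message

remember("The user lives in Portugal.")  # example fact; any detail works
remember("Coding rule: if the user asks a question, just answer, never code.")

print(build_prompt("Can you recommend a good restaurant nearby?"))
```

A real system like Letta also decides what is worth remembering in the first place and when to surface it, which is exactly the part that is hard to get right.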

System Prompt

This isn’t necessarily a separate tool, but rather part of the dialogue with the LLM. If you have access to the system prompt (sometimes hidden from the user in chatbots like ChatGPT), you can inject these rules there.

This ensures that the LLM is aware of the rules with every completion.
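
Over an API, this simply means passing the rules in the system parameter so they accompany every completion, no matter how long the conversation gets. A minimal sketch with Anthropic's Python SDK follows; the model name and rule text are placeholders of mine.

```python
# Minimal sketch: rules placed in the system prompt travel with every request.
# Assumes the anthropic package and an ANTHROPIC_API_KEY environment variable;
# the model name and rule text are illustrative placeholders.
import anthropic

client = anthropic.Anthropic()

CODING_RULES = (
    "You are a coding assistant.\n"
    "* If the user asks a question, just answer, never code.\n"
    "* Never output code that has not been authorised by the user."
)

response = client.messages.create(
    model="claude-3-5-sonnet-latest",   # placeholder model name
    max_tokens=1024,
    system=CODING_RULES,                # sent alongside every completion
    messages=[{"role": "user", "content": "Why did you use React here?"}],
)
print(response.content[0].text)
```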

However, it also makes every request longer than necessary, even when you are only asking a simple question, because the rules are sent along with each completion.

Customization

Some chatbots, like ChatGPT, offer customization options where you can provide specific instructions. Claude has something similar.

You can use this feature to set a list of rules. However, as the dialogue continues, the LLM might eventually forget some of them. Also note that these settings are available only in the chat interface and not through the API.

---

Wrap-Up

The sliding-memory prompt can improve the quality of a chat by ensuring that the LLM always has a fresh copy of the rules, or even the essence of a long conversation by summarising it from time to time. The possibilities are endless. However, it is important to remember that this is only a workaround that tries to overcome the memory limitations of LLMs.