Canary Markers: How to Tell When Your AI Agent's Context Is Quietly Degrading
A canary marker is a fixed phrase you make your agent repeat every turn. The moment it drops the phrase, your instructions are slipping. Here is how to set one up and why it works.
Long agent sessions fail in a sneaky way. The model does not announce that it has forgotten your rules. It just quietly stops following them. The system prompt said "always return JSON," and forty messages later you are getting prose again. This is often called context rot: as the context window fills with tool output, files, and back-and-forth, the earliest and most important instructions get diluted and the model's instruction-following degrades.
A canary marker is the simplest early-warning system for this problem, and it costs you almost nothing.
What a canary marker is
You pick a short, distinctive phrase and instruct the agent to emit it on every single turn. The classic, slightly comedic version: tell the agent to begin every reply by addressing you as "my great chief." While it keeps saying "my great chief," your instructions are still landing. The first time it drops the phrase and just answers normally, that is your signal: the early part of your prompt is no longer steering the model, and the rest of your rules are probably slipping too.
It is the same idea miners used with a canary in a coal mine. The canary is more sensitive than you are. It reacts before you would notice the danger yourself.
Why it works
Instruction-following is not all-or-nothing. As a session grows, attention spreads across more tokens and the relative weight of your opening instructions falls. The canary is the easiest instruction you gave to obey, so it is a sensitive proxy: if the model no longer keeps a one-line ritual, the more demanding rules (output format, tone, safety constraints, the file it must not touch) are very likely degrading at the same time or sooner.
The marker does not measure degradation precisely. It gives you a binary tripwire that is visible at a glance, with no extra tooling and no eval harness.
How to set one up
- Pick a low-collision phrase. Something the model would never produce by chance and that you can scan for instantly. "my great chief," a nonsense token like [CTX-OK], or a fixed emoji at the start of each reply.
- Put it in the strongest position. Place the instruction in the system prompt or the very first user message, and make it explicit: "Begin every response with the exact text [CTX-OK]. Never omit it."
- Watch for the drop. When the marker disappears, or mutates ("[CTX OK]", "CTX-OK:"), treat the session as degraded.
- Act on the signal. Summarize the important state into a fresh message, start a new session, or compact the context. Do not keep pushing a session that has already failed its canary.
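The "watch for the drop" step can be automated with a few lines of code. The following is a minimal sketch, not a definitive implementation: the function names check_canary and first_degraded_turn are hypothetical, the marker is assumed to be the [CTX-OK] prefix from the example above, and tolerating leading whitespace before the marker is a design choice you may want to tighten.

```python
import re

MARKER = "[CTX-OK]"  # assumed marker; substitute the phrase you actually chose


def _squash(text):
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def check_canary(reply, marker=MARKER):
    """Classify one reply as 'ok', 'mutated', or 'missing' (hypothetical helper)."""
    head = reply.lstrip()  # assumption: leading whitespace is tolerated
    if head.startswith(marker):
        return "ok"
    # A near miss such as "[CTX OK]" or "CTX-OK:" still counts as degraded,
    # so compare only the letters and digits near the start of the reply.
    window = head[: len(marker) + 4]
    if _squash(window).startswith(_squash(marker)):
        return "mutated"
    return "missing"


def first_degraded_turn(replies, marker=MARKER):
    """Return the 1-based turn where the canary first fails, or None."""
    for turn, reply in enumerate(replies, start=1):
        if check_canary(reply, marker) != "ok":
            return turn
    return None


if __name__ == "__main__":
    session = [
        "[CTX-OK] Here is the JSON you asked for.",
        "[CTX OK] Updated the file.",
        "Sure, here you go.",
    ]
    print(first_degraded_turn(session))
```

Anything other than "ok" is the cue to summarize state and start a fresh session rather than push on.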
Variations
- Prefix marker: a fixed opening string every turn (most common).
- Sign-off marker: a fixed closing line, useful when you care about the end of long generations.
- Structured marker: a required field like {"_canary": "ok"} in JSON output, which a script can check automatically and fail loudly on.
- Counter marker: ask the agent to increment a number each turn. Skipped or repeated numbers reveal it has lost track of the running state.
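The structured and counter variations are the easiest to check by script. Here is a rough sketch of both, assuming the agent replies with a JSON object carrying a "_canary" field and that you have already collected the per-turn counter values into a list. The helper names check_structured_canary and counter_gaps are illustrative, not part of any library.

```python
import json


def check_structured_canary(raw, field="_canary", expected="ok"):
    """Parse a JSON reply and fail loudly if the canary field is absent or altered."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Prose instead of JSON is itself a sign the format rule has slipped.
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict) or payload.get(field) != expected:
        raise ValueError(f"canary field {field!r} missing or altered")
    return payload


def counter_gaps(counters):
    """Return the 1-based turns whose counter is not the previous value plus one."""
    bad_turns = []
    for i in range(1, len(counters)):
        if counters[i] != counters[i - 1] + 1:
            bad_turns.append(i + 1)
    return bad_turns


if __name__ == "__main__":
    print(check_structured_canary('{"_canary": "ok", "result": 1}'))
    print(counter_gaps([1, 2, 4, 5]))  # a skipped number
    print(counter_gaps([1, 2, 2, 3]))  # a repeated number
```

Raising an exception rather than returning a flag is deliberate here: in an unattended agent loop, a failed canary should stop the run, not scroll past in a log.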
Where it helps, where it does not
Canary markers shine in long coding sessions, multi-step research runs, and any agent loop that accumulates a lot of tool output. They are a first line of defense, not a guarantee. Some models will dutifully keep the marker while still degrading on harder instructions, so pair the canary with the occasional real spot-check of the actual work. And a canary that is too elaborate becomes its own distraction, so keep it to one line.
If you run agents long enough to need this, you also want the rest of your context hygiene in order: good memory tooling, deliberate compaction, and servers that manage state for you. Browse the memory and knowledge tools and the wider index of AI agent tools on SkillsIndex, each scored on security, utility, and maintenance, to find the pieces that keep long sessions reliable. For how we score everything, see our methodology.
The cheapest reliability tool you have is a phrase the model has to keep saying. Give your agent a canary, and listen for the silence.