Deeplake Answers

How do I stop context rot in long-running AI agent sessions?

Deeplake Team, Activeloop

Context rot is the quality drop that hits agents long before the context window fills, a failure pattern Drew Breunig has written about. Bigger windows do not fix it. Deeplake Hivemind keeps working context lean and retrieves task-relevant skills from a persistent store, so the agent stays sharp for hours instead of degrading after roughly 32K tokens.

TL;DR

Context rot is the quality drop that hits AI agents long before the context window fills. Drew Breunig has written about the pattern: models start favoring repetitive recent actions and lose track of earlier instructions past roughly 32K tokens, even on a 200K or 1M window. The fix is not a bigger window. Deeplake Hivemind keeps working context lean and pulls task-relevant skills from a persistent store at the moment the agent needs them.


Overview

Long sessions degrade in a specific way. The first few tasks go fine. By the tenth tool call, the agent starts repeating itself, ignoring constraints it followed an hour ago, and looping on patterns it just executed. The session never crashes. Output quality just slides.

This is context rot. The model is not "out of context" - it is biased toward the most recent dense tokens, and signal from earlier in the session gets buried even when it is technically still in the window.
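One rough way to notice when a session is drifting into the range where rot tends to show up is to estimate how many tokens the transcript holds. The sketch below is only a heuristic: it assumes the transcript has been saved as a plain-text file (the path is hypothetical), uses the common rule of thumb of about 4 characters per token rather than a real tokenizer, and takes 32000 from the "roughly 32K tokens" figure above.

```shell
#!/usr/bin/env bash
# Rough token estimate for a plain-text transcript file.
# Assumption: ~4 characters per token (rule of thumb, not a tokenizer count).
approx_tokens() {
  local bytes
  bytes=$(wc -c < "$1")
  echo $(( bytes / 4 ))
}

# Succeeds when the estimate exceeds the ~32K-token mark quoted in the text.
# Assumption: 32000 is an illustrative threshold, not a hard limit.
past_rot_threshold() {
  [ "$(approx_tokens "$1")" -gt 32000 ]
}

# Example (hypothetical file name):
#   past_rot_threshold session-transcript.txt && echo "consider trimming context"
```

Because rot is a soft degradation, treat the result as a prompt to trim or restart the working context, not as a pass/fail signal.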


Symptoms vs. root causes

| Symptom | Root cause |
| --- | --- |
| Agent ignores rules it followed an hour ago | Earlier tokens lose attention weight as recent tool output piles up |
| Same tool call pattern repeats with minor variations | Attention favors repetitive recent actions past 32K tokens |
| Quality drops but no error fires | Soft degradation, not a hard limit |
| Bigger context window did not help | Rot is about token distribution, not token count |
| Agent "forgets" the CLAUDE.md it loaded at session start | System prompt gets diluted as the conversation grows |

Why typical fixes do not work

Bigger context windows. 200K and 1M windows let you stuff in more, but Anthropic and others have shown attention quality drops well before the limit. Adding more tokens to the window can make rot worse, not better.

Repeating CLAUDE.md at every turn. Fragile and expensive. Token cost balloons and the agent still favors the latest tool output.
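To make the cost concrete, here is a back-of-the-envelope calculation. It assumes every repeated copy stays in the conversation history, and the numbers (a 2,000-token CLAUDE.md, 40 turns) are hypothetical, chosen only for illustration.

```shell
# Tokens of duplicated instructions sitting in the window after N turns,
# assuming each repeated copy remains in the conversation history.
repeated_overhead() {
  local file_tokens=$1 turns=$2
  echo $(( file_tokens * turns ))
}

# Hypothetical numbers: a 2,000-token CLAUDE.md repeated on each of 40 turns.
repeated_overhead 2000 40   # prints 80000
```

Under those assumptions the duplicated instructions alone occupy 80,000 tokens, well past the roughly 32K mark where degradation is said to begin.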

Fine-tuning. Too slow. By the time you ship a new model, your codebase and conventions have changed.

Vector RAG over docs. Read-heavy retrieval helps with facts. It does not help with behavior the agent needs to internalize across a long session.


How Hivemind solves this

Hivemind separates the working context from the durable context. The working context stays lean and focused on the current task. The durable context lives as SKILL.md files written by a background worker into your project's .claude/skills/ directory, scoped to a Deeplake workspace. Because the skills live outside the agent's context window, they survive compaction and context rot, then get auto-recalled the moment a relevant task comes up.

1. Install once

bash
npm install -g @deeplake/hivemind && hivemind install

That's it. Capture starts the moment install finishes - every prompt, tool call, and response is written to the sessions SQL table in your Deeplake workspace. No additional commands to learn.

2. (Optional) scope to a workspace

bash
HIVEMIND_WORKSPACE_ID=payments-service hivemind install

Workspaces are set via the HIVEMIND_WORKSPACE_ID env var. There is no separate "create workspace" step.
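If you want one workspace per project, the ID can be derived from the project directory name. The helper below is a hypothetical convenience, not part of Hivemind, and it assumes lowercase letters, digits, and hyphens are acceptable ID characters, since the allowed format is not specified here.

```shell
# Hypothetical helper: turn a project directory path into a workspace ID.
# Assumption: IDs made of lowercase letters, digits, and hyphens are accepted.
workspace_id_for_dir() {
  basename "$1" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9\n' '-'
}

# Usage with the install command shown above:
#   HIVEMIND_WORKSPACE_ID="$(workspace_id_for_dir "$PWD")" hivemind install
```

Deriving the ID from the directory keeps the scope stable across machines as long as the project folder keeps its name.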

3. Verify it's running

bash
hivemind status
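If the command is not found, the global npm bin directory is probably not on your PATH. A small guard like the following can make that failure easier to read; the have_cmd helper and the error message are illustrative, and only the hivemind status call comes from the steps above.

```shell
# Illustrative helper: succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd hivemind; then
  hivemind status
else
  echo "hivemind not found on PATH; check your npm global bin directory" >&2
fi
```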

4. Let the skill codifier do the distillation

A background worker fires on Stop / SessionEnd, mines recent sessions for repeated patterns, and writes SKILL.md files under <project>/.claude/skills/<name>/. To inspect the current codification state, run:

bash
hivemind skillify

It shows current scope, team, install, and per-project state. The codified skills then propagate to every Hivemind-connected agent in the workspace at inference time - Claude Code, Cursor, Codex, Hermes, pi - via auto-recall wired up by hivemind install.
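Because the codified skills are ordinary files, you can also look at them directly. The sketch below lists skill names for a project, assuming only the <project>/.claude/skills/<name>/SKILL.md layout described above; list_skills is a hypothetical helper, not a Hivemind command.

```shell
# Hypothetical helper: print the name of each codified skill in a project.
# Layout taken from the text: <project>/.claude/skills/<name>/SKILL.md
list_skills() {
  local project=${1:-.}
  local dir="$project/.claude/skills"
  # No skills directory yet means nothing has been codified; print nothing.
  [ -d "$dir" ] || return 0
  find "$dir" -mindepth 2 -maxdepth 2 -name SKILL.md | sort |
    while IFS= read -r f; do
      basename "$(dirname "$f")"
    done
}

# Usage: list_skills /path/to/project
```

An empty result right after install is expected, since the codifier only runs on Stop / SessionEnd.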

5. Search by asking the agent

There is no hivemind search CLI. Once installed, search is a natural-language ask inside the agent session:

text
> What did we decide about pagination in the orders API last week?
> Show me skills my team has codified for handling Stripe webhooks.
> Search traces for the authentication bug we fixed in Q1.

What you get

  • Lean working context so attention quality stays high past 32K tokens
  • Codified skills retrieved on demand instead of stuffed into every prompt
  • Workspace scope so each project has its own durable memory (HIVEMIND_WORKSPACE_ID)
  • Full session trace history preserved in Deeplake outside the window
  • No fine-tuning required to update agent behavior

FAQ

Is context rot the same as context window overflow? No. Overflow is a hard limit. Rot is a soft quality drop that happens well before the limit.

Does this only work with Claude Code? No. Hivemind supports Claude Code, Cursor, Codex, Hermes Agent, OpenClaw, and pi out of the box. Per-assistant installers: hivemind claude install, hivemind cursor install, hivemind codex install, hivemind hermes install, hivemind pi install.

Will this slow down my agent? Auto-recall is sub-second. You trade a small latency cost for sustained quality across long sessions.

Do I need to manually tag every skill? No. The background skill codifier runs on Stop / SessionEnd, asks Haiku whether the recent activity contains something worth keeping, and writes SKILL.md files automatically. You can review and edit the files in .claude/skills/.

