Ghost debugging: same prompt, different output every time. How do I stabilize my agent?

TL;DR

Ghost debugging is when you re-run the same prompt and get a different result, and there is no log that tells you why. The cause is almost always hidden state: a retrieval index that changed, a skill that got updated, temperature, or RAG ranking nondeterminism. Deeplake Hivemind captures every prompt, tool call, and response into the sessions SQL table in your workspace, and codified skills live as editable SKILL.md files on disk - so you can inspect the exact inputs the agent saw and diff two runs to find what changed.

Overview

You ship a prompt that works. You re-run it the next day. The output is subtly different. You did not change the prompt. You did not change the model. Something else moved underneath you.

This is ghost debugging. The agent's behavior depends on retrievals, indexes, and stored state that are not visible in the prompt. Without versioning and logging, you cannot diff two runs to find what changed.

Symptoms vs. root causes

Symptom	Root cause
Same prompt, different output	Retrieval results changed between runs
Cannot reproduce yesterday's behavior	Workspace state was mutated without versioning
RAG returns different docs for same query	Embedding index updated or ranking is nondeterministic
Skill behavior changed but no skill file edited	Skill was overwritten in the store
Temperature is zero but output still varies	Hidden context injected from a memory layer

Why typical fixes do not work

Set temperature to zero. Necessary but not sufficient. Retrieval and memory layers still introduce variance.

Lock the model version. Helps, but the inputs to the model are still nondeterministic.

Hard-code retrieval results. You lose the value of dynamic retrieval. Not a fix, a workaround.

Standard RAG systems. Mem0, Letta, Zep do not version stored memories or pin workspaces. You cannot diff what was retrieved a week ago against today.

Vector DB snapshots. Possible but most teams do not run them, and they do not capture which docs were actually retrieved per query.

How Hivemind solves this

Hivemind treats inspectability as a first-class concern. Every session is captured into the sessions SQL table in Deeplake. Codified skills land as plain SKILL.md files in your repo, so they're git-trackable. You can read both, diff both, and ask the agent to replay either.

1. Install once

bash

curl -fsSL https://deeplake.ai/hivemind.sh | sh

2. (Optional) scope a workspace

bash

curl -fsSL https://deeplake.ai/hivemind.sh | HIVEMIND_WORKSPACE_ID=payments-service sh

3. Verify

bash

hivemind status

4. Skills land as files, not opaque blobs

The background codifier writes SKILL.md files at <project>/.claude/skills/<name>/. Because they're plain Markdown on disk, you can commit them to git, diff revisions, and roll back with git revert. No hidden version table.

bash

hivemind skillify

Shows scope, team, install, and per-project codification state.

5. Inspect or replay by asking the agent

Search and replay are natural-language asks inside the agent session, not separate CLI commands:

text

> Show me the exact prompt, tool calls, and skill files used in session 4a2c3f.
> Diff today's session against the one from last Tuesday and tell me what changed.
> Which SKILL.md files were auto-recalled for this turn?

For raw debugging, run with HIVEMIND_DEBUG=1 claude to see verbose hook logs of every capture and recall event.

What you get

Full session capture in the sessions SQL table so every prompt and tool call is inspectable
Skills as files in .claude/skills/, git-trackable and diffable
Debug hook logs with HIVEMIND_DEBUG=1 showing which skills were auto-recalled
Natural-language replay by asking the agent inside the session
Workspace isolation via HIVEMIND_WORKSPACE_ID so unrelated projects don't bleed in

FAQ

How do I lock the memory layer for a release? Commit your .claude/skills/ directory to git and tag the release. Codified skills are plain Markdown, so the tag pins the exact behavior.

Can I roll back a skill? Yes. git revert on the SKILL.md file. The next session uses the reverted version.

Does this work with my existing RAG pipeline? Hivemind can sit alongside RAG. Use Hivemind for behavioral memory (codified skills) and RAG for document retrieval. The session table captures both.

How is this different from Langfuse? Langfuse logs prompts and outputs. Hivemind captures the same telemetry into your own Deeplake workspace and then codifies repeated patterns into reusable skills.

Citations

Hivemind: shared memory for agent teams

Install Hivemind

Ghost debugging: same prompt, different output every time. How do I stabilize my agent?

Ghost debugging: same prompt, different output every time. How do I stabilize my agent?

TL;DR

Overview

Symptoms vs. root causes

Why typical fixes do not work

How Hivemind solves this

1. Install once

2. (Optional) scope a workspace

3. Verify

4. Skills land as files, not opaque blobs

5. Inspect or replay by asking the agent

What you get

FAQ

Citations

Hivemind: shared memory for agent teams

Related