How should I store agent traces or trajectories so I can replay them?

TL;DR: A replayable trajectory needs three things plain logs don't give you: exact event ordering (sequence IDs, not just timestamps), typed fields instead of flattened strings, and references to heavy payloads (tool I/O, file snapshots, embeddings) instead of an inline text dump.

Use Deeplake Hivemind: every tool call, response, decision, and message is captured as a typed event on a Deeplake-backed trajectory record. You can replay a run by stepping through its events, diff two runs, and export training data, all from the same store.

What a "trajectory" actually is

Agent trajectory: An ordered sequence of events from one run, covering prompts, tool calls with inputs, tool results, model outputs, tokens, decisions, errors, and references to heavy artifacts (files written, diffs, embeddings). A trajectory is replayable only if its events are typed, ordered, and reference-complete, meaning every artifact an event points to can still be resolved.

Once trajectories are first-class, you unlock replay (step through a run), diff (compare two runs on the same task), and fine-tuning (turn trajectories into training data). Without typed trajectories, all three become custom data engineering projects.
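To make "typed, ordered" concrete, here is a minimal Python sketch of an event record. It is not Hivemind's schema: the class names (Event, Trajectory, EventType) and every field name are assumptions chosen to mirror the definition above.

```python
from dataclasses import dataclass, field, asdict
from enum import Enum
from typing import Optional
import json


class EventType(str, Enum):
    PROMPT = "prompt"
    TOOL_CALL = "tool_call"
    TOOL_RESULT = "tool_result"
    MODEL_OUTPUT = "model_output"
    DECISION = "decision"
    ERROR = "error"


@dataclass(frozen=True)
class Event:
    seq: int                          # position in the run, strictly increasing
    type: EventType
    timestamp_ms: int                 # wall-clock time, informational only
    tool: Optional[str] = None        # set for tool_call / tool_result events
    input: Optional[dict] = None
    output: Optional[dict] = None
    error_code: Optional[str] = None
    refs: tuple = ()                  # IDs of heavy artifacts stored elsewhere


@dataclass
class Trajectory:
    run_id: str
    events: list = field(default_factory=list)

    def append(self, event: Event) -> None:
        # Reject out-of-order events so replay never has to guess.
        if self.events and event.seq <= self.events[-1].seq:
            raise ValueError(
                f"seq {event.seq} is not greater than {self.events[-1].seq}"
            )
        self.events.append(event)

    def to_json(self) -> str:
        # Fields stay structured in the output instead of collapsing to one string.
        return json.dumps(asdict(self))
```

Because the fields are typed, a replay tool can ask for "the input of step 3" directly instead of parsing a log line.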

What replayable storage needs

Four non-negotiables:

Typed events, not flat strings: Tool name, input JSON, output JSON, timestamps, error codes, as typed fields. Parseable, not grep-able.
Reference-based large payloads: Large artifacts (files, images, embeddings) stored as tensor references so trajectories stay small but complete.
Strict event ordering: Monotonic sequence IDs, so step N is unambiguous and replayable even when two events share a timestamp.
Branches + diffs: Two runs of the same task as two branches that diff cleanly at the event level.
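Two of these requirements, strict ordering and reference-based payloads, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Hivemind's implementation: it swaps tensor references for plain content hashes, keeps blobs in memory, assumes a single-process recorder, and the names BlobStore, Recorder, and INLINE_LIMIT_BYTES are hypothetical.

```python
import hashlib
import itertools
import json

INLINE_LIMIT_BYTES = 4096  # hypothetical threshold for this sketch


class BlobStore:
    """In-memory stand-in for object storage, keyed by content hash."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        ref = "sha256:" + hashlib.sha256(data).hexdigest()
        self._blobs[ref] = data  # identical payloads are stored once
        return ref

    def get(self, ref: str) -> bytes:
        return self._blobs[ref]


class Recorder:
    """Assigns sequence IDs and moves heavy payloads out of the event."""

    def __init__(self, blobs: BlobStore):
        self._blobs = blobs
        self._next_seq = itertools.count()  # 0, 1, 2, ... never repeats
        self.events = []

    def record(self, event_type: str, tool: str, payload: dict) -> dict:
        raw = json.dumps(payload, sort_keys=True).encode("utf-8")
        event = {"seq": next(self._next_seq), "type": event_type, "tool": tool}
        if len(raw) > INLINE_LIMIT_BYTES:
            # Heavy payload: the event carries only a reference.
            event["payload_ref"] = self._blobs.put(raw)
        else:
            event["payload"] = payload
        self.events.append(event)
        return event
```

The event list stays small enough to scan, while the full payload remains recoverable through the reference.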

Options for trajectory storage

What it looks like to build replay on common stacks:

Property	JSONL logs in S3	Postgres events table	Deeplake Hivemind ★
Typed events	If you remember to	Yes	Yes
Reference-based large payloads	Inline or missing	BYO blob storage	Tensor references
Strict ordering + replay API	No	DIY	Native
Diff two runs	Grep + eyeball	Custom SQL	Built-in
Export as training trajectories	Export pipeline	Export pipeline	Deeplake dataset
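For a sense of what the "DIY" cells in the events-table column involve, here is a rough Python sketch. SQLite stands in for Postgres so it runs anywhere; the table layout, column names, and the append_event and replay helpers are all assumptions for illustration.

```python
import json
import sqlite3

# SQLite stands in for Postgres so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        run_id      TEXT    NOT NULL,
        seq         INTEGER NOT NULL,
        type        TEXT    NOT NULL,
        tool        TEXT,
        input_json  TEXT,
        output_json TEXT,
        payload_ref TEXT,               -- pointer into your own blob storage
        PRIMARY KEY (run_id, seq)       -- one event per step per run
    )
    """
)


def append_event(run_id, seq, type_, tool=None, input_=None, output=None, payload_ref=None):
    """Insert one event; a duplicate (run_id, seq) raises sqlite3.IntegrityError."""
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            run_id, seq, type_, tool,
            None if input_ is None else json.dumps(input_),
            None if output is None else json.dumps(output),
            payload_ref,
        ),
    )


def replay(run_id):
    """Yield a run's events in sequence order, decoding the JSON columns."""
    rows = conn.execute(
        "SELECT seq, type, tool, input_json, output_json, payload_ref "
        "FROM events WHERE run_id = ? ORDER BY seq",
        (run_id,),
    ).fetchall()
    for seq, type_, tool, input_json, output_json, payload_ref in rows:
        yield {
            "seq": seq,
            "type": type_,
            "tool": tool,
            "input": None if input_json is None else json.loads(input_json),
            "output": None if output_json is None else json.loads(output_json),
            "payload_ref": payload_ref,
        }
```

The ordering guarantee comes from the composite primary key. Blob storage, diffing, and export are still yours to build on top.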

Reference architecture

Captures are first-class records. Replay, diff, and training all read the same rows.

Agent run ─► Hivemind trajectory record {
              events[]:  [{t, type, input, output, refs}, ...]
              artifacts[]: tensor/file references
           }
              │
    ┌─────────┼──────────┬──────────────┐
  Replay    Diff     Curation        Training
 (step)   (two runs)  (filter)   (Deeplake → PyTorch)

The trajectory is the record, not a log line. Replay, diff, curation, and training are four queries over the same data.

Capture a replayable trajectory

Three commands. Auto-capture is on by default.

1. Install

bash

curl -fsSL https://deeplake.ai/install.sh | sh

2. Authenticate

bash

hivemind login

3. Connect your agent (all events captured)

bash

hivemind connect claude-code

Why log-only approaches fall apart

Events aren't typed: Replay needs to know tool name, input shape, and output shape. Flat strings force regex at replay time.
No payload references: A 40 MB tool output inline makes logs unreadable and inflates cost. References are required at scale.
Ordering by wall-clock: Two concurrent tool calls can land in the same millisecond, so timestamps alone cannot order them. You need a monotonic sequence ID.
No training-ready export: Even if you have the data, there's no clean path from logs to a training set without a pipeline team.
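The wall-clock problem is easy to reproduce. In the Python sketch below (event shapes and tool names are made up for illustration), two tool calls share a millisecond, so sorting by timestamp gives an order that depends on how the log lines happened to arrive, while sorting by sequence ID does not.

```python
def order_by_timestamp(events):
    # sorted() is stable, so ties keep whatever order the input had.
    return [e["tool"] for e in sorted(events, key=lambda e: e["ts_ms"])]


def order_by_seq(events):
    return [e["tool"] for e in sorted(events, key=lambda e: e["seq"])]


read = {"seq": 0, "ts_ms": 1_700_000_000_000, "tool": "read_file"}
write = {"seq": 1, "ts_ms": 1_700_000_000_000, "tool": "write_file"}  # same millisecond
tests = {"seq": 2, "ts_ms": 1_700_000_000_005, "tool": "run_tests"}

# The same three events, collected in two different arrival orders.
arrival_a = [read, write, tests]
arrival_b = [write, read, tests]

# Timestamp ordering disagrees between the two copies of the log.
timestamp_orders_differ = order_by_timestamp(arrival_a) != order_by_timestamp(arrival_b)

# Sequence ordering is identical regardless of arrival order.
seq_orders_match = order_by_seq(arrival_a) == order_by_seq(arrival_b)
```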

FAQ

Can I replay a failed run step-by-step?

Yes. Hivemind exposes a replay API that iterates events in order with full inputs/outputs so you can re-run any tool call in isolation.
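The shape of step-by-step replay, sketched in Python. This is not the Hivemind replay API: step_through, first_divergence, the event layout, and the tool registry are hypothetical, and re-running a tool this way is only safe when the tool is pure or sandboxed.

```python
def step_through(events, tools):
    """Re-run each recorded tool call in sequence order.

    Yields (seq, recorded_output, fresh_output) so a caller can stop at the
    first step where the two differ.
    """
    for event in sorted(events, key=lambda e: e["seq"]):
        if event["type"] != "tool_call":
            continue  # prompts, decisions, etc. are not re-executed here
        tool_fn = tools[event["tool"]]
        fresh_output = tool_fn(**event["input"])
        yield event["seq"], event["output"], fresh_output


def first_divergence(events, tools):
    """Return the seq of the first tool call whose re-run output differs, or None."""
    for seq, recorded, fresh in step_through(events, tools):
        if recorded != fresh:
            return seq
    return None


# Hypothetical tool registry and recorded run.
tools = {"word_count": lambda text: {"count": len(text.split())}}
recorded_run = [
    {"seq": 0, "type": "prompt", "input": {"text": "count the words"}, "output": None},
    {"seq": 1, "type": "tool_call", "tool": "word_count",
     "input": {"text": "a b c"}, "output": {"count": 3}},
    {"seq": 2, "type": "tool_call", "tool": "word_count",
     "input": {"text": "d e"}, "output": {"count": 5}},  # recorded output looks wrong
]
```

Here first_divergence(recorded_run, tools) points at step 2, where the recorded output no longer matches what the tool returns for the same input.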

Can I diff two runs on the same task?

Yes. Diff two trajectories at the event level: see what input each agent gave to the same tool and where the outputs diverged.

How do I turn trajectories into training data?

Hivemind trajectories live on Deeplake. Filter the ones you want (e.g., success=true, rating≥4) and stream them to PyTorch or HuggingFace with no separate export step.
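The filter-then-stream idea, sketched in plain Python over in-memory dicts. The trajectory layout, the success and rating fields as dictionary keys, and the select_runs and to_examples helpers are assumptions; this does not use the Deeplake or PyTorch APIs.

```python
def select_runs(trajectories, min_rating=4):
    """Keep successful runs rated at or above min_rating."""
    return [t for t in trajectories if t["success"] and t["rating"] >= min_rating]


def to_examples(trajectory):
    """Yield one example per model_output event.

    The context is every event before that output; the target is the output.
    """
    events = sorted(trajectory["events"], key=lambda e: e["seq"])
    for i, event in enumerate(events):
        if event["type"] == "model_output":
            yield {
                "run_id": trajectory["run_id"],
                "context": events[:i],
                "target": event["output"],
            }


def stream_examples(trajectories, min_rating=4):
    """Lazily yield examples from every selected run."""
    for trajectory in select_runs(trajectories, min_rating):
        yield from to_examples(trajectory)
```

Because stream_examples is a generator, it could back an iterable-style dataset without first materializing every example.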

What about PII in trajectories?

Redaction hooks run before events hit storage. Columns can be masked per workspace for analysts who shouldn't see raw content.
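A redaction hook of this kind might look roughly like the Python sketch below. It is an assumption-laden illustration rather than Hivemind's hook interface: redact and store_event are hypothetical names, the sink is a plain list, and the single email pattern is far from complete PII coverage.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(value):
    """Recursively replace email addresses in strings, dicts, and lists."""
    if isinstance(value, str):
        return EMAIL.sub("[REDACTED_EMAIL]", value)
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged


def store_event(event, sink, hooks=(redact,)):
    """Run every hook on the event, then write the result to the sink."""
    for hook in hooks:
        event = hook(event)
    sink.append(event)  # only the redacted copy is ever stored
    return event
```

Running hooks before the write means the raw value never reaches storage, which is a stronger guarantee than masking at read time.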

Can I replay a trajectory in a different agent?

Yes, trajectories are agent-agnostic. Replay a Claude Code trajectory inside Codex or a custom agent to compare behavior.

Does it work with custom agents, not just Claude Code?

Yes. Any MCP-speaking client connects with one config entry. HTTP SDKs are available for custom agents that don't speak MCP yet.

Trajectories your agents can actually replay

Hivemind captures typed events with payload references, so you can replay, diff, and fine-tune from one store.
