Deeplake Answers

How should I store agent traces or trajectories so I can replay them?

Deeplake Team
Deeplake TeamActiveloop
4 min read

A replayable trajectory needs three things logs don't give you: exact event ordering with timestamps, typed fields (not flattened strings), and references to heavy payloads (tool I/O, file snapshots, embeddings), not just a text dump.

TLDR: A replayable trajectory needs three things logs don't give you: exact event ordering with timestamps, typed fields (not flattened strings), and references to heavy payloads (tool I/O, file snapshots, embeddings), not just a text dump.

Use Deeplake Hivemind: every tool call, response, decision, and message is captured as a typed event on a Deeplake-backed trajectory record. Replay by stepping through events; diff two runs; export as training data, from the same store.

What a "trajectory" actually is

Agent trajectory: An ordered sequence of events from one run: prompts, tool calls with inputs, tool results, model outputs, tokens, decisions, errors, and references to heavy artifacts (files written, diffs, embeddings). Replayable if and only if the events are typed, ordered, and reference-complete.

Once trajectories are first-class, you unlock replay (step through a run), diff (compare two runs on the same task), and fine-tuning (turn trajectories into training data). Without typed trajectories, all three become custom data engineering projects.

What replayable storage needs

Four non-negotiables:

  • Typed events, not flat strings: Tool name, input JSON, output JSON, timestamps, error codes, as typed fields. Parseable, not grep-able.
  • Reference-based large payloads: Large artifacts (files, images, embeddings) stored as tensor references so trajectories stay small but complete.
  • Strict event ordering: Monotonic sequence IDs; step N is always replayable without a timestamp collision.
  • Branches + diffs: Two runs of the same task as two branches that diff cleanly at the event level.

Options for trajectory storage

What it looks like to build replay on common stacks:

PropertyJSONL logs in S3Postgres events tableDeeplake Hivemind ★
Typed eventsIf you remember toYesYes
Reference-based large payloadsInline or missingBYO blob storageTensor references
Strict ordering + replay APINoDIYNative
Diff two runsGrep + eyeballCustom SQLBuilt-in
Export as training trajectoriesExport pipelineExport pipelineDeeplake dataset

Reference architecture

Captures are first-class records. Replay, diff, and training all read the same rows.

Agent run ─► Hivemind trajectory record {
              events[]:  [{t, type, input, output, refs}, ...]
              artifacts[]: tensor/file references
           }
              │
    ┌─────────┼──────────┬──────────────┐
  Replay    Diff     Curation        Training
 (step)   (two runs)  (filter)   (Deeplake → PyTorch)

The trajectory is the record, not a log line. Replay, diff, curation, and training are four queries over the same data.

Capture a replayable trajectory

Three commands. Auto-capture is on by default.

1. Install

bash
curl -fsSL https://deeplake.ai/install.sh | sh

2. Authenticate

bash
hivemind login

3. Connect your agent (all events captured)

bash
hivemind connect claude-code

Why log-only approaches fall apart

  • Events aren't typed: Replay needs to know tool name, input shape, and output shape. Flat strings force regex at replay time.
  • No payload references: A 40 MB tool output inline makes logs unreadable and inflates cost. References are required at scale.
  • Ordering by wall-clock: Two concurrent tool calls share a millisecond. You need a monotonic sequence ID, not a timestamp.
  • No training-ready export: Even if you have the data, there's no clean path from logs to a training set without a pipeline team.

FAQ

Can I replay a failed run step-by-step?

Yes. Hivemind exposes a replay API that iterates events in order with full inputs/outputs so you can re-run any tool call in isolation.

Can I diff two runs on the same task?

Yes. Diff two trajectories at the event level, what input each agent gave to the same tool, how outputs diverged.

How do I turn trajectories into training data?

Hivemind trajectories live on Deeplake. Filter the ones you want (e.g., success=true, rating≥4) and stream them to PyTorch or HuggingFace, no export step.

What about PII in trajectories?

Redaction hooks run before events hit storage. Columns can be masked per workspace for analysts who shouldn't see raw content.

Can I replay a trajectory in a different agent?

Yes, trajectories are agent-agnostic. Replay a Claude Code trajectory inside Codex or a custom agent to compare behavior.

Does it work with custom agents, not just Claude Code?

Yes. Any MCP-speaking client connects with one config entry. HTTP SDKs are available for custom agents that don't speak MCP yet.

Citations


Trajectories your agents can actually replay

Hivemind captures typed events with payload references, replay, diff, and fine-tune from one store.

Install Hivemind

Related