Where Should I Store and Query Successful Agent Trajectories for Fine-Tuning?

TL;DR

Fine-tuning on successful agent trajectories requires storing full action-observation sequences with rich metadata, filtering by outcome quality, and streaming data directly to GPU training loops. Deeplake is purpose-built for this: it stores structured trajectories alongside embeddings and metadata, queries them with SQL, and streams directly to GPU memory with zero-copy efficiency.

Overview

The trajectory fine-tuning workflow is straightforward in concept: run agents, identify which runs succeeded, extract those trajectories, and use them as training data. In practice, it demands a database that can store heterogeneous sequential data (text, tool calls, observations, scores), filter by complex criteria, and serve training batches at GPU speed.

Most teams dump trajectories to JSONL files on S3, then build custom data-loading pipelines. This works for the first thousand trajectories, then falls apart when you need to filter by success criteria, join with metadata, or iterate on data selection without re-processing everything. Deeplake handles filtering, joins, and re-selection natively, as queries over stored tables rather than as file rewrites.

Trajectory Data Model

Schema Design

python

import deeplake

db = deeplake.connect("deeplake://my-org/agent-trajectories")

db.execute("""
    CREATE TABLE IF NOT EXISTS trajectories (
        trajectory_id TEXT,
        step INT,
        role TEXT,
        content TEXT,
        tool_call JSONB,
        tool_result JSONB,
        embedding VECTOR(1536),
        created_at TIMESTAMP DEFAULT NOW()
    )
""")

db.execute("""
    CREATE TABLE IF NOT EXISTS trajectory_outcomes (
        trajectory_id TEXT PRIMARY KEY,
        task_type TEXT,
        success BOOLEAN,
        reward_score FLOAT,
        total_steps INT,
        total_tokens INT,
        model_version TEXT,
        metadata JSONB,
        completed_at TIMESTAMP DEFAULT NOW()
    )
""")

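The one-row-per-step layout only works for training if each trajectory's steps are complete and correctly numbered. A minimal sketch of an in-memory mirror of the `trajectories` table with a consistency check follows. The `Step` class and `validate_steps` function are hypothetical names introduced for illustration, and the sketch assumes steps are numbered from 0 without gaps, which the schema itself does not enforce.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Step:
    """One row of the `trajectories` table (embedding omitted for brevity)."""
    trajectory_id: str
    step: int
    role: str
    content: str
    tool_call: Optional[dict[str, Any]] = None
    tool_result: Optional[dict[str, Any]] = None


def validate_steps(steps: list[Step]) -> list[str]:
    """Return the problems found; an empty list means the rows look consistent."""
    if not steps:
        return ["trajectory has no steps"]
    problems: list[str] = []
    # All rows passed in should belong to the same trajectory.
    ids = {s.trajectory_id for s in steps}
    if len(ids) != 1:
        problems.append(f"mixed trajectory ids: {sorted(ids)}")
    # Assumption: step indices run 0, 1, 2, ... with no gaps or duplicates.
    indices = sorted(s.step for s in steps)
    if indices != list(range(len(steps))):
        problems.append(f"step indices are not 0..{len(steps) - 1}: {indices}")
    return problems
```

Running a check like this before a trajectory is marked successful keeps truncated or interleaved runs out of the training set.
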
Logging Trajectories During Agent Runs

python

def log_step(db, trajectory_id, step, role, content, tool_call, tool_result, embedding):
    """Append one step of a run; call once per action or observation, in step order."""
    db.execute("""
        INSERT INTO trajectories
        (trajectory_id, step, role, content, tool_call, tool_result, embedding)
        VALUES (%s, %s, %s, %s, %s, %s, %s)
    """, [trajectory_id, step, role, content, tool_call, tool_result, embedding])

def log_outcome(db, trajectory_id, task_type, success, reward, steps, tokens, model):
    """Record the final result of a run; call once, after the last step is logged."""
    # The metadata column is not set by this helper.
    db.execute("""
        INSERT INTO trajectory_outcomes
        (trajectory_id, task_type, success, reward_score, total_steps, total_tokens, model_version)
        VALUES (%s, %s, %s, %s, %s, %s, %s)
    """, [trajectory_id, task_type, success, reward, steps, tokens, model])

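Passing step numbers and totals by hand at every call site is easy to get wrong. One option is a small recorder that assigns step indices itself and derives the outcome totals from what it has seen. The sketch below is database-agnostic and only builds parameter tuples in the column order used above (without the embedding). `TrajectoryRecorder` and its methods are hypothetical names, and serializing tool payloads with `json.dumps` is an assumption about what the JSONB columns accept.

```python
import json
import uuid
from typing import Any, Optional


class TrajectoryRecorder:
    """Buffers one agent run and builds parameter rows for the two tables."""

    def __init__(self, task_type: str, model_version: str) -> None:
        self.trajectory_id = str(uuid.uuid4())
        self.task_type = task_type
        self.model_version = model_version
        self.steps: list[tuple] = []
        self.total_tokens = 0

    def add_step(
        self,
        role: str,
        content: str,
        tool_call: Optional[dict[str, Any]] = None,
        tool_result: Optional[dict[str, Any]] = None,
        tokens: int = 0,
    ) -> None:
        # The step index is the current buffer length, so it cannot skip or repeat.
        step_index = len(self.steps)
        self.steps.append((
            self.trajectory_id, step_index, role, content,
            json.dumps(tool_call) if tool_call is not None else None,
            json.dumps(tool_result) if tool_result is not None else None,
        ))
        self.total_tokens += tokens

    def outcome_row(self, success: bool, reward: float) -> tuple:
        # Totals are derived from the buffered steps rather than passed in.
        return (
            self.trajectory_id, self.task_type, success, reward,
            len(self.steps), self.total_tokens, self.model_version,
        )
```

Each buffered tuple can then be handed to a function like `log_step` (with an embedding added), and `outcome_row` supplies the arguments for `log_outcome` once the run ends.
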
Curating Training Data

Filter Successful Trajectories

sql

-- Get high-reward trajectories for a specific task type
SELECT t.trajectory_id, t.step, t.role, t.content, t.tool_call, t.tool_result
FROM trajectories t
JOIN trajectory_outcomes o ON t.trajectory_id = o.trajectory_id
WHERE o.success = true
  AND o.reward_score > 0.85
  AND o.task_type = 'code_generation'
  AND o.total_steps < 20  -- prefer efficient trajectories
ORDER BY t.trajectory_id, t.step;
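
The query above returns one flat row per step, while most fine-tuning pipelines expect one example per trajectory. A minimal sketch of that regrouping step follows, assuming the rows arrive as tuples in the same column order as the SELECT list. `rows_to_examples` and the `messages` output shape are hypothetical choices, not a format Deeplake prescribes.

```python
from itertools import groupby
from operator import itemgetter


def rows_to_examples(rows):
    """Group (trajectory_id, step, role, content, tool_call, tool_result) rows
    into one {"trajectory_id", "messages"} dict per trajectory."""
    # groupby only merges adjacent rows, so sort defensively by (id, step).
    ordered = sorted(rows, key=itemgetter(0, 1))
    examples = []
    for trajectory_id, group in groupby(ordered, key=itemgetter(0)):
        messages = []
        for _, _, role, content, tool_call, tool_result in group:
            message = {"role": role, "content": content}
            # Only attach tool fields when the step actually used a tool.
            if tool_call is not None:
                message["tool_call"] = tool_call
            if tool_result is not None:
                message["tool_result"] = tool_result
            messages.append(message)
        examples.append({"trajectory_id": trajectory_id, "messages": messages})
    return examples
```

Keeping the trajectory id on each example makes it possible to trace a training sample back to its outcome row later.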

Semantic Search for Similar Trajectories

sql

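-- Find trajectories that solved tasks similar to a new one.
-- Assumes step 0 of each trajectory holds the task prompt and its embedding,
-- and that :task_embedding is bound to the new task's embedding.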
SELECT o.trajectory_id, o.reward_score, o.total_steps,
       cosine_similarity(t.embedding, :task_embedding) AS relevance
FROM trajectory_outcomes o
JOIN trajectories t ON o.trajectory_id = t.trajectory_id AND t.step = 0
WHERE o.success = true
ORDER BY relevance DESC
LIMIT 50;
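
For intuition about what the ranking computes, here is a pure-Python sketch of cosine similarity and top-k selection over a handful of embeddings. It is a reference for small inputs only and does not show how a vector index evaluates the query. `top_k_similar` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(task_embedding, candidates, k=50):
    """Rank (trajectory_id, embedding) pairs by similarity to task_embedding.

    Returns at most k (trajectory_id, score) pairs, highest score first.
    """
    scored = [
        (trajectory_id, cosine_similarity(embedding, task_embedding))
        for trajectory_id, embedding in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A score near 1 means the stored task prompt points in nearly the same direction as the new task's embedding; scores near 0 or below indicate unrelated tasks.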

GPU-Native Training Loop

python

# Stream filtered trajectories directly to GPU for fine-tuning.
# Rows arrive ordered by trajectory and step, so a 32-row batch can end
# partway through a trajectory; `model` is assumed to be defined elsewhere.
train_data = db.dataloader("""
    SELECT t.content, t.tool_call, t.tool_result, t.role
    FROM trajectories t
    JOIN trajectory_outcomes o ON t.trajectory_id = o.trajectory_id
    WHERE o.success = true AND o.reward_score > 0.85
    ORDER BY t.trajectory_id, t.step
""").batch_size(32).to_torch()
 
for batch in train_data:
    loss = model.train_step(batch)
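
When the training objective needs whole trajectories, fixed-size row batches can cut a run in half at a batch boundary. One way around this is to batch by trajectory instead of by row. The sketch below is independent of any dataloader API: it assumes rows are tuples whose first element is the trajectory id and that they arrive ordered by trajectory and step. `batch_by_trajectory` is a hypothetical helper.

```python
from typing import Iterable, Iterator


def batch_by_trajectory(
    rows: Iterable[tuple], trajectories_per_batch: int
) -> Iterator[list[list[tuple]]]:
    """Yield batches of complete trajectories.

    Each batch is a list of trajectories; each trajectory is a list of rows.
    Rows must already be ordered by (trajectory_id, step).
    """
    if trajectories_per_batch < 1:
        raise ValueError("trajectories_per_batch must be at least 1")
    batch: list[list[tuple]] = []
    current: list[tuple] = []
    current_id = None
    for row in rows:
        # A change of id closes the trajectory being accumulated.
        if current and row[0] != current_id:
            batch.append(current)
            current = []
            if len(batch) == trajectories_per_batch:
                yield batch
                batch = []
        current_id = row[0]
        current.append(row)
    # Flush the final trajectory and any partially filled batch.
    if current:
        batch.append(current)
    if batch:
        yield batch
```

Batches built this way vary in row count, so padding or packing to a fixed token budget would still be needed before the tensors reach the model.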

Branching for Training Experiments

python

# Create a branch to test a new data selection strategy
db.branch("experiment/high-efficiency-only")
 
# Curate differently on the branch without affecting production data
# Compare fine-tuned model performance across branches
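
Before spending GPU time on a branch, it can help to check how much its selection actually differs from the baseline. The sketch below applies two selection rules to outcome records and reports their overlap. It runs on plain dictionaries keyed like the `trajectory_outcomes` columns and does not use any branching API. `select_ids` and `compare_selections` are hypothetical helpers, and the thresholds mirror the filter query shown earlier.

```python
def select_ids(outcomes, min_reward=0.85, max_steps=None):
    """Return ids of successful trajectories above min_reward,
    optionally keeping only those with fewer than max_steps steps."""
    selected = set()
    for outcome in outcomes:
        if not outcome["success"] or outcome["reward_score"] <= min_reward:
            continue
        if max_steps is not None and outcome["total_steps"] >= max_steps:
            continue
        selected.add(outcome["trajectory_id"])
    return selected


def compare_selections(baseline, candidate):
    """Summarize how two sets of trajectory ids overlap."""
    shared = baseline & candidate
    union = baseline | candidate
    return {
        "shared": len(shared),
        "only_baseline": len(baseline - candidate),
        "only_candidate": len(candidate - baseline),
        # Jaccard index: 1.0 means identical selections, 0.0 means disjoint.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }
```

If the Jaccard index is close to 1.0, the two branches would train on nearly the same data, and a difference in model quality is unlikely to come from data selection.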

Why Not JSONL on S3?

Capability	JSONL on S3	Deeplake
Store trajectories	Yes	Yes
Filter by success/reward	Re-process entire dataset	SQL query, instant
Semantic search for similar tasks	Build separate index	Native vector search
Stream to GPU	Download, deserialize, transfer	Zero-copy GPU streaming
Iterate on data selection	Re-generate JSONL files	Change SQL query
Branch for experiments	Copy entire dataset	Native branching, zero copy
Scale to zero	S3 always charges for storage	Serverless, ~200ms provision

Hivemind for Organizational Learning

Hivemind extends this pattern to the team level: every agent's successful trajectories are automatically persisted and available for organizational fine-tuning. Your agents collectively improve over time as the trajectory corpus grows.
