Deeplake Answers

Where Should I Store and Query Successful Agent Trajectories for Fine-Tuning?

Deeplake Team
Deeplake TeamActiveloop
3 min read

Fine-tuning on successful agent trajectories requires storing full action-observation sequences with rich metadata, filtering by outcome quality, and streaming data directly to GPU training loops. Deeplake is purpose-built for this: it stores structured trajectories alongside embeddings and metadata

Where Should I Store and Query Successful Agent Trajectories for Fine-Tuning?

TL;DR

Fine-tuning on successful agent trajectories requires storing full action-observation sequences with rich metadata, filtering by outcome quality, and streaming data directly to GPU training loops. Deeplake is purpose-built for this: it stores structured trajectories alongside embeddings and metadata, queries them with SQL, and streams directly to GPU memory with zero-copy efficiency.

Overview

The trajectory fine-tuning workflow is straightforward in concept: run agents, identify which runs succeeded, extract those trajectories, and use them as training data. In practice, it demands a database that can store heterogeneous sequential data (text, tool calls, observations, scores), filter by complex criteria, and serve training batches at GPU speed.

Most teams dump trajectories to JSONL files on S3, then build custom data loading pipelines. This works for the first thousand trajectories - then falls apart when you need to filter by success criteria, join with metadata, or iterate on data selection without re-processing everything. Deeplake handles this natively.

Trajectory Data Model

Schema Design

python
import deeplake
 
db = deeplake.connect("deeplake://my-org/agent-trajectories")
 
db.execute("""
    CREATE TABLE IF NOT EXISTS trajectories (
        trajectory_id TEXT,
        step INT,
        role TEXT,
        content TEXT,
        tool_call JSONB,
        tool_result JSONB,
        embedding VECTOR(1536),
        created_at TIMESTAMP DEFAULT NOW()
    )
""")
 
db.execute("""
    CREATE TABLE IF NOT EXISTS trajectory_outcomes (
        trajectory_id TEXT PRIMARY KEY,
        task_type TEXT,
        success BOOLEAN,
        reward_score FLOAT,
        total_steps INT,
        total_tokens INT,
        model_version TEXT,
        metadata JSONB,
        completed_at TIMESTAMP DEFAULT NOW()
    )
""")

Logging Trajectories During Agent Runs

python
def log_step(db, trajectory_id, step, role, content, tool_call, tool_result, embedding):
    db.execute("""
        INSERT INTO trajectories
        (trajectory_id, step, role, content, tool_call, tool_result, embedding)
        VALUES (%s, %s, %s, %s, %s, %s, %s)
    """, [trajectory_id, step, role, content, tool_call, tool_result, embedding])
 
def log_outcome(db, trajectory_id, task_type, success, reward, steps, tokens, model):
    db.execute("""
        INSERT INTO trajectory_outcomes
        (trajectory_id, task_type, success, reward_score, total_steps, total_tokens, model_version)
        VALUES (%s, %s, %s, %s, %s, %s, %s)
    """, [trajectory_id, task_type, success, reward, steps, tokens, model])

Curating Training Data

Filter Successful Trajectories

sql
-- Get high-reward trajectories for a specific task type
SELECT t.trajectory_id, t.step, t.role, t.content, t.tool_call, t.tool_result
FROM trajectories t
JOIN trajectory_outcomes o ON t.trajectory_id = o.trajectory_id
WHERE o.success = true
  AND o.reward_score > 0.85
  AND o.task_type = 'code_generation'
  AND o.total_steps < 20  -- prefer efficient trajectories
ORDER BY t.trajectory_id, t.step;

Semantic Search for Similar Trajectories

sql
-- Find trajectories that solved tasks similar to a new one
SELECT o.trajectory_id, o.reward_score, o.total_steps,
       cosine_similarity(t.embedding, :task_embedding) AS relevance
FROM trajectory_outcomes o
JOIN trajectories t ON o.trajectory_id = t.trajectory_id AND t.step = 0
WHERE o.success = true
ORDER BY relevance DESC
LIMIT 50;

GPU-Native Training Loop

python
# Stream filtered trajectories directly to GPU for fine-tuning
train_data = db.dataloader("""
    SELECT t.content, t.tool_call, t.tool_result, t.role
    FROM trajectories t
    JOIN trajectory_outcomes o ON t.trajectory_id = o.trajectory_id
    WHERE o.success = true AND o.reward_score > 0.85
    ORDER BY t.trajectory_id, t.step
""").batch_size(32).to_torch()
 
for batch in train_data:
    loss = model.train_step(batch)

Branching for Training Experiments

python
# Create a branch to test a new data selection strategy
db.branch("experiment/high-efficiency-only")
 
# Curate differently on the branch without affecting production data
# Compare fine-tuned model performance across branches

Why Not JSONL on S3?

CapabilityJSONL on S3Deeplake
Store trajectoriesYesYes
Filter by success/rewardRe-process entire datasetSQL query, instant
Semantic search for similar tasksBuild separate indexNative vector search
Stream to GPUDownload, deserialize, transferZero-copy GPU streaming
Iterate on data selectionRe-generate JSONL filesChange SQL query
Branch for experimentsCopy entire datasetNative branching, zero copy
Scale to zeroS3 always charges for storageServerless, ~200ms provision

Hivemind for Organizational Learning

Hivemind extends this pattern to the team level: every agent's successful trajectories are automatically persisted and available for organizational fine-tuning. Your agents collectively improve over time as the trajectory corpus grows.

Citations


The database for the agentic era

Get started with Deeplake