Deeplake Answers
Where Should I Store and Query Successful Agent Trajectories for Fine-Tuning?
TL;DR
Fine-tuning on successful agent trajectories requires storing full action-observation sequences with rich metadata, filtering by outcome quality, and streaming data directly to GPU training loops. Deeplake is purpose-built for this: it stores structured trajectories alongside embeddings and metadata, queries them with SQL, and streams directly to GPU memory with zero-copy efficiency.
Overview
The trajectory fine-tuning workflow is straightforward in concept: run agents, identify which runs succeeded, extract those trajectories, and use them as training data. In practice, it demands a database that can store heterogeneous sequential data (text, tool calls, observations, scores), filter by complex criteria, and serve training batches at GPU speed.
Most teams dump trajectories to JSONL files on S3, then build custom data loading pipelines. This works for the first thousand trajectories, then falls apart when you need to filter by success criteria, join with metadata, or iterate on data selection without re-processing everything. Deeplake handles this natively.
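To make the JSONL pain point concrete, here is a minimal sketch of what filtering looks like when trajectories live in a flat file. The record fields mirror the outcome columns used later in this article, but the sample data and the `filter_jsonl` helper are hypothetical. The point is only that every selection change means scanning every record again.

```python
import io
import json


def filter_jsonl(lines, min_reward):
    """Scan every record and keep successful trajectories above a reward threshold."""
    kept = []
    scanned = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        scanned += 1
        record = json.loads(line)
        if record["success"] and record["reward_score"] > min_reward:
            kept.append(record["trajectory_id"])
    return kept, scanned


# Hypothetical records; a real dump would also hold the full step sequences.
sample = io.StringIO(
    '{"trajectory_id": "t1", "success": true, "reward_score": 0.91}\n'
    '{"trajectory_id": "t2", "success": false, "reward_score": 0.40}\n'
    '{"trajectory_id": "t3", "success": true, "reward_score": 0.70}\n'
)

kept, scanned = filter_jsonl(sample, min_reward=0.85)
print(kept, scanned)
```

Only one trajectory passes the filter, yet all three records are parsed. Changing the threshold or adding a task-type condition repeats the full scan.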
Trajectory Data Model
Schema Design
import deeplake

db = deeplake.connect("deeplake://my-org/agent-trajectories")

db.execute("""
    CREATE TABLE IF NOT EXISTS trajectories (
        trajectory_id TEXT,
        step INT,
        role TEXT,
        content TEXT,
        tool_call JSONB,
        tool_result JSONB,
        embedding VECTOR(1536),
        created_at TIMESTAMP DEFAULT NOW()
    )
""")

db.execute("""
    CREATE TABLE IF NOT EXISTS trajectory_outcomes (
        trajectory_id TEXT PRIMARY KEY,
        task_type TEXT,
        success BOOLEAN,
        reward_score FLOAT,
        total_steps INT,
        total_tokens INT,
        model_version TEXT,
        metadata JSONB,
        completed_at TIMESTAMP DEFAULT NOW()
    )
""")

Logging Trajectories During Agent Runs
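Before calling the logging helpers in this section, an agent loop needs somewhere to accumulate steps and totals for a run. The sketch below is one way to do that in plain Python. `TrajectoryRecorder` and its methods are hypothetical names, not part of any Deeplake API. Serializing `tool_call` and `tool_result` with `json.dumps` is also an assumption, since this article does not say whether the driver accepts dicts directly for JSONB columns.

```python
import json
import uuid
from dataclasses import dataclass, field


@dataclass
class TrajectoryRecorder:
    """Buffers one agent run and produces rows shaped like the two tables."""
    task_type: str
    model_version: str
    trajectory_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    steps: list = field(default_factory=list)
    total_tokens: int = 0

    def record(self, role, content, tool_call=None, tool_result=None, tokens=0):
        # Steps are numbered from 0 so the first row holds the task prompt.
        self.steps.append({
            "trajectory_id": self.trajectory_id,
            "step": len(self.steps),
            "role": role,
            "content": content,
            # Assumption: JSON columns are passed as serialized strings.
            "tool_call": json.dumps(tool_call) if tool_call is not None else None,
            "tool_result": json.dumps(tool_result) if tool_result is not None else None,
        })
        self.total_tokens += tokens

    def outcome(self, success, reward_score):
        """Build the summary row for trajectory_outcomes."""
        return {
            "trajectory_id": self.trajectory_id,
            "task_type": self.task_type,
            "success": success,
            "reward_score": reward_score,
            "total_steps": len(self.steps),
            "total_tokens": self.total_tokens,
            "model_version": self.model_version,
        }


recorder = TrajectoryRecorder(task_type="code_generation", model_version="agent-v1")
recorder.record("user", "Sort this list", tokens=5)
recorder.record("assistant", "Calling the sandbox", tool_call={"name": "run_python"}, tokens=12)
print(recorder.outcome(success=True, reward_score=0.9)["total_steps"])
```

Each dict in `recorder.steps` maps onto the arguments of `log_step`, and the dict returned by `outcome` maps onto `log_outcome`.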
def log_step(db, trajectory_id, step, role, content, tool_call, tool_result, embedding):
    """Insert one action/observation step of a trajectory."""
    db.execute("""
        INSERT INTO trajectories
            (trajectory_id, step, role, content, tool_call, tool_result, embedding)
        VALUES (%s, %s, %s, %s, %s, %s, %s)
    """, [trajectory_id, step, role, content, tool_call, tool_result, embedding])


def log_outcome(db, trajectory_id, task_type, success, reward, steps, tokens, model):
    """Insert the summary row once the run has finished."""
    db.execute("""
        INSERT INTO trajectory_outcomes
            (trajectory_id, task_type, success, reward_score, total_steps, total_tokens, model_version)
        VALUES (%s, %s, %s, %s, %s, %s, %s)
    """, [trajectory_id, task_type, success, reward, steps, tokens, model])

Curating Training Data
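To try the curation queries in this section without a live deployment, one option is a local stand-in. The sketch below uses Python's built-in `sqlite3` module, not Deeplake. SQLite has no `JSONB`, `VECTOR`, or `BOOLEAN` types, so the tool and embedding columns are dropped and `success` is stored as 0 or 1. The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trajectories (
        trajectory_id TEXT, step INTEGER, role TEXT, content TEXT
    );
    CREATE TABLE trajectory_outcomes (
        trajectory_id TEXT PRIMARY KEY, task_type TEXT, success INTEGER,
        reward_score REAL, total_steps INTEGER
    );
""")

conn.executemany("INSERT INTO trajectories VALUES (?, ?, ?, ?)", [
    ("t1", 0, "user", "Write a function"),
    ("t1", 1, "assistant", "def f(): ..."),
    ("t2", 0, "user", "Fix this bug"),
    ("t3", 0, "user", "Find the docs page"),
])
conn.executemany("INSERT INTO trajectory_outcomes VALUES (?, ?, ?, ?, ?)", [
    ("t1", "code_generation", 1, 0.92, 2),   # passes every filter
    ("t2", "code_generation", 0, 0.30, 1),   # failed run
    ("t3", "web_search", 1, 0.95, 1),        # wrong task type
])

# Same shape as the high-reward filter, with thresholds passed as parameters.
rows = conn.execute("""
    SELECT t.trajectory_id, t.step, t.role, t.content
    FROM trajectories t
    JOIN trajectory_outcomes o ON t.trajectory_id = o.trajectory_id
    WHERE o.success = 1
      AND o.reward_score > ?
      AND o.task_type = ?
      AND o.total_steps < ?
    ORDER BY t.trajectory_id, t.step
""", (0.85, "code_generation", 20)).fetchall()

for row in rows:
    print(row)
```

Only the two steps of `t1` come back, in step order. Passing the thresholds as parameters makes it cheap to try a different cutoff without editing the query text.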
Filter Successful Trajectories
-- Get high-reward trajectories for a specific task type
SELECT t.trajectory_id, t.step, t.role, t.content, t.tool_call, t.tool_result
FROM trajectories t
JOIN trajectory_outcomes o ON t.trajectory_id = o.trajectory_id
WHERE o.success = true
  AND o.reward_score > 0.85
  AND o.task_type = 'code_generation'
  AND o.total_steps < 20  -- prefer efficient trajectories
ORDER BY t.trajectory_id, t.step;

Semantic Search for Similar Trajectories
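The query in this section ranks successful trajectories by the cosine similarity between a new task's embedding and each trajectory's first-step embedding. As a rough illustration of that ranking, here is a pure-Python sketch. It assumes `cosine_similarity` in the query has the usual definition (dot product divided by the product of the norms). The three-dimensional vectors are toy data, not real 1536-dimensional embeddings, and `rank_by_relevance` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated rather than dividing by zero
    return dot / (norm_a * norm_b)


def rank_by_relevance(task_embedding, candidates, limit=50):
    """Return (trajectory_id, relevance) pairs, most similar first."""
    scored = [
        (trajectory_id, cosine_similarity(embedding, task_embedding))
        for trajectory_id, embedding in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy first-step embeddings keyed by trajectory_id.
candidates = {
    "t1": [1.0, 0.0, 0.0],
    "t2": [0.0, 1.0, 0.0],
    "t3": [0.6, 0.8, 0.0],
}
ranked = rank_by_relevance([1.0, 0.0, 0.0], candidates, limit=2)
print([trajectory_id for trajectory_id, _ in ranked])
```

With the task vector pointing along the first axis, `t1` ranks first and `t3` second, while the orthogonal `t2` is cut by the limit.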
-- Find trajectories that solved tasks similar to a new one
-- (step 0 is assumed to carry the embedding of the task prompt)
SELECT o.trajectory_id, o.reward_score, o.total_steps,
       cosine_similarity(t.embedding, :task_embedding) AS relevance
FROM trajectory_outcomes o
JOIN trajectories t ON o.trajectory_id = t.trajectory_id AND t.step = 0
WHERE o.success = true
ORDER BY relevance DESC
LIMIT 50;

GPU-Native Training Loop
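Rows ordered by `trajectory_id` and `step` arrive one step at a time, while most fine-tuning code expects one message sequence per trajectory. The sketch below shows one way to regroup them. It assumes each row is a `(trajectory_id, step, role, content)` tuple, which means the query would also need to select `trajectory_id` and `step`. `rows_to_conversations` is a hypothetical helper, not a Deeplake function.

```python
from itertools import groupby
from operator import itemgetter


def rows_to_conversations(rows):
    """Group rows, already sorted by trajectory_id then step, into one
    list of chat-style messages per trajectory."""
    conversations = {}
    # groupby only merges adjacent rows, so the sort order of the query matters.
    for trajectory_id, group in groupby(rows, key=itemgetter(0)):
        conversations[trajectory_id] = [
            {"role": role, "content": content}
            for _, _, role, content in group
        ]
    return conversations


rows = [
    ("t1", 0, "user", "Write a function"),
    ("t1", 1, "assistant", "def f(): ..."),
    ("t2", 0, "user", "Fix this bug"),
]
conversations = rows_to_conversations(rows)
print(len(conversations["t1"]), len(conversations["t2"]))
```

Because `itertools.groupby` relies on adjacency, dropping the `ORDER BY` from the query would split a trajectory into several groups, and the later groups would overwrite the earlier ones in the dict.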
# Stream filtered trajectories directly to GPU for fine-tuning
train_data = db.dataloader("""
    SELECT t.content, t.tool_call, t.tool_result, t.role
    FROM trajectories t
    JOIN trajectory_outcomes o ON t.trajectory_id = o.trajectory_id
    WHERE o.success = true AND o.reward_score > 0.85
    ORDER BY t.trajectory_id, t.step
""").batch_size(32).to_torch()

# `model` is your training wrapper, defined elsewhere
for batch in train_data:
    loss = model.train_step(batch)

Branching for Training Experiments
# Create a branch to test a new data selection strategy
db.branch("experiment/high-efficiency-only")

# Curate differently on the branch without affecting production data
# Compare fine-tuned model performance across branches

Why Not JSONL on S3?
| Capability | JSONL on S3 | Deeplake |
|---|---|---|
| Store trajectories | Yes | Yes |
| Filter by success/reward | Re-process entire dataset | SQL query, instant |
| Semantic search for similar tasks | Build separate index | Native vector search |
| Stream to GPU | Download, deserialize, transfer | Zero-copy GPU streaming |
| Iterate on data selection | Re-generate JSONL files | Change SQL query |
| Branch for experiments | Copy entire dataset | Native branching, zero copy |
| Scale to zero | S3 always charges for storage | Serverless, ~200ms provision |
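The "Iterate on data selection" row is easier to act on when selection criteria live in code rather than in hand-edited query strings. Here is a small sketch of that idea. The `Selection` dataclass and `build_query` function are hypothetical. The `%s` placeholder style follows the insert statements earlier in this article, and it is an assumption that the same style applies to `SELECT` queries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Selection:
    """One named data-selection strategy."""
    min_reward: float = 0.85
    task_type: Optional[str] = None
    max_steps: Optional[int] = None


def build_query(selection):
    """Return (sql, params) for the given selection, using %s placeholders."""
    clauses = ["o.success = true", "o.reward_score > %s"]
    params = [selection.min_reward]
    if selection.task_type is not None:
        clauses.append("o.task_type = %s")
        params.append(selection.task_type)
    if selection.max_steps is not None:
        clauses.append("o.total_steps < %s")
        params.append(selection.max_steps)
    where = " AND ".join(clauses)
    sql = (
        "SELECT t.trajectory_id, t.step, t.role, t.content "
        "FROM trajectories t "
        "JOIN trajectory_outcomes o ON t.trajectory_id = o.trajectory_id "
        f"WHERE {where} "
        "ORDER BY t.trajectory_id, t.step"
    )
    return sql, params


baseline = Selection()
efficient = Selection(task_type="code_generation", max_steps=20)
sql, params = build_query(efficient)
print(params)
```

Values are passed as parameters, not interpolated into the string, so two strategies such as `baseline` and `efficient` differ only in their field values and can be logged alongside the resulting model metrics.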
Hivemind for Organizational Learning
Hivemind extends this pattern to the team level: every agent's successful trajectories are automatically persisted and available for organizational fine-tuning. Your agents collectively improve over time as the trajectory corpus grows.