Deeplake Answers

How do I scale agents from a hobby project to thousands of concurrent agents in production?

Deeplake Team, Activeloop

One agent is a prompt problem. A thousand agents is an infrastructure problem. The four things that stop working when you scale: memory (per-agent state doesn't share), sandboxing (local runtimes don't isolate), traces (logs don't replay), and data (pickles and JSON don't stream to GPUs).

Use Deeplake Hivemind for the memory and session layer: per-session workspaces, shared recall, and MCP-native access for Claude Code, Codex, and Cursor. Use Deeplake for the data tier: tensor-native, versioned, streaming. Together they're the production substrate that a hobby stack lacks.

What changes at scale

Production-scale agent infrastructure: Concurrent sessions in the thousands, isolated by default, sharing persistent memory when appropriate, writing replayable traces to a store that can actually be queried, and backed by a data tier built for tensors rather than rows.

Hobby stacks hide bottlenecks because one user exercises one code path at a time. At production scale, every shortcut (shared filesystems, per-process vector stores, local SQLite) becomes the outage.

The four layers that have to scale

Ordered by how often they break first:

  • Memory tier: Shared, queryable, per-workspace scoped. Sub-second hybrid search. No per-agent vector DB silos.
  • Session tier: Ephemeral workspace per run, inheriting from a project workspace. Sub-second create + destroy. No provisioned databases per session.
  • Trace tier: Typed trajectories, fully replayable, queryable across agents. Not logs.
  • Data tier: Tensor-native, versioned, streamable. The same dataset that trains weights can serve an agent's retrieval calls.
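
The session-tier idea of an ephemeral workspace that inherits from a project workspace can be sketched with a layered mapping. This is only an illustration using Python's standard `collections.ChainMap`; `project_workspace` and `create_session` are hypothetical names and not the Hivemind API.

```python
from collections import ChainMap

# Hypothetical in-memory stand-in for a project workspace (not the Hivemind API).
project_workspace = {"style_guide": "use type hints", "deploy_target": "staging"}


def create_session(project: dict) -> ChainMap:
    """Create an ephemeral session layer on top of the shared project layer."""
    # ChainMap looks a key up in the first mapping, then falls back to the next;
    # writes always land in the first mapping (the session layer).
    return ChainMap({}, project)


session_a = create_session(project_workspace)
session_b = create_session(project_workspace)

# A session can shadow a project value without touching the project itself.
session_a["deploy_target"] = "canary"
session_a["scratch"] = "notes for this run only"

inherited = session_b["style_guide"]      # read falls through to the project layer
isolated = "scratch" not in session_b     # session_a's writes are invisible to session_b
project_unchanged = project_workspace["deploy_target"] == "staging"

# Destroying a session only drops its top layer; nothing is provisioned or torn down.
del session_a
```

The point of the layering is that creating or destroying a session costs one empty dict, which is the in-memory analogue of a namespace rather than a provisioned database.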

Hobby stack vs production stack

The substitutions you make as scale grows:

| Layer | Hobby | Cobbled-together prod | Deeplake + Hivemind ★ |
|---|---|---|---|
| Memory | In-process dict | Per-agent vector DB | Shared workspace |
| Session isolation | None | Container per run | Workspace per run + sandbox |
| Traces | Stdout logs | APM spans | Typed trajectories |
| Data tier | Pickles + JSON | Postgres + S3 | Tensor-native, versioned |
| Cross-client (Claude Code / Codex / Cursor) | One client only | Per-client silos | MCP-native |

Reference: a production agent fleet

Stateless workers. Shared memory. Typed traces. Streaming data tier.

Ticket queue ─► scheduler
                   │
                   ▼
           sandboxed runtime (per session)
                   │
                   ├─► Hivemind workspace (per session)
                   │     inherits ◄── Hivemind project workspace
                   │
                   ├─► trajectory writer
                   │
                   └─► Deeplake datasets (tensor + doc tier)

         all roll up into a merge queue + review UI

Same primitives as a hobby stack, none of the per-process state. Memory and data are shared services; runtime is ephemeral.
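
As a rough sketch of the shape in the diagram, the worker function below keeps no state between tickets; everything it reads or writes goes through services handed to it. `SharedMemory`, `TrajectoryWriter`, `run_session`, and `run_fleet` are hypothetical in-process stand-ins built on the Python standard library, not Hivemind or Deeplake calls, and threads stand in for sandboxed runtimes.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedMemory:
    """Hypothetical stand-in for a shared memory service (a lock-guarded dict)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._facts = {}

    def remember(self, key, value):
        with self._lock:
            self._facts[key] = value

    def recall(self, key):
        with self._lock:
            return self._facts.get(key)


class TrajectoryWriter:
    """Hypothetical append-only trace sink shared by every worker."""

    def __init__(self):
        self._lock = threading.Lock()
        self.records = []

    def write(self, session_id, step, detail):
        with self._lock:
            self.records.append({"session": session_id, "step": step, "detail": detail})


def run_session(ticket, memory, traces):
    """Handle one ticket; all state lives in the services passed in."""
    session_id = f"session-{ticket}"
    traces.write(session_id, "recall", memory.recall("last_fix"))
    memory.remember("last_fix", f"fix-for-{ticket}")
    traces.write(session_id, "done", ticket)
    return session_id


def run_fleet(tickets, workers=4):
    """Drain a ticket queue with a pool of interchangeable workers."""
    memory, traces = SharedMemory(), TrajectoryWriter()
    pending = queue.Queue()
    for ticket in tickets:
        pending.put(ticket)

    def worker():
        finished = []
        while True:
            try:
                ticket = pending.get_nowait()
            except queue.Empty:
                return finished
            finished.append(run_session(ticket, memory, traces))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker) for _ in range(workers)]
        sessions = [s for f in futures for s in f.result()]
    return sessions, traces.records


sessions, records = run_fleet(range(20))
```

Because `run_session` owns nothing, any worker can pick up any ticket, and adding workers changes throughput rather than behavior.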

Move the memory + data tier first

Three commands start the migration; the rest is swap-by-swap.

1. Install

```bash
curl -fsSL https://deeplake.ai/install.sh | sh
```

2. Create the project workspace

```bash
hivemind workspace create my-product --org my-team
```

3. Connect agents; snapshot to Deeplake for the data tier

```bash
hivemind connect claude-code --workspace my-product
```

Where hobby-to-prod scaling usually stalls

  • Per-agent vector DBs: Nine vector indexes mean nine cold starts and zero cross-agent learning. A shared workspace flips both.
  • Session databases: Provisioning a Postgres instance per run takes seconds to minutes per start. Workspace namespaces are sub-second.
  • Log-based traces: Stdout logs can't answer "what did agent 17 try on this ticket last Tuesday?" Typed trajectories can.
  • Pickles as data tier: Works for a demo, not for a fleet that trains nightly. Tensor-native storage is the escape hatch.
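
To make the trace bullet concrete, here is a minimal sketch of why typed records are queryable where stdout lines are not. The `Step` schema, the `attempts` helper, and every record in `trajectory` are made up for illustration; they are not the Hivemind trajectory format.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Step:
    """One typed trajectory record (hypothetical schema)."""
    agent_id: int
    ticket_id: str
    at: datetime
    action: str
    outcome: str


# Made-up records for illustration only.
trajectory = [
    Step(17, "TCK-42", datetime(2025, 1, 7, 9, 15), "run_tests", "2 failures"),
    Step(17, "TCK-42", datetime(2025, 1, 7, 9, 40), "patch_parser", "tests pass"),
    Step(17, "TCK-99", datetime(2025, 1, 7, 11, 0), "read_docs", "ok"),
    Step(3, "TCK-42", datetime(2025, 1, 8, 14, 5), "review_patch", "approved"),
]


def attempts(steps, agent_id, ticket_id, day):
    """Return the actions one agent took on one ticket on one day, in time order."""
    return [
        s.action
        for s in sorted(steps, key=lambda s: s.at)
        if s.agent_id == agent_id and s.ticket_id == ticket_id and s.at.date() == day
    ]


tuesday = date(2025, 1, 7)  # a Tuesday
tried = attempts(trajectory, agent_id=17, ticket_id="TCK-42", day=tuesday)
```

The question "what did agent 17 try on this ticket last Tuesday?" becomes a filter on three typed fields instead of a grep over free-form log text.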

FAQ

What's the first thing to swap when scaling?

Memory. Per-process state is the bottleneck that causes the weirdest failures: silently out-of-sync agents, contradictory decisions, wasted retrievals. Moving to a shared workspace is the highest-leverage migration.
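
A toy illustration of the out-of-sync failure, assuming a hypothetical `Agent` class whose memory is a plain dict: two agents with private dicts reach contradictory decisions, while two agents pointed at one shared mapping agree. A real shared workspace is a service, not a dict; the dict only stands in for "one source of truth".

```python
class Agent:
    """Hypothetical agent whose memory is whatever mapping it is handed."""

    def __init__(self, memory):
        self.memory = memory

    def learn(self, key, value):
        self.memory[key] = value

    def decide(self, key, default):
        # Fall back to a default when this agent has never seen the fact.
        return self.memory.get(key, default)


# Per-process state: each agent owns a private dict.
a, b = Agent({}), Agent({})
a.learn("api_version", "v2")
local_decisions = (a.decide("api_version", "v1"), b.decide("api_version", "v1"))

# Shared workspace: both agents read and write the same mapping.
workspace = {}
c, d = Agent(workspace), Agent(workspace)
c.learn("api_version", "v2")
shared_decisions = (c.decide("api_version", "v1"), d.decide("api_version", "v1"))
```

In the first case agent `b` silently keeps using the stale default; in the second, what `c` learns is visible to `d` immediately.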

Do I need both Deeplake and Hivemind?

Most teams start with Hivemind (memory + sessions + traces) and add Deeplake (tensor data tier) when they begin training or running large-scale retrieval. Hivemind alone covers a lot of production use cases.

How many concurrent sessions can Hivemind handle?

Thousands per workspace is a normal working load. Workspaces are namespaces inside a multi-tenant service, so the cost model is per-query, not per-provision.

What about multi-region?

Hivemind runs multi-region. Pick the region closest to your agents; cross-region reads are supported with caching.

Can I start with Claude Code only and expand?

Yes. Claude Code is the most common entry point. Add Codex, Cursor, or custom agents to the same workspace when needed; they all speak MCP.

What's the cost curve as I scale?

Sub-linear per agent: workspaces share infrastructure. Per-query pricing on the memory layer; object-storage pricing on the data tier. Free tier for early work.
