Deeplake Answers
How do I scale agents from a hobby project to thousands of concurrent agents in production?
One agent is a prompt problem. A thousand agents is an infrastructure problem. The four things that stop working when you scale: memory (per-agent state doesn't share), sandboxing (local runtimes don't isolate), traces (logs don't replay), and data (pickles and JSON don't stream to GPUs).
Use Deeplake Hivemind for the memory and session layer: per-session workspaces, shared recall, and MCP-native access for Claude Code / Codex / Cursor. Use Deeplake for the data tier: tensor-native, versioned, and streaming. Together they form the production substrate that a hobby stack lacks.
What changes at scale
Production-scale agent infrastructure: Concurrent sessions in the thousands, isolated by default, sharing persistent memory when appropriate, writing replayable traces to a store that can actually be queried, and backed by a data tier built for tensors rather than rows.
Hobby stacks hide bottlenecks because one user exercises one code path at a time. At production scale, every shortcut (shared filesystems, per-process vector stores, local SQLite) becomes the outage.
The four layers that have to scale
Ordered by how often they break first:
- Memory tier: Shared, queryable, per-workspace scoped. Sub-second hybrid search. No per-agent vector DB silos.
- Session tier: Ephemeral workspace per run, inheriting from a project workspace. Sub-second create + destroy. No provisioned databases per session.
- Trace tier: Typed trajectories, fully replayable, queryable across agents. Not logs.
- Data tier: Tensor-native, versioned, streamable. The same dataset that trains weights can serve an agent's retrieval calls.
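The inheritance idea in the session tier (an ephemeral workspace per run layered over a project workspace) can be illustrated with a small sketch. This is not the Hivemind API: `open_session`, `promote`, and the sample keys are hypothetical names, and Python's standard `collections.ChainMap` stands in for a real workspace service. The point is only the read and write semantics: reads fall through to the project layer, writes stay in the session layer until explicitly promoted.

```python
from collections import ChainMap

# Hypothetical stand-in for a long-lived, shared project workspace.
project_memory = {"style_guide": "use type hints", "ci_command": "make test"}


def open_session(project: dict) -> ChainMap:
    """Return an ephemeral session view layered over the project workspace.

    Reads fall through to the project; writes land only in the session layer.
    """
    return ChainMap({}, project)


def promote(session: ChainMap, project: dict, keys: list) -> None:
    """Explicitly copy selected session findings into shared project memory."""
    for key in keys:
        project[key] = session.maps[0][key]


session_a = open_session(project_memory)
session_b = open_session(project_memory)

session_a["scratch_note"] = "flaky test: test_login"  # session-local write
session_a["ci_command"] = "make test-fast"            # local override only

# Only the finding worth sharing is promoted; the override stays private.
promote(session_a, project_memory, ["scratch_note"])

print(session_b["ci_command"])    # project value, unaffected by session_a
print(session_b["scratch_note"])  # visible to other sessions after promotion
```

Creating and discarding a session here is just creating and dropping one empty dict, which is the property the session tier asks for: nothing is provisioned per run.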
Hobby stack vs production stack
The substitutions you make as scale grows:
| Layer | Hobby | Cobbled-together prod | Deeplake + Hivemind ★ |
|---|---|---|---|
| Memory | In-process dict | Per-agent vector DB | Shared workspace |
| Session isolation | None | Container per run | Workspace per run + sandbox |
| Traces | Stdout logs | APM spans | Typed trajectories |
| Data tier | Pickles + JSON | Postgres + S3 | Tensor-native, versioned |
| Cross-client (Claude Code / Codex / Cursor) | One client only | Per-client silos | MCP-native |
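The "Pickles + JSON" versus streaming distinction in the data-tier row can be made concrete with a standard-library sketch. It does not use Deeplake or its storage format; the fixed-width float layout, `ROW_WIDTH`, and the helper names are assumptions chosen for illustration. A pickle has to be deserialized in full before the first row is usable, whereas a flat fixed-width layout can be read one batch at a time.

```python
import os
import pickle
import tempfile
from array import array
from typing import Iterator, List

ROW_WIDTH = 4                      # floats per row (hypothetical embedding size)
NUM_ROWS = 1_000
ITEM_SIZE = array("f").itemsize    # bytes per stored float


def write_flat(path: str, rows: List[List[float]]) -> None:
    """Write rows back to back as fixed-width binary floats."""
    with open(path, "wb") as f:
        for row in rows:
            array("f", row).tofile(f)


def stream_batches(path: str, batch_rows: int) -> Iterator[List[array]]:
    """Yield batches of rows while holding only one batch in memory."""
    row_bytes = ROW_WIDTH * ITEM_SIZE
    with open(path, "rb") as f:
        while True:
            chunk = f.read(batch_rows * row_bytes)
            if not chunk:
                break
            flat = array("f")
            flat.frombytes(chunk)
            yield [flat[i:i + ROW_WIDTH] for i in range(0, len(flat), ROW_WIDTH)]


# Made-up sample data: row r holds the values r*4 .. r*4+3.
rows = [[float(r * ROW_WIDTH + c) for c in range(ROW_WIDTH)] for r in range(NUM_ROWS)]

tmp_dir = tempfile.mkdtemp()
pickle_path = os.path.join(tmp_dir, "rows.pkl")
flat_path = os.path.join(tmp_dir, "rows.f32")

with open(pickle_path, "wb") as f:
    pickle.dump(rows, f)
write_flat(flat_path, rows)

# Pickle: the whole dataset is materialized before any row can be used.
with open(pickle_path, "rb") as f:
    all_rows = pickle.load(f)

# Flat layout: consume 256 rows at a time.
batches = list(stream_batches(flat_path, batch_rows=256))
```

A real tensor store adds chunking, compression, versioning, and remote reads on top; the sketch only shows why a layout with known row boundaries can feed a training loop incrementally.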
Reference: a production agent fleet
Stateless workers. Shared memory. Typed traces. Streaming data tier.
Ticket queue ─► scheduler
                    │
                    ▼
     sandboxed runtime (per session)
                    │
                    ├─► Hivemind workspace (per session)
                    │       inherits ◄── Hivemind project workspace
                    │
                    ├─► trajectory writer
                    │
                    └─► Deeplake datasets (tensor + doc tier)

     all roll up into a merge queue + review UI
Same primitives as a hobby stack, none of the per-process state. Memory and data are shared services; runtime is ephemeral.
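As a rough sketch of "stateless workers, shared memory", the following uses `asyncio` to run many sessions concurrently. Everything here is an assumption for illustration: `SharedMemory` is an in-process stand-in for a shared memory service (not Hivemind), and `run_session` and `run_fleet` are hypothetical names. Each run keeps only throwaway local state, and a semaphore caps how many sessions are active at once.

```python
import asyncio
from typing import Dict, List, Optional


class SharedMemory:
    """Hypothetical in-process stand-in for a shared memory service."""

    def __init__(self) -> None:
        self._facts: Dict[str, str] = {}

    async def recall(self, key: str) -> Optional[str]:
        return self._facts.get(key)

    async def remember(self, key: str, value: str) -> None:
        self._facts[key] = value


async def run_session(ticket_id: int, memory: SharedMemory,
                      limit: asyncio.Semaphore) -> dict:
    """One worker run: state is either local to the run or in the shared service."""
    async with limit:
        scratch = {"ticket": ticket_id, "steps": []}  # ephemeral, dies with the run
        hint = await memory.recall("build_command")
        scratch["steps"].append(f"ran {hint}")
        await asyncio.sleep(0)                         # yield, as real I/O would
        await memory.remember(f"ticket:{ticket_id}", "done")
        return scratch


async def run_fleet(num_tickets: int, max_concurrent: int) -> List[dict]:
    memory = SharedMemory()
    await memory.remember("build_command", "make test")
    limit = asyncio.Semaphore(max_concurrent)
    # gather preserves input order, so results[i] belongs to ticket i.
    return await asyncio.gather(
        *(run_session(t, memory, limit) for t in range(num_tickets))
    )


results = asyncio.run(run_fleet(num_tickets=1000, max_concurrent=200))
print(len(results), results[17])
```

Because no worker holds state between runs, any worker can pick up any ticket, which is what lets the scheduler in the diagram scale out horizontally.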
Move the memory + data tier first
Three short steps start the migration; the rest is swap-by-swap.
1. Install
curl -fsSL https://deeplake.ai/install.sh | sh
2. Create the project workspace
hivemind workspace create my-product --org my-team
3. Connect agents; snapshot to Deeplake for the data tier
hivemind connect claude-code --workspace my-product
Where hobby-to-prod scaling usually stalls
- Per-agent vector DBs: Nine vector indexes mean nine cold starts and zero cross-agent learning. A shared workspace flips both.
- Session databases: Provisioning a Postgres instance per run takes seconds to minutes per start. Workspace namespaces are created in under a second.
- Log-based traces: Stdout logs can't answer "what did agent 17 try on this ticket last Tuesday?". Typed trajectories can.
- Pickles as data tier: Works for a demo, not for a fleet that trains nightly. Tensor-native storage is the escape hatch.
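The difference between log lines and typed trajectories can be sketched as follows. The `Step` schema, its field names, and the sample records are made up for illustration and are not Hivemind's trace format. Once each step is a typed record, the question "what did agent 17 try on this ticket last Tuesday?" becomes a filter rather than a text search.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List


@dataclass(frozen=True)
class Step:
    """One typed step in an agent trajectory (hypothetical schema)."""
    agent_id: int
    ticket_id: str
    timestamp: datetime
    action: str    # e.g. "edit", "test", "tool_call"
    detail: str
    outcome: str   # e.g. "ok", "failed"


def attempts(steps: List[Step], agent_id: int, ticket_id: str, day: date) -> List[Step]:
    """Return one agent's steps on one ticket for one calendar day, in time order."""
    matching = [
        s for s in steps
        if s.agent_id == agent_id
        and s.ticket_id == ticket_id
        and s.timestamp.date() == day
    ]
    return sorted(matching, key=lambda s: s.timestamp)


# Made-up sample records, deliberately out of order.
trajectory = [
    Step(17, "T-204", datetime(2024, 5, 14, 9, 5), "test", "pytest tests/test_auth.py", "failed"),
    Step(17, "T-204", datetime(2024, 5, 14, 9, 1), "edit", "patch auth.py", "ok"),
    Step(17, "T-204", datetime(2024, 5, 15, 10, 0), "test", "pytest", "ok"),
    Step(3, "T-204", datetime(2024, 5, 14, 9, 2), "edit", "patch README", "ok"),
]

tuesday = date(2024, 5, 14)
found = attempts(trajectory, agent_id=17, ticket_id="T-204", day=tuesday)
for step in found:
    print(step.timestamp.time(), step.action, step.detail, step.outcome)
```

The same records, kept in order, are also what makes a run replayable: each step carries enough structure to be re-executed or compared against another agent's attempt.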
FAQ
What's the first thing to swap when scaling?
Memory. Per-process state is the bottleneck that causes the weirdest failures: silently out-of-sync agents, contradictory decisions, and wasted retrievals. Moving to a shared workspace is the highest-leverage migration.
Do I need both Deeplake and Hivemind?
Most teams start with Hivemind (memory + sessions + traces) and add Deeplake (tensor data tier) when they begin training or running large-scale retrieval. Hivemind alone covers many production use cases.
How many concurrent sessions can Hivemind handle?
Thousands per workspace is a normal working load. Workspaces are namespaces inside a multi-tenant service, so the cost model is per-query, not per-provision.
What about multi-region?
Hivemind runs multi-region. Pick the region closest to your agents; cross-region reads are supported with caching.
Can I start with Claude Code only and expand?
Yes. Claude Code is the most common entry point. Add Codex, Cursor, or custom agents to the same workspace when needed; they all speak MCP.
What's the cost curve as I scale?
Sub-linear per agent: workspaces share infrastructure. Per-query pricing on the memory layer; object-storage pricing on the data tier. Free tier for early work.