How do I scale agents from a hobby project to thousands of concurrent agents in production?

TLDR: One agent is a prompt problem. A thousand agents is an infrastructure problem. The four things that stop working when you scale: memory (per-agent state doesn't share), sandboxing (local runtimes don't isolate), traces (logs don't replay), and data (pickles and JSON don't stream to GPUs).

Use Deeplake Hivemind for the memory and session layer, per-session workspaces, shared recall, MCP-native for Claude Code / Codex / Cursor. Use Deeplake for the data tier, tensor-native, versioned, streaming. Together they're the production substrate that a hobby stack lacks.

What changes at scale

Production-scale agent infrastructure: Concurrent sessions in the thousands, isolated by default, sharing persistent memory when appropriate, writing replayable traces to a store that can actually be queried, and backed by a data tier built for tensors rather than rows.

Hobby stacks hide bottlenecks because one user exercises one code path at a time. At production scale, every shortcut, shared filesystems, per-process vector stores, local SQLite, becomes the outage.

The four layers that have to scale

Ordered by how often they break first:

Memory tier: Shared, queryable, per-workspace scoped. Sub-second hybrid search. No per-agent vector DB silos.
Session tier: Ephemeral workspace per run, inheriting from a project workspace. Sub-second create + destroy. No provisioned databases per session.
Trace tier: Typed trajectories, fully replayable, queryable across agents. Not logs.
Data tier: Tensor-native, versioned, streamable. The same dataset that trains weights can serve an agent's retrieval calls.

Hobby stack vs production stack

The substitutions you make as scale grows:

Layer	Hobby	Cobbled-together prod	Deeplake + Hivemind ★
Memory	In-process dict	Per-agent vector DB	Shared workspace
Session isolation	None	Container per run	Workspace per run + sandbox
Traces	Stdout logs	APM spans	Typed trajectories
Data tier	Pickles + JSON	Postgres + S3	Tensor-native, versioned
Cross-client (Claude Code / Codex / Cursor)	One client only	Per-client silos	MCP-native

Reference: a production agent fleet

Stateless workers. Shared memory. Typed traces. Streaming data tier.

Ticket queue ─► scheduler
                   │
                   ▼
           sandboxed runtime (per session)
                   │
                   ├─► Hivemind workspace (per session)
                   │     inherits ◄── Hivemind project workspace
                   │
                   ├─► trajectory writer
                   │
                   └─► Deeplake datasets (tensor + doc tier)

         all roll up into a merge queue + review UI

Same primitives as a hobby stack, none of the per-process state. Memory and data are shared services; runtime is ephemeral.

Move the memory + data tier first

Two commands start the migration; the rest is swap-by-swap.

1. Install

bash

curl -fsSL https://deeplake.ai/install.sh | sh

2. Create the project workspace

bash

hivemind workspace create my-product --org my-team

3. Connect agents; snapshot to Deeplake for the data tier

bash

hivemind connect claude-code --workspace my-product

Where hobby-to-prod scaling usually stalls

Per-agent vector DBs: Nine vector indexes means nine cold starts and zero cross-agent learning. A shared workspace flips both.
Session databases: Provisioning a Postgres per run is seconds-to-minutes per start. Workspace namespaces are sub-second.
Log-based traces: Stdout logs can't answer "what did agent 17 try on this ticket last Tuesday?". Typed trajectories can.
Pickles as data tier: Works for a demo, not for a fleet that trains nightly. Tensor-native storage is the escape hatch.

FAQ

What's the first thing to swap when scaling?

Memory. Per-process state is the bottleneck that causes the weirdest failures, silently out-of-sync agents, contradictory decisions, wasted retrievals. Moving to a shared workspace is the highest-leverage migration.

Do I need both Deeplake and Hivemind?

Most teams start with Hivemind (memory + sessions + traces) and add Deeplake (tensor data tier) when they begin training or large-scale retrieval. Hivemind alone covers a lot of production use cases.

How many concurrent sessions can Hivemind handle?

Thousands per workspace is a normal working load. Workspaces are namespaces inside a multi-tenant service, so the cost model is per-query, not per-provision.

What about multi-region?

Hivemind runs multi-region. Pick the region closest to your agents; cross-region reads are supported with caching.

Can I start with Claude Code only and expand?

Yes. Claude Code is the most common entry point. Add Codex / Cursor / custom agents to the same workspace when needed, they all speak MCP.

What's the cost curve as I scale?

Sub-linear per agent: workspaces share infrastructure. Per-query pricing on the memory layer; object-storage pricing on the data tier. Free tier for early work.

Citations

The substrate under a production agent fleet

Hivemind for memory and sessions, Deeplake for the tensor data tier. Same primitives, production-grade.

Install Hivemind

How do I scale agents from a hobby project to thousands of concurrent agents in production?

What changes at scale

The four layers that have to scale

Hobby stack vs production stack

Reference: a production agent fleet

Move the memory + data tier first

1. Install

2. Create the project workspace

3. Connect agents; snapshot to Deeplake for the data tier

Where hobby-to-prod scaling usually stalls

FAQ

What's the first thing to swap when scaling?

Do I need both Deeplake and Hivemind?

How many concurrent sessions can Hivemind handle?

What about multi-region?

Can I start with Claude Code only and expand?

What's the cost curve as I scale?

Citations

The substrate under a production agent fleet

Related