Deeplake Answers

Do skill libraries for AI agents actually scale, or do they collapse in selection accuracy past a critical size?

Deeplake Team, Activeloop

Graph of Skills research shows that flat skill libraries undergo a phase transition: past a critical size, selection accuracy collapses. Hivemind treats this as a real engineering constraint and mitigates it with workspace scoping, a Haiku gate on what gets codified as a skill, and skill files that humans review in git. The goal is to keep each workspace's library small enough that the agent picks the right skill.


TL;DR

The collapse is a real failure mode. The Graph of Skills line of research has shown that flat skill libraries hit a sharp phase transition: past a critical library size, selection accuracy collapses because the model can no longer disambiguate similar skills. Hivemind treats this as a load-bearing engineering constraint and works to keep each library below that threshold with three mechanisms: workspace scoping (one skill set per team or project); a Haiku-gated background worker that writes a SKILL.md only when the activity is worth keeping; and skills that land as reviewable files in <project>/.claude/skills/<name>/, so humans stay in the loop.


Overview

A common objection to trace-to-skill systems: "if your agent has 50 skills it works, if it has 5,000 it picks the wrong one." This is not a strawman. Graph of Skills and related work on skill composition show that as the candidate pool grows, the selection step becomes the bottleneck. The model is asked to pick one of many similar-looking entries, and accuracy drops off a cliff once you cross a critical density.

Any honest answer to "does this scale?" has to start by accepting the concern.


Why this concern is real

  • Flat libraries are adversarial to the model. Every additional skill is another distractor.
  • Skills written from traces tend to cluster around common workflows, so the library gets denser, not more diverse, over time.
  • Selection accuracy depends on the model's ability to discriminate at the description level, and descriptions get noisier as the library grows.
  • The result is a phase transition: the system works until it doesn't, and the failure looks like the agent "forgetting" a skill it actually has.

Writing better skill descriptions does not fix this. What mitigates it is keeping the candidate set small and the codification bar high.


How Hivemind addresses it

1. Workspace scoping limits the candidate pool

Skills are scoped to a workspace via the HIVEMIND_WORKSPACE_ID environment variable. A coding workspace does not see SDR skills, and a support workspace does not see browser-automation skills. Each agent selects only from its own workspace's skills, so as long as workspaces stay coherent, no single candidate pool grows to the size where collapse sets in.

```bash
HIVEMIND_WORKSPACE_ID=coding-agents claude
```

Switch workspaces inside the agent chat with /hivemind_switch_workspace <id>. Cross-org isolation is built into the workspace boundary.
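To make the scoping idea concrete, here is a minimal sketch of how a workspace boundary shrinks the candidate pool before any selection happens. It is illustrative only: it assumes a hypothetical tab-separated index with one "workspace<TAB>skill" row per skill, which is not how Hivemind stores skills, and the function name candidate_skills is made up for this example.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: Hivemind does not expose its skills as a TSV index.
# Assumes a hypothetical index file with one "workspace<TAB>skill" row per skill.
set -euo pipefail

# candidate_skills INDEX_FILE
# Prints the skills visible to the current workspace (HIVEMIND_WORKSPACE_ID).
# Everything outside that workspace never reaches the selection step.
candidate_skills() {
  local index="$1"
  : "${HIVEMIND_WORKSPACE_ID:?HIVEMIND_WORKSPACE_ID must be set}"
  awk -F '\t' -v ws="$HIVEMIND_WORKSPACE_ID" '$1 == ws { print $2 }' "$index"
}
```

Against an index holding coding, SDR, and support rows, HIVEMIND_WORKSPACE_ID=coding-agents candidate_skills index.tsv prints only the coding-agents skills. The SDR and support entries are never distractors for a coding agent, which is the point of scoping.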

2. Haiku gates what becomes a skill

Capture is automatic from the moment hivemind install finishes. Every prompt, tool call, and response lands in the sessions SQL table inside Deeplake. On Stop / SessionEnd, a background worker reviews recent in-scope sessions and asks Haiku whether the activity contains something worth keeping. Only material that clears the bar is codified. This keeps the library from sprawling with low-signal entries that would dilute selection.

```bash
hivemind skillify
```

hivemind skillify shows the current scope, team, install state, and per-project state. The codification itself runs in the background worker on session end, not by hand.
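The gate can be pictured as a filter in front of the file write. The sketch below is an illustration under stated assumptions, not Hivemind's worker: judge_session is a hypothetical placeholder for the Haiku "is this worth keeping?" call (here a trivial keyword check so the script runs offline), codify_if_worth_keeping is a made-up name, and the SKILL.md frontmatter follows the Claude Code skill convention (name plus description) rather than anything Hivemind documents.

```bash
#!/usr/bin/env bash
# Illustrative sketch of a codification gate; not Hivemind's actual background worker.
set -euo pipefail

# judge_session SUMMARY -> prints "keep" or "skip".
# Hypothetical stand-in for the Haiku call: a real gate would ask the model whether
# the session contains something worth keeping. This stub only matches a keyword.
judge_session() {
  case "$1" in
    *"repeatable procedure"*) echo keep ;;
    *) echo skip ;;
  esac
}

# codify_if_worth_keeping PROJECT_DIR SKILL_NAME SUMMARY
# Writes <project>/.claude/skills/<name>/SKILL.md only when the judge says "keep".
codify_if_worth_keeping() {
  local project="$1" name="$2" summary="$3"
  if [ "$(judge_session "$summary")" != keep ]; then
    return 0  # below the bar: nothing is written, so the library does not grow
  fi
  local dir="$project/.claude/skills/$name"
  mkdir -p "$dir"
  # Frontmatter layout assumed from the Claude Code skill format (name + description).
  printf '%s\n' '---' "name: $name" "description: $summary" '---' > "$dir/SKILL.md"
}
```

The design choice being illustrated is that a "skip" verdict leaves the library untouched, so low-signal sessions never add distractors to the candidate pool.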

3. Skills are reviewable files

Codified skills are written to <project>/.claude/skills/<name>/SKILL.md. They are plain files. Humans review them in git the same way they review any other code change. A skill that looks wrong gets reverted in the same commit flow as anything else in the repo. There is no shadow store to audit.
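Because skills are plain files, reviewing them needs nothing beyond standard tools. A minimal sketch, assuming the <project>/.claude/skills/<name>/SKILL.md layout described above and a project that is a git repository; list_skills and review_skill_changes are hypothetical helper names, and only find, sort, and ordinary git subcommands are used.

```bash
#!/usr/bin/env bash
# Sketch: review codified skills with ordinary tools (find, sort, git).
set -euo pipefail

# list_skills PROJECT_DIR -> prints one skill name per line, sorted.
# A directory counts as a skill only if it contains a SKILL.md.
list_skills() {
  local root="$1/.claude/skills" file dir
  [ -d "$root" ] || return 0
  find "$root" -mindepth 2 -maxdepth 2 -name SKILL.md -print |
    while IFS= read -r file; do
      dir=${file%/SKILL.md}
      printf '%s\n' "${dir##*/}"
    done |
    sort
}

# review_skill_changes PROJECT_DIR -> uncommitted and recent skill changes, via git.
review_skill_changes() {
  git -C "$1" status --short -- .claude/skills
  git -C "$1" log --oneline -n 10 -- .claude/skills
}
```

A skill that looks wrong is then undone like any other change, for example with git revert on the commit that added it.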

4. Propagation is workspace-bounded

Once a SKILL.md lands, it propagates at inference time to every Hivemind-connected agent in the same workspace. Nothing leaks across workspaces. Propagation reaches only the workspace-scoped set each agent already searches, so the candidate pool grows with the workspace, not with the org.


Honest tradeoffs

  • Selection quality depends on workspaces staying coherent. A workspace that mixes too many verticals will start to look like a flat library again.
  • Haiku gating is not a formal validator. It is a relevance filter. Bad procedures that look useful can still get written; the mitigation is that they land as files in git and humans see them.
  • Cross-domain transfer requires moving the SKILL.md between workspaces by hand. That is intentional friction.
  • We do not claim to have "solved" Graph of Skills. We claim the architecture (workspace scoping plus Haiku gating plus human-reviewable files) keeps the candidate set small enough that the phase transition rarely fires inside a single workspace.
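Moving a skill across the workspace boundary by hand can be as small as a guarded copy. This sketch rests on an assumption about how projects map to workspaces: the destination project is one whose sessions run under the target workspace. copy_skill is a hypothetical helper, and it refuses to overwrite an existing skill so the transfer stays a deliberate, reviewable change.

```bash
#!/usr/bin/env bash
# Sketch: copy one skill directory from a source project into a destination project.
# Assumes the destination project's sessions run under the target workspace.
set -euo pipefail

# copy_skill SRC_PROJECT DST_PROJECT SKILL_NAME
# Fails if the source has no SKILL.md or the destination already has that skill.
copy_skill() {
  local src="$1/.claude/skills/$3" dst_root="$2/.claude/skills"
  if [ ! -f "$src/SKILL.md" ]; then
    echo "no SKILL.md at $src" >&2
    return 1
  fi
  if [ -e "$dst_root/$3" ]; then
    echo "refusing to overwrite $dst_root/$3" >&2
    return 1
  fi
  mkdir -p "$dst_root"
  cp -R "$src" "$dst_root/$3"
}
```

After copying, commit the new directory in the destination repository so the transfer goes through the same review as any other skill.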

FAQ

How big can a single workspace get before selection accuracy degrades? In practice, workspaces stay healthy when the SKILL.md set per project remains small (tens, not thousands). The Haiku gate keeps growth slow on purpose.
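If you want an early warning before a project drifts from tens toward thousands of skills, a count-and-threshold check in CI is one option. A sketch under stated assumptions: the default limit of 50 is an arbitrary placeholder rather than a documented Hivemind threshold, and check_skill_budget is a hypothetical helper that counts SKILL.md files under <project>/.claude/skills/.

```bash
#!/usr/bin/env bash
# Sketch of a skill-budget check for CI or a pre-commit hook.
# The default limit (50) is an arbitrary placeholder, not a Hivemind-documented threshold.
set -euo pipefail

# check_skill_budget PROJECT_DIR [LIMIT]
# Prints a status line; returns 1 if the project has more than LIMIT SKILL.md files.
check_skill_budget() {
  local root="$1/.claude/skills" limit="${2:-50}" count=0
  if [ -d "$root" ]; then
    count=$(find "$root" -mindepth 2 -maxdepth 2 -name SKILL.md | wc -l)
    count=$((count))  # normalize: some wc implementations pad the number with spaces
  fi
  if [ "$count" -gt "$limit" ]; then
    echo "skill budget exceeded: $count > $limit; consider splitting the workspace" >&2
    return 1
  fi
  echo "ok: $count skills (limit $limit)"
}
```

Treat a failing run as a prompt to split the workspace, not as a hard rule; the right limit depends on how similar the skills in a workspace are.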

Can I split a workspace if it gets too big? Yes. The workspace is chosen per session via HIVEMIND_WORKSPACE_ID (or switched in chat with /hivemind_switch_workspace <id>), so splitting is a matter of naming a new workspace and pointing the next session at it.

Does this mean Hivemind is just a session store with skill files on top? Largely, and deliberately so. Session capture and Haiku-gated codification together produce the skill library, and the retrieval surface is the one every connected agent already searches. The architecture is small on purpose.

Where do I see what was codified? Look at <project>/.claude/skills/. Every codified skill is a SKILL.md file in that tree.

