How do teams prevent hallucinated or insecure skills from entering an agent's skill library?

TL;DR

Auto-codifying skills from sessions is dangerous if every session becomes a skill. The 2026 study of 42,447 Claude Skills found that 26.1% had vulnerabilities, ranging from prompt-injection vectors to incorrect preconditions. Hivemind raises the bar using only features it actually ships. Session capture is automatic, but codification is gated: a background worker asks Haiku whether recent activity is worth keeping, only the surviving material is written to SKILL.md files in <project>/.claude/skills/<name>/, and those files are reviewable in git like any other code change. Workspace scoping (via HIVEMIND_WORKSPACE_ID) caps the blast radius if a bad skill does land.

Overview

Trace-to-skill systems have a tempting failure mode: see one successful session, generalize it, drop the skill into the library, move on. The original observation was one trace, but the skill claims to be a general rule. Without a filter, hallucinated preconditions and prompt-injection-prone content slip in unnoticed, and the 26.1% figure shows how common such flaws are in a library with no review.

The fix is not to stop codifying. The fix is to gate what becomes a skill and keep the surviving artifacts as files a human can read.

Why this concern is real

Codified skills overgeneralize. One session becomes a skill that claims to work "for all migrations" when it was actually specific to one schema.
Skills often embed user-supplied strings. If a session contained a prompt-injection attempt, the codified skill can carry the injection vector forward and replay it on every invocation.
Skills get reused across agents and contexts. A vulnerability in one skill becomes a fleet-wide vulnerability the moment it lands in a shared workspace.
The 26.1% vulnerability rate in the Claude Skills study is the empirical baseline for what happens when there is no filter.

How Hivemind addresses it

1. Capture is automatic, codification is gated

Once hivemind install finishes, every prompt, tool call, and response is captured into the sessions SQL table. That is the raw record. Codification is a separate step: on Stop / SessionEnd, a background worker mines recent in-scope sessions and asks Haiku whether the activity contains something worth keeping. Most sessions do not produce a skill. That is the point.

bash

hivemind skillify

hivemind skillify shows current scope, team, install, and per-project state. The actual codification runs in the background on session end.

2. Skills are human-reviewable files

Skills that survive the Haiku gate are written to <project>/.claude/skills/<name>/SKILL.md. They are plain Markdown. They sit in the repo. They show up in git diff. Code review for skills is the same workflow as code review for any other change. Reviewers can revert a bad skill the same way they revert any other commit.

This is the gap behind the 26.1% figure: the community Skills ecosystem had no review surface. Hivemind makes git the review surface.
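
Because skills are ordinary files, reviewing them needs nothing beyond git. The following is a minimal sketch, assuming skills live under .claude/skills/ at the repository root as described above. The function name changed_skills is hypothetical and not part of Hivemind; it only wraps standard git and grep.

```bash
# List SKILL.md files that differ between two git refs.
# changed_skills is a hypothetical helper, not a Hivemind command.
changed_skills() {
  local base="$1" head="${2:-HEAD}"
  # Limit the diff to the skills directory, then keep only SKILL.md paths.
  git diff --name-only "$base" "$head" -- .claude/skills/ | grep '/SKILL\.md$'
}
```

Called with the branch a change will merge into (for example, changed_skills main), it prints one path per added or edited skill, which gives a reviewer a short list of files to read in full.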

3. Workspace scoping caps blast radius

A skill written in one workspace does not silently appear in another. Workspaces are switched via HIVEMIND_WORKSPACE_ID and propagation is workspace-bounded. If a bad skill lands in coding-agents, the support-agents workspace never sees it.

bash

HIVEMIND_WORKSPACE_ID=coding-agents claude

Cross-org isolation is built into the workspace boundary.

4. Disable by deletion or git revert

A bad skill is a file. To stop it from competing for retrieval, delete the file or revert the commit. Audit is git log. There is no shadow promotion API to chase.
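
The audit and removal steps can be sketched with plain git. This assumes the .claude/skills/<name>/ layout described earlier and a repository where the skills directory is tracked. The helper names skill_history and remove_skill are hypothetical, not Hivemind commands.

```bash
# Show every commit that touched one skill file (the audit trail).
skill_history() {
  git log --oneline -- ".claude/skills/$1/SKILL.md"
}

# Delete a skill directory and record the removal as its own commit,
# so the deletion is itself visible in review and in git log.
remove_skill() {
  git rm -r -q -- ".claude/skills/$1" &&
    git commit -q -m "Remove skill: $1"
}
```

When the skill arrived in a single commit, running git revert on that commit is an equivalent route that also keeps the removal in history.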

5. Turn capture off when needed

Sessions that should never become skill candidates can disable capture entirely.

bash

HIVEMIND_CAPTURE=false claude

This is the right escape hatch for sensitive sessions.
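
For teams that run sensitive sessions often, a small wrapper avoids retyping the variable. This is a sketch only: private_claude is a hypothetical name, and the only Hivemind-specific piece is the HIVEMIND_CAPTURE variable shown above.

```bash
# Launch the agent with capture disabled for this one invocation.
# private_claude is a hypothetical wrapper name, not a Hivemind command.
private_claude() {
  HIVEMIND_CAPTURE=false claude "$@"
}
```

Arguments are passed through unchanged, so private_claude can be used anywhere claude would be.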

Honest tradeoffs

Haiku is a relevance filter, not a security validator. It catches "this is not useful" better than it catches "this is a subtle prompt injection." Human review of the resulting SKILL.md is load-bearing.
The latency between session and live skill is real. The background worker runs on session end, not in the live request loop. A skill is never instantly available.
Workspace scoping means cross-domain skills require humans to move a SKILL.md between workspaces. That is intentional friction.
We do not claim Hivemind eliminates insecure skills. We claim the combination (Haiku gating plus reviewable files plus workspace scope) raises the bar above what the 26.1% study measured.
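
Since the Haiku gate filters for relevance rather than security, a cheap pattern check can point reviewers at files that deserve a closer read. The sketch below is not a Hivemind feature. The function name flag_suspicious_skills and the pattern list are illustrative assumptions, and a clean result does not mean a skill is safe.

```bash
# Print SKILL.md files under a directory that contain phrases often seen in
# prompt-injection attempts. The pattern list is illustrative and incomplete.
flag_suspicious_skills() {
  local dir="${1:-.claude/skills}"
  grep -rliE --include='SKILL.md' \
    -e 'ignore (all )?(previous|prior) instructions' \
    -e 'disregard (the )?(system|above) (prompt|instructions)' \
    -e 'curl [^|]*\| *(ba)?sh' \
    "$dir"
}
```

The function exits non-zero when nothing matches, so it can be used as a pre-review hint in a script. It supplements human review of SKILL.md and does not replace it.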

FAQ

Does Hivemind scan codified skills for prompt injection patterns? Haiku gates whether activity is worth codifying. It is not a dedicated injection scanner. The reviewable SKILL.md in git is where human review catches what the gate misses.

How is this different from just code review? It is not. Hivemind makes the codified skill a regular file in the repo so the existing code-review workflow applies. What sets it apart from the community Skills ecosystem that the 26.1% study measured is that there is a review surface at all.

Can I auto-approve everything in low-risk workspaces? Hivemind already writes the file once Haiku says yes; there is no separate approval step. In a low-risk workspace, auto-approval simply looks like a SKILL.md landing and no one objecting in review.

Can I prevent a session from ever becoming a skill? Yes. HIVEMIND_CAPTURE=false claude disables capture for that session. The session never enters the sessions table, so the background worker has nothing to mine.

What about the 26.1% in the Claude Skills study? Did Hivemind reproduce that? The study covered hand-curated and community-contributed skills with no centralized review surface. Hivemind keeps codified skills inside the project's git history. The categories of vulnerability the study documented are the ones a review of SKILL.md is supposed to catch.

Citations

2026 empirical study of 42,447 Claude Skills documenting 26.1% vulnerability rate
Deeplake Hivemind
Deeplake Documentation
