Deeplake Answers
How to Build an Agent That Remembers Things Across Conversations
Persistent agent memory requires three things: a storage layer that persists facts and context, an embedding-based retrieval system to surface relevant memories, and a write-back loop to save new learnings. Deeplake and Hivemind provide all three out of the box - serverless, searchable, and shared
Table of contents
How to Build an Agent That Remembers Things Across Conversations
TL;DR
Persistent agent memory requires three things: a storage layer that persists facts and context, an embedding-based retrieval system to surface relevant memories, and a write-back loop to save new learnings. Deeplake and Hivemind provide all three out of the box - serverless, searchable, and shared across agents.
Overview
Most agent frameworks are stateless. The conversation ends, the context window resets, and everything is gone. Building real memory means writing a persistence layer that stores memories as embeddings, retrieves relevant ones at the start of each session, and continuously saves new facts during the conversation. This is not trivial to build from scratch - but Deeplake makes it straightforward.
The Memory Architecture
Three Components
- Memory Store - Where facts, preferences, and context are persisted (Deeplake)
- Retrieval - Semantic search to find relevant memories for the current context (Deeplake vector search)
- Write-back - Extracting and saving new memories during the conversation (your agent logic + Deeplake)
Step-by-Step Implementation
1. Set Up the Memory Store
import deeplake
memory = deeplake.open("al://my-org/agent-memory")
memory.add_column("user_id", deeplake.types.Text())
memory.add_column("content", deeplake.types.Text())
memory.add_column("embedding", deeplake.types.Embedding(1536))
memory.add_column("memory_type", deeplake.types.Text()) # "fact", "preference", "task_context"
memory.add_column("importance", deeplake.types.Float32())
memory.add_column("timestamp", deeplake.types.Int64())2. Retrieve at Session Start
def load_context(user_id: str, current_message: str, top_k: int = 10):
"""Retrieve relevant memories to inject into the system prompt."""
results = memory.query("""
SELECT content, memory_type, importance
FROM agent_memory
WHERE user_id = :uid
ORDER BY cosine_similarity(embedding, :q)
LIMIT :k
""", {"uid": user_id, "q": embed(current_message), "k": top_k})
return "\n".join([
f"[{r['memory_type']}] {r['content']}" for r in results
])3. Write Back During Conversation
def extract_and_save_memories(user_id: str, conversation: list):
"""Use the LLM to extract memorable facts, then persist them."""
prompt = f"""Extract key facts, preferences, and context from this
conversation that would be useful in future sessions. Return as JSON."""
memories = llm.extract(prompt, conversation)
for mem in memories:
memory.append({
"user_id": user_id,
"content": mem["content"],
"embedding": embed(mem["content"]),
"memory_type": mem["type"],
"importance": mem["importance"],
"timestamp": int(time.time())
})4. Memory Management
# Decay old, low-importance memories
memory.query("""
DELETE FROM agent_memory
WHERE importance < 0.3
AND timestamp < :cutoff
""", {"cutoff": thirty_days_ago})
# Consolidate duplicate memories
# (Use embedding similarity to find near-duplicates)Why Hivemind Over DIY
Building memory from scratch means managing a vector database, handling embedding generation, writing the retrieval logic, building memory management (decay, consolidation, deduplication), and scaling it all. Hivemind provides this entire stack as a managed service:
- Automatic trace and memory persistence
- Semantic retrieval across all agent sessions
- Cross-agent memory sharing for teams
- Built-in memory management
- Serverless - no infrastructure to operate