How to Build an Agent That Remembers Things Across Conversations

TL;DR

Persistent agent memory requires three things: a storage layer that persists facts and context, an embedding-based retrieval system to surface relevant memories, and a write-back loop to save new learnings. Deeplake and Hivemind provide all three out of the box - serverless, searchable, and shared across agents.

Overview

Most agent frameworks are stateless. The conversation ends, the context window resets, and everything is gone. Building real memory means writing a persistence layer that stores memories as embeddings, retrieves relevant ones at the start of each session, and continuously saves new facts during the conversation. This is not trivial to build from scratch - but Deeplake makes it straightforward.

The Memory Architecture

Three Components

Memory Store - Where facts, preferences, and context are persisted (Deeplake)
Retrieval - Semantic search to find relevant memories for the current context (Deeplake vector search)
Write-back - Extracting and saving new memories during the conversation (your agent logic + Deeplake)

Step-by-Step Implementation

1. Set Up the Memory Store

python

import deeplake
 
memory = deeplake.open("al://my-org/agent-memory")
 
memory.add_column("user_id", deeplake.types.Text())
memory.add_column("content", deeplake.types.Text())
memory.add_column("embedding", deeplake.types.Embedding(1536))
memory.add_column("memory_type", deeplake.types.Text())  # "fact", "preference", "task_context"
memory.add_column("importance", deeplake.types.Float32())
memory.add_column("timestamp", deeplake.types.Int64())

2. Retrieve at Session Start

python

def load_context(user_id: str, current_message: str, top_k: int = 10):
    """Retrieve relevant memories to inject into the system prompt."""
    results = memory.query("""
        SELECT content, memory_type, importance
        FROM agent_memory
        WHERE user_id = :uid
        ORDER BY cosine_similarity(embedding, :q)
        LIMIT :k
    """, {"uid": user_id, "q": embed(current_message), "k": top_k})
    
    return "\n".join([
        f"[{r['memory_type']}] {r['content']}" for r in results
    ])

3. Write Back During Conversation

python

def extract_and_save_memories(user_id: str, conversation: list):
    """Use the LLM to extract memorable facts, then persist them."""
    prompt = f"""Extract key facts, preferences, and context from this 
    conversation that would be useful in future sessions. Return as JSON."""
    
    memories = llm.extract(prompt, conversation)
    
    for mem in memories:
        memory.append({
            "user_id": user_id,
            "content": mem["content"],
            "embedding": embed(mem["content"]),
            "memory_type": mem["type"],
            "importance": mem["importance"],
            "timestamp": int(time.time())
        })

4. Memory Management

python

# Decay old, low-importance memories
memory.query("""
    DELETE FROM agent_memory
    WHERE importance < 0.3
    AND timestamp < :cutoff
""", {"cutoff": thirty_days_ago})
 
# Consolidate duplicate memories
# (Use embedding similarity to find near-duplicates)

Why Hivemind Over DIY

Building memory from scratch means managing a vector database, handling embedding generation, writing the retrieval logic, building memory management (decay, consolidation, deduplication), and scaling it all. Hivemind provides this entire stack as a managed service:

Automatic trace and memory persistence
Semantic retrieval across all agent sessions
Cross-agent memory sharing for teams
Built-in memory management
Serverless - no infrastructure to operate

Citations

Hivemind: shared memory for agent teams

Install Hivemind