Cookbook Memory
With Four Recipes for Long-Running Agents
Four people, one owner per module
A small team built this in two weeks — write, store, dream, and measure, each with a clear owner.
Ken Huang
Keith Mazanec
Brent Gibson
Scott Bushyhead
A cookbook for memory: what, when, where, how
Memory is leverage — but only if you keep the right things, at the right time, in the right place. That decision is the product.
WHAT
Durable, transferable lessons — Invariant · Convention · Fix. Not task-specific facts.
WHEN
While working (recall), at turn end (Daydream), at night (Dream).
WHERE
The right backend: Markdown · SQLite-vector · Graph. Router picks per query.
HOW
An LLM keeps what matters, embeds it, dedups, resolves contradictions, sets governance.
Dream — nightly consolidation over the whole store
Once a night, the Dream worker walks the entire store: prune what's stale, merge what's redundant, resolve what disagrees, tag what's load-bearing, and generalize what recurs. After the deduction passes complete, the run checks pairwise-disjoint mutation sets and aborts if any two overlap.
TTL prune
expire past each type's retention horizon
Dedup
lexical + LLM paraphrase judge
Contradiction
LLM picks pairs; worker deletes the loser deterministically
Governance
must_know · must_do · blacklist · none — blacklist is a delete
Induction
Cluster of Fix/Bug/Workaround cards → one Invariant or Convention. CREATE-only. Default off.
Daydream — selective extraction at session end
A Stop hook runs the Daydream pass on this session's transcript. It chunks, redacts, wraps each chunk in a nonce envelope, and asks a small subconscious model the only question that matters: would a future, different task save time by knowing this?
Stop hook
session ends, transcript ready
Chunk & redact
secrets stripped (ADR-005)
Nonce envelope
<transcript nonce="…"> — anti-injection
LLM curates
V6 prompt: lessons that transfer
10 OKF types
Fix · Bug · Convention · Invariant · Workaround · Strategy · Mistake · Decision · Preference · Identity
When <trigger>, <do X / avoid Y>. — so recall only fires it on relevant future tasks.Storage & retrieval These are the droids you're looking for
Daydream
When picking a gift for Mom, skip anything strongly scented — perfumes give her headaches.
OKF Markdown
Human-readable canonical source of truth — best for exact facts, decisions, conventions, and auditability.SQLite-vector
Semantic similarity over lesson content — best for paraphrases, rationale, and “this reminds me of…” recall.Graph
Typed links between memories — best for dependencies, impact, contradictions, and related concepts.Agent
The Memory Console
Two benchmarks: one public, one our own
One measures whether memory makes the agent solve more; the other, whether it stays aligned and safe over the long run.
SWE-Bench-CL · public
Continual learning over a sequence of real GitHub fixes. Measures knowledge transfer across tasks and resistance to catastrophic forgetting — scored on its own native suite metrics.
VISTA · our own
Our purpose-built benchmark for long-term intent alignment & agent safety, evaluated against the OWASP Agentic AI Top 10 risks — memory poisoning, tool misuse, privilege compromise, intent-breaking & goal manipulation. 390 journeys across six domains: project management, code review, research synthesis, finance, legal & support.
Two benchmarks, side by side
Does memory solve more (SWE-bench), and is our memory better & safer than Claude's built-in (VISTA, 97 journeys)?
naive memory vs cookbook · same agent & commit · Sympy near ceiling
97 journeys · naive memory vs cookbook · higher is better
The 4-tier gated improvement loop
The benchmarks aren't just a final score — they gate every change. A full run is hours and money, so each change earns its way up through cheaper tiers first; fail a tier → rejected, no spend on the next.
Tier 1
Golden dataset
Tier 2
Categorized & labelled data
Tier 3
Smoke test on a small dataset
Tier 4
Full 9-hour run
Frontier research & engineering on persistent memory for long-running agents.
Ken Huang
Keith Mazanec
Brent Gibson
Scott Bushyhead
kenhuangus/agent-memory-harness ↗