As built, end to end: run_bench drives the Claude Code CLI once per
journey in one of three memory modes; a recall step reads memory (native files or the cookbook-memory MCP);
the output is scored by a native evaluator (VISTA or SWE-Bench-CL). On session end a
Stop hook fires the daydream writer, which extracts memories with an OpenRouter model and
persists them through a router over three storage backends.
Top band: the eval loop — run_bench drives one Claude Code journey per task,
a recall step reads memory, and a native evaluator scores the recorded trajectory. Bottom bands: the asynchronous
write path — the Stop hook fires the daydream writer, which extracts memories via OpenRouter and persists them
through the router over the markdown / vector / graph backends.
Before answering, the agent recalls. In builtin mode that is Claude Code's own
Grep/Read over the laid-down sessions/ files; in plugin-real mode it is the
cookbook-memory MCP recall tool, which routes the query to its backend(s).
When the session ends, a Stop hook fires daydream-cli. It reads the new
transcript delta, filters noise, asks an OpenRouter model what to remember, and writes each extracted
MemoryItem through the router. dream --all runs the nightly consolidation.
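The "read the new transcript delta, then filter noise" step can be sketched as follows. This is a minimal illustration, assuming the transcript is a JSONL file and that the writer remembers a byte offset between runs. The names `read_delta` and `is_noise` and the `role` / `text` record fields are hypothetical, not daydream-cli's actual API.

```python
import json

def read_delta(path, offset):
    """Return (records, new_offset) for JSONL lines appended after `offset` bytes."""
    records = []
    with open(path, "rb") as fh:
        fh.seek(offset)
        for raw in fh:
            line = raw.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                # Fail open: skip a malformed line instead of aborting the pass.
                continue
        new_offset = fh.tell()
    return records, new_offset

def is_noise(record):
    """Hypothetical filter: keep only user/assistant turns with some text."""
    text = str(record.get("text", "")).strip()
    return record.get("role") not in {"user", "assistant"} or len(text) < 3
```

Persisting `new_offset` after each pass is what would make the next Stop hook see only the lines written since the previous one.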
The run's recorded trajectories are scored by the benchmark's native evaluator: VISTA (poisoning resistance, targeted ASR, gold-retrieval F1, adaptation, RSI safety) and SWE-Bench-CL (forgetting / BWT / FWT / AULC). CODE tasks are graded by running the repo's tests.
build_store assembles the three backends behind a single RouterStore
and picks a routing profile — the plugin never sees a backend or an embedder. The profile
decides whether a query single-routes, fans out, or runs a cascade.
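As a rough sketch of how a routing profile might be picked, the function below keys off the two environment switches this page names for the accuracy profiles (VOYAGE_API_KEY and MEMEVAL_LOCAL_ANN). The function name, the `"default"` label, and the precedence between the two switches are assumptions for illustration, not build_store's actual logic.

```python
import os

def pick_profile(env=None):
    """Choose a routing profile name from environment switches (assumed precedence)."""
    env = os.environ if env is None else env
    if env.get("VOYAGE_API_KEY"):
        # Semantic classifier + Voyage embeddings + graph-to-vector cascade.
        return "accuracy"
    if env.get("MEMEVAL_LOCAL_ANN") == "1":
        # MiniLM embeddings + sqlite-vec ANN index.
        return "accuracy-local"
    # Deterministic, stdlib-only path.
    return "default"
```

Passing a plain dict as `env` keeps the choice testable without touching the real process environment.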
A classifier scores each query and picks a backend.
The default rule-based classifier is deterministic and stdlib-only; the accuracy profiles swap in a semantic, exemplar-based classifier over a real embedder.
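To make "deterministic and stdlib-only" concrete, here is a minimal sketch of a rule-based query classifier. The cue phrases, the regular expression, and the weights are all invented for illustration; the real classifier's rules are not given in this document.

```python
import re

# Hypothetical cues: relationship phrases suggest the graph backend.
GRAPH_CUES = ("related to", "depends on", "linked to", "connected to", "caused by")
# Hypothetical cues: file-like tokens and CONSTANT_NAMES suggest literal (markdown) recall.
LITERAL = re.compile(r"[\w./-]+\.\w{1,4}\b|\b[A-Z][A-Z0-9_]{2,}\b")

def classify(query):
    """Score a query per backend and return (backend, scores)."""
    lowered = query.lower()
    scores = {
        "markdown": 2.0 * len(LITERAL.findall(query)),
        "graph": 2.0 * sum(cue in lowered for cue in GRAPH_CUES),
        "vector": 1.0,  # semantic search is the fallback when no rule fires
    }
    # Fixed iteration order makes tie-breaking deterministic.
    order = ("markdown", "graph", "vector")
    best = max(order, key=lambda name: scores[name])
    return best, scores
```

Returning the full score table alongside the winner is one way a profile could decide to fan out or cascade instead of single-routing.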
VOYAGE_API_KEY is set: semantic classifier + Voyage embeddings + graph→vector cascade.
MEMEVAL_LOCAL_ANN=1: MiniLM + sqlite-vec ANN.
The router owns both read and write orchestration: dedup-on-write and write-routing (the markdown base is always persisted), plus routed reads with optional cascade, fusion, and reranking.
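The write side of that orchestration can be sketched in a few lines. This is an illustration only: the class name, the hash-based dedup key, and the rule that only linked items reach the graph are assumptions, and plain dicts stand in for the real backends.

```python
import hashlib

class SketchRouter:
    """Toy write path: dedup on write, markdown always persisted."""

    def __init__(self):
        self.markdown, self.vector, self.graph = {}, {}, {}
        self._seen = set()

    def write(self, text, links=()):
        # Normalize case and whitespace so trivially different copies collide.
        normalized = " ".join(text.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key in self._seen:
            return None  # dedup-on-write: drop the duplicate
        self._seen.add(key)
        self.markdown[key] = text  # source-of-truth base, always written
        self.vector[key] = text    # a real store would embed here
        if links:
            self.graph[key] = list(links)  # assumed rule: only linked items
        return key
```

An exact-hash key only catches near-verbatim repeats; a semantic dedup would need a similarity threshold instead.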
Each store keeps its own index so the router's chosen backend returns candidates in milliseconds — not by scanning everything.
Embeds on write into memory.db. The stdlib default uses a char-n-gram hashing embedder with
brute-force cosine; the opt-in accuracy-local path swaps in MiniLM embeddings and a
sqlite-vec ANN index with exact rerank, and accuracy uses Voyage embeddings.
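For readers unfamiliar with the stdlib default, the following is a minimal sketch of a character n-gram hashing embedder with brute-force cosine search. The dimension, the n-gram size, and the use of `zlib.crc32` as the hash are assumptions chosen for the example, not the shipped parameters.

```python
import math
import zlib

def embed(text, dim=256, n=3):
    """Hash each character n-gram into one of `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    padded = f" {text.lower()} "
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        # crc32 is stable across processes, unlike the built-in hash().
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def cosine(a, b):
    """Dot product; equals cosine similarity because vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))

def search(query, docs, k=3):
    """Brute force: score every document, return the top k (score, doc) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

Brute force is linear in the number of stored items, which is the cost the ANN index in the accuracy-local path is meant to avoid.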
OKF-native markdown notes plus a keyword → file inverted index for literal recall. This is the always-written source-of-truth base, so a memory survives even if the other indexes are rebuilt.
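A keyword → file inverted index can be illustrated with a small sketch. The tokenizer and the AND semantics for multi-word queries are assumptions; the actual index format is not described here.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9_]+")

def build_index(notes):
    """Map each lowercase token to the set of note paths that contain it."""
    index = defaultdict(set)
    for path, body in notes.items():
        for token in set(TOKEN.findall(body.lower())):
            index[token].add(path)
    return index

def lookup(index, query):
    """Return the paths that contain every token of the query."""
    tokens = TOKEN.findall(query.lower())
    if not tokens:
        return set()
    hits = [index.get(token, set()) for token in tokens]
    return set.intersection(*hits)
```

Because the index is derived entirely from the note files, it can be rebuilt from the markdown base at any time.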
An OKF-link graph persisted to graph.db: seed from the query, then traverse typed edges bounded
by depth. The paid path swaps in a Neo4j-backed store behind the same interface via a uri= seam.
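The "seed, then traverse typed edges bounded by depth" pattern amounts to a depth-limited breadth-first search. The sketch below works on an in-memory edge list rather than graph.db, and the function name, the `(source, type, target)` triple layout, and the edge type names in the test are hypothetical.

```python
from collections import deque

def traverse(edges, seeds, max_depth=2, edge_types=None):
    """Breadth-first walk from seeds; returns {node: depth} within max_depth."""
    adjacency = {}
    for src, etype, dst in edges:
        adjacency.setdefault(src, []).append((etype, dst))
    seen = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_depth:
            continue  # do not expand past the depth bound
        for etype, dst in adjacency.get(node, []):
            if edge_types is not None and etype not in edge_types:
                continue  # skip edge types the caller did not ask for
            if dst not in seen:
                seen[dst] = depth + 1
                queue.append(dst)
    return seen
```

The depth bound keeps recall cost predictable even when a seed sits in a densely linked region.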
The whiteboard view: a Conscious band (the live, in-loop session) over a Subconscious band (async consolidation), with the plugin as the only surface the coding harness sees. Full contract in architecture.md §7.
Benchmarks run through the Claude Code CLI (subscription auth — API keys stripped,
no API billing), comparing Claude Code's builtin memory vs the shipping
plugin-real cookbook-memory plugin. The entry point is
python -m memeval.claudecode.run_bench; the in-scope benchmarks are
VISTA and SWE-Bench-CL.
Each run saves a per-benchmark, versioned file:
results/{vX.Y}/{bench-name}-{timestamp}.json, where vX.Y is the memory-system
version (MEMORY_VERSION; starts at v0.1 and is bumped by 0.1 with each memory change + run).
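A small sketch of how such a results path could be assembled is shown below. The helper name and the UTC timestamp format are assumptions; only the results/{vX.Y}/{bench-name}-{timestamp}.json layout comes from the text above.

```python
from datetime import datetime, timezone
from pathlib import Path

def result_path(bench_name, version, now=None, root="results"):
    """Build results/{version}/{bench_name}-{timestamp}.json (timestamp format assumed)."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return Path(root) / version / f"{bench_name}-{stamp}.json"
```

For example, `result_path("vista", "v0.1")` yields a path under `results/v0.1/`, so runs from different memory versions never overwrite each other.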
Stop hook → Daydream
The memory system ships as a Claude Code plugin (skills · MCP · hooks). A
Stop hook fires the Daydream component when a session
ends, so in-session consolidation runs automatically — no manual trigger.
Two memory-creation paths: the model's in-loop remember tool, and the
Daydream pass mining the logs for what wasn't saved. Fail-open throughout.
.jsonl logs.
ling-2.6-flash) to decide what to remember.
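"Fail-open" here can be read as: a failure in a memory step is logged and swallowed so the coding session is never blocked. A minimal sketch of that idea follows, assuming a decorator-based approach; `fail_open` and `extract_memories` are hypothetical names, and the extraction body is a stand-in for the real model call.

```python
import functools
import sys

def fail_open(fn):
    """Wrap fn so any exception is reported on stderr and None is returned."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            print(f"memory step skipped: {exc!r}", file=sys.stderr)
            return None
    return wrapper

@fail_open
def extract_memories(transcript):
    # Hypothetical stand-in for the model-backed extraction step.
    if not transcript:
        raise ValueError("empty transcript")
    return [line for line in transcript if line.startswith("remember:")]
```

Callers treat `None` as "nothing to persist", which keeps a hook's failure from surfacing as a session error.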