Agent Brain: The Context Engine and Local MCP Router
Agent Brain is the memory and routing substrate of the Autonomic ecosystem. It provides every other organ with structured context — not raw LLM prompt stuffing, but filtered, scoped, and ranked retrieval from a durable store.
Architecture
Brain uses a multi-layered storage model. Short-term context lives in an HNSW (Hierarchical Navigable Small World) vector index, filtered by scope, topic, and recency before retrieval. Long-term facts are stored in a SQLite-backed knowledge graph with temporal edges — a fact is a triple of (subject, predicate, object) with valid-from and invalid-at timestamps. This lets memory expire naturally rather than requiring explicit deletion.
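The temporal-edge model can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and helper functions below are hypothetical and chosen only for illustration; they are not Brain's actual schema. The "nom 6.0" row is also invented, to show one fact superseding another.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not Brain's real layout.
SCHEMA = """
CREATE TABLE facts (
    subject    TEXT NOT NULL,
    predicate  TEXT NOT NULL,
    object     TEXT NOT NULL,
    valid_from REAL NOT NULL,   -- timestamp when the fact became true
    invalid_at REAL             -- NULL while the fact is still valid
)
"""

def open_store(path=":memory:"):
    """Open a SQLite database and create the (assumed) facts table."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def add_fact(conn, subject, predicate, obj, valid_from, invalid_at=None):
    """Store one (subject, predicate, object) triple with its validity window."""
    conn.execute(
        "INSERT INTO facts VALUES (?, ?, ?, ?, ?)",
        (subject, predicate, obj, valid_from, invalid_at),
    )

def facts_at(conn, subject, when):
    """Return (predicate, object) pairs for `subject` that are valid at `when`."""
    rows = conn.execute(
        "SELECT predicate, object FROM facts "
        "WHERE subject = ? AND valid_from <= ? "
        "AND (invalid_at IS NULL OR invalid_at > ?) "
        "ORDER BY valid_from",
        (subject, when, when),
    )
    return rows.fetchall()

conn = open_store()
# The older fact is closed off by invalid_at instead of being deleted.
add_fact(conn, "parser-module", "uses", "nom 6.0", valid_from=100, invalid_at=200)
add_fact(conn, "parser-module", "uses", "nom 7.0", valid_from=200)
print(facts_at(conn, "parser-module", 150))
print(facts_at(conn, "parser-module", 250))
```

In this sketch a superseded fact simply stops matching queries once its invalid_at timestamp has passed, which is one way to read "expire naturally": no delete is needed for reads to stay correct.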
The route_task endpoint is the core primitive. Given a task description and a list of open files, it returns ranked agents, skills, rules, and memory items under a token budget. This prevents the context bloat that plagues naive RAG: instead of dumping every relevant document into the prompt, Brain scores each candidate and selects the top-N that fit the budget.
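One plausible way to implement "score each candidate and select the top-N that fit the budget" is a greedy pass over candidates sorted by score. The sketch below is an assumption about the general technique, not Brain's actual scoring or selection code; the Candidate type, the select_under_budget function, and the per-item scores and token costs are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    kind: str      # "skills", "memory", "rules", or "agents"
    name: str
    score: float   # relevance score, higher is better (assumed precomputed)
    tokens: int    # estimated token cost of including this item

def select_under_budget(candidates, max_tokens, limits):
    """Greedy selection: walk candidates by descending score and keep each one
    that fits both the remaining token budget and its per-kind limit.
    Kinds missing from `limits` are treated as having a limit of zero."""
    chosen, used = [], 0
    counts = {}
    for c in sorted(candidates, key=lambda c: c.score, reverse=True):
        if counts.get(c.kind, 0) >= limits.get(c.kind, 0):
            continue  # per-kind limit reached
        if used + c.tokens > max_tokens:
            continue  # would exceed the token budget
        chosen.append(c)
        counts[c.kind] = counts.get(c.kind, 0) + 1
        used += c.tokens
    return chosen, used

# Illustrative candidates; scores and token costs are made up.
candidates = [
    Candidate("skills", "rust-patterns", 0.92, 150),
    Candidate("skills", "rust-testing", 0.85, 120),
    Candidate("memory", "parser-module", 0.89, 40),
    Candidate("rules", "test-before-commit", 0.80, 110),
    Candidate("skills", "python-patterns", 0.30, 200),
]
limits = {"skills": 3, "memory": 5, "rules": 5, "agents": 2}
chosen, used = select_under_budget(candidates, max_tokens=500, limits=limits)
print([c.name for c in chosen], used)
```

A greedy pass is simple and fast but not optimal: a knapsack-style selection could pack the budget more tightly, at the cost of more computation per request.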
POST /route_task
{
  "user_message": "Fix the failing test in src/parser.rs",
  "open_files": ["src/parser.rs", "tests/parser_test.rs"],
  "max_tokens": 500,
  "limits": { "skills": 3, "memory": 5, "rules": 5, "agents": 2 }
}
Response:
{
  "recommended_skills": ["rust-patterns", "rust-testing"],
  "relevant_memory": [
    { "topic": "parser-module", "fact": "Parser uses nom 7.0 combinators", "score": 0.89 }
  ],
  "applicable_rules": ["test-before-commit"],
  "tokens_used": 420,
  "latency_ms": 34
}
Standalone Mode
Run agent-brain serve to start an MCP stdio server. This exposes every memory operation (store, retrieve, delete, grep, route) as an MCP tool. Any MCP-compatible client (Claude Code, Copilot, Cursor) can use Brain as its memory backend without running the full ecosystem.
agent-brain export writes all facts to a portable JSON bundle. agent-brain import restores from one. This makes memory migration between environments a file copy.
The HNSW index is configurable via ~/.agent_brain/config.yaml:
hnsw:
  ef_construction: 200
  ef_search: 50
  m: 16
  distance: cosine
storage:
  sqlite_path: ~/.agent_brain/memory.db
  export_dir: ~/.agent_brain/export/
These parameters follow the standard HNSW tuning trade-offs: higher ef_construction improves recall at the cost of slower inserts, while higher m increases memory usage but produces better graph connectivity.
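To get a feel for how m affects memory, the following back-of-envelope sketch estimates index size. It assumes float32 vectors, 4-byte neighbor ids, and up to 2 * m links per node on the base layer, which is a common layout in HNSW implementations; whether Brain's index matches these assumptions is not stated here. Upper layers and bookkeeping overhead are ignored, and the 384-dimensional embedding size is a placeholder.

```python
def hnsw_memory_bytes(num_vectors, dim, m, bytes_per_float=4, bytes_per_id=4):
    """Rough lower-bound estimate of HNSW index size in bytes.

    Counts only raw vector data plus base-layer links (assumed 2 * m per node).
    """
    vector_bytes = dim * bytes_per_float
    link_bytes = 2 * m * bytes_per_id
    return num_vectors * (vector_bytes + link_bytes)

# Compare a few values of m for 100K vectors of an assumed dimension of 384.
for m in (8, 16, 32):
    mib = hnsw_memory_bytes(100_000, 384, m) / 2**20
    print(f"m={m:>2}: ~{mib:.0f} MiB")
```

Under these assumptions the vectors themselves dominate the footprint, so doubling m grows the total far less than doubling the embedding dimension would.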
Integrated Mode
In a full deployment, every organ calls Brain’s route_task before acting. Spine queries Brain for the rules and skills relevant to each workflow node. Heart triggers periodic garbage collection (GC): deduplicating facts, pruning expired edges, and vacuuming the vector index. Nerves publishes memory change events so subscribers can invalidate caches.
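The fact-level part of such a GC pass might look like the sketch below. It assumes a hypothetical facts table with subject, predicate, object, valid_from, and invalid_at columns, and a hypothetical gc_pass function with a retention window; none of these names come from Brain itself. Vacuuming the vector index is not covered.

```python
import sqlite3

def gc_pass(conn, now, retention_seconds):
    """Prune edges that expired more than `retention_seconds` ago, then
    collapse exact duplicate triples. Returns (pruned, deduplicated)."""
    cutoff = now - retention_seconds
    with conn:  # single transaction: commit on success, roll back on error
        pruned = conn.execute(
            "DELETE FROM facts WHERE invalid_at IS NOT NULL AND invalid_at < ?",
            (cutoff,),
        ).rowcount
        # Keep the earliest row of each identical group and delete the rest.
        deduped = conn.execute(
            "DELETE FROM facts WHERE rowid NOT IN ("
            "  SELECT MIN(rowid) FROM facts"
            "  GROUP BY subject, predicate, object, valid_from, invalid_at)"
        ).rowcount
    return pruned, deduped

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (subject TEXT, predicate TEXT, object TEXT, "
    "valid_from REAL, invalid_at REAL)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?, ?)",
    [
        ("parser-module", "uses", "nom 7.0", 200, None),
        ("parser-module", "uses", "nom 7.0", 200, None),  # exact duplicate
        ("parser-module", "uses", "nom 6.0", 100, 200),   # invented, long expired
    ],
)
print(gc_pass(conn, now=1000, retention_seconds=500))
```

Running both deletes inside one transaction means a failure part-way through leaves the store unchanged, which matters if other organs read from it while GC is running.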
Design Decisions
We chose HNSW over flat cosine-similarity search because real deployments accumulate tens of thousands of facts. HNSW gives sub-50ms retrieval at 100K vectors on consumer hardware. The temporal knowledge graph model was chosen over a simple key-value store because AI agents produce inherently relational data — a memory is only meaningful in the context of what task was running, what files were open, what decision was made.
The trade-off is write latency. Inserting a vector and multiple KG edges in a single transaction is slower than writing to a flat cache. For Brain’s use case this is acceptable: writes are rare and reads are constant. In our benchmark deployment with 50K facts, p95 read latency is 42ms and p95 write latency is 180ms.