Engineering
2026-06-21
Abhiuday Gupta

Agent Brain: The Context Engine and Local MCP Router

Agent Brain is the memory and routing substrate of the Autonomic ecosystem. It provides every other organ with structured context — not raw LLM prompt stuffing, but filtered, scoped, and ranked retrieval from a durable store.

Architecture

Brain uses a multi-layered storage model. Short-term context lives in an HNSW (Hierarchical Navigable Small World) vector index, filtered by scope, topic, and recency before retrieval. Long-term facts are stored in a SQLite-backed knowledge graph with temporal edges — a fact is a triple of (subject, predicate, object) with valid-from and invalid-at timestamps. This lets memory expire naturally rather than requiring explicit deletion.
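To make the temporal-edge idea concrete, here is a minimal sketch of how a fact with valid-from and invalid-at timestamps might be stored and queried in SQLite. The table name, column names, integer timestamps, and sample rows are assumptions made for illustration; Brain's actual schema is not shown in this post.

```python
import sqlite3

# Hypothetical schema: names and types are illustrative, not Brain's real layout.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE facts (
        subject    TEXT NOT NULL,
        predicate  TEXT NOT NULL,
        object     TEXT NOT NULL,
        valid_from INTEGER NOT NULL,  -- Unix seconds
        invalid_at INTEGER            -- NULL means the fact is still valid
    )
""")

def facts_valid_at(conn, subject, at):
    """Return (predicate, object) pairs for `subject` that hold at time `at`."""
    rows = conn.execute(
        """
        SELECT predicate, object FROM facts
        WHERE subject = ?
          AND valid_from <= ?
          AND (invalid_at IS NULL OR invalid_at > ?)
        ORDER BY valid_from
        """,
        (subject, at, at),
    )
    return rows.fetchall()

# Made-up sample data: an older fact superseded by a newer one at t=200.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?, ?)",
    [
        ("parser-module", "uses", "nom 6.0", 100, 200),
        ("parser-module", "uses", "nom 7.0", 200, None),
    ],
)

print(facts_valid_at(conn, "parser-module", 150))
print(facts_valid_at(conn, "parser-module", 250))
```

Because the validity window is part of the query, an outdated fact simply stops matching once its invalid-at time has passed, with no delete required at read time.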

The route_task endpoint is the core primitive. Given a task description and a list of open files, it returns ranked agents, skills, rules, and memory items under a token budget. This prevents the context bloat that plagues naive RAG: instead of dumping every relevant document into the prompt, brain scores each candidate and selects the highest-scoring ones that fit within max_tokens and the per-category limits.

POST /route_task
{
  "user_message": "Fix the failing test in src/parser.rs",
  "open_files": ["src/parser.rs", "tests/parser_test.rs"],
  "max_tokens": 500,
  "limits": { "skills": 3, "memory": 5, "rules": 5, "agents": 2 }
}

Response:
{
  "recommended_skills": ["rust-patterns", "rust-testing"],
  "relevant_memory": [
    { "topic": "parser-module", "fact": "Parser uses nom 7.0 combinators", "score": 0.89 }
  ],
  "applicable_rules": ["test-before-commit"],
  "tokens_used": 420,
  "latency_ms": 34
}
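One way to picture selection under a token budget is a greedy pass over candidates sorted by score. This is a sketch only: the post does not describe how Brain scores or packs items, so the Candidate type, the select_under_budget function, and the per-item token costs below are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    kind: str     # "skills", "memory", "rules", or "agents"
    name: str
    score: float  # relevance score, higher is better
    tokens: int   # estimated token cost of including this item

def select_under_budget(candidates, max_tokens, limits):
    """Greedy pick: highest score first, skipping anything that breaks a limit.

    Kinds that are missing from `limits` are treated as having a cap of zero.
    """
    chosen, used = [], 0
    counts = {}
    for c in sorted(candidates, key=lambda c: c.score, reverse=True):
        if counts.get(c.kind, 0) >= limits.get(c.kind, 0):
            continue  # per-kind cap reached
        if used + c.tokens > max_tokens:
            continue  # would exceed the token budget
        chosen.append(c)
        counts[c.kind] = counts.get(c.kind, 0) + 1
        used += c.tokens
    return chosen, used

# Illustrative candidates; scores and token costs are invented for the example.
candidates = [
    Candidate("skills", "rust-patterns", 0.92, 150),
    Candidate("skills", "rust-testing", 0.88, 140),
    Candidate("memory", "parser-module", 0.89, 60),
    Candidate("rules", "test-before-commit", 0.80, 70),
    Candidate("skills", "rust-async", 0.40, 200),
]
limits = {"skills": 3, "memory": 5, "rules": 5, "agents": 2}
chosen, used = select_under_budget(candidates, 500, limits)
print([c.name for c in chosen], used)
```

A greedy pass is simple and predictable, but it is not guaranteed to pack the budget optimally. A real router might weigh score against token cost rather than sorting on score alone.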

Standalone Mode

Run agent-brain serve to start an MCP stdio server. This exposes every memory operation — store, retrieve, delete, grep, route — as MCP tools. Any MCP-compatible client (Claude Code, Copilot, Cursor) can use brain as its memory backend without running the full ecosystem.

agent-brain export writes all facts to a portable JSON bundle. agent-brain import restores from one. This makes memory migration between environments a file copy.

The HNSW index is configurable via ~/.agent_brain/config.yaml:

hnsw:
  ef_construction: 200
  ef_search: 50
  m: 16
  distance: cosine
storage:
  sqlite_path: ~/.agent_brain/memory.db
  export_dir: ~/.agent_brain/export/

These parameters follow the standard HNSW tuning trade-offs: higher ef_construction improves recall at the cost of slower inserts, higher ef_search improves query recall at the cost of slower searches, and higher m increases memory usage but produces better graph connectivity.
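The memory side of the m trade-off can be estimated with a common rule of thumb for HNSW: each element costs roughly its float32 vector plus about 2 * m four-byte neighbour links on the base layer. The sketch below applies that approximation. The function name and the 384-dimensional embedding size are assumptions, and real indexes add overhead for upper layers and bookkeeping.

```python
def hnsw_bytes_estimate(num_vectors, dim, m):
    """Rough HNSW index size in bytes, ignoring upper layers and metadata."""
    vector_bytes = dim * 4   # float32 components
    link_bytes = 2 * m * 4   # up to 2*m base-layer neighbours, 4-byte ids
    return num_vectors * (vector_bytes + link_bytes)

# Assumed embedding dimension of 384; the post does not state Brain's dimension.
for m in (16, 32):
    size_mb = hnsw_bytes_estimate(100_000, 384, m) / 1_000_000
    print(f"m={m}: about {size_mb:.1f} MB for 100K vectors")
```

Under these assumptions the vectors themselves dominate the footprint, so doubling m from 16 to 32 adds only a modest share of memory at this dimension. The effect is larger for low-dimensional embeddings.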

Integrated Mode

In a full deployment, every organ calls brain’s route_task before acting. Spine queries brain for the rules and skills relevant to each workflow node. Heart triggers periodic GC — deduplicating facts, pruning expired edges, vacuuming the vector index. Nerves publishes memory change events so subscribers can invalidate caches.
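The GC step can be sketched as a single pass that drops expired facts and collapses duplicates. This is an illustration under assumed data shapes: the tuple layout and the gc_pass function are hypothetical, and the real GC also vacuums the vector index, which is not modelled here.

```python
def gc_pass(facts, now):
    """Return live, de-duplicated facts.

    `facts` is a list of (subject, predicate, object, valid_from, invalid_at)
    tuples; invalid_at is None while the fact is still valid.
    """
    seen = set()
    kept = []
    for fact in sorted(facts, key=lambda f: f[3]):  # oldest first
        subject, predicate, obj, valid_from, invalid_at = fact
        if invalid_at is not None and invalid_at <= now:
            continue  # expired edge: prune it
        key = (subject, predicate, obj)
        if key in seen:
            continue  # duplicate of an earlier live fact
        seen.add(key)
        kept.append(fact)
    return kept

# Made-up facts: one duplicate, one expired, two live.
facts = [
    ("a", "p", "x", 100, None),
    ("a", "p", "x", 150, None),  # duplicate of the first
    ("a", "p", "y", 50, 120),    # expired before now=200
    ("b", "q", "z", 10, 300),
]
print(gc_pass(facts, now=200))
```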

Design Decisions

We chose HNSW over flat cosine-similarity search because real deployments accumulate tens of thousands of facts, and a flat search compares the query against every stored vector, so its cost grows linearly with the size of the store. HNSW gives sub-50ms retrieval at 100K vectors on consumer hardware. The temporal knowledge graph model was chosen over a simple key-value store because AI agents produce inherently relational data: a memory is only meaningful in the context of what task was running, what files were open, and what decision was made.

The trade-off is write latency. Inserting a vector and multiple KG edges in a single transaction is slower than writing to a flat cache. For brain’s use case this is acceptable because writes are rare and reads are frequent. In our benchmark deployment with 50K facts, the p95 read latency is 42ms while p95 write latency is 180ms.
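The cost of a transactional write comes from keeping the vector and its edges consistent: either all of them land or none do. A minimal sketch of that pattern with Python's sqlite3 module follows. The table layout and the store_memory function are hypothetical, and a plain BLOB table stands in for the HNSW index.

```python
import sqlite3
from array import array

# Hypothetical tables; the BLOB column is a stand-in for the real vector index.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vectors (id INTEGER PRIMARY KEY, embedding BLOB NOT NULL)"
)
conn.execute(
    "CREATE TABLE edges ("
    " vector_id INTEGER NOT NULL, subject TEXT NOT NULL,"
    " predicate TEXT NOT NULL, object TEXT NOT NULL)"
)

def store_memory(conn, embedding, edges):
    """Insert one embedding and its (subject, predicate, object) edges atomically."""
    with conn:  # commits on success, rolls back if any statement raises
        cur = conn.execute(
            "INSERT INTO vectors (embedding) VALUES (?)",
            (array("f", embedding).tobytes(),),  # pack as float32 bytes
        )
        vector_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO edges VALUES (?, ?, ?, ?)",
            [(vector_id, s, p, o) for (s, p, o) in edges],
        )
    return vector_id

store_memory(conn, [0.1, 0.2], [("parser-module", "uses", "nom 7.0")])

# A bad edge (NULL subject) aborts the whole write, including the vector row.
try:
    store_memory(conn, [0.3, 0.4], [(None, "uses", "x")])
except sqlite3.IntegrityError:
    pass
print(conn.execute("SELECT COUNT(*) FROM vectors").fetchone()[0])
```

Grouping the inserts this way means a reader never sees a vector without its edges, at the price of holding the write lock for the duration of the transaction.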


© 2026 Autonomic AI Dev. All rights reserved.