April 12, 2026 · 21 min read

The Context Layer: Four Daemons That Replace Half of Claude Code's Tool Calls

Claude Code spends most of its time finding things, not doing things. I built four Go daemons that pre-compute code intelligence, database schemas, git history, and HTTP traffic, then serve it all in single-digit milliseconds over MCP.

I’ve been building tools for Claude Code for a few months now, and at some point the individual projects stopped being individual projects and started being a stack. Scry indexes code symbols. Tome indexes database schemas. Lore indexes git history. Flume captures HTTP traffic. They’re all Go, they’re all local daemons, they all speak MCP, and they all exist for the same reason: Claude Code is slow when it has to find things, and fast when it already knows where they are.

This post is about the pattern, not any single tool. I wrote a deep dive on scry already. This is about why the same architecture keeps showing up across four different problem domains, what the individual tools actually do, and what happens when you stack them.

The tool call tax

Watch a Claude Code session for ten minutes and count the tool calls. Not the ones that do work (editing files, running tests, writing code), the ones that find context. Grep for a function name, get forty results, Read each file, throw away thirty-eight of them, land on the two that matter. Then the next question comes in and the whole loop starts again.

I tracked this across a few dozen sessions on real projects. The breakdown was roughly:

  • 30-50% of tool calls were code navigation (Read, Grep, Glob)
  • 10-15% were schema lookups (reading migrations, models, factories to figure out column names)
  • 5-10% were git archaeology (git log, git blame, git show to understand why something is the way it is)
  • 5-10% were debugging HTTP (reading logs, adding print statements, reproducing requests)

That’s 50-85% of tool calls spent on reconnaissance before anything productive happens. Each one costs 5-15 seconds of wall clock and a few thousand tokens of context that the agent reads once, extracts two facts from, and discards.

The fix for each category is the same. Pre-compute the answers once, store them in an embedded KV, serve them over a socket in single-digit milliseconds. The agent asks a structural question, gets a structural answer, and moves on. No scanning, no guessing, no context pollution.

The brainstorm session

The suite didn’t start with a roadmap. It started with scry, which I built because watching Claude Code grep for function names forty times a session was making me lose my mind. Scry worked. The token savings on code navigation were real. So I sat down and asked “what other shit would be transformational to have live.”

Four categories came out, ranked by estimated token and time savings:

  1. Runtime visibility (30-50% token savings on debugging cycles). The guess-log-reproduce-read loop is the single most expensive pattern in Claude Code sessions. Every debugging question costs 5-10 tool calls.
  2. Schema awareness (10-20% on feature work). The 3-6 file reads to answer a schema question add up fast on any database-backed project.
  3. Git intelligence (fewer sequential git commands). Three to five shell execs per history question, each returning unstructured text the agent has to parse.
  4. Test coverage targeting (faster feedback loops). This one folded into scry as a new index type rather than becoming its own daemon, because the coverage data joins naturally against scry’s existing symbol-to-line mapping.

Named them that same session. Flume because everything flows through it. Tome because it’s the book of all your schemas. Lore because it’s the history and intent behind code. Then I scaffolded three repos, wrote CLAUDE.md kickoff briefs for each one pointing back to scry as the architectural template, and sent /inbox messages to the Claude Code instances that would build them.

Each brief said the same thing: “read scry’s CLAUDE.md, README, and DECISIONS.md for patterns to copy.” That turned out to be enough. Each daemon was built from empty scaffold to public GitHub repo in a single Claude Code session. Not because the tools are trivial, but because the architecture decisions were already made. No “should we use SQLite or BadgerDB.” No “daemon or CLI.” No “JSON-RPC or gRPC.” Those questions were settled across scry and trawl. The build sessions only had to write domain logic.

Tome landed at 4170 lines across 34 files. Lore at 3531 lines across 24 files. Flume at 2200 lines across 26 files. All Go, all the same shape, all pushed to public repos the same day.

The shared architecture

All four tools landed on the same shape, which I take as a sign that the shape is right for this problem class:

CLI client
    │  JSON-RPC 2.0 over Unix socket
    ▼
daemon (one per user, auto-spawned on first call)
    ├── RPC dispatcher
    ├── Query engine (domain-specific)
    ├── BadgerDB store (per-project index)
    └── Optional file watcher (reindex on change)

Single static Go binary. No CGO, no runtime dependencies, cross-compiles to darwin and linux on amd64 and arm64. Drop it on your PATH and run <tool> setup to register with Claude Code. Dependencies across the whole suite are minimal: BadgerDB, Cobra, and fsnotify are the core three. Lore’s entire dependency list is those three and nothing else.

Daemon with auto-spawn. The CLI client checks for the Unix socket, spawns the daemon if it’s not running, then sends the query. From the user’s perspective there’s no mode switch. The daemon stays warm between calls so BadgerDB stays mmap’d and queries hit warm pages. Unix socket round-trip is around 50 microseconds. Process spawn plus argument parsing is around 50 milliseconds. Over hundreds of queries per session, that’s the difference between imperceptible and noticeable.
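
A minimal sketch of that handshake, with a hypothetical socket path and a hypothetical daemon subcommand (the real clients also deal with lock contention and daemon readiness):

```go
package main

import (
	"net"
	"os/exec"
	"time"
)

// dialDaemon connects to the daemon's Unix socket, spawning the daemon
// first if nothing is listening yet. The "daemon" subcommand name is an
// assumption for illustration.
func dialDaemon(sock, binary string) (net.Conn, error) {
	if conn, err := net.Dial("unix", sock); err == nil {
		return conn, nil // already warm: ~50µs round-trip from here
	}
	if err := exec.Command(binary, "daemon").Start(); err != nil {
		return nil, err
	}
	var lastErr error
	for i := 0; i < 50; i++ { // poll up to ~2.5s for the socket to appear
		if conn, err := net.Dial("unix", sock); err == nil {
			return conn, nil
		} else {
			lastErr = err
		}
		time.Sleep(50 * time.Millisecond)
	}
	return nil, lastErr
}
```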

BadgerDB for storage. Pure Go, no CGO, built-in TTL, built-in prefix iteration, proven in production at scale. I evaluated SQLite (needs CGO or a slow pure-Go port), bbolt (no TTL, single-writer bottleneck), and Pebble (overkill for this access pattern). BadgerDB was the right call and I haven’t regretted it across any of the four projects. Each daemon stores its indexes at ~/.<tool>/ with per-project subdirectories keyed by directory SHA256.
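
The path derivation is roughly this, as a sketch (the key layout inside each store is per-tool):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"os"
	"path/filepath"
)

// storePath maps a project directory to its index location under
// ~/.<tool>/. Hashing the absolute path keeps checkouts from colliding.
func storePath(tool, projectDir string) (string, error) {
	abs, err := filepath.Abs(projectDir)
	if err != nil {
		return "", err
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256([]byte(abs))
	return filepath.Join(home, "."+tool, hex.EncodeToString(sum[:])), nil
}
```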

MCP stdio server. Each tool ships a <tool> mcp subcommand that wraps the daemon’s RPC methods in the MCP protocol. <tool> setup registers the MCP server with Claude Code via claude mcp add and installs a routing skill that tells the agent when to use these tools instead of falling back to Read and Grep.

The setup command. This was the hardest part to get right, not because the code is complex but because the failure mode is silent. I burned a full afternoon on scry because I was writing MCP config to ~/.claude/settings.json instead of ~/.claude.json and everything looked green. The daemon was running, the MCP server responded to piped requests, the Claude Code UI showed “connected,” and the agent still used Grep because the tools weren’t actually in its registry. The fix was to stop hand-editing JSON files and delegate to claude mcp add. Every tool in the suite now shells out to the official CLI, with a claude mcp get <tool> check upfront to decide skip-versus-replace. The rule: use the host tool’s official CLI when one exists. Writing valid JSON to the wrong file is worse than writing invalid JSON to the right file, because there’s nothing to catch it.
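
A sketch of the skip-versus-replace logic; the exact flags are illustrative, so check claude mcp add --help before copying:

```go
package main

import "os/exec"

// registerMCP probes for an existing registration, then delegates to
// the official CLI instead of writing JSON by hand.
func registerMCP(tool, binary string) error {
	// `claude mcp get <tool>` exits non-zero when the server isn't registered.
	if exec.Command("claude", "mcp", "get", tool).Run() == nil {
		return nil // already registered: skip rather than clobber
	}
	// `--` separates the server name from the command that launches it.
	return exec.Command("claude", "mcp", "add", tool, "--", binary, "mcp").Run()
}
```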

The doctor command. Every daemon ships a doctor subcommand that runs read-only health checks: is the daemon running, is the socket writable, is a project initialized in the current directory, are the MCP tools registered. Tome adds one more: can it reach the database. That last check catches the most real issues, usually a Docker container that isn’t running.

Scry: code intelligence

I wrote a full post about scry so I’ll keep this short. Scry pre-computes a semantic index of every repo using Sourcegraph’s SCIP format. Refs, defs, callers, callees, implementations. Query latency is 6-7ms p50 on a warm daemon. Cold index build is about 10 seconds for a 100k-LOC TypeScript repo. Supports TypeScript, Go, PHP (with Laravel-specific post-processors for facades, views, and config string refs), and Python.

The thing that makes scry different from LSP is the query model. LSP answers positional questions (“what’s at line 47 column 12?”) for a human clicking around in an editor. Scry answers structural questions (“give me every call site of processOrder with one line of context”) for an agent making hundreds of queries per session. Different consumer, different latency target, different output format.

Test coverage as a spatial join

The latest addition is test coverage indexing, which shipped as a feature inside scry rather than as its own daemon. The reason it fit inside scry is that coverage data is a line-to-execution mapping, and scry already has a symbol-to-line mapping from SCIP indexing. The join between them is about 100 lines of code. No new external tools, no new daemon, no new protocol. Just a post-processor that runs after the existing SCIP parse, joins coverage ranges against symbol definition spans, and writes per-symbol coverage records to the same BadgerDB store.

Four format parsers: Go coverprofile, Istanbul/c8 JSON (vitest, jest), Clover XML (PHPUnit), and Python coverage.json. All emit a common CoveredRange{File, Line, EndLine, Count} type. The join loads all def: records from BadgerDB, builds a file-indexed map, does a binary search for overlapping spans, deduplicates, and accumulates hit counts. scry tests <symbol> returns whether a function is covered and its hit count.
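
A sketch of the join under assumed type names (the real post-processor sorts defs and binary-searches; this version does a linear scan per range to keep the idea visible):

```go
package main

// CoveredRange mirrors the common type the four parsers emit.
type CoveredRange struct {
	File          string
	Line, EndLine int
	Count         int
}

// symbolDef is an assumed stand-in for scry's def: records.
type symbolDef struct {
	Symbol        string
	File          string
	Line, EndLine int
}

// joinCoverage accumulates hit counts for every symbol whose definition
// span overlaps a covered range in the same file.
func joinCoverage(defs []symbolDef, ranges []CoveredRange) map[string]int {
	byFile := make(map[string][]symbolDef)
	for _, d := range defs {
		byFile[d.File] = append(byFile[d.File], d)
	}
	hits := make(map[string]int)
	for _, r := range ranges {
		for _, d := range byFile[r.File] {
			if r.Line <= d.EndLine && r.EndLine >= d.Line { // spans overlap
				hits[d.Symbol] += r.Count
			}
		}
	}
	return hits
}
```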

One gotcha: scip-go emits function definitions where EndLine equals Line, meaning it only captures the signature, not the body. Coverage data covers the body lines. So a function defined at line 42 with a body spanning lines 42-60 gets a def span of 42-42 from scip-go, and the coverage ranges are 43-60. No overlap, no match, feature doesn’t work for Go.

The fix was to extend single-line def spans to the next definition in the same file, or +500 lines for the last def. Thirteen lines of code. Without them the entire coverage feature is broken for Go. With them it validated cleanly against scry’s own repo: 64 covered symbols, 2149 coverage ranges from go test -coverprofile.
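
A sketch in the spirit of that fix, reusing symbolDef from the join sketch above and assuming defs are sorted by line within each file:

```go
// extendSpans widens single-line def spans so Go function bodies can
// overlap coverage ranges: out to the next def in the same file, or
// +500 lines for the last def.
func extendSpans(defs []symbolDef) {
	for i := range defs {
		if defs[i].EndLine != defs[i].Line {
			continue // span already covers a body
		}
		if i+1 < len(defs) && defs[i+1].File == defs[i].File {
			defs[i].EndLine = defs[i+1].Line - 1
		} else {
			defs[i].EndLine = defs[i].Line + 500
		}
	}
}
```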

Tome: schema awareness

Every time Claude Code needs to know what columns a table has, it reads migrations. Or models. Or factories. Sometimes all three, because they disagree and it needs to figure out which one is current. That’s 3-6 file reads to answer “what type is the status column on orders?”

The file-read approach doesn’t just waste time. It produces wrong answers. Migrations are append-only, so the current schema is the sum of every migration that has ever run, in order. Reading one migration gives you a snapshot of what was true when that migration was written, not what’s true now. The model is closer to current, but models in Laravel don’t declare columns; they declare relationships and fillable lists. The factory is a test artifact that might or might not reflect the real schema. The agent is doing historical research when it should be doing a live lookup.

Tome connects to your actual database (MySQL or PostgreSQL), introspects the schema via INFORMATION_SCHEMA or pg_catalog, and caches the full snapshot in BadgerDB. Four MCP tools:

  • tome_describe for full table schema (columns, types, indexes, foreign keys)
  • tome_relations for the FK graph in both directions
  • tome_search for name substring search across tables and columns
  • tome_enums for valid values on enum and set columns

You run tome init --detect-env once in a project directory and it picks up the DSN from your .env file (supports both DATABASE_URL and Laravel’s DB_* variables). After that, tome describe orders returns the full schema in under a millisecond.

The reverse FK index turned out to be one of the most-used queries. tome relations orders returns both directions: “orders has an FK to users on user_id” and “order_items has an FK to orders on order_id.” The agent needs the second direction most of the time, because the question is usually “what references this table?” not “what does this table reference?” Building both directions at index time is a one-pass walk over the FK data with two BadgerDB writes per relationship.
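
A sketch of those two writes, with illustrative key shapes rather than tome’s actual layout:

```go
package main

import badger "github.com/dgraph-io/badger/v4"

// indexFK writes one relationship in both directions so either question
// is a single prefix scan at query time.
func indexFK(txn *badger.Txn, fromTable, fromCol, toTable string) error {
	// Forward: "what does order_items reference?"
	if err := txn.Set([]byte("fk:out:"+fromTable+":"+fromCol), []byte(toTable)); err != nil {
		return err
	}
	// Reverse: "what references orders?"
	return txn.Set([]byte("fk:in:"+toTable+":"+fromTable), []byte(fromCol))
}
```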

Enum extraction has its own quirk per database. MySQL stores enum definitions inline in the COLUMN_TYPE field as enum('active','inactive','pending'), which means parsing a string. PostgreSQL stores them separately in pg_enum joined through pg_type, which is cleaner but requires an extra query. Both paths end up at the same place: tome enums orders.status returns ["active", "inactive", "pending"].
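
The MySQL side is string surgery. A naive sketch (real values can contain escaped quotes and commas, which this ignores):

```go
package main

import "strings"

// parseEnum turns MySQL's inline COLUMN_TYPE, e.g.
// enum('active','inactive','pending'), into its value list.
func parseEnum(columnType string) []string {
	inner := strings.TrimSuffix(strings.TrimPrefix(columnType, "enum("), ")")
	var values []string
	for _, v := range strings.Split(inner, ",") {
		values = append(values, strings.Trim(v, "'"))
	}
	return values
}
```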

Tome doesn’t watch the database for changes. If you run a migration, the cache is stale until you run tome refresh. I thought about auto-refresh and decided against it. Polling is wasteful for something that changes a few times a day. Watching the migrations directory is fragile because a migration file existing doesn’t mean it’s been run. tome refresh is explicit and cheap, a few hundred milliseconds. In practice, schemas don’t change often enough for manual refresh to be friction.

Lore: git intelligence

Claude Code runs git log, git blame, and git show sequentially to understand code history. Each one is a shell exec, each one returns a wall of text that the agent parses for the two facts it actually wanted.

The problem isn’t that git is slow. Git is fast. The problem is that the agent is using git as a query engine, and git isn’t a query engine. It’s a content-addressed object store with a CLI designed for humans reading terminal output. Every query is a fresh subprocess, every result is raw text, and there’s no way to ask a structural question like “what files tend to change together with this one?”

Lore pre-indexes blame data (parallelized git blame --porcelain across all files at HEAD, capped at 8 workers) and the last 500 commits. The 500-commit window is a deliberate bound. Full history on a large repo could be tens of thousands of commits, slow to index and diminishing returns. The agent almost never needs to know what happened 500 commits ago. If it does, git log is still there.

Both passes shell out to git rather than using go-git. The subprocess overhead doesn’t matter when you’re doing one indexing pass and then answering hundreds of queries from the index. Git’s porcelain formats are stable, and go-git adds a large dependency plus its own edge cases around shallow clones, worktrees, and submodules.
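
A sketch of the capped pool, using a buffered channel as the semaphore and eliding the porcelain parsing:

```go
package main

import (
	"os/exec"
	"sync"
)

// blameAll runs git blame --porcelain across files with at most 8
// concurrent subprocesses, assuming the working directory is the repo.
func blameAll(files []string) {
	sem := make(chan struct{}, 8)
	var wg sync.WaitGroup
	for _, f := range files {
		wg.Add(1)
		sem <- struct{}{}
		go func(path string) {
			defer wg.Done()
			defer func() { <-sem }()
			out, err := exec.Command("git", "blame", "--porcelain", "HEAD", "--", path).Output()
			if err != nil {
				return // binary file, or path not tracked at HEAD
			}
			_ = out // parse porcelain records, write to BadgerDB
		}(f)
	}
	wg.Wait()
}
```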

Seven MCP tools:

  • lore_blame for structured blame by line, author, commit, and message
  • lore_history for recent commits affecting a file
  • lore_cochange for files that tend to change together
  • lore_hotspots for the most-churned files in the repo
  • lore_contributors for ranked authorship
  • lore_intent for the commit message and context behind a specific line
  • lore_status for daemon state and index metadata

Co-change: the query you can’t get from reading code

The cochange query is the one I use most and the one that has no equivalent in the standard git CLI.

lore cochange OrderService.php returns a ranked list of files that tend to change in the same commits. If OrderService.php and OrderRepository.php appear together in 40 out of 50 commits that touch either one, that’s an 80% co-change rate. The agent uses this when it’s about to modify a file and wants to know what else might need updating.

The computation walks every commit in the index, collects the set of files changed, and for every pair increments a co-occurrence counter. Commits touching more than 50 files get filtered out, because bulk refactors, formatting changes, and dependency updates create noise. A single Prettier run can touch hundreds of files and create spurious co-change edges between everything. The 50-file cap was calibrated against real repos where this was happening.
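
A sketch of the pass, assuming the commit index yields each commit’s changed-file list:

```go
package main

import "sort"

// cochange counts pairwise co-occurrence across commits, skipping any
// commit over the 50-file cap (formatter runs, bulk refactors).
func cochange(commits [][]string) map[[2]string]int {
	counts := make(map[[2]string]int)
	for _, files := range commits {
		if len(files) > 50 {
			continue // noise: would create spurious edges between everything
		}
		sort.Strings(files) // canonical ordering so each pair has one key
		for i := range files {
			for j := i + 1; j < len(files); j++ {
				counts[[2]string{files[i], files[j]}]++
			}
		}
	}
	return counts
}
```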

The reason this query matters is that coupling in real codebases is often historical, not structural. Two files might have no import relationship, no shared interface, no obvious connection in the code. But they change together on 80% of commits because they’re connected through a queue, an event, a config value, or a convention that only exists in the team’s heads. The agent can’t discover this by reading code. It can only discover it by reading history.

The descending-key trick

BadgerDB iterates keys in ascending byte order. The natural key for a commit record is commit:<repo>:<timestamp>:<hash>, which returns oldest-first on a prefix scan. The agent wants newest-first.

The fix is math.MaxInt64 - unix_timestamp instead of the raw timestamp. Ascending byte order becomes descending chronological order, and every history query returns results in the right order by default. Three lines of code. This is a known pattern in DynamoDB and Bigtable, but it’s the kind of thing you either know or you spend an hour figuring out why your results are backwards.
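
A sketch of the key encoding. Big-endian matters here: it’s what makes byte order agree with numeric order.

```go
package main

import (
	"encoding/binary"
	"math"
)

// commitKey builds commit:<repo>:<inverted-ts>:<hash> so a prefix scan
// over commit:<repo>: returns the newest commits first.
func commitKey(repo string, ts int64, hash string) []byte {
	inv := make([]byte, 8)
	binary.BigEndian.PutUint64(inv, uint64(math.MaxInt64-ts))
	key := append([]byte("commit:"+repo+":"), inv...)
	return append(key, []byte(":"+hash)...)
}
```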

Watching .git/refs/, not the source tree

Scry watches the entire source tree with fsnotify because it needs to reindex when any source file changes. Lore only cares about new commits. A file change that hasn’t been committed doesn’t affect blame or history. So the watcher only watches two paths: .git/refs/heads/ and .git/HEAD. A new commit updates the branch tip, fires the watcher, triggers a full reindex. Two paths instead of thousands. Full reindex on every trigger is fine because the parallel blame pool rebuilds in sub-second for typical repos, and the atomic-swap pattern from scry means queries keep working against the old index during the rebuild.
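
A sketch of the watcher, with reindex() standing in for the rebuild-and-swap:

```go
package main

import (
	"path/filepath"

	"github.com/fsnotify/fsnotify"
)

// watchGit fires a full reindex whenever a branch tip or HEAD changes.
// Uncommitted edits to the source tree never trigger it.
func watchGit(repo string, reindex func()) error {
	w, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	for _, p := range []string{
		filepath.Join(repo, ".git", "refs", "heads"),
		filepath.Join(repo, ".git", "HEAD"),
	} {
		if err := w.Add(p); err != nil {
			return err
		}
	}
	go func() {
		for range w.Events {
			reindex()
		}
	}()
	return nil
}
```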

Flume: runtime visibility

Flume is the newest and the simplest. It’s a reverse proxy that sits between your browser and your dev server, captures every request/response pair, and stores them in BadgerDB with a 30-minute TTL and a 1000-entry soft cap.

Browser → flume (localhost:8089) → dev server (localhost:8000)
                 │
                 ├── BadgerDB store
                 └── MCP tools / CLI

Three MCP tools: flume_requests (list recent traffic, filter by path/method/status), flume_request (full detail for one request including headers, body, timing), flume_status (daemon state and counts).

The use case is debugging. The current cycle when Claude Code hits a runtime bug is: guess what’s wrong, add a log statement, ask the user to reproduce the request, read the log output, parse through it, repeat 3-5 times. That’s 5-10 tool calls burned per issue. Flume collapses it to one. “Show me the last POST to /api/orders and what came back.” Structured JSON, full request and response, no reproduction step.

The key design decision was reverse proxy over middleware or log tailing. Middleware would give richer data (SQL queries, exceptions, app context) but requires per-framework packages and code changes. A reverse proxy is language-agnostic day one. It works with Laravel, Express, Rails, Go, Django, or anything else that speaks HTTP, without a single line of code change in your app. The plan is to layer middleware on later via a POST /ingest endpoint on the daemon for things like SQL query capture, but the proxy was the right P0 because it unblocks the most debugging scenarios with zero setup cost.

Body sizes are capped at 512KB. The bottleneck isn’t disk, it’s the AI context window. A 10MB response body would blow through the agent’s context budget on a single tool call. Truncated bodies get flagged in metadata so the agent knows it’s seeing a partial response.
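
A sketch of the capture hook on the stdlib reverse proxy, with the store callback standing in for flume’s record writer. The full body still reaches the client; only the stored copy is capped:

```go
package main

import (
	"bytes"
	"io"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// newProxy forwards to the dev server and snapshots each response body,
// truncating the stored copy at 512KB and flagging the truncation.
func newProxy(upstream *url.URL, store func(body []byte, truncated bool)) *httputil.ReverseProxy {
	p := httputil.NewSingleHostReverseProxy(upstream)
	p.ModifyResponse = func(resp *http.Response) error {
		full, err := io.ReadAll(resp.Body)
		if err != nil {
			return err
		}
		resp.Body = io.NopCloser(bytes.NewReader(full)) // client sees everything
		const maxBody = 512 << 10
		stored, truncated := full, false
		if len(full) > maxBody {
			stored, truncated = full[:maxBody], true
		}
		store(stored, truncated)
		return nil
	}
	return p
}
```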

One small detail: flume generates request IDs using a hand-rolled ULID implementation, about 50 lines of Go. ULIDs are time-sortable, which means BadgerDB prefix iteration returns requests in chronological order without any sorting. I could have pulled in a ULID library, but the implementation is trivial (timestamp in the high bits, random in the low bits, Crockford base32 encoding) and the dependency would have been heavier than the code it replaced.
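
A minimal version in the same spirit (48-bit millisecond timestamp, 80 random bits, 26 Crockford base32 characters):

```go
package main

import (
	"crypto/rand"
	"time"
)

const crockford = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// newULID packs the timestamp into the high 6 bytes so lexicographic
// order is chronological order, then base32-encodes all 128 bits.
func newULID() string {
	var b [16]byte
	ms := uint64(time.Now().UnixMilli())
	for i := 0; i < 6; i++ {
		b[i] = byte(ms >> (40 - 8*i))
	}
	rand.Read(b[6:]) // crypto-quality randomness in the low 10 bytes

	out := make([]byte, 26)
	for i := 0; i < 26; i++ { // 26 chars × 5 bits = 130 bits: pad 2 in front
		var v uint
		for j := 0; j < 5; j++ {
			bit := i*5 - 2 + j
			if bit < 0 {
				continue
			}
			if b[bit/8]&(1<<(7-bit%8)) != 0 {
				v |= 1 << (4 - j)
			}
		}
		out[i] = crockford[v]
	}
	return string(out)
}
```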

What stacking looks like

Each daemon is useful on its own. The interesting thing is what happens when they’re all running.

A typical “fix this bug” session without the stack:

  1. Grep for the error message (5-10s, returns noise)
  2. Read 3-4 files to find the relevant code (15-30s)
  3. Read migrations to understand the schema (10-15s)
  4. Run git blame to understand why the code is this way (5-10s)
  5. Add logging, reproduce the bug, read the log (30-60s)
  6. Finally start fixing the bug

With the stack:

  1. scry_refs to find where the error is raised (7ms)
  2. tome_describe to get the table schema (1ms)
  3. lore_intent to understand the commit that introduced the code (5ms)
  4. flume_requests to see the actual HTTP traffic that triggered the bug (3ms)
  5. Start fixing the bug

The wall clock difference is real. The first path is 60-120 seconds of reconnaissance. The second is under a second. But the bigger difference is context quality. The agent isn’t reading forty grep results and throwing away thirty-eight. It’s getting exactly the facts it asked for, structured, with no noise. The context window stays clean, the agent makes fewer wrong turns, and the fix lands faster.

I don’t have controlled A/B measurements across enough sessions to claim a specific percentage improvement. What I can say is that watching Claude Code work on a project with all four daemons running feels qualitatively different from watching it without them. It spends its time thinking about the problem instead of finding the problem.

The copy-sibling workflow

The thing I didn’t expect was how fast new daemons would go once the architecture was settled.

Scry took weeks. It was the first one, every design decision was open, and the PHP post-processor pipeline alone was a multi-day detour. Trawl took a few days because it had its own hard problems (tiered routing, evasion model, BFS crawl termination). But by the time I was scaffolding flume, tome, and lore, the CLAUDE.md brief for each one could say “read scry’s architecture and copy it” and that was genuinely sufficient.

The build sessions didn’t debate infrastructure. They opened the brief, planned phases, and executed. Flume’s four phases were: skeleton and daemon, proxy and storage, MCP and setup and doctor, docs and polish. Tome’s five phases were: infrastructure, data model, DB introspection, RPC handlers and CLI, MCP integration. Lore’s six phases were: foundation, indexing, queries, MCP, watcher, polish. Different domain logic, same structural decomposition.

The dependency lists tell the story. Lore depends on three packages: BadgerDB, Cobra, fsnotify. Flume adds net/http (stdlib) for the reverse proxy. Tome adds go-sql-driver/mysql and jackc/pgx for database connections. Nobody is pulling in logging frameworks, ORMs, dependency injection containers, or anything that would need its own design discussion. The Go stdlib covers the rest.

Each new daemon in the family has a lower build cost than the last, not because the domains are simpler but because the non-domain decisions are already paid for. The daemon lifecycle, RPC dispatcher, BadgerDB wrapper, MCP server, setup command, doctor command, CLI structure, GoReleaser config, install script: all of it transfers wholesale with minimal adaptation.

The thesis, if there is one

Most people building AI agent tooling are writing wrappers around LLM APIs. This is the opposite direction: building local infrastructure that makes the agent’s existing tools unnecessary for common queries. The agent stops running git blame and starts calling lore_blame. Stops reading migration files and starts calling tome_describe. The LLM doesn’t get smarter. It just stops wasting its time.

The existing tools in each category (Datadog for runtime observability, GitLens for git history, pgAdmin for schema browsing) are cloud-first, dashboard-oriented, and optimized for human ops teams. None of them serve AI coding agents. None of them speak MCP. None of them return structured JSON at millisecond latency over a Unix socket. The gap is real.

The thing I didn’t anticipate is that these tools are also just better for me as a human. tome describe users in the terminal is faster than opening pgAdmin. lore cochange OrderService.php is a question I couldn’t even ask before. flume requests --path /api/orders is faster than grepping a log file. The agent-first design constraints (millisecond latency, structured output, no UI) turn out to produce tools that are good for humans by accident: what makes a tool fast and structured for an agent is exactly what makes a CLI good for a developer who lives in the terminal.

Each tool is a single binary you can install independently. The repos are at scry, tome, lore, and flume.
