Living Docs: Giving Your Agent a Memory That Survives Sessions

I spent several weeks building a complex UI project with Claude Code and Cursor. Dozens of sessions. Context windows that rolled over. New chats that started cold.

The code shipped. What kept the build from dissolving into drift was not better prompting. It was a small set of markdown files at the repo root that the agent read before every serious edit.

ARCHITECTURE.md. ECONOMY.md. DESIGN_RULES.md. COMMONMISTAKES.md.

These were not documentation in the usual sense. They were memory for the co-builder. A way to preserve structure, domain rules, visual constraints, and lessons already paid for so I did not re-teach them in every new session.

This post is about that layer. If you read my first Grove Keepers write-up, that piece covered the full build story. This one zooms in on the doc stack that made multi-week agentic work possible.

Chat is session memory. Files are project memory.

Every agent session starts with amnesia. You can paste summaries, attach files, or write increasingly long system prompts. That scales to an afternoon. It does not scale to a month.

What breaks first on long builds:

Structural drift. The agent adds a manager, wires a signal the wrong way, or duplicates a pattern you already rejected three sessions ago.
Domain re-litigation. You explain the reward curve, the failure consolation rate, or the color token rules again because the new chat never saw the old one.
Repeat bugs. The model confidently reintroduces a fix you already discovered. Camera coordinate spaces. GUI input order. Language syntax it defaults to from training data.

The instinct is to solve this with more context in the prompt. That hits token limits and still vanishes when the session ends.

The fix I landed on: write the state down in files the agent can re-read on demand. Not after the project ships. From week one, updated as often as the code.

Same philosophy I use in my Obsidian vault (file over AI). The repo version is just project-scoped memory: one codebase, one truth, readable by any agent session that opens the folder.

The doc stack

Here is what lived at the root of the project and what each file was for.

File	What it remembers	When the agent reads it
`ARCHITECTURE.md`	Scene tree, autoload graph, signal wiring, module ownership, data flows	Before adding nodes, scripts, or cross-module connections
`ECONOMY.md`	Formulas, tuning curves, balance invariants, what "fair" means at early vs. late progression	Before touching any number that affects feel
`DESIGN_RULES.md`	Typography scale, spacing grid, color language, non-negotiable visual patterns	Before editing UI scripts or drawn overlays
`COMMONMISTAKES.md`	Symptom, root cause, fix, prevention rule for every non-obvious bug	First stop when debugging; append after every painful fix
`CLAUDE.md`	Index and session contract: which files exist, when to update them, build checklist	Auto-read at session start (Claude Code convention)

Supporting docs (COLOR_OVERHAUL.md, UI_TEARDOWN.md, DESIGN_MOCK_PROMPT.md, IMPLEMENTATION_HANDOFF.md) carried specialized context. The four files above were the core memory. Everything else pointed back to them.

ARCHITECTURE.md: where the system lives

This was the most valuable file. Not a diagram in a wiki nobody opens. A living reference the agent was told to read instead of grepping the whole repo.

It contained:

Scene tree with node names, parent paths, and which script owned each piece
Autoload dependency graph and load order (what can call what on startup)
Signal graph listing every EventBus connection and its consumer
Design system layer pointing to Palette.gd and UITheme.gd as the only sources for visual tokens

The header said what mattered most:

Update this after every session that changes nodes, scripts, signals, or resources.
This is the authoritative reference for how the game is wired together.

That was not aspirational. It was the rule. When the agent added a new HUD listener or moved a node, updating ARCHITECTURE.md was part of finishing the task. If the doc drifted, the next session would wire features against a fiction.

Why this works for agents: models are good at local edits and bad at holding global structure in working memory. A single file that answers "what connects to what" prevents the agent from inventing plausible-but-wrong wiring. When I said "add a listener for seeds changed," the agent could look up the existing EventBus.seeds_changed graph and attach to the pattern already there.

What I would copy: one ARCHITECTURE.md per repo. Scene or module tree, dependency order, event/API graph, and a hard rule: structural changes do not merge without a doc update.

ECONOMY.md: domain spec as contract

Architecture covers wiring. Domain docs cover meaning: the numbers and rules that define how the product should feel.

ECONOMY.md opened with:

All formulas live in scripts/core/EconomyConfig.gd as static methods.
Read this before touching any system that involves numbers.

It documented five interlocking systems (enemy scaling, damage scaling, seed rewards, upgrade costs, grid unlock gates) with the actual formulas, safety invariants, and explicit bans. Example invariant: a level-1 unit must clear a baseline enemy in under four seconds at any wave. Example ban: never add seed income that bypasses the reward curve.

When balance felt wrong, I edited EconomyConfig.gd and checked the doc still described reality. When the agent tuned a reward or cost, I pointed it at ECONOMY.md first so it did not improvise a second economy in a corner of the codebase.

Why this works for agents: without a domain spec, the model optimizes for locally plausible numbers. Each session can nudge rewards, costs, and scaling in slightly different directions until the system incoheres. A single markdown contract gives the agent the same constraints a game designer would hand a junior engineer.

What I would copy: if your project has tunable behavior (pricing, scoring, permissions, rollout percentages), put the formulas and invariants in one DOMAIN.md or ECONOMY.md. Code implements it. The doc explains why the numbers exist.

DESIGN_RULES.md: constraints the agent cannot infer from code

Implementation agents love magic numbers. #FFAA33 padding of 26px. A fourth font size that "looks fine here."

DESIGN_RULES.md was the constraint surface for visual work:

Two token files only: Palette.gd for color, UITheme.gd for type, spacing, motion
Eight allowed type sizes. Eight-point spacing grid. No rogue hex codes.
Color language rules (amber for success affordance, green reserved for nature motifs, not "go" buttons)

It also pointed to Cursor rule files that enforced the same constraints in specific paths. The markdown was the human-readable spec. The rules were the automated guardrail. Both had to agree.

Why this works for agents: visual consistency is not recoverable from screenshots in chat. By the time you are three weeks in, the agent has seen hundreds of partial UI states. Without a written design contract, each new screen drifts. DESIGN_RULES.md gave a single answer to "what size is a caption?" and "what color means affordable?"

What I would copy: if agents touch UI, write DESIGN_RULES.md before the implementation marathon. Link it from your agent entry file. Pair it with linter rules or Cursor rules for the paths where drift is most expensive.

COMMONMISTAKES.md: institutional memory for repeat failures

This file paid for itself fastest.

Every entry followed the same shape:

Symptom → Root cause → Fix → Prevention rule

Some entries were language gotchas the model kept reintroducing (Python f-strings in GDScript, len() instead of .size()). Others were project-specific bugs that took real time to diagnose:

Camera vs. mouse space: comparing global_position on world nodes to raw screen mouse coords silently broke all click hit-tests after a Camera2D was added. Prevention: always convert through get_canvas_transform().affine_inverse().
GUI eating world clicks: full-screen overlays with mouse_filter=STOP did not block _input() on world nodes because GUI processes later. Prevention: use _unhandled_input() for world hit-tests when UI can be on screen.

The header instruction:

CHECK THIS FIRST when debugging. Add a new entry every time you hit a non-obvious error or a fix that took more than one attempt.

Agents repeat mistakes. That is not a character flaw. It is what happens when each session sees the fixed code but not the debugging story. COMMONMISTAKES.md preserved the story. After entry 016 existed, I stopped losing an hour to the same input-order bug every time a new chat tried to wire click handling.

Why this works for agents: it is an append-only error log written for retrieval. Short, searchable, actionable. The model does not need to have lived through the bug. It needs to read entry 015 before touching input code.

What I would copy: start COMMONMISTAKES.md on day one with five likely failures for your stack. Add one entry per painful fix. Tell the agent to read it before debugging and append after fixing.

CLAUDE.md: the index that wires it together

Individual docs help. An entry file that tells the agent they exist helps more.

CLAUDE.md at the repo root (Claude Code reads this automatically) was the table of contents:

Which companion files exist and what each is for
The update rule for ARCHITECTURE.md after structural changes
The "check first" rule for COMMONMISTAKES.md when debugging
The "read before UI work" rule for DESIGN_RULES.md

Think of it as AGENTS.md for a single project: not the memory itself, but the map to the memory.

If you use Cursor instead of Claude Code, the same content can live in .cursor/rules or a root AGENTS.md. The format matters less than the contract: at session start, the agent knows which files carry state.

How I used these in practice

A typical implementation request looked like this:

Scope the change. "Add a panel that shows totem stats on long-press."
Load structure. "Read ARCHITECTURE.md BattleUI section and DESIGN_RULES.md type scale. Propose the node placement and tokens before writing code."
Implement under constraints. Agent edits scripts; uses Palette.* and UITheme.* only.
Update memory. If a new signal or node was added, patch ARCHITECTURE.md in the same session.
Log failures. If we hit a new GDScript or input bug, append COMMONMISTAKES.md before moving on.

That five-step loop was slower per task than "just vibe code it." It was faster per project because I stopped re-explaining the same system every Tuesday.

Token cost also improved. Pointing at one section of ARCHITECTURE.md beat pasting chat transcripts or @-mentioning fifteen files.

Keeping docs alive (or they become harmful)

Stale docs are worse than no docs. The agent will trust them and build the wrong thing with confidence.

Rules that kept the stack honest:

Structural change → update ARCHITECTURE.md before session end. Non-negotiable.
Formula change → update ECONOMY.md in the same PR as EconomyConfig.gd.
New visual pattern → extend DESIGN_RULES.md before copying it elsewhere.
Painful bug → append COMMONMISTAKES.md or the bug will recur.

I treated doc drift like a failing test. If the architecture file did not match the scene tree, we stopped feature work and fixed the doc first.

How this connects to vault-level memory

Repo docs solve one project across weeks.

My Obsidian vault solves identity, procedures, and cross-project context across months (QMD + Obsidian, then Ideaverse + Cursor MCP).

Different scope. Same idea: the model is stateless; the files are not.

Layer	Scope	Examples
Repo root docs	One codebase	`ARCHITECTURE.md`, `ECONOMY.md`, `COMMONMISTAKES.md`
Vault / MCP	All projects, identity, skills	`Me.md`, `Vault Map.md`, `Skills Map.md`

You need both if you ship multiple things with agents. Project docs keep the agent aligned inside the repo. Vault docs keep the agent aligned across repos.

Starter set for your next agent build

You do not need ten files on day one. You need four habits:

ARCHITECTURE.md when the codebase has more than one module and anything talks via events or shared state.
DOMAIN.md (or ECONOMY.md, BILLING.md, PERMISSIONS.md) when tunable rules define product feel.
DESIGN_RULES.md when agents touch UI and you care about consistency.
COMMONMISTAKES.md from the first week, appended every time a fix takes more than one attempt.

Plus an entry file (CLAUDE.md, AGENTS.md, or Cursor rules) that tells the agent these exist and when to read them.

The writing takes an hour upfront. The payoff is not prettier documentation. It is that session forty still knows what session four decided.

Closing

The hardest part of my build was not syntax. The agent typed plenty of syntax. The hard part was continuity: making a stateless collaborator behave like it remembered the system we were building.

Chat could not carry that. Markdown at the repo root could.

If you are building with agents for more than a weekend, invest in the doc stack before you invest in a bigger context window. ARCHITECTURE.md is not paperwork. It is how the model knows where it is.