From a GitHub Gist by Andrej Karpathy (local copy for posterity):
A pattern for building personal knowledge bases using LLMs.
[…]
Most people’s experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There’s no accumulation.
[…]
The idea here is different. Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources.
As a human reader, I get immense value out of online wikis (like Wikipedia), and for the vast majority of my searches I never need the actual source material. When reading fiction, I will frequently use book-specific wikis as supplemental reading when I’m trying to remember certain details (e.g. where else have I seen this character?). My personal knowledge base (and this site, to some degree) is an effort to build a wiki-like system that I control, to augment my own memory. Wikis save me a lot of time.
Applying this same principle to LLM systems is a fascinating idea. Wikis (theoretically) can save a lot of tokens.
The knowledge is compiled once and then kept current, not re-derived on every query.
[…]
You never (or rarely) write the wiki yourself — the LLM writes and maintains all of it. You’re in charge of sourcing, exploration, and asking the right questions. The LLM does all the grunt work.
Architecture
There are three layers:
Raw sources — your curated collection of source documents. Articles, papers, images, data files. These are immutable — the LLM reads from them but never modifies them. This is your source of truth.
The wiki — a directory of LLM-generated markdown files. Summaries, entity pages, concept pages, comparisons, an overview, a synthesis. The LLM owns this layer entirely. It creates pages, updates them when new sources arrive, maintains cross-references, and keeps everything consistent. You read it; the LLM writes it.
The schema — a document (e.g. CLAUDE.md for Claude Code or AGENTS.md for Codex) that tells the LLM how the wiki is structured, what the conventions are, and what workflows to follow when ingesting sources, answering questions, or maintaining the wiki. This is the key configuration file — it’s what makes the LLM a disciplined wiki maintainer rather than a generic chatbot. You and the LLM co-evolve this over time as you figure out what works for your domain.
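To make the schema layer concrete, here is a minimal sketch of what such a schema document (a CLAUDE.md or AGENTS.md) might contain. The directory names, citation convention, and workflow wording are all illustrative assumptions, not part of the original pattern:

```markdown
# Wiki Schema

## Layout
- sources/      — immutable raw documents; read, never edit
- wiki/         — generated markdown pages; you (the LLM) own this directory
- wiki/index.md — the overview page; every page must be reachable from here

## Conventions
- One page per entity or concept, kebab-case filenames
- Every claim cites its source, e.g. `[source: sources/paper.pdf]`
- Cross-link related pages with relative markdown links

## Workflows
- Ingest: read the new source, create or update affected pages, update index.md
- Query: consult index.md first, then read only the pages needed; cite pages
- Lint: check for contradictions, stale claims, orphan pages, missing links
```

The point of writing this down is discipline: the same conventions apply on every ingest, so the wiki stays consistent without per-session instructions.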
Operations
Ingest. You drop a new source into the raw collection and tell the LLM to process it. […]
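The ingest step can be split into a mechanical half (filing the document) and an LLM half (processing it). A small helper for the mechanical half might look like this — the `sources/` path and the prompt text are my own assumptions, not prescribed by the pattern:

```python
"""Minimal ingest helper: copy a document into the immutable sources/
layer and produce a prompt to hand to the LLM agent. A sketch only."""
import shutil
from pathlib import Path

SOURCES = Path("sources")

def ingest(file_path: str) -> str:
    """Copy the file into sources/ and return a processing prompt."""
    SOURCES.mkdir(exist_ok=True)
    src = Path(file_path)
    dest = SOURCES / src.name
    shutil.copy2(src, dest)  # sources are immutable: copy, never move/edit
    return (f"A new source was added at {dest}. Read it, then create or "
            f"update the affected wiki pages per the schema in CLAUDE.md.")
```

Keeping the filing step outside the LLM means the source of truth is never touched by generation.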
Query. You ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations. […] The important insight: good answers can be filed back into the wiki as new pages. […] This way your explorations compound in the knowledge base just like ingested sources do.
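Filing an answer back into the wiki is just writing another markdown page. A sketch of what that might look like — the filename scheme, date line, and "Sources" section are illustrative assumptions:

```python
"""Sketch: save a synthesized answer as a wiki page so later queries
can reuse it instead of re-deriving it."""
import datetime
import re
from pathlib import Path

WIKI = Path("wiki")

def slugify(title: str) -> str:
    """Turn a page title into a kebab-case filename stem."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def file_answer(title: str, answer_md: str, cited_pages: list[str]) -> Path:
    """Write the answer as a page with links back to the pages it cites."""
    WIKI.mkdir(exist_ok=True)
    page = WIKI / f"{slugify(title)}.md"
    links = "\n".join(f"- [{p}]({p})" for p in cited_pages)
    page.write_text(
        f"# {title}\n\n"
        f"*Synthesized answer, filed {datetime.date.today()}*\n\n"
        f"{answer_md}\n\n## Sources\n{links}\n"
    )
    return page
```

Because the answer cites the pages it drew on, a later lint pass can still trace its claims back to raw sources.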
Lint. Periodically, ask the LLM to health-check the wiki. Look for: contradictions between pages, stale claims that newer sources have superseded, orphan pages with no inbound links, important concepts mentioned but lacking their own page, missing cross-references, data gaps that could be filled with a web search. […]
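Some of these lint checks are mechanical enough to script rather than ask the LLM for. For example, orphan detection: assuming pages live in a flat `wiki/` directory and cross-reference each other with standard markdown links (both assumptions on my part), a sketch might be:

```python
"""Sketch of one lint check: find wiki pages with no inbound links.
The entry point index.md is exempt, since nothing needs to link to it."""
import re
from pathlib import Path

# matches the target of a markdown link ending in .md, e.g. [x](page.md)
LINK_RE = re.compile(r"\]\(([^)]+\.md)\)")

def find_orphans(wiki_dir: str = "wiki") -> set[str]:
    """Return page filenames that no other page links to."""
    wiki = Path(wiki_dir)
    pages = {p.name for p in wiki.glob("*.md")}
    linked = set()
    for p in wiki.glob("*.md"):
        for target in LINK_RE.findall(p.read_text()):
            linked.add(Path(target).name)
    return pages - linked - {"index.md"}
```

Contradiction and staleness checks still need the LLM; the value of scripting the mechanical checks is that the LLM's lint budget goes to the judgment calls.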
The tedious part of maintaining a knowledge base is not the reading or the thinking — it’s the bookkeeping.
[…]
The right way to use this is to share it with your LLM agent and work together to instantiate a version that fits your needs.
I’ve applied this idea to one of my projects. My first attempt was a failure: it generated a wiki, but the instructions I wrote to CLAUDE.md contained a fatal flaw. Claude would greedily consume the entire wiki on every query, quickly filling my context window. I’ve since updated those instructions to have Claude consult the index first and read only the pages it needs, and seen efficiency improvements. Plenty of room to improve, though. I plan to explore implementing a few of the operations as commands, with an agent for coordination.
Another interesting idea I’ve seen around this concept: for software projects, using compiled artifacts like an AST (abstract syntax tree) as the basis for the wiki. See Codesight for more details.
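As a toy illustration of the AST idea (my own sketch, not how Codesight works), Python's stdlib `ast` module can extract an outline of a file — names and docstrings — which an LLM could then expand into wiki pages:

```python
"""Toy sketch: derive wiki raw material from a Python file's AST."""
import ast

def outline(source_code: str) -> list[tuple[str, str]]:
    """Return (name, first docstring line) for each top-level function —
    structured raw material an LLM could turn into wiki pages."""
    tree = ast.parse(source_code)
    out = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node) or ""
            out.append((node.name, doc.splitlines()[0] if doc else ""))
    return out
```

The appeal is that the AST is already a structured, deduplicated view of the codebase, so the wiki starts from compiled facts rather than re-read source text.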