A conceptual, client‑side KV Cache and Paged Attention visualizer for LLM inference. It demonstrates prefill vs decode, paged KV blocks, and continuous batching without running a real model.
Storage ≠ Attention. Recent‑N limits attention reads, not memory retention.
- Live Demo: https://kvcachevisualizer.vercel.app/
- Single Prompt
- Multi-Prompt
- Prefill writes KV for the full prompt (batch‑parallel).
- Decode reads KV, generates one token, then writes a new KV entry (autoregressive).
KV Cache stores key/value vectors per token and layer so decode can reuse prior context without recomputing.
KV is modeled as fixed‑size blocks (pages) and slots to show how real systems manage paged KV memory.
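A minimal sketch of that paged model, assuming fixed-size blocks and a per-sequence block chain (names like `PagedAllocator` are hypothetical, not the app's real code):

```typescript
// Illustrative paged-KV allocator: tokens fill slots; full blocks chain.
const BLOCK_SIZE = 4; // slots (token KV entries) per block/page

interface Block {
  id: number;
  slots: (string | null)[];
}

class PagedAllocator {
  private freeIds: number[];
  private blocks = new Map<number, Block>();
  private chains = new Map<string, number[]>(); // seqId -> ordered block ids

  constructor(numBlocks: number) {
    this.freeIds = Array.from({ length: numBlocks }, (_, i) => i);
  }

  // Append one token's KV entry, grabbing a fresh block when the tail is full.
  append(seqId: string, token: string): void {
    const chain = this.chains.get(seqId) ?? [];
    let tail = chain.length > 0 ? this.blocks.get(chain[chain.length - 1]) : undefined;
    if (!tail || !tail.slots.includes(null)) {
      const id = this.freeIds.shift();
      if (id === undefined) throw new Error("out of KV blocks");
      tail = { id, slots: Array(BLOCK_SIZE).fill(null) };
      this.blocks.set(id, tail);
      chain.push(id);
      this.chains.set(seqId, chain);
    }
    tail.slots[tail.slots.indexOf(null)] = token;
  }

  blocksUsed(seqId: string): number {
    return (this.chains.get(seqId) ?? []).length;
  }
}
```

This mirrors the key idea in real paged-attention systems such as vLLM: a sequence's KV lives in a chain of non-contiguous fixed-size pages, so memory fragments at the block level rather than per token.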
Multiple prompts run together; decode adds one token per prompt per step while preserving prompt‑owned block chains.
- Single Prompt: one sequence, step‑by‑step prefill → decode.
- Multi Prompt (Continuous Batching): multiple sequences in flight.
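One scheduler tick of the multi-prompt mode can be sketched as follows; field and function names here are illustrative assumptions, not the simulator's real interface:

```typescript
// Conceptual continuous-batching step: each in-flight sequence either
// prefills its whole prompt or decodes exactly one token per tick.
type Phase = "prefill" | "decode" | "done";

interface Seq {
  id: string;
  promptLeft: string[];
  kvLen: number;  // length of this sequence's KV block chain, in tokens
  phase: Phase;
  budget: number; // decode steps remaining before completion
}

function step(batch: Seq[]): void {
  for (const s of batch) {
    if (s.phase === "prefill") {
      s.kvLen += s.promptLeft.length; // whole prompt written batch-parallel
      s.promptLeft = [];
      s.phase = "decode";
    } else if (s.phase === "decode") {
      s.kvLen += 1; // one generated token cached per step
      if (--s.budget === 0) s.phase = "done";
    }
  }
}
```

Because each sequence only ever appends to its own chain, prompts can join and leave the batch mid-flight without disturbing the block chains of their neighbors.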
- Sliding Window
- Pinned Prefix
- Recent‑N Tokens (attention window only)
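The three policies differ in whether they govern what attention *reads* or what the cache *keeps*, which is the "Storage ≠ Attention" point above. A sketch as pure functions over cached token indices (purely illustrative):

```typescript
// Recent-N: restricts which cached entries attention reads this step.
// The cache itself is untouched; nothing is evicted.
const recentN = (cache: number[], n: number): number[] => cache.slice(-n);

// Sliding Window: an eviction policy. Entries older than the window
// are actually dropped from storage.
const slidingWindow = (cache: number[], w: number): number[] => cache.slice(-w);

// Pinned Prefix: keep the first `pin` entries (e.g. system prompt)
// plus a recent tail of `w` entries; the middle is evictable.
// (Assumes cache.length >= pin + w, so the two spans don't overlap.)
const pinnedPrefix = (cache: number[], pin: number, w: number): number[] =>
  [...cache.slice(0, pin), ...cache.slice(-w)];
```

`recentN` and `slidingWindow` compute the same slice; the difference is what happens to the rest: under Recent-N the excluded entries remain in memory and still occupy blocks, while under Sliding Window they are freed.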
This is:
- A conceptual simulator for KV cache mechanics.
- A visual teaching tool for paged attention and continuous batching.
This is not:
- Real model inference or a chatbot.
- A performance benchmark or numeric accuracy test.
- Next.js 16 (App Router)
- React 19 + TypeScript
- Tailwind CSS 4
- `/app`: Next.js App Router entrypoints
- `/modes`: stateful mode containers (single vs multi)
- `/core`: pure simulator logic (allocator, stepper, policies)
- `/eviction`: eviction policy plug-ins
- `/prompts`: tokenization + prompt streams
- `/components`: presentational UI
- `/lib`: shared utilities
- `/docs`: architecture notes
- Install Node.js 20+.
- Install dependencies: `npm install`
- Start dev server: `npm run dev`
- Optional checks: `npm run lint`, `npm run build`
- Connect the repository to Vercel.
- Use the default Next.js build settings.
- Tokens are labels, not tokenizer outputs.
- No tensor math, attention scores, or model weights.
- Deterministic stepping for easy visual verification.
See docs/ARCHITECTURE.md.
- Additional eviction policies
- More detailed per‑prompt debug overlays
- Expanded conceptual annotations
License is not specified yet.
- vLLM
- Paged Attention literature and blog posts

