
KV Cache Visualizer

A conceptual, client‑side KV Cache and Paged Attention visualizer for LLM inference. It demonstrates prefill vs decode, paged KV blocks, and continuous batching without running a real model.

Storage ≠ Attention: the Recent‑N policy limits which KV entries attention reads, not which entries stay in memory.

Live Demo

Screenshots

  • Single Prompt Mode
  • Multi Prompt Mode

Core Concepts

Prefill vs Decode

  • Prefill writes KV for the full prompt (batch‑parallel).
  • Decode reads KV, generates one token, then writes a new KV entry (autoregressive); see the sketch below.
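
A minimal sketch of the two phases, using invented names rather than the repo's actual API. The only difference it models is how many KV entries each phase writes per step:

```typescript
// Conceptual sketch only: prefill writes one KV entry per prompt token
// in a single step; decode reads the cache and appends exactly one
// entry per generated token.
type KVEntry = { token: string };

class Sequence {
  kv: KVEntry[] = [];

  // Prefill: write KV for every prompt token at once (batch-parallel).
  prefill(promptTokens: string[]): void {
    for (const token of promptTokens) this.kv.push({ token });
  }

  // Decode: read the whole cache, emit one token, write one new entry.
  decodeStep(nextToken: string): void {
    // A real model would compute attention over this.kv here; the
    // simulator only tracks the reads and writes.
    this.kv.push({ token: nextToken });
  }
}

const seq = new Sequence();
seq.prefill(["The", "cat", "sat"]); // 3 KV writes in one step
seq.decodeStep("on");               // 1 full read + 1 write per step
console.log(seq.kv.length);         // 4
```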

KV Cache

KV Cache stores key/value vectors per token and layer so decode can reuse prior context without recomputing.
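
For illustration, assuming a dense per‑layer layout (the repo's internal representation may differ), the cache can be pictured as one key/value pair per (layer, token position):

```typescript
// Illustrative data shape: cache[layer][position] holds the KV pair
// written when that token was processed.
type KV = { key: number[]; value: number[] };
type KVCache = KV[][];

function makeCache(numLayers: number): KVCache {
  return Array.from({ length: numLayers }, () => []);
}

// Append placeholder KV vectors for one new token across all layers.
// Real keys/values come from model projections; here they are zeros.
function appendToken(cache: KVCache, dim: number): void {
  for (const layerCache of cache) {
    layerCache.push({
      key: new Array(dim).fill(0),
      value: new Array(dim).fill(0),
    });
  }
}

const cache = makeCache(2); // 2 layers, toy dimension 4
appendToken(cache, 4);
console.log(cache[0].length); // 1 KV entry per layer after one token
```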

Paged Attention

KV is modeled as fixed‑size blocks (pages) and slots to show how real systems manage paged KV memory.
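
A toy allocator along these lines, assuming a fixed number of slots per block; all identifiers here are invented for the sketch:

```typescript
// Minimal page allocator sketch: fill a free slot in the last block,
// or allocate a fresh fixed-size block (page) when it is full.
const BLOCK_SIZE = 4; // KV slots per block

interface Block {
  id: number;
  slotsUsed: number;
}

class PagedKV {
  private blocks: Block[] = [];
  private nextId = 0;

  // Place one token's KV and return the block it landed in.
  appendToken(): Block {
    const last = this.blocks.at(-1);
    if (last && last.slotsUsed < BLOCK_SIZE) {
      last.slotsUsed++;
      return last;
    }
    const block: Block = { id: this.nextId++, slotsUsed: 1 };
    this.blocks.push(block);
    return block;
  }
}

const kv = new PagedKV();
for (let i = 0; i < 6; i++) kv.appendToken();
// 6 tokens -> block 0 holds 4 slots, block 1 holds 2
```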

Continuous Batching

Multiple prompts run together; decode adds one token per prompt per step while preserving prompt‑owned block chains.
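
One decode step could look like the following sketch (invented names; the BLOCK_SIZE value and chain‑growth rule are assumptions of this example, not the repo's):

```typescript
// One continuous-batching decode step. Each sequence owns its own
// chain of block IDs, so chains never interleave across prompts.
const BLOCK_SIZE = 4;

interface Seq {
  id: string;
  tokens: number;
  blockChain: number[];
  done: boolean;
}

let nextBlockId = 0;

function decodeStep(seqs: Seq[]): void {
  for (const seq of seqs) {
    if (seq.done) continue;
    seq.tokens++; // one new token per live sequence per step
    // Extend this sequence's own chain only when its last block fills.
    if (seq.tokens > seq.blockChain.length * BLOCK_SIZE) {
      seq.blockChain.push(nextBlockId++);
    }
  }
}

const batch: Seq[] = [
  { id: "A", tokens: 0, blockChain: [], done: false },
  { id: "B", tokens: 0, blockChain: [], done: false },
];
decodeStep(batch); // A and B each gain one token and their first block
```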

Features

Modes

  • Single Prompt: one sequence, step‑by‑step prefill → decode.
  • Multi Prompt (Continuous Batching): multiple sequences in flight.

Eviction Policies

  • Sliding Window
  • Pinned Prefix
  • Recent‑N Tokens (attention window only; see the sketch below)
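
As an illustration of the Storage ≠ Attention note above (the repo's /eviction plug‑ins may expose a different interface), a Recent‑N policy can narrow what attention reads without deleting anything from storage:

```typescript
// Hypothetical policy shape: Recent-N restricts which positions
// attention READS this step, while every stored KV entry remains.
interface AttentionPolicy {
  /** Positions attention may read at this step. */
  readablePositions(totalTokens: number): number[];
}

const recentN = (n: number): AttentionPolicy => ({
  readablePositions(totalTokens) {
    const start = Math.max(0, totalTokens - n);
    // Storage is untouched: all totalTokens entries still exist.
    return Array.from({ length: totalTokens - start }, (_, i) => start + i);
  },
});

console.log(recentN(3).readablePositions(8)); // [5, 6, 7] readable; 8 stored
```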

What This Is / What This Is NOT

This is:

  • A conceptual simulator for KV cache mechanics.
  • A visual teaching tool for paged attention and continuous batching.

This is not:

  • Real model inference or a chatbot.
  • A performance benchmark or numeric accuracy test.

Tech Stack

  • Next.js 16 (App Router)
  • React 19 + TypeScript
  • Tailwind CSS 4

Project Structure (High‑Level)

  • /app: Next.js App Router entrypoints
  • /modes: stateful mode containers (single vs multi)
  • /core: pure simulator logic (allocator, stepper, policies)
  • /eviction: eviction policy plug‑ins
  • /prompts: tokenization + prompt streams
  • /components: presentational UI
  • /lib: shared utilities
  • /docs: architecture notes

Local Development

  1. Install Node.js 20+.
  2. Install dependencies: npm install
  3. Start dev server: npm run dev
  4. Optional checks: npm run lint, npm run build

Deployment (Vercel)

  • Connect the repository to Vercel.
  • Use the default Next.js build settings.

Limitations & Design Choices

  • Tokens are labels, not tokenizer outputs.
  • No tensor math, attention scores, or model weights.
  • Deterministic stepping for easy visual verification.

Architecture Overview

See docs/ARCHITECTURE.md.

Roadmap

  • Additional eviction policies
  • More detailed per‑prompt debug overlays
  • Expanded conceptual annotations

License

A license has not been specified yet.

Acknowledgements / References

  • vLLM
  • Paged Attention literature and blog posts
