llm-d Website Documentation System - Current State Analysis

In advance of our push to improve the public-facing documentation and guides, I wanted an (AI-assisted) analysis of how docs currently get to the public llm-d.ai website. Since I was the one who got the documentation onto the site in the first place, I can confirm that this doc does a good job of summarizing how it works today, what requirements we were (and still are) working within, and why what we currently have is probably not the solution we want going forward.

## Executive Summary

The llm-d website (llm-d.github.io) currently uses a **distributed documentation model** where content is pulled from multiple upstream repositories and transformed at build time using Docusaurus. While this approach keeps docs close to code, it has become increasingly complex and fragile due to:

1. **Complex transformation pipeline** with hacky markdown-to-MDX conversions
2. **Limited versioning support** - all content syncs from `main` branch only
3. **Scattered documentation** across 8+ repositories
4. **Build-time dependencies** on external GitHub repositories
5. **Difficult local development** - changes require rebuilding the entire site when testing different branches
6. **No single source of truth** for documentation

---

## Current Architecture

### System Overview

```
┌─────────────────────────────────────────────────────────────┐
│                    Build Process (GitHub Actions)           │
│  - Triggers: Push to main, nightly cron, manual             │
│  - Runs: npm install → npm run build → deploy to GH Pages   │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│               Docusaurus Build with Remote Content          │
│                                                              │
│  1. Reads components-data.yaml (release metadata)           │
│  2. Downloads README.md from 8+ GitHub repos                │
│  3. Applies transformations (fix links, images, MDX)        │
│  4. Generates static site with navigation                   │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                    Output: llm-d.ai                         │
│  - Architecture docs from llm-d/llm-d                       │
│  - Component docs from individual repos                     │
│  - Guides from llm-d/llm-d/guides/                          │
│  - Community docs from llm-d/llm-d                          │
│  - Generated release page from YAML                         │
└─────────────────────────────────────────────────────────────┘
```

### Technology Stack

- **Site Generator:** Docusaurus 3.9.2
- **Remote Content Plugin:** `docusaurus-plugin-remote-content` v4.0.0
- **Build Automation:** GitHub Actions (nightly + on-push)
- **Hosting:** GitHub Pages
- **Content Sources:** 8+ GitHub repositories

### File Structure

```
llm-d.github.io/
├── docusaurus.config.js              # Main Docusaurus config
├── sidebars.js                       # Sidebar definitions (auto-generated)
├── package.json                      # Dependencies
│
├── remote-content/                   # Remote content system
│   ├── remote-content.js            # Plugin aggregator
│   └── remote-sources/
│       ├── components-data.yaml     # ⭐ Central config (release metadata)
│       ├── sync-release.mjs         # Script to update YAML from GitHub API
│       ├── component-configs.js     # YAML loader and repo URL generator
│       ├── repo-transforms.js       # ⚠️ Complex transformation logic
│       ├── utils.js                 # Helper functions
│       │
│       ├── architecture/            # Architecture docs config
│       │   ├── architecture-main.js
│       │   └── components-generator.js  # Auto-generates component pages
│       │
│       ├── guide/                   # Guide docs config
│       │   └── guide-generator.js   # Auto-generates guide pages
│       │
│       ├── community/               # Community docs config
│       │   ├── contribute.js
│       │   ├── code-of-conduct.js
│       │   ├── security.js
│       │   └── sigs.js
│       │
│       ├── usage/                   # Usage docs config
│       │   └── usage-generator.js
│       │
│       └── infra-providers/         # Infra provider docs config
│           └── infra-providers-generator.js
│
├── docs/                            # ⚠️ Build output (not source!)
│   ├── architecture/                # Generated from llm-d/llm-d
│   │   ├── architecture.mdx         # Main README
│   │   ├── latest-release.md        # Generated from YAML
│   │   └── Components/              # Component READMEs
│   │       ├── inference-scheduler.md
│   │       ├── modelservice.md
│   │       ├── kv-cache.md
│   │       └── ... (8 components)
│   │
│   ├── guide/                       # Generated from llm-d/llm-d/guides/
│   │   ├── guide.md                 # guides/README.md
│   │   └── Installation/
│   │       ├── prerequisites.md
│   │       ├── quickstart.md
│   │       ├── inference-scheduling.md
│   │       ├── pd-disaggregation.md
│   │       └── ... (12 guides)
│   │
│   ├── community/                   # Generated from llm-d/llm-d
│   │   ├── contribute.md
│   │   ├── code-of-conduct.md
│   │   ├── security.md
│   │   └── sigs.md
│   │
│   └── usage/                       # Generated from component repos
│       └── ...
│
├── blog/                            # ✅ Local content (not synced)
├── src/                             # ✅ Local React components
└── static/                          # ✅ Local static assets
```

---

## How Remote Content Syncing Works

### 1. Configuration (components-data.yaml)

The central configuration for what gets synced:

```yaml
release:
  version: v0.5.1
  releaseDate: '2026-03-05'
  releaseDateFormatted: March 5, 2026
  releaseUrl: https://github.com/llm-d/llm-d/releases/tag/v0.5.1
  releaseName: Release v0.5.1

components:
  - name: llm-d-inference-scheduler
    org: llm-d
    sidebarLabel: Inference Scheduler
    description: The scheduler that makes optimized routing decisions...
    sidebarPosition: 1
    version: v0.6.0
  # ... 8 more components
```

**Key Point:** Version tags are **only for display** on the Latest Release page. All content syncs from `main` branch.

### 2. Content Download (Build Time)

For each configured source, the `docusaurus-plugin-remote-content` plugin:

1. Downloads content from GitHub raw URL: `https://raw.githubusercontent.com/{org}/{repo}/main/{file}`
2. Passes content through `modifyContent()` function
3. Applies transformations (see next section)
4. Writes transformed content to `docs/` directory

**Example:** Component README sync
```javascript
{
  name: 'component-llm-d-inference-scheduler',
  sourceBaseUrl: 'https://raw.githubusercontent.com/llm-d/llm-d-inference-scheduler/main/',
  outDir: 'docs/architecture/Components',
  documents: ['README.md'],
  modifyContent(filename, content) {
    // Download README.md from GitHub
    // Apply transformations
    // Output to docs/architecture/Components/inference-scheduler.md
  }
}
```
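
A minimal sketch of how a generator such as `components-generator.js` might turn one `components-data.yaml` entry into the plugin options shown above. The names `toRemoteSource`, `CONTENT_REF`, and `transform`, and the rule that strips the `llm-d-` prefix to derive the output filename, are assumptions for illustration rather than the repository's actual code. The sketch also assumes the repository name equals the component `name`.

```javascript
// Assumption: every source is fetched from this one ref.
const CONTENT_REF = 'main';

// Hypothetical helper: build docusaurus-plugin-remote-content options
// for one component entry loaded from components-data.yaml.
function toRemoteSource(component, transform = (text) => text) {
  const { org, name } = component;
  // Assumed naming rule: llm-d-inference-scheduler -> inference-scheduler.md
  const slug = name.replace(/^llm-d-/, '');
  return {
    name: `component-${name}`,
    sourceBaseUrl: `https://raw.githubusercontent.com/${org}/${name}/${CONTENT_REF}/`,
    outDir: 'docs/architecture/Components',
    documents: ['README.md'],
    // The plugin calls this with the downloaded file; returning
    // { filename, content } renames and rewrites it.
    modifyContent(filename, content) {
      return { filename: `${slug}.md`, content: transform(content) };
    },
  };
}

const source = toRemoteSource({
  name: 'llm-d-inference-scheduler',
  org: 'llm-d',
  version: 'v0.6.0', // never read: display-only metadata
});
console.log(source.sourceBaseUrl);
```

Note that `component.version` is never read in this sketch. The ref is fixed to `main`, which matches the point that version tags only affect the Latest Release page.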

### 3. Content Transformation Pipeline

The most complex and fragile part of the system. Located in `repo-transforms.js`:

#### Phase 1: Basic MDX Fixes (`applyBasicMdxFixes`)

**Problem:** GitHub-flavored Markdown ≠ MDX (Docusaurus uses MDX)

Transformations:
- Convert GitHub callouts → Docusaurus admonitions
  ```markdown
  > [!NOTE]           →    :::note
  > This is a note         This is a note
                           :::
  ```
- Convert custom tab markers → Docusaurus Tabs components
  ```markdown
       →    <Tabs>
                <TabItem value="gke" label="GKE">
  content                       content
               </TabItem></Tabs>
  ```
- Fix HTML tags for MDX compatibility
  - `<br>` → `<br />`
  - Self-closing tags must have `/>`
  - Attributes must be quoted
- Convert HTML comments → JSX comments
  - `<!-- comment -->` → `{/* comment */}`
  - Multi-line comments removed entirely
- Escape curly braces in code blocks
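
To make two of these fixes concrete, here is a minimal sketch of the callout conversion and the HTML comment/`<br>` fixes. It is not the code in `repo-transforms.js`. The function names and the mapping from GitHub alert types to Docusaurus admonition types (`IMPORTANT` → `info`, `CAUTION` → `danger`) are assumptions.

```javascript
// Assumed mapping from GitHub alert types to Docusaurus admonition types.
const ADMONITION_TYPES = {
  NOTE: 'note',
  TIP: 'tip',
  IMPORTANT: 'info',
  WARNING: 'warning',
  CAUTION: 'danger',
};

// Hypothetical helper: turn "> [!NOTE]" blockquotes into ":::note" blocks.
function convertCallouts(markdown) {
  const lines = markdown.split('\n');
  const out = [];
  let i = 0;
  while (i < lines.length) {
    const match = /^>\s*\[!(NOTE|TIP|IMPORTANT|WARNING|CAUTION)\]\s*$/.exec(lines[i]);
    if (!match) {
      out.push(lines[i]);
      i += 1;
      continue;
    }
    out.push(`:::${ADMONITION_TYPES[match[1]]}`);
    i += 1;
    // Consume the quoted body, dropping the leading "> " marker.
    while (i < lines.length && lines[i].startsWith('>')) {
      out.push(lines[i].replace(/^>\s?/, ''));
      i += 1;
    }
    out.push(':::');
  }
  return out.join('\n');
}

// Hypothetical helper: single-line HTML comments become JSX comments,
// multi-line comments are dropped, and <br> is self-closed.
function fixHtmlForMdx(markdown) {
  return markdown
    .replace(/<!--([\s\S]*?)-->/g, (whole, body) => (body.includes('\n') ? '' : `{/*${body}*/}`))
    .replace(/<br\s*>/gi, '<br />');
}

console.log(convertCallouts('> [!NOTE]\n> This is a note'));
console.log(fixHtmlForMdx('line one<br>line two <!-- comment -->'));
```

Neither helper knows whether it is inside a fenced code block, so both would also rewrite examples that are meant to be shown verbatim. That kind of blind spot is typical of regex-based transforms.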

#### Phase 2: Link Fixing

**Problem:** Relative links in GitHub READMEs break in Docusaurus

Strategy: **Rewrite ALL relative links to point back to GitHub**

```markdown
<!-- Relative markdown link -->
[Some Doc](./guides/example.md)

<!-- Gets rewritten to: -->
[Some Doc](https://github.com/llm-d/llm-d/blob/main/guides/example.md)
```

**Exception:** Internal guide links
- Maintains mapping of GitHub paths → Docusaurus paths
- Example: `guides/quickstart/README.md` → `/docs/guide/Installation/quickstart`
- Only works for explicitly listed paths in `INTERNAL_GUIDE_MAPPINGS`
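
The rewrite-to-GitHub strategy and the internal mapping exception can be sketched together as follows. This is an illustrative sketch, not the actual implementation. The function name, its parameters, and the use of `URL` to resolve paths are assumptions, and the mapping table contains only the one entry quoted above.

```javascript
// One entry copied from the example above; the real table is larger.
const INTERNAL_GUIDE_MAPPINGS = new Map([
  ['guides/quickstart/README.md', '/docs/guide/Installation/quickstart'],
]);

// Hypothetical helper. sourcePath is the file's path inside the repo,
// blobBase is e.g. https://github.com/llm-d/llm-d/blob/main
function rewriteRelativeLinks(markdown, sourcePath, blobBase) {
  // A placeholder origin lets URL resolve ./ and ../ segments for us.
  const base = new URL(sourcePath, 'https://placeholder.invalid/');
  // (?<!!) skips image syntax such as ![alt](src).
  return markdown.replace(/(?<!!)\[([^\]]*)\]\(([^)\s]+)\)/g, (whole, text, href) => {
    // Leave absolute URLs, protocol-relative URLs, and in-page anchors alone.
    if (/^(?:[a-z][a-z0-9+.-]*:|\/\/|#)/i.test(href)) return whole;
    const target = new URL(href, base);
    const repoPath = target.pathname.slice(1);
    const mapped = INTERNAL_GUIDE_MAPPINGS.get(repoPath);
    const url = mapped ?? `${blobBase}/${repoPath}`;
    return `[${text}](${url}${target.hash})`;
  });
}

const blobBase = 'https://github.com/llm-d/llm-d/blob/main';
console.log(rewriteRelativeLinks('[Some Doc](./guides/example.md)', 'README.md', blobBase));
console.log(rewriteRelativeLinks('[Quickstart](./quickstart/README.md)', 'guides/README.md', blobBase));
```

Any path missing from the mapping falls through to a GitHub URL, so a guide that exists on the site but is not listed sends readers off-site.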

#### Phase 3: Image Fixing

**Problem:** Relative image paths break

Strategy: **Rewrite to GitHub raw URLs**

```markdown
<!-- Relative image -->
![Diagram](./images/architecture.png)

<!-- Gets rewritten to: -->
![Diagram](https://github.com/llm-d/llm-d/raw/main/guides/images/architecture.png)
```
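
The `guides/` segment in the rewritten URL above presumably comes from resolving the image path against the directory of the source file. A minimal sketch under that assumption follows. The function name and parameters are hypothetical.

```javascript
// Hypothetical helper. sourcePath is the markdown file's path inside the
// repo, rawBase is e.g. https://github.com/llm-d/llm-d/raw/main
function rewriteRelativeImages(markdown, sourcePath, rawBase) {
  const base = new URL(sourcePath, 'https://placeholder.invalid/');
  return markdown.replace(/!\[([^\]]*)\]\(([^)\s]+)\)/g, (whole, alt, src) => {
    // Absolute and protocol-relative image URLs are left untouched.
    if (/^(?:[a-z][a-z0-9+.-]*:|\/\/)/i.test(src)) return whole;
    // Resolve relative to the source file's directory.
    const repoPath = new URL(src, base).pathname.slice(1);
    return `![${alt}](${rawBase}/${repoPath})`;
  });
}

console.log(
  rewriteRelativeImages(
    '![Diagram](./images/architecture.png)',
    'guides/README.md',
    'https://github.com/llm-d/llm-d/raw/main',
  ),
);
```

Only markdown image syntax is handled here. Images embedded with raw `<img>` tags would need their own rule.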

#### Phase 4: Known Broken Link Fixes

Hardcoded fixes for upstream issues:
```javascript
// Fix broken 'dev' branch references
.replace(/github\.com\/llm-d\/llm-d\/tree\/dev\//g,
         'github.com/llm-d/llm-d/tree/main/')
```

**TODO comments indicate these are temporary hacks**

#### Phase 5: Frontmatter Injection

Adds YAML frontmatter for Docusaurus:
```yaml
---
title: Inference Scheduler
description: "The scheduler that makes optimized routing decisions..."
sidebar_label: Inference Scheduler
sidebar_position: 1
keywords: [llm-d, inference scheduler, request routing]
---
```
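
A sketch of how such a frontmatter block could be generated and prepended. The helper names and the shape of the `meta` object are assumptions. String values are emitted with `JSON.stringify`, so the output is quoted slightly differently from the example above but stays valid YAML even when a title contains a colon.

```javascript
// Hypothetical helper: render Docusaurus frontmatter from page metadata.
function buildFrontmatter({ title, description, sidebarLabel, sidebarPosition, keywords = [] }) {
  return [
    '---',
    // JSON string and array syntax is also valid YAML flow syntax.
    `title: ${JSON.stringify(title)}`,
    `description: ${JSON.stringify(description)}`,
    `sidebar_label: ${JSON.stringify(sidebarLabel)}`,
    `sidebar_position: ${sidebarPosition}`,
    `keywords: ${JSON.stringify(keywords)}`,
    '---',
    '',
  ].join('\n');
}

// Prepend the frontmatter, separated from the body by one blank line.
function injectFrontmatter(content, meta) {
  return `${buildFrontmatter(meta)}\n${content}`;
}

console.log(
  injectFrontmatter('# Inference Scheduler\n', {
    title: 'Inference Scheduler',
    description: 'The scheduler that makes optimized routing decisions...',
    sidebarLabel: 'Inference Scheduler',
    sidebarPosition: 1,
    keywords: ['llm-d', 'inference scheduler', 'request routing'],
  }),
);
```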

#### Phase 6: Source Attribution

Adds callout banner to bottom of page:
```markdown
:::info Content Source
This content is automatically synced from [README.md](link) on the `main` branch.

📝 To suggest changes, please [edit the source file](link) or [create an issue](link).
:::
```
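
The banner could be produced by a small helper like the one below. This is a sketch only: the function names are hypothetical, and the three link targets (GitHub's `blob`, `edit`, and `issues/new` URLs) are assumptions, since the example above elides them as `(link)`.

```javascript
// Hypothetical helper: build the attribution admonition for one source file.
function buildSourceCallout({ org, repo, file, ref = 'main' }) {
  const repoUrl = `https://github.com/${org}/${repo}`;
  return [
    ':::info Content Source',
    `This content is automatically synced from [${file}](${repoUrl}/blob/${ref}/${file}) on the \`${ref}\` branch.`,
    '',
    `📝 To suggest changes, please [edit the source file](${repoUrl}/edit/${ref}/${file}) or [create an issue](${repoUrl}/issues/new).`,
    ':::',
  ].join('\n');
}

// Append the banner after the page body, separated by one blank line.
function appendSourceCallout(content, source) {
  return `${content.trimEnd()}\n\n${buildSourceCallout(source)}\n`;
}

console.log(appendSourceCallout('# Architecture\n', { org: 'llm-d', repo: 'llm-d', file: 'README.md' }));
```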

---

## Content Sources (8+ Repositories)

### Main Repository
- **Repo:** `llm-d/llm-d`
- **Content:**
  - Architecture overview (`README.md` → `docs/architecture/architecture.mdx`)
  - User guides (`guides/**/*.md` → `docs/guide/`)
  - Community docs (`CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, etc.)
- **Branch:** Always `main`

### Component Repositories

1. **llm-d-inference-scheduler** (`llm-d/llm-d-inference-scheduler`)
2. **llm-d-modelservice** (`llm-d-incubation/llm-d-modelservice`)
3. **llm-d-inference-sim** (`llm-d/llm-d-inference-sim`)
4. **llm-d-infra** (`llm-d-incubation/llm-d-infra`)
5. **llm-d-kv-cache** (`llm-d/llm-d-kv-cache`)
6. **llm-d-benchmark** (`llm-d/llm-d-benchmark`)
7. **workload-variant-autoscaler** (`llm-d-incubation/workload-variant-autoscaler`)
8. **gateway-api-inference-extension** (`kubernetes-sigs/gateway-api-inference-extension`), configured with `skipSync: true`

Each component's `README.md` is synced to `docs/architecture/Components/{name}.md`

---

## Build and Deployment Process

### GitHub Actions Workflow (`.github/workflows/deploy.yml`)

**Triggers:**
- Push to `main` branch
- Nightly cron at midnight UTC (`0 0 * * *`)
- Manual trigger (workflow_dispatch)

**Steps:**
1. Checkout code
2. Set up Node.js 20.18.1
3. `npm install`
4. `npm run build` (Downloads remote content + builds Docusaurus site)
5. Upload build artifact
6. Deploy to GitHub Pages

**Build Time:** ~3-5 minutes (includes downloading from 8+ repos)

### Release Update Process

When a new llm-d release is published:

1. **Run sync script:**
   ```bash
   cd remote-content/remote-sources
   node sync-release.mjs
   ```

2. **Script actions:**
   - Queries GitHub Releases API for latest release
   - Parses "LLM-D Component Summary" table from release notes
   - Updates `components-data.yaml`:
     - Release version, date, URL
     - Component versions
     - New/re-enabled container images

3. **Manual review and commit:**
   ```bash
   git diff components-data.yaml
   git add components-data.yaml
   git commit -m "Update to llm-d v0.5.1"
   git push
   ```

4. **Automatic deployment:**
   - GitHub Actions triggers on push
   - Builds site with updated metadata
   - Deploys to llm-d.ai

**Important:** Content (READMEs, guides) still syncs from `main` branch, not the release tag. Version numbers are only used for display on the Latest Release page.
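
The table-parsing part of step 2 might look roughly like the sketch below. The actual format of the release notes is not shown here, so this assumes a GitHub-flavored pipe table with a separator row directly under the "LLM-D Component Summary" heading. The column names `Component` and `Version` and the function name are hypothetical.

```javascript
// Hypothetical helper: extract the first pipe table that follows the
// given heading and return one object per body row, keyed by header cell.
function parseSummaryTable(releaseBody, heading = 'LLM-D Component Summary') {
  const lines = releaseBody.split(/\r?\n/);
  const start = lines.findIndex((line) => line.includes(heading));
  if (start === -1) return [];

  const rows = [];
  for (const line of lines.slice(start + 1)) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('|')) {
      if (rows.length > 0) break; // the table has ended
      continue; // still before the table
    }
    // Strip the outer pipes, then split into trimmed cells.
    rows.push(trimmed.replace(/^\||\|$/g, '').split('|').map((cell) => cell.trim()));
  }
  if (rows.length < 2) return [];

  // rows[1] is assumed to be the |---|---| separator row.
  const [header, , ...body] = rows;
  return body.map((cells) => Object.fromEntries(header.map((name, i) => [name, cells[i] ?? ''])));
}

const notes = [
  '## LLM-D Component Summary',
  '',
  '| Component | Version |',
  '|---|---|',
  '| llm-d-inference-scheduler | v0.6.0 |',
].join('\n');
console.log(parseSummaryTable(notes));
```

A parser like this depends entirely on release notes keeping the same heading and table layout, which is one more coupling to upstream content.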

---

## Pain Points and Limitations

### 1. **Complex and Fragile Transformations**

**Problem:** The transformation pipeline in `repo-transforms.js` is 300+ lines of regex-based text manipulation.

**Examples of brittleness:**
- Regex parsing of markdown syntax (tabs, callouts, links, images)
- Special case handling for different link types (relative, root-relative, complex `../` paths)
- Hardcoded fixes for known broken upstream links (with TODO comments)
- Manual mapping of internal guide links
- Edge cases around curly braces, HTML tags, comments

**Why it's hacky:**
- Regex cannot properly parse markdown (markdown is context-dependent)
- Transformations are order-dependent (must run in specific sequence)
- Each new markdown feature requires new regex
- Difficult to test comprehensively
- Easy to introduce regressions

**Example of complexity:**
```javascript
// Convert tabs (60+ lines of code)
.replace(/\n([\s\S]*?)/g, (match, tabsContent) => {
  const tabSections = tabsContent.split(/\n([\s\S]*?)$/);
  // ... more parsing ...
  // Generate Docusaurus Tabs component
  // Add imports at top of file
})
```
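
The order dependence noted above can be demonstrated with two simplified steps. Neither is the real implementation, and the names are hypothetical. In particular, this `escapeBraces` escapes every brace in the text, which is a simplification.

```javascript
// Simplified stand-ins for two transformation steps.
const escapeBraces = (text) => text.replace(/[{}]/g, (brace) => `\\${brace}`);
const commentsToJsx = (text) => text.replace(/<!--(.*?)-->/g, '{/*$1*/}');

// Apply the steps left to right.
const run = (text, steps) => steps.reduce((acc, step) => step(acc), text);

const input = '<!-- note --> {x}';

// Escaping first leaves the generated JSX comment intact.
console.log(run(input, [escapeBraces, commentsToJsx]));
// → {/* note */} \{x\}

// Converting first means the new comment's own braces get escaped,
// so MDX would show it as literal text instead of hiding it.
console.log(run(input, [commentsToJsx, escapeBraces]));
// → \{/* note */\} \{x\}
```

Both orders run without error, so a mistake in sequencing only shows up in the rendered page. That is part of why regressions are easy to introduce.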

### 2. **No Versioning Support**

**Problem:** All content syncs from `main` branch only.

**Impact:**
- Documentation always shows latest development state
- No way to view docs for specific releases (v0.4.x, v0.5.x, etc.)
- Users on older versions see docs for features they don't have
- Breaking changes in docs can confuse users
- Cannot maintain separate docs for LTS versions

**Current workaround:**
- Version tags in the YAML are only displayed on the "Latest Release" page
- To test content from a feature branch, you must manually edit code:
  ```javascript
  // Temporary hack to test feature branch
  const ref = 'feature-branch'; // Change back to 'main' before committing!
  ```

### 3. **Build-Time External Dependencies**

**Problem:** Site build depends on availability and content of 8+ external GitHub repositories.

**Risks:**
- Build breaks if upstream repo is deleted/moved
- Build breaks if upstream repo is temporarily unavailable
- Build breaks if upstream content has syntax errors
- Cannot build offline
- Cannot guarantee reproducible builds (upstream can change)
- Slow builds (must download from multiple repos)

**Example failure scenario:**
1. Component repo merges breaking markdown change
2. Nightly build runs at midnight
3. Transformation fails on new markdown syntax
4. Site deploy fails
5. Website is down until someone fixes transformation code

### 4. **Difficult Local Development**

**Problem:** Testing doc changes requires complex workflow.

**For synced content:**
1. Fork the source repository (e.g., `llm-d/llm-d`)
2. Make changes to README or guide
3. Commit and push to fork
4. Temporarily modify `component-configs.js` to point to your fork (or to a branch in the upstream repo)
5. Run `npm start` to build the site locally
6. Remember to revert config changes before committing

**Pain points:**
- Cannot see doc changes without rebuilding entire site
- Must modify website code to test upstream changes
- Easy to accidentally commit temporary config changes
- Slow iteration cycle (build takes minutes)

### 5. **Scattered Documentation**

**Problem:** Documentation is spread across multiple repositories with different structures.

**Current locations:**
- Architecture: `llm-d/llm-d/README.md`
- Guides: `llm-d/llm-d/guides/*/README.md`
- Component docs: `{org}/{repo}/README.md` (8 repos)
- Community: `llm-d/llm-d/CONTRIBUTING.md`, etc.
- API reference: Not currently documented
- Infrastructure: `llm-d-incubation/llm-d-infra/README.md`

**Impact:**
- No single source of truth
- Difficult to maintain consistency
- Hard to search across all docs
- No unified navigation
- Unclear where to add new documentation
- Contributors don't know which repo to edit

### 6. **Limited SEO and Discoverability**

**Problem:** Docusaurus is primarily designed for single-repo documentation.

**Issues:**
- Documentation and marketing site mixed together
- Harder to optimize for different audiences (users vs. marketers)
- Search results point to generic llm-d.ai (not docs.llm-d.ai)
- Cannot track documentation analytics separately

### 7. **Maintenance Burden**

**Code to maintain:**
- 12 remote content source files (`.js` configs)
- 1 YAML data file (manually updated)
- 1 sync script (430 lines)
- 1 transformation system (300+ lines of regex)
- 1 utility library
- Custom Docusaurus configuration

**Every time:**
- New component is added → Update YAML + regenerate
- New guide is added → Update generator config
- GitHub changes markdown syntax → Update transformations
- Docusaurus updates → May break transformations
- Component repo restructures → Update source paths

---

## Why the Current Approach Was Chosen

**Original goals:**
1. Keep docs close to code (docs live with component)
2. Allow component teams to own their docs
3. Automatic updates (docs deploy when code changes)
4. Single website for everything (docs + marketing)

**This made sense when:**
- Small number of components (3-4)
- Simple markdown (no advanced features)
- No versioning requirements
- Rapid iteration phase

**What changed:**
- Now 8+ components (growing)
- Need versioning for stable releases
- Complex markdown features (tabs, callouts, etc.)
- Users on different versions (v0.4, v0.5, etc.)
- Transformation edge cases accumulating
- More emphasis on documentation quality


## Appendix: Key Files Reference

### Current System

**Configuration:**
- `remote-content/remote-sources/components-data.yaml` - Central configuration
- `remote-content/remote-content.js` - Plugin aggregator
- `docusaurus.config.js` - Docusaurus configuration

**Transformation:**
- `remote-content/remote-sources/repo-transforms.js` - Transformation pipeline (300+ lines)
- `remote-content/remote-sources/utils.js` - Helper functions

**Generators:**
- `remote-content/remote-sources/architecture/components-generator.js` - Component docs
- `remote-content/remote-sources/guide/guide-generator.js` - Guide docs
- `remote-content/remote-sources/sync-release.mjs` - Release sync script (430 lines)

**Workflows:**
- `.github/workflows/deploy.yml` - Build and deployment

**Dependencies:**
- `docusaurus-plugin-remote-content` v4.0.0
- `@docusaurus/core` 3.9.2
- `js-yaml` 4.1.0 (for YAML parsing)

### Generated Output (Do Not Edit!)

- `docs/architecture/` - Architecture and component docs
- `docs/guide/` - User guides
- `docs/community/` - Community docs
- `docs/usage/` - Usage docs
