In advance of our push to improve the public-facing documentation and guides, I wanted an (AI-assisted) analysis of how docs currently get to the public llm-d.ai website. Since I was the one who got the documentation onto the site in the first place, I can say this doc does a good job of summarizing how it works today, what requirements we were (and still are) working within, and why what we currently have is probably not the solution we want going forward.
Executive Summary
The llm-d website (llm-d.github.io) currently uses a distributed documentation model where content is pulled from multiple upstream repositories and transformed at build time using Docusaurus. While this approach keeps docs close to code, it has become increasingly complex and fragile due to:
- Complex transformation pipeline with hacky markdown-to-MDX conversions
- Limited versioning support - all content syncs from the main branch only
- Scattered documentation across 8+ repositories
- Build-time dependencies on external GitHub repositories
- Difficult local development - changes require rebuilding the entire site when testing different branches
- No single source of truth for documentation
Current Architecture
System Overview
┌─────────────────────────────────────────────────────────────┐
│ Build Process (GitHub Actions) │
│ - Triggers: Push to main, nightly cron, manual │
│ - Runs: npm install → npm run build → deploy to GH Pages │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Docusaurus Build with Remote Content │
│ │
│ 1. Reads components-data.yaml (release metadata) │
│ 2. Downloads README.md from 8+ GitHub repos │
│ 3. Applies transformations (fix links, images, MDX) │
│ 4. Generates static site with navigation │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Output: llm-d.ai │
│ - Architecture docs from llm-d/llm-d │
│ - Component docs from individual repos │
│ - Guides from llm-d/llm-d/guides/ │
│ - Community docs from llm-d/llm-d │
│ - Generated release page from YAML │
└─────────────────────────────────────────────────────────────┘
Technology Stack
- Site Generator: Docusaurus 3.9.2
- Remote Content Plugin: docusaurus-plugin-remote-content v4.0.0
- Build Automation: GitHub Actions (nightly + on-push)
- Hosting: GitHub Pages
- Content Sources: 8+ GitHub repositories
File Structure
llm-d.github.io/
├── docusaurus.config.js # Main Docusaurus config
├── sidebars.js # Sidebar definitions (auto-generated)
├── package.json # Dependencies
│
├── remote-content/ # Remote content system
│ ├── remote-content.js # Plugin aggregator
│ └── remote-sources/
│ ├── components-data.yaml # ⭐ Central config (release metadata)
│ ├── sync-release.mjs # Script to update YAML from GitHub API
│ ├── component-configs.js # YAML loader and repo URL generator
│ ├── repo-transforms.js # ⚠️ Complex transformation logic
│ ├── utils.js # Helper functions
│ │
│ ├── architecture/ # Architecture docs config
│ │ ├── architecture-main.js
│ │ └── components-generator.js # Auto-generates component pages
│ │
│ ├── guide/ # Guide docs config
│ │ └── guide-generator.js # Auto-generates guide pages
│ │
│ ├── community/ # Community docs config
│ │ ├── contribute.js
│ │ ├── code-of-conduct.js
│ │ ├── security.js
│ │ └── sigs.js
│ │
│ ├── usage/ # Usage docs config
│ │ └── usage-generator.js
│ │
│ └── infra-providers/ # Infra provider docs config
│ └── infra-providers-generator.js
│
├── docs/ # ⚠️ Build output (not source!)
│ ├── architecture/ # Generated from llm-d/llm-d
│ │ ├── architecture.mdx # Main README
│ │ ├── latest-release.md # Generated from YAML
│ │ └── Components/ # Component READMEs
│ │ ├── inference-scheduler.md
│ │ ├── modelservice.md
│ │ ├── kv-cache.md
│ │ └── ... (8 components)
│ │
│ ├── guide/ # Generated from llm-d/llm-d/guides/
│ │ ├── guide.md # guides/README.md
│ │ └── Installation/
│ │ ├── prerequisites.md
│ │ ├── quickstart.md
│ │ ├── inference-scheduling.md
│ │ ├── pd-disaggregation.md
│ │ └── ... (12 guides)
│ │
│ ├── community/ # Generated from llm-d/llm-d
│ │ ├── contribute.md
│ │ ├── code-of-conduct.md
│ │ ├── security.md
│ │ └── sigs.md
│ │
│ └── usage/ # Generated from component repos
│ └── ...
│
├── blog/ # ✅ Local content (not synced)
├── src/ # ✅ Local React components
└── static/ # ✅ Local static assets
How Remote Content Syncing Works
1. Configuration (components-data.yaml)
The single source of truth for what gets synced:
release:
  version: v0.5.1
  releaseDate: '2026-03-05'
  releaseDateFormatted: March 5, 2026
  releaseUrl: https://github.com/llm-d/llm-d/releases/tag/v0.5.1
  releaseName: Release v0.5.1
components:
  - name: llm-d-inference-scheduler
    org: llm-d
    sidebarLabel: Inference Scheduler
    description: The scheduler that makes optimized routing decisions...
    sidebarPosition: 1
    version: v0.6.0
  # ... 8 more components
Key Point: Version tags are only for display on the Latest Release page. All content syncs from main branch.
2. Content Download (Build Time)
For each configured source, the docusaurus-plugin-remote-content plugin:
- Downloads content from the GitHub raw URL: https://raw.githubusercontent.com/{org}/{repo}/main/{file}
- Passes content through the modifyContent() function
- Applies transformations (see next section)
- Writes transformed content to the docs/ directory
Example: Component README sync
{
  name: 'component-llm-d-inference-scheduler',
  sourceBaseUrl: 'https://raw.githubusercontent.com/llm-d/llm-d-inference-scheduler/main/',
  outDir: 'docs/architecture/Components',
  documents: ['README.md'],
  modifyContent(filename, content) {
    // `content` is the README.md text the plugin already downloaded from GitHub.
    // Apply transformations here, then return the new filename so the page
    // is written to docs/architecture/Components/inference-scheduler.md
    return { filename: 'inference-scheduler.md', content };
  }
}
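As a rough illustration of how one entry from components-data.yaml could become a source config like the one above, here is a minimal sketch. The helper name `createComponentSource`, the assumption that the repo name equals the component `name`, and the rule of stripping the `llm-d-` prefix for the output filename are illustrative guesses based on the file names listed earlier, not the actual contents of component-configs.js.

```javascript
// Hypothetical helper: builds a remote-content source config from one
// component entry shaped like those in components-data.yaml.
function createComponentSource(component, ref = 'main') {
  // Assumed naming rule: 'llm-d-inference-scheduler' -> 'inference-scheduler'
  const shortName = component.name.replace(/^llm-d-/, '');
  return {
    name: `component-${component.name}`,
    sourceBaseUrl: `https://raw.githubusercontent.com/${component.org}/${component.name}/${ref}/`,
    outDir: 'docs/architecture/Components',
    documents: ['README.md'],
    modifyContent(filename, content) {
      // The real pipeline would transform `content` here;
      // this sketch only renames the output file.
      return { filename: `${shortName}.md`, content };
    },
  };
}

const source = createComponentSource({ name: 'llm-d-inference-scheduler', org: 'llm-d' });
console.log(source.sourceBaseUrl);
```

Note that `ref` defaults to `main`, which mirrors the current behavior where the version recorded in the YAML is never used to choose what gets downloaded.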
3. Content Transformation Pipeline
This is the most complex and fragile part of the system. It lives in repo-transforms.js:
Phase 1: Basic MDX Fixes (applyBasicMdxFixes)
Problem: GitHub-flavored Markdown ≠ MDX (Docusaurus uses MDX)
Transformations:
- Convert GitHub callouts → Docusaurus admonitions
  > [!NOTE]              →  :::note
  > This is a note          This is a note
                            :::
- Convert custom tab markers → Docusaurus Tabs components
  <!-- TABS:START -->    →  <Tabs>
  <!-- TAB:GKE -->          <TabItem value="gke" label="GKE">
  content                   content
  <!-- TABS:END -->         </TabItem></Tabs>
- Fix HTML tags for MDX compatibility
<br> → <br />
- Self-closing tags must have />
- Attributes must be quoted
- Convert HTML comments → JSX comments
<!-- comment --> → {/* comment */}
- Multi-line comments removed entirely
- Escape curly braces in code blocks
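To make the Phase 1 fixes more concrete, here is a minimal sketch of three of them: callouts, void tags, and single-line comments. The function names and the mapping from GitHub callout types to Docusaurus admonition types are assumptions for illustration. The real applyBasicMdxFixes may differ, and this naive version does not protect fenced code blocks or handle multi-line comments.

```javascript
// Assumed mapping from GitHub callout types to Docusaurus admonition types.
const ADMONITION_TYPES = {
  NOTE: 'note',
  TIP: 'tip',
  IMPORTANT: 'info',
  WARNING: 'warning',
  CAUTION: 'danger',
};

// "> [!NOTE]\n> text" becomes ":::note\ntext\n:::"
function convertCallouts(markdown) {
  return markdown.replace(
    /^> \[!(NOTE|TIP|IMPORTANT|WARNING|CAUTION)\]\n((?:^>.*\n?)+)/gm,
    (match, kind, body) => {
      const text = body
        .split('\n')
        .filter((line) => line.startsWith('>'))
        .map((line) => line.replace(/^> ?/, ''))
        .join('\n');
      return `:::${ADMONITION_TYPES[kind]}\n${text}\n:::\n`;
    }
  );
}

// "<br>" becomes "<br />" (MDX requires void tags to be self-closed).
function selfCloseVoidTags(markdown) {
  return markdown.replace(/<(br|hr)\s*>/gi, '<$1 />');
}

// "<!-- comment -->" becomes "{/* comment */}" (single-line comments only).
function convertHtmlComments(markdown) {
  return markdown.replace(/<!--\s*(.*?)\s*-->/g, '{/* $1 */}');
}

console.log(convertCallouts('> [!NOTE]\n> This is a note\n'));
```

Even in this reduced form, the order matters: running the comment conversion before the tab conversion would destroy the `<!-- TABS:START -->` markers, which is one reason the pipeline is order-dependent.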
Phase 2: Link Fixing
Problem: Relative links in GitHub READMEs break in Docusaurus
Strategy: Rewrite ALL relative links to point back to GitHub
// Relative markdown link
[Some Doc](./guides/example.md)
// Gets rewritten to:
[Some Doc](https://github.com/llm-d/llm-d/blob/main/guides/example.md)
Exception: Internal guide links
- Maintains mapping of GitHub paths → Docusaurus paths
- Example: guides/quickstart/README.md → /docs/guide/Installation/quickstart
- Only works for explicitly listed paths in INTERNAL_GUIDE_MAPPINGS
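The link strategy above can be sketched as follows. This is a simplified illustration, not the code in repo-transforms.js: the function name, its parameters, and the single mapping entry are assumptions, and root-relative links (starting with `/`) are deliberately left untouched here.

```javascript
// Illustrative subset; the real INTERNAL_GUIDE_MAPPINGS lists more paths.
const INTERNAL_GUIDE_MAPPINGS = new Map([
  ['guides/quickstart/README.md', '/docs/guide/Installation/quickstart'],
]);

// Rewrites relative markdown links in `markdown`.
// `sourceDir` is the directory of the source file inside the repo:
// either '' (repo root) or a path ending in '/'.
function rewriteRelativeLinks(markdown, { org, repo, ref = 'main', sourceDir = '' }) {
  const repoPrefix = `/${org}/${repo}/blob/${ref}/`;
  const base = `https://github.com${repoPrefix}${sourceDir}`;
  // (?<!!) skips image syntax; absolute URLs, mailto:, anchors, and
  // root-relative targets are excluded by the negative lookahead.
  const linkPattern = /(?<!!)\[([^\]]+)\]\((?!https?:|mailto:|#|\/)([^)\s]+)\)/g;

  return markdown.replace(linkPattern, (match, text, target) => {
    // The URL constructor resolves './' and '../' segments against the base.
    const resolved = new URL(target, base);
    const repoPath = resolved.pathname.startsWith(repoPrefix)
      ? resolved.pathname.slice(repoPrefix.length)
      : null;
    const internal = INTERNAL_GUIDE_MAPPINGS.get(repoPath);
    const href = internal ? `${internal}${resolved.hash}` : resolved.href;
    return `[${text}](${href})`;
  });
}

const options = { org: 'llm-d', repo: 'llm-d' };
console.log(rewriteRelativeLinks('[Some Doc](./guides/example.md)', options));
console.log(rewriteRelativeLinks('[Quickstart](guides/quickstart/README.md)', options));
```

The second call shows the exception path: because the resolved repo path appears in the mapping, the link stays on the site instead of pointing back to GitHub. Any guide missing from the mapping silently falls through to the GitHub URL.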
Phase 3: Image Fixing
Problem: Relative image paths break
Strategy: Rewrite to GitHub raw URLs
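// Relative image (placeholder path)
![Diagram](./path/to/image.png)
// Gets rewritten to:
![Diagram](https://raw.githubusercontent.com/{org}/{repo}/main/path/to/image.png)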
Phase 4: Known Broken Link Fixes
Hardcoded fixes for upstream issues:
// Fix broken 'dev' branch references
.replace(/github\.com\/llm-d\/llm-d\/tree\/dev\//g,
'github.com/llm-d/llm-d/tree/main/')
TODO comments indicate these are temporary hacks
Phase 5: Frontmatter Injection
Adds YAML frontmatter for Docusaurus:
---
title: Inference Scheduler
description: "The scheduler that makes optimized routing decisions..."
sidebar_label: Inference Scheduler
sidebar_position: 1
keywords: [llm-d, inference scheduler, request routing]
---
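A minimal sketch of how such frontmatter might be generated from a component's metadata follows. The function names and the shape of the `meta` object are assumptions for illustration. The description is emitted with `JSON.stringify` so quotes and colons in it stay valid YAML, while the other fields are written unquoted as in the example above.

```javascript
// Hypothetical helper: builds a Docusaurus frontmatter block from metadata.
function buildFrontmatter({ title, description, sidebarLabel, sidebarPosition, keywords = [] }) {
  return [
    '---',
    `title: ${title}`,
    // JSON.stringify yields a double-quoted string, which is also valid YAML.
    `description: ${JSON.stringify(description)}`,
    `sidebar_label: ${sidebarLabel}`,
    `sidebar_position: ${sidebarPosition}`,
    `keywords: [${keywords.join(', ')}]`,
    '---',
    '',
  ].join('\n');
}

// Prepends the frontmatter to already-transformed page content.
function injectFrontmatter(content, meta) {
  return `${buildFrontmatter(meta)}\n${content}`;
}

console.log(
  injectFrontmatter('# Inference Scheduler', {
    title: 'Inference Scheduler',
    description: 'The scheduler that makes optimized routing decisions...',
    sidebarLabel: 'Inference Scheduler',
    sidebarPosition: 1,
    keywords: ['llm-d', 'inference scheduler', 'request routing'],
  })
);
```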
Phase 6: Source Attribution
Adds callout banner to bottom of page:
:::info Content Source
This content is automatically synced from [README.md](link) on the `main` branch.
📝 To suggest changes, please [edit the source file](link) or [create an issue](link).
:::
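For illustration, a banner like this could be produced by a small helper along the following lines. The function name and the exact link targets are assumptions: the `/blob/`, `/edit/`, and `/issues/new` paths are standard GitHub URL shapes, but the links the real banner uses are not shown in this doc.

```javascript
// Hypothetical helper: builds the "Content Source" admonition for a synced file.
function buildSourceCallout({ org, repo, file, ref = 'main' }) {
  const repoUrl = `https://github.com/${org}/${repo}`;
  return [
    ':::info Content Source',
    `This content is automatically synced from [${file}](${repoUrl}/blob/${ref}/${file}) on the \`${ref}\` branch.`,
    '',
    `📝 To suggest changes, please [edit the source file](${repoUrl}/edit/${ref}/${file}) or [create an issue](${repoUrl}/issues/new).`,
    ':::',
  ].join('\n');
}

// Appends the banner to the bottom of the transformed page.
function appendSourceCallout(content, source) {
  return `${content.trimEnd()}\n\n${buildSourceCallout(source)}\n`;
}

console.log(appendSourceCallout('# Page body', { org: 'llm-d', repo: 'llm-d', file: 'README.md' }));
```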
Content Sources (8+ Repositories)
Main Repository
- Repo: llm-d/llm-d
- Content:
  - Architecture overview (README.md → docs/architecture/architecture.mdx)
  - User guides (guides/**/*.md → docs/guide/)
  - Community docs (CONTRIBUTING.md, CODE_OF_CONDUCT.md, etc.)
- Branch: Always main
Component Repositories
- llm-d-inference-scheduler (llm-d/llm-d-inference-scheduler)
- llm-d-modelservice (llm-d-incubation/llm-d-modelservice)
- llm-d-inference-sim (llm-d/llm-d-inference-sim)
- llm-d-infra (llm-d-incubation/llm-d-infra)
- llm-d-kv-cache (llm-d/llm-d-kv-cache)
- llm-d-benchmark (llm-d/llm-d-benchmark)
- workload-variant-autoscaler (llm-d-incubation/workload-variant-autoscaler)
- gateway-api-inference-extension (kubernetes-sigs/gateway-api-inference-extension) - skipSync: true
Each component's README.md is synced to docs/architecture/Components/{name}.md
Build and Deployment Process
GitHub Actions Workflow (.github/workflows/deploy.yml)
Triggers:
- Push to main branch
- Nightly cron at midnight UTC (0 0 * * *)
- Manual trigger (workflow_dispatch)
Steps:
- Checkout code
- Setup Node.js 20.18.1
- npm install
- npm run build (Downloads remote content + builds Docusaurus site)
- Upload build artifact
- Deploy to GitHub Pages
Build Time: ~3-5 minutes (includes downloading from 8+ repos)
Release Update Process
When a new llm-d release is published:
1. Run sync script:
   cd remote-content/remote-sources
   node sync-release.mjs
2. Script actions:
   - Queries GitHub Releases API for latest release
   - Parses "LLM-D Component Summary" table from release notes
   - Updates components-data.yaml:
     - Release version, date, URL
     - Component versions
     - New/re-enabled container images
3. Manual review and commit:
   git diff components-data.yaml
   git add components-data.yaml
   git commit -m "Update to llm-d v0.5.1"
   git push
4. Automatic deployment:
   - GitHub Actions triggers on push
   - Builds site with updated metadata
   - Deploys to llm-d.ai
Important: Content (READMEs, guides) still syncs from main branch, not the release tag. Version numbers are only used for display on the Latest Release page.
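The table-parsing step of the sync script can be sketched roughly as below. This is not the code in sync-release.mjs: the function name is hypothetical, and the column headers used in the example ("Component", "Version") are assumptions, since the actual layout of the "LLM-D Component Summary" table is not shown here.

```javascript
// Parses the first markdown table that follows `heading` in the release notes
// and returns one object per data row, keyed by the table's header cells.
function parseComponentTable(releaseNotes, heading = 'LLM-D Component Summary') {
  const start = releaseNotes.indexOf(heading);
  if (start === -1) return [];

  const rows = [];
  for (const line of releaseNotes.slice(start).split('\n').slice(1)) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('|')) {
      if (rows.length > 0) break; // the table has ended
      continue; // blank or prose lines before the table
    }
    const cells = trimmed.split('|').slice(1, -1).map((cell) => cell.trim());
    if (cells.every((cell) => /^:?-+:?$/.test(cell))) continue; // separator row
    rows.push(cells);
  }

  const [header, ...body] = rows;
  if (!header) return [];
  return body.map((cells) =>
    Object.fromEntries(header.map((name, i) => [name, cells[i] ?? '']))
  );
}

// Example release notes with assumed column names.
const notes = [
  '## LLM-D Component Summary',
  '',
  '| Component | Version |',
  '| --- | --- |',
  '| llm-d-inference-scheduler | v0.6.0 |',
  '',
  'Other release text.',
].join('\n');

console.log(parseComponentTable(notes));
```

A parser like this depends on the release notes keeping the same heading text and table shape from release to release, which is part of why the resulting YAML diff is reviewed manually before committing.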
Pain Points and Limitations
1. Complex and Fragile Transformations
Problem: The transformation pipeline in repo-transforms.js is 300+ lines of regex-based text manipulation.
Examples of brittleness:
- Regex parsing of markdown syntax (tabs, callouts, links, images)
- Special case handling for different link types (relative, root-relative, complex ../ paths)
- Hardcoded fixes for known broken upstream links (with TODO comments)
- Manual mapping of internal guide links
- Edge cases around curly braces, HTML tags, comments
Why it's hacky:
- Regex cannot properly parse markdown (markdown is context-dependent)
- Transformations are order-dependent (must run in specific sequence)
- Each new markdown feature requires new regex
- Difficult to test comprehensively
- Easy to introduce regressions
Example of complexity:
// Convert tabs (60+ lines of code)
.replace(/<!-- TABS:START -->\n([\s\S]*?)<!-- TABS:END -->/g, (match, tabsContent) => {
  const tabSections = tabsContent.split(/<!-- TAB:/);
  const tabs = [];
  for (let i = 1; i < tabSections.length; i++) {
    const section = tabSections[i];
    const labelMatch = section.match(/^([^:]+?)(?::default)?\s*-->\n([\s\S]*?)$/);
    // ... more parsing ...
  }
  // Generate Docusaurus Tabs component
  // Add imports at top of file
})
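For readers who want to see the whole shape of this conversion, here is a compact, self-contained sketch of the same idea. It is a simplified stand-in for the 60+ line original, not a copy of it: the function names, the way `value` is derived from the label, and the handling of the `:default` suffix are assumptions. The two import statements are the standard Docusaurus theme imports for tabs.

```javascript
const TAB_IMPORTS =
  "import Tabs from '@theme/Tabs';\nimport TabItem from '@theme/TabItem';\n\n";

// Converts <!-- TABS:START --> ... <!-- TABS:END --> blocks into <Tabs> markup.
function convertTabs(markdown) {
  return markdown.replace(
    /<!-- TABS:START -->\n([\s\S]*?)<!-- TABS:END -->/g,
    (match, tabsContent) => {
      const items = tabsContent
        .split(/<!-- TAB:/)
        .slice(1) // anything before the first TAB marker is dropped
        .map((section) => {
          const parts = section.match(/^([^:]+?)(:default)?\s*-->\n([\s\S]*)$/);
          if (!parts) return ''; // malformed marker: skip this section
          const [, rawLabel, isDefault, body] = parts;
          const label = rawLabel.trim();
          const value = label.toLowerCase().replace(/\s+/g, '-');
          const defaultAttr = isDefault ? ' default' : '';
          return `<TabItem value="${value}" label="${label}"${defaultAttr}>\n\n${body.trim()}\n\n</TabItem>`;
        });
      return `<Tabs>\n${items.join('\n')}\n</Tabs>`;
    }
  );
}

// Prepends the theme imports when the page uses tabs. In a real page the
// imports would need to go after the frontmatter, not before it.
function addTabImports(mdx) {
  return mdx.includes('<Tabs>') ? TAB_IMPORTS + mdx : mdx;
}

const input = '<!-- TABS:START -->\n<!-- TAB:GKE -->\ncontent\n<!-- TABS:END -->';
console.log(addTabImports(convertTabs(input)));
```

Even this reduced version shows the fragility described above: a tab label containing a colon, a marker with different spacing, or a nested tab block would all fall outside what the regular expressions expect.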
2. No Versioning Support
Problem: All content syncs from main branch only.
Impact:
- Documentation always shows latest development state
- No way to view docs for specific releases (v0.4.x, v0.5.x, etc.)
- Users on older versions see docs for features they don't have
- Breaking changes in docs can confuse users
- Cannot maintain separate docs for LTS versions
Current workaround:
- Version tags in YAML only displayed on "Latest Release" page
- To test content from feature branch, must manually edit code:
// Temporary hack to test feature branch
const ref = 'feature-branch'; // Change back to 'main' before committing!
3. Build-Time External Dependencies
Problem: Site build depends on availability and content of 8+ external GitHub repositories.
Risks:
- Build breaks if upstream repo is deleted/moved
- Build breaks if upstream repo is temporarily unavailable
- Build breaks if upstream content has syntax errors
- Cannot build offline
- Cannot guarantee reproducible builds (upstream can change)
- Slow builds (must download from multiple repos)
Example failure scenario:
- Component repo merges breaking markdown change
- Nightly build runs at midnight
- Transformation fails on new markdown syntax
- Site deploy fails
- Website is down until someone fixes transformation code
4. Difficult Local Development
Problem: Testing doc changes requires a complex workflow.
For synced content:
- Fork the source repository (e.g., llm-d/llm-d)
- Make changes to README or guide
- Commit and push to fork
- Temporarily modify component-configs.js to point to fork/branch in upstream repo
- Run npm start to build site locally
- Remember to revert config changes before committing
Pain points:
- Cannot see doc changes without rebuilding entire site
- Must modify website code to test upstream changes
- Easy to accidentally commit temporary config changes
- Slow iteration cycle (build takes minutes)
5. Scattered Documentation
Problem: Documentation is spread across multiple repositories with different structures.
Current locations:
- Architecture: llm-d/llm-d/README.md
- Guides: llm-d/llm-d/guides/*/README.md
- Component docs: {org}/{repo}/README.md (8 repos)
- Community: llm-d/llm-d/CONTRIBUTING.md, etc.
- API reference: Not currently documented
- Infrastructure: llm-d-incubation/llm-d-infra/README.md
Impact:
- No single source of truth
- Difficult to maintain consistency
- Hard to search across all docs
- No unified navigation
- Unclear where to add new documentation
- Contributors don't know which repo to edit
6. Limited SEO and Discoverability
Problem: Docusaurus is primarily designed for single-repo documentation.
Issues:
- Documentation and marketing site mixed together
- Harder to optimize for different audiences (users vs. marketers)
- Search results point to generic llm-d.ai (not docs.llm-d.ai)
- Cannot track documentation analytics separately
7. Maintenance Burden
Code to maintain:
- 12 remote content source files (.js configs)
- 1 YAML data file (manually updated)
- 1 sync script (430 lines)
- 1 transformation system (300+ lines of regex)
- 1 utility library
- Custom Docusaurus configuration
Every time:
- New component is added → Update YAML + regenerate
- New guide is added → Update generator config
- GitHub changes markdown syntax → Update transformations
- Docusaurus updates → May break transformations
- Component repo restructures → Update source paths
Why the Current Approach Was Chosen
Original goals:
- Keep docs close to code (docs live with component)
- Allow component teams to own their docs
- Automatic updates (docs deploy when code changes)
- Single website for everything (docs + marketing)
This made sense when:
- Small number of components (3-4)
- Simple markdown (no advanced features)
- No versioning requirements
- Rapid iteration phase
What changed:
- Now 8+ components (growing)
- Need versioning for stable releases
- Complex markdown features (tabs, callouts, etc.)
- Users on different versions (v0.4, v0.5, etc.)
- Transformation edge cases accumulating
- More emphasis on documentation quality
Appendix: Key Files Reference
Current System
Configuration:
remote-content/remote-sources/components-data.yaml - Central configuration
remote-content/remote-content.js - Plugin aggregator
docusaurus.config.js - Docusaurus configuration
Transformation:
remote-content/remote-sources/repo-transforms.js - Transformation pipeline (300+ lines)
remote-content/remote-sources/utils.js - Helper functions
Generators:
remote-content/remote-sources/architecture/components-generator.js - Component docs
remote-content/remote-sources/guide/guide-generator.js - Guide docs
remote-content/remote-sources/sync-release.mjs - Release sync script (430 lines)
Workflows:
.github/workflows/deploy.yml - Build and deployment
Dependencies:
docusaurus-plugin-remote-content v4.0.0
@docusaurus/core 3.9.2
js-yaml 4.1.0 (for YAML parsing)
Generated Output (Do Not Edit!)
docs/architecture/ - Architecture and component docs
docs/guide/ - User guides
docs/community/ - Community docs
docs/usage/ - Usage docs