Improve agent-friendly docs compliance by JakeSCahill · Pull Request #186 · redpanda-data/docs-extensions-and-macros

JakeSCahill · 2026-04-06T18:01:16Z

Summary

Improves compliance with the Agent-Friendly Docs spec to make our documentation more accessible to AI agents.

Changes

llms-txt Directive (convert-to-markdown.js):

- Move directive to appear immediately after H1 heading (before frontmatter)
- Change from HTML comment to visible markdown blockquote format
- Follows spec recommendation: can be hidden with CSS while remaining accessible to agents
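
A minimal sketch of the ordering described above, assuming the exported page is assembled from separate parts. `buildMarkdownPage`, its parameter names, and the directive wording are hypothetical and are not taken from convert-to-markdown.js; only the blockquote format and the "H1 first, directive second" ordering come from this PR.

```javascript
// Sketch only: function and parameter names are hypothetical, not taken
// from convert-to-markdown.js.
function buildMarkdownPage({ title, directive, frontmatter, source, body }) {
  // Order from the PR: H1, then the directive blockquote, then the rest.
  const parts = [`# ${title}`, `> ${directive}`];
  if (frontmatter) parts.push(frontmatter);
  if (source) parts.push(source);
  parts.push(body);
  return parts.join('\n\n') + '\n';
}

const page = buildMarkdownPage({
  title: 'Deploy Redpanda',
  // Assumed wording; the real directive text is defined by the extension.
  directive: 'For the complete documentation index, see [llms.txt](/llms.txt)',
  frontmatter: '---\ntitle: Deploy Redpanda\n---',
  body: 'Page content goes here.',
});

// Print the first three lines: the H1, a blank line, and the directive.
console.log(page.split('\n').slice(0, 3).join('\n'));
```

Because the directive is an ordinary blockquote, it stays in the text that agents read, while a site stylesheet can still hide it from human readers.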

Comprehensive Navigation (convert-llms-to-txt.js):

- Inject navigation section programmatically into llms.txt
- Link to sitemap.md and component sitemaps for improved coverage
- Include key documentation section links
- Respect 45K character limit (buffer below 50K spec limit)
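
The size check above could look roughly like the following. This is a sketch under assumptions: `appendNavigationIfFits` and its return shape are hypothetical, and only the `MAX_LLMS_TXT_CHARS` name and the 45,000-character value come from this PR.

```javascript
// 45K cap from the PR description: a buffer below the 50K spec limit.
const MAX_LLMS_TXT_CHARS = 45000;

// Hypothetical helper: append the navigation section only when the combined
// output stays within the cap; otherwise return the content unchanged.
function appendNavigationIfFits(llmsTxtContent, navSection) {
  const combined = `${llmsTxtContent.trimEnd()}\n\n${navSection}`;
  if (combined.length > MAX_LLMS_TXT_CHARS) {
    return { content: llmsTxtContent, injected: false };
  }
  return { content: combined, injected: true };
}

const nav = '## Documentation index\n\n- [Sitemap](https://docs.redpanda.com/sitemap.md)\n';
const small = appendNavigationIfFits('# Redpanda Docs\n', nav);
const large = appendNavigationIfFits('x'.repeat(44990), nav);
console.log(small.injected, large.injected);
```

Returning a flag alongside the content lets the caller log whether the navigation section was skipped.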

Expected Improvements

Check	Before	Expected After
llms-txt-directive	Warning (buried deep)	Pass (near top)
llms-txt-freshness	10%	80%+ (via sitemap links)

Related PR

Improve agent-friendly docs compliance docs-site#163

Test plan

- Build docs locally and verify llms.txt contains navigation section
- Verify markdown files have blockquote directive near top
- Run afdocs check on deploy preview

- Move llms.txt directive to appear immediately after H1 heading
- Change directive from HTML comment to markdown blockquote format
- Add comprehensive navigation section to llms.txt programmatically
- Include sitemap links for improved llms-txt-freshness score
- Strip directive from llms.txt output (would be redundant)

netlify · 2026-04-06T18:01:22Z

👷 Deploy Preview for docs-extensions-and-macros processing.

Name	Link
🔨 Latest commit	`4050fd8`
🔍 Latest deploy log	https://app.netlify.com/projects/docs-extensions-and-macros/deploys/69d5269075217200089412d0

coderabbitai · 2026-04-06T18:01:31Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a064f1d0-7a2b-4b20-a812-44bb53972b79

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

This PR modifies the markdown extraction and llms.txt generation pipeline in two extension files. In `convert-llms-to-txt.js`, it updates the markdown cleanup logic by changing metadata removal patterns and adds a new `generateNavigationSection()` function that builds a documentation index section. The llms.txt generation gains a `MAX_LLMS_TXT_CHARS` size cap (45,000 characters), and the navigation section is injected only when it fits under that cap. In `convert-to-markdown.js`, the AI note changes from an HTML comment to a blockquote directive pointing to `/llms.txt`, with optional component-specific links, and the order in which the markdown is assembled changes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

PR #174: Overlapping changes to convert-llms-to-txt.js markdown extraction/cleanup and llms.txt generation logic.
PR #182: Direct modifications to convert-to-markdown.js affecting H1, frontmatter, and AI/llms directive placement and ordering in exported Markdown.
PR #178: Related changes across both files to add navigation sections and component-specific full.txt references in llms.txt generation and markdown directives.

Suggested reviewers

micheleRP
paulohtb6
Feediver1

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Improve agent-friendly docs compliance' is clear and directly related to the main objective of the PR, which is enhancing compliance with the Agent-Friendly Docs spec. It accurately summarizes the primary change without being vague or misleading.
Description check	✅ Passed	The pull request description clearly relates to the changeset, detailing modifications to convert-to-markdown.js and convert-llms-to-txt.js to improve Agent-Friendly Docs spec compliance.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

extensions/convert-to-markdown.js (1)
503-516: Centralize the directive text shared with the llms.txt cleaner.

extensions/convert-llms-to-txt.js:79 strips this note with a regex that depends on the exact sentence emitted here. A copy tweak in one file will silently leak the directive back into llms.txt, so please move the formatter/pattern into a shared helper or constant.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extensions/convert-to-markdown.js` around lines 503 - 516, The llms directive
string (currently built into convert-to-markdown.js as the llmsDirective
variable) must be centralized so the regex in convert-llms-to-txt.js (line ~79)
can reliably strip it; create a shared export (e.g., a constant like
LLMS_DIRECTIVE_TEXT or a helper function formatLlmsDirective(componentName)) in
a new module and import it into both convert-to-markdown.js and
convert-llms-to-txt.js, replace the inline llmsDirective construction in
convert-to-markdown.js with the shared helper/constant (used to render with or
without componentName/canonicalUrl) and update convert-llms-to-txt.js to use the
same constant or a precomputed regex derived from that constant so both files
reference a single source of truth.
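
One way to get the single source of truth requested above is to derive the stripping regex from the same constant that produces the directive. The sketch below is an assumption-laden illustration: `LLMS_DIRECTIVE_TEXT` and `formatLlmsDirective` use the names suggested in the comment, but the directive wording, the component link format, and the `escapeRegExp` and `stripLlmsDirective` helpers are invented here.

```javascript
// Assumed wording; only the opening words appear in the review comment.
const LLMS_DIRECTIVE_TEXT =
  'For the complete documentation index, see [llms.txt](/llms.txt)';

// Escape regex metacharacters so the constant can be embedded in a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Render the directive, optionally with a component-specific link.
// The link target format is hypothetical.
function formatLlmsDirective(componentName) {
  const base = `> ${LLMS_DIRECTIVE_TEXT}`;
  return componentName
    ? `${base}. Component docs: [${componentName}-full.txt](/${componentName}-full.txt)`
    : base;
}

// Built from the constant, so a copy edit to the text updates both sides.
// Matches the directive line, any suffix on it, and trailing newlines.
const LLMS_DIRECTIVE_REGEX = new RegExp(
  `^> ${escapeRegExp(LLMS_DIRECTIVE_TEXT)}[^\\n]*\\n*`,
  'm'
);

function stripLlmsDirective(markdown) {
  return markdown.replace(LLMS_DIRECTIVE_REGEX, '');
}

const exported = `# Title\n\n${formatLlmsDirective('ROOT')}\n\nBody\n`;
console.log(JSON.stringify(stripLlmsDirective(exported)));
```

With this shape, the formatter and the cleaner cannot drift apart unless someone bypasses the shared constant.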

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@extensions/convert-llms-to-txt.js`:
- Around line 74-80: Extract the metadata-stripping logic into a shared helper
(e.g., stripMetadata(content)) and call it wherever page bodies are prepared:
use it on llmsPage and also on the content you append when building
llms-full.txt and `${componentName}-full.txt` (replace direct uses of
page.markdownContents with stripMetadata(page.markdownContents)); the helper
should perform the same two regex replacements currently applied to content
(removing HTML comment source markers and the "> For the complete documentation
index..." blockquote) and be invoked before concatenation so all aggregated
exports get the cleaned body.
- Around line 563-581: The numeric page counts in the nav are derived from
component.latest.pages (via components.reduce and
component?.latest?.pages?.length) which can undercount compared to the actual
sitemaps used to build sitemap-all.md; update the code in
extensions/convert-llms-to-txt.js to either (A) compute page totals from the
same canonical source used by convert-sitemap-to-markdown.js (e.g., the allUrls
collection or the aggregated sitemap data) and use those counts when rendering
sitemap-all.md and each component entry, or (B) remove the numeric claims
entirely and render the list without counts; locate the references to
components, componentSitemaps, and the sitemap-all.md string to change where
counts are computed/inserted.
- Around line 280-296: The current check only prevents appending navSection but
doesn't enforce MAX_LLMS_TXT_CHARS on the original llmsPage.llmsTxtContent;
update the logic around MAX_LLMS_TXT_CHARS, llmsTxtContent and
llmsPage.llmsTxtContent so the final llmsTxtContent never exceeds the limit: if
llmsPage.llmsTxtContent.length >= MAX_LLMS_TXT_CHARS, either truncate
llmsPage.llmsTxtContent to MAX_LLMS_TXT_CHARS (with a clear logger.warn) or
reject/publish a reduced file (whichever policy fits), and when appending
navSection compute available space = MAX_LLMS_TXT_CHARS - llmsTxtContent.length
and append only navSection.slice(0, availableSpace) (logging the truncation).
Make changes where navSection is created by generateNavigationSection and where
logger.info/warn report sizes to reflect truncation or rejection.

---

Nitpick comments:
In `@extensions/convert-to-markdown.js`:
- Around line 503-516: The llms directive string (currently built into
convert-to-markdown.js as the llmsDirective variable) must be centralized so the
regex in convert-llms-to-txt.js (line ~79) can reliably strip it; create a
shared export (e.g., a constant like LLMS_DIRECTIVE_TEXT or a helper function
formatLlmsDirective(componentName)) in a new module and import it into both
convert-to-markdown.js and convert-llms-to-txt.js, replace the inline
llmsDirective construction in convert-to-markdown.js with the shared
helper/constant (used to render with or without componentName/canonicalUrl) and
update convert-llms-to-txt.js to use the same constant or a precomputed regex
derived from that constant so both files reference a single source of truth.
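
The first inline comment in the prompt above (shared metadata stripping) might be sketched as follows. `stripMetadata` and `page.markdownContents` are names from the comment; `buildFullText` is hypothetical, and the `<!-- Source: ... -->` marker format is an assumption because the actual comment syntax is not shown in this PR.

```javascript
// Remove per-page metadata before pages are concatenated.
function stripMetadata(content) {
  return content
    // Assumed source-marker format: an HTML comment starting with "Source:".
    .replace(/^<!--\s*Source:[^\n]*-->\n*/gm, '')
    // The llms.txt directive blockquote, matched by its opening words.
    .replace(/^> For the complete documentation index[^\n]*\n*/gm, '');
}

// Hypothetical aggregator: every page body is cleaned before joining, so
// llms-full.txt and component exports get the same treatment as llms.txt.
function buildFullText(pages) {
  return pages
    .map((page) => stripMetadata(page.markdownContents).trimEnd())
    .join('\n\n');
}

const pages = [
  {
    markdownContents:
      '# A\n\n> For the complete documentation index, see the site index\n\n' +
      '<!-- Source: https://example.com/a -->\n\nBody A\n',
  },
  { markdownContents: '# B\n\nBody B\n' },
];
console.log(buildFullText(pages));
```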

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 87c30925-08bb-4609-9d5a-c036e22f4c8f

📥 Commits

Reviewing files that changed from the base of the PR and between 9bba072 and d186f30.

📒 Files selected for processing (2)

extensions/convert-llms-to-txt.js
extensions/convert-to-markdown.js

- Extract metadata-stripping logic into shared helper (llms-utils.js)
- Apply stripMarkdownMetadata to llms-full.txt and component-full.txt pages
- Remove inaccurate page counts from navigation section
- Properly enforce MAX_LLMS_TXT_CHARS with truncation logic
- Centralize directive format in shared module for single source of truth

micheleRP

Review Summary

Overall: Clean, well-structured PR. The shared utility module (llms-utils.js) is a nice touch that addresses the exact concern CodeRabbit raised about keeping the directive format and its stripping regex in sync.

Findings

1. Hardcoded navigation links are fragile (Medium)

`generateNavigationSection()` hardcodes ~20 documentation section URLs (e.g., `/current/deploy.md`, `/redpanda-cloud/ai-agents.md`). If any page is renamed, moved, or removed, these links will 404 silently. Consider:

- Generating these from the content catalog at build time instead, or
- At minimum, adding a comment noting these need manual updates when pages move
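
As a rough illustration of the first option, the links could be rendered from page data instead of a hardcoded list. Everything here is hypothetical: `generateNavigationLinks` is an invented name, and the `{ title, url }` page shape stands in for whatever the build actually exposes, since the catalog API is not shown in this PR.

```javascript
// Sketch only: `pages` is a plain array standing in for catalog data.
// Assumes `siteUrl` has no trailing slash and each `url` starts with "/".
function generateNavigationLinks(siteUrl, pages) {
  return pages
    // Skip entries that cannot produce a usable link.
    .filter((page) => page.title && page.url)
    .map((page) => `- [${page.title}](${siteUrl}${page.url})`)
    .join('\n');
}

const links = generateNavigationLinks('https://docs.redpanda.com', [
  { title: 'Deploy', url: '/current/deploy.md' },
  { title: 'AI agents', url: '/redpanda-cloud/ai-agents.md' },
  { title: '', url: '/untitled.md' },
]);
console.log(links);
```

Because the list is derived from the pages that exist at build time, a renamed or removed page drops out of the navigation instead of leaving a dead link.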

2. `siteUrl` could include a trailing slash (Low)

`generateNavigationSection(siteUrl)` concatenates `${siteUrl}/sitemap.md` and similar paths. If `siteUrl` already ends with `/`, you'll get double slashes (`https://docs.redpanda.com//sitemap.md`). A quick `siteUrl.replace(/\/+$/, '')` would be defensive.
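
A small sketch of that defensive step, using the regex quoted above. The `normalizeSiteUrl` and `sitemapUrl` helper names are hypothetical.

```javascript
// Strip any trailing slashes so later concatenation cannot double them.
function normalizeSiteUrl(siteUrl) {
  return siteUrl.replace(/\/+$/, '');
}

// Hypothetical caller showing the concatenation the finding refers to.
function sitemapUrl(siteUrl) {
  return `${normalizeSiteUrl(siteUrl)}/sitemap.md`;
}

console.log(sitemapUrl('https://docs.redpanda.com/'));
console.log(sitemapUrl('https://docs.redpanda.com'));
```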

3. Truncation uses naive `slice()` (Low)

Both truncation paths (`llmsTxtContent.slice(0, MAX_LLMS_TXT_CHARS - 100)` and `navSection.slice(0, availableSpace - 50)`) slice at arbitrary character positions. This could cut mid-line or mid-URL. Consider truncating at the last `\n` before the limit for cleaner output.
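
A minimal sketch of newline-aware truncation, assuming a hard cut is acceptable when the text contains no newline at all. `truncateAtLineBoundary` is a hypothetical name, not a function from this PR.

```javascript
// Cut `text` to at most `maxChars`, backing up to the end of the last
// complete line so no line or URL is split in the middle.
function truncateAtLineBoundary(text, maxChars) {
  if (text.length <= maxChars) return text;
  // The cut already falls exactly at the end of a line.
  if (text[maxChars] === '\n') return text.slice(0, maxChars);
  const sliced = text.slice(0, maxChars);
  const lastNewline = sliced.lastIndexOf('\n');
  // No newline within the limit: fall back to a hard cut.
  return lastNewline === -1 ? sliced : sliced.slice(0, lastNewline + 1);
}

const sample = 'line one\nline two\nline three\n';
console.log(JSON.stringify(truncateAtLineBoundary(sample, 20)));
```

The result can be shorter than the limit by up to one line, which is the trade-off for never emitting a partial link.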

4. Shared utils module is well done (Positive)

`llms-utils.js` centralizes the directive text, formatting, regex, and stripping into one module, which is exactly the right pattern. The regex in `LLMS_DIRECTIVE_REGEX` correctly matches the format produced by `formatLlmsDirective()`.

5. Directive placement change looks correct (Positive)

Moving the directive from a buried HTML comment to a blockquote immediately after H1 (H1 → directive → frontmatter → source → content) aligns with the agent-friendly docs spec requirement for near-top-of-page placement.

Verdict

Good to merge. The hardcoded nav links (#1) are the main concern — they'll need maintenance as the docs site evolves. The other items are minor defensive improvements.

micheleRP

lgtm!

coderabbitai bot reviewed Apr 6, 2026

JakeSCahill mentioned this pull request Apr 7, 2026

Improve agent-friendly docs compliance redpanda-data/docs-site#163

Open

3 tasks

micheleRP reviewed Apr 7, 2026

micheleRP approved these changes Apr 7, 2026

JakeSCahill and others added 3 commits April 7, 2026 16:22

Fix llms.txt generation: trailing slash, newline truncation, document…

7a02ab8

… hardcoded links

chore: bump version to 4.15.10

6508397

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Merge branch 'main' into improve-afdocs-directive

4050fd8

JakeSCahill wants to merge 5 commits into main from improve-afdocs-directive

2 participants
