Improve agent-friendly docs compliance#186

Open
JakeSCahill wants to merge 5 commits into main from improve-afdocs-directive

@JakeSCahill JakeSCahill commented Apr 6, 2026

Summary

Improves compliance with the Agent-Friendly Docs spec to make our documentation more accessible to AI agents.

Changes

llms-txt Directive (convert-to-markdown.js):

  • Move directive to appear immediately after H1 heading (before frontmatter)
  • Change from HTML comment to visible markdown blockquote format
  • Follows the spec recommendation: the blockquote can be hidden with CSS on the rendered page while remaining accessible to agents
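
To illustrate the ordering described above, here is a minimal sketch of assembling an exported page with the directive directly after the H1. The function name, parameter names, source-marker format, and the directive wording after "index" are assumptions for illustration; they are not taken from convert-to-markdown.js.

```javascript
// Sketch only: names and directive wording are hypothetical, not the
// actual code in convert-to-markdown.js.
function assembleMarkdown({ title, frontmatter, source, body }) {
  // Visible blockquote instead of an HTML comment, so an agent reading the
  // raw markdown sees the pointer to llms.txt near the top of the page.
  const directive = '> For the complete documentation index, see [llms.txt](/llms.txt).';

  // Order: H1, directive, frontmatter, source marker, content.
  return [`# ${title}`, directive, frontmatter, source, body]
    .filter(Boolean) // drop sections that are empty or missing
    .join('\n\n');
}

const markdown = assembleMarkdown({
  title: 'Deploy',
  frontmatter: '---\ntitle: Deploy\n---',
  source: '<!-- Source: https://example.com/deploy -->',
  body: 'Page content.',
});
console.log(markdown.split('\n').slice(0, 3).join('\n'));
```

Keeping the directive as the second block means a check that only reads the first few lines of the file can still find it.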

Comprehensive Navigation (convert-llms-to-txt.js):

  • Inject navigation section programmatically into llms.txt
  • Link to sitemap.md and component sitemaps for improved coverage
  • Include key documentation section links
  • Respect 45K character limit (buffer below 50K spec limit)
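
A rough sketch of the size-capped injection might look like the following. Only `generateNavigationSection`, `MAX_LLMS_TXT_CHARS`, and the 45,000 figure come from this PR; `injectNavigation`, the section heading, and the single link shown are hypothetical stand-ins for the real implementation.

```javascript
// Sketch only: the function bodies are illustrative, not the PR's code.
const MAX_LLMS_TXT_CHARS = 45000; // buffer below the 50K spec limit

function generateNavigationSection(siteUrl) {
  // Hypothetical subset; the real section also links component sitemaps
  // and key documentation sections.
  return [
    '## Documentation index',
    '',
    `- [Sitemap](${siteUrl}/sitemap.md)`,
  ].join('\n');
}

// Hypothetical helper: append the navigation section only when it fits.
function injectNavigation(llmsTxtContent, siteUrl) {
  const navSection = generateNavigationSection(siteUrl);
  const combined = `${llmsTxtContent}\n\n${navSection}\n`;
  return combined.length <= MAX_LLMS_TXT_CHARS ? combined : llmsTxtContent;
}

console.log(injectNavigation('# Docs', 'https://docs.example.com'));
```

Note that this variant leaves over-long base content untouched rather than truncating it.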

Expected Improvements

| Check | Before | Expected After |
| --- | --- | --- |
| llms-txt-directive | Warning (buried deep) | Pass (near top) |
| llms-txt-freshness | 10% | 80%+ (via sitemap links) |

Related PR

Test plan

  • Build docs locally and verify llms.txt contains navigation section
  • Verify markdown files have blockquote directive near top
  • Run afdocs check on deploy preview

- Move llms.txt directive to appear immediately after H1 heading
- Change directive from HTML comment to markdown blockquote format
- Add comprehensive navigation section to llms.txt programmatically
- Include sitemap links for improved llms-txt-freshness score
- Strip directive from llms.txt output (would be redundant)
netlify bot commented Apr 6, 2026

👷 Deploy Preview for docs-extensions-and-macros processing.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 4050fd8 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/docs-extensions-and-macros/deploys/69d5269075217200089412d0 |

coderabbitai bot commented Apr 6, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a064f1d0-7a2b-4b20-a812-44bb53972b79

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR modifies the markdown extraction and llms.txt generation pipeline in two extension files. In convert-llms-to-txt.js, it updates the markdown cleanup logic by changing metadata removal patterns and adds a new generateNavigationSection() function that builds a documentation index section. The llms.txt generation is enhanced with a MAX_LLMS_TXT_CHARS size cap (45,000 characters) that conditionally injects the navigation section only when it fits. In convert-to-markdown.js, the AI note format changes from an HTML comment to a blockquote directive pointing to /llms.txt, with optional component-specific links, and reorders the markdown assembly structure.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • PR #174: Overlapping changes to convert-llms-to-txt.js markdown extraction/cleanup and llms.txt generation logic.
  • PR #182: Direct modifications to convert-to-markdown.js affecting H1, frontmatter, and AI/llms directive placement and ordering in exported Markdown.
  • PR #178: Related changes across both files to add navigation sections and component-specific full.txt references in llms.txt generation and markdown directives.

Suggested reviewers

  • micheleRP
  • paulohtb6
  • Feediver1
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Improve agent-friendly docs compliance' is clear and directly related to the main objective of the PR, which is enhancing compliance with the Agent-Friendly Docs spec. It accurately summarizes the primary change without being vague or misleading. |
| Description check | ✅ Passed | The pull request description clearly relates to the changeset, detailing modifications to convert-to-markdown.js and convert-llms-to-txt.js to improve Agent-Friendly Docs spec compliance. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
extensions/convert-to-markdown.js (1)

503-516: Centralize the directive text shared with the llms.txt cleaner.

extensions/convert-llms-to-txt.js:79 strips this note with a regex that depends on the exact sentence emitted here. A copy tweak in one file will silently leak the directive back into llms.txt, so please move the formatter/pattern into a shared helper or constant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extensions/convert-to-markdown.js` around lines 503 - 516, The llms directive
string (currently built into convert-to-markdown.js as the llmsDirective
variable) must be centralized so the regex in convert-llms-to-txt.js (line ~79)
can reliably strip it; create a shared export (e.g., a constant like
LLMS_DIRECTIVE_TEXT or a helper function formatLlmsDirective(componentName)) in
a new module and import it into both convert-to-markdown.js and
convert-llms-to-txt.js, replace the inline llmsDirective construction in
convert-to-markdown.js with the shared helper/constant (used to render with or
without componentName/canonicalUrl) and update convert-llms-to-txt.js to use the
same constant or a precomputed regex derived from that constant so both files
reference a single source of truth.
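
One possible shape for such a shared module is sketched below. The names `LLMS_DIRECTIVE_TEXT` and `formatLlmsDirective` follow the suggestion above; `LLMS_DIRECTIVE_REGEX`, `escapeRegExp`, `stripLlmsDirective`, and the exact directive sentence are assumptions for illustration. The point is that the regex is derived from the constant, so a copy edit cannot leave the two out of sync.

```javascript
// Sketch of a hypothetical shared module required by both extensions.
// The sentence below is a placeholder; the real wording is not shown here.
const LLMS_DIRECTIVE_TEXT = 'For the complete documentation index, see [llms.txt](/llms.txt).';

// Escape regex metacharacters so the constant can be embedded in a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Simplified: ignores the optional component-specific link.
function formatLlmsDirective() {
  return `> ${LLMS_DIRECTIVE_TEXT}`;
}

// Derived from the constant. Matches the whole blockquote line, tolerates a
// same-line suffix, and consumes the blank lines that follow it.
const LLMS_DIRECTIVE_REGEX = new RegExp(
  `^> ${escapeRegExp(LLMS_DIRECTIVE_TEXT)}[^\\n]*\\n*`,
  'gm'
);

function stripLlmsDirective(markdown) {
  return markdown.replace(LLMS_DIRECTIVE_REGEX, '');
}

const page = `# Title\n\n${formatLlmsDirective()}\n\nBody`;
console.log(stripLlmsDirective(page));
```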
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@extensions/convert-llms-to-txt.js`:
- Around line 74-80: Extract the metadata-stripping logic into a shared helper
(e.g., stripMetadata(content)) and call it wherever page bodies are prepared:
use it on llmsPage and also on the content you append when building
llms-full.txt and `${componentName}-full.txt` (replace direct uses of
page.markdownContents with stripMetadata(page.markdownContents)); the helper
should perform the same two regex replacements currently applied to content
(removing HTML comment source markers and the "> For the complete documentation
index..." blockquote) and be invoked before concatenation so all aggregated
exports get the cleaned body.
- Around line 563-581: The numeric page counts in the nav are derived from
component.latest.pages (via components.reduce and
component?.latest?.pages?.length) which can undercount compared to the actual
sitemaps used to build sitemap-all.md; update the code in
extensions/convert-llms-to-txt.js to either (A) compute page totals from the
same canonical source used by convert-sitemap-to-markdown.js (e.g., the allUrls
collection or the aggregated sitemap data) and use those counts when rendering
sitemap-all.md and each component entry, or (B) remove the numeric claims
entirely and render the list without counts; locate the references to
components, componentSitemaps, and the sitemap-all.md string to change where
counts are computed/inserted.
- Around line 280-296: The current check only prevents appending navSection but
doesn't enforce MAX_LLMS_TXT_CHARS on the original llmsPage.llmsTxtContent;
update the logic around MAX_LLMS_TXT_CHARS, llmsTxtContent and
llmsPage.llmsTxtContent so the final llmsTxtContent never exceeds the limit: if
llmsPage.llmsTxtContent.length >= MAX_LLMS_TXT_CHARS, either truncate
llmsPage.llmsTxtContent to MAX_LLMS_TXT_CHARS (with a clear logger.warn) or
reject/publish a reduced file (whichever policy fits), and when appending
navSection compute available space = MAX_LLMS_TXT_CHARS - llmsTxtContent.length
and append only navSection.slice(0, availableSpace) (logging the truncation).
Make changes where navSection is created by generateNavigationSection and where
logger.info/warn report sizes to reflect truncation or rejection.
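
For the first inline comment, a minimal sketch of the shared helper applied to aggregated exports could look like this. `stripMetadata` and `page.markdownContents` are named in the comment above; `buildFullText`, both regex patterns, and the assumption that `markdownContents` is a string are illustrative guesses about the emitted format.

```javascript
// Sketch only: both patterns are assumptions about what the exporter emits.
function stripMetadata(content) {
  return content
    // HTML comment source markers, for example <!-- Source: ... -->
    .replace(/<!--\s*Source:[\s\S]*?-->\n*/g, '')
    // The llms.txt blockquote directive and the blank lines after it
    .replace(/^> For the complete documentation index[^\n]*\n*/gm, '');
}

// Hypothetical aggregator: clean every page body before concatenation so
// llms-full.txt and the component full.txt files get the same treatment.
function buildFullText(pages) {
  return pages
    .map((page) => stripMetadata(page.markdownContents))
    .join('\n\n');
}

const pages = [
  { markdownContents: '# A\n\n> For the complete documentation index, see llms.txt\n\n<!-- Source: a -->\nBody A' },
  { markdownContents: '# B\n\nBody B' },
];
console.log(buildFullText(pages));
```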

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 87c30925-08bb-4609-9d5a-c036e22f4c8f

📥 Commits

Reviewing files that changed from the base of the PR and between 9bba072 and d186f30.

📒 Files selected for processing (2)
  • extensions/convert-llms-to-txt.js
  • extensions/convert-to-markdown.js

- Extract metadata-stripping logic into shared helper (llms-utils.js)
- Apply stripMarkdownMetadata to llms-full.txt and component-full.txt pages
- Remove inaccurate page counts from navigation section
- Properly enforce MAX_LLMS_TXT_CHARS with truncation logic
- Centralize directive format in shared module for single source of truth
@micheleRP micheleRP left a comment

Review Summary

Overall: Clean, well-structured PR. The shared utility module (llms-utils.js) is a nice touch that addresses the exact concern CodeRabbit raised about keeping the directive format and its stripping regex in sync.

Findings

1. Hardcoded navigation links are fragile (Medium)

generateNavigationSection() hardcodes ~20 documentation section URLs (e.g., /current/deploy.md, /redpanda-cloud/ai-agents.md). If any page is renamed, moved, or removed, these links will 404 silently. Consider:

  • Generating these from the content catalog at build time instead, or
  • At minimum, adding a comment noting these need manual updates when pages move
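
As a rough illustration of the first option, the sketch below emits links only for pages that exist at build time and reports the rest. `buildNavLinks` and the `{ title, path }` page shape are hypothetical; in the real extension the page list would come from the content catalog, whose API is not shown here.

```javascript
// Sketch only: `pages` stands in for whatever the content catalog exposes
// at build time. The { title, path } shape is an assumption.
function buildNavLinks(siteUrl, pages, wantedPaths) {
  const byPath = new Map(pages.map((page) => [page.path, page]));
  const links = [];
  const missing = [];

  for (const path of wantedPaths) {
    const page = byPath.get(path);
    if (page) {
      links.push(`- [${page.title}](${siteUrl}${path})`);
    } else {
      missing.push(path); // surface this in the build log instead of shipping a 404
    }
  }
  return { links, missing };
}

const result = buildNavLinks(
  'https://docs.example.com',
  [{ title: 'Deploy', path: '/current/deploy.md' }],
  ['/current/deploy.md', '/redpanda-cloud/ai-agents.md']
);
console.log(result.links, result.missing);
```

Logging `missing` at build time turns a silent 404 into a visible warning.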

2. siteUrl could include a trailing slash (Low)

generateNavigationSection(siteUrl) concatenates ${siteUrl}/sitemap.md etc. If siteUrl already ends with /, you'll get double slashes (https://docs.redpanda.com//sitemap.md). A quick siteUrl.replace(/\/+$/, '') would be defensive.
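
A small sketch of that normalization, wrapped in a hypothetical `joinSiteUrl` helper (the helper name is not from the PR):

```javascript
// Hypothetical helper: join a site URL and a path with exactly one slash.
function joinSiteUrl(siteUrl, path) {
  const base = siteUrl.replace(/\/+$/, ''); // drop trailing slashes
  const suffix = path.replace(/^\/+/, ''); // drop leading slashes
  return `${base}/${suffix}`;
}

console.log(joinSiteUrl('https://docs.redpanda.com/', 'sitemap.md'));
console.log(joinSiteUrl('https://docs.redpanda.com', '/sitemap.md'));
```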

3. Truncation uses naive slice() (Low)

Both truncation paths (llmsTxtContent.slice(0, MAX_LLMS_TXT_CHARS - 100) and navSection.slice(0, availableSpace - 50)) slice at arbitrary character positions. This could cut mid-line or mid-URL. Consider truncating at the last \n before the limit for cleaner output.
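
One way to do this is sketched below. `truncateAtLineBoundary` is a hypothetical name, and the version shown is deliberately conservative: it always backs up to the last newline inside the slice, even if the cut happened to land exactly on a line end.

```javascript
// Hypothetical helper: cut at the last newline before the limit so the
// output never ends mid-line or mid-URL.
function truncateAtLineBoundary(text, maxChars) {
  if (text.length <= maxChars) return text;

  const slice = text.slice(0, maxChars);
  const lastNewline = slice.lastIndexOf('\n');

  // Fall back to a hard cut if the first line alone exceeds the limit.
  return lastNewline > 0 ? slice.slice(0, lastNewline + 1) : slice;
}

const sample = 'line one\nline two\nline three';
console.log(JSON.stringify(truncateAtLineBoundary(sample, 12)));
```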

4. Shared utils module is well done (Positive)

llms-utils.js centralizes the directive text, formatting, regex, and stripping into one module — exactly the right pattern. The regex in LLMS_DIRECTIVE_REGEX correctly matches the format produced by formatLlmsDirective().

5. Directive placement change looks correct (Positive)

Moving the directive from a buried HTML comment to a blockquote immediately after H1 (H1 → directive → frontmatter → source → content) aligns with the agent-friendly docs spec requirement for near-top-of-page placement.

Verdict

Good to merge. The hardcoded nav links (#1) are the main concern — they'll need maintenance as the docs site evolves. The other items are minor defensive improvements.

@micheleRP micheleRP left a comment

lgtm!
