- Move llms.txt directive to appear immediately after H1 heading
- Change directive from HTML comment to markdown blockquote format
- Add comprehensive navigation section to llms.txt programmatically
- Include sitemap links for improved llms-txt-freshness score
- Strip directive from llms.txt output (would be redundant)
📝 Walkthrough
This PR modifies the markdown extraction and llms.txt generation pipeline in two extension files.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 3
🧹 Nitpick comments (1)
extensions/convert-to-markdown.js (1)
503-516: Centralize the directive text shared with the llms.txt cleaner.
extensions/convert-llms-to-txt.js:79 strips this note with a regex that depends on the exact sentence emitted here. A copy tweak in one file will silently leak the directive back into llms.txt, so please move the formatter/pattern into a shared helper or constant.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@extensions/convert-llms-to-txt.js`:
- Around line 74-80: Extract the metadata-stripping logic into a shared helper
(e.g., stripMetadata(content)) and call it wherever page bodies are prepared:
use it on llmsPage and also on the content you append when building
llms-full.txt and `${componentName}-full.txt` (replace direct uses of
page.markdownContents with stripMetadata(page.markdownContents)); the helper
should perform the same two regex replacements currently applied to content
(removing HTML comment source markers and the "> For the complete documentation
index..." blockquote) and be invoked before concatenation so all aggregated
exports get the cleaned body.
- Around line 563-581: The numeric page counts in the nav are derived from
component.latest.pages (via components.reduce and
component?.latest?.pages?.length) which can undercount compared to the actual
sitemaps used to build sitemap-all.md; update the code in
extensions/convert-llms-to-txt.js to either (A) compute page totals from the
same canonical source used by convert-sitemap-to-markdown.js (e.g., the allUrls
collection or the aggregated sitemap data) and use those counts when rendering
sitemap-all.md and each component entry, or (B) remove the numeric claims
entirely and render the list without counts; locate the references to
components, componentSitemaps, and the sitemap-all.md string to change where
counts are computed/inserted.
- Around line 280-296: The current check only prevents appending navSection but
doesn't enforce MAX_LLMS_TXT_CHARS on the original llmsPage.llmsTxtContent;
update the logic around MAX_LLMS_TXT_CHARS, llmsTxtContent and
llmsPage.llmsTxtContent so the final llmsTxtContent never exceeds the limit: if
llmsPage.llmsTxtContent.length >= MAX_LLMS_TXT_CHARS, either truncate
llmsPage.llmsTxtContent to MAX_LLMS_TXT_CHARS (with a clear logger.warn) or
reject/publish a reduced file (whichever policy fits), and when appending
navSection compute available space = MAX_LLMS_TXT_CHARS - llmsTxtContent.length
and append only navSection.slice(0, availableSpace) (logging the truncation).
Make changes where navSection is created by generateNavigationSection and where
logger.info/warn report sizes to reflect truncation or rejection.
---
Nitpick comments:
In `@extensions/convert-to-markdown.js`:
- Around line 503-516: The llms directive string (currently built into
convert-to-markdown.js as the llmsDirective variable) must be centralized so the
regex in convert-llms-to-txt.js (line ~79) can reliably strip it; create a
shared export (e.g., a constant like LLMS_DIRECTIVE_TEXT or a helper function
formatLlmsDirective(componentName)) in a new module and import it into both
convert-to-markdown.js and convert-llms-to-txt.js, replace the inline
llmsDirective construction in convert-to-markdown.js with the shared
helper/constant (used to render with or without componentName/canonicalUrl) and
update convert-llms-to-txt.js to use the same constant or a precomputed regex
derived from that constant so both files reference a single source of truth.
📒 Files selected for processing (2)
- extensions/convert-llms-to-txt.js
- extensions/convert-to-markdown.js
- Extract metadata-stripping logic into shared helper (llms-utils.js)
- Apply stripMarkdownMetadata to llms-full.txt and component-full.txt pages
- Remove inaccurate page counts from navigation section
- Properly enforce MAX_LLMS_TXT_CHARS with truncation logic
- Centralize directive format in shared module for single source of truth
micheleRP
left a comment
Review Summary
Overall: Clean, well-structured PR. The shared utility module (llms-utils.js) is a nice touch that addresses the exact concern CodeRabbit raised about keeping the directive format and its stripping regex in sync.
Findings
1. Hardcoded navigation links are fragile (Medium)
generateNavigationSection() hardcodes ~20 documentation section URLs (e.g., /current/deploy.md, /redpanda-cloud/ai-agents.md). If any page is renamed, moved, or removed, these links will 404 silently. Consider:
- Generating these from the content catalog at build time instead, or
- At minimum, adding a comment noting these need manual updates when pages move
2. siteUrl could include a trailing slash (Low)
generateNavigationSection(siteUrl) concatenates ${siteUrl}/sitemap.md etc. If siteUrl already ends with /, you'll get double slashes (https://docs.redpanda.com//sitemap.md). A quick siteUrl.replace(/\/+$/, '') would be defensive.
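A minimal sketch of that defensive normalization, assuming siteUrl arrives as a plain string. The helper names normalizeSiteUrl and sitemapLink are hypothetical and are not taken from the PR; the real generateNavigationSection() builds many more links than this.

```javascript
// Hypothetical helper: strip any trailing slashes so that joining
// path segments never produces "//".
function normalizeSiteUrl(siteUrl) {
  return String(siteUrl).replace(/\/+$/, '');
}

// Illustrative use only: build one link the way the review describes.
function sitemapLink(siteUrl) {
  return `${normalizeSiteUrl(siteUrl)}/sitemap.md`;
}
```

Normalizing once at the top of the function, rather than at each concatenation, keeps every generated link consistent.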
3. Truncation uses naive slice() (Low)
Both truncation paths (llmsTxtContent.slice(0, MAX_LLMS_TXT_CHARS - 100) and navSection.slice(0, availableSpace - 50)) slice at arbitrary character positions. This could cut mid-line or mid-URL. Consider truncating at the last \n before the limit for cleaner output.
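One way to truncate on a line boundary could look like the sketch below. Both function names are hypothetical, and the fixed safety margins from the PR (the - 100 and - 50 offsets) are deliberately left out; this only illustrates cutting at the last newline before the limit.

```javascript
// Hypothetical helper: cut `text` to at most `limit` characters,
// backing up to the last newline so no line or URL is split.
function truncateAtLineBoundary(text, limit) {
  if (limit <= 0) return '';
  if (text.length <= limit) return text;
  const head = text.slice(0, limit);
  // If the cut already falls exactly at the end of a line, keep it.
  if (text[limit] === '\n') return head;
  const lastNewline = head.lastIndexOf('\n');
  // No newline in range: fall back to a hard cut rather than returning nothing.
  return lastNewline === -1 ? head : head.slice(0, lastNewline + 1);
}

// Hypothetical helper: append `extra` to `base` without exceeding `maxChars`.
function appendWithinBudget(base, extra, maxChars) {
  const trimmedBase = truncateAtLineBoundary(base, maxChars);
  const available = maxChars - trimmedBase.length;
  return trimmedBase + truncateAtLineBoundary(extra, available);
}
```

The hard-cut fallback is a judgment call: a single line longer than the limit would otherwise produce empty output.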
4. Shared utils module is well done (Positive)
llms-utils.js centralizes the directive text, formatting, regex, and stripping into one module — exactly the right pattern. The regex in LLMS_DIRECTIVE_REGEX correctly matches the format produced by formatLlmsDirective().
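For readers without the diff at hand, the single-source-of-truth pattern can be sketched as follows. This is not the actual llms-utils.js: only the names formatLlmsDirective and LLMS_DIRECTIVE_REGEX and the opening words "For the complete documentation index" appear in this conversation. The rest of the sentence, the indexUrl parameter, and stripLlmsDirective are assumptions made for illustration.

```javascript
// Escape a literal string for safe use inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Only this prefix is quoted in the review; everything after it is invented here.
const LLMS_DIRECTIVE_PREFIX = 'For the complete documentation index';

// Assumed signature: renders the directive as a markdown blockquote line.
function formatLlmsDirective(indexUrl) {
  return `> ${LLMS_DIRECTIVE_PREFIX}, see [llms.txt](${indexUrl})`;
}

// Built from the same prefix, so a copy change cannot desynchronize
// the emitter and the stripper.
const LLMS_DIRECTIVE_REGEX = new RegExp(
  `^> ${escapeRegExp(LLMS_DIRECTIVE_PREFIX)}[^\\n]*\\n?`,
  'gm'
);

// Hypothetical stripper: removes every directive line from a markdown body.
function stripLlmsDirective(markdown) {
  return markdown.replace(LLMS_DIRECTIVE_REGEX, '');
}
```

Deriving the regex from the constant, instead of writing the sentence twice, is what prevents the silent leak described in the nitpick.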
5. Directive placement change looks correct (Positive)
Moving the directive from a buried HTML comment to a blockquote immediately after H1 (H1 → directive → frontmatter → source → content) aligns with the agent-friendly docs spec requirement for near-top-of-page placement.
Verdict
Good to merge. The hardcoded nav links (#1) are the main concern — they'll need maintenance as the docs site evolves. The other items are minor defensive improvements.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Summary
Improves compliance with the Agent-Friendly Docs spec to make our documentation more accessible to AI agents.
Changes
llms-txt Directive (convert-to-markdown.js):
Comprehensive Navigation (convert-llms-to-txt.js):
Expected Improvements
Related PR
Test plan