
fix(docs): auto-inject git-derived last-updated into markdown frontmatter#13268

Open
aditya-arolkar-swe wants to merge 11 commits into main from aa/autogen-last-mod-frontmatter


aditya-arolkar-swe (Contributor) commented Mar 9, 2026

Description

Requested by: @aditya-arolkar-swe
Link to Devin Session

During fern docs publish, this change automatically sets the last-updated frontmatter field on each markdown page from that file's git history. The platform already reads this field to populate <lastmod> in sitemaps; this PR keeps it current automatically.

Behavior with existing last-updated frontmatter

| Scenario | Behavior |
| --- | --- |
| Page has no last-updated | Injected from git |
| Page has last-updated and git history exists | Overwritten with git date |
| Page has last-updated but no git history (untracked file, non-git env) | Left unchanged |
| API-generated page (OpenAPI tag descriptions) | Skipped entirely; never injected |
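The table reduces to a small decision function. This is a minimal sketch, assuming each page's state is already known; LastUpdatedAction, PageState and decideLastUpdatedAction are hypothetical names, not code from this PR. A page with neither a last-updated value nor git history is not covered by the table; the sketch simply leaves it unchanged.

```typescript
// Hypothetical types: the PR does not define these names.
type LastUpdatedAction = "inject" | "overwrite" | "leave-unchanged" | "skip";

interface PageState {
    isApiGenerated: boolean;     // e.g. an OpenAPI tag-description page
    hasLastUpdated: boolean;     // frontmatter already contains last-updated
    gitDate: string | undefined; // undefined when the file has no git history
}

function decideLastUpdatedAction(page: PageState): LastUpdatedAction {
    // API-generated pages are never touched.
    if (page.isApiGenerated) {
        return "skip";
    }
    // No git history: nothing to inject, and any existing value is kept.
    if (page.gitDate === undefined) {
        return "leave-unchanged";
    }
    // Git history exists, so git wins whether or not a value is already present.
    return page.hasLastUpdated ? "overwrite" : "inject";
}
```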

Git is the single source of truth. User-specified last-updated values are always replaced when git history exists, so the field never goes stale. Whitespace-only commits (formatting, indentation) are ignored via git log -G'[^\s]' to avoid SEO noise from trivial edits.
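A sketch of the per-file git query, assuming Node's child_process.execFile is used so no shell quoting is involved; buildGitLogArgs and getLastModifiedFromGit are hypothetical names, not the PR's functions. Two caveats are worth checking against the real implementation. First, git's -G selects commits whose added or removed lines match the regex, so a line that was only re-indented still contains non-whitespace and can still match. Second, inside a POSIX bracket expression a backslash is normally taken literally, so depending on the regex library git is built with, [^\s] may mean "any character except \ and s"; the sketch uses the portable class [^[:space:]] instead.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical helper: argv for "date of the most recent non-whitespace change to one file".
function buildGitLogArgs(filePath: string): string[] {
    return [
        "log",
        "-1",             // only the most recent matching commit
        "-G[^[:space:]]", // commits whose added/removed lines contain a non-whitespace char
        "--format=%cs",   // committer date, short YYYY-MM-DD form
        "--",             // treat what follows as a path, never as an option
        filePath,
    ];
}

// Hypothetical helper: resolves to undefined when git is missing, cwd is not a
// repository, or the file has no matching history (e.g. it is untracked).
async function getLastModifiedFromGit(filePath: string, cwd: string): Promise<string | undefined> {
    try {
        const { stdout } = await execFileAsync("git", buildGitLogArgs(filePath), { cwd });
        const date = stdout.trim();
        return date.length > 0 ? date : undefined;
    } catch {
        return undefined;
    }
}
```

For a path with no history, git log prints nothing and exits successfully, which is why an empty stdout is mapped to undefined rather than treated as an error.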

Design decisions

  • Whitespace-aware: git log -1 -G'[^\s]' --format=%cs — only non-whitespace changes count
  • Content-type filtering: OpenAPI-generated pages are excluded via excludePaths (one spec produces N pages sharing one timestamp, which leads Google to distrust <lastmod>)
  • Concurrency-limited: asyncPool (cap: 50) to avoid EMFILE on large doc sites
  • In-memory only: Source .mdx files on disk are never modified
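The content-type and in-memory rules can be combined in one pass over the page map. This sketch assumes pages arrive as a map from path to raw markdown and that the git lookup and the frontmatter edit are passed in as functions; applyLastUpdated, GetDate and SetLastUpdated are hypothetical names. For brevity it awaits each lookup in turn, whereas the PR bounds parallelism with asyncPool.

```typescript
// Hypothetical function types for the two pluggable steps.
type GetDate = (path: string) => Promise<string | undefined>;
type SetLastUpdated = (markdown: string, date: string) => string;

// Returns a new map; the input map (and anything on disk) is left untouched.
async function applyLastUpdated(
    pages: Readonly<Record<string, string>>, // page path -> raw markdown
    excludePaths: ReadonlySet<string>,       // e.g. OpenAPI-generated pages
    getDate: GetDate,
    setLastUpdated: SetLastUpdated,
): Promise<Record<string, string>> {
    const result: Record<string, string> = { ...pages };
    for (const [path, markdown] of Object.entries(pages)) {
        if (excludePaths.has(path)) {
            continue; // API-generated pages never get a last-updated value
        }
        const date = await getDate(path);
        if (date !== undefined) {
            result[path] = setLastUpdated(markdown, date);
        }
        // No git history: the page is kept exactly as it was.
    }
    return result;
}
```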

Changes Made

  • injectLastUpdated.ts (new) — Core utility: getGitLastModifiedDate() queries git with -G'[^\s]' whitespace filtering, injectLastUpdatedIntoMarkdown() / replaceLastUpdatedInMarkdown() handle frontmatter manipulation (including CRLF), injectLastUpdatedDates() orchestrates everything
  • asyncPool.ts (new) — Concurrency-limited async task runner with .finally() cleanup
  • DocsDefinitionResolver.ts — Tracks apiGeneratedPagePaths, calls injectLastUpdatedDates() after image-path replacement
  • injectLastUpdated.test.ts (new) — 24 unit tests
  • versions.yml — Added v4.20.5 changelog entry
  • Updated README.md generator (if applicable) — N/A

Testing

  • Unit tests added/updated — 24 passing
  • All CI checks passing (35/35)
  • Manual testing completed

Review checklist:

  • Verify downstream sitemap generator converts "Month Day, Year" → W3C Datetime (YYYY-MM-DD) for <lastmod> XML. If not, all <lastmod> values will be invalid.
  • Confirm -G '[^\s]' performance is acceptable on large doc repos (more expensive than plain git log -1)
  • Confirm always-overwriting user-specified last-updated with git dates is acceptable
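The first checklist item can be checked against a conversion like the one below. Sitemap <lastmod> values use W3C Datetime, whose date-only form is YYYY-MM-DD. This is a hypothetical sketch of what the downstream conversion might look like, assuming the frontmatter holds English month names in the "Month Day, Year" form; the platform's actual implementation is not part of this PR, and toW3CDate is an illustrative name.

```typescript
const MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
];

// Hypothetical converter: "March 9, 2026" -> "2026-03-09".
// Returns undefined for anything it cannot parse, so a caller can omit <lastmod>.
function toW3CDate(humanDate: string): string | undefined {
    const match = /^([A-Z][a-z]+) (\d{1,2}), (\d{4})$/.exec(humanDate.trim());
    if (match === null) {
        return undefined;
    }
    const [, monthName = "", dayText = "", yearText = ""] = match;
    const monthIndex = MONTHS.indexOf(monthName);
    if (monthIndex === -1) {
        return undefined;
    }
    const day = Number(dayText);
    const year = Number(yearText);
    // Round-trip through UTC to reject impossible dates such as "February 30".
    const check = new Date(Date.UTC(year, monthIndex, day));
    if (check.getUTCMonth() !== monthIndex || check.getUTCDate() !== day) {
        return undefined;
    }
    const mm = String(monthIndex + 1).padStart(2, "0");
    const dd = String(day).padStart(2, "0");
    return `${yearText}-${mm}-${dd}`;
}
```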

@github-actions
Contributor

github-actions bot commented Mar 9, 2026

🌱 Seed Test Selector

Select languages to run seed tests for:

  • Python
  • TypeScript
  • Java
  • Go
  • Ruby
  • C#
  • PHP
  • Swift
  • Rust
  • OpenAPI
  • Postman

How to use: Click the ⋯ menu above → "Edit" → check the boxes you want → click "Update comment". Tests will run automatically and snapshots will be committed to this PR.

Comment on lines +52 to +58
const frontmatterMatch = /^---\r?\n([\s\S]*?\r?\n)?---(\r?\n|$)/.exec(markdown);
if (frontmatterMatch != null) {
    // Find the last occurrence of '\n---' within the match to locate the closing delimiter
    const matchStr = frontmatterMatch[0];
    const closingIdx = matchStr.lastIndexOf("\n---");
    const insertPos = frontmatterMatch.index + closingIdx;
    return markdown.slice(0, insertPos) + `\nlast-updated: ${date}` + markdown.slice(insertPos);
Contributor


The regex and string manipulation logic has a bug with CRLF line endings. The regex allows \r?\n for line endings, but line 56 uses lastIndexOf("\n---") which doesn't account for \r\n. On Windows systems with CRLF line endings, if frontmatter is ---\r\ntitle: Foo\r\n---, the lastIndexOf("\n---") will find the wrong position because it searches for \n--- but the actual pattern is \r\n---. This causes the last-updated field to be inserted at the wrong position, potentially corrupting the frontmatter.

Fix:

const closingIdx = matchStr.lastIndexOf("---");
const insertPos = frontmatterMatch.index + closingIdx;

Or handle both line ending types:

const closingMatch = /\r?\n---$/.exec(matchStr);
const insertPos = frontmatterMatch.index + (closingMatch?.index ?? 0);
Suggested change
 const frontmatterMatch = /^---\r?\n([\s\S]*?\r?\n)?---(\r?\n|$)/.exec(markdown);
 if (frontmatterMatch != null) {
     // Find the last occurrence of '\n---' within the match to locate the closing delimiter
     const matchStr = frontmatterMatch[0];
-    const closingIdx = matchStr.lastIndexOf("\n---");
-    const insertPos = frontmatterMatch.index + closingIdx;
+    const closingMatch = /\r?\n---$/.exec(matchStr);
+    const insertPos = frontmatterMatch.index + (closingMatch?.index ?? 0);
     return markdown.slice(0, insertPos) + `\nlast-updated: ${date}` + markdown.slice(insertPos);

Spotted by Graphite
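Both suggested fixes in this comment are worth testing before adopting. Without the m flag, $ in /\r?\n---$/ matches only at the end of the string, so when the closing --- is followed by a newline the match is null and ?? 0 inserts at offset 0. Inserting "\nlast-updated: ..." directly before --- (the lastIndexOf("---") variant) leaves a blank line and puts the closing delimiter on the same line as the date. With CRLF input, the original lastIndexOf("\n---") finds the \n inside \r\n---, so the new line lands between \r and \n and the file ends up with mixed line endings. The sketch below uses a variant of the reviewed regex that also captures the opening delimiter, so its line ending can be reused; upsertLastUpdated is a hypothetical name, and it treats last-updated as a top-level YAML key.

```typescript
// Groups: 1 = opening "---" plus its EOL, 2 = body (absent for empty frontmatter),
// 3 = EOL after the closing "---" (empty when the file ends there).
const FRONTMATTER_RE = /^(---\r?\n)([\s\S]*?\r?\n)?---(\r?\n|$)/;
const LAST_UPDATED_LINE_RE = /^last-updated:.*$/m;

// Hypothetical helper: set or replace last-updated without mixing line endings.
function upsertLastUpdated(markdown: string, date: string): string {
    const line = `last-updated: ${date}`;
    const match = FRONTMATTER_RE.exec(markdown);
    if (match === null) {
        // No frontmatter yet: create one, using CRLF if the file contains any.
        const eol = markdown.includes("\r\n") ? "\r\n" : "\n";
        return `---${eol}${line}${eol}---${eol}${markdown}`;
    }
    const [whole, opening = "---\n", body = "", closingEol = ""] = match;
    const eol = opening.endsWith("\r\n") ? "\r\n" : "\n";
    const newBody = LAST_UPDATED_LINE_RE.test(body)
        ? body.replace(LAST_UPDATED_LINE_RE, () => line) // "." and "$" stop before "\r", so CRLF survives
        : body + line + eol;
    return opening + newBody + "---" + closingEol + markdown.slice(whole.length);
}
```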


devin-ai-integration bot and others added 2 commits March 10, 2026 03:27
- Skip API-generated pages (OpenAPI tag descriptions) from last-updated
  injection to avoid identical timestamps across N pages, which causes
  Google to distrust <lastmod> domain-wide and wastes Bing crawl budget
- Fix CRLF line ending handling in frontmatter closing delimiter detection
- Add excludePaths parameter to injectLastUpdatedDates for explicit
  content-type differentiation
- Add JSDoc documenting lastmod policy table from sitemap research
- Add tests for CRLF handling and excludePaths behavior

Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
…tamps

- When a page already has last-updated in frontmatter, compare it against
  the git timestamp; if git is newer, replace the stale value automatically
- Add getExistingLastUpdated(), replaceLastUpdatedInMarkdown(),
  parseFormattedDate(), and getGitLastModifiedISO() helper functions
- Add tests for the new helpers and stale-date override behavior
- Resolve merge conflict with main (versions.yml: bump to 4.20.4)

Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
@devin-ai-integration devin-ai-integration bot changed the title feat(docs): autogenerated last modified datetime in frontmatter fix(docs): content-type-aware lastmod injection with stale-date override Mar 10, 2026
devin-ai-integration bot and others added 3 commits March 10, 2026 03:38
Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
- Add asyncPool with limit of 50 to prevent EMFILE on large sites (issue 1)
- Match detected line ending (CRLF/LF) when injecting last-updated (issue 2)
- Switch git log from %aI (author date) to %cI (commit date) for accuracy
  on cherry-picked/rebased commits (issue 4)
- Issue 3 (downstream flow): last-updated stays in raw markdown sent to FDR
  as PageContent.markdown; the docs platform reads it from frontmatter at
  render time. Format is human-readable ('Month Day, Year') in frontmatter;
  the platform is responsible for converting to W3C Datetime for <lastmod>.

Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
…ons.yml conflict)

Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
Comment on lines +24 to +29
const promise = (async () => {
    const result = await fn(item, i);
    results[i] = result;
})().then(() => {
    executing.delete(promise);
});
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical bug: Promise cleanup fails on rejection

The .then() callback only runs on successful promise resolution. If fn(item, i) throws an error or the git command fails, the promise will reject and never be removed from the executing set. This causes:

  1. Memory leak - rejected promises accumulate in the set
  2. Incorrect concurrency limiting - the pool thinks it's at capacity when rejected promises still occupy slots
  3. Potential hang - if enough promises fail, Promise.race() keeps racing the same rejected promises

Fix: Use .finally() instead of .then() to ensure cleanup happens on both success and failure:

const promise = (async () => {
    const result = await fn(item, i);
    results[i] = result;
})().finally(() => {
    executing.delete(promise);
});

This is critical because git commands can fail (permissions, corrupted repo, missing files), and the concurrent processing will break when processing large documentation sites.

Suggested change
 const promise = (async () => {
     const result = await fn(item, i);
     results[i] = result;
-})().then(() => {
+})().finally(() => {
     executing.delete(promise);
 });

Spotted by Graphite
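A sketch of the pool with the suggested .finally() cleanup. The signature is an assumption modeled on the snippet in this comment, not a copy of asyncPool.ts. One extra design choice: each task catches and records its own error, and the pool rethrows the first one after every task has run. Without that, a task that rejects while no Promise.race is pending is removed from the set by .finally() and its error can go unobserved.

```typescript
// Hypothetical concurrency-limited runner; results keep the input order.
async function asyncPool<T, R>(
    limit: number,
    items: readonly T[],
    fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
    const results: R[] = new Array<R>(items.length);
    const executing = new Set<Promise<void>>();
    const errors: unknown[] = [];

    for (let i = 0; i < items.length; i++) {
        const item = items[i] as T;
        const task: Promise<void> = (async () => {
            try {
                results[i] = await fn(item, i);
            } catch (err) {
                errors.push(err); // record instead of rejecting the tracked promise
            }
        })().finally(() => {
            // Runs on success and failure alike, so the slot is always freed.
            executing.delete(task);
        });
        executing.add(task);
        if (executing.size >= limit) {
            await Promise.race(executing); // wait until any slot frees up
        }
    }
    await Promise.all(executing);
    if (errors.length > 0) {
        throw errors[0];
    }
    return results;
}
```

In the PR's setting this would be called with a limit of 50 over the list of page paths.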


devin-ai-integration bot and others added 2 commits March 10, 2026 04:34
Fixes cleanup of rejected promises from the executing set. Without this,
failed git commands would leak slots and eventually hang the pool.

Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
Replaces plain 'git log -1' with 'git log -1 -G [^\s]' so that
whitespace/formatting-only commits do not bump last-updated dates.
This prevents SEO noise from trivial edits (indentation, trailing
spaces) while still updating on any non-whitespace content change.

Also switches from %cI (full ISO 8601) to %cs (short YYYY-MM-DD)
to avoid timezone-induced off-by-one errors in date formatting.

Removes getGitLastModifiedISO and parseFormattedDate — the stale-date
comparison logic is no longer needed since git -G handles filtering.

Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
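Why %cs helps: in JavaScript, new Date("2026-03-09") is parsed as midnight UTC, so formatting it with local-time methods in a timezone west of UTC shows March 8. A minimal sketch, assuming the frontmatter wants the "Month Day, Year" form mentioned in this PR, avoids Date entirely by splitting the string; formatShortDate is a hypothetical name.

```typescript
const MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
];

// Hypothetical helper: "2026-03-09" (git %cs output) -> "March 9, 2026".
// Pure string handling, so the result does not depend on the host timezone.
function formatShortDate(short: string): string | undefined {
    const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(short.trim());
    if (match === null) {
        return undefined;
    }
    const [, year = "", month = "", day = ""] = match;
    const monthName: string | undefined = MONTH_NAMES[Number(month) - 1];
    if (monthName === undefined) {
        return undefined; // month outside 01..12
    }
    return `${monthName} ${Number(day)}, ${year}`;
}
```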
Comment on lines +213 to +218
// Set (or overwrite) last-updated with the git-derived date.
if (hasLastUpdated(markdown)) {
    result[key] = replaceLastUpdatedInMarkdown(markdown, date);
} else {
    result[key] = injectLastUpdatedIntoMarkdown(markdown, date);
}
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical logic bug: The code unconditionally overwrites existing last-updated dates without checking if the git date is actually newer.

The PR description states: "Git timestamps override stale user-specified dates when the file has been modified more recently." However, the implementation always replaces the existing date with the git date, even if the user-specified date is newer.

Fix: Parse and compare dates before overwriting:

if (hasLastUpdated(markdown)) {
    const existingDate = getExistingLastUpdated(markdown);
    if (existingDate != null) {
        const existingParsed = new Date(existingDate);
        const gitParsed = new Date(date);
        // Only replace if git date is newer than existing date
        if (!isNaN(gitParsed.getTime()) && !isNaN(existingParsed.getTime()) && gitParsed > existingParsed) {
            result[key] = replaceLastUpdatedInMarkdown(markdown, date);
        } else {
            result[key] = markdown; // Keep existing date
        }
    } else {
        result[key] = markdown;
    }
} else {
    result[key] = injectLastUpdatedIntoMarkdown(markdown, date);
}

Without this fix, manually-set future dates (e.g., for scheduled content) will be incorrectly overwritten with older git dates.

Suggested change
 // Set (or overwrite) last-updated with the git-derived date.
 if (hasLastUpdated(markdown)) {
-    result[key] = replaceLastUpdatedInMarkdown(markdown, date);
+    const existingDate = getExistingLastUpdated(markdown);
+    if (existingDate != null) {
+        const existingParsed = new Date(existingDate);
+        const gitParsed = new Date(date);
+        // Only replace if git date is newer than existing date
+        if (!isNaN(gitParsed.getTime()) && !isNaN(existingParsed.getTime()) && gitParsed > existingParsed) {
+            result[key] = replaceLastUpdatedInMarkdown(markdown, date);
+        } else {
+            result[key] = markdown; // Keep existing date
+        }
+    } else {
+        result[key] = markdown;
+    }
 } else {
     result[key] = injectLastUpdatedIntoMarkdown(markdown, date);
 }

Spotted by Graphite


@devin-ai-integration devin-ai-integration bot changed the title fix(docs): content-type-aware lastmod injection with stale-date override fix(docs): content-type-aware lastmod injection with whitespace-aware git filtering Mar 10, 2026
@devin-ai-integration devin-ai-integration bot changed the title fix(docs): content-type-aware lastmod injection with whitespace-aware git filtering fix(docs): auto-inject git-derived last-updated into markdown frontmatter Mar 10, 2026
devin-ai-integration bot and others added 3 commits March 10, 2026 17:06
If a user manually sets last-updated in frontmatter, preserve it for
30 days. After 30 days, automatically revert to git-based injection
and emit a warning log so authors know their override has expired.

Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
…ons.yml conflict)

Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
…dated

Reverts to always-override behavior: git is the single source of truth
for last-updated. User-specified values are always replaced when git
history exists. Whitespace-only commit filtering (-G'[^\s]') is retained.

Co-Authored-By: adi <aditya.arolkar@berkeley.edu>