fix(docs): auto-inject git-derived last-updated into markdown frontmatter #13268
aditya-arolkar-swe wants to merge 11 commits into main
    const frontmatterMatch = /^---\r?\n([\s\S]*?\r?\n)?---(\r?\n|$)/.exec(markdown);
    if (frontmatterMatch != null) {
        // Find the last occurrence of '\n---' within the match to locate the closing delimiter
        const matchStr = frontmatterMatch[0];
        const closingIdx = matchStr.lastIndexOf("\n---");
        const insertPos = frontmatterMatch.index + closingIdx;
        return markdown.slice(0, insertPos) + `\nlast-updated: ${date}` + markdown.slice(insertPos);
The string manipulation mishandles CRLF line endings. The regex accepts \r?\n, but line 56 locates the closing delimiter with lastIndexOf("\n---"), which ignores a preceding \r. On Windows-style input such as ---\r\ntitle: Foo\r\n---, the match lands on the \n of the final \r\n, so last-updated is inserted between the \r and the \n. The injected line then ends in a bare \n while the rest of the frontmatter uses \r\n, leaving the frontmatter with mixed line endings.
Fix: match the closing delimiter with either line ending, allow for the optional newline that the outer regex captures after it, and reuse the detected line ending for the injected line:
    const closingMatch = /(\r?\n)---(?:\r?\n)?$/.exec(matchStr);
    const eol = closingMatch?.[1] ?? "\n";
    const insertPos = frontmatterMatch.index + (closingMatch?.index ?? 0);
    return markdown.slice(0, insertPos) + `${eol}last-updated: ${date}` + markdown.slice(insertPos);
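For reference, a self-contained sketch of a CRLF-aware injection is shown here. The function name follows the PR, but the body is an assumption rather than the merged implementation; in particular, creating a new frontmatter block for pages that have none is a guess at the intended behavior.

```typescript
// Sketch only: the function name follows the PR; the body is an assumption.
function injectLastUpdatedIntoMarkdown(markdown: string, date: string): string {
    // Group 1 captures the line ending that follows the opening delimiter.
    const match = /^---(\r?\n)(?:[\s\S]*?\r?\n)?---(?:\r?\n|$)/.exec(markdown);
    if (match == null) {
        // Assumed behavior for pages without frontmatter: create a new block.
        const newEol = markdown.includes("\r\n") ? "\r\n" : "\n";
        return `---${newEol}last-updated: ${date}${newEol}---${newEol}${markdown}`;
    }
    const eol = match[1] ?? "\n";
    const matchStr = match[0];
    // The match ends with the closing "---", optionally followed by one line ending.
    const trailing = matchStr.endsWith("\r\n") ? 2 : matchStr.endsWith("\n") ? 1 : 0;
    const closingStart = matchStr.length - trailing - 3;
    // Insert a complete line, terminated with the detected ending, before the closing delimiter.
    return markdown.slice(0, closingStart) + `last-updated: ${date}${eol}` + markdown.slice(closingStart);
}
```

Inserting a whole line before the closing delimiter, instead of a newline-prefixed fragment, keeps every line in the block terminated the same way.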
- Skip API-generated pages (OpenAPI tag descriptions) from last-updated injection to avoid identical timestamps across N pages, which causes Google to distrust <lastmod> domain-wide and wastes Bing crawl budget
- Fix CRLF line ending handling in frontmatter closing delimiter detection
- Add excludePaths parameter to injectLastUpdatedDates for explicit content-type differentiation
- Add JSDoc documenting lastmod policy table from sitemap research
- Add tests for CRLF handling and excludePaths behavior
Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
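The excludePaths behavior described in this commit can be pictured with a small sketch. Only the names injectLastUpdatedDates and excludePaths come from the commit message; the signature, the record-of-pages shape, and the lookupDate callback are assumptions made for illustration, and the frontmatter handling is reduced to a placeholder.

```typescript
// Hypothetical shapes: the real signature is not shown in this PR excerpt.
type Pages = Record<string, string>;

function injectLastUpdatedDates(
    pages: Pages,
    lookupDate: (path: string) => string | undefined, // stand-in for the per-file git query
    excludePaths: ReadonlySet<string> = new Set<string>()
): Pages {
    const result: Pages = {};
    for (const [path, markdown] of Object.entries(pages)) {
        // API-generated pages are skipped so that N pages from one spec do not share a timestamp.
        const date = excludePaths.has(path) ? undefined : lookupDate(path);
        // Placeholder injection: assumes the page has no frontmatter yet.
        result[path] = date == null ? markdown : `---\nlast-updated: ${date}\n---\n${markdown}`;
    }
    return result;
}
```

Excluded pages and pages with no known date pass through unchanged, so the function never removes content.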
…tamps
- When a page already has last-updated in frontmatter, compare it against the git timestamp; if git is newer, replace the stale value automatically
- Add getExistingLastUpdated(), replaceLastUpdatedInMarkdown(), parseFormattedDate(), and getGitLastModifiedISO() helper functions
- Add tests for the new helpers and stale-date override behavior
- Resolve merge conflict with main (versions.yml: bump to 4.20.4)
Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
- Add asyncPool with limit of 50 to prevent EMFILE on large sites (issue 1)
- Match detected line ending (CRLF/LF) when injecting last-updated (issue 2)
- Switch git log from %aI (author date) to %cI (commit date) for accuracy
on cherry-picked/rebased commits (issue 4)
- Issue 3 (downstream flow): last-updated stays in raw markdown sent to FDR
as PageContent.markdown; the docs platform reads it from frontmatter at
render time. Format is human-readable ('Month Day, Year') in frontmatter;
the platform is responsible for converting to W3C Datetime for <lastmod>.
Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
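Issue 3 above leaves the conversion from the human-readable frontmatter value to a W3C date to the platform. A minimal sketch of what that conversion could look like follows; the function names and the English month table are assumptions, and the platform's real code is not part of this PR. Both helpers work on string parts only, so no timezone is involved.

```typescript
const MONTHS: readonly string[] = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December"
];

// "2024-03-05" -> "March 5, 2024" (hypothetical helper)
function formatDisplayDate(isoDate: string): string | undefined {
    const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoDate);
    if (m == null) {
        return undefined;
    }
    const month = MONTHS[Number(m[2]) - 1];
    if (month == null) {
        return undefined;
    }
    return `${month} ${Number(m[3])}, ${m[1]}`;
}

// "March 5, 2024" -> "2024-03-05" (hypothetical helper)
function toW3CDate(display: string): string | undefined {
    const m = /^([A-Za-z]+) (\d{1,2}), (\d{4})$/.exec(display.trim());
    if (m == null) {
        return undefined;
    }
    const monthIndex = MONTHS.indexOf(m[1] ?? "");
    if (monthIndex < 0) {
        return undefined;
    }
    const mm = String(monthIndex + 1).padStart(2, "0");
    const dd = (m[2] ?? "").padStart(2, "0");
    return `${m[3]}-${mm}-${dd}`;
}
```

Returning undefined for unrecognized input lets a caller omit <lastmod> instead of emitting an invalid value.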
…ons.yml conflict) Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
    const promise = (async () => {
        const result = await fn(item, i);
        results[i] = result;
    })().then(() => {
        executing.delete(promise);
    });
Critical bug: Promise cleanup fails on rejection
The .then() callback only runs on successful promise resolution. If fn(item, i) throws an error or the git command fails, the promise will reject and never be removed from the executing set. This causes:
- Memory leak - rejected promises accumulate in the set
- Incorrect concurrency limiting - the pool thinks it's at capacity when rejected promises still occupy slots
- Potential hang - if enough promises fail, Promise.race() keeps racing the same rejected promises
Fix: Use .finally() instead of .then() to ensure cleanup happens on both success and failure:
    const promise = (async () => {
        const result = await fn(item, i);
        results[i] = result;
    })().finally(() => {
        executing.delete(promise);
    });
This is critical because git commands can fail (permissions, corrupted repo, missing files), and the concurrent processing will break when processing large documentation sites.
Fixes cleanup of rejected promises from the executing set. Without this, failed git commands would leak slots and eventually hang the pool. Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
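A complete pool built around the reviewed fragment might look like the sketch below. The names asyncPool, executing, results, and fn(item, i) appear in the PR; the parameter order, the generic types, and the surrounding loop are assumptions.

```typescript
// Sketch of a concurrency-limited runner; only the inner promise wiring is taken from the PR.
async function asyncPool<T, R>(
    limit: number,
    items: readonly T[],
    fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
    const results: R[] = new Array<R>(items.length);
    const executing = new Set<Promise<void>>();
    for (let i = 0; i < items.length; i++) {
        const item = items[i] as T;
        const promise: Promise<void> = (async () => {
            results[i] = await fn(item, i);
        })().finally(() => {
            // Runs on success and on failure, so a rejected task always frees its slot.
            executing.delete(promise);
        });
        executing.add(promise);
        if (executing.size >= limit) {
            // Wait until at least one in-flight task settles before starting another.
            await Promise.race(executing);
        }
    }
    await Promise.all(executing);
    return results;
}
```

In this sketch a rejection still propagates to the caller through Promise.race or Promise.all; .finally() only guarantees that the bookkeeping stays accurate.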
Replaces plain 'git log -1' with 'git log -1 -G [^\s]' so that whitespace/formatting-only commits do not bump last-updated dates. This prevents SEO noise from trivial edits (indentation, trailing spaces) while still updating on any non-whitespace content change. Also switches from %cI (full ISO 8601) to %cs (short YYYY-MM-DD) to avoid timezone-induced off-by-one errors in date formatting. Removes getGitLastModifiedISO and parseFormattedDate — the stale-date comparison logic is no longer needed since git -G handles filtering. Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
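The invocation described in this commit could be assembled as in the sketch below. The flags are copied from the commit message; the helper names are hypothetical, and how git's regex engine interprets \s inside a bracket expression has not been verified here. The argument list is meant to be passed to a process runner such as Node's child_process.execFile with "git" as the command.

```typescript
// Hypothetical helper: builds the argument list for `git` as described in the commit message.
function buildGitLogArgs(filePath: string): string[] {
    return [
        "log",
        "-1",            // most recent matching commit only
        "-G[^\\s]",      // pattern copied verbatim from the commit message
        "--format=%cs",  // committer date, short format (YYYY-MM-DD)
        "--",            // end of options; what follows is a path
        filePath
    ];
}

// Hypothetical helper: accepts git's stdout and returns the date, or undefined when
// the file has no matching history (empty output) or the output is unexpected.
function parseGitDate(stdout: string): string | undefined {
    const trimmed = stdout.trim();
    return /^\d{4}-\d{2}-\d{2}$/.test(trimmed) ? trimmed : undefined;
}
```

Passing the pattern as a single argv entry avoids shell quoting entirely, which is why the quotes from the commit message do not appear in the array.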
    // Set (or overwrite) last-updated with the git-derived date.
    if (hasLastUpdated(markdown)) {
        result[key] = replaceLastUpdatedInMarkdown(markdown, date);
    } else {
        result[key] = injectLastUpdatedIntoMarkdown(markdown, date);
    }
Critical logic bug: The code unconditionally overwrites existing last-updated dates without checking if the git date is actually newer.
The PR description states: "Git timestamps override stale user-specified dates when the file has been modified more recently." However, the implementation always replaces the existing date with the git date, even if the user-specified date is newer.
Fix: Parse and compare dates before overwriting:
    if (hasLastUpdated(markdown)) {
        const existingDate = getExistingLastUpdated(markdown);
        if (existingDate != null) {
            const existingParsed = new Date(existingDate);
            const gitParsed = new Date(date);
            // Only replace if git date is newer than existing date
            if (!isNaN(gitParsed.getTime()) && !isNaN(existingParsed.getTime()) && gitParsed > existingParsed) {
                result[key] = replaceLastUpdatedInMarkdown(markdown, date);
            } else {
                result[key] = markdown; // Keep existing date
            }
        } else {
            result[key] = markdown;
        }
    } else {
        result[key] = injectLastUpdatedIntoMarkdown(markdown, date);
    }
Without this fix, manually-set future dates (e.g., for scheduled content) will be incorrectly overwritten with older git dates.
If a user manually sets last-updated in frontmatter, preserve it for 30 days. After 30 days, automatically revert to git-based injection and emit a warning log so authors know their override has expired. Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
…ons.yml conflict) Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
…dated Reverts to always-override behavior: git is the single source of truth for last-updated. User-specified values are always replaced when git history exists. Whitespace-only commit filtering (-G'[^\s]') is retained. Co-Authored-By: adi <aditya.arolkar@berkeley.edu>
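The always-override rule from this commit can be sketched as follows. The names hasLastUpdated and replaceLastUpdatedInMarkdown come from the PR; the regex-based bodies and the overrideLastUpdated wrapper are assumptions. Leaving the page untouched when git returns no date is inferred from the phrase "when git history exists" and is not confirmed by the source.

```typescript
// Illustrative helpers: names follow the PR, bodies are assumptions.
const FRONTMATTER = /^---\r?\n(?:[\s\S]*?\r?\n)?---(?:\r?\n|$)/;
const LAST_UPDATED_LINE = /^last-updated:[^\r\n]*/m;

function hasLastUpdated(markdown: string): boolean {
    const fm = FRONTMATTER.exec(markdown);
    return fm != null && LAST_UPDATED_LINE.test(fm[0]);
}

function replaceLastUpdatedInMarkdown(markdown: string, date: string): string {
    const fm = FRONTMATTER.exec(markdown);
    if (fm == null) {
        return markdown;
    }
    // A replacer function avoids special handling of "$" sequences in the date string.
    const updated = fm[0].replace(LAST_UPDATED_LINE, () => `last-updated: ${date}`);
    return updated + markdown.slice(fm[0].length);
}

// Hypothetical wrapper covering only pages that already carry the field.
function overrideLastUpdated(markdown: string, gitDate: string | undefined): string {
    if (gitDate == null || !hasLastUpdated(markdown)) {
        // No git history (assumed: keep the author's value) or nothing to replace.
        return markdown;
    }
    return replaceLastUpdatedInMarkdown(markdown, gitDate);
}
```

Because the line pattern stops before \r or \n, the existing line ending is preserved on replacement, and text outside the frontmatter block is never touched.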
Description
Requested by: @aditya-arolkar-swe
Link to Devin Session
During fern docs publish, automatically sets the last-updated frontmatter field on markdown pages using per-file git history. This field is already read by the platform to populate <lastmod> in sitemaps; this PR automates keeping it current.

Behavior with existing last-updated frontmatter
- … last-updated
- … last-updated and git history exists
- … last-updated but no git history (untracked file, non-git env)

Git is the single source of truth. User-specified last-updated values are always replaced when git history exists, so the field never goes stale. Whitespace-only commits (formatting, indentation) are ignored via git log -G'[^\s]' to avoid SEO noise from trivial edits.

Design decisions
- git log -1 -G'[^\s]' --format=%cs: only non-whitespace changes count
- excludePaths (one spec → N pages with same timestamp = Google distrust)
- asyncPool (cap: 50) to avoid EMFILE on large doc sites
- .mdx files on disk are never modified

Changes Made
- injectLastUpdated.ts (new): Core utility. getGitLastModifiedDate() queries git with -G'[^\s]' whitespace filtering, injectLastUpdatedIntoMarkdown() / replaceLastUpdatedInMarkdown() handle frontmatter manipulation (including CRLF), injectLastUpdatedDates() orchestrates everything
- asyncPool.ts (new): Concurrency-limited async task runner with .finally() cleanup
- DocsDefinitionResolver.ts: Tracks apiGeneratedPagePaths, calls injectLastUpdatedDates() after image-path replacement
- injectLastUpdated.test.ts (new): 24 unit tests
- versions.yml: Added v4.20.5 changelog entry

Testing
Review checklist:
- … "Month Day, Year" → W3C Datetime (YYYY-MM-DD) for <lastmod> XML. If not, all <lastmod> values will be invalid.
- -G '[^\s]' performance is acceptable on large doc repos (more expensive than plain git log -1)
- … last-updated with git dates is acceptable