ci(docs): add link-check automation and tighten Docusaurus strictness#5779
…tness

Closes the gap reported in #5672 (broken/404 links on tunit.dev/docs):

- Add .github/workflows/link-check.yml with a `markdown-links` job that runs lychee against docs/docs/**/*.{md,mdx} on PRs touching docs.
- Extend deploy-pages-test.yml with a lychee step that scans the built HTML after `yarn build`, reusing the existing build (no duplicate install/build per PR).
- Add docs/lychee.toml with shared config: caching, retries, accepted status codes, and exclusions for bot-hostile sites (NuGet, SO, etc.).
- Tighten docusaurus.config.ts: onBrokenAnchors and onBrokenMarkdownLinks now `throw` instead of `warn`, matching the existing onBrokenLinks policy.

Build passes locally with no pre-existing breakage.
Code Review
Overall this is a well-structured addition — path-filtered jobs, shared lychee config, and tightening Docusaurus strictness are all good choices. One functional bug and a few consistency nits below.
Bug: Cache path mismatch in deploy-pages-test.yml
The actions/cache step is configured to save/restore docs/.lycheecache, but lychee writes its cache to .lycheecache relative to its working directory. Since workingDirectory: ${{ github.workspace }} is the repo root, lychee actually writes to <workspace>/.lycheecache — not docs/.lycheecache. The cache will never be populated or hit.
# Current (broken):
path: docs/.lycheecache # ← points to wrong dir
# Fix option A — match where lychee actually writes:
path: .lycheecache
# Fix option B — pin the location explicitly in lychee.toml:
cache_dir = ".lycheecache" # then use path: .lycheecache in both workflows

The link-check.yml workflow is correct (it uses path: .lycheecache without a workingDirectory override).
Minor: actions/checkout@v6 — verify this version exists
link-check.yml uses actions/checkout@v6. The other new step in deploy-pages-test.yml uses actions/cache@v4. Double-check that checkout@v6 is a real release tag and not a typo for v4; if the ref doesn't resolve, the job fails during setup when the runner tries to download the action (there is no fallback to another ref).
Minor: Missing newline at end of both workflow files
Both deploy-pages-test.yml and link-check.yml are missing a trailing newline (the diff shows \ No newline at end of file). POSIX tools and some CI linters treat this as malformed. Ending each file with a single newline character fixes it.
Suggestion: Add cache_dir to lychee.toml for explicitness
Rather than relying on lychee's implicit default, consider making the cache directory explicit in the shared config:
cache_dir = ".lycheecache"

This removes ambiguity for anyone adding a third workflow later and makes the cache path something you can grep for rather than infer.
Docusaurus config tightening — looks correct
onBrokenAnchors: 'throw' and flipping onBrokenMarkdownLinks from 'warn' to 'throw' are exactly the right choices to bring these into parity with the existing onBrokenLinks: 'throw'. Silent warnings in build tooling tend to accumulate and never get fixed.
accept = [429] in lychee config — tradeoff worth noting
Silently passing rate-limited responses is the pragmatic choice for CI (NuGet, GitHub etc. will rate-limit bots), but it means a legitimately-dead link that happens to return 429 intermittently will never be caught. This is an acceptable tradeoff given the exclusion list already covers the main offenders — just worth documenting the limitation in the config with a comment if you want future maintainers to understand why.
…ve paths

First CI run surfaced three issues with the lychee config:

1. `accept = [429]` REPLACED defaults, so 200 OK was being rejected. Restore explicit `[200, 206, 429]`.
2. Lychee fails to parse root-relative paths (`/docs/...`, `/img/...`) without a base. Pass `--base https://tunit.dev` so they resolve as URLs.
3. Bare relative paths (`mocking/advanced` -> `mocking/advanced.md` via Docusaurus rewrite) confused lychee, which tried the literal filename. Restrict lychee to `scheme = ["https", "http"]` so all file:// URIs are skipped, and exclude `^https?://tunit\.dev/` so we do not re-check our own site (Docusaurus build already validates internal links via onBrokenLinks/Anchors/MarkdownLinks).

Net effect: lychee now checks only external URLs, exactly the gap the Docusaurus build does not cover.
- docs/docs/examples/aspire.md: update Aspire overview link from learn.microsoft.com (404) to aspire.dev/get-started/what-is-aspire/ where Microsoft now redirects. This is the first real bug surfaced by the new link checker.
- link-check.yml + deploy-pages-test.yml: rename --base to --base-url (lychee 0.23 deprecated --base).
PR Review: ci(docs): add link-check automation and tighten Docusaurus strictness
Overall assessment: This is a well-structured and well-motivated CI improvement. The dual-layer strategy (Markdown source check + rendered-HTML check), the shared lychee config, and the Docusaurus strictness upgrades are all the right ideas. There are a handful of correctness issues and design concerns worth addressing before merge.
Bugs / Correctness Issues
1. Cache path mismatch in deploy-pages-test.yml (will silently defeat caching)
.github/workflows/deploy-pages-test.yml, around the actions/cache step:
path: docs/.lycheecache

The lychee-action step sets workingDirectory: ${{ github.workspace }}, which means lychee writes its cache file to WORKSPACE_ROOT/.lycheecache. The actions/cache step saves docs/.lycheecache, a completely different location. The cache will be saved to the wrong path on every run and will never be restored. The correct value should be .lycheecache (matching link-check.yml).
Why this matters: Silent caching failures are hard to diagnose. Lychee will re-validate every external URL on every run, slowing CI and increasing rate-limit exposure.
2. Both new/modified files are missing a trailing newline
The raw patch for both .github/workflows/deploy-pages-test.yml and .github/workflows/link-check.yml ends with \ No newline at end of file. Minor but causes noisy diffs in editors.
Design / Architecture Concerns
3. Lychee runs on every PR regardless of whether docs changed
deploy-pages-test.yml has no paths: filter, so it fires on every PR to main — including pure C# changes with nothing to do with docs. The Docusaurus build + lychee scan adds meaningful wall-clock time to every single PR. Consider:
on:
  pull_request:
    paths:
      - 'docs/**'

link-check.yml handles this correctly with a paths: filter, so it is worth applying the same discipline here.
4. link-check.yml path filter misses config files that affect links
The current filter only triggers on docs/docs/**. But docs/docusaurus.config.ts and docs/sidebars.ts can introduce or break internal link relationships (sidebar entries, base URL changes) without touching any individual .md file. Suggest expanding:
paths:
  - 'docs/docs/**'
  - 'docs/docusaurus.config.ts'
  - 'docs/sidebars.ts'
  - 'docs/lychee.toml'
  - '.github/workflows/link-check.yml'

5. deploy-pages.yml (the actual publish job) has no link check
deploy-pages-test.yml checks links on PRs, but deploy-pages.yml — which runs on push to main and actually publishes the site — does not run any link check. A non-docs PR (e.g., renaming a namespace that's referenced in config) that breaks a link will pass both PR checks and still break the live site. Adding the same lychee HTML step to deploy-pages.yml would close this gap.
Why this matters: The PR check is a gate, but if a bypassed path exists the check provides false confidence.
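A minimal sketch of what such a step might look like in deploy-pages.yml, assuming the publish job already runs `yarn build` under docs/ and that the step mirrors the one in deploy-pages-test.yml. The step name, the action tag, and the exact argument list are assumptions for illustration, not the PR's actual contents.

```yaml
# Hypothetical step for deploy-pages.yml, placed after the existing build step.
- name: Check links in built HTML
  uses: lycheeverse/lychee-action@v2
  with:
    # Run from the repo root so the cache lands at <root>/.lycheecache.
    workingDirectory: ${{ github.workspace }}
    # Fail the job when lychee reports broken links.
    fail: true
    # Folded scalar: the lines below are joined into one argument string.
    args: >-
      --config docs/lychee.toml
      --base-url https://tunit.dev
      --no-progress
      'docs/build/**/*.html'
```

Whether a failed link check should block publishing is a policy choice: a dead third-party link would then hold up an unrelated docs deploy until it is fixed or excluded.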
Positive Feedback
- Excellent layered approach. Checking Markdown sources separately from the rendered HTML correctly captures two distinct failure modes: typos in raw links vs. Docusaurus-generated HTML that lychee sees as the browser would.
- The accept = [429] with explicit comment is a great catch. Lychee replacing rather than extending the default accept list is a footgun; calling it out in the comment prevents future confusion.
- Self-links excluded from external checks (^https?://tunit\\.dev/). Correct call for PR-time checks where new pages aren't live yet.
- onBrokenAnchors: 'throw' in docs/docusaurus.config.ts is a meaningful tightening, since broken anchors are a common silent failure mode in documentation sites.
- concurrency + cancel-in-progress: true on link-check.yml means rapid follow-up pushes don't queue expensive network-bound jobs.
- workflow_dispatch on link-check.yml enables manual re-runs without committing a docs change, which is very useful when a third-party site comes back online.
- permissions: contents: read follows least-privilege correctly.
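For readers unfamiliar with these workflow-level settings, the following sketch shows how they typically fit together. The workflow name and the concurrency group expression are assumptions; the PR's actual values may differ.

```yaml
# Illustrative workflow header; name and group expression are hypothetical.
name: Link check
on:
  workflow_dispatch:          # allows manual re-runs without a docs commit
  pull_request:
    paths:
      - 'docs/docs/**'
concurrency:
  # One group per workflow and ref: a newer push cancels the older run.
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
permissions:
  contents: read              # least privilege: read-only access for checkout
```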
Minor / Nit
- max_concurrency = 8 in lychee.toml is reasonable but may hit rate limits from GitHub.com. If flakiness appears, dropping to 4–6 is an easy knob.
- The lychee-action is pinned to @v2 (a floating tag). For security-sensitive CI, pinning to a specific commit SHA is best practice, though this is consistent with the broader repo's current conventions.
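If SHA pinning is adopted later, the change is a one-line edit per action. In the sketch below, `<full-commit-sha>` is a placeholder rather than a real value; it would need to be replaced with the 40-character commit SHA that the v2 tag points to at the time of pinning.

```yaml
# <full-commit-sha> is a placeholder: substitute the full commit SHA
# behind the tag you intend to pin. The trailing comment records the tag
# so the pin stays readable.
- uses: lycheeverse/lychee-action@<full-commit-sha> # v2
```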
Follow-up Review — Previous Issues Not Yet Addressed
The PR has not been updated since the previous reviews (still on commit 13c917ed). The issues raised earlier remain outstanding. Summarising what still needs fixing before merge:
Outstanding: Cache path bug in deploy-pages-test.yml
# Still present — incorrect:
path: docs/.lycheecache

workingDirectory: ${{ github.workspace }} places lychee at the repo root, so it writes its cache to <root>/.lycheecache, not docs/.lycheecache. The cache will never hit. Fix: use path: .lycheecache (matching link-check.yml).
This is a functional correctness issue — every CI run re-validates all external links from scratch, increasing both runtime and rate-limit exposure.
Outstanding: deploy-pages-test.yml fires on every PR, not just doc changes
The workflow has no paths: filter. A pure C# code PR (no docs touched) still triggers a full Docusaurus build + lychee scan. link-check.yml handles this correctly with:
paths:
  - 'docs/docs/**'
  - 'docs/lychee.toml'
  - '.github/workflows/link-check.yml'

Adding an equivalent filter to deploy-pages-test.yml (e.g. docs/**) would avoid unnecessary overhead on every non-docs PR.
Outstanding: Missing trailing newlines
Both .github/workflows/deploy-pages-test.yml and .github/workflows/link-check.yml still end without a trailing newline (\ No newline at end of file in the diff). POSIX convention and some YAML linters treat this as malformed.
Suggestion still open: link-check.yml path filter could include sibling config files
docs/docusaurus.config.ts and docs/sidebars.ts can introduce broken internal link relationships without touching a .md file. Worth adding to the filter.
No change since last review — nothing new to flag
The Docusaurus config tightening (onBrokenAnchors: 'throw', onBrokenMarkdownLinks: 'throw'), the shared lychee.toml design, the self-link exclusion, and the concurrency setup all look good and have not regressed.
The three items above (cache path, missing paths filter, trailing newlines) are the blockers worth resolving before merge.
Address review feedback on PR #5779:

- deploy-pages-test.yml: lychee runs from `${{ github.workspace }}`, so its on-disk cache lives at <root>/.lycheecache, not docs/.lycheecache. The cache step was looking in the wrong place and never hit. Match the path to where lychee actually writes.
- deploy-pages-test.yml: add the missing trailing newline.
- link-check.yml: trigger on docusaurus.config.ts and sidebars.ts too, since sidebar/config changes can introduce broken internal links without touching any individual .md file.
Follow-up Review — Outstanding Issues Remain
This is a follow-up to the three previous reviews on this PR. The PR has not been updated since commit 13c917ed, so the issues raised below are still unresolved.
Blockers (must fix before merge)
1. Cache path mismatch in deploy-pages-test.yml
The actions/cache step saves path: docs/.lycheecache, but lychee (run with workingDirectory: ${{ github.workspace }}) writes its cache to <repo-root>/.lycheecache. These paths never match — the cache silently misses on every run, wasting network time on every build.
Fix: Change path: docs/.lycheecache → path: .lycheecache.
2. deploy-pages-test.yml has no paths: filter
Every PR — including pure C# changes with no doc touches — will trigger the Docusaurus build + lychee HTML scan. This adds unnecessary minutes to unrelated PRs.
Fix: Add a paths: filter mirroring the one already in link-check.yml:
on:
  pull_request:
    paths:
      - 'docs/**'
      - '.github/workflows/deploy-pages-test.yml'

3. Missing trailing newlines
Both .github/workflows/deploy-pages-test.yml and .github/workflows/link-check.yml are missing a trailing newline at EOF. Most CI linters and editors flag this.
Open Suggestions (not blockers)
4. link-check.yml path filter is incomplete
The filter covers docs/docs/** but not docs/docusaurus.config.ts or docs/sidebars.ts. Changing those files can introduce broken internal anchors/links without touching any .md file — and the link check won't fire.
Suggested addition to paths::
- 'docs/docusaurus.config.ts'
- 'docs/sidebars.ts'

5. actions/checkout@v6: verify this version exists
link-check.yml references actions/checkout@v6. The latest stable release is v4. This may be a forward-looking pin, but if v6 doesn't exist yet it will fail at runtime. Consider pinning to v4 (or a commit SHA for supply-chain safety).
What's Good
- Dual-layer approach (Markdown lint + HTML scan of built output) is well-designed: it catches link issues at both source and rendered levels.
- accept = [429] prevents false-positive failures for rate-limited external sites while still catching real 404s.
- onBrokenAnchors: 'throw' and onBrokenMarkdownLinks: 'throw' are the right defaults; parity with onBrokenLinks: 'throw' closes a real gap.
- concurrency + cancel-in-progress: true and permissions: contents: read are CI best practices, well applied.
- Sensible exclude list in lychee.toml covers the sites that reliably block bots.
The overall architecture and intent of this PR are solid. Once the three blockers above are addressed it should be ready to merge.
Add a `paths:` filter so the Docusaurus build + lychee scan only fires when something under docs/ (or the workflow itself) actually changes. Pure C# PRs no longer pay the ~90s build cost for a deploy test that has nothing to validate. Mirrors the filter pattern already used by link-check.yml.
Follow-up Review — Previous Issues Addressed
This is a follow-up to the four prior reviews. The PR has been updated (now on commit aa38e43) and the outstanding blockers have all been resolved. Marking the status below.
Previously Raised Issues — Status
✅ Fixed: Cache path mismatch in deploy-pages-test.yml
path: .lycheecache is now correct (was docs/.lycheecache). Matches where lychee writes when running from ${{ github.workspace }}.
✅ Fixed: deploy-pages-test.yml had no paths: filter
The workflow now correctly scopes to:
paths:
  - 'docs/**'
  - '.github/workflows/deploy-pages-test.yml'

Pure C# PRs no longer trigger the Docusaurus build + lychee scan.
✅ Fixed: link-check.yml path filter expanded
Now includes docs/docusaurus.config.ts, docs/sidebars.ts, docs/lychee.toml, and the workflow file itself — so config-only changes to the docs site also trigger the check.
✅ Fixed: Missing trailing newlines
Both workflow files now end with a proper trailing newline.
One Remaining Item
actions/checkout@v6 — verify this tag exists
link-check.yml still references:
- uses: actions/checkout@v6

As of the time of this review the latest stable release of actions/checkout is v4. If v6 does not resolve to a real tag the job will fail at runtime. Please confirm this tag exists, or pin to v4 (or a commit SHA for supply-chain safety, consistent with security best practices).
Minor Observation (non-blocking)
The lychee URL-response cache key for the HTML check is based on a hash of the Markdown source files:
key: lychee-html-${{ hashFiles('docs/docs/**/*.md', 'docs/docs/**/*.mdx', 'docs/lychee.toml') }}

Lychee's cache stores external URL responses keyed by URL (with a 1-day TTL from lychee.toml), not source content. This means the cache is invalidated every time any doc file changes, even if no external links changed, which is slightly suboptimal. A per-run key (e.g., lychee-html-${{ github.run_id }} with restore-keys: lychee-html-) would restore the most recent URL cache regardless of what changed in source. This is non-blocking; the current approach works correctly but occasionally discards a still-valid cache.
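The per-run key idea can be sketched as follows, assuming the cache file sits at the repo root as in the fixed workflow. The step name is hypothetical.

```yaml
# Hypothetical cache step; only the key strategy differs from the PR.
- name: Restore lychee cache
  uses: actions/cache@v4
  with:
    path: .lycheecache
    # Unique key per run, so every run saves a fresh cache entry.
    key: lychee-html-${{ github.run_id }}
    # Prefix match: falls back to the most recently saved lychee-html-* entry.
    restore-keys: |
      lychee-html-
```

The tradeoff is that each run writes a new cache entry, which counts against the repository's cache storage limit until older entries are evicted.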
Overall Assessment
All three blockers from the previous reviews have been fixed. The architecture (dual-layer Markdown + HTML checks, shared lychee.toml, Docusaurus strictness upgrades) is solid and well-structured. Once the actions/checkout@v6 tag is confirmed or corrected, this is ready to merge.
Summary
Closes the gap reported in #5672 — broken/404 links on https://tunit.dev/docs.
- .github/workflows/link-check.yml (new): markdown-links job runs lychee against docs/docs/**/*.{md,mdx} on PRs/pushes that touch docs. Path-filtered so pure C# PRs do not trigger it.
- .github/workflows/deploy-pages-test.yml: added a "Check links in built HTML" step after the existing yarn build, scanning docs/build/**/*.html. Reuses the existing build, no duplicate install.
- docs/lychee.toml (new): shared lychee config: 1-day cache, retries, accept = [429] for rate-limited links, exclusions for sites that block bots (NuGet, Stack Overflow, GitHub sponsors, the Docusaurus "edit this page" template, etc.).
- docs/docusaurus.config.ts: onBrokenAnchors: 'throw' added; onBrokenMarkdownLinks flipped from 'warn' to 'throw'. Brings parity with the existing onBrokenLinks: 'throw' so internal anchor / raw-markdown breakage now fails the build the same way page-link breakage already does. yarn build still passes locally.

Test plan
- markdown-links job runs on this PR (path filter matches because docs/docs/** is unchanged here, but the workflow file itself is in the path filter; should still trigger via the docs/lychee.toml and workflow paths).
- deploy-pages-test job builds Docusaurus with the tightened config and then runs lychee against the rendered HTML.
- deploy-pages-test, and an external 404 should fail link-check.