ci(docs): add link-check automation and tighten Docusaurus strictness#5779

Merged
thomhurst merged 5 commits into main from docs/link-check-automation
Apr 27, 2026

Conversation

@thomhurst
Owner

Summary

Closes the gap reported in #5672 — broken/404 links on https://tunit.dev/docs.

  • .github/workflows/link-check.yml (new) — markdown-links job runs lychee against docs/docs/**/*.{md,mdx} on PRs/pushes that touch docs. Path-filtered so pure C# PRs do not trigger it.
  • .github/workflows/deploy-pages-test.yml — added a "Check links in built HTML" step after the existing yarn build, scanning docs/build/**/*.html. Reuses the existing build, no duplicate install.
  • docs/lychee.toml (new) — shared lychee config: 1-day cache, retries, accept = [429] for rate-limited links, exclusions for sites that block bots (NuGet, Stack Overflow, GitHub sponsors, the Docusaurus "edit this page" template, etc.).
  • docs/docusaurus.config.ts: onBrokenAnchors: 'throw' added; onBrokenMarkdownLinks flipped from 'warn' to 'throw'. This brings parity with the existing onBrokenLinks: 'throw', so internal anchor / raw-markdown breakage now fails the build the same way page-link breakage already does. yarn build still passes locally.
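
For orientation, a workflow along the lines described above might look like the sketch below. It is assembled from details mentioned in this PR (the markdown-links job name, the path filter, actions/checkout@v6, lychee-action@v2, docs/lychee.toml). The workflow name, runner image, push branch filter, and exact argument layout are assumptions, not the merged file.

```yaml
# .github/workflows/link-check.yml (illustrative sketch, not the merged file)
name: Link check              # assumed name

on:
  pull_request:
    paths:
      - 'docs/docs/**'
      - 'docs/lychee.toml'
      - '.github/workflows/link-check.yml'
  push:
    branches: [main]          # assumed branch filter
    paths:
      - 'docs/docs/**'
      - 'docs/lychee.toml'
      - '.github/workflows/link-check.yml'

jobs:
  markdown-links:
    runs-on: ubuntu-latest    # assumed runner
    steps:
      - uses: actions/checkout@v6

      # Scan the Markdown/MDX sources using the shared config.
      # The globs are quoted so lychee, not the shell, expands them.
      - name: Check links in Markdown sources
        uses: lycheeverse/lychee-action@v2
        with:
          args: >-
            --config docs/lychee.toml
            'docs/docs/**/*.md'
            'docs/docs/**/*.mdx'
          fail: true          # fail the job when lychee reports broken links
```

Because of the path filter, a PR that only touches C# sources never starts this job.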

Test plan

  • CI: markdown-links job runs on this PR. No file under docs/docs/** changes here, but docs/lychee.toml and the workflow file itself are both in the path filter, so the job should still trigger.
  • CI: deploy-pages-test job builds Docusaurus with the tightened config and then runs lychee against the rendered HTML.
  • Confirm both lychee invocations either pass or surface concrete broken links to triage.
  • After merge, the next docs PR that introduces a broken anchor / raw markdown link should fail deploy-pages-test, and an external 404 should fail link-check.

…tness

Closes the gap reported in #5672 (broken/404 links on tunit.dev/docs):

- Add .github/workflows/link-check.yml with a `markdown-links` job that
  runs lychee against docs/docs/**/*.{md,mdx} on PRs touching docs.
- Extend deploy-pages-test.yml with a lychee step that scans the built
  HTML after `yarn build`, reusing the existing build (no duplicate
  install/build per PR).
- Add docs/lychee.toml with shared config: caching, retries, accepted
  status codes, and exclusions for bot-hostile sites (NuGet, SO, etc.).
- Tighten docusaurus.config.ts: onBrokenAnchors and onBrokenMarkdownLinks
  now `throw` instead of `warn`, matching the existing onBrokenLinks
  policy. Build passes locally with no pre-existing breakage.
@codacy-production

codacy-production Bot commented Apr 27, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


Contributor

@claude claude Bot left a comment

Code Review

Overall this is a well-structured addition — path-filtered jobs, shared lychee config, and tightening Docusaurus strictness are all good choices. One functional bug and a few consistency nits below.


Bug: Cache path mismatch in deploy-pages-test.yml

The actions/cache step is configured to save/restore docs/.lycheecache, but lychee writes its cache to .lycheecache relative to its working directory. Since workingDirectory: ${{ github.workspace }} is the repo root, lychee actually writes to <workspace>/.lycheecache — not docs/.lycheecache. The cache will never be populated or hit.

# Current (broken):
path: docs/.lycheecache        # ← points to wrong dir

# Fix option A — match where lychee actually writes:
path: .lycheecache

# Fix option B — pin the location explicitly in lychee.toml:
cache_dir = ".lycheecache"    # then use path: .lycheecache in both workflows

The link-check.yml workflow is correct (it uses path: .lycheecache without a workingDirectory override).
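
To make the mismatch concrete, here is a minimal sketch of the two steps with the paths aligned. The step names and cache key are assumptions for illustration; the point is only that the cache step's path has to name the same file lychee writes from its working directory.

```yaml
# Illustrative steps; the cache step name and key are assumptions.
steps:
  # Restore/save lychee's URL cache. The path must match where lychee
  # writes .lycheecache, i.e. its working directory (here the repo root).
  - name: Restore lychee cache
    uses: actions/cache@v4
    with:
      path: .lycheecache
      key: lychee-html-${{ hashFiles('docs/lychee.toml') }}   # assumed key
      restore-keys: lychee-html-

  - name: Check links in built HTML
    uses: lycheeverse/lychee-action@v2
    with:
      # Repo root, so the cache file lands at ./.lycheecache
      workingDirectory: ${{ github.workspace }}
      args: >-
        --config docs/lychee.toml
        --cache
        'docs/build/**/*.html'
```

With path: docs/.lycheecache instead, the cache step would look for a file that lychee never creates, so every run would start cold.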


Minor: actions/checkout@v6 — verify this version exists

link-check.yml uses actions/checkout@v6. The other new step in deploy-pages-test.yml uses actions/cache@v4. Double-check that checkout@v6 is a real release tag and not a typo for v4; if the ref doesn't resolve, the job fails during setup when the runner tries to download the action.


Minor: Missing newline at end of both workflow files

Both deploy-pages-test.yml and link-check.yml are missing a trailing newline (the diff shows \ No newline at end of file). POSIX defines a line as ending in a newline, so some tools and CI linters flag such files. Ending each file with a single newline character fixes it.


Suggestion: Add cache_dir to lychee.toml for explicitness

Rather than relying on lychee's implicit default, consider making the cache directory explicit in the shared config:

cache_dir = ".lycheecache"

This removes ambiguity for anyone adding a third workflow later and makes the cache path something you can grep for rather than infer.


Docusaurus config tightening — looks correct

onBrokenAnchors: 'throw' and flipping onBrokenMarkdownLinks from 'warn' to 'throw' are exactly the right choices to bring these into parity with the existing onBrokenLinks: 'throw'. Silent warnings in build tooling tend to accumulate and never get fixed.


accept = [429] in lychee config — tradeoff worth noting

Silently passing rate-limited responses is the pragmatic choice for CI (NuGet, GitHub etc. will rate-limit bots), but it means a legitimately-dead link that happens to return 429 intermittently will never be caught. This is an acceptable tradeoff given the exclusion list already covers the main offenders — just worth documenting the limitation in the config with a comment if you want future maintainers to understand why.

…ve paths

First CI run surfaced three issues with the lychee config:

1. `accept = [429]` REPLACED defaults, so 200 OK was being rejected.
   Restore explicit `[200, 206, 429]`.
2. Lychee fails to parse root-relative paths (`/docs/...`, `/img/...`)
   without a base. Pass `--base https://tunit.dev` so they resolve as URLs.
3. Bare relative paths (`mocking/advanced` -> `mocking/advanced.md` via
   Docusaurus rewrite) confused lychee, which tried the literal filename.

Restrict lychee to `scheme = ["https", "http"]` so all file:// URIs are
skipped, and exclude `^https?://tunit\.dev/` so we do not re-check our
own site (Docusaurus build already validates internal links via
onBrokenLinks/Anchors/MarkdownLinks).

Net effect: lychee now checks only external URLs, exactly the gap the
Docusaurus build does not cover.
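
The PR keeps these settings in docs/lychee.toml. To make the three fixes concrete in workflow form, the sketch below passes what should be the equivalent lychee CLI flags through the action's args input. The step name and input globs are assumptions; the values come from the commit message above, and --base-url is the flag name the later commit in this PR switches to.

```yaml
# Hypothetical step: same policy as the commit above, expressed as CLI flags.
#   --accept    explicit list, because setting it replaces lychee's defaults
#   --base-url  lets root-relative paths (/docs/..., /img/...) resolve as URLs
#   --scheme    only http(s) links are checked, so file:// URIs are skipped
#   --exclude   do not re-check our own site; the Docusaurus build covers it
- name: Check external links only
  uses: lycheeverse/lychee-action@v2
  with:
    args: >-
      --accept 200,206,429
      --base-url https://tunit.dev
      --scheme https
      --scheme http
      --exclude '^https?://tunit\.dev/'
      'docs/docs/**/*.md'
      'docs/docs/**/*.mdx'
```

The comments sit above the step rather than inside args because text inside the folded scalar would be passed to lychee verbatim.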
- docs/docs/examples/aspire.md: update Aspire overview link from
  learn.microsoft.com (404) to aspire.dev/get-started/what-is-aspire/
  where Microsoft now redirects. This is the first real bug surfaced
  by the new link checker.
- link-check.yml + deploy-pages-test.yml: rename --base to --base-url
  (lychee 0.23 deprecated --base).
Contributor

@claude claude Bot left a comment

PR Review: ci(docs): add link-check automation and tighten Docusaurus strictness

Overall assessment: This is a well-structured and well-motivated CI improvement. The dual-layer strategy (Markdown source check + rendered-HTML check), the shared lychee config, and the Docusaurus strictness upgrades are all the right ideas. There are a handful of correctness issues and design concerns worth addressing before merge.


Bugs / Correctness Issues

1. Cache path mismatch in deploy-pages-test.yml (will silently defeat caching)

.github/workflows/deploy-pages-test.yml, around the actions/cache step:

path: docs/.lycheecache

The lychee-action step sets workingDirectory: ${{ github.workspace }}, which means lychee writes its cache file to WORKSPACE_ROOT/.lycheecache. The actions/cache step saves docs/.lycheecache — a completely different location. The cache will be saved to the wrong path on every run and will never be restored. The correct value should be .lycheecache (matching link-check.yml).

Why this matters: Silent caching failures are hard to diagnose. Lychee will re-validate every external URL on every run, slowing CI and increasing rate-limit exposure.

2. Both new/modified files are missing a trailing newline

The raw patch for both .github/workflows/deploy-pages-test.yml and .github/workflows/link-check.yml ends with \ No newline at end of file. Minor but causes noisy diffs in editors.


Design / Architecture Concerns

3. Lychee runs on every PR regardless of whether docs changed

deploy-pages-test.yml has no paths: filter, so it fires on every PR to main — including pure C# changes with nothing to do with docs. The Docusaurus build + lychee scan adds meaningful wall-clock time to every single PR. Consider:

on:
  pull_request:
    paths:
      - 'docs/**'

link-check.yml handles this correctly with a paths: filter — worth applying the same discipline here.

4. link-check.yml path filter misses config files that affect links

The current filter only triggers on docs/docs/**. But docs/docusaurus.config.ts and docs/sidebars.ts can introduce or break internal link relationships (sidebar entries, base URL changes) without touching any individual .md file. Suggest expanding:

paths:
  - 'docs/docs/**'
  - 'docs/docusaurus.config.ts'
  - 'docs/sidebars.ts'
  - 'docs/lychee.toml'
  - '.github/workflows/link-check.yml'

5. deploy-pages.yml (the actual publish job) has no link check

deploy-pages-test.yml checks links on PRs, but deploy-pages.yml — which runs on push to main and actually publishes the site — does not run any link check. A non-docs PR (e.g., renaming a namespace that's referenced in config) that breaks a link will pass both PR checks and still break the live site. Adding the same lychee HTML step to deploy-pages.yml would close this gap.

Why this matters: The PR check is a gate, but if a bypassed path exists the check provides false confidence.
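
A sketch of what closing that gap could look like is below. The contents of deploy-pages.yml are not shown in this PR, so the job name, runner, and build step are assumptions; only the lychee step mirrors what is described for deploy-pages-test.yml.

```yaml
# .github/workflows/deploy-pages.yml (hypothetical excerpt)
on:
  push:
    branches: [main]

jobs:
  deploy:                       # assumed job name
    runs-on: ubuntu-latest      # assumed runner
    steps:
      - uses: actions/checkout@v6

      # Assumes Node and dependencies are already set up by earlier steps.
      - name: Build docs
        working-directory: docs
        run: yarn build

      # Same scan as the PR-time check, run against the output that
      # is about to be published.
      - name: Check links in built HTML
        uses: lycheeverse/lychee-action@v2
        with:
          args: >-
            --config docs/lychee.toml
            'docs/build/**/*.html'
          fail: true

      # ...the existing publish steps would follow here...
```

One design choice to weigh: with fail: true, a third-party outage can block a deploy, so some teams may prefer a non-blocking report on the publish path.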


Positive Feedback

  • Excellent layered approach. Checking Markdown sources separately from the rendered HTML correctly captures two distinct failure modes — typos in raw links vs. Docusaurus-generated HTML that lychee sees as the browser would.

  • The accept = [429] with explicit comment is a great catch. Lychee replacing rather than extending the default accept list is a footgun; calling it out in the comment prevents future confusion.

  • Self-links excluded from external checks (^https?://tunit\\.dev/). Correct call for PR-time checks where new pages aren't live yet.

  • onBrokenAnchors: 'throw' in docs/docusaurus.config.ts is a meaningful tightening — broken anchors are a common silent failure mode in documentation sites.

  • concurrency + cancel-in-progress: true on link-check.yml means rapid follow-up pushes don't queue expensive network-bound jobs.

  • workflow_dispatch on link-check.yml enables manual re-runs without committing a docs change — very useful when a third-party site comes back online.

  • permissions: contents: read follows least-privilege correctly.
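
For readers unfamiliar with the three workflow-level settings praised above, a minimal sketch follows. The concurrency group expression is an assumption; the PR text only states that concurrency with cancel-in-progress: true is used.

```yaml
on:
  workflow_dispatch:            # allows manual re-runs from the Actions tab
  pull_request:
    paths:
      - 'docs/docs/**'

permissions:
  contents: read                # least privilege: the job only reads the repo

concurrency:
  # Assumed group expression: one active run per workflow and ref.
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true      # a newer push cancels the in-flight run
```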


Minor / Nit

  • max_concurrency = 8 in lychee.toml is reasonable but may hit rate limits from GitHub.com. If flakiness appears, dropping to 4–6 is an easy knob.
  • The lychee-action is pinned to @v2 (a floating tag). For security-sensitive CI, pinning to a specific commit SHA is best practice, though this is consistent with the broader repo's current conventions.
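
Both nits can be sketched in one step. The SHA below is deliberately a placeholder, not a real commit, and the step name and input glob are assumptions. The --max-concurrency flag should take precedence over max_concurrency in lychee.toml, though that is worth confirming against the lychee version in use.

```yaml
- name: Check links
  # Replace the placeholder with the full commit SHA of the release you
  # audited; the trailing comment records which tag it corresponds to.
  uses: lycheeverse/lychee-action@<full-commit-sha>   # v2
  with:
    args: >-
      --config docs/lychee.toml
      --max-concurrency 4
      'docs/docs/**/*.md'
```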

Contributor

@claude claude Bot left a comment

Follow-up Review — Previous Issues Not Yet Addressed

The PR has not been updated since the previous reviews (still on commit 13c917ed). The issues raised earlier remain outstanding. Summarising what still needs fixing before merge:


Outstanding: Cache path bug in deploy-pages-test.yml

# Still present — incorrect:
path: docs/.lycheecache

workingDirectory: ${{ github.workspace }} places lychee at the repo root, so it writes its cache to <root>/.lycheecache, not docs/.lycheecache. The cache will never hit. Fix: use path: .lycheecache (matching link-check.yml).

This is a functional correctness issue — every CI run re-validates all external links from scratch, increasing both runtime and rate-limit exposure.


Outstanding: deploy-pages-test.yml fires on every PR, not just doc changes

The workflow has no paths: filter. A pure C# code PR (no docs touched) still triggers a full Docusaurus build + lychee scan. link-check.yml handles this correctly with:

paths:
  - 'docs/docs/**'
  - 'docs/lychee.toml'
  - '.github/workflows/link-check.yml'

Adding an equivalent filter to deploy-pages-test.yml (e.g. docs/**) would avoid unnecessary overhead on every non-docs PR.


Outstanding: Missing trailing newlines

Both .github/workflows/deploy-pages-test.yml and .github/workflows/link-check.yml still end without a trailing newline (\ No newline at end of file in the diff). POSIX convention and some YAML linters treat this as malformed.


Suggestion still open: link-check.yml path filter could include sibling config files

docs/docusaurus.config.ts and docs/sidebars.ts can introduce broken internal link relationships without touching a .md file. Worth adding to the filter.


No change since last review — nothing new to flag

The Docusaurus config tightening (onBrokenAnchors: 'throw', onBrokenMarkdownLinks: 'throw'), the shared lychee.toml design, the self-link exclusion, and the concurrency setup all look good and have not regressed.

The three items above (cache path, missing paths filter, trailing newlines) are the blockers worth resolving before merge.

Address review feedback on PR #5779:

- deploy-pages-test.yml: lychee runs from `${{ github.workspace }}`,
  so its on-disk cache lives at <root>/.lycheecache, not
  docs/.lycheecache. The cache step was looking in the wrong place
  and never hit. Match the path to where lychee actually writes.
- deploy-pages-test.yml: add the missing trailing newline.
- link-check.yml: trigger on docusaurus.config.ts and sidebars.ts
  too — sidebar/config changes can introduce broken internal links
  without touching any individual .md file.
Contributor

@claude claude Bot left a comment

Follow-up Review — Outstanding Issues Remain

This is a follow-up to the three previous reviews on this PR. The PR has not been updated since commit 13c917ed, so the issues raised below are still unresolved.


Blockers (must fix before merge)

1. Cache path mismatch in deploy-pages-test.yml

The actions/cache step saves path: docs/.lycheecache, but lychee (run with workingDirectory: ${{ github.workspace }}) writes its cache to <repo-root>/.lycheecache. These paths never match — the cache silently misses on every run, wasting network time on every build.

Fix: Change path: docs/.lycheecache to path: .lycheecache.

2. deploy-pages-test.yml has no paths: filter

Every PR — including pure C# changes with no doc touches — will trigger the Docusaurus build + lychee HTML scan. This adds unnecessary minutes to unrelated PRs.

Fix: Add a paths: filter mirroring the one already in link-check.yml:

on:
  pull_request:
    paths:
      - 'docs/**'
      - '.github/workflows/deploy-pages-test.yml'

3. Missing trailing newlines

Both .github/workflows/deploy-pages-test.yml and .github/workflows/link-check.yml are missing a trailing newline at EOF. Most CI linters and editors flag this.


Open Suggestions (not blockers)

4. link-check.yml path filter is incomplete

The filter covers docs/docs/** but not docs/docusaurus.config.ts or docs/sidebars.ts. Changing those files can introduce broken internal anchors/links without touching any .md file — and the link check won't fire.

Suggested addition to paths::

- 'docs/docusaurus.config.ts'
- 'docs/sidebars.ts'

5. actions/checkout@v6 — verify this version exists

link-check.yml references actions/checkout@v6. The latest stable release is v4. This may be a forward-looking pin, but if v6 doesn't exist yet it will fail at runtime. Consider pinning to v4 (or a commit SHA for supply-chain safety).


What's Good

  • Dual-layer approach (Markdown lint + HTML scan of built output) is well-designed: it catches link issues at both source and rendered levels.
  • accept = [429] prevents false-positive failures for rate-limited external sites while still catching real 404s.
  • onBrokenAnchors: 'throw' and onBrokenMarkdownLinks: 'throw' are the right defaults — parity with onBrokenLinks: 'throw' closes a real gap.
  • concurrency + cancel-in-progress: true and permissions: contents: read are CI best practices, well applied.
  • Sensible exclude list in lychee.toml covers the sites that reliably block bots.

The overall architecture and intent of this PR are solid. Once the three blockers above are addressed it should be ready to merge.

Add a `paths:` filter so the Docusaurus build + lychee scan only fires
when something under docs/ (or the workflow itself) actually changes.
Pure C# PRs no longer pay the ~90s build cost for a deploy test that
has nothing to validate.

Mirrors the filter pattern already used by link-check.yml.
Contributor

@claude claude Bot left a comment

Follow-up Review — Previous Issues Addressed

This is a follow-up to the four prior reviews. The PR has been updated (now on commit aa38e43) and the outstanding blockers have all been resolved. Marking the status below.


Previously Raised Issues — Status

✅ Fixed: Cache path mismatch in deploy-pages-test.yml

path: .lycheecache is now correct (was docs/.lycheecache). Matches where lychee writes when running from ${{ github.workspace }}.

✅ Fixed: deploy-pages-test.yml had no paths: filter

The workflow now correctly scopes to:

paths:
  - 'docs/**'
  - '.github/workflows/deploy-pages-test.yml'

Pure C# PRs no longer trigger the Docusaurus build + lychee scan.

✅ Fixed: link-check.yml path filter expanded

Now includes docs/docusaurus.config.ts, docs/sidebars.ts, docs/lychee.toml, and the workflow file itself — so config-only changes to the docs site also trigger the check.

✅ Fixed: Missing trailing newlines

Both workflow files now end with a proper trailing newline.


One Remaining Item

actions/checkout@v6 — verify this tag exists

link-check.yml still references:

- uses: actions/checkout@v6

As of the time of this review the latest stable release of actions/checkout is v4. If v6 does not resolve to a real tag the job will fail at runtime. Please confirm this tag exists, or pin to v4 (or a commit SHA for supply-chain safety, consistent with security best practices).


Minor Observation (non-blocking)

The lychee URL-response cache key for the HTML check is based on a hash of the Markdown source files:

key: lychee-html-${{ hashFiles('docs/docs/**/*.md', 'docs/docs/**/*.mdx', 'docs/lychee.toml') }}

Lychee's cache stores external URL responses keyed by URL (with a 1-day TTL from lychee.toml), not by source content. As a result, the cache is invalidated every time any doc file changes, even if no external links changed, which is slightly suboptimal. A per-run key (e.g., lychee-html-${{ github.run_id }} with restore-keys: lychee-html-) would restore the most recent URL cache regardless of what changed in source. This is non-blocking; the current approach works correctly but occasionally discards a cache that was still usable.
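
A minimal sketch of that alternative is below; the step name is an assumption. It relies on actions/cache falling back to the most recently saved entry whose key starts with a restore-keys prefix when the exact key misses.

```yaml
- name: Restore lychee cache
  uses: actions/cache@v4
  with:
    path: .lycheecache
    # Unique per run, so the exact key never hits and a fresh cache
    # is saved at the end of every run.
    key: lychee-html-${{ github.run_id }}
    # Prefix match: restores the most recently saved lychee-html-* cache.
    restore-keys: |
      lychee-html-
```

Stale entries are not a concern here, since lychee itself expires cached responses after the 1-day TTL set in lychee.toml.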


Overall Assessment

All three blockers from the previous reviews have been fixed. The architecture (dual-layer Markdown + HTML checks, shared lychee.toml, Docusaurus strictness upgrades) is solid and well-structured. Once the actions/checkout@v6 tag is confirmed or corrected, this is ready to merge.

@thomhurst thomhurst merged commit 2fe6f10 into main Apr 27, 2026
16 checks passed
@thomhurst thomhurst deleted the docs/link-check-automation branch April 27, 2026 23:25
This was referenced Apr 29, 2026