refactor(citation-validator): restructure validation error suggestions as YAML with correct link terminology #6

@WesleyMFrederick

Description

Problem

citation-manager validate produces hard-to-read error suggestions that mislabel anchor types and dump everything into a single flat string. When a link target is not found, the suggestion output mixes header anchors, block anchors, and fuzzy matches into one unstructured blob:

Suggestion: Available anchors: TRACE%20LLM%20..., TRACE: LLM...; Available headers: "evidence-tag-numbering — Baseline" → #evidence-tag-numbering — Baseline, ...; Available block refs: ^O-001, ^A-001, ...

Three specific problems:

  1. "Available anchors" mixes header IDs and block anchor IDs with no type distinction
  2. "Available headers" uses "rawText" → #id format that is noisy and hard to parse with long header names
  3. Error message always says Anchor not found: #X regardless of whether the link was a #header link or a #^block-anchor link

Reproduction Steps

  1. Run citation-manager validate openspec/changes/evidence-tag-numbering/baseline.md
  2. Observe error on line 155:
    ├─ Line 155: [O-012](#TRACE:%20LLM%20creates%20evidence%20tags%20during%20artifact%20authoring)
    │  └─ Anchor not found: #TRACE:%20LLM%20creates%20evidence%20tags%20during%20artifact%20authoring
    │  └─ Suggestion: Available anchors: TRACE%20LLM%20creates%20evidence%20tags%20during%20artifact%20authoring%20(opsxcontinue), TRACE: LLM creates evidence tags during artifact authoring (opsx:continue), TRACE%20Human%20hand-numbers%20tags%20in%20evidence-tag-numbering%20whiteboard%20(manual); Available headers: "evidence-tag-numbering — Baseline" → #evidence-tag-numbering — Baseline, "Artifacts (minimum set for this baseline)" → #Artifacts (minimum set for this baseline), "Traces" → #Traces, "TRACE: LLM creates evidence tags during artifact authoring (opsx:continue)" → #TRACE: LLM creates evidence tags during artifact authoring (opsx:continue), "TRACE: Human hand-numbers tags in evidence-tag-numbering whiteboard (manual)" → #TRACE: Human hand-numbers tags in evidence-tag-numbering whiteboard (manual); Available block refs: ^O-001, ^A-001, ^Q-002, ^D-005, ^TAG-NNN
    
  3. Note: the suggestion is one massive semicolon-joined line, "Available anchors" contains both header and block types, and the error message doesn't distinguish #header from #^block-ref links

Root Cause

Three components contribute:

1. CitationValidator.validateAnchorExists() (lines 856-893)

  • Line 857: findSimilarAnchors(anchor) returns fuzzy matches from ALL anchor types (headers + blocks mixed) via ParsedDocument._getAnchorIds()
  • Lines 860-863: Headers formatted as "rawText" → #id — noisy with long names
  • Lines 870-891: All three suggestion arrays (suggestions, availableHeaders, availableBlockRefs) are joined with ; into one flat string
  • Lines 452, 494, 591, 676: Error always says Anchor not found: #${anchor} — no distinction between #header and #^block-ref link types

2. ParsedDocument._getAnchorIds() (lines 306-329)

  • Returns a flat string[] combining both header IDs (with urlEncodedId variants) and block anchor IDs
  • No type information preserved — consumer (findSimilarAnchors) cannot distinguish header from block results
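
One way to keep the type information is to return tagged entries instead of bare strings. The sketch below is illustrative only: TypedAnchor, HeaderInfo, getTypedAnchors, and anchorIdsOfType are hypothetical names, and the real ParsedDocument internals may look different.

```typescript
// Hypothetical shapes; not taken from the actual ParsedDocument code.
type AnchorType = "header" | "block";

interface TypedAnchor {
  id: string;
  type: AnchorType;
}

interface HeaderInfo {
  id: string;
  urlEncodedId?: string;
}

// Collect anchors without discarding which kind each one is.
function getTypedAnchors(headers: HeaderInfo[], blockIds: string[]): TypedAnchor[] {
  const anchors: TypedAnchor[] = [];
  for (const h of headers) {
    anchors.push({ id: h.id, type: "header" });
    // Keep the URL-encoded variant too, but only when it differs.
    if (h.urlEncodedId && h.urlEncodedId !== h.id) {
      anchors.push({ id: h.urlEncodedId, type: "header" });
    }
  }
  for (const id of blockIds) {
    anchors.push({ id, type: "block" });
  }
  return anchors;
}

// A consumer can then restrict fuzzy-match candidates to one anchor type.
function anchorIdsOfType(anchors: TypedAnchor[], type: AnchorType): string[] {
  return anchors.filter((a) => a.type === type).map((a) => a.id);
}
```

With this shape, a consumer such as findSimilarAnchors could either match within one type or carry the type through to its results.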

3. citation-manager.ts formatForCLI() (lines 325-331)

  • Dumps link.validation.suggestion as-is: Suggestion: ${link.validation.suggestion}
  • No structured formatting — whatever string the validator produces gets printed verbatim
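
If the validator handed the CLI a structured object rather than a string, the formatter could render it section by section. The following is a minimal sketch under that assumption; StructuredSuggestion, FailedLink, and renderFailure are hypothetical names, not the existing formatForCLI() code.

```typescript
// Hypothetical shapes for a structured validation failure.
interface StructuredSuggestion {
  similar_anchors?: string[];
  available_headers?: string[];
  available_block_anchors?: string[];
}

interface FailedLink {
  display: string;
  target: string;
  error: string;
  suggestion?: StructuredSuggestion;
}

const SECTIONS = ["similar_anchors", "available_headers", "available_block_anchors"] as const;

// Double-quote a value; JSON string syntax is reused here for simplicity.
const quote = (s: string): string => JSON.stringify(s);

// Render one failed link as YAML-style lines under a tree prefix.
function renderFailure(link: FailedLink, prefix = "│  "): string[] {
  const lines = [
    `error: ${quote(link.error)}`,
    "link:",
    `  display: ${quote(link.display)}`,
    `  target: ${quote(link.target)}`,
  ];
  const sectionLines: string[] = [];
  for (const key of SECTIONS) {
    const items = link.suggestion?.[key] ?? [];
    if (items.length === 0) continue; // skip empty categories
    sectionLines.push(`  ${key}:`);
    for (const item of items) sectionLines.push(`    - ${quote(item)}`);
  }
  if (sectionLines.length > 0) lines.push("suggestion:", ...sectionLines);
  return lines.map((l) => prefix + l);
}
```

The caller would join the returned lines with newlines beneath the existing "├─ Line N: ..." row.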

Expected Behavior

Error output should use a YAML-style structured format with correct Markdown link terminology:

├─ Line 155: [O-012](#TRACE:%20LLM%20creates%20evidence%20tags%20during%20artifact%20authoring)
│  error: "#Header not found"
│  link:
│    display: "O-012"
│    target: "#TRACE:%20LLM%20creates%20evidence%20tags%20during%20artifact%20authoring"
│  suggestion:
│    similar_anchors:
│      - "TRACE%20LLM%20creates%20...%20(opsx%20continue)"
│    available_headers:
│      - "#evidence-tag-numbering — Baseline"
│      - "#Artifacts (minimum set for this baseline)"
│      - "#Traces"
│    available_block_anchors:
│      - "^O-001"
│      - "^A-001"

For #^block-ref links, the error should say "^Block-anchor not found".
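
The choice between the two messages can be sketched as a small helper. detectAnchorType and notFoundMessage are hypothetical names; the sketch assumes the anchor string may arrive with or without its leading "#".

```typescript
type LinkAnchorType = "header" | "block";

// A "^" right after the optional "#" marks a block anchor; anything else is a header.
function detectAnchorType(anchor: string): LinkAnchorType {
  const bare = anchor.startsWith("#") ? anchor.slice(1) : anchor;
  return bare.startsWith("^") ? "block" : "header";
}

// Pick the error text that matches the link's own type.
function notFoundMessage(anchor: string): string {
  return detectAnchorType(anchor) === "block"
    ? "^Block-anchor not found"
    : "#Header not found";
}
```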

Related

  • Discovered while validating openspec/changes/evidence-tag-numbering/baseline.md
  • 9 of 12 citations failed with hard-to-read suggestions

Note

Key design decisions for implementation:

  1. Link type detection: Parse the link target to determine type — #^ prefix = block anchor, # prefix = header. The citation.target.anchor already has this info.
  2. Suggestion structure: Return structured object from validateAnchorExists() instead of flat string. Let formatForCLI() handle rendering.
  3. YAML rendering: Use indented key-value pairs in CLI output (not a YAML library — just formatted strings matching YAML style for readability).
  4. Backward compatibility: JSON output format (--format json) should also return structured suggestions. CLI consumers parsing the text output may need migration.
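
Decisions 2 and 4 could share one structured object that the text renderer and the JSON output both consume. This is a sketch under assumptions: buildSuggestion and StructuredSuggestion are hypothetical names, and the inputs stand in for whatever validateAnchorExists() already collects.

```typescript
// Hypothetical result shape; empty categories are omitted entirely.
interface StructuredSuggestion {
  similar_anchors?: string[];
  available_headers?: string[];
  available_block_anchors?: string[];
}

// Add the marker only when the value does not already carry it.
const withPrefix = (prefix: string, value: string): string =>
  value.startsWith(prefix) ? value : prefix + value;

function buildSuggestion(
  similar: string[],
  headerTexts: string[], // raw header text, e.g. "Traces"
  blockIds: string[],    // block IDs, with or without the leading "^"
): StructuredSuggestion {
  const suggestion: StructuredSuggestion = {};
  if (similar.length > 0) suggestion.similar_anchors = [...similar];
  if (headerTexts.length > 0) {
    suggestion.available_headers = headerTexts.map((t) => withPrefix("#", t));
  }
  if (blockIds.length > 0) {
    suggestion.available_block_anchors = blockIds.map((id) => withPrefix("^", id));
  }
  return suggestion;
}
```

For --format json, the same object could be serialized directly with JSON.stringify, so text and JSON consumers see the same categories.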

Acceptance Criteria

  • Error message distinguishes #Header not found from ^Block-anchor not found based on link type
  • Suggestion output uses YAML-style structured format with link.display, link.target, and categorized suggestions
  • similar_anchors, available_headers, and available_block_anchors are separate sections in output
  • Headers shown as #Header Name (not "rawText" → #id)
  • Block anchors shown as ^ID (not mixed into "Available anchors")
  • findSimilarAnchors() preserves type information or results are filtered by type
  • JSON output (--format json) returns structured suggestion object
  • Existing valid citation output is unchanged

Definition of Done

  • Failing tests written (RED phase)
  • Implementation complete (GREEN phase)
  • All tests pass
  • Build succeeds (npm run build -w tools/citation-manager)
  • Committed with a conventional commit message

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request
