Bug: ContentExtractor reports 'Heading not found' for valid citations #14

@WesleyMFrederick

Problem

The citation-manager extract links command reports links as failed with "Heading not found" even though the same links are validated as valid by both the internal validation phase and the validate command. This creates contradictory status messages that confuse users.

Reproduction Steps

  1. Run extract links on a file with heading-anchored citations:
citation-manager extract links tools/citation-manager/design-docs/features/20251119-type-contract-restoration/ROLLBACK-PLAN.md
  2. Observe the contradictory output for the link on line 329:
[Level 3: Components](../../ARCHITECTURE-Citation-Manager.md#Level%203%20Components)
{
  "sourceLink": {
    "text": "Level 3: Components",
    "validation": {
      "status": "valid"
    }
  },
  "status": "failed",
  "failureDetails": {
    "reason": "Heading not found: Level 3 Components"
  }
}
  3. Confirm that the validate command reports the link as valid:
citation-manager validate tools/citation-manager/design-docs/features/20251119-type-contract-restoration/ROLLBACK-PLAN.md --lines 329-329
# Output: VALID CITATIONS (1)
  4. Confirm that the heading exists in the target file:
grep -n "^## Level 3: Components" tools/citation-manager/design-docs/ARCHITECTURE-Citation-Manager.md
# Output: 79:## Level 3: Components

Root Cause

ContentExtractor uses different heading resolution logic than CitationValidator, causing inconsistent results. The exact difference has not been isolated yet; candidates include:

  • URL encoding handling (Level%203%20Components vs Level 3 Components)
  • Case sensitivity differences
  • Whitespace normalization differences
  • Different anchor matching algorithms
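
One way to rule out all four candidates at once would be for both components to compare anchors and headings through a single shared normalization step. The sketch below is illustrative only: `decodeAnchor`, `normalizeHeadingKey`, and `anchorMatchesHeading` are hypothetical names, not functions from the citation-manager codebase, and the normalization rules (lowercase, drop punctuation, treat dashes like spaces) are assumptions about what a tolerant match might look like.

```typescript
// Hypothetical helpers; names and rules are assumptions, not citation-manager code.

/** Decode percent-encoding in a link anchor, tolerating malformed sequences. */
function decodeAnchor(anchor: string): string {
  try {
    return decodeURIComponent(anchor);
  } catch (_err) {
    return anchor; // malformed escape such as "%zz": compare the raw text instead
  }
}

/** Reduce heading text to a comparable key (ASCII-only simplification). */
function normalizeHeadingKey(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, "") // drop punctuation such as ":"
    .replace(/[\s-]+/g, " ")      // treat runs of spaces and dashes alike
    .trim();
}

/** True when a (possibly URL-encoded) anchor refers to the given heading text. */
function anchorMatchesHeading(anchor: string, heading: string): boolean {
  return normalizeHeadingKey(decodeAnchor(anchor)) === normalizeHeadingKey(heading);
}

// The anchor from the reproduction steps against the heading found by grep.
console.log(anchorMatchesHeading("Level%203%20Components", "Level 3: Components"));
```

With these rules the anchor from line 329 and the heading `Level 3: Components` reduce to the same key. Whether CitationValidator applies the same rules is something the root-cause investigation would need to confirm.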

Expected Behavior

If validation.status is "valid", the extraction should either:

  1. Succeed in extracting content from the valid heading, OR
  2. Report a different, accurate error (not "heading not found" for a heading that exists)
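
The contradiction described here can be stated as a small check over the result object shown in the reproduction steps. This is a minimal sketch: the `ExtractionResult` interface only mirrors the fields visible in that JSON output, and `isContradictory` is a hypothetical helper, not part of citation-manager.

```typescript
// Shape inferred from the JSON in the reproduction steps; the real type may differ.
interface ExtractionResult {
  sourceLink: { text: string; validation: { status: string } };
  status: string;
  failureDetails?: { reason: string };
}

/** Flags results where validation passed but extraction reports a missing heading. */
function isContradictory(result: ExtractionResult): boolean {
  const validated = result.sourceLink.validation.status === "valid";
  const reason = result.failureDetails ? result.failureDetails.reason : "";
  const headingMissing =
    result.status === "failed" && reason.indexOf("Heading not found") === 0;
  return validated && headingMissing;
}

// The result reported for the link on line 329.
const reported: ExtractionResult = {
  sourceLink: { text: "Level 3: Components", validation: { status: "valid" } },
  status: "failed",
  failureDetails: { reason: "Heading not found: Level 3 Components" },
};

console.log(isContradictory(reported));
```

A check along these lines could back a regression test asserting that no extracted link is ever in this state.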

Acceptance Criteria

  • extract links returns status: "success" for citations that validate reports as valid
  • ContentExtractor correctly resolves URL-encoded anchors (e.g., Level%203%20Components → Level 3: Components)
  • Heading resolution handles colon characters in headings (e.g., ## Level 3: Components)
  • No contradictory status between validation.status and extraction status
  • Extraction succeeds for headings with special characters (spaces, colons, dashes)

Definition of Done

  • Failing test written: extract links with URL-encoded heading anchor returns "failed" (RED phase)
  • Root cause identified: diff between ContentExtractor and CitationValidator heading resolution
  • Fix implemented: align ContentExtractor heading resolution with CitationValidator logic (GREEN phase)
  • All existing citation-manager tests pass
  • Build succeeds: npm run build -w tools/citation-manager && npm link -w tools/citation-manager
  • Manual verification: reproduction command above returns status: "success" for the test link
  • Committed with conventional commit message referencing this issue
