bug: issue-sync creates duplicate issues due to silent gh create failure + insufficient --limit 50 #15234

@robstiles

Description

issue-sync-helper.sh push creates duplicate GitHub issues for the same TODO.md task. Observed in a managed private repo: 19 total duplicates across 8 task IDs, some tripled. Root cause verified from CI run logs.

Prior related issues: t1141 (#1714, search→list API) and t1142 (#1724, concurrency guard) addressed earlier duplicate vectors. The bugs below are distinct and survived those fixes.

Root Cause (verified from CI logs)

Bug 1 (Primary): gh issue create returns empty URL despite server-side creation

In _push_create_issue() (issue-sync-helper.sh line 325):

url=$(gh "${args[@]}" 2>/dev/null || echo "")
[[ -z "$url" ]] && { print_error "Failed to create issue for $task_id"; return 2; }

gh issue create creates the issue on GitHub but prints nothing to stdout. The 2>/dev/null discards any stderr diagnostics, and || echo "" masks the exit code, so the caller sees only an empty string. The code treats the empty URL as a complete failure, so no ref:GH# is written to TODO.md. The pull step later finds the issue and writes the ref, but if the commit/push step fails or the user overwrites TODO.md from a local copy, the ref is lost.

Evidence from CI push run logs (two separate runs):

  • Run A push step: "Processing 10 task(s) → 0 created, 0 skipped" (all 10 "failed")
  • But all 10 issues WERE created on GitHub (confirmed by creation timestamps matching exactly)
  • Commit step: "No TODO.md changes to commit" — no refs persisted
  • Run B push step: "Processing 29 task(s) → 0 created, 14 skipped" (15 "failed")
  • But those 15 issues WERE created on GitHub

Likely cause: --label "$all_labels" includes labels like status:available,origin:worker. If label application fails after issue creation, gh may exit non-zero without printing the URL. Needs stderr capture to confirm.
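To confirm or rule out the label hypothesis, stdout and stderr need to be captured separately along with the exit code. A minimal sketch, assuming a hypothetical helper named capture_gh and hypothetical result variables GH_OUT, GH_ERR and GH_EXIT (none of these exist in issue-sync-helper.sh):

```shell
# Sketch only: capture_gh, GH_OUT, GH_ERR and GH_EXIT are hypothetical names.
# Runs the given command, keeping stdout, stderr and the exit code separate.
capture_gh() {
    local err_file
    err_file=$(mktemp) || return 1
    GH_EXIT=0
    # stdout is captured in GH_OUT; stderr goes to a temp file so it is not lost.
    # The `||` form records the exit code without tripping `set -e`.
    GH_OUT=$("$@" 2>"$err_file") || GH_EXIT=$?
    GH_ERR=$(cat "$err_file")
    rm -f "$err_file"
    return 0
}
```

In _push_create_issue() this could be called as capture_gh gh "${args[@]}", after which GH_EXIT and GH_ERR can be logged whenever GH_OUT is empty.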

Bug 2 (Amplifier): --limit 50 in gh_find_issue_by_title is insufficient

# limit defaults to 50
gh issue list --repo "$repo" --state "$state" --limit "$limit" \
    --json number,title --jq "..."

In a repo with 500+ issues, --limit 50 covers only the 50 most recent issues. Anything older is invisible to the dedup check.

Evidence: --limit 50 in the affected repo returns only issues numbered #482-548. The first duplicate batch (#329-339, created 5 days earlier) is completely outside this window. When the next CI run checked for existing issues, the originals were invisible → duplicates created.

Both bugs compound: Bug 1 prevents ref write-back. Bug 2 prevents the fallback dedup check from catching the already-created issue. Together they guarantee duplicates on every subsequent push run.
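The window effect can be reproduced offline. The sketch below uses synthetic data: list_issues_newest_first stands in for gh issue list, the task ID t042 and issue #330 are made up, and find_issue_in_window is a hypothetical name that only mimics a title-prefix lookup limited to the newest entries.

```shell
# Sketch only: synthetic stand-in for `gh issue list`, newest issue first.
# Emits 548 lines of "<number> <title>"; exactly one title starts with "t042:".
list_issues_newest_first() {
    local n=548
    while [ "$n" -ge 1 ]; do
        if [ "$n" -eq 330 ]; then
            echo "$n t042: example task"
        else
            echo "$n unrelated issue $n"
        fi
        n=$((n - 1))
    done
}

# Hypothetical lookup: prints the number of the first issue whose title starts
# with the prefix, considering only the newest $limit entries.
find_issue_in_window() {
    local prefix="$1" limit="$2"
    list_issues_newest_first |
        awk -v p="$prefix" -v lim="$limit" \
            'NR <= lim && !found && index($2, p) == 1 { print $1; found = 1 }'
}
```

With this data, a limit of 50 sees only #499 to #548 and prints nothing for the prefix t042:, while a limit of 500 reaches #330. An empty result is exactly what lets the push step create a duplicate.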

Observed Pattern

  • 8 task IDs affected, producing 19 total duplicate issues
  • Duplicates created in 3 distinct CI push batches over 5 days
  • Triple duplicates for 3 tasks, double duplicates for 5 tasks
  • All created by github-actions (CI push workflow), zero from local/pulse

Reproduction Steps

  1. Set up a repo with >50 open+closed issues and issue-sync.yml
  2. Add tasks to TODO.md without ref:GH# and push to main
  3. If gh issue create fails to return a URL (intermittent), refs are not written
  4. Push another TODO.md change → tasks without refs are re-processed
  5. gh_find_issue_by_title --limit 50 misses the old issues → duplicates created

Recommended Fixes

Fix 1: Resilient gh issue create error handling (critical)

# In _push_create_issue(), replace lines 325-329:
local url gh_exit=0
# Keep gh's output (stdout + stderr) and exit code; the `||` form is safe under `set -e`
url=$(gh "${args[@]}" 2>&1) || gh_exit=$?
if [[ $gh_exit -ne 0 || -z "$url" || ! "$url" =~ ^https:// ]]; then
    # The issue may exist server-side despite the error, so look it up before failing
    sleep 1  # Brief delay for API consistency
    local recovery
    recovery=$(gh_find_issue_by_title "$repo" "${task_id}:" "all" 500)
    if [[ -n "$recovery" && "$recovery" != "null" ]]; then
        print_warning "gh create exited $gh_exit but issue found: #$recovery"
        _PUSH_CREATED_NUM="$recovery"
        return 0
    fi
    # Include the first 200 characters of gh's output for diagnosis
    print_error "Failed to create issue for $task_id (exit $gh_exit): ${url:0:200}"
    return 2
fi

Fix 2: Increase --limit from 50 to 500

# In gh_find_issue_by_title(), change default:
local repo="$1" prefix="$2" state="${3:-all}" limit="${4:-500}"

Also update all call sites that pass an explicit limit. A limit of 500 covers only the 500 most recent issues in the requested state, so it is a stopgap rather than a complete fix; for larger repos, consider paginated search.
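One way to do a paginated lookup is through the REST issues endpoint with gh api --paginate. A minimal sketch, assuming a hypothetical function name find_issue_by_prefix_paginated and the same title-prefix convention used by gh_find_issue_by_title. The endpoint also returns pull requests, so they are filtered out by their pull_request key.

```shell
# Sketch only: find_issue_by_prefix_paginated is a hypothetical name.
# Prints the number of the first issue whose title starts with the prefix,
# walking every page of results instead of a fixed --limit window.
find_issue_by_prefix_paginated() {
    local repo="$1" prefix="$2" state="${3:-all}"
    gh api --paginate "repos/${repo}/issues?state=${state}&per_page=100" \
        --jq '.[] | select(.pull_request == null) | "\(.number)\t\(.title)"' |
        awk -F '\t' -v p="$prefix" \
            '!found && index($2, p) == 1 { print $1; found = 1 }'
}
```

The endpoint sorts by creation time, newest first, by default, so the first match is the most recently created issue. The cost is one API request per 100 issues and pull requests, which matters for very large repos.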

Fix 3: Use cancel-in-progress: false for sync-on-push (latent risk)

The original t1142 recommended cancel-in-progress: false, but the implementation used true. While zero cancelled runs were observed, this is a latent risk — a cancelled run after issue creation but before ref write-back produces the same orphaned-issue pattern as Bug 1.

Fix 4: cmd_pull misleading success messages

cmd_pull calls print_success "Added ref:GH#..." even when add_gh_ref_to_todo silently skips the write (because a ref already exists). This made debugging harder. Check the return value, or whether the file was actually modified, before printing success.
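A minimal sketch of the file-modification check, assuming a hypothetical wrapper add_ref_and_report and assuming add_gh_ref_to_todo takes (task_id, issue_number, todo_file). The real signature is not shown in this issue, so the argument order is a guess. print_success and add_gh_ref_to_todo are expected to come from the helper script.

```shell
# Sketch only: add_ref_and_report is a hypothetical wrapper, and the argument
# order passed to add_gh_ref_to_todo is an assumption.
# Prints the success message only if the call succeeded AND the file changed.
add_ref_and_report() {
    local task_id="$1" issue_num="$2" todo_file="$3"
    local before after rc=0
    before=$(cksum < "$todo_file")
    add_gh_ref_to_todo "$task_id" "$issue_num" "$todo_file" || rc=$?
    after=$(cksum < "$todo_file")
    if [ "$rc" -eq 0 ] && [ "$before" != "$after" ]; then
        print_success "Added ref:GH#${issue_num} to ${task_id}"
    else
        echo "No change for ${task_id} (exit ${rc}); ref not added"
    fi
    return "$rc"
}
```

Comparing checksums avoids relying on the return code of add_gh_ref_to_todo, which the issue describes as skipping silently.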

Factors Ruled Out

  • cancel-in-progress killing runs: 0 cancelled runs in 500-run history. Not the trigger, but still a latent risk.
  • Pulse/local push: No pulse scheduled in affected repo. All dupes from CI. CI-only guard at line 425 works correctly.
  • GitHub search API lag: gh_find_issue_by_title already uses gh issue list (real-time API, fixed by t1141), not search. Not a factor.
  • Race between concurrent runs: Concurrency group exists (t1142). Not the trigger.
  • User error: Normal operation — push TODO.md to main, CI syncs. Working as designed; bugs are in framework code.

Environment

  • aidevops version: 3.5.565
  • OS: Ubuntu 24.04.4 LTS
  • Shell: bash 5.2.21(1)-release
  • gh CLI: gh version 2.89.0
  • Repo: a managed private repo (548 issues at time of analysis)

Cleanup Already Done

12 duplicate issues closed and 11 tasks annotated with ref:GH# in the affected repo.
