
t1011: Model contest mode — run top-3 models in parallel, cross-rank results#1304

Closed
marcusquinn wants to merge 4 commits into main from feature/t1011

Conversation

@marcusquinn
Owner

@marcusquinn marcusquinn commented Feb 12, 2026

Summary

Model contest mode for the supervisor (t1011). When model selection is uncertain, the supervisor dispatches the same task to the top-3 models in parallel, then cross-ranks all outputs to pick a winner. Each contest also builds permanent routing data for future model selection.

Ref #1301

Changes

New: contest-helper.sh (standalone orchestrator)

  • create — creates a contest with entries for top-3 models
  • dispatch — launches parallel workers (one per model) via supervisor
  • evaluate — cross-ranks outputs (each model scores all, anonymised as A/B/C)
  • apply — promotes the winner's PR, cancels the losers
  • should-contest — detects when contest mode should trigger
  • pulse-check — for supervisor pulse integration
  • Records results in both pattern-tracker and response-scoring DBs

Modified: supervisor-helper.sh

  • DB migration: Contest tables (contests, contest_entries) created in ensure_db()
  • Model routing: resolve_task_model() detects model:contest (priority 0) and auto-contest via SUPERVISOR_CONTEST_AUTO=true (step 4.5)
  • Dispatch: cmd_dispatch() intercepts CONTEST model resolution, delegates to contest-helper
  • Pulse Phase 2.5: Checks running contests for completion, triggers evaluation + apply
  • Main router: Added contest command that delegates to contest-helper.sh
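The DB migration above adds two tables. A minimal sketch of what that schema could look like — the column set beyond id/contest_id/model/task_id/status is an assumption here, and the authoritative DDL lives in ensure_db() in supervisor-helper.sh:

```shell
# Illustrative DDL for the contest tables (demo database; columns beyond
# those named in this PR are assumptions, not the real migration).
demo_db=$(mktemp "${TMPDIR:-/tmp}/contest-demo-XXXXXX.db")
sqlite3 -batch "$demo_db" <<'SQL'
CREATE TABLE IF NOT EXISTS contests (
  id         TEXT PRIMARY KEY,
  task_id    TEXT NOT NULL,
  status     TEXT NOT NULL DEFAULT 'running',
  winner     TEXT,
  created_at TEXT DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS contest_entries (
  id         TEXT PRIMARY KEY,
  contest_id TEXT NOT NULL REFERENCES contests(id),
  model      TEXT NOT NULL,
  task_id    TEXT NOT NULL,
  status     TEXT NOT NULL DEFAULT 'pending',
  UNIQUE(contest_id, model)  -- duplicate-entry prevention, per the tests
);
SQL
sqlite3 -batch "$demo_db" ".tables"
```

The UNIQUE(contest_id, model) constraint is one way to get the duplicate prevention the test suite covers; the real script may enforce it in SQL or in shell.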

New: tests/test-contest-helper.sh

  • 20 tests covering: syntax, ShellCheck, help, table creation, contest CRUD, duplicate prevention, should-contest logic, error handling
  • All 20 pass

Updated: subagent-index.toon

  • Added contest-helper.sh entry

Flow

  1. Supervisor detects uncertainty (no pattern data, new task type, or explicit model:contest)
  2. Dispatches 3 workers with same prompt to different models (e.g. opus, sonnet, gemini)
  3. Collects outputs when all workers complete
  4. Sends all 3 outputs to each model for blind cross-ranking (outputs anonymised as A/B/C)
  5. Aggregates scores, picks winner
  6. Records results in pattern-tracker and response-scoring DB
  7. Applies the winning output (promotes the winner's PR, cancels the losers)

Trigger conditions

  • Explicit: model:contest in TODO.md task entry
  • Auto (opt-in via SUPERVISOR_CONTEST_AUTO=true):
    • No pattern data for the task type
    • Insufficient samples (<3)
    • Low success rate (<75%)
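The trigger conditions above can be sketched as a single decision function. should_contest here is an illustrative name with illustrative arguments — the real detection is the should-contest command in contest-helper.sh:

```shell
# Hedged sketch of the contest trigger logic (names and argument shape
# are assumptions; thresholds match the PR: <3 samples, <75% success).
should_contest() {
  local task_model="$1" total_samples="$2" success_rate="$3"
  if [[ "$task_model" == "contest" ]]; then        # explicit model:contest tag
    echo "explicit"; return 0
  fi
  if [[ "${SUPERVISOR_CONTEST_AUTO:-false}" != "true" ]]; then
    echo "disabled"; return 1                      # auto mode is opt-in
  fi
  if [[ "${total_samples:-0}" -lt 3 ]]; then       # insufficient samples
    echo "insufficient_data"; return 0
  fi
  if [[ "${success_rate:-0}" -lt 75 ]]; then       # low success rate
    echo "low_success_rate"; return 0
  fi
  echo "no_contest"; return 1
}

SUPERVISOR_CONTEST_AUTO=true
should_contest "opus" 2 90      # → insufficient_data
should_contest "opus" 10 60     # → low_success_rate
should_contest "contest" 10 90  # → explicit
```

Note the ${var:-0} defaults: they keep the integer comparisons safe when pattern data is missing entirely.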

Cost

~3× the cost of a single run, but each contest builds permanent routing data. Contests only trigger for genuinely uncertain cases.

Testing

All 20 tests pass: bash tests/test-contest-helper.sh --verbose

Summary by CodeRabbit

Release Notes

  • New Features

    • Added contest mode: submit uncertain tasks to top 3 models simultaneously and automatically select the best result based on cross-ranking evaluation.
    • Introduced contest command with controls for creating contests, dispatching to models, evaluating results, and managing outcomes.
  • Tests

    • Added comprehensive test coverage for contest mode functionality.

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@coderabbitai
Contributor

coderabbitai bot commented Feb 12, 2026

Walkthrough

Introduces a model contest mode system that dispatches uncertain tasks to top-3 models in parallel, collects anonymized outputs, cross-ranks results through weighted scoring criteria, and promotes the winner's output to the original task. Includes SQLite-backed contest lifecycle management, new supervisor command delegation, and comprehensive test coverage.

Changes

Cohort / File(s) Summary
Contest Helper & Tests
.agents/scripts/contest-helper.sh, tests/test-contest-helper.sh
New contest orchestration system with SQLite persistence for contests/contest_entries tables, model selection via registry, parallel task dispatch, anonymized output collection, multi-judge cross-ranking with weighted scoring, winner promotion, and full test harness covering lifecycle, DB schema, duplicate prevention, and CLI commands.
Supervisor Integration
.agents/scripts/supervisor-helper.sh
Adds contest mode detection in model resolution (task_model=="contest" → "CONTEST"), delegates dispatch to contest-helper.sh, integrates new public contest CLI command via cmd_contest(), extends pulse-check with Phase 2.5 contest evaluation tracking, and manages DB schema migrations for contests and contest_entries tables.
Registry Entry
.agents/subagent-index.toon
Registers contest-helper.sh in model registry with description of contest mode capabilities (create, dispatch, status, evaluate, apply, list, should-contest, pulse-check).

Sequence Diagram

sequenceDiagram
    participant Client
    participant Supervisor
    participant ContestHelper
    participant ModelRegistry
    participant Database as SQLite<br/>DB
    participant Models
    participant Judges
    
    Client->>Supervisor: dispatch task with model="contest"
    Supervisor->>Supervisor: resolve_task_model() → "CONTEST"
    Supervisor->>ContestHelper: create contest
    ContestHelper->>Database: INSERT contests, contest_entries
    ContestHelper->>ModelRegistry: select_top_models(3)
    ModelRegistry-->>ContestHelper: model1, model2, model3
    Supervisor->>ContestHelper: dispatch contest
    ContestHelper->>Models: dispatch subtask to each model (parallel)
    Models->>Models: process task
    Models->>Database: store results
    Supervisor->>ContestHelper: evaluate contest (when running)
    ContestHelper->>Database: fetch anonymized outputs (A/B/C)
    ContestHelper->>Judges: request cross-ranking scores
    Judges->>Judges: evaluate alternatives
    Judges-->>ContestHelper: scores per entry
    ContestHelper->>Database: aggregate weighted scores, determine winner
    ContestHelper->>Supervisor: apply winner (promote output)
    Supervisor->>Database: update original task with winner output
    Supervisor->>Models: cancel loser subtasks
    Database-->>Supervisor: contest marked complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

  • t1011: Model contest mode #1301 — Directly implements the t1011 "model contest mode" feature with full lifecycle management, model dispatch, cross-ranking, and scoring integration as specified in the issue.

Poem

🏆 Three models enter, one shall rise,
Anonymized to judge with clear eyes,
Contests crowned by weighted might,
Winners flow where tasks took flight,
Contest mode brings clarity to sight! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 76.47% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Merge Conflict Detection ⚠️ Warning ❌ Merge conflicts detected (6 files):

⚔️ .agents/scripts/issue-sync-helper.sh (content)
⚔️ .agents/scripts/supervisor-helper.sh (content)
⚔️ .agents/scripts/supervisor/issue-sync.sh (content)
⚔️ .agents/subagent-index.toon (content)
⚔️ TODO.md (content)
⚔️ todo/VERIFY.md (content)

These conflicts must be resolved before merging into main.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely summarizes the primary change: introducing model contest mode that dispatches tasks to top-3 models in parallel and cross-ranks results, which aligns with the core feature added across contest-helper.sh and supervisor-helper.sh.


@github-actions
Contributor

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 0 code smells

[INFO] Recent monitoring activity:
Thu Feb 12 21:53:42 UTC 2026: Code review monitoring started
Thu Feb 12 21:53:42 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 0

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 0
  • VULNERABILITIES: 0

Generated on: Thu Feb 12 21:53:45 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@github-actions
Contributor

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 0 code smells

[INFO] Recent monitoring activity:
Thu Feb 12 22:18:15 UTC 2026: Code review monitoring started
Thu Feb 12 22:18:15 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 0

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 0
  • VULNERABILITIES: 0

Generated on: Thu Feb 12 22:18:17 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@marcusquinn marcusquinn marked this pull request as ready for review February 12, 2026 22:19

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.agents/scripts/supervisor-helper.sh (1)

13328-13347: ⚠️ Potential issue | 🟠 Major

Avoid embedding GH_TOKEN in crontab (plaintext secret exposure).
Line 13333–13345 injects GH_TOKEN directly into the cron entry, which is readable via crontab -l and often visible to system tooling. The script already resolves GH_TOKEN at runtime (cache/keyring/credentials), so you can keep the cron entry free of secrets and rely on the existing token resolution logic.

🔒 Proposed fix (keep PATH, drop GH_TOKEN from cron entry)
-	# Detect GH_TOKEN from gh CLI if available (t1006)
-	local gh_token=""
-	if command -v gh &>/dev/null; then
-		gh_token=$(gh auth token 2>/dev/null || true)
-	fi
-
 	# Build cron command with environment variables
 	local env_vars=""
 	if [[ -n "$user_path" ]]; then
 		env_vars="PATH=${user_path}"
 	fi
-	if [[ -n "$gh_token" ]]; then
-		env_vars="${env_vars:+${env_vars} }GH_TOKEN=${gh_token}"
-	fi
🤖 Fix all issues with AI agents
In @.agents/scripts/contest-helper.sh:
- Around line 57-61: The db() helper currently hides all sqlite3 stderr by
appending "2>/dev/null" which hides real errors; change db() to stop
unconditionally discarding stderr — remove the global "2>/dev/null" and instead
add an optional suppression parameter or respect an environment flag (e.g.,
DB_SILENT) so callers can opt into silencing; ensure db() invokes sqlite3 with
the same arguments but leaves stderr unredirected by default, and update call
sites that intentionally probe (e.g., table-existence checks) to pass the
suppression flag or explicitly redirect stderr to /dev/null there or to a log
file.
- Around line 964-968: The loop clamping scores currently uses eval for indirect
assignment (loop over int_correct int_complete int_quality int_clarity), replace
eval with bash's printf -v to set the variable by name without eval: after
computing local val="${!var}", use printf -v "$var" '%s' "$val" (or the clamped
numeric value) so the four variables are safely updated without eval; update the
block that iterates over int_correct/int_complete/int_quality/int_clarity to use
printf -v for assignment.
- Around line 186-202: The integer-comparison fails when sed yields empty
strings for total_samples or success_rate; after extracting from pattern_json
(variables total_samples and success_rate), ensure they default to 0 before the
[[ comparisons — e.g. immediately after the sed assignments set
total_samples=${total_samples:-0} and success_rate=${success_rate:-0} (or use
parameter expansion when comparing) so [[ "$total_samples" -lt 3 ]] and [[
"$success_rate" -lt 75 ]] always receive numeric values.
- Around line 341-361: The loop that splits $models uses "local IFS=','" and
iterates an unquoted $models, then calls unset IFS — replace that fragile IFS
manipulation by reading $models into an array with read -ra (e.g. read -ra
model_arr <<< "$models") and iterate over "${model_arr[@]}"; keep the existing
logic that increments model_index and constructs entry_id/entry_task_id and the
db/sql_escape/log_info calls (references: models, model_index, entry_id,
entry_task_id, db, sql_escape, log_info), and remove the local IFS/unset IFS
handling.
- Around line 1139-1181: The cmd_pulse_check logic currently only selects
contests that already have zero non-terminal entries, so _sync_entry_statuses
never runs for contests with stale dispatched/running subtasks; change the flow
to first enumerate running contests, call _sync_entry_statuses for each
contest_id, then re-query that contest's entries to see if pending count is zero
and proceed to cmd_evaluate/cmd_apply; specifically update cmd_pulse_check to
fetch running contest ids (no subquery filtering), call _sync_entry_statuses
"$contest_id" immediately for each, then run the existing pending-count query
and evaluation steps for that same contest_id.
- Around line 724-731: The script currently passes the large variable
ranking_prompt directly to opencode via --prompt which can hit ARG_MAX and the
trailing "|| true" hides E2BIG failures; modify the block that invokes opencode
(the timeout/opencode run call) to write ranking_prompt to a temp file (e.g.,
use the existing score_tmpfile pattern or a new prompt_tmpfile) and pass that
file to opencode using the CLI's file-based input option (or feed via stdin)
instead of --prompt "$ranking_prompt"; also remove the unconditional "|| true"
and handle non-zero exit by logging the error and preserving any opencode stderr
for debugging so you don't silently drop E2BIG errors.
- Around line 643-651: The script currently hardcodes "main" when building the
diffs (variables summary and full_diff using git -C "$ewt" diff "main..HEAD")
and swallows git errors, which hides repos whose default is "master" or another
branch; change it to detect the repo's default branch first (e.g., run git -C
"$ewt" to get origin/HEAD via symbolic-ref or rev-parse and strip the "origin/"
prefix) into a variable like base_branch, fall back to "main" only if detection
fails, then use "$base_branch..HEAD" for both git diff --stat and git diff, and
stop redirecting stderr to /dev/null so failures surface (or at least preserve
error output for logging) instead of silently returning "No diff available";
update references to ewt, summary, and full_diff accordingly.

In @.agents/scripts/supervisor-helper.sh:
- Around line 5859-5887: Phase 1 is prematurely evaluating contest tasks because
they remain status 'running' and lack PID files; fix by either (A) changing the
Phase 1 selection query to exclude tasks where error LIKE 'contest:%' (i.e. add
AND error NOT LIKE 'contest:%' to the tasks query used by Phase 1), or (B) when
delegating to contest-helper.sh in the contest branch (the block that calls
contest-helper.sh, sets contest_id and calls db "...UPDATE tasks SET error =
'contest:${contest_id}'..." and then returns), update the task row to a distinct
status such as 'contest_running' instead of leaving it 'running' (modify the db
UPDATE in that block to set status='contest_running' and
error='contest:${contest_id}'), and ensure Phase 1 excludes 'contest_running'
from its status IN (...) checks; implement one of these approaches so Phase 1
skips in-flight contest tasks.
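The pulse-check reordering suggested for contest-helper.sh above (enumerate running contests, sync entry statuses first, only then test completion) can be sketched with stand-in stubs — db-backed state and _sync_entry_statuses are simulated here, and the real functions live in the script:

```shell
# Self-contained sketch of the reordered pulse-check; all state is stubbed.
declare -A entry_status=( ["c1-e1"]="running" ["c1-e2"]="completed" )

_sync_entry_statuses() {            # stub: pretend the stale entry finished
  entry_status["c1-e1"]="completed"
}

pending_count() {                   # stub for the SQL "non-terminal entries" query
  local n=0 s
  for s in "${entry_status[@]}"; do
    if [[ "$s" != "completed" && "$s" != "failed" ]]; then n=$((n + 1)); fi
  done
  echo "$n"
}

for contest_id in c1; do            # enumerate ALL running contests (no pre-filter)
  _sync_entry_statuses "$contest_id"
  if [[ "$(pending_count "$contest_id")" -eq 0 ]]; then
    echo "ready: evaluate + apply $contest_id"
  fi
done
```

The key ordering point: the sync happens unconditionally per contest, so a contest with stale dispatched/running subtasks still converges instead of being filtered out by the completion subquery.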
🧹 Nitpick comments (2)
.agents/scripts/contest-helper.sh (2)

1036-1042: Storing contest-winner metadata in the error column is semantically misleading.

Downstream tooling or dashboards that treat non-null error as a failure indicator will misreport winning tasks as errored. Consider using the metadata or description field on the task instead, or adding a dedicated column.


733-743: Fragile regex-based JSON parsing — consider jq with a fallback.

The grep -oE '\{[^}]*"label"[^}]*\}' breaks on multi-line or nested JSON output from the judge models. Since this is the core scoring path, a mis-parse silently produces zero scores and a meaningless winner.

jq is widely available in DevOps environments and would handle edge cases robustly:

Proposed approach
+		if command -v jq &>/dev/null; then
+			json_scores=$(echo "$score_output" | jq -c 'select(.label)' 2>/dev/null || true)
+		else
 			json_scores=$(echo "$score_output" | grep -oE '\{[^}]*"label"[^}]*\}' || true)
+		fi
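A standalone demo of that jq-first, grep-fallback parse, run against a mocked judge response (the field names here are illustrative, not the script's actual schema):

```shell
# Demo: prefer jq for extracting label objects, fall back to the regex.
score_output='{"label":"A","score":5}
{"label":"B","score":3}'
if command -v jq >/dev/null 2>&1; then
  json_scores=$(printf '%s\n' "$score_output" | jq -c 'select(.label != null)')
else
  json_scores=$(printf '%s\n' "$score_output" | grep -oE '\{[^}]*"label"[^}]*\}' || true)
fi
printf '%s\n' "$json_scores"
```

Either branch yields one compact JSON object per line, but only the jq branch survives pretty-printed or nested judge output.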

Comment on lines +57 to +61
db() {
local db_path="$1"
shift
sqlite3 -batch "$db_path" "$@" 2>/dev/null
}

⚠️ Potential issue | 🟠 Major

db() unconditionally suppresses all SQLite errors — hampers debugging and masks real failures.

Redirecting stderr to /dev/null silences corruption, lock contention, and schema-mismatch errors. For a system that manages contest lifecycle state, silent DB failures can leave contests in inconsistent states with zero diagnostic trail.

Consider logging stderr to a file or at least allowing callers to opt in to suppression rather than baking it in globally.

Proposed fix
 db() {
 	local db_path="$1"
 	shift
-	sqlite3 -batch "$db_path" "$@" 2>/dev/null
+	sqlite3 -batch "$db_path" "$@"
 }

Then suppress selectively at call sites where it's intentional (e.g., probing for table existence).


Comment on lines +186 to +202
# Check if any tier has strong enough signal (>75% success, 3+ samples)
local total_samples
total_samples=$(echo "$pattern_json" | sed -n 's/.*"total_samples"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p' 2>/dev/null || echo "0")
local success_rate
success_rate=$(echo "$pattern_json" | sed -n 's/.*"success_rate"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p' 2>/dev/null || echo "0")

if [[ "$total_samples" -lt 3 ]]; then
log_info "Insufficient pattern data ($total_samples samples) for $task_id — contest mode triggered"
echo "insufficient_data"
return 0
fi

if [[ "$success_rate" -lt 75 ]]; then
log_info "Low success rate (${success_rate}%) for $task_id — contest mode triggered"
echo "low_success_rate"
return 0
fi

⚠️ Potential issue | 🔴 Critical

Empty sed output causes integer comparison failure under set -e.

If pattern_json doesn't contain "total_samples" or "success_rate", sed -n exits 0 with empty output — the || echo "0" fallback never fires. Then [[ "" -lt 3 ]] produces integer expression expected and crashes the function under set -e.

Proposed fix — apply default at use site
-	if [[ "$total_samples" -lt 3 ]]; then
+	if [[ "${total_samples:-0}" -lt 3 ]]; then
 		log_info "Insufficient pattern data ($total_samples samples) for $task_id — contest mode triggered"
 		echo "insufficient_data"
 		return 0
 	fi
 
-	if [[ "$success_rate" -lt 75 ]]; then
+	if [[ "${success_rate:-0}" -lt 75 ]]; then
 		log_info "Low success rate (${success_rate}%) for $task_id — contest mode triggered"
 		echo "low_success_rate"
 		return 0
 	fi

Comment on lines +341 to +361
local model_index=0
local IFS=','
for model in $models; do
model_index=$((model_index + 1))
local entry_id="${contest_id}-entry-${model_index}"
local entry_task_id="${task_id}-contest-${model_index}"

db "$SUPERVISOR_DB" "
INSERT INTO contest_entries (id, contest_id, model, task_id, status)
VALUES (
'$(sql_escape "$entry_id")',
'$(sql_escape "$contest_id")',
'$(sql_escape "$model")',
'$(sql_escape "$entry_task_id")',
'pending'
);
"

log_info "Created entry $entry_id for model $model (task: $entry_task_id)"
done
unset IFS

🛠️ Refactor suggestion | 🟠 Major

IFS manipulation is fragile and flagged by static analysis (Codacy).

Setting local IFS=',' then iterating an unquoted $models works, but unset IFS after the loop removes the local, which is subtly different from restoring the default. Static analysis rightfully flags this pattern.

Use read -ra into an array to avoid IFS gymnastics entirely:

Proposed fix
-	local model_index=0
-	local IFS=','
-	for model in $models; do
+	local model_index=0
+	local -a model_array
+	IFS=',' read -ra model_array <<< "$models"
+	for model in "${model_array[@]}"; do
 		model_index=$((model_index + 1))
 		local entry_id="${contest_id}-entry-${model_index}"
 		local entry_task_id="${task_id}-contest-${model_index}"
@@ ...
 		log_info "Created entry $entry_id for model $model (task: $entry_task_id)"
 	done
-	unset IFS
🧰 Tools
🪛 GitHub Check: Codacy Static Code Analysis

[warning] 342-342: .agents/scripts/contest-helper.sh#L342
The special variable IFS affects how splitting takes place when expanding unquoted variables.


Comment on lines +643 to +651
if [[ -n "$ewt" && -d "$ewt" ]]; then
# Get the diff as the "output"
summary=$(git -C "$ewt" diff --stat "main..HEAD" 2>/dev/null || echo "No diff available")
local full_diff
full_diff=$(git -C "$ewt" diff "main..HEAD" 2>/dev/null | head -500 || echo "")
summary="${summary}

--- Code Changes ---
${full_diff}"

⚠️ Potential issue | 🟡 Minor

Hardcoded main as diff base — silently produces empty output for repos using master or other default branches.

The 2>/dev/null || echo "No diff available" fallback masks the real problem. The contest evaluation then scores entries with no meaningful data, producing arbitrary results.

Proposed fix — detect default branch
+		local base_branch
+		base_branch=$(git -C "$ewt" symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||' || echo "main")
 		if [[ -n "$ewt" && -d "$ewt" ]]; then
-			summary=$(git -C "$ewt" diff --stat "main..HEAD" 2>/dev/null || echo "No diff available")
+			summary=$(git -C "$ewt" diff --stat "${base_branch}..HEAD" 2>/dev/null || echo "No diff available")
 			local full_diff
-			full_diff=$(git -C "$ewt" diff "main..HEAD" 2>/dev/null | head -500 || echo "")
+			full_diff=$(git -C "$ewt" diff "${base_branch}..HEAD" 2>/dev/null | head -500 || echo "")

Comment on lines +724 to +731
# Use opencode for scoring if available, otherwise fall back to direct API
if command -v opencode &>/dev/null; then
timeout 120 opencode run --format json \
--model "$judge_model" \
--prompt "$ranking_prompt" \
>"$score_tmpfile" 2>/dev/null || true
score_output=$(cat "$score_tmpfile" 2>/dev/null || echo "")
fi

⚠️ Potential issue | 🟠 Major

Large prompt passed as CLI argument risks hitting ARG_MAX — silent failure under || true.

ranking_prompt contains full diffs (up to 500 lines per entry × 3 entries). Passing this as --prompt "$ranking_prompt" can exceed ARG_MAX or per-argument limits on some systems. The || true suppresses the resulting E2BIG error, and the contest silently produces zero scores with only a vague "no parseable scores" warning.

Feed the prompt via a temp file or stdin instead:

Proposed fix
+		local prompt_file
+		prompt_file=$(mktemp "${TMPDIR:-/tmp}/contest-prompt-XXXXXX")
+		printf '%s' "$ranking_prompt" > "$prompt_file"
+
 		if command -v opencode &>/dev/null; then
 			timeout 120 opencode run --format json \
 				--model "$judge_model" \
-				--prompt "$ranking_prompt" \
+				--prompt-file "$prompt_file" \
 				>"$score_tmpfile" 2>/dev/null || true
 			score_output=$(cat "$score_tmpfile" 2>/dev/null || echo "")
 		fi
+
+		rm -f "$prompt_file"

(Adjust --prompt-file to whatever flag opencode supports for file-based input.)

🤖 Prompt for AI Agents
In @.agents/scripts/contest-helper.sh around lines 724 - 731, The script
currently passes the large variable ranking_prompt directly to opencode via
--prompt which can hit ARG_MAX and the trailing "|| true" hides E2BIG failures;
modify the block that invokes opencode (the timeout/opencode run call) to write
ranking_prompt to a temp file (e.g., use the existing score_tmpfile pattern or a
new prompt_tmpfile) and pass that file to opencode using the CLI's file-based
input option (or feed via stdin) instead of --prompt "$ranking_prompt"; also
remove the unconditional "|| true" and handle non-zero exit by logging the error
and preserving any opencode stderr for debugging so you don't silently drop
E2BIG errors.
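Independent of which opencode flag ends up carrying the prompt, a cheap defensive check is to compare the payload size against `ARG_MAX` before placing it on argv. The helper name and the halving threshold here are illustrative; `ARG_MAX` covers argv plus the environment, so generous headroom is deliberate.

```shell
#!/usr/bin/env bash
# Sketch: refuse to pass a payload as a single argv entry when it nears
# ARG_MAX; route it via a temp file or stdin instead.
fits_in_argv() {
	local payload="$1" limit
	limit=$(getconf ARG_MAX 2>/dev/null || echo 131072)
	(( ${#payload} < limit / 2 ))
}

if fits_in_argv "short prompt"; then
	echo "fits" # prints "fits"
else
	echo "route via temp file or stdin"
fi
```

A guard like this turns a silent zero-score contest into an explicit, loggable decision.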

Comment on lines +964 to +968
for var in int_correct int_complete int_quality int_clarity; do
local val="${!var}"
[[ "$val" -lt 1 ]] && eval "$var=1"
[[ "$val" -gt 5 ]] && eval "$var=5"
done

🛠️ Refactor suggestion | 🟠 Major

Replace eval with printf -v for indirect variable assignment.

While the loop variable names are hardcoded and safe here, eval is an anti-pattern that erodes the "zero technical debt" posture. Bash's printf -v achieves the same thing without eval:

Proposed fix
 		for var in int_correct int_complete int_quality int_clarity; do
 			local val="${!var}"
-			[[ "$val" -lt 1 ]] && eval "$var=1"
-			[[ "$val" -gt 5 ]] && eval "$var=5"
+			[[ "$val" -lt 1 ]] && printf -v "$var" '%s' 1
+			[[ "$val" -gt 5 ]] && printf -v "$var" '%s' 5
 		done
🤖 Prompt for AI Agents
In @.agents/scripts/contest-helper.sh around lines 964 - 968, The loop clamping
scores currently uses eval for indirect assignment (loop over int_correct
int_complete int_quality int_clarity), replace eval with bash's printf -v to set
the variable by name without eval: after computing local val="${!var}", use
printf -v "$var" '%s' "$val" (or the clamped numeric value) so the four
variables are safely updated without eval; update the block that iterates over
int_correct/int_complete/int_quality/int_clarity to use printf -v for
assignment.
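For reference, the same clamp pattern works standalone: `${!var}` reads through the variable name and `printf -v` writes through it, so no `eval` is needed. The variable names mirror the script's; the seed values are made up for demonstration.

```shell
#!/usr/bin/env bash
# Sketch: clamp four score variables to the range [1,5] via indirection.
int_correct=0 int_complete=7 int_quality=3 int_clarity=-2
for var in int_correct int_complete int_quality int_clarity; do
	val="${!var}"                             # indirect read
	(( val < 1 )) && printf -v "$var" '%d' 1 # indirect write, no eval
	(( val > 5 )) && printf -v "$var" '%d' 5
done
echo "$int_correct $int_complete $int_quality $int_clarity" # prints "1 5 3 1"
```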

Comment on lines +1139 to +1181
cmd_pulse_check() {
ensure_contest_tables || return 1

local evaluated=0

# Find running contests where all entries are done
local running_contests
running_contests=$(db "$SUPERVISOR_DB" "
SELECT c.id FROM contests c
WHERE c.status = 'running'
AND (
SELECT count(*) FROM contest_entries ce
WHERE ce.contest_id = c.id
AND ce.status NOT IN ('complete','failed','cancelled')
) = 0;
")

while IFS= read -r contest_id; do
[[ -z "$contest_id" ]] && continue

# Sync entry statuses from their subtasks
_sync_entry_statuses "$contest_id"

# Re-check after sync
local still_pending
still_pending=$(db "$SUPERVISOR_DB" "
SELECT count(*) FROM contest_entries
WHERE contest_id = '$(sql_escape "$contest_id")'
AND status NOT IN ('complete','failed','cancelled');
")

if [[ "$still_pending" -eq 0 ]]; then
log_info "Contest $contest_id ready for evaluation"
if cmd_evaluate "$contest_id"; then
cmd_apply "$contest_id" || true
evaluated=$((evaluated + 1))
fi
fi
done <<<"$running_contests"

echo "$evaluated"
return 0
}

⚠️ Potential issue | 🔴 Critical

Logic bug: _sync_entry_statuses is only called for contests that already have all entries in terminal states — so unsynced entries are never updated.

The SQL at Line 1146 selects running contests where zero entries are still in non-terminal states. But _sync_entry_statuses (Line 1160) is the function that transitions entries from dispatched/running to complete/failed. Since it's called after the filter, contests with unsynced entries are never selected, and their entries remain stuck.

Fix: sync all running contests first, then query for those ready to evaluate.

Proposed fix
 cmd_pulse_check() {
 	ensure_contest_tables || return 1
 
 	local evaluated=0
 
-	# Find running contests where all entries are done
-	local running_contests
-	running_contests=$(db "$SUPERVISOR_DB" "
-		SELECT c.id FROM contests c
-		WHERE c.status = 'running'
-		AND (
-			SELECT count(*) FROM contest_entries ce
-			WHERE ce.contest_id = c.id
-			AND ce.status NOT IN ('complete','failed','cancelled')
-		) = 0;
-	")
-
-	while IFS= read -r contest_id; do
-		[[ -z "$contest_id" ]] && continue
-
-		# Sync entry statuses from their subtasks
+	# First: sync ALL running contests' entry statuses
+	local all_running
+	all_running=$(db "$SUPERVISOR_DB" "SELECT id FROM contests WHERE status = 'running';")
+	while IFS= read -r contest_id; do
+		[[ -z "$contest_id" ]] && continue
 		_sync_entry_statuses "$contest_id"
+	done <<<"$all_running"
 
-		# Re-check after sync
-		local still_pending
-		still_pending=$(db "$SUPERVISOR_DB" "
+	# Then: find contests where all entries are now done
+	local ready_contests
+	ready_contests=$(db "$SUPERVISOR_DB" "
+		SELECT c.id FROM contests c
+		WHERE c.status = 'running'
+		AND (
 			SELECT count(*) FROM contest_entries
-			WHERE contest_id = '$(sql_escape "$contest_id")'
-			AND status NOT IN ('complete','failed','cancelled');
-		")
+			WHERE contest_id = c.id
+			AND status NOT IN ('complete','failed','cancelled')
+		) = 0;
+	")
 
-		if [[ "$still_pending" -eq 0 ]]; then
-			log_info "Contest $contest_id ready for evaluation"
-			if cmd_evaluate "$contest_id"; then
-				cmd_apply "$contest_id" || true
-				evaluated=$((evaluated + 1))
-			fi
+	while IFS= read -r contest_id; do
+		[[ -z "$contest_id" ]] && continue
+		log_info "Contest $contest_id ready for evaluation"
+		if cmd_evaluate "$contest_id"; then
+			cmd_apply "$contest_id" || true
+			evaluated=$((evaluated + 1))
 		fi
-	done <<<"$running_contests"
+	done <<<"$ready_contests"
🤖 Prompt for AI Agents
In @.agents/scripts/contest-helper.sh around lines 1139 - 1181, The
cmd_pulse_check logic currently only selects contests that already have zero
non-terminal entries, so _sync_entry_statuses never runs for contests with stale
dispatched/running subtasks; change the flow to first enumerate running
contests, call _sync_entry_statuses for each contest_id, then re-query that
contest's entries to see if pending count is zero and proceed to
cmd_evaluate/cmd_apply; specifically update cmd_pulse_check to fetch running
contest ids (no subquery filtering), call _sync_entry_statuses "$contest_id"
immediately for each, then run the existing pending-count query and evaluation
steps for that same contest_id.

Comment on lines +5859 to +5887
# Contest mode intercept (t1011): if model resolves to CONTEST, delegate to
# contest-helper.sh which dispatches the same task to top-3 models in parallel.
# The original task stays in 'running' state while contest entries execute.
if [[ "$resolved_model" == "CONTEST" ]]; then
log_info "Contest mode activated for $task_id — delegating to contest-helper.sh"
local contest_helper="${SCRIPT_DIR}/contest-helper.sh"
if [[ -x "$contest_helper" ]]; then
local contest_id
contest_id=$("$contest_helper" create "$task_id" ${batch_id:+--batch "$batch_id"} 2>/dev/null)
if [[ -n "$contest_id" ]]; then
"$contest_helper" dispatch "$contest_id" 2>/dev/null || {
log_error "Contest dispatch failed for $task_id"
cmd_transition "$task_id" "failed" --error "Contest dispatch failed"
return 1
}
# Keep original task in running state — pulse Phase 2.5 will check contest completion
db "$SUPERVISOR_DB" "UPDATE tasks SET error = 'contest:${contest_id}' WHERE id = '$(sql_escape "$task_id")';"
log_success "Contest $contest_id dispatched for $task_id"
echo "contest:${contest_id}"
return 0
else
log_error "Failed to create contest for $task_id — falling back to default model"
resolved_model=$(resolve_model "coding" "$ai_cli")
fi
else
log_warn "contest-helper.sh not found — falling back to default model"
resolved_model=$(resolve_model "coding" "$ai_cli")
fi
fi

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, verify the provided rg commands to understand contest logic
rg -n "contest" .agents/scripts/contest-helper.sh .agents/scripts/supervisor-helper.sh | head -50

Repository: marcusquinn/aidevops

Length of output: 5402


🏁 Script executed:

#!/bin/bash
# Search for task status updates in contest-helper
rg -n "UPDATE tasks SET .*status" .agents/scripts/contest-helper.sh

Repository: marcusquinn/aidevops

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Search for contest error marker patterns
rg -n "contest:" .agents/scripts/supervisor-helper.sh

Repository: marcusquinn/aidevops

Length of output: 220


🏁 Script executed:

#!/bin/bash
# Look for Phase 1 evaluation logic and PID file handling
rg -n "Phase 1|phase.*1|PID.*file|pidfile|pid_file" .agents/scripts/supervisor-helper.sh | head -40

Repository: marcusquinn/aidevops

Length of output: 2131


🏁 Script executed:

#!/bin/bash
# Search for guards that skip tasks during evaluation
rg -n "error.*contest|contest.*error|skip.*task|WHERE.*error" .agents/scripts/supervisor-helper.sh | head -40

Repository: marcusquinn/aidevops

Length of output: 2121


🏁 Script executed:

#!/bin/bash
# Find where Phase 1 selects tasks to evaluate
sed -n '10189,10250p' .agents/scripts/supervisor-helper.sh

Repository: marcusquinn/aidevops

Length of output: 2109


🏁 Script executed:

#!/bin/bash
# Search for all task status queries in Phase 1 region
rg -n "SELECT.*status|status.*running" .agents/scripts/supervisor-helper.sh | grep -E "10[0-9]{3}:|Phase.*1"

Repository: marcusquinn/aidevops

Length of output: 2304


🏁 Script executed:

#!/bin/bash
# Look for task selection logic and WHERE clauses in Phase 1
sed -n '10180,10220p' .agents/scripts/supervisor-helper.sh

Repository: marcusquinn/aidevops

Length of output: 1134


🏁 Script executed:

#!/bin/bash
# Search for any skip/guard logic referencing error field or contest
rg -n "error.*!=|WHERE.*error|contest:" .agents/scripts/supervisor-helper.sh | grep -A2 -B2 "10[0-9]{3}:"

Repository: marcusquinn/aidevops

Length of output: 46


Contest tasks are evaluated as failed by Phase 1 because of a missing PID-file guard.

Phase 1 queries all tasks with status IN ('running', 'dispatched', 'evaluating') without exception. Contest tasks remain in 'running' status but don't create PID files—they're delegated to contest-helper.sh. When Phase 1 finds no PID file, it treats the task as complete and evaluates it, which can mark it failed while the contest is still running.

Add a guard in Phase 1 to skip tasks where error LIKE 'contest:%', or transition contest tasks to a status excluded from Phase 1 evaluation.

🤖 Prompt for AI Agents
In @.agents/scripts/supervisor-helper.sh around lines 5859 - 5887, Phase 1 is
prematurely evaluating contest tasks because they remain status 'running' and
lack PID files; fix by either (A) changing the Phase 1 selection query to
exclude tasks where error LIKE 'contest:%' (i.e. add AND error NOT LIKE
'contest:%' to the tasks query used by Phase 1), or (B) when delegating to
contest-helper.sh in the contest branch (the block that calls contest-helper.sh,
sets contest_id and calls db "...UPDATE tasks SET error =
'contest:${contest_id}'..." and then returns), update the task row to a distinct
status such as 'contest_running' instead of leaving it 'running' (modify the db
UPDATE in that block to set status='contest_running' and
error='contest:${contest_id}'), and ensure Phase 1 excludes 'contest_running'
from its status IN (...) checks; implement one of these approaches so Phase 1
skips in-flight contest tasks.
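Option (A) can be demonstrated end-to-end against a scratch database. Table and column names here are taken from the review discussion, not verified against the real supervisor schema:

```shell
#!/usr/bin/env bash
# Sketch: Phase 1 selection query with the contest guard added.
db=$(mktemp)
sqlite3 "$db" "
	CREATE TABLE tasks (id TEXT, status TEXT, error TEXT);
	INSERT INTO tasks VALUES
		('t1', 'running', NULL),          -- normal worker: evaluate
		('t2', 'running', 'contest:c42'), -- in-flight contest: skip
		('t3', 'queued',  NULL);          -- not Phase 1's concern
"
sqlite3 "$db" "
	SELECT id FROM tasks
	WHERE status IN ('running', 'dispatched', 'evaluating')
	AND (error IS NULL OR error NOT LIKE 'contest:%');
" # prints "t1"
rm -f "$db"
```

The `error IS NULL OR` clause matters: in SQL, `NULL NOT LIKE 'contest:%'` evaluates to NULL (falsy), so without it the guard would also exclude ordinary tasks that have no error set.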

@marcusquinn
Owner Author

Closing: Codacy changes requested + merge conflicts. Task t1011 will be re-dispatched to implement fresh against current main.

