t1011: Model contest mode — run top-3 models in parallel, cross-rank results #1304

marcusquinn wants to merge 4 commits into main from
Conversation
Walkthrough

Introduces a model contest mode system that dispatches uncertain tasks to the top-3 models in parallel, collects anonymized outputs, cross-ranks results through weighted scoring criteria, and promotes the winner's output to the original task. Includes SQLite-backed contest lifecycle management, new supervisor command delegation, and comprehensive test coverage.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Supervisor
    participant ContestHelper
    participant ModelRegistry
    participant Database as SQLite<br/>DB
    participant Models
    participant Judges
    Client->>Supervisor: dispatch task with model="contest"
    Supervisor->>Supervisor: resolve_task_model() → "CONTEST"
    Supervisor->>ContestHelper: create contest
    ContestHelper->>Database: INSERT contests, contest_entries
    ContestHelper->>ModelRegistry: select_top_models(3)
    ModelRegistry-->>ContestHelper: model1, model2, model3
    Supervisor->>ContestHelper: dispatch contest
    ContestHelper->>Models: dispatch subtask to each model (parallel)
    Models->>Models: process task
    Models->>Database: store results
    Supervisor->>ContestHelper: evaluate contest (when running)
    ContestHelper->>Database: fetch anonymized outputs (A/B/C)
    ContestHelper->>Judges: request cross-ranking scores
    Judges->>Judges: evaluate alternatives
    Judges-->>ContestHelper: scores per entry
    ContestHelper->>Database: aggregate weighted scores, determine winner
    ContestHelper->>Supervisor: apply winner (promote output)
    Supervisor->>Database: update original task with winner output
    Supervisor->>Models: cancel loser subtasks
    Database-->>Supervisor: contest marked complete
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Thu Feb 12 21:53:45 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Thu Feb 12 22:18:17 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.agents/scripts/supervisor-helper.sh (1)
13328-13347: ⚠️ Potential issue | 🟠 Major

Avoid embedding GH_TOKEN in crontab (plaintext secret exposure).
Lines 13333–13345 inject GH_TOKEN directly into the cron entry, which is readable via `crontab -l` and often visible to system tooling. The script already resolves GH_TOKEN at runtime (cache/keyring/credentials), so you can keep the cron entry free of secrets and rely on the existing token resolution logic.

🔒 Proposed fix (keep PATH, drop GH_TOKEN from cron entry)
```diff
-    # Detect GH_TOKEN from gh CLI if available (t1006)
-    local gh_token=""
-    if command -v gh &>/dev/null; then
-        gh_token=$(gh auth token 2>/dev/null || true)
-    fi
-
     # Build cron command with environment variables
     local env_vars=""
     if [[ -n "$user_path" ]]; then
         env_vars="PATH=${user_path}"
     fi
-    if [[ -n "$gh_token" ]]; then
-        env_vars="${env_vars:+${env_vars} }GH_TOKEN=${gh_token}"
-    fi
```
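For illustration, a minimal standalone sketch of the cron line this fix produces — the schedule and script path here are invented for the demo, not taken from the PR:

```shell
#!/usr/bin/env bash
# Build a cron entry that carries PATH but no GH_TOKEN; the helper script
# is expected to resolve the token at runtime. Path/schedule are examples only.
user_path="/usr/local/bin:/usr/bin:/bin"
env_vars=""
if [[ -n "$user_path" ]]; then
    env_vars="PATH=${user_path}"
fi
cron_line="${env_vars:+${env_vars} }*/5 * * * * /opt/agents/supervisor-helper.sh pulse"
echo "$cron_line"
case "$cron_line" in
    *GH_TOKEN*) echo "secret leaked" ;;
    *) echo "no secret in crontab" ;;
esac
```

The cron entry stays safe to list with `crontab -l` because it never contains the token at all.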
🤖 Fix all issues with AI agents
In @.agents/scripts/contest-helper.sh:
- Around line 57-61: The db() helper currently hides all sqlite3 stderr by
appending "2>/dev/null" which hides real errors; change db() to stop
unconditionally discarding stderr — remove the global "2>/dev/null" and instead
add an optional suppression parameter or respect an environment flag (e.g.,
DB_SILENT) so callers can opt into silencing; ensure db() invokes sqlite3 with
the same arguments but leaves stderr unredirected by default, and update call
sites that intentionally probe (e.g., table-existence checks) to pass the
suppression flag or explicitly redirect stderr to /dev/null there or to a log
file.
- Around line 964-968: The loop clamping scores currently uses eval for indirect
assignment (loop over int_correct int_complete int_quality int_clarity), replace
eval with bash's printf -v to set the variable by name without eval: after
computing local val="${!var}", use printf -v "$var" '%s' "$val" (or the clamped
numeric value) so the four variables are safely updated without eval; update the
block that iterates over int_correct/int_complete/int_quality/int_clarity to use
printf -v for assignment.
- Around line 186-202: The integer-comparison fails when sed yields empty
strings for total_samples or success_rate; after extracting from pattern_json
(variables total_samples and success_rate), ensure they default to 0 before the
[[ comparisons — e.g. immediately after the sed assignments set
total_samples=${total_samples:-0} and success_rate=${success_rate:-0} (or use
parameter expansion when comparing) so [[ "$total_samples" -lt 3 ]] and [[
"$success_rate" -lt 75 ]] always receive numeric values.
- Around line 341-361: The loop that splits $models uses "local IFS=','" and
iterates an unquoted $models, then calls unset IFS — replace that fragile IFS
manipulation by reading $models into an array with read -ra (e.g. read -ra
model_arr <<< "$models") and iterate over "${model_arr[@]}"; keep the existing
logic that increments model_index and constructs entry_id/entry_task_id and the
db/sql_escape/log_info calls (references: models, model_index, entry_id,
entry_task_id, db, sql_escape, log_info), and remove the local IFS/unset IFS
handling.
- Around line 1139-1181: The cmd_pulse_check logic currently only selects
contests that already have zero non-terminal entries, so _sync_entry_statuses
never runs for contests with stale dispatched/running subtasks; change the flow
to first enumerate running contests, call _sync_entry_statuses for each
contest_id, then re-query that contest's entries to see if pending count is zero
and proceed to cmd_evaluate/cmd_apply; specifically update cmd_pulse_check to
fetch running contest ids (no subquery filtering), call _sync_entry_statuses
"$contest_id" immediately for each, then run the existing pending-count query
and evaluation steps for that same contest_id.
- Around line 724-731: The script currently passes the large variable
ranking_prompt directly to opencode via --prompt which can hit ARG_MAX and the
trailing "|| true" hides E2BIG failures; modify the block that invokes opencode
(the timeout/opencode run call) to write ranking_prompt to a temp file (e.g.,
use the existing score_tmpfile pattern or a new prompt_tmpfile) and pass that
file to opencode using the CLI's file-based input option (or feed via stdin)
instead of --prompt "$ranking_prompt"; also remove the unconditional "|| true"
and handle non-zero exit by logging the error and preserving any opencode stderr
for debugging so you don't silently drop E2BIG errors.
- Around line 643-651: The script currently hardcodes "main" when building the
diffs (variables summary and full_diff using git -C "$ewt" diff "main..HEAD")
and swallows git errors, which hides repos whose default is "master" or another
branch; change it to detect the repo's default branch first (e.g., run git -C
"$ewt" to get origin/HEAD via symbolic-ref or rev-parse and strip the "origin/"
prefix) into a variable like base_branch, fall back to "main" only if detection
fails, then use "$base_branch..HEAD" for both git diff --stat and git diff, and
stop redirecting stderr to /dev/null so failures surface (or at least preserve
error output for logging) instead of silently returning "No diff available";
update references to ewt, summary, and full_diff accordingly.
In @.agents/scripts/supervisor-helper.sh:
- Around line 5859-5887: Phase 1 is prematurely evaluating contest tasks because
they remain status 'running' and lack PID files; fix by either (A) changing the
Phase 1 selection query to exclude tasks where error LIKE 'contest:%' (i.e. add
AND error NOT LIKE 'contest:%' to the tasks query used by Phase 1), or (B) when
delegating to contest-helper.sh in the contest branch (the block that calls
contest-helper.sh, sets contest_id and calls db "...UPDATE tasks SET error =
'contest:${contest_id}'..." and then returns), update the task row to a distinct
status such as 'contest_running' instead of leaving it 'running' (modify the db
UPDATE in that block to set status='contest_running' and
error='contest:${contest_id}'), and ensure Phase 1 excludes 'contest_running'
from its status IN (...) checks; implement one of these approaches so Phase 1
skips in-flight contest tasks.
🧹 Nitpick comments (2)
.agents/scripts/contest-helper.sh (2)
1036-1042: Storing contest-winner metadata in the `error` column is semantically misleading.

Downstream tooling or dashboards that treat a non-null `error` as a failure indicator will misreport winning tasks as errored. Consider using the `metadata` or `description` field on the task instead, or adding a dedicated column.
733-743: Fragile regex-based JSON parsing — consider `jq` with a fallback.

The `grep -oE '\{[^}]*"label"[^}]*\}'` breaks on multi-line or nested JSON output from the judge models. Since this is the core scoring path, a mis-parse silently produces zero scores and a meaningless winner.

`jq` is widely available in DevOps environments and would handle edge cases robustly:

Proposed approach

```diff
+    if command -v jq &>/dev/null; then
+        json_scores=$(echo "$score_output" | jq -c 'select(.label)' 2>/dev/null || true)
+    else
         json_scores=$(echo "$score_output" | grep -oE '\{[^}]*"label"[^}]*\}' || true)
+    fi
```
```shell
db() {
    local db_path="$1"
    shift
    sqlite3 -batch "$db_path" "$@" 2>/dev/null
}
```
db() unconditionally suppresses all SQLite errors — hampers debugging and masks real failures.
Redirecting stderr to /dev/null silences corruption, lock contention, and schema-mismatch errors. For a system that manages contest lifecycle state, silent DB failures can leave contests in inconsistent states with zero diagnostic trail.
Consider logging stderr to a file or at least allowing callers to opt in to suppression rather than baking it in globally.
Proposed fix

```diff
 db() {
     local db_path="$1"
     shift
-    sqlite3 -batch "$db_path" "$@" 2>/dev/null
+    sqlite3 -batch "$db_path" "$@"
 }
```

Then suppress selectively at call sites where it's intentional (e.g., probing for table existence).
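One shape the opt-in could take — a sketch assuming a hypothetical DB_SILENT flag, with sqlite3 stubbed out so the stderr behavior is observable without a real database:

```shell
#!/usr/bin/env bash
# Stub standing in for sqlite3: emits a row on stdout and noise on stderr.
sqlite3() { echo "row"; echo "boom" >&2; }

# db() keeps stderr by default; callers set DB_SILENT=1 to opt into suppression.
db() {
    local db_path="$1"
    shift
    if [[ "${DB_SILENT:-0}" == "1" ]]; then
        sqlite3 -batch "$db_path" "$@" 2>/dev/null
    else
        sqlite3 -batch "$db_path" "$@"
    fi
}

# Capture only stderr from each call to compare the two modes.
default_err=$(db tasks.db "SELECT 1;" 2>&1 >/dev/null)
silent_err=$(DB_SILENT=1 db tasks.db "SELECT 1;" 2>&1 >/dev/null)
echo "default stderr: ${default_err}"
echo "silent stderr: ${silent_err:-<none>}"
```

By default the diagnostic reaches the caller; only probing call sites pay the suppression cost.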
```shell
# Check if any tier has strong enough signal (>75% success, 3+ samples)
local total_samples
total_samples=$(echo "$pattern_json" | sed -n 's/.*"total_samples"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p' 2>/dev/null || echo "0")
local success_rate
success_rate=$(echo "$pattern_json" | sed -n 's/.*"success_rate"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p' 2>/dev/null || echo "0")

if [[ "$total_samples" -lt 3 ]]; then
    log_info "Insufficient pattern data ($total_samples samples) for $task_id — contest mode triggered"
    echo "insufficient_data"
    return 0
fi

if [[ "$success_rate" -lt 75 ]]; then
    log_info "Low success rate (${success_rate}%) for $task_id — contest mode triggered"
    echo "low_success_rate"
    return 0
fi
```
Empty sed output causes integer comparison failure under set -e.
If pattern_json doesn't contain "total_samples" or "success_rate", sed -n exits 0 with empty output — the || echo "0" fallback never fires. Then [[ "" -lt 3 ]] produces integer expression expected and crashes the function under set -e.
Proposed fix — apply default at use site

```diff
-    if [[ "$total_samples" -lt 3 ]]; then
+    if [[ "${total_samples:-0}" -lt 3 ]]; then
         log_info "Insufficient pattern data ($total_samples samples) for $task_id — contest mode triggered"
         echo "insufficient_data"
         return 0
     fi
-    if [[ "$success_rate" -lt 75 ]]; then
+    if [[ "${success_rate:-0}" -lt 75 ]]; then
         log_info "Low success rate (${success_rate}%) for $task_id — contest mode triggered"
         echo "low_success_rate"
         return 0
     fi
```
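The root cause is easy to reproduce: `sed -n` exits 0 even when nothing matches, so the `|| echo "0"` fallback never fires. A self-contained sketch (the sample JSON is invented for the demo):

```shell
#!/usr/bin/env bash
# JSON without the expected key: sed prints nothing but still exits 0,
# so the "|| echo 0" fallback on the pipeline never runs.
pattern_json='{"other_field": 1}'
total_samples=$(echo "$pattern_json" \
    | sed -n 's/.*"total_samples"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p' || echo "0")
echo "raw value: [${total_samples}]"

# Defaulting at the use site makes the comparison safe regardless.
if [[ "${total_samples:-0}" -lt 3 ]]; then
    echo "insufficient_data"
fi
```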
```shell
local model_index=0
local IFS=','
for model in $models; do
    model_index=$((model_index + 1))
    local entry_id="${contest_id}-entry-${model_index}"
    local entry_task_id="${task_id}-contest-${model_index}"

    db "$SUPERVISOR_DB" "
        INSERT INTO contest_entries (id, contest_id, model, task_id, status)
        VALUES (
            '$(sql_escape "$entry_id")',
            '$(sql_escape "$contest_id")',
            '$(sql_escape "$model")',
            '$(sql_escape "$entry_task_id")',
            'pending'
        );
    "

    log_info "Created entry $entry_id for model $model (task: $entry_task_id)"
done
unset IFS
```
🛠️ Refactor suggestion | 🟠 Major
IFS manipulation is fragile and flagged by static analysis (Codacy).
Setting local IFS=',' then iterating an unquoted $models works, but unset IFS after the loop removes the local, which is subtly different from restoring the default. Static analysis rightfully flags this pattern.
Use read -ra into an array to avoid IFS gymnastics entirely:
Proposed fix

```diff
-    local model_index=0
-    local IFS=','
-    for model in $models; do
+    local model_index=0
+    local -a model_array
+    IFS=',' read -ra model_array <<< "$models"
+    for model in "${model_array[@]}"; do
         model_index=$((model_index + 1))
         local entry_id="${contest_id}-entry-${model_index}"
         local entry_task_id="${task_id}-contest-${model_index}"
@@ ...
         log_info "Created entry $entry_id for model $model (task: $entry_task_id)"
     done
-    unset IFS
```
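The scoping difference is directly observable: an IFS assignment prefixed to `read` applies only to that one command and never leaks. A small standalone check (model names invented):

```shell
#!/usr/bin/env bash
models="model-a,model-b,model-c"

# IFS=',' here is scoped to the read command; the shell's IFS is untouched.
IFS=',' read -ra model_arr <<< "$models"

echo "count=${#model_arr[@]}"
echo "first=${model_arr[0]}"
if [[ "$IFS" == $' \t\n' ]]; then
    echo "IFS unchanged"
fi
```

No save/restore or `unset IFS` bookkeeping is needed, which is exactly what the static analyzer is asking for.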
🧰 Tools
🪛 GitHub Check: Codacy Static Code Analysis
[warning] 342-342: .agents/scripts/contest-helper.sh#L342
The special variable IFS affects how splitting takes place when expanding unquoted variables.
```shell
if [[ -n "$ewt" && -d "$ewt" ]]; then
    # Get the diff as the "output"
    summary=$(git -C "$ewt" diff --stat "main..HEAD" 2>/dev/null || echo "No diff available")
    local full_diff
    full_diff=$(git -C "$ewt" diff "main..HEAD" 2>/dev/null | head -500 || echo "")
    summary="${summary}

--- Code Changes ---
${full_diff}"
```
Hardcoded main as diff base — silently produces empty output for repos using master or other default branches.
The 2>/dev/null || echo "No diff available" fallback masks the real problem. The contest evaluation then scores entries with no meaningful data, producing arbitrary results.
Proposed fix — detect default branch

```diff
+    local base_branch
+    base_branch=$(git -C "$ewt" symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||' || echo "main")
     if [[ -n "$ewt" && -d "$ewt" ]]; then
-        summary=$(git -C "$ewt" diff --stat "main..HEAD" 2>/dev/null || echo "No diff available")
+        summary=$(git -C "$ewt" diff --stat "${base_branch}..HEAD" 2>/dev/null || echo "No diff available")
         local full_diff
-        full_diff=$(git -C "$ewt" diff "main..HEAD" 2>/dev/null | head -500 || echo "")
+        full_diff=$(git -C "$ewt" diff "${base_branch}..HEAD" 2>/dev/null | head -500 || echo "")
```
```shell
# Use opencode for scoring if available, otherwise fall back to direct API
if command -v opencode &>/dev/null; then
    timeout 120 opencode run --format json \
        --model "$judge_model" \
        --prompt "$ranking_prompt" \
        >"$score_tmpfile" 2>/dev/null || true
    score_output=$(cat "$score_tmpfile" 2>/dev/null || echo "")
fi
```
Large prompt passed as CLI argument risks hitting ARG_MAX — silent failure under || true.
ranking_prompt contains full diffs (up to 500 lines per entry × 3 entries). Passing this as --prompt "$ranking_prompt" can exceed ARG_MAX or per-argument limits on some systems. The || true suppresses the resulting E2BIG error, and the contest silently produces zero scores with only a vague "no parseable scores" warning.
Feed the prompt via a temp file or stdin instead:
Proposed fix

```diff
+    local prompt_file
+    prompt_file=$(mktemp "${TMPDIR:-/tmp}/contest-prompt-XXXXXX")
+    printf '%s' "$ranking_prompt" > "$prompt_file"
+
     if command -v opencode &>/dev/null; then
         timeout 120 opencode run --format json \
             --model "$judge_model" \
-            --prompt "$ranking_prompt" \
+            --prompt-file "$prompt_file" \
             >"$score_tmpfile" 2>/dev/null || true
         score_output=$(cat "$score_tmpfile" 2>/dev/null || echo "")
     fi
+
+    rm -f "$prompt_file"
```

(Adjust --prompt-file to whatever flag opencode supports for file-based input.)
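The ARG_MAX concern is independent of opencode's exact flags: any consumer that takes its input from a file or stdin sidesteps per-argument limits entirely. A sketch with a stand-in judge command (`wc` substitutes for the real CLI, which is an assumption of this demo):

```shell
#!/usr/bin/env bash
# Stand-in for the judge CLI: consumes the prompt from stdin, reports its size.
judge() { wc -c | tr -d '[:space:]'; }

# Build a prompt far larger than is comfortable as a single CLI argument.
ranking_prompt=$(head -c 100000 /dev/zero | tr '\0' 'x')

prompt_file=$(mktemp "${TMPDIR:-/tmp}/contest-prompt-XXXXXX")
printf '%s' "$ranking_prompt" > "$prompt_file"

# Feed via file redirection: prompt size is no longer bounded by ARG_MAX.
bytes=$(judge < "$prompt_file")
rm -f "$prompt_file"
echo "prompt bytes: $bytes"
```

The same pattern works for a heredoc or a pipe; the key point is that the kernel's exec argument limit only applies to argv, not to file descriptors.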
```shell
for var in int_correct int_complete int_quality int_clarity; do
    local val="${!var}"
    [[ "$val" -lt 1 ]] && eval "$var=1"
    [[ "$val" -gt 5 ]] && eval "$var=5"
done
```
🛠️ Refactor suggestion | 🟠 Major
Replace eval with printf -v for indirect variable assignment.
While the loop variable names are hardcoded and safe here, eval is an anti-pattern that erodes the "zero technical debt" posture. Bash's printf -v achieves the same thing without eval:
Proposed fix

```diff
 for var in int_correct int_complete int_quality int_clarity; do
     local val="${!var}"
-    [[ "$val" -lt 1 ]] && eval "$var=1"
-    [[ "$val" -gt 5 ]] && eval "$var=5"
+    [[ "$val" -lt 1 ]] && printf -v "$var" '%s' 1
+    [[ "$val" -gt 5 ]] && printf -v "$var" '%s' 5
 done
```
done📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for var in int_correct int_complete int_quality int_clarity; do | |
| local val="${!var}" | |
| [[ "$val" -lt 1 ]] && eval "$var=1" | |
| [[ "$val" -gt 5 ]] && eval "$var=5" | |
| done | |
| for var in int_correct int_complete int_quality int_clarity; do | |
| local val="${!var}" | |
| [[ "$val" -lt 1 ]] && printf -v "$var" '%s' 1 | |
| [[ "$val" -gt 5 ]] && printf -v "$var" '%s' 5 | |
| done |
🤖 Prompt for AI Agents
In @.agents/scripts/contest-helper.sh around lines 964 - 968, The loop clamping
scores currently uses eval for indirect assignment (loop over int_correct
int_complete int_quality int_clarity), replace eval with bash's printf -v to set
the variable by name without eval: after computing local val="${!var}", use
printf -v "$var" '%s' "$val" (or the clamped numeric value) so the four
variables are safely updated without eval; update the block that iterates over
int_correct/int_complete/int_quality/int_clarity to use printf -v for
assignment.
```shell
cmd_pulse_check() {
    ensure_contest_tables || return 1

    local evaluated=0

    # Find running contests where all entries are done
    local running_contests
    running_contests=$(db "$SUPERVISOR_DB" "
        SELECT c.id FROM contests c
        WHERE c.status = 'running'
        AND (
            SELECT count(*) FROM contest_entries ce
            WHERE ce.contest_id = c.id
            AND ce.status NOT IN ('complete','failed','cancelled')
        ) = 0;
    ")

    while IFS= read -r contest_id; do
        [[ -z "$contest_id" ]] && continue

        # Sync entry statuses from their subtasks
        _sync_entry_statuses "$contest_id"

        # Re-check after sync
        local still_pending
        still_pending=$(db "$SUPERVISOR_DB" "
            SELECT count(*) FROM contest_entries
            WHERE contest_id = '$(sql_escape "$contest_id")'
            AND status NOT IN ('complete','failed','cancelled');
        ")

        if [[ "$still_pending" -eq 0 ]]; then
            log_info "Contest $contest_id ready for evaluation"
            if cmd_evaluate "$contest_id"; then
                cmd_apply "$contest_id" || true
                evaluated=$((evaluated + 1))
            fi
        fi
    done <<<"$running_contests"

    echo "$evaluated"
    return 0
}
```
Logic bug: _sync_entry_statuses is only called for contests that already have all entries in terminal states — so unsynced entries are never updated.
The SQL at Line 1146 selects running contests where zero entries are still in non-terminal states. But _sync_entry_statuses (Line 1160) is the function that transitions entries from dispatched/running → complete/failed. Since it's called after the filter, contests with unsynced entries are never selected, and their entries remain stuck.
Fix: sync all running contests first, then query for those ready to evaluate.
Proposed fix

```diff
 cmd_pulse_check() {
     ensure_contest_tables || return 1

     local evaluated=0

-    # Find running contests where all entries are done
-    local running_contests
-    running_contests=$(db "$SUPERVISOR_DB" "
-        SELECT c.id FROM contests c
-        WHERE c.status = 'running'
-        AND (
-            SELECT count(*) FROM contest_entries ce
-            WHERE ce.contest_id = c.id
-            AND ce.status NOT IN ('complete','failed','cancelled')
-        ) = 0;
-    ")
-
-    while IFS= read -r contest_id; do
-        [[ -z "$contest_id" ]] && continue
-
-        # Sync entry statuses from their subtasks
+    # First: sync ALL running contests' entry statuses
+    local all_running
+    all_running=$(db "$SUPERVISOR_DB" "SELECT id FROM contests WHERE status = 'running';")
+    while IFS= read -r contest_id; do
+        [[ -z "$contest_id" ]] && continue
         _sync_entry_statuses "$contest_id"
+    done <<<"$all_running"

-        # Re-check after sync
-        local still_pending
-        still_pending=$(db "$SUPERVISOR_DB" "
+    # Then: find contests where all entries are now done
+    local ready_contests
+    ready_contests=$(db "$SUPERVISOR_DB" "
+        SELECT c.id FROM contests c
+        WHERE c.status = 'running'
+        AND (
             SELECT count(*) FROM contest_entries
-            WHERE contest_id = '$(sql_escape "$contest_id")'
-            AND status NOT IN ('complete','failed','cancelled');
-        ")
+            WHERE contest_id = c.id
+            AND status NOT IN ('complete','failed','cancelled')
+        ) = 0;
+    ")

-        if [[ "$still_pending" -eq 0 ]]; then
-            log_info "Contest $contest_id ready for evaluation"
-            if cmd_evaluate "$contest_id"; then
-                cmd_apply "$contest_id" || true
-                evaluated=$((evaluated + 1))
-            fi
+    while IFS= read -r contest_id; do
+        [[ -z "$contest_id" ]] && continue
+        log_info "Contest $contest_id ready for evaluation"
+        if cmd_evaluate "$contest_id"; then
+            cmd_apply "$contest_id" || true
+            evaluated=$((evaluated + 1))
         fi
-    done <<<"$running_contests"
+    done <<<"$ready_contests"
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```shell
cmd_pulse_check() {
    ensure_contest_tables || return 1
    local evaluated=0

    # First: sync ALL running contests' entry statuses
    local all_running
    all_running=$(db "$SUPERVISOR_DB" "SELECT id FROM contests WHERE status = 'running';")
    while IFS= read -r contest_id; do
        [[ -z "$contest_id" ]] && continue
        _sync_entry_statuses "$contest_id"
    done <<<"$all_running"

    # Then: find contests where all entries are now done
    local ready_contests
    ready_contests=$(db "$SUPERVISOR_DB" "
        SELECT c.id FROM contests c
        WHERE c.status = 'running'
        AND (
            SELECT count(*) FROM contest_entries
            WHERE contest_id = c.id
            AND status NOT IN ('complete','failed','cancelled')
        ) = 0;
    ")
    while IFS= read -r contest_id; do
        [[ -z "$contest_id" ]] && continue
        log_info "Contest $contest_id ready for evaluation"
        if cmd_evaluate "$contest_id"; then
            cmd_apply "$contest_id" || true
            evaluated=$((evaluated + 1))
        fi
    done <<<"$ready_contests"

    echo "$evaluated"
    return 0
}
```
🤖 Prompt for AI Agents
In @.agents/scripts/contest-helper.sh around lines 1139 - 1181, The
cmd_pulse_check logic currently only selects contests that already have zero
non-terminal entries, so _sync_entry_statuses never runs for contests with stale
dispatched/running subtasks; change the flow to first enumerate running
contests, call _sync_entry_statuses for each contest_id, then re-query that
contest's entries to see if pending count is zero and proceed to
cmd_evaluate/cmd_apply; specifically update cmd_pulse_check to fetch running
contest ids (no subquery filtering), call _sync_entry_statuses "$contest_id"
immediately for each, then run the existing pending-count query and evaluation
steps for that same contest_id.
```shell
# Contest mode intercept (t1011): if model resolves to CONTEST, delegate to
# contest-helper.sh which dispatches the same task to top-3 models in parallel.
# The original task stays in 'running' state while contest entries execute.
if [[ "$resolved_model" == "CONTEST" ]]; then
    log_info "Contest mode activated for $task_id — delegating to contest-helper.sh"
    local contest_helper="${SCRIPT_DIR}/contest-helper.sh"
    if [[ -x "$contest_helper" ]]; then
        local contest_id
        contest_id=$("$contest_helper" create "$task_id" ${batch_id:+--batch "$batch_id"} 2>/dev/null)
        if [[ -n "$contest_id" ]]; then
            "$contest_helper" dispatch "$contest_id" 2>/dev/null || {
                log_error "Contest dispatch failed for $task_id"
                cmd_transition "$task_id" "failed" --error "Contest dispatch failed"
                return 1
            }
            # Keep original task in running state — pulse Phase 2.5 will check contest completion
            db "$SUPERVISOR_DB" "UPDATE tasks SET error = 'contest:${contest_id}' WHERE id = '$(sql_escape "$task_id")';"
            log_success "Contest $contest_id dispatched for $task_id"
            echo "contest:${contest_id}"
            return 0
        else
            log_error "Failed to create contest for $task_id — falling back to default model"
            resolved_model=$(resolve_model "coding" "$ai_cli")
        fi
    else
        log_warn "contest-helper.sh not found — falling back to default model"
        resolved_model=$(resolve_model "coding" "$ai_cli")
    fi
fi
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# First, verify the provided rg commands to understand contest logic
rg -n "contest" .agents/scripts/contest-helper.sh .agents/scripts/supervisor-helper.sh | head -50
```

Repository: marcusquinn/aidevops
Length of output: 5402

🏁 Script executed:

```shell
#!/bin/bash
# Search for task status updates in contest-helper
rg -n "UPDATE tasks SET .*status" .agents/scripts/contest-helper.sh
```

Repository: marcusquinn/aidevops
Length of output: 46

🏁 Script executed:

```shell
#!/bin/bash
# Search for contest error marker patterns
rg -n "contest:" .agents/scripts/supervisor-helper.sh
```

Repository: marcusquinn/aidevops
Length of output: 220

🏁 Script executed:

```shell
#!/bin/bash
# Look for Phase 1 evaluation logic and PID file handling
rg -n "Phase 1|phase.*1|PID.*file|pidfile|pid_file" .agents/scripts/supervisor-helper.sh | head -40
```

Repository: marcusquinn/aidevops
Length of output: 2131

🏁 Script executed:

```shell
#!/bin/bash
# Search for guards that skip tasks during evaluation
rg -n "error.*contest|contest.*error|skip.*task|WHERE.*error" .agents/scripts/supervisor-helper.sh | head -40
```

Repository: marcusquinn/aidevops
Length of output: 2121

🏁 Script executed:

```shell
#!/bin/bash
# Find where Phase 1 selects tasks to evaluate
sed -n '10189,10250p' .agents/scripts/supervisor-helper.sh
```

Repository: marcusquinn/aidevops
Length of output: 2109

🏁 Script executed:

```shell
#!/bin/bash
# Search for all task status queries in Phase 1 region
rg -n "SELECT.*status|status.*running" .agents/scripts/supervisor-helper.sh | grep -E "10[0-9]{3}:|Phase.*1"
```

Repository: marcusquinn/aidevops
Length of output: 2304

🏁 Script executed:

```shell
#!/bin/bash
# Look for task selection logic and WHERE clauses in Phase 1
sed -n '10180,10220p' .agents/scripts/supervisor-helper.sh
```

Repository: marcusquinn/aidevops
Length of output: 1134

🏁 Script executed:

```shell
#!/bin/bash
# Search for any skip/guard logic referencing error field or contest
rg -n "error.*!=|WHERE.*error|contest:" .agents/scripts/supervisor-helper.sh | grep -A2 -B2 "10[0-9]{3}:"
```

Repository: marcusquinn/aidevops
Length of output: 46
Contest tasks are evaluated as failed by Phase 1 due to a missing PID-file guard.
Phase 1 queries all tasks with status IN ('running', 'dispatched', 'evaluating') without exception. Contest tasks remain in 'running' status but never create PID files, because their work is delegated to contest-helper.sh. When Phase 1 finds no PID file, it treats the task as finished and evaluates it, which can mark the task failed while the contest is still running.
Fix: add a guard in Phase 1 to skip tasks where error LIKE 'contest:%', or transition contest tasks to a status that Phase 1 excludes from evaluation.
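The first option (skipping on the error marker) can be sketched as a small predicate. The function name and calling convention below are illustrative assumptions, not the repo's actual helpers; the only fact carried over is the `contest:<id>` marker format set by the intercept.

```shell
#!/bin/sh
# Hedged sketch of the suggested Phase 1 guard: skip any task whose error
# field carries the 'contest:<id>' marker set by the contest intercept.
# phase1_should_skip is a hypothetical name for illustration only.
phase1_should_skip() {
    case "$1" in
        contest:*) return 0 ;;  # in-flight contest; let pulse Phase 2.5 handle it
        *) return 1 ;;
    esac
}

# Example: a contest-delegated task is skipped; a normal task is evaluated.
phase1_should_skip "contest:c42" && echo "skip contest task"
phase1_should_skip "" || echo "evaluate normal task"
```

The same filter can live directly in SQL (`AND (error IS NULL OR error NOT LIKE 'contest:%')`) if keeping the selection in one query is preferred.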
🤖 Prompt for AI Agents
In @.agents/scripts/supervisor-helper.sh around lines 5859 - 5887, Phase 1 is
prematurely evaluating contest tasks because they remain status 'running' and
lack PID files; fix by either (A) changing the Phase 1 selection query to
exclude tasks where error LIKE 'contest:%' (i.e. add AND error NOT LIKE
'contest:%' to the tasks query used by Phase 1), or (B) when delegating to
contest-helper.sh in the contest branch (the block that calls contest-helper.sh,
sets contest_id and calls db "...UPDATE tasks SET error =
'contest:${contest_id}'..." and then returns), update the task row to a distinct
status such as 'contest_running' instead of leaving it 'running' (modify the db
UPDATE in that block to set status='contest_running' and
error='contest:${contest_id}'), and ensure Phase 1 excludes 'contest_running'
from its status IN (...) checks; implement one of these approaches so Phase 1
skips in-flight contest tasks.
|
Closing: Codacy changes requested + merge conflicts. Task t1011 will be re-dispatched to implement fresh against current main. |



Summary
Model contest mode for the supervisor (t1011). When model selection is uncertain, dispatches the same task to top-3 models in parallel, then cross-ranks all outputs to pick the winner. Builds permanent routing data for future model selection.
Ref #1301
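For context on the cross-ranking step, a minimal sketch of how anonymized entries could be aggregated into a weighted winner, using a throwaway in-memory SQLite database. The table and column names here are assumptions for illustration; the actual schema lives in contest-helper.sh.

```shell
#!/bin/sh
# Hypothetical weighted cross-ranking: each judge scores each anonymized
# entry (A/B/C), scores are weighted per judge, and the highest weighted
# average wins. Schema and weights are illustrative only.
sqlite3 :memory: "
CREATE TABLE scores (entry TEXT, judge TEXT, score REAL, weight REAL);
INSERT INTO scores VALUES
  ('A','judge1',0.8,1.0), ('A','judge2',0.6,0.5),
  ('B','judge1',0.7,1.0), ('B','judge2',0.9,0.5),
  ('C','judge1',0.5,1.0), ('C','judge2',0.4,0.5);
SELECT entry, ROUND(SUM(score*weight)/SUM(weight), 3) AS weighted
FROM scores GROUP BY entry ORDER BY weighted DESC LIMIT 1;
"
```

With these numbers, entry B wins on a weighted average of 0.767.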
Changes
- New: `contest-helper.sh` (standalone orchestrator)
- Modified: `supervisor-helper.sh`
- New: `tests/test-contest-helper.sh`
- Updated: `subagent-index.toon`
Flow
Trigger conditions
Cost
~3x a single run, but builds permanent routing data. Only triggers for genuinely uncertain cases.
Testing
All 20 tests pass: `bash tests/test-contest-helper.sh --verbose`
Summary by CodeRabbit
Release Notes
New Features
New `contest` command with controls for creating contests, dispatching to models, evaluating results, and managing outcomes.

Tests