GH#3711: Fix critical quality-debt in codacy-collector-helper.sh (#4067)

marcusquinn merged 1 commit into main
Conversation
…3711) Address all 13 findings from PR #1384 review feedback (coderabbit, gemini-code-assist, human reviewers):

CRITICAL:
- Anchor AUDIT_CONFIG paths to REPO_ROOT instead of relative CWD paths
- Validate --limit as positive integer (max 10000) to prevent SQL injection

HIGH:
- Use jq to build JSON request body safely (cursor escaping)
- Wrap batch inserts in BEGIN/COMMIT transaction for atomicity + performance
- Rate-limit retries (429) use separate budget, don't consume error retries

MEDIUM:
- Remove dead tmp_headers allocation and -D curl arg
- Use workspace tmp dir (~/.aidevops/.agent-workspace/tmp/) for all temp files
- Fix SC2155: separate declare and assign for REPO_ROOT

Previously fixed in b0c77ce (verified still present):
- ${MAX_RETRIES:-3} defaults throughout
- ${ERROR_UNKNOWN_COMMAND:-Unknown command:} default
- Severity/category input validation with case statements
Walkthrough

This change refactors …

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:

📈 Current Quality Metrics

Generated on: Tue Mar 10 10:47:40 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
🧹 Nitpick comments (2)
.agents/scripts/codacy-collector-helper.sh (2)
41-45: Solid SC2155 fix — consider adding defensive validation for consistency.

The separation of declaration and assignment correctly addresses ShellCheck SC2155. The repo-root anchoring makes config paths CWD-independent as intended.

For consistency with line 26's `|| exit` pattern on SCRIPT_DIR, consider adding a guard to ensure REPO_ROOT resolved successfully:

🛡️ Optional: Add defensive validation

```diff
 REPO_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
 readonly REPO_ROOT
+[[ -d "$REPO_ROOT" ]] || { echo "FATAL: Could not determine REPO_ROOT" >&2; exit 1; }
 readonly AUDIT_CONFIG="${REPO_ROOT}/configs/code-audit-config.json"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/codacy-collector-helper.sh around lines 41 - 45, REPO_ROOT is derived from SCRIPT_DIR but lacks the defensive validation used earlier; after computing REPO_ROOT ensure it succeeded (e.g., test that REPO_ROOT is non-empty and points to an existing directory) and fail fast if not (similar to the SCRIPT_DIR "|| exit" pattern), then only mark REPO_ROOT readonly and proceed to set AUDIT_CONFIG and AUDIT_CONFIG_TEMPLATE so the script won't continue with invalid paths.
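A standalone sketch of the fail-fast pattern this nitpick suggests. The error message and the two-levels-up layout are illustrative assumptions, not taken verbatim from the script:

```shell
#!/usr/bin/env bash
# Illustrative sketch: resolve the repo root relative to this script and
# fail fast if resolution yields an empty or non-existent path.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" || exit 1
REPO_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
[[ -n "$REPO_ROOT" && -d "$REPO_ROOT" ]] || {
    echo "FATAL: Could not determine REPO_ROOT" >&2
    exit 1
}
readonly REPO_ROOT
echo "repo root: $REPO_ROOT"
```

With the guard placed before `readonly`, a failed `cd` can never freeze an empty value into a read-only variable.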
530-537: Transaction wrapping is correct — consider total_changes() for accurate counts.

The transaction wrapping provides atomicity and the single-fsync performance benefit as intended. The 5000ms `.timeout` in `db()` should handle any concurrent write contention with `code-audit-helper.sh`.

One minor observation: the count on line 537 comes from `wc -l` on the SQL file, not actual committed rows. If any inserts fail silently within the transaction, the count could be inaccurate.

♻️ Optional: Use total_changes() for accurate row count

```diff
-{
+count=$({
   echo "BEGIN TRANSACTION;"
   cat "$sql_file"
   echo "COMMIT;"
+  echo "SELECT total_changes();"
 } |
-  db "$AUDIT_DB" 2>/dev/null || log_warn "Some Codacy inserts may have failed"
-count=$(wc -l <"$sql_file" | tr -d ' ')
+  db "$AUDIT_DB" 2>/dev/null | tail -1) || log_warn "Some Codacy inserts may have failed"
+count=${count:-0}
```

🤖 Prompt for AI Agents
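For illustration, here is how `total_changes()` behaves in a plain `sqlite3` session. This assumes the `sqlite3` CLI is available; the table and column names are invented for the demo and are not from the audit schema:

```shell
#!/usr/bin/env bash
# Demo: total_changes() reports rows actually inserted in this session,
# unlike counting lines in the SQL that was piped in.
dbfile=$(mktemp)
count=$(
sqlite3 "$dbfile" <<'SQL' | tail -1
CREATE TABLE findings (id INTEGER PRIMARY KEY, msg TEXT);
BEGIN TRANSACTION;
INSERT INTO findings (msg) VALUES ('first');
INSERT INTO findings (msg) VALUES ('second');
COMMIT;
SELECT total_changes();
SQL
)
rm -f "$dbfile"
echo "committed rows: $count"   # committed rows: 2
```

The `CREATE TABLE` contributes nothing to the count; only the two committed `INSERT`s do, which is exactly the property the suggestion relies on.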
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bd5644ee-1e67-4b72-817e-05472f117b86
📒 Files selected for processing (1)
.agents/scripts/codacy-collector-helper.sh
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness, security, and reliability of the codacy-collector-helper.sh script.

Highlights
Code Review
This pull request significantly improves the codacy-collector-helper.sh script by addressing several critical, high, and medium severity issues. Key enhancements include anchoring configuration paths to the repository root for better portability, implementing a more robust rate-limiting retry mechanism, utilizing jq for safe JSON request body construction to prevent injection, wrapping SQL INSERT statements in transactions for atomicity and performance, and adding strict input validation for the --limit parameter to mitigate SQL injection vulnerabilities. These changes collectively enhance the script's security, reliability, and maintainability.
```shell
readonly AUDIT_CONFIG="${REPO_ROOT}/configs/code-audit-config.json"
readonly AUDIT_CONFIG_TEMPLATE="${REPO_ROOT}/configs/code-audit-config.json.txt"
```
Anchoring AUDIT_CONFIG and AUDIT_CONFIG_TEMPLATE to REPO_ROOT correctly resolves the critical issue of CWD-dependent config paths. This ensures that the script can be invoked from any directory within the repository and still locate its configuration files, significantly improving its reliability and preventing unexpected failures.
```shell
if [[ "$2" =~ ^[0-9]+$ ]] && [[ "$2" -gt 0 && "$2" -le 10000 ]]; then
    limit="$2"
else
    log_error "Invalid limit value: $2 (must be a positive integer, max 10000)"
    return 1
```
The added input validation for the --limit parameter is a critical security fix. By ensuring that the limit is a positive integer within a defined range (1-10000), the script effectively prevents SQL injection attacks that could exploit malformed or malicious input. This is a strong defense-in-depth measure for handling user-provided input.
References

- Employ a defense-in-depth strategy for handling user-provided input. Sanitize input at the entry point using a strict allowlist, and also apply context-specific escaping or safe handling mechanisms (e.g., parameterized queries for SQL, `--arg` for `jq`) at each point of use.
- To prevent SQL injection in shell scripts, sanitize input using a strict allowlist and also perform context-specific escaping (e.g., doubling single quotes for SQL strings) before interpolating the value into a query.
- When parsing delimited data from an untrusted source in a shell script, validate numeric fields before using them in calculations. This provides defense-in-depth against injection attacks that could result from delimiter shifting caused by malicious data.
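To make the two layers concrete, here is a minimal sketch combining an allowlist check (mirroring the `--limit` fix) with SQL quote-doubling at the point of use. The function names are hypothetical, not taken from the script:

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the defense-in-depth layers above.

# Layer 1: strict allowlist at the entry point (positive integer, 1..10000).
validate_limit() {
    [[ "$1" =~ ^[0-9]+$ ]] && (( $1 > 0 && $1 <= 10000 ))
}

# Layer 2: context-specific escaping at the point of use; double single
# quotes so the value is safe inside a SQL string literal.
escape_sql_string() {
    local s=$1
    printf '%s' "${s//"'"/"''"}"
}

validate_limit 500 && echo "limit accepted"
validate_limit "500; DROP TABLE findings" || echo "limit rejected"
escape_sql_string "O'Brien"; echo   # prints O''Brien
```

The allowlist rejects malformed input outright, and the escaping layer still protects any legitimately quoted string that reaches a query.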
```shell
rate_limit_retries=$((rate_limit_retries + 1))
if [[ $rate_limit_retries -ge $max_rate_limit_retries ]]; then
    log_error "Rate limited ${rate_limit_retries} times — giving up: ${endpoint}"
    return 1
fi
```
The new logic to check rate_limit_retries against max_rate_limit_retries before giving up on 429 responses is a robust improvement. It ensures that the script attempts to recover from rate limits a predefined number of times, preventing premature failure and improving overall success rates for API calls.
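The shape of that logic, as a self-contained sketch. The helper names and budget sizes are assumptions for illustration; `fetch_status` stands in for the real curl call:

```shell
#!/usr/bin/env bash
# Sketch: 429 responses draw from their own retry budget, so being
# rate-limited never consumes the transient-error retry budget.
request_with_budgets() {
    local retries=0 rate_limit_retries=0
    local max_retries="${MAX_RETRIES:-3}" max_rate_limit_retries=5
    local code
    while :; do
        fetch_status   # stand-in for the curl call; sets $code
        case "$code" in
            200) return 0 ;;
            429)
                rate_limit_retries=$((rate_limit_retries + 1))
                if [[ $rate_limit_retries -ge $max_rate_limit_retries ]]; then
                    echo "rate limited ${rate_limit_retries} times, giving up" >&2
                    return 1
                fi ;;
            *)
                retries=$((retries + 1))
                [[ $retries -lt $max_retries ]] || return 1 ;;
        esac
    done
}

# Demo stub: respond 429 twice, then 200.
attempt=0
fetch_status() {
    attempt=$((attempt + 1))
    if (( attempt <= 2 )); then code=429; else code=200; fi
}
request_with_budgets && echo "succeeded after ${attempt} attempts"
```

Two 429s followed by a success never touches the error budget; with a single shared counter, the same sequence would already have burned two of the three error retries.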
```shell
{
    echo "BEGIN TRANSACTION;"
    cat "$sql_file"
    echo "COMMIT;"
} |
```
Wrapping the batch INSERT statements in BEGIN TRANSACTION; and COMMIT; is a significant improvement for both atomicity and performance. This ensures that either all inserts succeed or none do, maintaining data integrity. Furthermore, it can drastically improve performance by reducing the number of fsync operations, as noted in the PR description.
```diff
     request_body=$(jq -nc --argjson limit "$CODACY_PAGE_SIZE" --arg cursor "$cursor" \
         '{limit: $limit, cursor: $cursor}')
 else
-    request_body="{\"limit\": ${CODACY_PAGE_SIZE}}"
+    request_body=$(jq -nc --argjson limit "$CODACY_PAGE_SIZE" '{limit: $limit}')
 fi
```
Using jq -nc --argjson and --arg for constructing the JSON request body is a critical security improvement. This method safely escapes shell variables into JSON, preventing potential injection vulnerabilities that could arise from directly interpolating values into a JSON string. This aligns with best practices for handling user-provided input and constructing data payloads securely.
References

- In shell scripts, use `jq --arg` for strings and `--argjson` for other JSON types (like numbers) to safely pass variables into a `jq` filter. This avoids syntax errors if the variables contain special characters.
- In shell scripts, use `jq --argjson` to safely pass JSON content from variables when merging JSON arrays, instead of direct string interpolation, to prevent parsing errors.
- Employ a defense-in-depth strategy for handling user-provided input. Sanitize input at the entry point using a strict allowlist, and also apply context-specific escaping or safe handling mechanisms (e.g., parameterized queries for SQL, `--arg` for `jq`) at each point of use.
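A quick standalone illustration of why this matters, assuming the `jq` CLI; the cursor value is a contrived example containing characters that would break a hand-built JSON string:

```shell
#!/usr/bin/env bash
# A cursor containing quotes or backslashes would corrupt naive string
# interpolation, but jq --arg escapes it into valid JSON.
cursor='abc"def\ghi'
body=$(jq -nc --argjson limit 100 --arg cursor "$cursor" \
    '{limit: $limit, cursor: $cursor}')
echo "$body"   # {"limit":100,"cursor":"abc\"def\\ghi"}
```

Round-tripping the value back out with `jq -r '.cursor'` recovers the original string exactly, which is the property direct interpolation cannot guarantee.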
```diff
@@ -243,7 +249,6 @@ codacy_api_request() {
         -s
         -w "%{http_code}"
         -o "$tmp_response"
     _save_cleanup_scope
     trap '_run_cleanups' RETURN
-    push_cleanup "rm -f '${tmp_response}' '${tmp_headers}'"
+    push_cleanup "rm -f '${tmp_response}'"
```
```shell
local tmp_dir="${HOME}/.aidevops/.agent-workspace/tmp"
mkdir -p "$tmp_dir" 2>/dev/null || true
sql_file=$(mktemp "${tmp_dir}/codacy-sql.XXXXXX")
jq_filter_file=$(mktemp "${tmp_dir}/codacy-jq.XXXXXX")
```
Similar to codacy_api_request, using a dedicated temporary directory ~/.aidevops/.agent-workspace/tmp for sql_file and jq_filter_file is a good practice. It centralizes temporary files and improves management. The mkdir -p with 2>/dev/null || true ensures graceful handling of directory creation.
References

- In status display functions, it is acceptable to suppress stderr (`2>/dev/null`) for commands that might fail transiently (e.g., parsing a file being written). This prioritizes clean, formatted output and graceful degradation over showing noisy, transient errors, especially when common failure cases are handled separately.
- Suppress stderr (`2>/dev/null`) for `find` commands when it is expected that the target directory may not exist. This prevents noisy, unactionable errors in status display scripts where a fallback value (e.g., `|| echo "0"`) is used.
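A minimal runnable sketch of the pattern, using a throwaway directory under `$TMPDIR` rather than the real `~/.aidevops` workspace:

```shell
#!/usr/bin/env bash
# Create the workspace tmp dir if needed, mktemp inside it with a
# recognizable template, and clean everything up on exit.
tmp_dir="${TMPDIR:-/tmp}/workspace-demo-$$"
mkdir -p "$tmp_dir" 2>/dev/null || true
sql_file=$(mktemp "${tmp_dir}/codacy-sql.XXXXXX")
trap 'rm -rf "$tmp_dir"' EXIT
echo "temp file: $sql_file"
```

Passing a template to `mktemp` keeps the files inside the dedicated directory and makes stray files easy to attribute during debugging.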
```shell
REPO_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
readonly REPO_ROOT
```
The REPO_ROOT variable is correctly declared and then made readonly in separate steps. This adheres to the ShellCheck SC2155 recommendation: combining declaration and assignment (e.g., `readonly var=$(cmd)`) masks the exit status of the command substitution, so separating them keeps failures detectable. This is a good practice for maintainability and robustness.
References

- When declaring and assigning local variables in shell scripts, separate the `local` declaration from the assignment if command substitution is involved, to ensure exit code safety. For consistency, this pattern should be maintained even for simple assignments.
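The exit-status masking is easy to demonstrate with two hypothetical functions:

```shell
#!/usr/bin/env bash
# SC2155: `local v=$(cmd)` returns local's own status (0), hiding cmd's failure.
f_masked() {
    local v="$(false)"
    echo "masked exit: $?"
}
# Separating declaration from assignment preserves the real exit status.
f_checked() {
    local v
    v="$(false)"
    echo "checked exit: $?"
}
f_masked    # prints: masked exit: 0
f_checked   # prints: checked exit: 1
```

The same masking applies to `readonly`, `export`, and `declare`, which is why the fix separates `REPO_ROOT=` from `readonly REPO_ROOT`.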
```diff
-tmp_response=$(mktemp)
-tmp_headers=$(mktemp)
+local tmp_response
```
…review

- Add defensive REPO_ROOT validation to fail fast if path resolution fails, matching the SCRIPT_DIR `|| exit` pattern (CodeRabbit nitpick)
- Replace `wc -l` line count with SQLite `total_changes()` for accurate committed row count after transaction (CodeRabbit nitpick)

Closes #4075



Summary

Addresses all 13 findings from PR #1384 review feedback (coderabbit, gemini-code-assist, human reviewers) on `.agents/scripts/codacy-collector-helper.sh`.

Closes #3711

Changes

CRITICAL

- `--limit`: Validate as positive integer (1-10000) before interpolation into SQL
- Anchor `AUDIT_CONFIG` and `AUDIT_CONFIG_TEMPLATE` to `REPO_ROOT` (derived from `SCRIPT_DIR`) instead of relative paths that break when invoked from non-repo CWD

HIGH

- Use `jq -nc` with `--arg`/`--argjson` to build cursor pagination request body safely
- Wrap batch inserts in `BEGIN TRANSACTION`/`COMMIT` for atomicity and performance (single fsync)

MEDIUM

- Remove dead `tmp_headers` allocation and `-D` curl arg; use `~/.aidevops/.agent-workspace/tmp/` for all temp files
- Fix SC2155: separate declaration and `readonly` for `REPO_ROOT`

Already fixed (verified present from b0c77ce)

- `${MAX_RETRIES:-3}` defaults throughout
- `${ERROR_UNKNOWN_COMMAND:-Unknown command:}` default

Verification

- `shellcheck -x` passes (only SC1091 info for external source, expected)