Merged
62 changes: 44 additions & 18 deletions .agents/scripts/codacy-collector-helper.sh
@@ -38,9 +38,11 @@ readonly CODACY_PAGE_SIZE=100
readonly CODACY_MAX_PAGES=50
readonly CODACY_RATE_LIMIT_WAIT=60
readonly CODACY_RETRY_BACKOFF_BASE=2
# Config file paths (working config preferred, template as fallback)
readonly AUDIT_CONFIG="configs/code-audit-config.json"
readonly AUDIT_CONFIG_TEMPLATE="configs/code-audit-config.json.txt"
# Config file paths anchored to repo root (not CWD-dependent)
REPO_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
readonly REPO_ROOT
Comment on lines +42 to +43
medium

The REPO_ROOT variable is assigned first and made readonly in a separate step. This follows the ShellCheck SC2155 recommendation: a combined `readonly REPO_ROOT="$(...)"` would mask the command substitution's exit status (readonly itself returns 0), so a failed `cd` could go unnoticed. Separating declaration from assignment keeps failures detectable and is good practice for maintainability and robustness.

References
  1. When declaring and assigning local variables in shell scripts, separate the local declaration from the assignment if command substitution is involved, to ensure exit code safety. For consistency, this pattern should be maintained even for simple assignments.
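A minimal sketch of the pattern this comment describes (the relative path and variable names mirror the diff; the commented-out counterexample is illustrative):

```shell
#!/usr/bin/env bash
# SC2155 pattern: separate the assignment from the readonly declaration.
set -euo pipefail

# Combined form would mask the substitution's exit status, so a failed
# cd would NOT trip `set -e`:
#   readonly REPO_ROOT="$(cd /nonexistent && pwd)"

# Separated form preserves the assignment's exit status:
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")/.." && pwd)"
readonly REPO_ROOT
echo "repo root: ${REPO_ROOT}"
```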

readonly AUDIT_CONFIG="${REPO_ROOT}/configs/code-audit-config.json"
readonly AUDIT_CONFIG_TEMPLATE="${REPO_ROOT}/configs/code-audit-config.json.txt"
Comment on lines +44 to +45
critical

Anchoring AUDIT_CONFIG and AUDIT_CONFIG_TEMPLATE to REPO_ROOT correctly resolves the critical issue of CWD-dependent config paths. This ensures that the script can be invoked from any directory within the repository and still locate its configuration files, significantly improving its reliability and preventing unexpected failures.


# =============================================================================
# Logging: uses shared log_* from shared-constants.sh with CODACY prefix
@@ -228,13 +230,17 @@ codacy_api_request() {
local attempt=0
local response=""
local http_code=""
local tmp_response tmp_headers

tmp_response=$(mktemp)
tmp_headers=$(mktemp)
local tmp_response
medium

The removal of tmp_headers addresses the identified dead code, simplifying the script and removing unnecessary resource allocation. This is a good cleanup for maintainability.

local tmp_dir="${HOME}/.aidevops/.agent-workspace/tmp"
mkdir -p "$tmp_dir" 2>/dev/null || true
tmp_response=$(mktemp "${tmp_dir}/codacy-resp.XXXXXX")
Comment on lines +234 to +236
medium

Using a dedicated temporary directory ~/.aidevops/.agent-workspace/tmp instead of the system default /tmp is a good practice: it centralizes the agent's temporary files, making them easier to manage and debug. The 2>/dev/null || true on mkdir -p suppresses non-critical failures (mkdir -p already tolerates an existing directory, so the guard covers cases like permission errors), aligning with the general rule about suppressing stderr for transient failures.

References
  1. In status display functions, it is acceptable to suppress stderr (2>/dev/null) for commands that might fail transiently (e.g., parsing a file being written). This prioritizes clean, formatted output and graceful degradation over showing noisy, transient errors, especially when common failure cases are handled separately.
  2. Suppress stderr (2>/dev/null) for find commands when it is expected that the target directory may not exist. This prevents noisy, unactionable errors in status display scripts where a fallback value (e.g., || echo "0") is used.
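A standalone sketch of this pattern (the directory name is taken from the diff; the fallback branch is an added assumption, not in the original):

```shell
#!/usr/bin/env bash
# mktemp into a dedicated workspace dir, with a hedged fallback.
tmp_dir="${HOME}/.aidevops/.agent-workspace/tmp"
mkdir -p "$tmp_dir" 2>/dev/null || true

if [[ -d "$tmp_dir" && -w "$tmp_dir" ]]; then
  tmp_response=$(mktemp "${tmp_dir}/codacy-resp.XXXXXX")
else
  # Fallback (assumption): use the system default if the dir is unusable.
  tmp_response=$(mktemp)
fi
echo "$tmp_response"
```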

_save_cleanup_scope
trap '_run_cleanups' RETURN
push_cleanup "rm -f '${tmp_response}' '${tmp_headers}'"
push_cleanup "rm -f '${tmp_response}'"
medium

The cleanup for tmp_headers is correctly removed, aligning with the variable's removal. This ensures that no non-existent files are attempted to be removed, preventing potential errors.
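The script's _save_cleanup_scope/push_cleanup helpers are project-specific and not shown in this diff; a generic sketch of the underlying stack-plus-RETURN-trap idea (all names here hypothetical):

```shell
#!/usr/bin/env bash
# Generic sketch: stack of cleanup commands run by a RETURN trap.
cleanup_cmds=()
push_cleanup() { cleanup_cmds+=("$1"); }
run_cleanups() {
  local cmd
  for cmd in "${cleanup_cmds[@]}"; do eval "$cmd"; done
  cleanup_cmds=()
}

fetch_something() {
  local tmp
  tmp=$(mktemp)
  trap 'run_cleanups' RETURN
  push_cleanup "rm -f '${tmp}'"
  echo "$tmp"   # the temp file is removed when the function returns
}
```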


# Rate-limit retries use a separate budget so 429s don't consume error retries
local rate_limit_retries=0
local max_rate_limit_retries=5
Comment on lines +241 to +243
high

Introducing rate_limit_retries and max_rate_limit_retries provides a dedicated budget for handling 429 (rate-limited) responses. This is a significant improvement, as it prevents rate-limiting issues from prematurely consuming the general error retry budget, enhancing the script's resilience and reliability when interacting with the API.


while [[ $attempt -lt ${MAX_RETRIES:-3} ]]; do
attempt=$((attempt + 1))
@@ -243,7 +249,6 @@ codacy_api_request() {
-s
-w "%{http_code}"
-o "$tmp_response"
medium

The removal of -D "$tmp_headers" from the curl command is appropriate, as the tmp_headers variable and its associated functionality have been removed. This keeps the curl command clean and focused on its primary task.

-D "$tmp_headers"
-H "api-token: ${CODACY_TOKEN}"
-H "Accept: application/json"
)
@@ -270,10 +275,17 @@ codacy_api_request() {
return 0
;;
429)
# Rate limited — extract retry-after or use default wait
# Rate limited — separate budget so 429s don't consume error retries
rate_limit_retries=$((rate_limit_retries + 1))
if [[ $rate_limit_retries -ge $max_rate_limit_retries ]]; then
log_error "Rate limited ${rate_limit_retries} times — giving up: ${endpoint}"
return 1
fi
Comment on lines +279 to +283
high

The new logic to check rate_limit_retries against max_rate_limit_retries before giving up on 429 responses is a robust improvement. It ensures that the script attempts to recover from rate limits a predefined number of times, preventing premature failure and improving overall success rates for API calls.

local retry_after="$CODACY_RATE_LIMIT_WAIT"
log_warn "Rate limited (429). Waiting ${retry_after}s before retry (attempt ${attempt}/${MAX_RETRIES})..."
log_warn "Rate limited (429). Waiting ${retry_after}s (rate-limit retry ${rate_limit_retries}/${max_rate_limit_retries})..."
sleep "$retry_after"
# Don't consume the error retry budget for rate limits
attempt=$((attempt - 1))
Comment on lines +287 to +288
high

Decrementing the main attempt counter when a 429 (rate-limited) response occurs is a clever and effective way to ensure that rate-limit retries do not consume the general error retry budget. This allows the script to handle transient rate limits without impacting its ability to retry for other types of errors, significantly improving its resilience.

continue
;;
401)
@@ -387,12 +399,13 @@ cmd_collect() {
while [[ "$has_more" == "true" && $page -lt $CODACY_MAX_PAGES ]]; do
page=$((page + 1))

# Build request body with cursor-based pagination
# Build request body with cursor-based pagination (jq for safe JSON escaping)
local request_body
if [[ -n "$cursor" ]]; then
request_body="{\"limit\": ${CODACY_PAGE_SIZE}, \"cursor\": \"${cursor}\"}"
request_body=$(jq -nc --argjson limit "$CODACY_PAGE_SIZE" --arg cursor "$cursor" \
'{limit: $limit, cursor: $cursor}')
else
request_body="{\"limit\": ${CODACY_PAGE_SIZE}}"
request_body=$(jq -nc --argjson limit "$CODACY_PAGE_SIZE" '{limit: $limit}')
fi
Comment on lines +405 to 409
high

Using jq -nc --argjson and --arg for constructing the JSON request body is a critical security improvement. This method safely escapes shell variables into JSON, preventing potential injection vulnerabilities that could arise from directly interpolating values into a JSON string. This aligns with best practices for handling user-provided input and constructing data payloads securely.

References
  1. In shell scripts, use jq --arg for strings and --argjson for other JSON types (like numbers) to safely pass variables into a jq filter. This avoids syntax errors if the variables contain special characters.
  2. In shell scripts, use jq --argjson to safely pass JSON content from variables when merging JSON arrays, instead of direct string interpolation, to prevent parsing errors.
  3. Employ a defense-in-depth strategy for handling user-provided input. Sanitize input at the entry point using a strict allowlist, and also apply context-specific escaping or safe handling mechanisms (e.g., parameterized queries for SQL, --arg for jq) at each point of use.
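The difference is easy to demonstrate with a cursor value containing a double quote (values here are illustrative):

```shell
#!/usr/bin/env bash
# jq-built request body vs. naive string interpolation.
page_size=100
cursor='abc"123'   # would produce invalid JSON if interpolated directly

request_body=$(jq -nc --argjson limit "$page_size" --arg cursor "$cursor" \
  '{limit: $limit, cursor: $cursor}')
echo "$request_body"   # prints {"limit":100,"cursor":"abc\"123"}
```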


local endpoint="/analysis/organizations/gh/${org}/repositories/${repo_name}/issues/search"
@@ -463,8 +476,10 @@ insert_findings() {
local response="$2"

local sql_file jq_filter_file
sql_file=$(mktemp)
jq_filter_file=$(mktemp)
local tmp_dir="${HOME}/.aidevops/.agent-workspace/tmp"
mkdir -p "$tmp_dir" 2>/dev/null || true
sql_file=$(mktemp "${tmp_dir}/codacy-sql.XXXXXX")
jq_filter_file=$(mktemp "${tmp_dir}/codacy-jq.XXXXXX")
Comment on lines +479 to +482
medium

Similar to codacy_api_request, using a dedicated temporary directory ~/.aidevops/.agent-workspace/tmp for sql_file and jq_filter_file is a good practice. It centralizes temporary files and improves management. The mkdir -p with 2>/dev/null || true ensures graceful handling of directory creation.

References
  1. In status display functions, it is acceptable to suppress stderr (2>/dev/null) for commands that might fail transiently (e.g., parsing a file being written). This prioritizes clean, formatted output and graceful degradation over showing noisy, transient errors, especially when common failure cases are handled separately.
  2. Suppress stderr (2>/dev/null) for find commands when it is expected that the target directory may not exist. This prevents noisy, unactionable errors in status display scripts where a fallback value (e.g., || echo "0") is used.

_save_cleanup_scope
trap '_run_cleanups' RETURN
push_cleanup "rm -f '${sql_file}'"
@@ -512,7 +527,13 @@ JQ_EOF

local count=0
if [[ -s "$sql_file" ]]; then
db "$AUDIT_DB" <"$sql_file" 2>/dev/null || log_warn "Some Codacy inserts may have failed"
# Wrap in transaction for atomicity and performance (single fsync)
{
echo "BEGIN TRANSACTION;"
cat "$sql_file"
echo "COMMIT;"
} |
Comment on lines +531 to +535
high

Wrapping the batch INSERT statements in BEGIN TRANSACTION; and COMMIT; is a significant improvement for both atomicity and performance. This ensures that either all inserts succeed or none do, maintaining data integrity. Furthermore, it can drastically improve performance by reducing the number of fsync operations, as noted in the PR description.
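A runnable sketch of the transaction wrapper, assuming the script's db helper is a thin wrapper over the sqlite3 CLI (the table schema and values here are hypothetical):

```shell
#!/usr/bin/env bash
# Wrap batched INSERTs in one transaction (atomicity + single fsync).
db_file=$(mktemp)
sql_file=$(mktemp)

sqlite3 "$db_file" "CREATE TABLE findings (id INTEGER, severity TEXT);"
printf '%s\n' \
  "INSERT INTO findings VALUES (1, 'high');" \
  "INSERT INTO findings VALUES (2, 'medium');" >"$sql_file"

{
  echo "BEGIN TRANSACTION;"
  cat "$sql_file"
  echo "COMMIT;"
} | sqlite3 "$db_file"

count=$(sqlite3 "$db_file" "SELECT COUNT(*) FROM findings;")
echo "inserted: $count"   # prints inserted: 2
```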

db "$AUDIT_DB" 2>/dev/null || log_warn "Some Codacy inserts may have failed"
count=$(wc -l <"$sql_file" | tr -d ' ')
fi

@@ -597,7 +618,12 @@ cmd_query() {
log_error "Missing value for --limit"
return 1
}
limit="$2"
if [[ "$2" =~ ^[0-9]+$ ]] && [[ "$2" -gt 0 && "$2" -le 10000 ]]; then
limit="$2"
else
log_error "Invalid limit value: $2 (must be a positive integer, max 10000)"
return 1
Comment on lines +621 to +625
critical

The added input validation for the --limit parameter is a critical security fix. By ensuring that the limit is a positive integer within a defined range (1-10000), the script effectively prevents SQL injection attacks that could exploit malformed or malicious input. This is a strong defense-in-depth measure for handling user-provided input.

References
  1. Employ a defense-in-depth strategy for handling user-provided input. Sanitize input at the entry point using a strict allowlist, and also apply context-specific escaping or safe handling mechanisms (e.g., parameterized queries for SQL, --arg for jq) at each point of use.
  2. To prevent SQL injection in shell scripts, sanitize input using a strict allowlist and also perform context-specific escaping (e.g., doubling single quotes for SQL strings) before interpolating the value into a query.
  3. When parsing delimited data from an untrusted source in a shell script, validate numeric fields before using them in calculations. This provides defense-in-depth against injection attacks that could result from delimiter shifting caused by malicious data.
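The allowlist check generalizes to a small reusable function (the bounds mirror the diff; the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Strict numeric allowlist for a user-supplied --limit value.
validate_limit() {
  local value="$1"
  # Regex allowlist first; the arithmetic bounds check only runs on
  # pure digit strings, so injection payloads never reach (( )).
  if [[ "$value" =~ ^[0-9]+$ ]] && (( value > 0 && value <= 10000 )); then
    return 0
  fi
  return 1
}
```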

fi
shift 2
;;
*)