chore: Enable use of local carbide-api instead of mock-core #197
Conversation
Pull request overview
Enables the local kind stack to use a locally running carbide-api (from bare-metal-manager-core) instead of the in-cluster mock-core, and adds a smoke-test script to validate the end-to-end API → core integration path.
Changes:
- Add `scripts/test-local-core.sh` to exercise auth, tenant/site discovery, and resource creation through the local core path.
- Update the `Makefile` `kind-reset` target to optionally skip building/deploying `mock-core` and patch `site-agent` to point at the local core with TLS and injected certs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| scripts/test-local-core.sh | New local integration smoke test script for API → core connectivity and basic workflows. |
| Makefile | Adds LOCAL_CORE mode to connect site-agent to a locally running core and manage certs/config patches during kind-reset. |
```sh
  local path="$1"
  local out http_code
  out=$(curl -s -w "\n%{http_code}" "$BASE_URL$path" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Accept: application/json")
  http_code=$(echo "$out" | tail -n1)
  body=$(echo "$out" | sed '$d')
  if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
    die "GET $path → HTTP $http_code: $body"
```
In api_get, body is assigned without being declared local, which leaks it into the global scope and makes the helper have hidden side effects. Declare local body (and include it in the local ... list) so each call is self-contained and can't accidentally overwrite a global body variable used elsewhere.
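A standalone sketch of the scoping pitfall this comment describes (illustrative only, not taken from the script):

```shell
#!/usr/bin/env bash
# Without `local`, an assignment inside a function writes to the caller's
# (global) scope; `local` keeps it confined to the function.
leaky() {
  body="from leaky"          # no `local` — clobbers the global `body`
}
clean() {
  local body="from clean"    # function-scoped; the global is untouched
}

body="original"
clean
echo "after clean: $body"    # still "original"
leaky
echo "after leaky: $body"    # now "from leaky"
```

This is why the helper should declare `local body`: without it, every call silently mutates whatever `body` the caller was using.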
```sh
  UPDATE_TAG=$(kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \
    psql -U "$PG_USER" -d "$PG_DB" -c \
    "UPDATE site SET status = 'Registered', updated = NOW() WHERE id = '$SITE_ID';" \
    2>&1 | grep -E '^UPDATE')
```
This command substitution relies on grep matching ^UPDATE. With set -euo pipefail, a no-match grep (exit 1) can terminate the script immediately, so the later explicit checks/die message may never run. Make the pipeline non-fatal (e.g., append || true / disable errexit just for this capture) and then validate UPDATE_TAG/exit status explicitly.
```diff
-    2>&1 | grep -E '^UPDATE')
+    2>&1 | grep -E '^UPDATE' || true)
```
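The failure mode is easy to reproduce outside the script; a minimal sketch (assuming bash with `set -euo pipefail`, as the script uses):

```shell
#!/usr/bin/env bash
set -euo pipefail

# grep exits 1 when nothing matches; under pipefail that status propagates
# through the pipeline, and errexit would abort the script during the
# command substitution. Appending `|| true` makes the capture non-fatal so
# the caller can validate the (possibly empty) result explicitly afterwards.
tag=$(echo "some psql output" | grep -E '^UPDATE' || true)

if [[ -z "$tag" ]]; then
  echo "no UPDATE tag captured — handled explicitly instead of aborting"
fi
```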
```sh
# Run a SQL statement inside the in-cluster postgres pod.
db_exec() {
  local sql="$1"
  kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \
    psql -U "$PG_USER" -d "$PG_DB" -tAc "$sql"
}
```
db_exec() is defined but not used anywhere in this script, which adds noise and can mislead readers about how DB access is performed. Remove it, or refactor the later direct kubectl exec ... psql call sites to use this helper consistently.
```diff
-# Run a SQL statement inside the in-cluster postgres pod.
-db_exec() {
-  local sql="$1"
-  kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \
-    psql -U "$PG_USER" -d "$PG_DB" -tAc "$sql"
-}
```
```make
	@if [ "$(LOCAL_CORE)" = "true" ]; then \
		echo "Creating core-grpc-client-site-agent-certs secret from $(LOCAL_CORE_CERTS_DIR)..." ; \
		for f in ca.crt client.crt client.key; do \
			[ -f "$(LOCAL_CORE_CERTS_DIR)/$$f" ] || { echo "ERROR: $$f not found in $(LOCAL_CORE_CERTS_DIR). Run gen-certs.sh in bare-metal-manager-core first."; exit 1; } ; \
		done ; \
		kubectl -n carbide-rest create secret generic core-grpc-client-site-agent-certs \
			--from-file=ca.crt="$(LOCAL_CORE_CERTS_DIR)/ca.crt" \
			--from-file=tls.crt="$(LOCAL_CORE_CERTS_DIR)/client.crt" \
			--from-file=tls.key="$(LOCAL_CORE_CERTS_DIR)/client.key" \
			--dry-run=client -o yaml | kubectl apply -f - ; \
```
When LOCAL_CORE=true, this creates/updates the core-grpc-client-site-agent-certs Secret, but deploy/kustomize/base/site-agent/certificate.yaml also defines a cert-manager Certificate that continuously reconciles the same secret name. That controller will likely overwrite the manually provided local-core certs, causing flapping/connection failures. Consider deleting/omitting the Certificate resource when LOCAL_CORE=true (e.g., kubectl delete certificate core-grpc-client-site-agent-certs) or using a distinct secret name for local-core certs and patching the site-agent deployment to mount that secret instead.
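A sketch of the first remedy suggested above. This is hypothetical: the resource name and namespace are taken from this comment, and the command is only printed here rather than run against a cluster:

```shell
#!/usr/bin/env bash
# When LOCAL_CORE=true, remove the cert-manager Certificate so it stops
# reconciling (and overwriting) the manually created secret.
LOCAL_CORE="${LOCAL_CORE:-true}"
cmd=""
if [ "$LOCAL_CORE" = "true" ]; then
  cmd="kubectl -n carbide-rest delete certificate core-grpc-client-site-agent-certs --ignore-not-found"
  echo "$cmd"   # printed instead of executed; run manually once verified
fi
```

The alternative (a distinct secret name plus a deployment patch) avoids deleting cluster resources but requires keeping the patch in sync with the site-agent manifests.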
Force-pushed from c8a1846 to 240cbc1.
🔐 TruffleHog Secret Scan: ✅ No secrets or credentials found. Code scanned for 700+ types of secrets and credentials.
🕐 Last updated: 2026-03-06 03:28:18 UTC | Commit: 240cbc1
🛡️ Vulnerability Scan: 🚨 Found 65 vulnerability(ies).
🔗 View full details in Security tab
🕐 Last updated: 2026-03-19 21:01:44 UTC | Commit: 411c2dc
Force-pushed from 240cbc1 to d12e81a.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
```sh
        # Resource already exists; the conflict response carries the existing id
        # under .data.id — reuse it so the script is idempotent across runs.
        existing_id=$(echo "$body" | jq -r '.data.id // empty')
        if [[ -n "$existing_id" ]]; then
            info "Already exists (HTTP 409) — reusing id: $existing_id"
            echo "{\"id\":\"$existing_id\"}"
            return
        fi
        die "POST $path → HTTP 409 (conflict) but no id in response: $body"
```
The script claims idempotency on HTTP 409 by extracting an existing resource id from .data.id, but many API conflict responses use validation errors (e.g., { "data": { "name": "..." } }) and do not include an id. In those cases existing_id will be empty and the script exits, so re-running the smoke test fails. Consider handling 409 by performing a follow-up lookup (e.g., GET list filtered by name/site and extracting .id) rather than assuming .data.id is present.
```diff
-        # Resource already exists; the conflict response carries the existing id
-        # under .data.id — reuse it so the script is idempotent across runs.
-        existing_id=$(echo "$body" | jq -r '.data.id // empty')
-        if [[ -n "$existing_id" ]]; then
-            info "Already exists (HTTP 409) — reusing id: $existing_id"
-            echo "{\"id\":\"$existing_id\"}"
-            return
-        fi
-        die "POST $path → HTTP 409 (conflict) but no id in response: $body"
+        # Resource already exists; the conflict response is expected to carry
+        # the existing id under .data.id — reuse it so the script is idempotent
+        # across runs.
+        existing_id=$(echo "$body" | jq -r '.data.id // empty')
+        if [[ -n "$existing_id" ]]; then
+            info "Already exists (HTTP 409) — reusing id from conflict response: $existing_id"
+            echo "{\"id\":\"$existing_id\"}"
+            return
+        fi
+        # Some APIs return validation-style conflict responses without an id.
+        # In that case, attempt a follow-up lookup (e.g., by name) to recover
+        # the existing resource id and keep the script idempotent.
+        local name lookup_out lookup_code lookup_body lookup_id
+        name=$(echo "$payload" | jq -r '.name // .data.name // empty')
+        if [[ -n "$name" ]]; then
+            info "HTTP 409 without id; attempting lookup by name: \"$name\""
+            lookup_out=$(curl -s -w "\n%{http_code}" "$BASE_URL$path?name=$name" \
+                -H "Authorization: Bearer $TOKEN" \
+                -H "Accept: application/json")
+            lookup_code=$(echo "$lookup_out" | tail -n1)
+            lookup_body=$(echo "$lookup_out" | sed '$d')
+            if [[ "$lookup_code" -ge 200 && "$lookup_code" -lt 300 ]]; then
+                # Try a few common response shapes to find an id.
+                lookup_id=$(echo "$lookup_body" | jq -r '.data.id // .data[0].id // .items[0].id // .id // empty')
+                if [[ -n "$lookup_id" ]]; then
+                    info "Already exists (HTTP 409) — reusing id from lookup: $lookup_id"
+                    echo "{\"id\":\"$lookup_id\"}"
+                    return
+                fi
+            fi
+        fi
+        die "POST $path → HTTP 409 (conflict) but no id in response or lookup: $body"
```
```sh
# Override defaults via environment variables:
#   API_URL, KEYCLOAK_URL, ORG, SITE_NAME, PG_NAMESPACE, PG_STATEFULSET, PG_USER, PG_DB
#

set -euo pipefail

API_URL="${API_URL:-http://localhost:8388}"
KEYCLOAK_URL="${KEYCLOAK_URL:-http://localhost:8082}"
ORG="${ORG:-test-org}"
SITE_NAME="${SITE_NAME:-local-dev-site}"

# Postgres access via kubectl exec (no local psql or port-forward needed)
PG_NAMESPACE="${PG_NAMESPACE:-postgres}"
PG_STATEFULSET="${PG_STATEFULSET:-statefulset/postgres}"
PG_USER="${PG_USER:-forge}"
PG_DB="${PG_DB:-forge}"

BASE_URL="$API_URL/v2/org/$ORG/carbide"

# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

step() { echo; echo "──── $* ────"; } >&2
ok()   { echo "  ✓ $*"; } >&2
info() { echo "    $*"; } >&2
die()  { echo; echo "ERROR: $*"; exit 1; } >&2

api_get() {
    local path="$1"
    local out http_code
    out=$(curl -s -w "\n%{http_code}" "$BASE_URL$path" \
        -H "Authorization: Bearer $TOKEN" \
        -H "Accept: application/json")
    http_code=$(echo "$out" | tail -n1)
    body=$(echo "$out" | sed '$d')
    if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
        die "GET $path → HTTP $http_code: $body"
    fi
    echo "$body"
}

api_post() {
    local path="$1"
    local payload="$2"
    local out http_code body existing_id
    out=$(curl -s -w "\n%{http_code}" -X POST "$BASE_URL$path" \
        -H "Authorization: Bearer $TOKEN" \
        -H "Content-Type: application/json" \
        -H "Accept: application/json" \
        -d "$payload")
    http_code=$(echo "$out" | tail -n1)
    body=$(echo "$out" | sed '$d')
    if [[ "$http_code" == "409" ]]; then
        # Resource already exists; the conflict response carries the existing id
        # under .data.id — reuse it so the script is idempotent across runs.
        existing_id=$(echo "$body" | jq -r '.data.id // empty')
        if [[ -n "$existing_id" ]]; then
            info "Already exists (HTTP 409) — reusing id: $existing_id"
            echo "{\"id\":\"$existing_id\"}"
            return
        fi
        die "POST $path → HTTP 409 (conflict) but no id in response: $body"
    fi
    if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
        die "POST $path → HTTP $http_code: $body"
    fi
    echo "$body"
}

# Run a SQL statement inside the in-cluster postgres pod.
db_exec() {
    local sql="$1"
    kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \
        psql -U "$PG_USER" -d "$PG_DB" -tAc "$sql"
}

# ---------------------------------------------------------------------------
# 1. Authenticate
# ---------------------------------------------------------------------------

step "Authenticating via Keycloak"

TOKEN=$(curl -sf -X POST \
    "$KEYCLOAK_URL/realms/carbide-dev/protocol/openid-connect/token" \
    -H "Content-Type: application/x-www-form-urlencoded" \
    -d "client_id=carbide-api" \
    -d "client_secret=carbide-local-secret" \
    -d "grant_type=password" \
    -d "username=admin@example.com" \
    -d "password=adminpassword" \
    | jq -r '.access_token')
```
The header says defaults can be overridden via environment variables, but the Keycloak username/password/client_secret are hard-coded below. If different local dev credentials are needed, users have to edit the script. Consider adding env overrides (e.g., KEYCLOAK_USER, KEYCLOAK_PASSWORD, KEYCLOAK_CLIENT_SECRET) and listing them in the "Override defaults" comment block.
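A sketch of the suggested overrides. The variable names are the reviewer's proposal (not existing script variables), and the defaults mirror the currently hard-coded values:

```shell
#!/usr/bin/env bash
# ${VAR:-default} keeps today's behavior while letting users override the
# credentials from the environment instead of editing the script.
KEYCLOAK_USER="${KEYCLOAK_USER:-admin@example.com}"
KEYCLOAK_PASSWORD="${KEYCLOAK_PASSWORD:-adminpassword}"
KEYCLOAK_CLIENT_SECRET="${KEYCLOAK_CLIENT_SECRET:-carbide-local-secret}"

echo "authenticating as $KEYCLOAK_USER"
```

The token request would then pass `-d "username=$KEYCLOAK_USER"` and friends, and the new names would be added to the "Override defaults" comment block at the top of the script.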
Force-pushed from d12e81a to 18e8f6c.
Should we update README to reflect how to use local core?
Force-pushed from 18e8f6c to 6ed4b25.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
Walkthrough

Add optional local Carbide Core support to Makefile and deploy flows (kustomize & helm), enabling site-agent to connect to an external/local core with cert handling and ConfigMap overrides; add an end-to-end smoke test script that authenticates via Keycloak and exercises tenant, site, IP block, allocation, and VPC API flows.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Tester as Tester (scripts/test-local-core.sh)
    participant Keycloak as Keycloak
    participant API as Carbide API
    participant DB as PostgreSQL
    participant K8s as Kubernetes
    Tester->>Keycloak: POST /auth (credentials) -> token
    Keycloak-->>Tester: bearer token
    Tester->>API: GET /tenant/current (Authorization)
    API->>DB: SELECT tenant
    DB-->>API: tenant data
    API-->>Tester: tenant details
    Tester->>API: GET /site?name=SITE_NAME
    API->>DB: SELECT site
    DB-->>API: site id & status
    API-->>Tester: site info
    alt site.status != "Registered"
        Tester->>K8s: kubectl exec postgres -> UPDATE site SET status='Registered'
        K8s->>DB: execute SQL
        DB-->>K8s: rows affected
        Tester->>API: GET /site/:id (re-check)
        API->>DB: SELECT updated site
        DB-->>API: updated site
        API-->>Tester: confirmed Registered
    end
    Tester->>API: POST /ipblock (body)
    API->>DB: INSERT ipblock
    DB-->>API: ipblock id
    API-->>Tester: ipblock created
    Tester->>API: POST /allocation (body referencing ipblock, site, tenant)
    API->>DB: INSERT allocation
    DB-->>API: allocation id
    API-->>Tester: allocation created
    Tester->>API: POST /vpc (body)
    API->>DB: INSERT vpc
    DB-->>API: vpc id & status
    API-->>Tester: vpc created
    Tester-->>Tester: print summary of tenant/site/ipblock/allocation/vpc ids & vpc status
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🧹 Nitpick comments (1)
Makefile (1)
510-534: Extract the LOCAL_CORE site-agent bootstrap into one shared target. These two recipe blocks are effectively identical, including the hard-coded carbide-rest-local-control-plane lookup. Duplicating the cert checks, host resolution, ConfigMap patch, and rollout logic makes the Helm and Kustomize flows easy to drift apart. A single phony target/define block, using $(KIND_CLUSTER_NAME) for the container name, would keep this maintainable.
Also applies to: 611-634
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 510 - 534, Extract the duplicated LOCAL_CORE bootstrap logic into a single reusable Makefile target (or a define block) and invoke it from both places instead of duplicating the recipe; move the cert existence checks (checking $(LOCAL_CORE_CERTS_DIR)/ca.crt, client.crt, client.key), the kubectl secret creation for core-grpc-client-site-agent-certs, the CORE_HOST resolution (currently using the hard-coded container name carbide-rest-local-control-plane) and the ConfigMap patch/rollout steps into that shared target and parameterize the container name using $(KIND_CLUSTER_NAME) (or accept it as a target argument), then replace the original blocks (the one shown using LOCAL_CORE and the similar block at lines ~611-634) with calls to the new phony target so both Helm and Kustomize flows use the exact same logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Makefile`:
- Around line 284-286: The kind-apply target fails because KUSTOMIZE_OVERLAY
points to a non-existent composite overlay (deploy/kustomize/overlays/local)
after refactor; create that composite overlay or change kind-apply to apply the
per-component overlays instead. Update the Makefile so kind-apply (and
kind-deploy) either: 1) references a new deploy/kustomize/overlays/local
kustomization that uses resources: site-manager, db, workflow, api, mock-core,
site-agent, or 2) iterates over the same per-component overlays used by
kind-reset-kustomize (site-manager, db, workflow, api, mock-core, site-agent)
and runs kubectl apply -k for each. Also ensure the mock-core inclusion is gated
by LOCAL_CORE exactly like the build/load and kind-reset-kustomize flows (use
the LOCAL_CORE conditional around including or applying mock-core) so the image
build/load and apply steps remain consistent.
In `@scripts/test-local-core.sh`:
- Around line 88-97: The current api_post() handling that treats HTTP 409 as
success makes the smoke test idempotent but allows reruns to skip exercising the
create path; fix by either (A) making resource names unique per run (e.g.,
append a timestamp or UUID when building names for the three POST calls) so POST
always attempts a real create, or (B) change api_post() so a 409 is reported as
"skipped/not-validated" instead of success: detect 409 in the function that
currently extracts existing_id and return a distinct non-success status (or
print a skip message and return a non-zero or special code) so the caller can
treat it as skipped and not declare the smoke test passed; update the logic that
currently prints "Smoke test passed" to fail the run or mark it skipped when
api_post() reports a 409. Ensure you update both the /vpc POST path and the
other POST calls referenced in the same script block (the other POST ranges
noted).
- Around line 173-180: The pipeline that assigns UPDATE_TAG is brittle under set
-euo pipefail because grep returning non-zero will abort; instead capture the
full kubectl exec output into a temporary variable (e.g. UPDATE_OUT) by running
the psql command and redirecting 2>&1, then run the grep on that captured output
(or run grep with a non-fatal fallback) to populate UPDATE_TAG, and finally use
UPDATED_ROWS extraction and existing die checks; update references to UPDATE_TAG
and UPDATED_ROWS accordingly so the script doesn't exit before your explicit
error handling.
---
Nitpick comments:
In `@Makefile`:
- Around line 510-534: Extract the duplicated LOCAL_CORE bootstrap logic into a
single reusable Makefile target (or a define block) and invoke it from both
places instead of duplicating the recipe; move the cert existence checks
(checking $(LOCAL_CORE_CERTS_DIR)/ca.crt, client.crt, client.key), the kubectl
secret creation for core-grpc-client-site-agent-certs, the CORE_HOST resolution
(currently using the hard-coded container name carbide-rest-local-control-plane)
and the ConfigMap patch/rollout steps into that shared target and parameterize
the container name using $(KIND_CLUSTER_NAME) (or accept it as a target
argument), then replace the original blocks (the one shown using LOCAL_CORE and
the similar block at lines ~611-634) with calls to the new phony target so both
Helm and Kustomize flows use the exact same logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7bc2f5a3-439c-4199-be1a-7091a86bdfe1
📒 Files selected for processing (2)
Makefile, scripts/test-local-core.sh
```sh
    if [[ "$http_code" == "409" ]]; then
        # Resource already exists; the conflict response carries the existing id
        # under .data.id — reuse it so the script is idempotent across runs.
        existing_id=$(echo "$body" | jq -r '.data.id // empty')
        if [[ -n "$existing_id" ]]; then
            info "Already exists (HTTP 409) — reusing id: $existing_id"
            echo "{\"id\":\"$existing_id\"}"
            return
        fi
        die "POST $path → HTTP 409 (conflict) but no id in response: $body"
```
409 reuse makes reruns stop testing the create path this smoke test is meant to validate.
api_post() treats a conflict as success, and the three POSTs all use fixed names/prefixes. On a rerun, the /vpc call can 409, skip the actual create, and the script still prints "Smoke test passed" even though the carbide-core path was never exercised again. Please either generate per-run resource names or surface 409 as a skipped/not-validated result instead of a pass.
Also applies to: 197-251
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/test-local-core.sh` around lines 88 - 97, The current api_post()
handling that treats HTTP 409 as success makes the smoke test idempotent but
allows reruns to skip exercising the create path; fix by either (A) making
resource names unique per run (e.g., append a timestamp or UUID when building
names for the three POST calls) so POST always attempts a real create, or (B)
change api_post() so a 409 is reported as "skipped/not-validated" instead of
success: detect 409 in the function that currently extracts existing_id and
return a distinct non-success status (or print a skip message and return a
non-zero or special code) so the caller can treat it as skipped and not declare
the smoke test passed; update the logic that currently prints "Smoke test
passed" to fail the run or mark it skipped when api_post() reports a 409. Ensure
you update both the /vpc POST path and the other POST calls referenced in the
same script block (the other POST ranges noted).
```sh
  UPDATE_TAG=$(kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \
    psql -U "$PG_USER" -d "$PG_DB" -c \
    "UPDATE site SET status = 'Registered', updated = NOW() WHERE id = '$SITE_ID';" \
    2>&1 | grep -E '^UPDATE')

  [[ -z "$UPDATE_TAG" ]] && die "DB update produced no UPDATE tag — check site ID and postgres connectivity"
  UPDATED_ROWS=$(echo "$UPDATE_TAG" | awk '{print $2}')
  [[ "$UPDATED_ROWS" -eq 0 ]] && die "DB update matched 0 rows — site id '$SITE_ID' not found in table"
```
🧩 Analysis chain
🏁 Script executed:

```sh
# First, let's examine the script and look for the set directive and the problematic code
head -20 scripts/test-local-core.sh
```

Repository: NVIDIA/ncx-infra-controller-rest
Length of output: 853
🏁 Script executed:

```sh
# Also look at the specific lines mentioned in the review
sed -n '170,185p' scripts/test-local-core.sh
```
Length of output: 888
🏁 Script executed:

```sh
# Check if set -euo pipefail is present in the script
grep -n "set -" scripts/test-local-core.sh
```
Length of output: 96
🏁 Script executed:

```sh
# Test grep behavior when pattern not found
bash -c "echo 'some output' | grep -E '^UPDATE'; echo \"Exit code: \$?\""
```
Length of output: 88
🏁 Script executed:

```sh
# Test the full scenario with pipefail
bash -c "set -euo pipefail; UPDATE_TAG=\$(echo 'some output' | grep -E '^UPDATE'); echo \"UPDATE_TAG=\$UPDATE_TAG\""
```
Length of output: 58
🏁 Script executed:

```sh
# Test what happens with a failing grep in pipefail mode
bash -c "set -euo pipefail; UPDATE_TAG=\$(echo 'some output' | grep -E '^UPDATE' || echo 'not found'); echo \"UPDATE_TAG=\$UPDATE_TAG\""
```
Length of output: 96
The DB-update error handling is unreachable under set -euo pipefail.
The script has set -euo pipefail enabled at line 39. When grep -E '^UPDATE' finds no match, it returns exit code 1. With pipefail active, this causes the entire pipeline assignment to fail, and the script exits immediately before reaching the error check on line 178. Capture the kubectl exec output first and make the grep non-fatal before checking UPDATE_TAG.
Suggested fix
```diff
-  UPDATE_TAG=$(kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \
-    psql -U "$PG_USER" -d "$PG_DB" -c \
-    "UPDATE site SET status = 'Registered', updated = NOW() WHERE id = '$SITE_ID';" \
-    2>&1 | grep -E '^UPDATE')
-
-  [[ -z "$UPDATE_TAG" ]] && die "DB update produced no UPDATE tag — check site ID and postgres connectivity"
+  UPDATE_OUTPUT=$(kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \
+    psql -U "$PG_USER" -d "$PG_DB" -c \
+    "UPDATE site SET status = 'Registered', updated = NOW() WHERE id = '$SITE_ID';" \
+    2>&1) || die "DB update failed — check site ID and postgres connectivity"
+  UPDATE_TAG=$(printf '%s\n' "$UPDATE_OUTPUT" | grep -E '^UPDATE' || true)
+
+  [[ -z "$UPDATE_TAG" ]] && die "DB update produced no UPDATE tag — check site ID and postgres connectivity"
```

📝 Committable suggestion
+ [[ -z "$UPDATE_TAG" ]] && die "DB update produced no UPDATE tag — check site ID and postgres connectivity"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-UPDATE_TAG=$(kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \
-  psql -U "$PG_USER" -d "$PG_DB" -c \
-  "UPDATE site SET status = 'Registered', updated = NOW() WHERE id = '$SITE_ID';" \
-  2>&1 | grep -E '^UPDATE')
-
-[[ -z "$UPDATE_TAG" ]] && die "DB update produced no UPDATE tag — check site ID and postgres connectivity"
-UPDATED_ROWS=$(echo "$UPDATE_TAG" | awk '{print $2}')
-[[ "$UPDATED_ROWS" -eq 0 ]] && die "DB update matched 0 rows — site id '$SITE_ID' not found in table"
+UPDATE_OUTPUT=$(kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \
+  psql -U "$PG_USER" -d "$PG_DB" -c \
+  "UPDATE site SET status = 'Registered', updated = NOW() WHERE id = '$SITE_ID';" \
+  2>&1) || die "DB update failed — check site ID and postgres connectivity"
+UPDATE_TAG=$(printf '%s\n' "$UPDATE_OUTPUT" | grep -E '^UPDATE' || true)
+
+[[ -z "$UPDATE_TAG" ]] && die "DB update produced no UPDATE tag — check site ID and postgres connectivity"
+UPDATED_ROWS=$(echo "$UPDATE_TAG" | awk '{print $2}')
+[[ "$UPDATED_ROWS" -eq 0 ]] && die "DB update matched 0 rows — site id '$SITE_ID' not found in table"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/test-local-core.sh` around lines 173 - 180, The pipeline that assigns
UPDATE_TAG is brittle under set -euo pipefail because grep returning non-zero
will abort; instead capture the full kubectl exec output into a temporary
variable (e.g. UPDATE_OUT) by running the psql command and redirecting 2>&1,
then run the grep on that captured output (or run grep with a non-fatal
fallback) to populate UPDATE_TAG, and finally use UPDATED_ROWS extraction and
existing die checks; update references to UPDATE_TAG and UPDATED_ROWS
accordingly so the script doesn't exit before your explicit error handling.
Force-pushed from 6ed4b25 to 411c2dc.
🧹 Nitpick comments (2)
Makefile (1)
510-534: Extract duplicated LOCAL_CORE setup logic into a reusable target. The LOCAL_CORE secret creation, IPv4 resolution, ConfigMap patching, and restart logic is duplicated verbatim (~25 lines) between kind-reset-kustomize and helm-deploy-site-agent. This increases maintenance burden and risk of divergence.
♻️ Suggested approach: Extract to a shared target
Create a new target that both can invoke:
```make
# Configure site-agent to connect to local carbide-api (called when LOCAL_CORE=true)
_setup-local-core-site-agent:
	@echo "Creating core-grpc-client-site-agent-certs secret from $(LOCAL_CORE_CERTS_DIR)..."
	@for f in ca.crt client.crt client.key; do \
		[ -f "$(LOCAL_CORE_CERTS_DIR)/$$f" ] || { echo "ERROR: $$f not found in $(LOCAL_CORE_CERTS_DIR). Run gen-certs.sh in bare-metal-manager-core first."; exit 1; } ; \
	done
	kubectl -n carbide-rest create secret generic core-grpc-client-site-agent-certs \
		--from-file=ca.crt="$(LOCAL_CORE_CERTS_DIR)/ca.crt" \
		--from-file=tls.crt="$(LOCAL_CORE_CERTS_DIR)/client.crt" \
		--from-file=tls.key="$(LOCAL_CORE_CERTS_DIR)/client.key" \
		--dry-run=client -o yaml | kubectl apply -f -
	@CORE_HOST="$(LOCAL_CORE_HOST)" ; \
	IPV4=$$(docker exec $$(docker ps -qf name=carbide-rest-local-control-plane) \
		getent ahosts "$$CORE_HOST" 2>/dev/null \
		| awk '/STREAM/{print $$1}' | grep -v ':' | head -1) ; \
	if [ -n "$$IPV4" ]; then \
		echo "Resolved $$CORE_HOST → $$IPV4 (using IPv4 to avoid Go IPv6-first dialer issue)" ; \
		CORE_HOST="$$IPV4" ; \
	fi ; \
	echo "Patching site-agent ConfigMap: CARBIDE_ADDRESS=$$CORE_HOST:$(LOCAL_CORE_PORT), CARBIDE_SEC_OPT=1 (ServerTLS)..." ; \
	kubectl -n carbide-rest patch configmap carbide-rest-site-agent-config --type merge \
		-p "{\"data\":{\"CARBIDE_ADDRESS\":\"$$CORE_HOST:$(LOCAL_CORE_PORT)\",\"CARBIDE_SEC_OPT\":\"1\"}}"
	@echo "Restarting site-agent to pick up new CARBIDE_ADDRESS and certs..."
	kubectl -n carbide-rest rollout restart statefulset/carbide-rest-site-agent
	kubectl -n carbide-rest rollout status statefulset/carbide-rest-site-agent --timeout=120s
```

Then in both targets:
```make
	@if [ "$(LOCAL_CORE)" = "true" ]; then \
		$(MAKE) _setup-local-core-site-agent ; \
	fi
```

Also applies to: 611-634
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 510 - 534, Duplicate LOCAL_CORE setup logic (secret creation, IPv4 resolution, ConfigMap patch, and restart) should be extracted into a single reusable make target (e.g., _setup-local-core-site-agent) and invoked from both kind-reset-kustomize and helm-deploy-site-agent when LOCAL_CORE=true; create the new target containing the existing sequence (keeping symbols like LOCAL_CORE_CERTS_DIR, LOCAL_CORE_HOST, LOCAL_CORE_PORT, and kubectl commands intact), then replace the duplicated inline if-blocks in kind-reset-kustomize and helm-deploy-site-agent with a conditional that calls $(MAKE) _setup-local-core-site-agent so both targets share the same implementation.
scripts/test-local-core.sh (1)
106-110: Remove unused db_exec helper function. The db_exec function is defined but never called—the DB update at lines 173-176 uses kubectl exec directly with different flags (-c instead of -tAc). Either use this helper or remove it.
♻️ Option 1: Remove unused function
db_execfunction is defined but never called—the DB update at lines 173-176 useskubectl execdirectly with different flags (-cinstead of-tAc). Either use this helper or remove it.♻️ Option 1: Remove unused function
```diff
 # Run a SQL statement inside the in-cluster postgres pod.
 db_exec() {
   local sql="$1"
   kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \
-    psql -U "$PG_USER" -d "$PG_DB" -tAc "$sql"
+    psql -U "$PG_USER" -d "$PG_DB" -c "$sql" 2>&1
 }
```

Then use it at line 173:
# Run a SQL statement inside the in-cluster postgres pod. db_exec() { local sql="$1" kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \ - psql -U "$PG_USER" -d "$PG_DB" -tAc "$sql" + psql -U "$PG_USER" -d "$PG_DB" -c "$sql" 2>&1 }Then use it at line 173:
- UPDATE_TAG=$(kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \ - psql -U "$PG_USER" -d "$PG_DB" -c \ - "UPDATE site SET status = 'Registered', updated = NOW() WHERE id = '$SITE_ID';" \ - 2>&1 | grep -E '^UPDATE') + UPDATE_OUTPUT=$(db_exec "UPDATE site SET status = 'Registered', updated = NOW() WHERE id = '$SITE_ID';") \ + || die "DB update failed — check site ID and postgres connectivity" + UPDATE_TAG=$(printf '%s\n' "$UPDATE_OUTPUT" | grep -E '^UPDATE' || true)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/test-local-core.sh` around lines 106 - 110, The db_exec helper (function db_exec) is defined but unused—remove the entire db_exec function definition to avoid dead code; alternatively, if you prefer to keep it, update its kubectl flags to accept/include the container flag (e.g., add -c "$PG_CONTAINER" or a parameter for container) and replace the direct kubectl exec invocation currently used for the DB update with a call to db_exec so the helper is actually used.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9d5da212-9fc6-46a7-9bdc-3022e96f8784
📒 Files selected for processing (2)
Makefile, scripts/test-local-core.sh
411c2dc to e25cd32 (Compare)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Makefile (1)
316-317: ⚠️ Potential issue | 🔴 Critical

`deploy/kustomize/overlays/local` does not exist and must be created, or the `kind-apply` target will fail.

The Makefile defines `KUSTOMIZE_OVERLAY := deploy/kustomize/overlays/local` (line 258), but this directory is missing. The repository uses per-component overlays (api, cert-manager, db, mock-core, site-agent, site-manager, workflow) instead of a single local overlay. Either create a `deploy/kustomize/overlays/local/kustomization.yaml` that references all components, or update the `kind-apply` target to deploy each overlay individually using the pattern shown in the deployment documentation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 316 - 317, The kind-apply Makefile target references KUSTOMIZE_OVERLAY (set to deploy/kustomize/overlays/local) which doesn't exist; either create deploy/kustomize/overlays/local/kustomization.yaml that aggregates the per-component overlays (api, cert-manager, db, mock-core, site-agent, site-manager, workflow) or change the kind-apply target to iterate/apply each overlay individually (e.g., kubectl apply -k deploy/kustomize/overlays/<component>) so the Makefile uses the existing per-component overlays instead of the missing local overlay.
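The second option (applying each overlay individually) can be a short loop. A sketch, assuming the per-component overlays listed above are the full set; the apply command is injectable so the sequence can be previewed without a cluster:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Overlay names taken from the review comment above; the order is a guess
# (infra first, site-agent last) and may need adjusting.
overlays=(cert-manager db mock-core api workflow site-manager site-agent)

apply_overlays() {
  # $1: command used to apply each overlay, e.g. "kubectl" for real use.
  local cmd=$1 overlay
  for overlay in "${overlays[@]}"; do
    "$cmd" apply -k "deploy/kustomize/overlays/$overlay"
  done
}

# Dry preview: print the argument list each kubectl call would receive.
apply_overlays echo
```

Passing `kubectl` instead of `echo` would perform the real applies, one overlay at a time.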
🧹 Nitpick comments (2)
Makefile (1)
611-634: Consider extracting duplicated LOCAL_CORE setup logic into a shared script.

The secret creation, IPv4 resolution, ConfigMap patching, and site-agent restart logic at lines 611-634 is nearly identical to lines 510-534. Extracting this into a shared script (e.g., `scripts/setup-local-core.sh`) would reduce the maintenance burden and ensure consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 611 - 634, duplicate LOCAL_CORE setup logic (secret creation, IPv4 resolution, ConfigMap patch, and site-agent restart) should be extracted into a reusable script; create a script (e.g., scripts/setup-local-core.sh) that accepts LOCAL_CORE_CERTS_DIR, LOCAL_CORE_HOST, LOCAL_CORE_PORT and performs the checks for ca.crt/client.crt/client.key, kubectl secret creation for core-grpc-client-site-agent-certs, the IPv4 resolution logic that sets CORE_HOST/IPV4, kubectl patch of configmap carbide-rest-site-agent-config (CARBIDE_ADDRESS/CARBIDE_SEC_OPT), and rollout restart of statefulset/carbide-rest-site-agent, then replace both Makefile blocks (the sections guarded by if [ "$(LOCAL_CORE)" = "true" ]) with calls to that script forwarding the same variables.

scripts/test-local-core.sh (1)
17-37: Consider adding README documentation for LOCAL_CORE usage.

Per the PR comment from thossain-nv, it would be helpful to document how to use `LOCAL_CORE=true` in the README, including:

- Prerequisites (running carbide-api locally, generating certs)
- Example command: `make kind-reset LOCAL_CORE=true`
- How to run this smoke test: `./scripts/test-local-core.sh`

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/test-local-core.sh` around lines 17 - 37, Add a short README section documenting LOCAL_CORE usage: describe prerequisites (running carbide-api locally, generating TLS certs), list relevant environment overrides (API_URL, KEYCLOAK_URL, ORG, SITE_NAME, PG_NAMESPACE, PG_STATEFULSET, PG_USER, PG_DB), show the example command to reset kind with LOCAL_CORE enabled (make kind-reset LOCAL_CORE=true) and show how to run the integration smoke test (./scripts/test-local-core.sh); place this under the repo README or CONTRIBUTING and reference the scripts/test-local-core.sh script and the LOCAL_CORE env var so users can follow the steps end-to-end.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a276b671-e0a4-4af5-bf3f-3e7150217b99
📒 Files selected for processing (2)
Makefile, scripts/test-local-core.sh
🧹 Nitpick comments (2)
Makefile (2)
499-503: Redundant mock-core apply.

This block duplicates the mock-core setup already performed in `kind-reset-infra` (lines 451-455). Since `kind-reset-kustomize` depends on `kind-reset-infra`, this apply is redundant (though harmless due to idempotency).

♻️ Suggested removal
-	@if [ "$(LOCAL_CORE)" != "true" ]; then \
-		echo "Setting up Carbide Mock Core..." ; \
-		kubectl apply -k deploy/kustomize/overlays/mock-core ; \
-		kubectl -n carbide-rest rollout status deployment/carbide-rest-mock-core --timeout=240s ; \
-	fi
-

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 499 - 503, Remove the redundant mock-core setup in the conditional that checks LOCAL_CORE by deleting the kubectl apply and rollout status commands (the "kubectl apply -k deploy/kustomize/overlays/mock-core" and "kubectl -n carbide-rest rollout status deployment/carbide-rest-mock-core --timeout=240s" lines) inside the `@if` [ "$(LOCAL_CORE)" != "true" ]; then ... fi block in the Makefile; this setup is already performed by the kind-reset-infra target, so keep the LOCAL_CORE guard but remove the duplicated apply/rollout commands to avoid redundancy.
512-536: Extract duplicated local-core setup into a reusable target or script.

This shell block is duplicated nearly verbatim in `helm-deploy-site-agent` (lines 615-638). Consider extracting it into a dedicated make target (e.g., `setup-local-core-certs`) or a helper script to improve maintainability.

Additionally, line 523 hardcodes the control-plane node name. Use the `KIND_CLUSTER_NAME` variable for consistency:

♻️ Use variable for cluster name

 	CORE_HOST="$(LOCAL_CORE_HOST)" ; \
-	IPV4=$$(docker exec $$(docker ps -qf name=carbide-rest-local-control-plane) \
+	IPV4=$$(docker exec $$(docker ps -qf name=$(KIND_CLUSTER_NAME)-control-plane) \
 		getent ahosts "$$CORE_HOST" 2>/dev/null \

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 512 - 536, The duplicated LOCAL_CORE setup block (guarded by LOCAL_CORE and using LOCAL_CORE_CERTS_DIR, LOCAL_CORE_HOST, LOCAL_CORE_PORT) should be extracted into a reusable Makefile target or helper script (e.g., setup-local-core-certs) and both places (current block and helm-deploy-site-agent) should call that target to remove duplication; implement the new target to perform the file checks, kubectl secret creation, IPv4 resolution, ConfigMap patching, and rollout restart/status, and replace the hardcoded docker filter name "carbide-rest-local-control-plane" with a reference using KIND_CLUSTER_NAME (e.g., "$$(docker ps -qf name=$${KIND_CLUSTER_NAME}-control-plane)") so the cluster name variable is used consistently.
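The `getent` pipeline at the heart of that block is worth a closer look: `getent ahosts` can list IPv6 entries first, and the `awk`/`grep -v ':'` pair keeps only the first IPv4 STREAM row. A self-contained sketch with canned output (the hostname and addresses are made up for illustration):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Canned `getent ahosts <host>` output: IPv6 listed first, which is the
# Go IPv6-first dialer case the Makefile works around.
ahosts_output='fd00::1        STREAM host.docker.internal
fd00::1        DGRAM
192.168.65.254 STREAM host.docker.internal
192.168.65.254 DGRAM'

# Same filter as the Makefile recipe (minus the $$ make-escaping):
# keep STREAM rows, drop anything containing ':', take the first match.
ipv4=$(printf '%s\n' "$ahosts_output" \
  | awk '/STREAM/{print $1}' | grep -v ':' | head -1)

echo "$ipv4"
```

Dropping every address containing `:` is what guarantees an IPv4 result even when the resolver prefers IPv6.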
b8163c1 to baa3c8c (Compare)
Actionable comments posted: 2
♻️ Duplicate comments (3)
scripts/test-local-core.sh (2)
172-180: ⚠️ Potential issue | 🟡 Minor

Make the `grep` non-fatal before the explicit DB-update checks.

With `set -euo pipefail` on Line 39, a missing `^UPDATE` match exits at the assignment on Line 173, so the `die` checks on Lines 178-180 never run. Capture the full `kubectl exec` output first, then parse it with `grep ... || true`.

Suggested fix

-  UPDATE_TAG=$(kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \
-    psql -U "$PG_USER" -d "$PG_DB" -c \
-    "UPDATE site SET status = 'Registered', updated = NOW() WHERE id = '$SITE_ID';" \
-    2>&1 | grep -E '^UPDATE')
+  UPDATE_OUTPUT=$(kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \
+    psql -U "$PG_USER" -d "$PG_DB" -c \
+    "UPDATE site SET status = 'Registered', updated = NOW() WHERE id = '$SITE_ID';" \
+    2>&1) || die "DB update failed — check site ID and postgres connectivity"
+  UPDATE_TAG=$(printf '%s\n' "$UPDATE_OUTPUT" | grep -E '^UPDATE' || true)

Verification script:

#!/bin/bash
set -euo pipefail
sed -n '39p;172,180p' scripts/test-local-core.sh
echo
bash -o errexit -o nounset -o pipefail -c \
  'UPDATE_TAG=$(printf "no update tag here\n" | grep -E "^UPDATE"); echo "unreachable"'

Expected: the second command never prints `unreachable`, confirming the current assignment exits before the explicit error handling.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/test-local-core.sh` around lines 172 - 180, The assignment that pipes kubectl exec through grep is currently fatal under set -euo pipefail; instead capture the full kubectl exec output into a variable (e.g., UPDATE_OUTPUT by running the existing kubectl exec ... 2>&1 and storing its output), then parse that output with grep into UPDATE_TAG using grep -E '^UPDATE' || true so the grep failure is non-fatal; keep the existing die checks that test [[ -z "$UPDATE_TAG" ]] and parse UPDATED_ROWS from UPDATE_TAG as before. Ensure you reference the same kubectl exec invocation, UPDATE_TAG and UPDATED_ROWS symbols when making the change.
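The pipefail interaction is easy to reproduce without a cluster: a command substitution whose pipeline ends in a non-matching `grep` exits 1 and, under `set -euo pipefail`, kills the script at the assignment. A minimal sketch of the suggested `|| true` pattern (the function name and sample outputs are illustrative):

```shell
#!/usr/bin/env bash
set -euo pipefail

parse_update_tag() {
  # $1: captured psql output. Prints the UPDATE tag line, or nothing.
  # The trailing `|| true` keeps a non-match from aborting the caller.
  printf '%s\n' "$1" | grep -E '^UPDATE' || true
}

tag=$(parse_update_tag 'UPDATE 1')
echo "matched: ${tag:-<none>}"

# Without `|| true` inside the helper, this assignment would exit 1.
tag=$(parse_update_tag 'ERROR:  relation "site" does not exist')
echo "matched: ${tag:-<none>}"
```

With the non-fatal parse, the script reaches its own `die` checks and can report a meaningful error instead of silently exiting.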
88-97: ⚠️ Potential issue | 🟠 Major

Don't count HTTP 409 as a validated create-path run.

Line 88 turns conflicts into success, and the POST payloads below reuse fixed resource values. On reruns this can still end in "Smoke test passed" even though the create path into carbide-core was never exercised again. Either make 409 a skipped/not-validated outcome, or generate per-run resource values that cannot collide.

Also applies to: 197-206, 219-233, 246-251, 265-273
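For option (B), a per-run suffix folded into every resource name is enough to rule out collisions; a sketch (the `smoke-` naming scheme is illustrative, not the script's actual payload fields):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Epoch seconds plus PID: unique per run, so a rerun can never 409
# against resources left behind by an earlier run.
RUN_SUFFIX="$(date +%s)-$$"

resource_name() {
  # $1: base name, e.g. "tenant" or "instance"
  printf 'smoke-%s-%s\n' "$1" "$RUN_SUFFIX"
}

echo "tenant:   $(resource_name tenant)"
echo "instance: $(resource_name instance)"
```

Any 409 then becomes an unambiguous failure rather than leftover state, at the cost of accumulating test resources unless the script also cleans up.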
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/test-local-core.sh` around lines 88 - 97, the current HTTP 409 handling in the if block that checks http_code and sets existing_id (the branch that calls info and echoes the reused id) treats conflicts as successful validated creates; change this so 409 is not counted as a validated create: either (A) mark the operation as skipped/failure by replacing the success echo/info path with a skip log and a non-success return (or call die) so the smoke test does not consider the create exercised, or (B) make the POST payloads unique per run by appending a per-run random/timestamp/UUID suffix to the resource identifiers used where payloads are generated so collisions cannot occur; apply the same change to the other conflict-handling instances (the similar blocks at the other ranges referenced) so 409 no longer masquerades as a validated create.

Makefile (1)
284-286: ⚠️ Potential issue | 🟠 Major

Keep `kind-deploy` consistent with `LOCAL_CORE=true`.

These lines skip building the mock-core image, but `kind-deploy` still ends in `kind-apply`, which unconditionally applies `$(KUSTOMIZE_OVERLAY)` on Line 317. If that overlay still includes mock-core—or is still the missing composite overlay—`make kind-deploy LOCAL_CORE=true` no longer matches the reset flows and can fail or deploy stale resources.

Verification script:

#!/bin/bash
set -euo pipefail
sed -n '279,317p' Makefile
echo
if [ -f deploy/kustomize/overlays/local/kustomization.yaml ]; then
  echo "--- deploy/kustomize/overlays/local/kustomization.yaml ---"
  sed -n '1,200p' deploy/kustomize/overlays/local/kustomization.yaml
elif [ -f deploy/kustomize/overlays/local/kustomization.yml ]; then
  echo "--- deploy/kustomize/overlays/local/kustomization.yml ---"
  sed -n '1,200p' deploy/kustomize/overlays/local/kustomization.yml
else
  echo "deploy/kustomize/overlays/local is missing"
fi

Expected: either the overlay is missing, or its resources show whether mock-core is still applied by `kind-apply`.

Also applies to: 300-317
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 284 - 286, The kind-deploy flow is inconsistent when LOCAL_CORE=true: the Docker build for carbide-rest-mock-core is skipped (see the conditional around the docker build) but kind-apply still unconditionally applies $(KUSTOMIZE_OVERLAY), which may include the mock-core resources; update the Makefile so kind-deploy/kind-apply respects LOCAL_CORE by conditionally excluding or substituting the overlay that contains mock-core (or by adding a conditional around the apply step), e.g., gate the application of $(KUSTOMIZE_OVERLAY) or switch to a different overlay when LOCAL_CORE=true so that mock-core resources are not applied when the image isn't built. Ensure changes reference the existing variables/targets: LOCAL_CORE, KUSTOMIZE_OVERLAY, and the kind-apply / kind-deploy targets.
🧹 Nitpick comments (1)
Makefile (1)
512-536: Extract the LOCAL_CORE site-agent bootstrap into one reusable recipe.

This secret creation, host resolution, ConfigMap patching, and rollout logic is duplicated in both deployment paths. Any later fix here now has to land twice, which makes drift likely.
Also applies to: 615-638
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 512 - 536, Extract the duplicated LOCAL_CORE bootstrap block into a single reusable Makefile recipe (e.g., a phony target like local-core-site-agent-bootstrap) and replace both inline copies with invocations of that recipe; move the logic that checks LOCAL_CORE, validates files in $(LOCAL_CORE_CERTS_DIR), creates the core-grpc-client-site-agent-certs secret, resolves CORE_HOST/IPV4, patches the carbide-rest-site-agent-config ConfigMap and performs kubectl rollout restart/status into the new recipe, parameterize it via the existing variables (LOCAL_CORE, LOCAL_CORE_CERTS_DIR, LOCAL_CORE_HOST, LOCAL_CORE_PORT) and ensure the original callers (the blocks at the current location and the duplicate at lines 615-638) simply call the new target to avoid code drift.
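A minimal sketch of the suggested shape (the target name comes from the review prompt above; the recipe body is a placeholder for the existing secret/ConfigMap/rollout sequence, not the real commands):

```make
# Hypothetical shared recipe; both deploy paths call it instead of
# carrying their own copy of the LOCAL_CORE bootstrap.
.PHONY: local-core-site-agent-bootstrap
local-core-site-agent-bootstrap:
	@test -f "$(LOCAL_CORE_CERTS_DIR)/ca.crt" || { echo "missing certs in $(LOCAL_CORE_CERTS_DIR)"; exit 1; }
	# ... existing secret creation, IPv4 resolution, ConfigMap patch,
	# and rollout-restart sequence moves here unchanged ...

kind-reset-kustomize: # existing prerequisites elided
	@if [ "$(LOCAL_CORE)" = "true" ]; then \
		$(MAKE) local-core-site-agent-bootstrap ; \
	fi

helm-deploy-site-agent: # existing prerequisites elided
	@if [ "$(LOCAL_CORE)" = "true" ]; then \
		$(MAKE) local-core-site-agent-bootstrap ; \
	fi
```

Invoking via `$(MAKE)` keeps the recipe jobserver-aware, and a single target means a later fix to cert handling lands once.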
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a154a0b2-eb5d-4cf5-9010-0bd9556cd8f6
📒 Files selected for processing (2)
Makefile, scripts/test-local-core.sh
	IPV4=$$(docker exec $$(docker ps -qf name=carbide-rest-local-control-plane) \
		getent ahosts "$$CORE_HOST" 2>/dev/null \
		| awk '/STREAM/{print $$1}' | grep -v ':' | head -1) ; \
	if [ -n "$$IPV4" ]; then \
		echo "Resolved $$CORE_HOST → $$IPV4 (using IPv4 to avoid Go IPv6-first dialer issue)" ; \
		CORE_HOST="$$IPV4" ; \
	fi ; \
Derive the control-plane container from KIND_CLUSTER_NAME.
Both LOCAL_CORE flows hardcode carbide-rest-local-control-plane instead of using the existing $(KIND_CLUSTER_NAME) variable. make KIND_CLUSTER_NAME=... ... LOCAL_CORE=true will resolve the wrong node and silently skip the IPv4 fallback.
Suggested fix
- IPV4=$$(docker exec $$(docker ps -qf name=carbide-rest-local-control-plane) \
+ IPV4=$$(docker exec $$(docker ps -qf name=$(KIND_CLUSTER_NAME)-control-plane) \
getent ahosts "$$CORE_HOST" 2>/dev/null \
| awk '/STREAM/{print $$1}' | grep -v ':' | head -1) ; \Also applies to: 626-632
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Makefile` around lines 523 - 529, The Makefile logic that computes IPV4 is
hardcoded to the container name "carbide-rest-local-control-plane" instead of
deriving it from the existing KIND_CLUSTER_NAME, so running with make
KIND_CLUSTER_NAME=... LOCAL_CORE=true can target the wrong node; update the
docker exec invocation and any other places that reference
"carbide-rest-local-control-plane" (also at the later block around the same
logic) to compute the container name from $(KIND_CLUSTER_NAME) (e.g., use the
same templated container name construction you use elsewhere) so CORE_HOST
resolution uses the correct container, and ensure the variable CORE_HOST
assignment and echo message still reflect the substituted value.
 helm-deploy-site-agent:
 	@echo "Installing site-agent chart (will CrashLoop until bootstrapped)..."
 	helm upgrade --install carbide-rest-site-agent $(SITE_AGENT_CHART)/ \
-		--namespace carbide-rest $(HELM_SET) --timeout 1m || true
+		--namespace carbide-rest $(HELM_SET) \
+		$(if $(filter true,$(LOCAL_CORE)),--set certificate.enabled=false) \
+		--timeout 1m || true
🧩 Analysis chain
🌐 Web query:
For Helm CLI, does `helm upgrade --install` fail when a workload enters CrashLoopBackOff if `--wait` is not set, and does `--timeout` affect upgrade/install behavior without `--wait` or `--atomic`?
💡 Result:
No, helm upgrade --install does not fail when a workload enters CrashLoopBackOff if --wait is not set. Yes, --timeout affects upgrade/install behavior without --wait or --atomic for individual Kubernetes operations like hooks.
Citations:
- 1: https://v3.helm.sh/docs/helm/helm_upgrade/
- 2: https://v3.helm.sh/docs/intro/using_helm/
- 3: https://github.com/helm/helm/blob/main/pkg/action/upgrade.go
- 4: https://stackoverflow.com/questions/71417105/how-to-handle-helm-upgrade-timeout-error
🌐 Web query:
helm upgrade --install failure modes errors without --wait template rendering validation
💡 Result:
Helm upgrade --install fails without --wait primarily due to post-install hook failures, especially when hooks depend on webhooks or resources that aren't immediately ready. Template rendering and validation errors cause failures before hooks. Key failure modes:

1. Post-install hook failures: Without --wait, Helm doesn't wait for main resources (Deployments, etc.) to be ready before running post-install hooks. Hooks like Jobs or webhooks fail if dependencies (e.g., webhook services) aren't available, causing UPGRADE FAILED: failed post-install. Adding --wait makes Helm wait for resources to be ready, then run hooks successfully.
2. Template rendering errors: Invalid YAML, nil pointer errors (missing values), type mismatches, undefined functions. Use helm template --debug and helm lint --strict to validate before upgrade.
3. Kubernetes validation errors: Rendered manifests fail schema validation, immutable field changes, unknown fields. Validate with helm template | kubeval --strict.
4. Timeout errors without --wait: Default 5m timeout hit if resources are slow to become ready; the command fails even if it eventually succeeds.
5. Resource conflicts: Stuck failed releases from prior attempts block upgrades.

Without --wait, the upgrade proceeds immediately after posting resources but is marked failed if hooks fail post-posting. Official docs recommend --wait for production to ensure readiness. Debug: helm upgrade --dry-run --debug; helm template --debug; check release status with helm status. Prevention: validate charts (lint, template, kubeval), use --wait/--timeout=15m, --atomic for rollback.
Citations:
- 1: Installing the helm chart without the --wait parameter fails kubernetes-sigs/cluster-api-operator#406
- 2: https://helm.sh/docs/helm/helm_upgrade/
- 3: https://v3.helm.sh/docs/intro/using_helm/
- 4: https://helm.sh/docs/chart_template_guide/debugging
- 5: helm upgrade --wait does not wait on newer versions helm/helm#10061
- 6: Helm install/upgrade wait robustness helm/helm#11789
- 7: Expected behavior of "helm upgrade --atomic --wait --debug --timeout 300 --version 1.1.0" helm/helm#5626
- 8: `helm upgrade --wait-for-jobs` does not detect a job error in certain situations helm/helm#13316
- 9: Helm upgrades using --wait succeed without readiness checks passing if you downscale replicas to 0 helm/helm#12768
- 10: Helm Hook "Post Install" not working for Dependencies Chart helm/helm#11422
- 11: https://stackoverflow.com/questions/68643009/how-do-i-make-helm-chart-hooks-post-install-work-if-other-charts-are-in-running
- 12: https://stackoverflow.com/questions/62156848/is-there-any-option-to-tell-to-helm-to-not-wait-till-post-install-hook-is-comple
- 13: post-install hooks are not executed after resources were posted fluxcd/helm-controller#249
- 14: Helm post-install hooks are not triggered when used by `helm_release` and `wait = true` hashicorp/terraform-provider-helm#683
Remove `|| true` to surface template and validation errors.

`helm upgrade --install` without `--wait` returns after submitting resources (so CrashLoop won't cause helm to fail), but it still fails on template rendering errors, Kubernetes validation failures, and post-install hook issues. The `|| true` suppresses these legitimate errors and allows the bootstrap to continue with a broken release.
Suggested fix
helm upgrade --install carbide-rest-site-agent $(SITE_AGENT_CHART)/ \
--namespace carbide-rest $(HELM_SET) \
$(if $(filter true,$(LOCAL_CORE)),--set certificate.enabled=false) \
- --timeout 1m || true
+ --timeout 1m🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Makefile` around lines 607 - 612, In the Makefile target
helm-deploy-site-agent, remove the trailing "|| true" from the helm upgrade
--install carbide-rest-site-agent ... --timeout 1m command so that template
rendering, validation, and hook errors are not suppressed; optionally consider
adding --wait and increasing the --timeout value if you want Helm to wait for
resources to become ready, but do not swallow failures by appending "|| true".
7917324 to 330f61b (Compare)
ed3a541 to ba8360e (Compare)
9fe2bc0 to d75f362 (Compare)
a3de0d6 to f355215 (Compare)
…test script for validation
Signed-off-by: Patrice Breton <pbreton@nvidia.com>
f355215 to be29eb2 (Compare)
This depends on PR #449.