Skip to content

[DNM] debug ci#817

Open
elfosardo wants to merge 1 commit intoopenshift:mainfrom
elfosardo:debug-ci
Open

[DNM] debug ci#817
elfosardo wants to merge 1 commit intoopenshift:mainfrom
elfosardo:debug-ci

Conversation

@elfosardo
Copy link
Copy Markdown

No description provided.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 19, 2026

Walkthrough

Added a new dnf_wrapper.sh script and integrated it into three stages (builder, wheel-builder, final) of Dockerfile.ocp by enabling debug env, copying the wrapper to /bin/dnf_wrapper/dnf, creating /bin/dnf_wrapper/microdnf, and adding a directory listing debug step for /etc/yum.repos.art/ci/.

Changes

Cohort / File(s) Summary
Dockerfile changes
Dockerfile.ocp
In the builder, wheel-builder, and final runtime stages: set ENV ART_DNF_WRAPPER_DEBUG="true", COPY dnf_wrapper.sh /bin/dnf_wrapper/dnf, RUN cp /bin/dnf_wrapper/dnf /bin/dnf_wrapper/microdnf, and added RUN ls -la /etc/yum.repos.art/ci/ diagnostic lines.
DNF wrapper script
dnf_wrapper.sh
Added a new wrapper script (~198 lines) that validates env vars, detects CI vs localdev vs skip modes, manages repo dirs and CA installation, composes DNF/microdnf arguments (including --setopt=reposdir=... and optional --disableplugin=subscription-manager), logs configured .repo sections, and invokes the real package manager binary. Pay attention to env-driven policy behavior and marker-file logic.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Dockerfile.ocp`:
- Around line 48-49: Remove the unused runtime environment variable
ART_DNF_WRAPPER_DEBUG by deleting the ENV ART_DNF_WRAPPER_DEBUG="true" line from
the final runtime stage in the Dockerfile; if the debug flag is needed during
build only, move or set it inside the build stage (or the prepare-image.sh
invocation) so it does not persist into the final image, and ensure no other
references to ART_DNF_WRAPPER_DEBUG remain in the final stage.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: abd854a2-cd8d-4b58-bae1-a51268c83e1b

📥 Commits

Reviewing files that changed from the base of the PR and between a0dad02 and 9fe2e13.

📒 Files selected for processing (1)
  • Dockerfile.ocp

Comment on lines +48 to +49
ENV ART_DNF_WRAPPER_DEBUG="true"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check if dnf/ART wrapper is used at container runtime.

# Check scripts that might run at container startup
rg -n "dnf|yum" --type sh scripts/ -C 2 2>/dev/null

# Check for entrypoint or startup scripts
fd -e sh -e py | xargs rg -l "entrypoint|startup|init" | head -5 | xargs cat

Repository: openshift/ironic-image

Length of output: 12154


🏁 Script executed:

cat -n Dockerfile.ocp | head -100

Repository: openshift/ironic-image

Length of output: 3474


🏁 Script executed:

# Search for ART_DNF_WRAPPER_DEBUG usage
rg -i "ART_DNF_WRAPPER_DEBUG" --type sh --type py

# Search for dnf usage in scripts that might run at runtime
rg -l "prepare-image|prepare-ipxe" --type sh | head -5 | xargs rg -n "dnf|ART_DNF"

# Check if dnf is invoked in any of the deployed scripts
fd -e sh -path "*/bin/*" -o -path "*/scripts/*" | xargs rg -l "dnf|yum" 2>/dev/null

Repository: openshift/ironic-image

Length of output: 340


🏁 Script executed:

# Search for ART_DNF_WRAPPER_DEBUG usage in all files
rg "ART_DNF_WRAPPER_DEBUG" -B 2 -A 2

# Check prepare-image.sh to see if it uses dnf
cat -n prepare-image.sh | head -80

# Check if dnf is used in any of the build scripts
rg "dnf\s|yum\s" prepare-image.sh prepare-efi.sh prepare-ipxe.sh

Repository: openshift/ironic-image

Length of output: 3961


🏁 Script executed:

# Find and examine entrypoint scripts
fd -name "*start*" -o -name "*entry*" | xargs cat 2>/dev/null | head -200

# More thorough search for dnf in all scripts that would be in final image
rg "dnf|yum" /bin /usr/bin /usr/local/bin 2>/dev/null | head -20

# Check if any scripts call prepare-image or reference the wrapper
rg -i "wrapper|prepare.image" --type sh

Repository: openshift/ironic-image

Length of output: 296


🏁 Script executed:

# Find scripts in the repo that might be runtime entrypoints
find . -name "*start*.sh" -o -name "*entry*.sh" | head -10

# Check the Dockerfile for ENTRYPOINT or CMD
grep -n "ENTRYPOINT\|CMD" Dockerfile.ocp

# Look at the rest of the Dockerfile to see what's set as entrypoint
tail -20 Dockerfile.ocp

# Search for any reference to dnf wrapper in config files
rg "wrapper" --type sh --type py

Repository: openshift/ironic-image

Length of output: 884


🏁 Script executed:

# Check complete Dockerfile for ENTRYPOINT/CMD
cat Dockerfile.ocp | grep -n "ENTRYPOINT\|CMD\|LABEL\|EXPOSE"

# Get the full length to understand scope
wc -l Dockerfile.ocp

# Get the end of file
tail -40 Dockerfile.ocp

Repository: openshift/ironic-image

Length of output: 1616


🏁 Script executed:

# Search for any dnf invocations in the entire codebase
rg "dnf\s" . --type sh

# Check if ART_DNF_WRAPPER is referenced anywhere (complete search)
rg "ART_DNF" . 

# List all shell scripts in the final image's /bin directory by examining what's copied
grep -n "COPY.*\s/bin" Dockerfile.ocp

Repository: openshift/ironic-image

Length of output: 924


Remove unnecessary debug environment variable from final runtime stage.

The ART_DNF_WRAPPER_DEBUG environment variable at line 48 persists into the final runtime image but is never used at runtime. All dnf operations occur during the build phase (in prepare-image.sh, which is deleted after execution). This environment variable serves no purpose in the runtime container and should be removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile.ocp` around lines 48 - 49, Remove the unused runtime environment
variable ART_DNF_WRAPPER_DEBUG by deleting the ENV ART_DNF_WRAPPER_DEBUG="true"
line from the final runtime stage in the Dockerfile; if the debug flag is needed
during build only, move or set it inside the build stage (or the
prepare-image.sh invocation) so it does not persist into the final image, and
ensure no other references to ART_DNF_WRAPPER_DEBUG remain in the final stage.

@openshift-ci openshift-ci bot requested review from derekhiggins and dtantsur March 19, 2026 09:49
@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 19, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elfosardo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 19, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
Dockerfile.ocp (1)

57-58: Debug listing in final stage provides limited value.

This ls command runs before the dnf operations in prepare-image.sh (line 68), but the output won't be visible in CI logs unless the subsequent build steps fail. Consider moving this inside prepare-image.sh right before any dnf calls for more useful debug output, or remove it from the final stage entirely since debugging should focus on the builder stages where the actual issues likely occur.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile.ocp` around lines 57 - 58, The debug RUN ls -la
/etc/yum.repos.art/ci/ in Dockerfile.ocp's final stage is low-value; either
remove that RUN line from Dockerfile.ocp entirely or move the listing into
prepare-image.sh immediately before the dnf calls (as in the prepare-image.sh
script around the dnf invocation) so the output is produced during the builder
steps and appears in CI logs when package installation runs; update
Dockerfile.ocp to delete the RUN ls... or update prepare-image.sh to add the ls
-la /etc/yum.repos.art/ci/ just before the dnf commands.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@Dockerfile.ocp`:
- Around line 57-58: The debug RUN ls -la /etc/yum.repos.art/ci/ in
Dockerfile.ocp's final stage is low-value; either remove that RUN line from
Dockerfile.ocp entirely or move the listing into prepare-image.sh immediately
before the dnf calls (as in the prepare-image.sh script around the dnf
invocation) so the output is produced during the builder steps and appears in CI
logs when package installation runs; update Dockerfile.ocp to delete the RUN
ls... or update prepare-image.sh to add the ls -la /etc/yum.repos.art/ci/ just
before the dnf commands.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 99a8226c-508d-499b-a23e-32a227bfc4c7

📥 Commits

Reviewing files that changed from the base of the PR and between 9fe2e13 and c28db54.

📒 Files selected for processing (1)
  • Dockerfile.ocp

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dnf_wrapper.sh`:
- Around line 40-49: The script currently validates DNF_WRAPPER_DIR and
CI_RPM_SVC before the caller-configurable ART_DNF_WRAPPER_POLICY is evaluated,
causing deliberate "skip" invocations to hard-fail; change the flow so
ART_DNF_WRAPPER_POLICY is read/evaluated early (or check its value) and only
enforce the presence of DNF_WRAPPER_DIR and CI_RPM_SVC when
ART_DNF_WRAPPER_POLICY is not "skip" (i.e., guard the existing checks for
DNF_WRAPPER_DIR and CI_RPM_SVC with a conditional that bypasses them when
ART_DNF_WRAPPER_POLICY==skip), ensuring $0.real/other execution paths can
proceed without those env vars when policy is skip.
- Around line 115-123: The curl probes that fetch CI repo definitions and the
other external service (invocations using CI_RPM_SVC writing to CI_RPM_REPO_DEST
and the second probe in the subsequent retry loop) lack explicit timeouts;
update both curl calls to include sensible timeout flags (e.g. --connect-timeout
5 and --max-time 10 or similar values used project-wide) so DNS/TCP stalls don't
hang the retry loop, leaving the rest of the logic (setting WRAPPER_MODE,
INSTALL_ART_RH_VPN_REPOS, echoerr, rm -f, retry sleep) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 17f709bb-75f8-40c8-a3d4-98e152f52227

📥 Commits

Reviewing files that changed from the base of the PR and between c28db54 and 5f3e0e6.

📒 Files selected for processing (2)
  • Dockerfile.ocp
  • dnf_wrapper.sh
✅ Files skipped from review due to trivial changes (1)
  • Dockerfile.ocp

Comment on lines +40 to +49
if [[ -z "${DNF_WRAPPER_DIR:-}" ]]; then
echo "Expected DNF_WRAPPER_DIR to be set. Exiting since environment is not expected."
exit 1
fi

# CI_RPM_SVC should be injected into the rebased Dockerfile from the doozer image
# metadata config (the .envs list).
if [[ -z "${CI_RPM_SVC:-}" ]]; then
echo "Expected CI_RPM_SVC to be set. Exiting since environment is not expected."
exit 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

skip mode still hard-fails on repo-management env.

Line 58 makes ART_DNF_WRAPPER_POLICY=skip configurable, but Lines 40-49 and Lines 92-93 are evaluated first. With set -u, callers that intentionally choose skip can still exit on missing repo-management env before the wrapper ever reaches $0.real.

Proposed fix
-if [[ -z "${DNF_WRAPPER_DIR:-}" ]]; then
-  echo "Expected DNF_WRAPPER_DIR to be set. Exiting since environment is not expected."
-  exit 1
-fi
-
-# CI_RPM_SVC should be injected into the rebased Dockerfile from the doozer image
-# metadata config (the .envs list).
-if [[ -z "${CI_RPM_SVC:-}" ]]; then
-  echo "Expected CI_RPM_SVC to be set. Exiting since environment is not expected."
-  exit 1
-fi
-
 # External users of this image may have different use cases that diverge
 # from our default repo installation logic. Allow them to override
 # the behavior.
@@
-# If we install repos appropriate for CI builds, they will reside here.
-CI_RPM_REPO_DEST="${ART_REPOS_DIR_CI}/ci-svc-repos.repo"
-VPN_RPM_REPO_DEST="${ART_REPOS_DIR_LOCALDEV}/rh-vpn-repos.repo"
-
 # Red Hat IT Root CA certificate paths for localdev builds
@@
 if [[ "${SKIP_REPO_INSTALL}" == "0" ]]; then
+  : "${DNF_WRAPPER_DIR:?Expected DNF_WRAPPER_DIR to be set. Exiting since environment is not expected.}"
+  : "${CI_RPM_SVC:?Expected CI_RPM_SVC to be set. Exiting since environment is not expected.}"
+  : "${ART_REPOS_DIR_CI:?Expected ART_REPOS_DIR_CI to be set.}"
+  : "${ART_REPOS_DIR_LOCALDEV:?Expected ART_REPOS_DIR_LOCALDEV to be set.}"
+
+  # If we install repos appropriate for CI builds, they will reside here.
+  CI_RPM_REPO_DEST="${ART_REPOS_DIR_CI}/ci-svc-repos.repo"
+  VPN_RPM_REPO_DEST="${ART_REPOS_DIR_LOCALDEV}/rh-vpn-repos.repo"
+
   WRAPPER_MODE="unknown"

Also applies to: 58-60, 91-93

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dnf_wrapper.sh` around lines 40 - 49, The script currently validates
DNF_WRAPPER_DIR and CI_RPM_SVC before the caller-configurable
ART_DNF_WRAPPER_POLICY is evaluated, causing deliberate "skip" invocations to
hard-fail; change the flow so ART_DNF_WRAPPER_POLICY is read/evaluated early (or
check its value) and only enforce the presence of DNF_WRAPPER_DIR and CI_RPM_SVC
when ART_DNF_WRAPPER_POLICY is not "skip" (i.e., guard the existing checks for
DNF_WRAPPER_DIR and CI_RPM_SVC with a conditional that bypasses them when
ART_DNF_WRAPPER_POLICY==skip), ensuring $0.real/other execution paths can
proceed without those env vars when policy is skip.

Comment on lines +115 to +123
for (( i=1; i<=5; i++ )); do
if curl --fail "${CI_RPM_SVC}" 2> /dev/null > "${CI_RPM_REPO_DEST}"; then
WRAPPER_MODE="ci"
INSTALL_ART_RH_VPN_REPOS="0"
echoerr "Installed CI build pod repo definitions from ${CI_RPM_SVC}."
break
fi
rm -f "${CI_RPM_REPO_DEST}"
sleep 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add explicit timeouts to both network probes.

Line 116 and Line 129 call external services with retries but no --connect-timeout or --max-time. A DNS or TCP stall here can block every wrapped package install far longer than the retry loop implies.

Proposed fix
-      if curl --fail "${CI_RPM_SVC}" 2> /dev/null > "${CI_RPM_REPO_DEST}"; then
+      if curl --fail --silent --connect-timeout 5 --max-time 15 \
+          "${CI_RPM_SVC}" 2> /dev/null > "${CI_RPM_REPO_DEST}"; then
@@
-      if curl --fail --silent --location --retry 5 --retry-delay 2 --output "${RH_IT_ROOT_CA_CERT_TMP_PATH}" "${RH_IT_ROOT_CA_CERT_URL}"; then
+      if curl --fail --silent --location \
+          --connect-timeout 5 --max-time 30 \
+          --retry 5 --retry-delay 2 \
+          --output "${RH_IT_ROOT_CA_CERT_TMP_PATH}" \
+          "${RH_IT_ROOT_CA_CERT_URL}"; then

Also applies to: 129-135

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dnf_wrapper.sh` around lines 115 - 123, The curl probes that fetch CI repo
definitions and the other external service (invocations using CI_RPM_SVC writing
to CI_RPM_REPO_DEST and the second probe in the subsequent retry loop) lack
explicit timeouts; update both curl calls to include sensible timeout flags
(e.g. --connect-timeout 5 and --max-time 10 or similar values used project-wide)
so DNS/TCP stalls don't hang the retry loop, leaving the rest of the logic
(setting WRAPPER_MODE, INSTALL_ART_RH_VPN_REPOS, echoerr, rm -f, retry sleep)
unchanged.

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 19, 2026

@elfosardo: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/prevalidation-images 5f3e0e6 link true /test prevalidation-images
ci/prow/e2e-metal-ipi-ovn-ipv6 5f3e0e6 link true /test e2e-metal-ipi-ovn-ipv6
ci/prow/e2e-metal-ipi-virtualmedia 5f3e0e6 link true /test e2e-metal-ipi-virtualmedia
ci/prow/e2e-metal-ipi-bm 5f3e0e6 link true /test e2e-metal-ipi-bm
ci/prow/images 5f3e0e6 link true /test images

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-merge-bot bot pushed a commit that referenced this pull request Mar 25, 2026
…-digest

🌱 Update openstack-ironic digest to acb78ef (main)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant