Skip to content

Implement vendor-specific BMC/BIOS upgrade checks with firmware inventory function#733

Open
davidgrun wants to merge 3 commits intomainfrom
enh/bmc-queue-task-precheck
Open

Implement vendor-specific BMC/BIOS upgrade checks with firmware inventory function#733
davidgrun wants to merge 3 commits intomainfrom
enh/bmc-queue-task-precheck

Conversation

@davidgrun
Copy link
Contributor

@davidgrun davidgrun commented Mar 10, 2026

Summary by CodeRabbit

  • New Features
    • Added pending component upgrade detection capability across supported BMC vendors (Dell, HPE, Lenovo).
    • Firmware upgrade workflows now check for pending upgrades before proceeding and defer if necessary, improving upgrade safety and reliability.

@davidgrun davidgrun requested a review from a team as a code owner March 10, 2026 09:03
@github-actions github-actions bot added size/L enhancement New feature or request labels Mar 10, 2026
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 10, 2026

Walkthrough

This PR adds a CheckBMCPendingComponentUpgrade capability to detect pending/staged firmware upgrades before initiating component upgrades. It includes vendor-specific implementations for Dell, HPE, and Lenovo via a shared generic helper function, along with test stubs. The feature integrates with BMC and BIOS upgrade controllers to conditionally defer upgrades when pending upgrades are detected.

Changes

Cohort / File(s) Summary
Configuration
.golangci.yml
Reorders exclusions block, removing explicit test file pattern exclusion.
Interface Definition
bmc/bmc.go
Adds CheckBMCPendingComponentUpgrade method to BMC interface for pending component upgrade detection.
Generic Helper Logic
bmc/oem_helpers.go
Introduces checkPendingComponentUpgrade shared function for vendor-agnostic pending upgrade detection using UpdateService and FirmwareInventory with callbacks for filtering and status evaluation.
Base Implementation
bmc/redfish.go
Adds RedfishBaseBMC.CheckBMCPendingComponentUpgrade fallback that returns unsupported error for vendors without specialized implementations.
Vendor-Specific Implementations
bmc/redfish_dell.go, bmc/redfish_hpe.go, bmc/redfish_lenovo.go
Each vendor implements CheckBMCPendingComponentUpgrade with type-specific component filters and pending status checks (Dell filters by component keywords, HPE maps component types to ROM identifiers, Lenovo uses ID-based filtering with "-PENDING" suffix detection).
Test Stubs
bmc/redfish_kube.go, bmc/redfish_local.go
Add CheckBMCPendingComponentUpgrade test implementations that return (false, nil) without performing checks.
Controller Integration
internal/controller/bmcversion_controller.go, internal/controller/biosversion_controller.go
Add pre-upgrade guards in processInProgressState that check for pending component upgrades and defer the upgrade if detected, proceeding on error.

Sequence Diagram

sequenceDiagram
    participant Controller as BMC/BIOS Controller
    participant BMC as BMC Interface
    participant Vendor as Vendor Implementation<br/>(Dell/HPE/Lenovo)
    participant Helper as checkPendingComponentUpgrade
    participant Redfish as Redfish API

    Controller->>BMC: CheckBMCPendingComponentUpgrade(ctx, componentType)
    BMC->>Vendor: CheckBMCPendingComponentUpgrade(ctx, componentType)
    
    Vendor->>Helper: checkPendingComponentUpgrade(ctx, callbacks...)
    
    Helper->>Redfish: GetUpdateService()
    Redfish-->>Helper: UpdateService
    
    Helper->>Redfish: GetFirmwareInventory()
    Redfish-->>Helper: []FirmwareInventory
    
    loop For each firmware item
        Helper->>Vendor: matchesComponentFilter(fw, filters)
        Vendor-->>Helper: match (bool)
        
        alt Match found
            Helper->>Vendor: checkPending(fw)
            Vendor-->>Helper: isPending (bool)
        end
    end
    
    Helper-->>Vendor: pending (bool)
    Vendor-->>BMC: pending (bool), error
    BMC-->>Controller: pending (bool), error
    
    alt Pending upgrade detected
        Controller->>Controller: Defer upgrade (requeue)
    else No pending upgrade
        Controller->>BMC: Proceed with component upgrade
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

area/metal-automation

Suggested reviewers

  • nagadeesh-nagaraja
  • afritzler
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The pull request has no substantive description; only a placeholder string '@coderabbitai summary' was provided, failing to document proposed changes or fixes. Add a comprehensive description following the template: explain the proposed changes, rationale, and any related issue numbers. Include details about vendor-specific BMC/BIOS upgrade checks and firmware inventory function.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and specifically summarizes the main change: implementing vendor-specific BMC/BIOS upgrade checks with a shared firmware inventory function.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch enh/bmc-queue-task-precheck

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
bmc/bmc.go (1)

124-125: Use a typed component enum instead of raw strings.

This API now fans out into multiple vendor switch statements, and a typo like "Bios" would quietly turn the precheck into a no-op. Defining typed constants for the allowed values keeps controllers and vendor implementations aligned.

♻️ Suggested direction
+type ComponentType string
+
+const (
+	ComponentTypeBMC  ComponentType = "BMC"
+	ComponentTypeBIOS ComponentType = "BIOS"
+)
+
 	// CheckBMCPendingComponentUpgrade checks if there are pending/staged component upgrades for the given component type ("BMC" or "BIOS").
-	CheckBMCPendingComponentUpgrade(ctx context.Context, componentType string) (bool, error)
+	CheckBMCPendingComponentUpgrade(ctx context.Context, componentType ComponentType) (bool, error)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bmc/bmc.go` around lines 124 - 125, The method
CheckBMCPendingComponentUpgrade currently accepts a raw string parameter
componentType which is error-prone; change this to a typed enum (e.g., declare
type ComponentType string and constants like ComponentTypeBMC and
ComponentTypeBIOS) and update the function signature to
CheckBMCPendingComponentUpgrade(ctx context.Context, componentType
ComponentType) (bool, error); update all callers to pass the new constants,
update vendor switch/branch logic in any implementations to switch on the
ComponentType constants, and add validation in the function/implementations to
return an explicit error for unknown ComponentType values instead of silently
no-oping.
bmc/redfish_local.go (1)

271-273: Make the local pending-check stub configurable.

Controllers now have a defer-on-pending branch, but this test double can only return false, so that path is hard to cover in local/controller tests. A UnitTestMockUps flag or mock inventory-backed response would make the new behavior testable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bmc/redfish_local.go` around lines 271 - 273, The stub
CheckBMCPendingComponentUpgrade on RedfishLocalBMC always returns false; make it
configurable by adding a test-only field (e.g., UnitTestPendingUpgrade bool or a
PendingUpgradeFunc func(ctx context.Context, id string) (bool, error)) to the
RedfishLocalBMC struct and update the CheckBMCPendingComponentUpgrade method to
return that field/value when set (falling back to the current false,nil
default). Ensure the new field is only used for local/test instances (default
remains unchanged) and update any constructors or test setups that create
RedfishLocalBMC to allow toggling this flag or injecting the function so
controller defer-on-pending paths can be exercised.
bmc/redfish_kube.go (1)

459-461: Make the kube stub configurable for tests.

Hardcoding false here means the local-test backend can never exercise the new “pending upgrade detected” branch in the BIOS/BMC controllers. Consider plumbing this through UnitTestMockUps so tests can cover both defer and proceed paths without vendor-specific fixtures.

Based on learnings, bmc/redfish_kube.go is used for local testing purposes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bmc/redfish_kube.go` around lines 459 - 461, The
CheckBMCPendingComponentUpgrade stub should be configurable for tests instead of
always returning false; modify RedfishKubeBMC.CheckBMCPendingComponentUpgrade to
consult a test hook on the receiver (e.g., a UnitTestMockUps field or method)
and return that value (plus nil error) when present, falling back to false,nil
when the mock is nil; reference the RedfishKubeBMC struct and the
CheckBMCPendingComponentUpgrade method and add/consume a clear boolean field or
accessor on UnitTestMockUps (e.g., UnitTestMockUps.BMCPendingUpgrade or
UnitTestMockUps.GetBMCPendingUpgrade()) so unit tests can toggle the
pending-upgrade path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/controller/biosversion_controller.go`:
- Around line 304-310: The branch handling hasPending from
bmcClient.CheckBMCPendingComponentUpgrade should requeue reconciliation instead
of returning without a retry; locate the CheckBMCPendingComponentUpgrade call
(and the else-if hasPending branch) in biosversion_controller.go and change the
return so the controller requeues (i.e., request a retry/requeue) rather than
returning false, nil; keep the existing log.Info("Pending component upgrade
detected, deferring BIOS upgrade to avoid interruption", "Server", server.Name)
message and only modify the return path to trigger a requeue.

In `@internal/controller/bmcversion_controller.go`:
- Around line 302-309: The current logic treats any error from
bmcClient.CheckBMCPendingComponentUpgrade the same as “not supported” and
proceeds; change it to only ignore the “not supported” case and requeue for
transient/real errors. Concretely, when CheckBMCPendingComponentUpgrade returns
err, check for the specific sentinel (e.g. errors.Is(err,
redfish.ErrNotSupported) or the provider-specific NotSupported error) and in
that case log and continue; for any other err log it and return
ctrl.Result{RequeueAfter: r.ResyncInterval}, nil (so we defer the upgrade). Keep
references to bmcClient.CheckBMCPendingComponentUpgrade, BMC.Name, and the
existing ctrl.Result{RequeueAfter: r.ResyncInterval} in your change.

---

Nitpick comments:
In `@bmc/bmc.go`:
- Around line 124-125: The method CheckBMCPendingComponentUpgrade currently
accepts a raw string parameter componentType which is error-prone; change this
to a typed enum (e.g., declare type ComponentType string and constants like
ComponentTypeBMC and ComponentTypeBIOS) and update the function signature to
CheckBMCPendingComponentUpgrade(ctx context.Context, componentType
ComponentType) (bool, error); update all callers to pass the new constants,
update vendor switch/branch logic in any implementations to switch on the
ComponentType constants, and add validation in the function/implementations to
return an explicit error for unknown ComponentType values instead of silently
no-oping.

In `@bmc/redfish_kube.go`:
- Around line 459-461: The CheckBMCPendingComponentUpgrade stub should be
configurable for tests instead of always returning false; modify
RedfishKubeBMC.CheckBMCPendingComponentUpgrade to consult a test hook on the
receiver (e.g., a UnitTestMockUps field or method) and return that value (plus
nil error) when present, falling back to false,nil when the mock is nil;
reference the RedfishKubeBMC struct and the CheckBMCPendingComponentUpgrade
method and add/consume a clear boolean field or accessor on UnitTestMockUps
(e.g., UnitTestMockUps.BMCPendingUpgrade or
UnitTestMockUps.GetBMCPendingUpgrade()) so unit tests can toggle the
pending-upgrade path.

In `@bmc/redfish_local.go`:
- Around line 271-273: The stub CheckBMCPendingComponentUpgrade on
RedfishLocalBMC always returns false; make it configurable by adding a test-only
field (e.g., UnitTestPendingUpgrade bool or a PendingUpgradeFunc func(ctx
context.Context, id string) (bool, error)) to the RedfishLocalBMC struct and
update the CheckBMCPendingComponentUpgrade method to return that field/value
when set (falling back to the current false,nil default). Ensure the new field
is only used for local/test instances (default remains unchanged) and update any
constructors or test setups that create RedfishLocalBMC to allow toggling this
flag or injecting the function so controller defer-on-pending paths can be
exercised.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 55c3a287-0a22-407a-9d17-c0331749920d

📥 Commits

Reviewing files that changed from the base of the PR and between c9f3172 and ce6b34f.

📒 Files selected for processing (11)
  • .golangci.yml
  • bmc/bmc.go
  • bmc/oem_helpers.go
  • bmc/redfish.go
  • bmc/redfish_dell.go
  • bmc/redfish_hpe.go
  • bmc/redfish_kube.go
  • bmc/redfish_lenovo.go
  • bmc/redfish_local.go
  • internal/controller/biosversion_controller.go
  • internal/controller/bmcversion_controller.go

Comment on lines +302 to +309
// Check for pending component upgrade BEFORE issuing upgrade to avoid interrupting staged firmware
hasPending, err := bmcClient.CheckBMCPendingComponentUpgrade(ctx, "BMC")
if err != nil {
log.V(1).Info("Failed to check pending component upgrade before BMC upgrade, proceeding anyway", "error", err)
} else if hasPending {
log.V(1).Info("Pending component upgrade detected, deferring BMC upgrade to avoid interruption", "BMC", BMC.Name)
return ctrl.Result{RequeueAfter: r.ResyncInterval}, nil
}
Copy link
Contributor

@coderabbitai coderabbitai bot Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Only ignore the “not supported” case here.

Proceeding on every CheckBMCPendingComponentUpgrade error defeats the safety benefit of this guard: a transient Redfish/FirmwareInventory failure can still let us start a BMC upgrade while another component upgrade is already staged. Please distinguish the base “not supported” fallback from real check failures, and requeue on the latter instead of issuing the upgrade.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmcversion_controller.go` around lines 302 - 309, The
current logic treats any error from bmcClient.CheckBMCPendingComponentUpgrade
the same as “not supported” and proceeds; change it to only ignore the “not
supported” case and requeue for transient/real errors. Concretely, when
CheckBMCPendingComponentUpgrade returns err, check for the specific sentinel
(e.g. errors.Is(err, redfish.ErrNotSupported) or the provider-specific
NotSupported error) and in that case log and continue; for any other err log it
and return ctrl.Result{RequeueAfter: r.ResyncInterval}, nil (so we defer the
upgrade). Keep references to bmcClient.CheckBMCPendingComponentUpgrade,
BMC.Name, and the existing ctrl.Result{RequeueAfter: r.ResyncInterval} in your
change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Not supported by this vendor" error is currently a special case. During the recent refactoring of the Redfish files, we didn't introduce a dedicated error for it. Therefore, the controllers follow the existing project pattern and skip errors that a vendor hasn't implemented.

I suggest we maintain this behavior for now to ensure consistency with the other controllers and the current revision. In the future, we could consider an ErrNotSupported sentinel error so that the controllers react correctly. However, since this also affects other Redfish functions, I would like to address this as a separate topic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidgrun, understood — thanks for the context. Keeping the current "proceed on all errors" pattern makes sense for consistency with the rest of the controllers, given there's no ErrNotSupported sentinel in place yet. Would you like me to open a follow-up issue to track the introduction of an ErrNotSupported sentinel error so that controllers can distinguish transient failures from "not supported by vendor" and react accordingly?


✏️ Learnings added
Learnt from: davidgrun
Repo: ironcore-dev/metal-operator PR: 733
File: internal/controller/bmcversion_controller.go:302-309
Timestamp: 2026-03-11T09:51:11.302Z
Learning: In ironcore-dev/metal-operator, the project does not yet have a dedicated ErrNotSupported sentinel error for unimplemented vendor BMC methods. Controllers (e.g. internal/controller/bmcversion_controller.go, internal/controller/biosversion_controller.go) intentionally proceed/skip on ALL errors from vendor-specific BMC client methods (such as CheckBMCPendingComponentUpgrade) to maintain consistency with the existing pattern. A future improvement to introduce an ErrNotSupported sentinel is planned so controllers can distinguish transient errors from "not supported by vendor" cases and react differently (e.g. requeue on transient errors).

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: CR
Repo: ironcore-dev/metal-operator PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-17T13:12:50.124Z
Learning: Applies to internal/controller/**/*_controller.go : Implement idempotent reconciliation logic - safe to run the same reconciliation multiple times without side effects

Learnt from: nagadeesh-nagaraja
Repo: ironcore-dev/metal-operator PR: 695
File: internal/controller/bmcsettings_controller.go:720-730
Timestamp: 2026-03-04T10:32:17.473Z
Learning: In internal/controller/bmcsettings_controller.go and other controllers in the same directory, do not overwrite RetryOfFailedResourceConditionIssued when its Status is metav1.ConditionTrue. Preserve existing reason/message on re-entry and only update metadata if the condition is not yet true. This avoids unnecessary updates and retains historical context.

Learnt from: afritzler
Repo: ironcore-dev/metal-operator PR: 649
File: bmc/redfish_kube.go:380-392
Timestamp: 2026-02-04T16:31:13.180Z
Learning: The `bmc/redfish_kube.go` file in the ironcore-dev/metal-operator repository is used for local testing purposes and does not require vendor-specific workarounds (such as SuperMicro-specific boot mode handling) that are present in the production `bmc/redfish.go` implementation.
</learning>

Learnt from: afritzler
Repo: ironcore-dev/metal-operator PR: 607
File: bmc/redfish.go:205-208
Timestamp: 2026-01-22T09:49:09.470Z
Learning: In the ironcore-dev/metal-operator repository, SuperMicro systems require BootSourceOverrideMode to be unconditionally set to UEFI in the SetPXEBootOnce function (`bmc/redfish.go`), regardless of the current value or whether the mode is reported. This is a vendor-specific requirement that intentionally bypasses the general safeguard for systems that don't report BootSourceOverrideMode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you can do that, then we can discuss it in our team.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

gh issue create \
  --repo ironcore-dev/metal-operator \
  --title "Introduce ErrNotSupported sentinel error for unimplemented vendor BMC methods" \
  --body "## Summary

Controllers such as \`internal/controller/bmcversion_controller.go\` and \`internal/controller/biosversion_controller.go\` call vendor-specific BMC client methods (e.g. \`CheckBMCPendingComponentUpgrade\`) and currently treat **all** errors identically — including the base \`RedfishBaseBMC\` fallback that returns a "not supported by this manufacturer" error. This means transient Redfish/FirmwareInventory failures are silently swallowed and the upgrade proceeds, which is the same behavior as a legitimate "not supported" response.

## Motivation

Introducing a dedicated \`ErrNotSupported\` sentinel error would allow controllers to:
- **Ignore** the "not supported by vendor" case (current behavior, intentional)
- **Requeue** on transient or real check failures instead of blindly proceeding with the upgrade

This improves safety: a temporary FirmwareInventory unavailability would no longer silently bypass the pending-upgrade guard.

## Scope

- Define an \`ErrNotSupported\` sentinel (e.g. in \`bmc/bmc.go\` or a shared errors file)
- Update \`RedfishBaseBMC.CheckBMCPendingComponentUpgrade\` (and similar fallback methods) in \`bmc/redfish.go\` to wrap/return \`ErrNotSupported\`
- Update affected controllers (\`bmcversion_controller.go\`, \`biosversion_controller.go\`, and any future controllers) to use \`errors.Is(err, bmc.ErrNotSupported)\` to distinguish the two cases
- Consider applying the same pattern to other vendor-specific BMC interface methods that currently have a "not supported" fallback

## References

- PR `#733` (where this was first discussed): https://github.com/ironcore-dev/metal-operator/pull/733
- Specific comment: https://github.com/ironcore-dev/metal-operator/pull/733#discussion_r2910355313

/cc `@davidgrun`" \
  --assignee davidgrun

Length of output: 2380


🏁 Script executed:

cat > /tmp/issue_body.md << 'EOF'
## Summary

Controllers such as `internal/controller/bmcversion_controller.go` and `internal/controller/biosversion_controller.go` call vendor-specific BMC client methods (e.g. `CheckBMCPendingComponentUpgrade`) and currently treat **all** errors identically — including the base `RedfishBaseBMC` fallback that returns a "not supported by this manufacturer" error. This means transient Redfish/FirmwareInventory failures are silently swallowed and the upgrade proceeds, which is the same behavior as a legitimate "not supported" response.

## Motivation

Introducing a dedicated `ErrNotSupported` sentinel error would allow controllers to:
- **Ignore** the "not supported by vendor" case (current behavior, intentional)
- **Requeue** on transient or real check failures instead of blindly proceeding with the upgrade

This improves safety: a temporary FirmwareInventory unavailability would no longer silently bypass the pending-upgrade guard.

## Scope

- Define an `ErrNotSupported` sentinel (e.g. in `bmc/bmc.go` or a shared errors file)
- Update `RedfishBaseBMC.CheckBMCPendingComponentUpgrade` (and similar fallback methods) in `bmc/redfish.go` to wrap/return `ErrNotSupported`
- Update affected controllers (`bmcversion_controller.go`, `biosversion_controller.go`, and any future controllers) to use `errors.Is(err, bmc.ErrNotSupported)` to distinguish the two cases
- Consider applying the same pattern to other vendor-specific BMC interface methods that currently have a "not supported" fallback

## References

- PR `#733` (where this was first discussed): https://github.com/ironcore-dev/metal-operator/pull/733
- Specific comment: https://github.com/ironcore-dev/metal-operator/pull/733#discussion_r2910355313

/cc `@davidgrun`
EOF

gh issue create \
  --repo ironcore-dev/metal-operator \
  --title "Introduce ErrNotSupported sentinel error for unimplemented vendor BMC methods" \
  --body-file /tmp/issue_body.md \
  --assignee davidgrun

Length of output: 57


@davidgrun, the follow-up issue has been created and assigned to you: #735

It covers the full scope of the proposed improvement:

  • Defining an ErrNotSupported sentinel in the bmc package
  • Updating RedfishBaseBMC fallback methods to wrap/return it
  • Updating the affected controllers to use errors.Is(err, bmc.ErrNotSupported) to distinguish "not supported by vendor" from transient failures
  • Considering the same pattern for other vendor-specific BMC interface methods

Feel free to discuss and prioritize it with your team! 🐇


🧠 Learnings used
Learnt from: davidgrun
Repo: ironcore-dev/metal-operator PR: 733
File: internal/controller/bmcversion_controller.go:302-309
Timestamp: 2026-03-11T09:51:11.302Z
Learning: In internal/controller/*.go, implement and utilize a dedicated ErrNotSupported sentinel error to distinguish vendor-not-supported scenarios from transient errors. Update controller logic (e.g., bmcversion_controller.go, biosversion_controller.go) to requeue or handle differently when ErrNotSupported is returned, while preserving current behavior for transient errors. This promotes clearer error handling and future-proofing for vendor-specific BMC methods.

Learnt from: CR
Repo: ironcore-dev/metal-operator PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-17T13:12:50.124Z
Learning: Applies to internal/controller/**/*_controller.go : Implement idempotent reconciliation logic - safe to run the same reconciliation multiple times without side effects

Learnt from: nagadeesh-nagaraja
Repo: ironcore-dev/metal-operator PR: 695
File: internal/controller/bmcsettings_controller.go:720-730
Timestamp: 2026-03-04T10:32:17.473Z
Learning: In internal/controller/bmcsettings_controller.go and other controllers in the same directory, do not overwrite RetryOfFailedResourceConditionIssued when its Status is metav1.ConditionTrue. Preserve existing reason/message on re-entry and only update metadata if the condition is not yet true. This avoids unnecessary updates and retains historical context.

Learnt from: afritzler
Repo: ironcore-dev/metal-operator PR: 649
File: bmc/redfish_kube.go:380-392
Timestamp: 2026-02-04T16:31:13.180Z
Learning: The `bmc/redfish_kube.go` file in the ironcore-dev/metal-operator repository is used for local testing purposes and does not require vendor-specific workarounds (such as SuperMicro-specific boot mode handling) that are present in the production `bmc/redfish.go` implementation.
</learning>

Learnt from: afritzler
Repo: ironcore-dev/metal-operator PR: 607
File: bmc/redfish.go:205-208
Timestamp: 2026-01-22T09:49:09.470Z
Learning: In the ironcore-dev/metal-operator repository, SuperMicro systems require BootSourceOverrideMode to be unconditionally set to UEFI in the SetPXEBootOnce function (`bmc/redfish.go`), regardless of the current value or whether the mode is reported. This is a vendor-specific requirement that intentionally bypasses the general safeguard for systems that don't report BootSourceOverrideMode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants