OCPBUGS-81503: Use merge patches instead of full updates for BMH and agent#10092

Closed
alegacy wants to merge 1 commit into
openshift:masterfrom
alegacy:fix/use-patch-instead-of-update

Conversation

@alegacy
Contributor

@alegacy alegacy commented Mar 31, 2026

The BMAC controller used client.Update() to write BMH and Agent objects back to the API server. This sends the entire object, which risks overwriting fields set by other controllers (e.g., siteconfig) when assisted-service is compiled against an older version of the BMH CRD that lacks newer fields.

Switch handleReconcileResult to use client.Patch with client.MergeFrom, which sends only the delta between the pre-mutation snapshot and the current state. This ensures fields unknown to assisted-service's Go structs are left untouched on the server.
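
To illustrate why sending only the delta leaves unknown fields alone, here is a minimal stdlib-only sketch of merge-patch semantics. It is not the controller code: in controller-runtime the delta is computed by client.MergeFrom(base), whereas mergeDiff, applyMerge, and the newField key below are hypothetical names used only for illustration, and the diff logic is simplified (for example, lists are replaced wholesale).

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// mergeDiff returns a merge-patch-style delta that turns base into modified.
// Added or changed keys carry their new value, removed keys are emitted as nil
// (JSON null), and nested objects are diffed recursively.
func mergeDiff(base, modified map[string]any) map[string]any {
	patch := map[string]any{}
	for k, mv := range modified {
		bv, ok := base[k]
		if !ok {
			patch[k] = mv
			continue
		}
		bm, bIsMap := bv.(map[string]any)
		mm, mIsMap := mv.(map[string]any)
		if bIsMap && mIsMap {
			if sub := mergeDiff(bm, mm); len(sub) > 0 {
				patch[k] = sub
			}
			continue
		}
		if !reflect.DeepEqual(bv, mv) {
			patch[k] = mv
		}
	}
	for k := range base {
		if _, ok := modified[k]; !ok {
			patch[k] = nil
		}
	}
	return patch
}

// applyMerge applies a merge patch to target in place: nil deletes a key,
// objects are merged recursively, and any other value replaces the old one.
func applyMerge(target, patch map[string]any) {
	for k, pv := range patch {
		if pv == nil {
			delete(target, k)
			continue
		}
		if pm, ok := pv.(map[string]any); ok {
			tm, tIsMap := target[k].(map[string]any)
			if !tIsMap {
				tm = map[string]any{}
				target[k] = tm
			}
			applyMerge(tm, pm)
			continue
		}
		target[k] = pv
	}
}

// mustParse decodes a JSON object literal or panics (sketch-only helper).
func mustParse(s string) map[string]any {
	var m map[string]any
	if err := json.Unmarshal([]byte(s), &m); err != nil {
		panic(err)
	}
	return m
}

func main() {
	// What the controller read and then mutated; its structs do not know newField.
	base := mustParse(`{"spec":{"online":false}}`)
	modified := mustParse(`{"spec":{"online":true}}`)
	// What the server holds, including a field written by another controller.
	server := mustParse(`{"spec":{"online":false,"newField":"set-by-other"}}`)

	applyMerge(server, mergeDiff(base, modified))

	out, _ := json.Marshal(server)
	fmt.Println(string(out))
}
```

Because the patch only contains spec.online, the server-side newField value is never mentioned and therefore survives. A full update built from the controller's view of the object would have replaced spec without it.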

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • [] None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • [] No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

@openshift-ci-robot openshift-ci-robot added labels Mar 31, 2026:
  • jira/severity-moderate: Referenced Jira bug's severity is moderate for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
@openshift-ci-robot

@alegacy: This pull request references Jira Issue OCPBUGS-81503, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.


@openshift-ci openshift-ci Bot added the size/M label (Denotes a PR that changes 30-99 lines, ignoring generated files.) Mar 31, 2026
@openshift-ci openshift-ci Bot requested review from mlorenzofr and yoavsc0302 March 31, 2026 17:36
@openshift-ci

openshift-ci Bot commented Mar 31, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: alegacy
Once this PR has been reviewed and has the lgtm label, please assign javipolo for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Mar 31, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 449df05b-bf69-442a-892e-a5d4e26d23f5

📥 Commits

Reviewing files that changed from the base of the PR and between bbba150 and f4d23a5.

📒 Files selected for processing (2)
  • internal/controller/controllers/bmh_agent_controller.go
  • internal/controller/controllers/bmh_agent_controller_test.go

Walkthrough

The controller now persists partial changes using JSON merge patches (RFC 7396) instead of full resource updates. handleReconcileResult accepts a patchBase, and Reconcile creates per-step DeepCopy snapshots (bmhPatchBase, agentPatchBase) that serve as patch bases before stages that may mutate BMH/Agent state.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Patch-based reconciliation implementation: internal/controller/controllers/bmh_agent_controller.go | Replaced Client.Update(...) calls with Client.Patch(..., client.MergeFromWithOptions(patchBase, client.MergeFromWithOptimisticLock{})). handleReconcileResult signature now accepts patchBase client.Object. Reconcile creates and refreshes bmhPatchBase and agentPatchBase DeepCopy snapshots and passes them to handleReconcileResult before each BMH/Agent-mutating stage. |
| Test mock updates: internal/controller/controllers/bmh_agent_controller_test.go | Updated gomock expectations from Update(...) to Patch(...) for BareMetalHost and Agent. Adjusted DoAndReturn callbacks to forward to the client's Patch(...). Changed zero-call assertions to Patch(...) and added a test seeding an extra annotation to verify merge-patch preserves unmanaged fields. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/controller/controllers/bmh_agent_controller_test.go (1)

1200-1208: Add one regression that proves field-preserving behavior.

These assertions only prove that the write path changed. They do not verify the bug this PR fixes: preserving live-object fields that this binary does not know about. One regression that seeds an extra field on the stored object and verifies reconcile leaves it intact would lock that in.

Also applies to: 3426-3436, 3706-3707

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/controllers/bmh_agent_controller_test.go` around lines
1200 - 1208, The tests set up Patch expectations but don't assert that reconcile
preserves unknown/live fields; add a regression in bmh_agent_controller_test.go
that seeds an extra (non-modeled) field on the stored resource (e.g., add a
custom JSON field in the BareMetalHost or Agent object body/annotations) before
invoking the reconcile path used in the test, run the existing reconcile logic
(the same test flow that triggers
mockClient.EXPECT().Patch(...).DoAndReturn(...) for BareMetalHost and Agent) and
assert after reconcile that the extra field remains unchanged; reference the
existing mockClient.EXPECT().Patch handlers and the BareMetalHost/Agent objects
used in those expectations to find where to seed the extra field and where to
add the post-reconcile assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/controller/controllers/bmh_agent_controller.go`:
- Around line 221-225: The current patch call in handleReconcileResult uses
r.Client.Patch(ctx, obj, client.MergeFrom(patchBase)) which drops optimistic
locking; instead enforce resourceVersion-based optimistic concurrency by
updating the full object: set
obj.SetResourceVersion(patchBase.GetResourceVersion()) and call
r.Client.Update(ctx, obj) (handle conflicts as needed). Locate
handleReconcileResult and replace the Patch call using
client.MergeFrom(patchBase) with the Update approach so the original
resourceVersion from patchBase is preserved and Update will surface conflicts.
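
One way to keep the patch and still surface conflicts is to carry the base resourceVersion inside the patch as a precondition. In controller-runtime this is what client.MergeFromWithOptions(patchBase, client.MergeFromWithOptimisticLock{}) is for. The sketch below only models the idea with toy types: object, patch, and applyPatch are hypothetical and do not reflect the API server's real implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict mimics the conflict error an API server returns on a stale write.
var errConflict = errors.New("conflict: resourceVersion is stale")

// object is a toy stand-in for a stored resource (hypothetical type).
type object struct {
	ResourceVersion string
	Fields          map[string]string
}

// patch is a toy merge patch. A non-empty ResourceVersion acts as a
// precondition: the patch only applies if the stored version still matches.
type patch struct {
	ResourceVersion string
	Fields          map[string]string
}

// applyPatch merges p into stored, rejecting it when the precondition is stale.
// Fields that the patch does not mention are left untouched.
func applyPatch(stored *object, p patch, nextVersion string) error {
	if p.ResourceVersion != "" && p.ResourceVersion != stored.ResourceVersion {
		return errConflict
	}
	for k, v := range p.Fields {
		stored.Fields[k] = v
	}
	stored.ResourceVersion = nextVersion
	return nil
}

func main() {
	// The controller read version "1"; another writer has since bumped it to "2".
	stored := &object{ResourceVersion: "2", Fields: map[string]string{"owner": "siteconfig"}}

	locked := patch{ResourceVersion: "1", Fields: map[string]string{"state": "provisioned"}}
	plain := patch{Fields: map[string]string{"state": "provisioned"}}

	fmt.Println(applyPatch(stored, locked, "3")) // rejected: stale precondition
	fmt.Println(applyPatch(stored, plain, "3"))  // accepted: no precondition
}
```

With the precondition in place, a stale write is rejected and the controller can re-read and retry, while the patch body still omits fields the controller does not manage.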


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f36c44ce-d10c-40ad-a5bb-c599876d6e67

📥 Commits

Reviewing files that changed from the base of the PR and between ba89549 and 7d52c0b.

📒 Files selected for processing (2)
  • internal/controller/controllers/bmh_agent_controller.go
  • internal/controller/controllers/bmh_agent_controller_test.go

Comment thread internal/controller/controllers/bmh_agent_controller.go
@codecov

codecov Bot commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 85.71429% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.28%. Comparing base (ba89549) to head (f4d23a5).
⚠️ Report is 6 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...nal/controller/controllers/bmh_agent_controller.go | 85.71% | 0 Missing and 3 partials ⚠️ |

@@           Coverage Diff           @@
##           master   #10092   +/-   ##
=======================================
  Coverage   44.27%   44.28%           
=======================================
  Files         416      416           
  Lines       72549    72558    +9     
=======================================
+ Hits        32123    32132    +9     
  Misses      37521    37521           
  Partials     2905     2905           

| Files with missing lines | Coverage Δ |
|---|---|
| ...nal/controller/controllers/bmh_agent_controller.go | 75.97% <85.71%> (+0.19%) ⬆️ |

... and 3 files with indirect coverage changes


@CrystalChun
Contributor

Looks fine to me w/ coderabbit's suggestion 👍

@alegacy alegacy force-pushed the fix/use-patch-instead-of-update branch from 7d52c0b to bbba150 Compare March 31, 2026 22:01

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
internal/controller/controllers/bmh_agent_controller.go (1)

277-350: Consider a helper for the patch-base lifecycle.

The repeated DeepCopy() + handleReconcileResult(...) sequence is easy to miss the next time a mutating reconcile step is added. Wrapping “snapshot → mutate → apply” in one helper would make stale-base mistakes harder.
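
A rough sketch of what such a helper could look like, using a toy resource type rather than the real BMH/Agent objects. The names resource, deepCopy, and withPatchBase are hypothetical, and the real handleReconcileResult takes different arguments.

```go
package main

import "fmt"

// resource is a toy stand-in for a BMH or Agent (hypothetical type).
type resource struct {
	Annotations map[string]string
}

// deepCopy returns an independent snapshot of r.
func (r *resource) deepCopy() *resource {
	c := &resource{Annotations: make(map[string]string, len(r.Annotations))}
	for k, v := range r.Annotations {
		c.Annotations[k] = v
	}
	return c
}

// withPatchBase snapshots obj, runs mutate, then hands the mutated object and
// the pre-mutation snapshot to apply. The snapshot is taken inside the helper,
// so a caller cannot forget to refresh it between steps.
func withPatchBase(
	obj *resource,
	mutate func(*resource) error,
	apply func(obj, base *resource) error,
) error {
	base := obj.deepCopy()
	if err := mutate(obj); err != nil {
		return err
	}
	return apply(obj, base)
}

func main() {
	bmh := &resource{Annotations: map[string]string{"a": "1"}}
	err := withPatchBase(bmh,
		func(r *resource) error { r.Annotations["b"] = "2"; return nil },
		func(obj, base *resource) error {
			// A real apply step would build and send the patch here.
			fmt.Println(len(base.Annotations), len(obj.Annotations))
			return nil
		})
	fmt.Println(err)
}
```

In this sketch a failing mutate step skips the apply call entirely. Whether the controller should still persist partial mutations in that case is a design decision the sketch does not settle.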

As per coding guidelines, Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/controllers/bmh_agent_controller.go` around lines 277 -
350, The code repeatedly takes snapshots (bmhPatchBase/agentPatchBase =
obj.DeepCopy()), runs a mutating step (e.g. r.reconcileBMH,
r.reconcileAgentSpec, r.reconcileAgentInventory, r.ensureMCSCert,
r.reconcileDay2SpokeBMH) and then calls r.handleReconcileResult, which is
error-prone; introduce a helper (e.g. applyWithPatchBase) that accepts the
target runtime.Object (BMH/Agent), a mutation callback (func() ReconcileResult
or similar) and the logger/context, and inside the helper do obj.DeepCopy(),
invoke the mutation, then call r.handleReconcileResult with the original and
snapshot; replace each manual DeepCopy + call sequence (references:
bmhPatchBase, agentPatchBase, DeepCopy(), handleReconcileResult, reconcileBMH,
reconcileAgentSpec, reconcileAgentInventory, ensureMCSCert,
reconcileDay2SpokeBMH) with calls to this helper to centralize the
snapshot→mutate→apply pattern.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1fcda195-bd43-465e-80ed-a9ceedc036d1

📥 Commits

Reviewing files that changed from the base of the PR and between 7d52c0b and bbba150.

📒 Files selected for processing (2)
  • internal/controller/controllers/bmh_agent_controller.go
  • internal/controller/controllers/bmh_agent_controller_test.go

Comment thread internal/controller/controllers/bmh_agent_controller_test.go Outdated
…Agent

The BMAC controller used client.Update() to write BMH and Agent objects
back to the API server. This sends the entire object, which risks
overwriting fields set by other controllers (e.g., siteconfig) when
assisted-service is compiled against an older version of the BMH CRD
that lacks newer fields.

Switch handleReconcileResult to use client.Patch with client.MergeFrom,
which sends only the delta between the pre-mutation snapshot and the
current state. This ensures fields unknown to assisted-service's Go
structs are left untouched on the server.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Allain Legacy <alegacy@redhat.com>
@alegacy alegacy force-pushed the fix/use-patch-instead-of-update branch from bbba150 to f4d23a5 Compare April 1, 2026 11:32
@rccrdpccl
Contributor

Firstly, thank you @alegacy for this contribution!

We were thinking of implementing the patch as a deferred call in the reconcile loop, and I think it makes perfect sense to take advantage of this issue to do so.

I have drafted PR #10095, with the help of Claude minions, to convey the idea. Feel free to comment on or contribute to it, or take inspiration from it to complete this PR.
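
For readers unfamiliar with the deferred-patch pattern, here is a minimal sketch of the general idea. It is not taken from #10095: host, patcher, and reconcile are hypothetical names, and the choice to patch even when a step fails is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// host is a toy stand-in for the reconciled object (hypothetical type).
type host struct {
	Labels map[string]string
}

// deepCopy returns an independent snapshot of h.
func (h *host) deepCopy() *host {
	c := &host{Labels: make(map[string]string, len(h.Labels))}
	for k, v := range h.Labels {
		c.Labels[k] = v
	}
	return c
}

// patcher stands in for "send the delta between base and obj to the server".
type patcher func(obj, base *host) error

// reconcile takes one snapshot up front and registers a single deferred patch.
// The deferred call runs on every return path, so no step can forget to persist
// its changes, and a patch failure is joined with any step error.
func reconcile(h *host, patch patcher, steps ...func(*host) error) (err error) {
	base := h.deepCopy()
	defer func() {
		err = errors.Join(err, patch(h, base))
	}()
	for _, step := range steps {
		if err = step(h); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	h := &host{Labels: map[string]string{}}
	calls := 0
	err := reconcile(h,
		func(obj, base *host) error { calls++; return nil },
		func(x *host) error { x.Labels["step"] = "one"; return nil },
		func(x *host) error { return errors.New("step two failed") },
	)
	fmt.Println(calls, err)
}
```

Compared with snapshotting before each mutating step, this keeps a single base per reconcile pass, at the cost of issuing the patch only once at the end.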

@CrystalChun please share your thoughts

FYI @carbonin as we were talking about it

@alegacy
Contributor Author

alegacy commented Apr 1, 2026

I tested your fix on my lab system, and it solves the same problem. I'm happy with your implementation, if we can proceed with it as per our offline discussion.

@alegacy
Contributor Author

alegacy commented Apr 1, 2026

/hold

...will close once the other PR is merged.

@openshift-ci openshift-ci Bot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Apr 1, 2026
@openshift-ci

openshift-ci Bot commented Apr 1, 2026

@alegacy: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-agent-compact-ipv4 | f4d23a5 | link | true | /test e2e-agent-compact-ipv4 |


@alegacy
Contributor Author

alegacy commented Apr 3, 2026

Superseded by #10095

@alegacy alegacy closed this Apr 3, 2026
@openshift-ci-robot

@alegacy: This pull request references Jira Issue OCPBUGS-81503. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state.



Labels

  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • jira/severity-moderate: Referenced Jira bug's severity is moderate for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.


4 participants