fix: handle resource snapshot missing but work already synced and add cro/ro annotation to all the works#936
Conversation
78db570 to
251b3a3
Compare
michaelawyu
left a comment
There was a problem hiding this comment.
Added a few comments. LGTM ;)
|
|
||
| // areAllWorkSynced checks if all the works are synced with the resource binding. | ||
| func areAllWorkSynced(existingWorks map[string]*fleetv1beta1.Work, resourceBinding *fleetv1beta1.ClusterResourceBinding, _, _ string) bool { | ||
| syncedCondition := resourceBinding.GetCondition(string(fleetv1beta1.ResourceBindingWorkSynchronized)) |
There was a problem hiding this comment.
Hi Ryan! For clusters that have been in the faulted state this condition might have been removed; of course we have instructed the impacted user to fix things manually and they are the only affected party we knew, so it might be alright
c7d247c to
a8225a9
Compare
a8225a9 to
f079f32
Compare
| Eventually(crpStatusActual, eventuallyDuration, eventuallyInterval).Should(Succeed(), "Failed to update CRP status as expected") | ||
| }) | ||
|
|
||
| It("update work to trigger a work generator reconcile", func() { |
There was a problem hiding this comment.
Hi Ryan! Is this step necessary? It's clearing the manifestConditions on the work objects, which will set the failedPlacements part in the resource binding status to nil, but this does not seem to be related to the test cause. Sorry if I missed anything.
There was a problem hiding this comment.
And normally the work object should have already been marked as unavailable after image change.
There was a problem hiding this comment.
if the work doesn't change, the already available binding won't get reconcile then its status will stay as "ready" then it won't hit this bug.
| if err != nil { | ||
| return err | ||
| } | ||
| testDeployment.Spec.Template.Spec.Containers[0].Image = "1.26.2" |
There was a problem hiding this comment.
Ah, Ryan, I think this would be still be an invalid image name? But it does align with the situation Infosys encountered (which is, even if the old snapshot has been removed and all clusters are failing, updates should still be processed).
There was a problem hiding this comment.
this image is valid
| binding.Spec.ResourceSnapshotName = "next" | ||
| Expect(k8sClient.Update(ctx, binding)).Should(Succeed()) | ||
| updateRolloutStartedGeneration(&binding) | ||
| // check the binding status that it should be marked as override succeed but not synced |
There was a problem hiding this comment.
Hi Ryan! Just a nit: here it means rollout started but not overridden, right?
Description of your changes
Fixes #
I have:
make reviewableto ensure this PR is ready for review.How has this code been tested
Special notes for your reviewer