Conversation
|
Hi @alezkv Thanks for the contribution! Can you please do the DCO? Once all the checks go through this can be merged! |
Problem
Two k8up schedulers in the same namespace (daily remote jobs, hourly local jobs) were interfering with each other's cleanup cycles. The local scheduler's cleanup was deleting remote scheduler's jobs because
the cleanup logic only filtered by namespace, not by job ownership.
Root Cause
In operator/executor/generic.go, the listOldResources() method was only filtering jobs by namespace:
err := g.Client.List(ctx, objList, &client.ListOptions{
Namespace: namespace, // Only namespace filtering - no ownership check
})
This allowed one scheduler's cleanup to see and delete jobs belonging to other schedulers in the same namespace.
Solution
Fixed ownership filtering using existing label-based approach:
1. Modified listOldResources() to filter by the k8up.io/owned-by label:
err := g.Client.List(ctx, objList, client.MatchingLabels{k8upv1.LabelK8upOwnedBy: ownedBy}, client.InNamespace(namespace))
2. Enhanced CleanupOldResources() to construct ownership identifier:
jobObj, ok := limits.(k8upv1.JobObject)
ownedBy := fmt.Sprintf("%s_%s", jobObj.GetType().String(), jobObj.GetName())
Fixes k8up-io#1053
Signed-off-by: Aleksey Zhukov <353748+alezkv@users.noreply.github.com>
07d2747 to
445f552
Compare
|
HI @alezkv The linter is not quite happy yet. There are some functions that seem unused, can you please remove them? For the error about using a switch statement: feel free to ignore it via a |
|
@Kidswiss Yes, I understand. I've already resolved the issue locally, but I'm still trying to identify the end-to-end failures. |
|
It seems that my initial assumption was incorrect. The currently proposed changes did not resolve the issue and may have introduced a regression in the cleaning. It appears that these changes involve more than just a bug fix, they may require architectural refactoring. I would appreciate any input from the code owners @Kidswiss Current state of the code:
I didn't tested this flow but according to code I think currently it's working like so. Let's assume all jobs succeeds.
What issues I see in current approach:
|
|
Hi @alezkv Thanks for looking into this.
I'd say that's the main issue. It should probably be the schedule's responsibility to clean-up the jobs. Especially now that we support multiple schedules per namespace. So ideally it should do it like this:
This way the schedule will clean all the jobs that belong to it. But any manually created jobs should also be manually cleaned up again. What do you think? |
|
Thanks for this contribution and the thorough write-up! The fix for the job cleanup ownership filtering looks solid. It looks like CI is failing (lint and e2e) — likely due to the branch being based on an older master. Could you rebase onto current master and force-push? That should resolve the CI issues. Also, to be able to merge it, we need the commits to be signed off (DCO — Developer Certificate of Origin), which is a requirement for all CNCF projects. You can do this by rebasing with: (where N is the number of commits in this PR) and then force-pushing to your branch. If you have any questions about this, feel free to ask! |
Summary
Two k8up schedulers in the same namespace (daily remote jobs, hourly local jobs) were interfering with each other's cleanup cycles. The local scheduler's cleanup was deleting remote scheduler's jobs because the cleanup logic only filtered by namespace, not by job ownership.
Root Cause
In operator/executor/generic.go, the listOldResources() method was only filtering jobs by namespace: err := g.Client.List(ctx, objList, &client.ListOptions{
Namespace: namespace, // Only namespace filtering - no ownership check
})
This allowed one scheduler's cleanup to see and delete jobs belonging to other schedulers in the same namespace.
Solution
Fixed ownership filtering using existing label-based approach:
ownedBy := fmt.Sprintf("%s_%s", jobObj.GetType().String(), jobObj.GetName())
Fixes #1053
Checklist
For Code changes
bug,enhancement,documentation,change,breaking,dependencyas they show up in the changelog
area:operatorcharts/directory.