
✨ Refresh cache entries on cache hit in SSA caches#13459

Merged
k8s-ci-robot merged 1 commit into kubernetes-sigs:main from sbueringer:improve-ssa-caching
Mar 16, 2026
Conversation

@sbueringer
Member

Signed-off-by: Stefan Büringer buringerst@vmware.com

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Part of #13305

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-area PR is missing an area label labels Mar 16, 2026
@k8s-ci-robot
Contributor

This PR is currently missing an area label, which is used to identify the modified component when generating release notes.

Area labels can be added by org members by writing /area ${COMPONENT} in a comment

Please see the labels list for possible areas.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 16, 2026
@sbueringer sbueringer added area/util Issues or PRs related to utils and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. do-not-merge/needs-area PR is missing an area label labels Mar 16, 2026
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 16, 2026
@sbueringer sbueringer mentioned this pull request Mar 16, 2026
31 tasks
@sbueringer
Member Author

/test pull-cluster-api-e2e-blocking-main
/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main
/test pull-cluster-api-e2e-latestk8s-main
/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-main-gke
/test pull-cluster-api-e2e-mink8s-main
/test pull-cluster-api-e2e-upgrade-1-35-1-36-main

@sbueringer sbueringer force-pushed the improve-ssa-caching branch from eb7cf59 to b119620 on March 16, 2026 at 10:41
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 16, 2026
@sbueringer sbueringer changed the title [WIP] ✨ Refresh cache entries on cache hit in SSA caches ✨ Refresh cache entries on cache hit in SSA caches Mar 16, 2026
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 16, 2026
@sbueringer
Member Author

/test pull-cluster-api-e2e-blocking-main
/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main
/test pull-cluster-api-e2e-latestk8s-main
/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-main-gke
/test pull-cluster-api-e2e-mink8s-main
/test pull-cluster-api-e2e-upgrade-1-35-1-36-main

const (
// ttl is the duration for which we keep the keys in the cache.
ttl = 10 * time.Minute
// The ttl is a multiple of the syncPeriod to ensure we have a few chances
Member


Suggested change
// The ttl is a multiple of the syncPeriod to ensure we have a few chances
// The ttl is a multiple of the default syncPeriod to ensure we have a few chances

Maybe as a follow-up / future work we could introduce a direct link between the two values, but it's not required at this stage (I'm not aware of people changing the default syncPeriod, nor of any issue with the cache ttl).

Member Author


Yeah, it's not ideal. On the other hand, I also didn't want to allow users to fine-tune this cache TTL (either directly with a flag or indirectly via the sync period).

Member Author


(added to the umbrella issue)

// Note: We only add an identifier to the cache if the result of the dry run was no diff.
if exists := dryRunCtx.ssaCache.Has(requestIdentifier, dryRunCtx.originalUnstructured.GetKind()); exists {
// Refresh the cache entry so we don't have to execute the dry-runs again after the cache TTL.
dryRunCtx.ssaCache.Add(requestIdentifier)
Member

@fabriziopandini fabriziopandini Mar 16, 2026


(Q: not directly related to this change, but related to how we use the ssaCache)

When there is no item in the cache and we detect a diff after the dry-run call, we do not add an item to the cache.

I assume this is because at this stage (after a dry-run call) we don't know what the target resourceVersion will be after the diff is applied, so we have to wait for another dry-run call before adding the new version of the item to the cache.

Now, the question is: can we avoid the second dry-run call by adding items to the cache after Patch?

Member Author

@sbueringer sbueringer Mar 16, 2026


I assume this is because at this stage (after a dry-run call) we don't know what the target resourceVersion will be after the diff is applied, so we have to do a second dry-run call before adding to the cache.

There's also no guarantee that the Patch call succeeds.

Now, the question is: can we avoid the second dry-run call by adding items to the cache after Patch?

I don't know if it's safe to assume that after a Patch, the next Patch wouldn't change anything.

(because I don't know if the fact that we did a Patch is equivalent to all the dry-run logic determining that another Patch wouldn't lead to a diff)

Member Author

@sbueringer sbueringer Mar 16, 2026


But I assume it's not really worth optimizing this further: the additional dry-runs we could save should be negligible, given that we are not going to do any further applies once the object is stable.

For all the cases where further applies are necessary we wouldn't save anything; we only save some calls in the cases where everything becomes stable.

Member Author

@sbueringer sbueringer Mar 16, 2026


On top of that, consider:

  • object resourceVersion: 1
  • Cluster topology reconcile starts
  • someone else modifies the object, resourceVersion: 2
  • Cluster topology controller patches, resourceVersion: 3

If we now cached based on RV 3, we would miss whatever might have changed in RV 2.

(Maybe that's not a problem, but I think the potential benefit is not worth the risk. Considering that we now cache ~ forever, we should really not make any mistakes when adding a cache entry.)
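The race in the timeline above can be expressed as a toy simulation (purely illustrative, not Cluster API code): we track which resourceVersions the dry-run logic actually evaluated, and observe that caching after Patch would mark RV 3 as clean even though RV 2 was never looked at.

```go
package main

import "fmt"

func main() {
	// Which resourceVersions the dry-run logic actually evaluated.
	evaluated := map[int]bool{}

	rv := 1              // object resourceVersion: 1
	evaluated[rv] = true // topology reconcile starts; dry-run evaluates RV 1

	rv = 2 // someone else modifies the object: RV 2 (no dry-run sees it)

	rv = 3 // topology controller patches: RV 3

	// Caching "RV 3 needs no further patch" right after the Patch call would
	// let later reconciles skip the dry-run even though the external change
	// merged in RV 2 was never evaluated.
	fmt.Printf("cached rv=%d, rv 2 evaluated by dry-run: %v\n", rv, evaluated[2])
}
```

This is why the cache is only populated after a dry-run that reports no diff: at that point the cached identifier provably reflects a state the dry-run logic has seen.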

Member


Ok, maybe let's keep track of the idea in the umbrella issue

Member Author


Added to the umbrella issue

Signed-off-by: Stefan Büringer buringerst@vmware.com
@sbueringer sbueringer force-pushed the improve-ssa-caching branch from b119620 to 4348956 on March 16, 2026 at 12:02
@fabriziopandini
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 16, 2026
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: c2e380d7dba1063bab241abab00a3f60b1e3fe43

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 16, 2026
@k8s-ci-robot k8s-ci-robot merged commit 84ffd0f into kubernetes-sigs:main Mar 16, 2026
18 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.13 milestone Mar 16, 2026
@sbueringer sbueringer deleted the improve-ssa-caching branch March 16, 2026 15:14
