
✨ Refresh cache entries on cache hit in SSA caches#13459

Merged
k8s-ci-robot merged 1 commit into kubernetes-sigs:main from sbueringer:improve-ssa-caching
Mar 16, 2026
Conversation

@sbueringer
Member

Signed-off-by: Stefan Büringer buringerst@vmware.com

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Part of #13305

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-area PR is missing an area label labels Mar 16, 2026
@k8s-ci-robot
Contributor

This PR is currently missing an area label, which is used to identify the modified component when generating release notes.

Area labels can be added by org members by writing /area ${COMPONENT} in a comment

Please see the labels list for possible areas.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 16, 2026
@sbueringer sbueringer added area/util Issues or PRs related to utils and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. do-not-merge/needs-area PR is missing an area label labels Mar 16, 2026
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 16, 2026
@sbueringer sbueringer mentioned this pull request Mar 16, 2026
31 tasks
@sbueringer
Member Author

/test pull-cluster-api-e2e-blocking-main
/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main
/test pull-cluster-api-e2e-latestk8s-main
/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-main-gke
/test pull-cluster-api-e2e-mink8s-main
/test pull-cluster-api-e2e-upgrade-1-35-1-36-main

@sbueringer sbueringer force-pushed the improve-ssa-caching branch from eb7cf59 to b119620 on March 16, 2026 at 10:41
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 16, 2026
@sbueringer sbueringer changed the title [WIP] ✨ Refresh cache entries on cache hit in SSA caches ✨ Refresh cache entries on cache hit in SSA caches Mar 16, 2026
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 16, 2026
@sbueringer
Member Author

/test pull-cluster-api-e2e-blocking-main
/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main
/test pull-cluster-api-e2e-latestk8s-main
/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-main-gke
/test pull-cluster-api-e2e-mink8s-main
/test pull-cluster-api-e2e-upgrade-1-35-1-36-main

const (
// ttl is the duration for which we keep the keys in the cache.
ttl = 10 * time.Minute
// The ttl is a multiple of the syncPeriod to ensure we have a few chances
Member


Suggested change
// The ttl is a multiple of the syncPeriod to ensure we have a few chances
// The ttl is a multiple of the default syncPeriod to ensure we have a few chances

Maybe as a follow-up / future work we could introduce a direct link between the two values, but it's not required at this stage (I'm not aware of people changing the default syncPeriod, nor of any issue with the cache ttl).

Member Author


Yeah, it's not ideal. On the other hand, I also didn't want to allow users to fine-tune this cache TTL (either directly with a flag or indirectly via the sync period).

Member Author


(added to the umbrella issue)

// Note: We only add an identifier to the cache if the result of the dry run was no diff.
if exists := dryRunCtx.ssaCache.Has(requestIdentifier, dryRunCtx.originalUnstructured.GetKind()); exists {
// Refresh the cache entry so we don't have to execute the dry-runs again after the cache TTL.
dryRunCtx.ssaCache.Add(requestIdentifier)
Member

@fabriziopandini fabriziopandini Mar 16, 2026


(Q: not directly related to this change, but related to how we use the ssaCache)

When there is no item in the cache and we detect a diff after the dry-run call, we do not add an item to the cache.

I assume this is because at this stage (after a dry-run call) we don't know what the target resourceVersion will be after the diff is applied, so we have to wait for another dry-run call before adding the new version of the item to the cache.

Now, the question is: can we avoid the second dry-run call by adding items to the cache after Patch?

Member Author

@sbueringer sbueringer Mar 16, 2026


I assume this is because at this stage (after a dry-run call) we don't know what the target resourceVersion will be after the diff is applied, so we have to do a second dry-run call before adding to the cache.

There's also no guarantee that the Patch call succeeds.

Now, the question is: can we avoid the second dry-run call by adding items to the cache after Patch?

I don't know if it's safe to assume that after a Patch, the next Patch wouldn't change anything.

(because I don't know if the fact that we did a Patch is equivalent to all the dry-run logic determining that another Patch wouldn't lead to a diff)

Member Author

@sbueringer sbueringer Mar 16, 2026


But I assume it's not really worth optimizing this further: the additional dry-runs we could save should be negligible, given that we are not going to do any further applies once the object is stable.

For all the cases where further applies are necessary we wouldn't save anything; we only save some calls in the cases where everything becomes stable.

Member Author

@sbueringer sbueringer Mar 16, 2026


On top of that, consider:

  • object resourceVersion: 1
  • Cluster topology reconcile starts
  • someone else modifies the object, resourceVersion: 2
  • Cluster topology controller patches, resourceVersion: 3

If we now cached based on RV 3, we would miss whatever might have changed in RV 2.

(Maybe that's not a problem, but I think the potential benefit is not worth the risk. Considering that we now cache ~ forever, we should really not make any mistakes when adding a cache entry.)
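The race in the timeline above can be expressed as a toy simulation (purely illustrative, not Cluster API code): we track which resourceVersions the dry-run logic actually evaluated, and observe that caching after Patch would mark RV 3 as clean even though RV 2 was never looked at.

```go
package main

import "fmt"

func main() {
	// Which resourceVersions the dry-run logic actually evaluated.
	evaluated := map[int]bool{}

	rv := 1              // object resourceVersion: 1
	evaluated[rv] = true // topology reconcile starts; dry-run evaluates RV 1

	rv = 2 // someone else modifies the object: RV 2 (no dry-run sees it)

	rv = 3 // topology controller patches: RV 3

	// Caching "RV 3 needs no further patch" right after the Patch call would
	// let later reconciles skip the dry-run even though the external change
	// merged in RV 2 was never evaluated.
	fmt.Printf("cached rv=%d, rv 2 evaluated by dry-run: %v\n", rv, evaluated[2])
}
```

This is why the cache is only populated after a dry-run that reports no diff: at that point the cached identifier provably reflects a state the dry-run logic has seen.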

Member


Ok, maybe let's keep track of the idea in the umbrella issue

Member Author


Added to the umbrella issue

Signed-off-by: Stefan Büringer buringerst@vmware.com
@sbueringer sbueringer force-pushed the improve-ssa-caching branch from b119620 to 4348956 on March 16, 2026 at 12:02
@fabriziopandini
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 16, 2026
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: c2e380d7dba1063bab241abab00a3f60b1e3fe43

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 16, 2026
@k8s-ci-robot k8s-ci-robot merged commit 84ffd0f into kubernetes-sigs:main Mar 16, 2026
18 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.13 milestone Mar 16, 2026
@sbueringer sbueringer deleted the improve-ssa-caching branch March 16, 2026 15:14
