Skip to content

Allow long lived syncs to be cancelled if client has gone away#19499

Merged
anoadragon453 merged 12 commits intodevelopfrom
erikj/cancel_long_syncs
Feb 26, 2026
Merged

Allow long lived syncs to be cancelled if client has gone away#19499
anoadragon453 merged 12 commits intodevelopfrom
erikj/cancel_long_syncs

Conversation

@erikjohnston
Copy link
Member

We use a ResponseCache to handle long running initial syncs, which in extreme cases can take up to an hour. The cache correctly handles clients repeatedly retrying requests by starting the response calculation once and returning references to that when client retries a request.

However, if there is an hour long sync and the client stops retrying we'll currently continue the response calculation to completion. This is a waste as the response will likely expire before the client comes back.

This PR updates ResponseCache such that it will cancel the wrapped function if nothing is waiting for the entry for longer than the specified timeout.

Note: I've asserted that the sync calculations are @cancellable, but I've not actually checked this. I think it should be safe given its a read API.

Reviewable commit-by-commit.

@erikjohnston erikjohnston marked this pull request as ready for review February 25, 2026 14:18
@erikjohnston erikjohnston requested a review from a team as a code owner February 25, 2026 14:18
If a timeout is specified and the wrapped function is cancellable, we
want to cancel the wrapped function if no consumers wait on the key for
more than the timeout.

This handles the case where we are wrapping an expensive function on the
assumption that clients will retry, but the client has given up and will
not consume the response. In this case we want to cancel the processing
rather than further using resources.
@erikjohnston erikjohnston force-pushed the erikj/cancel_long_syncs branch from 40f5b54 to d6ec59e Compare February 25, 2026 14:44
Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on. Hopefully will help quite a bit with long-running transactions.

Can we do message search next? 😁

continue

if now - entry.last_observer_removed_time_ms > self.timeout.as_millis():
self._metrics.inc_evictions(EvictionReason.time)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we introduce another EvictionReason for "client left"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda think its close enough TBH, but we can if you want?

I'm not sure *why* the previous approach failed, but it did. Doing it
like this seems to work?!?!?!?!
Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now LGTM :shipit:

@anoadragon453 anoadragon453 merged commit 2c73e8d into develop Feb 26, 2026
79 of 81 checks passed
@anoadragon453 anoadragon453 deleted the erikj/cancel_long_syncs branch February 26, 2026 21:41
alexlebens pushed a commit to alexlebens/infrastructure that referenced this pull request Mar 10, 2026
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [element-hq/synapse](https://github.com/element-hq/synapse) | minor | `v1.148.0` → `v1.149.0` |

---

### Release Notes

<details>
<summary>element-hq/synapse (element-hq/synapse)</summary>

### [`v1.149.0`](https://github.com/element-hq/synapse/releases/tag/v1.149.0)

[Compare Source](element-hq/synapse@v1.148.0...v1.149.0)

### Synapse 1.149.0 (2026-03-10)

No significant changes since 1.149.0rc1.

### Synapse 1.149.0rc1 (2026-03-03)

#### Features

- Add experimental support for [MSC4388: Secure out-of-band channel for sign in with QR](matrix-org/matrix-spec-proposals#4388). ([#&#8203;19127](element-hq/synapse#19127))
- Add stable support for [MSC4380](matrix-org/matrix-spec-proposals#4380) invite blocking. ([#&#8203;19431](element-hq/synapse#19431))

#### Bugfixes

- Fix the 'Login as a user' Admin API not checking if the user exists before issuing an access token. ([#&#8203;18518](element-hq/synapse#18518))
- Fix `/sync` missing membership event in `state_after` (experimental [MSC4222](matrix-org/matrix-spec-proposals#4222) implementation) in some scenarios. ([#&#8203;19460](element-hq/synapse#19460))

#### Internal Changes

- Add log to explain when and why we freeze objects in the garbage collector. ([#&#8203;19440](element-hq/synapse#19440))
- Better instrument `JoinRoomAliasServlet` with tracing. ([#&#8203;19461](element-hq/synapse#19461))
- Fix Complement CI not running against the code from our PRs. ([#&#8203;19475](element-hq/synapse#19475))
- Log `docker system info` in CI so we have a plain record of how GitHub runners evolve over time. ([#&#8203;19480](element-hq/synapse#19480))
- Rename the `test_disconnect` test helper so that pytest doesn't see it as a test. ([#&#8203;19486](element-hq/synapse#19486))
- Add a log line when we delete devices. Contributed by [@&#8203;bradtgmurray](https://github.com/bradtgmurray) @&#8203; Beeper. ([#&#8203;19496](element-hq/synapse#19496))
- Pre-allocate the buffer based on the expected `Content-Length` with the Rust HTTP client. ([#&#8203;19498](element-hq/synapse#19498))
- Cancel long-running sync requests if the client has gone away. ([#&#8203;19499](element-hq/synapse#19499))
- Try and reduce reactor tick times when under heavy load. ([#&#8203;19507](element-hq/synapse#19507))
- Simplify Rust HTTP client response streaming and limiting. ([#&#8203;19510](element-hq/synapse#19510))
- Replace deprecated collection import locations with current locations. ([#&#8203;19515](element-hq/synapse#19515))
- Bump most locked Python dependencies to their latest versions. ([#&#8203;19519](element-hq/synapse#19519))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My41OS4yIiwidXBkYXRlZEluVmVyIjoiNDMuNTkuMiIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiaW1hZ2UiXX0=-->

Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/4580
Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net>
Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants