Skip to content

Fix looping calls not getting GCed. #19416

Merged
erikjohnston merged 4 commits intodevelopfrom
erikj/fix_looping_call_stops
Jan 30, 2026
Merged

Fix looping calls not getting GCed. #19416
erikjohnston merged 4 commits intodevelopfrom
erikj/fix_looping_call_stops

Conversation

@erikjohnston
Copy link
Member

@erikjohnston erikjohnston commented Jan 29, 2026

The Clock tracks looping calls to allow cancelling of all looping calls. However, this stopped them from getting garbage collected.

This was introduced in #18828

Fixes #19392

The `Clock` tracks looping calls to allow cancelling of all looping
calls. However, this stopped them from getting garbage collected.
That way when we log looping calls it points to the actual function,
rather than the wrapper.
@erikjohnston erikjohnston marked this pull request as ready for review January 29, 2026 16:29
@erikjohnston erikjohnston requested a review from a team as a code owner January 29, 2026 16:29
Copy link
Member

@devonh devonh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - dang. Thanks for cleaning up this oversight of mine :)

Changes look good.

if now:
looping_call_context_string = "looping_call_now"

@wraps(f)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat. TIL about @wraps

@@ -0,0 +1 @@
Fix memory leak caused by not cleaning up stopped looping calls. Introduced in v1.140.0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you stumble upon this?

Investigation from seeing some memory graph?

Copy link
Member

@devonh devonh Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it started with seeing the matrix.org client_readers take a long time to do GC, resulting in them appearing down frequently, https://github.com/matrix-org/internal-config/issues/1677
Then looking into why their memory usage was so high and kept growing.
Also, looking at this issue from the other day: #19392

I'm not sure how he jumped from there to looking at looping_calls.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The basic steps were:

  1. Check the graphs to see if there was any obvious correlation, e.g. large cache size.
  2. Manhole into one of the workers and manually clear out all the caches, this did not help.
  3. Install objgraph in the venv and run objgraph.show_most_common_types(), this came back with a list of cell, function (etc)... and then LoopingCall.
  4. At that point I had a look around trying to find where we keep references to looping calls, I just happened to stumble on Clock._looping_calls. One can also use objgraph to generate graphs of references of objects to see what is holding on to the references, which also would have pointed to Clock._looping_calls.

@erikjohnston erikjohnston merged commit 0dfcffa into develop Jan 30, 2026
76 of 80 checks passed
@erikjohnston erikjohnston deleted the erikj/fix_looping_call_stops branch January 30, 2026 10:26
alexlebens pushed a commit to alexlebens/infrastructure that referenced this pull request Feb 10, 2026
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [element-hq/synapse](https://github.com/element-hq/synapse) | minor | `1.146.0` → `1.147.0` |

---

### Release Notes

<details>
<summary>element-hq/synapse (element-hq/synapse)</summary>

### [`v1.147.0`](https://github.com/element-hq/synapse/releases/tag/v1.147.0)

[Compare Source](element-hq/synapse@v1.146.0...v1.147.0)

### Synapse 1.147.0 (2026-02-10)

No significant changes since 1.147.0rc1.

### Synapse 1.147.0rc1 (2026-02-03)

#### Bugfixes

- Fix memory leak caused by not cleaning up stopped looping calls. Introduced in v1.140.0. ([#&#8203;19416](element-hq/synapse#19416))
- Fix a typo that incorrectly made `setuptools_rust` a runtime dependency. ([#&#8203;19417](element-hq/synapse#19417))

#### Internal Changes

- Prune stale entries from `sliding_sync_connection_required_state` table. ([#&#8203;19306](element-hq/synapse#19306))
- Update "Event Send Time Quantiles" graph to only use dots for the event persistence rate (Grafana dashboard). ([#&#8203;19399](element-hq/synapse#19399))
- Update and align Grafana dashboard to use regex matching for `job` selectors (`job=~"$job"`) so the "all" value works correctly across all panels. ([#&#8203;19400](element-hq/synapse#19400))
- Don't retry joining partial state rooms all at once on startup. ([#&#8203;19402](element-hq/synapse#19402))
- Disallow requests to the health endpoint from containing trailing path characters. ([#&#8203;19405](element-hq/synapse#19405))
- Add notes that new experimental features should have associated tracking issues. ([#&#8203;19410](element-hq/synapse#19410))
- Bump `pyo3` from 0.26.0 to 0.27.2 and `pythonize` from 0.26.0 to 0.27.0. Contributed by [@&#8203;razvp](https://github.com/razvp) @&#8203; ERCOM. ([#&#8203;19412](element-hq/synapse#19412))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4zLjYiLCJ1cGRhdGVkSW5WZXIiOiI0My4zLjYiLCJ0YXJnZXRCcmFuY2giOiJtYWluIiwibGFiZWxzIjpbImltYWdlIl19-->

Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/3877
Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net>
Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Memory Leak because one client sends 20Hz requests to POST client/v3/keys/upload. One time key signed_curve25519:AAAAAAAAAFA already exists

3 participants