Skip to content

Always rollback transaction when retrying#19372

Merged
MadLittleMods merged 12 commits intodevelopfrom
madlittlemods/19202-database-transaction-retry-rollback
Jan 16, 2026
Merged

Always rollback transaction when retrying#19372
MadLittleMods merged 12 commits intodevelopfrom
madlittlemods/19202-database-transaction-retry-rollback

Conversation

@MadLittleMods
Copy link
Contributor

@MadLittleMods MadLittleMods commented Jan 13, 2026

Always rollback transaction (connection) when retrying.

Previously, because conn.rollback() was inside the if i < MAX_NUMBER_OF_RETRIES: condition, it never rolled back on the final retry.

Part of #19202

There are other problems mentioned in #19202 but this is a nice standalone change.

Dev notes

psql --username=postgres synapse

See uncommitted transactions:

SELECT * FROM pg_stat_activity ORDER BY xact_start;

SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_database.CallbacksTestCase.test_failed_retry

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

Comment on lines +834 to +838
try:
with opentracing.start_active_span("db.rollback"):
conn.rollback()
except self.engine.module.Error as e1:
transaction_logger.warning("[TXN EROLL] {%s} %s", name, e1)
Copy link
Contributor Author

@MadLittleMods MadLittleMods Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved conn.rollback() outside of the if i < MAX_NUMBER_OF_RETRIES: condition so we always rollback, then decide whether to keep retrying or not.

Comment on lines +260 to +261
# no additional calls
self.assertEqual(txn_mocks.exception_callback.call_count, 5)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This count changed because we changed the logic to be MAX_NUMBER_OF_RETRIES and align with that instead of being willy nilly with N and off-by-one.

Comment on lines +240 to +242
# FIXME: Every transaction should have been rolled back, see
# https://github.com/element-hq/synapse/issues/19202
# self.assertEqual(txn_mocks.rollback.call_count, 1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to fixup in a follow-up PR

Comment on lines +211 to +219
# FIXME: Sanity check that every transaction is either committed or rolled back,
# see https://github.com/element-hq/synapse/issues/19202
# transaction_count = after_callback.call_count + exception_callback.call_count
# self.assertEqual(
# transaction_count,
# commit_mock.call_count + rollback_mock.call_count,
# "We expect every transaction attempt to either commit or rollback. "
# f"Saw {transaction_count} transactions, but only {commit_mock.call_count} commits and {rollback_mock.call_count} rollbacks",
# )
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't uncomment this yet because of the test_exception_callback case below.

Something to fixup in a follow-up PR

@MadLittleMods MadLittleMods marked this pull request as ready for review January 13, 2026 03:37
@MadLittleMods MadLittleMods requested a review from a team as a code owner January 13, 2026 03:37
Copy link
Contributor

@reivilibre reivilibre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a suggestion, otherwise I'm happy with this, thanks for tidying as always

N = 5
MAX_NUMBER_OF_RETRIES = 5
while True:
cursor = conn.cursor(
Copy link
Contributor

@reivilibre reivilibre Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given what you've ended up with, I'd be very tempted to i += 1 at the start of the loop

this then means:

  • no need to +1 for display
  • no need for the awkward i < MAX_NUMBER_OF_RETRIES - 1 — you can drop the - 1
  • (minor) don't have multiple code paths doing the increment

However the pedant in me is saying this should be MAX_NUMBER_OF_ATTEMPTS since it also counts the first attempt

In fact, couldn't we just use a for attempt_number in range(1, MAX_NUMBER_OF_ATTEMPTS+1): loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated to for attempt_number in range(1, MAX_NUMBER_OF_ATTEMPTS+1): but there was a little hiccup with the linter not being able to tell that all code-paths either return or raise, so I had to add a little "unreachable" hint.

Let me know what you think of the trade-off. Looks decent 👍

We can still have most of the benefit if we go back to using a while True loop 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with either, if you're happy with this then let's just merge like that

@MadLittleMods MadLittleMods merged commit 13c6476 into develop Jan 16, 2026
45 checks passed
@MadLittleMods MadLittleMods deleted the madlittlemods/19202-database-transaction-retry-rollback branch January 16, 2026 01:35
@MadLittleMods
Copy link
Contributor Author

Thanks for the review @reivilibre 🦃

alexlebens pushed a commit to alexlebens/infrastructure that referenced this pull request Jan 27, 2026
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [element-hq/synapse](https://github.com/element-hq/synapse) | minor | `1.145.0` → `1.146.0` |

---

> ⚠️ **Warning**
>
> Some dependencies could not be looked up. Check the Dependency Dashboard for more information.

---

### Release Notes

<details>
<summary>element-hq/synapse (element-hq/synapse)</summary>

### [`v1.146.0`](https://github.com/element-hq/synapse/releases/tag/v1.146.0)

[Compare Source](element-hq/synapse@v1.145.0...v1.146.0rc1)

### Synapse 1.146.0 (2026-01-27)

No significant changes since 1.146.0rc1.

#### Deprecations and Removals

- [MSC2697](matrix-org/matrix-spec-proposals#2697) (Dehydrated devices) has been removed, as the MSC is closed. Developers should migrate to [MSC3814](matrix-org/matrix-spec-proposals#3814). ([#&#8203;19346](element-hq/synapse#19346))
- Support for Ubuntu 25.04 (Plucky Puffin) has been dropped. Synapse no longer builds debian packages for Ubuntu 25.04.

### Synapse 1.146.0rc1 (2026-01-20)

#### Features

- Add a new config option [`enable_local_media_storage`](https://element-hq.github.io/synapse/latest/usage/configuration/config_documentation.html#enable_local_media_storage) which controls whether media is additionally stored locally when using configured `media_storage_providers`. Setting this to `false` allows off-site media storage without a local cache. Contributed by Patrice Brend'amour [@&#8203;dr](https://github.com/dr).allgood. ([#&#8203;19204](element-hq/synapse#19204))
- Stabilise support for [MSC4312](matrix-org/matrix-spec-proposals#4312 `m.oauth` User-Interactive Auth stage for resetting cross-signing identity with the OAuth 2.0 API. The old, unstable name (`org.matrix.cross_signing_reset`) is now deprecated and will be removed in a future release. ([#&#8203;19273](element-hq/synapse#19273))
- Refactor Grafana dashboard to use `server_name` label (instead of `instance`). ([#&#8203;19337](element-hq/synapse#19337))

#### Bugfixes

- Fix joining a restricted v12 room locally when no local room creator is present but local users with sufficient power levels are. Contributed by [@&#8203;nexy7574](https://github.com/nexy7574). ([#&#8203;19321](element-hq/synapse#19321))
- Fixed parallel calls to `/_matrix/media/v1/create` being ratelimited for appservices even if `rate_limited: false` was set in the registration. Contributed by [@&#8203;tulir](https://github.com/tulir) @&#8203; Beeper. ([#&#8203;19335](element-hq/synapse#19335))
- Fix a bug introduced in 1.61.0 where a user's membership in a room was accidentally ignored when considering access to historical state events in rooms with the "shared" history visibility. Contributed by Lukas Tautz. ([#&#8203;19353](element-hq/synapse#19353))
- [MSC4140](matrix-org/matrix-spec-proposals#4140): Store the JSON content of scheduled delayed events as text instead of a byte array. This fixes the inability to schedule a delayed event with non-ASCII characters in its content. ([#&#8203;19360](element-hq/synapse#19360))
- Always rollback database transactions when retrying (avoid orphaned connections). ([#&#8203;19372](element-hq/synapse#19372))
- Fix `InFlightGauge` typing to allow upgrading to `prometheus_client` 0.24. ([#&#8203;19379](element-hq/synapse#19379))

#### Updates to the Docker image

- Add [Prometheus HTTP service discovery](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config) endpoint for easy discovery of all workers when using the `docker/Dockerfile-workers` image (see the [*Metrics* section of our Docker testing docs](docker/README-testing.md#metrics)). ([#&#8203;19336](element-hq/synapse#19336))

#### Improved Documentation

- Remove docs on legacy metric names (no longer in the codebase since 2022-12-06). ([#&#8203;19341](element-hq/synapse#19341))
- Clarify how the estimated value of room complexity is calculated internally. ([#&#8203;19384](element-hq/synapse#19384))

#### Internal Changes

- Add an internal `cancel_task` API to the task scheduler. ([#&#8203;19310](element-hq/synapse#19310))
- Tweak docstrings and signatures of `auth_types_for_event` and `get_catchup_room_event_ids`. ([#&#8203;19320](element-hq/synapse#19320))
- Replace usage of deprecated `assertEquals` with `assertEqual` in unit test code. ([#&#8203;19345](element-hq/synapse#19345))
- Drop support for Ubuntu 25.04 'Plucky Puffin', add support for Ubuntu 25.10 'Questing Quokka'. ([#&#8203;19348](element-hq/synapse#19348))
- Revert "Add an Admin API endpoint for listing quarantined media ([#&#8203;19268](element-hq/synapse#19268))". ([#&#8203;19351](element-hq/synapse#19351))
- Bump `mdbook` from 0.4.17 to 0.5.2 and remove our custom table-of-contents plugin in favour of the new default functionality. ([#&#8203;19356](element-hq/synapse#19356))
- Replace deprecated usage of PyGitHub's `GitRelease.title` with `.name` in release script. ([#&#8203;19358](element-hq/synapse#19358))
- Update the Element logo in Synapse's README to be an absolute URL, allowing it to render on other sites (such as PyPI). ([#&#8203;19368](element-hq/synapse#19368))
- Apply minor tweaks to v1.145.0 changelog. ([#&#8203;19376](element-hq/synapse#19376))
- Update Grafana dashboard syntax to use the latest from importing/exporting with Grafana 12.3.1. ([#&#8203;19381](element-hq/synapse#19381))
- Warn about skipping reactor metrics when using unknown reactor type. ([#&#8203;19383](element-hq/synapse#19383))
- Add support for reactor metrics with the `ProxiedReactor` used in worker Complement tests. ([#&#8203;19385](element-hq/synapse#19385))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0Mi42OS4yIiwidXBkYXRlZEluVmVyIjoiNDIuNjkuMiIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiaW1hZ2UiXX0=-->

Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/3533
Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net>
Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants