Fix Complement CI not running against the code from our PRs (remote images being chosen over local)#19475
Conversation
That way we don't have to think about future images being published and conflicting
scripts-dev/complement.sh
Outdated
| ) | ||
|
|
||
| export COMPLEMENT_BASE_IMAGE=complement-synapse | ||
| export COMPLEMENT_BASE_IMAGE="$LOCAL_IMAGE_NAMESPACE/$COMPLEMENT_SYNAPSE_IMAGE_PATH" |
There was a problem hiding this comment.
One difference from #18210 is that I've updated everything we build to use the local dummy registry namespace.
This way we don't have to worry about the other images being published or not at some point in the future. It's all just namespaced and pulls from the same place across the board.
| @@ -0,0 +1 @@ | |||
| Fix Complement CI not running against the code from our PRs. | |||
There was a problem hiding this comment.
Ideally, to detect this kind of problem, we'd have an in-repo Complement test that checks if the current git commit sha matches the one that Synapse is using for Complement (via /version endpoint). I'm tackling this separately as #19476
|
|
||
| [guideComplementSh]: https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-integration-tests-complement | ||
|
|
||
| ## Building and running the images manually |
There was a problem hiding this comment.
Overall, this docs section doesn't seem that useful. To make this PR whole, I'm updating the docs but I think it's worth ripping this all out (just point to complement.sh and those docs) in a follow-up PR.
There was a problem hiding this comment.
Related discussion below with some favor on removing this section, #19475 (comment)
| --build-arg TEST_ONLY_IGNORE_POETRY_LOCKFILE \ | ||
| -f "docker/Dockerfile" . | ||
| $CONTAINER_RUNTIME build \ | ||
| -t "$SYNAPSE_IMAGE_PATH" \ |
There was a problem hiding this comment.
Is it worth just dropping this 'global' tag here and keeping only the local tag?
Since it's a build that was never published, it feels almost preferable to not give it the global tag
There was a problem hiding this comment.
Updated everything to just use localhost/... straight-up.
We do publish the complement-synapse image built from complement.sh but if I squint at the CI code there, I think it still works:
synapse/.github/workflows/push_complement_image.yml
Lines 56 to 74 in 3833eb4
There was a problem hiding this comment.
hmm, would this need to say
docker tag localhost/complement-synapse $TAG
for that to work?
There was a problem hiding this comment.
Just seeing this now (not sure how I missed).
Good spot! I think the answer is yes based on these CI failures:
❌ https://github.com/element-hq/synapse/actions/runs/22609655282/job/65509315002#step:8:39
Error response from daemon: No such image: complement-synapse:latest
reivilibre
left a comment
There was a problem hiding this comment.
Thanks! Some minor points but I'm not fussy about them
09ffece to
15b0210
Compare
|
Thanks for the review @reivilibre 🦜 |
Follow-up to #19460 (comment) and #19475
*This PR was originally only to enable [MSC4222](matrix-org/matrix-spec-proposals#4222) Complement tests (`/sync` `state_after`) but after merging the [fix PR](#19463), we discovered that while the tests pass locally, [fail in CI](#19460 (comment)). To unblock the RC, we decided to revert the fix PR (see #19474 (comment) for more info). To better ensure tests actually pass in CI, we're re-introducing the fix here in the same PR that we enable the tests in.* --- Fix `/sync` missing membership in `state_after`. This applies to any scenario where the first membership has a different `sender` compared to the `state_key` and then the second membership has the same `sender`/`state_key`. Like someone inviting another person and then them joining. Or someone being kicked and then they leave. This bug has been present since the MSC4222 implementation was introduced into the codebase (#17888). --- Fix #19455 Fix element-hq/customer-success#656 I have a feeling, this might also fix these issues (will close and see how people report back): Fix #18182 Fix #19478 ### Testing strategy Complement tests: matrix-org/complement#842 We will need #19460 to merge in order to enable the Complement tests in Synapse but this PR should be merged first so they pass in the first place. I've tested locally that the Complement tests pass with this fix. ### Dev notes [MSC4222](matrix-org/matrix-spec-proposals#4222) has already been merged into the spec and is already part of Matrix v1.16 but we haven't [stabilized support in Synapse yet](#19414). --- In the same ballpark: - #19455 - #17050 - #17430 - #16940 - #18182 - #18793 - #19478 --- Docker builds preferring remote image over the local image we just built, #19460 (comment) `containerd` image store (storage driver, driver type) -> #19475 ### Todo - [x] Wait for #19463 to merge so the Complement tests all pass - [x] Wait for #19475 to merge ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. * [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters)) --------- Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com> Co-authored-by: Andrew Ferrazzutti <andrewf@element.io>
This PR contains the following updates: | Package | Update | Change | |---|---|---| | [element-hq/synapse](https://github.com/element-hq/synapse) | minor | `v1.148.0` → `v1.149.0` | --- ### Release Notes <details> <summary>element-hq/synapse (element-hq/synapse)</summary> ### [`v1.149.0`](https://github.com/element-hq/synapse/releases/tag/v1.149.0) [Compare Source](element-hq/synapse@v1.148.0...v1.149.0) ### Synapse 1.149.0 (2026-03-10) No significant changes since 1.149.0rc1. ### Synapse 1.149.0rc1 (2026-03-03) #### Features - Add experimental support for [MSC4388: Secure out-of-band channel for sign in with QR](matrix-org/matrix-spec-proposals#4388). ([#​19127](element-hq/synapse#19127)) - Add stable support for [MSC4380](matrix-org/matrix-spec-proposals#4380) invite blocking. ([#​19431](element-hq/synapse#19431)) #### Bugfixes - Fix the 'Login as a user' Admin API not checking if the user exists before issuing an access token. ([#​18518](element-hq/synapse#18518)) - Fix `/sync` missing membership event in `state_after` (experimental [MSC4222](matrix-org/matrix-spec-proposals#4222) implementation) in some scenarios. ([#​19460](element-hq/synapse#19460)) #### Internal Changes - Add log to explain when and why we freeze objects in the garbage collector. ([#​19440](element-hq/synapse#19440)) - Better instrument `JoinRoomAliasServlet` with tracing. ([#​19461](element-hq/synapse#19461)) - Fix Complement CI not running against the code from our PRs. ([#​19475](element-hq/synapse#19475)) - Log `docker system info` in CI so we have a plain record of how GitHub runners evolve over time. ([#​19480](element-hq/synapse#19480)) - Rename the `test_disconnect` test helper so that pytest doesn't see it as a test. ([#​19486](element-hq/synapse#19486)) - Add a log line when we delete devices. Contributed by [@​bradtgmurray](https://github.com/bradtgmurray) @​ Beeper. ([#​19496](element-hq/synapse#19496)) - Pre-allocate the buffer based on the expected `Content-Length` with the Rust HTTP client. ([#​19498](element-hq/synapse#19498)) - Cancel long-running sync requests if the client has gone away. ([#​19499](element-hq/synapse#19499)) - Try and reduce reactor tick times when under heavy load. ([#​19507](element-hq/synapse#19507)) - Simplify Rust HTTP client response streaming and limiting. ([#​19510](element-hq/synapse#19510)) - Replace deprecated collection import locations with current locations. ([#​19515](element-hq/synapse#19515)) - Bump most locked Python dependencies to their latest versions. ([#​19519](element-hq/synapse#19519)) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My41OS4yIiwidXBkYXRlZEluVmVyIjoiNDMuNTkuMiIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiaW1hZ2UiXX0=--> Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/4580 Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net> Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
…it checkout (#19476) This way we actually detect problems like #19475 as they happen instead of being invisible until something breaks. Sanity check that Complement is testing against your code changes (whether it be local or from the PR in CI). ``` COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh --in-repo -run TestSynapseVersion ```
Fix remote images being chosen over the local ones we just built with Complement in CI (any Docker environment using the
containerdimage store). This problem means that Complement jobs in CI don't actually test against the code from the PR (since 2026-02-10).This PR approaches the problem the same way that @AndrewFerr proposed in #18210. This is better than the alternative listed below as we can just make our code compatible with whatever image store is being used.
Problem
Spawning from #19460 (comment) where we found that our Complement jobs in CI don't actually test against the code from the PR at the moment.
This is caused by a change in Docker Engine 29.0.0:
And our
ubuntu-latestGitHub runner (Current runner version: '2.331.0') points to using Docker client/server29.1.5🎯This Docker version bump happened on actions/runner-images@416418d (2026-02-10) (
28.0.4->29.1.5). Specific PR: actions/runner-images#13633I found this because I reviewed and remembered #18210 was a thing that @AndrewFerr ran into. And then running
dockers system prunealso revealed the problematiccontainerdin CI. Checking the Docker changelogs, I found the new default culprit and then could trace down where the GitHub runners made the dependency update.Alternatives
Use non-published tags
We could use tags that aren't published like
matrixdotorg/synapse:testing.Disable the
containerdimage storeLooking at the PR's cross-linked from actions/runner-images#13633, I saw NixOS/nix#15252 which disables the
containerdimage store. Seems like it could work but we might as well update our code to be compatible with whatever image store is being used (especially when someone has already put in the work to figure it out).Dev notes
I've given a heads-up to other developers about the breaking change, actions/runner-images#13633 (comment)
Todo
/versionendpoint), discussion belowPull Request Checklist
EventStoretoEventWorkerStore.".code blocks.