
stream_ordering_to_exterm grows to >1TB and is not cleaned even after room is forgotten & blocked #19241

@gitzen01

Description


Hi,

Synapse version: 1.142.1
PostgreSQL version: 17.7

[Image: "Last 24h" monitoring graph]

My Synapse instance keeps accumulating rows in the stream_ordering_to_exterm table for a specific room even though, from my server's point of view:

  • all local users have left the room,
  • the room is marked as "forgotten": true,
  • the room is blocked (block: true),
  • an admin delete/purge request was attempted (but failed),
  • the room no longer appears in any client.

Yet PostgreSQL keeps adding rows related to this room. The table has grown from 100 GB to 1.2 TB, bringing the database to ~1.3 TB total.

           table_name             |  total  |  data   | indexes |  toast  
-----------------------------------+---------+---------+---------+---------
 stream_ordering_to_exterm         | 1256 GB | 1129 GB | 128 GB  | 316 MB
 event_json                        | 1899 MB | 1578 MB | 145 MB  | 176 MB
 state_groups_state                | 1392 MB | 956 MB  | 436 MB  | 304 kB
 events                            | 757 MB  | 276 MB  | 481 MB  | 104 kB
 event_edges                       | 525 MB  | 181 MB  | 344 MB  | 80 kB
 push_rules_stream                 | 522 MB  | 355 MB  | 167 MB  | 136 kB
 event_to_state_groups             | 237 MB  | 97 MB   | 140 MB  | 64 kB
 event_search                      | 163 MB  | 111 MB  | 51 MB   | 752 kB
 event_forward_extremities         | 163 MB  | 91 MB   | 71 MB   | 56 kB
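To confirm that the growth really comes from this single room, a per-room breakdown can be run directly in PostgreSQL. This assumes the room_id column that current Synapse schemas have for this table, and a full scan of a 1 TB table will take a long time:

-- Per-room row counts in stream_ordering_to_exterm.
-- Assumes the (stream_ordering, room_id, event_id) layout of
-- current Synapse schemas; slow, since it scans the whole table.
SELECT room_id, count(*) AS n_rows
FROM stream_ordering_to_exterm
GROUP BY room_id
ORDER BY n_rows DESC
LIMIT 10;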

This seems to be a case where Synapse keeps processing extremities or state for a room that should no longer be active.

Once a room:

  • has no local members,
  • is forgotten,
  • is blocked,
  • and deletion was attempted,

Synapse should no longer:

  • process events for it,
  • add extremities,
  • or write new rows to stream_ordering_to_exterm.

Old rows should eventually be cleaned up.

Instead, Synapse keeps inserting rows into stream_ordering_to_exterm for the dead room, causing:

  • 1.2 TB of useless data,
  • continuous database growth,
  • I/O load,
  • inability to purge the room via the admin API.

Additional notes

  • How can I safely purge stream_ordering_to_exterm for a room that is no longer joinable? (A rough sketch of what I mean follows this list.)
  • Is there a recommended process to forcefully delete or garbage-collect extremities + stream entries?
  • Is this a known issue where Synapse continues to process state for forgotten rooms?
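For reference, the kind of manual cleanup I am asking about would be something like the sketch below. This is untested on my side; it assumes the rows for a dead room can simply be dropped, that both tables key on room_id, and that Synapse is stopped (and the database backed up) first:

-- UNTESTED sketch: drop extremity bookkeeping for one dead room.
-- Assumes stream_ordering_to_exterm and event_forward_extremities
-- both have a room_id column; stop Synapse and back up first.
BEGIN;
DELETE FROM stream_ordering_to_exterm WHERE room_id = '!<room>';
DELETE FROM event_forward_extremities WHERE room_id = '!<room>';
COMMIT;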

Since the partition ballooned, I had to stop my automatic backups because my backup server doesn't have the required 1.5 TB of free space.

Thanks

Steps to reproduce

Step 1: Launch a large number of redactions at once for a user in a specific room:

synadm user redact @<user> --room \!<room> --reason "cleanup" --limit 2000000

Step 2: Wait some days and watch the DB partition explode.

Step 3: Delete and purge the problematic room:

curl -X DELETE "https://myserver/_synapse/admin/v2/rooms/!<room to delete>" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "block": true,
    "purge": true
  }'
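To follow the deletion afterwards, the v2 admin API exposes a status endpoint for the same room:

curl -X GET "https://myserver/_synapse/admin/v2/rooms/!<room to delete>/delete_status" \
  -H "Authorization: Bearer <token>"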

Step 4: Wait some days (check the scheduled tasks):

        id        |         action          | status |   timestamp   
------------------+-------------------------+--------+---------------
 ShoOHRGCMAyyDcjM | shutdown_and_purge_room | active | 1764286595876
 lATsRRnqGmuRAPcd | redact_all_events       | active | 1764049220628
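(That task list comes straight from Synapse's scheduled_tasks table; assuming the table layout in this version, the query is roughly:)

SELECT id, action, status, timestamp
FROM scheduled_tasks
ORDER BY timestamp DESC;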

Some logs

INFO - task-redact_all_events-0-lATsRRnqGmuRAPcd - Task continuing: 600.001sec (2.329sec, 0.372sec) (9.893sec/546.924sec/1552) [154954 dbevts] 
INFO - task-shutdown_and_purge_room-0-ShoOHRGCMAyyDcjM - Task continuing: 600.001sec (0.004sec, 0.001sec) (0.011sec/0.069sec/5) [0 dbevts]

Count of events in this specific room:

 count  
--------
 334380
(1 row)

The forward extremities count (forward_extremities | jq '.count') returns 114,444 and is stable at this point; the full admin API call is shown below.
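That count comes from the forward extremities admin API, i.e. something like:

curl -s "https://myserver/_synapse/admin/v1/rooms/!<room>/forward_extremities" \
  -H "Authorization: Bearer <token>" | jq '.count'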

Step 5: Next day

The shutdown_and_purge_room task failed, while redact_all_events remains active.

-[ RECORD 1 ]-------------------------------------------------------
id        | ShoOHRGCMAyyDcjM
action    | shutdown_and_purge_room
status    | failed
timestamp | 1764356659958
error     | canceling statement due to statement timeout
          | CONTEXT:  while locking tuple (9,75) in relation "rooms"
-[ RECORD 2 ]-------------------------------------------------------
id        | lATsRRnqGmuRAPcd
action    | redact_all_events
status    | active
timestamp | 1764424195816
error     | 
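The statement timeout while locking a tuple in "rooms" suggests another transaction (possibly the still-running redact_all_events task) was holding a lock on that row. If it happens again, plain PostgreSQL views should show who is blocking whom:

-- Standard PostgreSQL, nothing Synapse-specific:
-- list sessions currently stuck waiting on a lock.
SELECT pid, state, wait_event_type, wait_event, left(query, 80) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';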

Step 6: Wait

forward_extremities" | jq '.count'
117671

forward_extremities continues to increase since the purge failed

SELECT
  relname,
  n_live_tup,
  n_dead_tup
FROM pg_stat_all_tables
WHERE relname = 'stream_ordering_to_exterm';

          relname          | n_live_tup  | n_dead_tup 
---------------------------+-------------+------------
 stream_ordering_to_exterm | 10980411303 |     733288

That is roughly 11 billion live rows with almost no dead tuples, so this is not vacuum bloat: the rows are genuinely live.

Homeserver

my own homeserver

Synapse Version

1.142.1

Installation Method

pip (from PyPI)

Database

PostgreSQL

Workers

Single process

Platform

VM
Ubuntu 24.04.3 LTS
8 CPUs
RAM: 16 GB

Configuration

No response

Relevant log output

Tried restarting the Synapse instance.
Before the restart:
482 - WARNING - launch_scheduled_tasks-4591 - Task lATsRRnqGmuRAPcd (action redact_all_events) has seen no update for more than 24h and may be stuck
388 - INFO - task-redact_all_events-0-lATsRRnqGmuRAPcd - Task continuing: 275400.000sec (77.775sec, 15.456sec) (51.785sec/337.232sec/102665) [166289 dbevts]
388 - INFO - task-shutdown_and_purge_room-0-ShoOHRGCMAyyDcjM - Task continuing: 38100.001sec (0.023sec, 0.007sec) (0.016sec/2.425sec/23) [0 dbevts]

After the restart:
482 - WARNING - launch_scheduled_tasks-0 - Task lATsRRnqGmuRAPcd (action redact_all_events) has seen no update for more than 24h and may be stuck
388 - INFO - task-shutdown_and_purge_room-0-ShoOHRGCMAyyDcjM - Task continuing: 300.001sec (0.004sec, 0.000sec) (0.011sec/0.069sec/5) [0 dbevts]
388 - INFO - task-redact_all_events-0-lATsRRnqGmuRAPcd - Task continuing: 600.001sec (2.329sec, 0.372sec) (9.893sec/546.924sec/1552) [154954 dbevts]

Anything else that would be useful to know?

No response
