
stream_ordering_to_exterm grows to >1TB and is not cleaned even after room is forgotten & blocked #19241

@gitzen01

Description


Hi,

Synapse version: 1.142.1
PostgreSQL version: 17.7

[Image: "Last 24h" monitoring graph]

My Synapse instance keeps accumulating rows in the stream_ordering_to_exterm table for a specific room even though, from my server's point of view:

  • all local users have left the room,
  • the room is marked as "forgotten": true,
  • the room is blocked (block: true),
  • an admin delete/purge request was attempted (but failed),
  • the room no longer appears in any client.

Yet PostgreSQL keeps adding rows related to this room. The table has grown from 100 GB to 1.2 TB, bringing the database to ~1.3 TB total.

           table_name             |  total  |  data   | indexes |  toast  
-----------------------------------+---------+---------+---------+---------
 stream_ordering_to_exterm         | 1256 GB | 1129 GB | 128 GB  | 316 MB
 event_json                        | 1899 MB | 1578 MB | 145 MB  | 176 MB
 state_groups_state                | 1392 MB | 956 MB  | 436 MB  | 304 kB
 events                            | 757 MB  | 276 MB  | 481 MB  | 104 kB
 event_edges                       | 525 MB  | 181 MB  | 344 MB  | 80 kB
 push_rules_stream                 | 522 MB  | 355 MB  | 167 MB  | 136 kB
 event_to_state_groups             | 237 MB  | 97 MB   | 140 MB  | 64 kB
 event_search                      | 163 MB  | 111 MB  | 51 MB   | 752 kB
 event_forward_extremities         | 163 MB  | 91 MB   | 71 MB   | 56 kB
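To confirm that the growth really comes from this single room, a per-room breakdown can be run directly in PostgreSQL. This assumes the room_id column that current Synapse schemas have for this table, and a full scan of a 1 TB table will take a long time:

-- Per-room row counts in stream_ordering_to_exterm.
-- Assumes the (stream_ordering, room_id, event_id) layout of
-- current Synapse schemas; slow, since it scans the whole table.
SELECT room_id, count(*) AS n_rows
FROM stream_ordering_to_exterm
GROUP BY room_id
ORDER BY n_rows DESC
LIMIT 10;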

This seems to be a case where Synapse keeps processing extremities or state for a room that should no longer be active.

Once a room:

  • has no local members,
  • is forgotten,
  • is blocked,
  • and deletion was attempted,

Synapse should no longer:

  • process events for it,
  • add extremities,
  • or write new rows to stream_ordering_to_exterm.

Old rows should eventually be cleaned up.

Instead, Synapse keeps inserting rows into stream_ordering_to_exterm for the dead room, causing:

  • 1.2 TB of useless data,
  • continuous database growth,
  • I/O load,
  • inability to purge the room via the admin API.

Additional notes

  • How can I safely purge stream_ordering_to_exterm for a room that is no longer joinable? (A rough sketch of what I mean follows this list.)
  • Is there a recommended process to forcefully delete or garbage-collect extremities + stream entries?
  • Is this a known issue where Synapse continues to process state for forgotten rooms?
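For reference, the kind of manual cleanup I am asking about would be something like the sketch below. This is untested on my side; it assumes the rows for a dead room can simply be dropped, that both tables key on room_id, and that Synapse is stopped (and the database backed up) first:

-- UNTESTED sketch: drop extremity bookkeeping for one dead room.
-- Assumes stream_ordering_to_exterm and event_forward_extremities
-- both have a room_id column; stop Synapse and back up first.
BEGIN;
DELETE FROM stream_ordering_to_exterm WHERE room_id = '!<room>';
DELETE FROM event_forward_extremities WHERE room_id = '!<room>';
COMMIT;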

Since the partition ballooned, I had to stop my automatic backups because my backup server doesn't have the required 1.5 TB of free space.

Thanks

Steps to reproduce

Step 1: Launch a large number of redactions at once for a user in a specific room:

synadm user redact @<user> --room \!<room> --reason "cleanup" --limit 2000000

Step 2: Wait some days and watch the DB partition explode.

Step 3: Delete and purge the problematic room:

curl -X DELETE "https://myserver/_synapse/admin/v2/rooms/!<room to delete>" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "block": true,
    "purge": true
  }'
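To follow the deletion afterwards, the v2 admin API exposes a status endpoint for the same room:

curl -X GET "https://myserver/_synapse/admin/v2/rooms/!<room to delete>/delete_status" \
  -H "Authorization: Bearer <token>"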

Step 4: Wait some days (check the scheduled tasks):

        id        |         action          | status |   timestamp   
------------------+-------------------------+--------+---------------
 ShoOHRGCMAyyDcjM | shutdown_and_purge_room | active | 1764286595876
 lATsRRnqGmuRAPcd | redact_all_events       | active | 1764049220628
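(That task list comes straight from Synapse's scheduled_tasks table; assuming the table layout in this version, the query is roughly:)

SELECT id, action, status, timestamp
FROM scheduled_tasks
ORDER BY timestamp DESC;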

Some logs

INFO - task-redact_all_events-0-lATsRRnqGmuRAPcd - Task continuing: 600.001sec (2.329sec, 0.372sec) (9.893sec/546.924sec/1552) [154954 dbevts] 
INFO - task-shutdown_and_purge_room-0-ShoOHRGCMAyyDcjM - Task continuing: 600.001sec (0.004sec, 0.001sec) (0.011sec/0.069sec/5) [0 dbevts]

Count of events in this specific room:

 count  
--------
 334380
(1 row)

The forward extremities count (forward_extremities | jq '.count') returns 114,444 and is stable at this point; the full admin API call is shown below.
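That count comes from the forward extremities admin API, i.e. something like:

curl -s "https://myserver/_synapse/admin/v1/rooms/!<room>/forward_extremities" \
  -H "Authorization: Bearer <token>" | jq '.count'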

Step 5: Next day

The shutdown_and_purge_room task failed, while redact_all_events remains active.

-[ RECORD 1 ]-------------------------------------------------------
id        | ShoOHRGCMAyyDcjM
action    | shutdown_and_purge_room
status    | failed
timestamp | 1764356659958
error     | canceling statement due to statement timeout
          | CONTEXT:  while locking tuple (9,75) in relation "rooms"
-[ RECORD 2 ]-------------------------------------------------------
id        | lATsRRnqGmuRAPcd
action    | redact_all_events
status    | active
timestamp | 1764424195816
error     | 
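The statement timeout while locking a tuple in "rooms" suggests another transaction (possibly the still-running redact_all_events task) was holding a lock on that row. If it happens again, plain PostgreSQL views should show who is blocking whom:

-- Standard PostgreSQL, nothing Synapse-specific:
-- list sessions currently stuck waiting on a lock.
SELECT pid, state, wait_event_type, wait_event, left(query, 80) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';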

Step 6: Wait

forward_extremities" | jq '.count'
117671

forward_extremities continues to increase since the purge failed

SELECT
  relname,
  n_live_tup,
  n_dead_tup
FROM pg_stat_all_tables
WHERE relname = 'stream_ordering_to_exterm';

          relname          | n_live_tup  | n_dead_tup 
---------------------------+-------------+------------
 stream_ordering_to_exterm | 10980411303 |     733288

That is roughly 11 billion live rows with almost no dead tuples, so this is not vacuum bloat: the rows are genuinely live.

Homeserver

my own homeserver

Synapse Version

1.142.1

Installation Method

pip (from PyPI)

Database

PostgreSQL

Workers

Single process

Platform

VM
Ubuntu 24.04.3 LTS
8 CPUs
RAM: 16 GB

Configuration

No response

Relevant log output

Tried restarting the Synapse instance.
Before the restart:
482 - WARNING - launch_scheduled_tasks-4591 - Task lATsRRnqGmuRAPcd (action redact_all_events) has seen no update for more than 24h and may be stuck
388 - INFO - task-redact_all_events-0-lATsRRnqGmuRAPcd - Task continuing: 275400.000sec (77.775sec, 15.456sec) (51.785sec/337.232sec/102665) [166289 dbevts]
388 - INFO - task-shutdown_and_purge_room-0-ShoOHRGCMAyyDcjM - Task continuing: 38100.001sec (0.023sec, 0.007sec) (0.016sec/2.425sec/23) [0 dbevts]

After the restart:
482 - WARNING - launch_scheduled_tasks-0 - Task lATsRRnqGmuRAPcd (action redact_all_events) has seen no update for more than 24h and may be stuck
388 - INFO - task-shutdown_and_purge_room-0-ShoOHRGCMAyyDcjM - Task continuing: 300.001sec (0.004sec, 0.000sec) (0.011sec/0.069sec/5) [0 dbevts]
388 - INFO - task-redact_all_events-0-lATsRRnqGmuRAPcd - Task continuing: 600.001sec (2.329sec, 0.372sec) (9.893sec/546.924sec/1552) [154954 dbevts]

Anything else that would be useful to know?

No response
