Skip to content

Attachment fixes to address inconsistencies between DB and backend store#1672

Merged
binwiederhier merged 5 commits into
mainfrom
attachment-fixes
Mar 25, 2026
Merged

Attachment fixes to address inconsistencies between DB and backend store#1672
binwiederhier merged 5 commits into
mainfrom
attachment-fixes

Conversation

@binwiederhier
Copy link
Copy Markdown
Owner

@binwiederhier binwiederhier commented Mar 24, 2026

Summary

Fixes attachment handling to prevent orphaned S3/filesystem files and phantom DB records (DB records pointing to missing files). These issues were caused by race conditions during server restarts and missing safeguards in the cleanup logic.

Attachment expiry cap

  • attachment_expires is now capped to never exceed message.expires, ensuring attachments can never outlive their message

Single owner for file deletion

  • Before: Both pruneAttachments(), pruneMessages(), and sync() could delete attachment files, creating race conditions during restarts (files deleted but DB rows surviving → phantom records causing user-facing 404s)
  • After: Only sync() deletes files from storage. pruneAttachments() and pruneMessages() only update the DB. If the process is killed mid-operation, the worst case is orphaned files (cleaned up by sync()) rather than phantom DB records

Batched message/attachment pruning

  • Before: pruneMessages() deleted rows one-by-one in a transaction (242K individual DELETE WHERE mid=$1 over remote Postgres = 115 seconds, blocking the entire manager cycle)
  • After: Single DELETE ... WHERE mid IN (SELECT mid ... LIMIT $2) query, capped at ManagerBatchSize (default 30K) per cycle. Backlog clears over multiple 1-minute cycles (~2s each) instead of one long blocking operation

Missing Postgres index

  • Added idx_message_attachment_expires ON message (attachment_expires) WHERE attachment_deleted = FALSE — SQLite already had this, Postgres didn't. Without it, AttachmentsExpired and AttachmentsWithSizes queries did full table scans on 1M+ rows
  • Added Postgres migration system (14→15) matching the existing SQLite migration pattern

Configurable orphan grace period

  • orphanGracePeriod is now injected through the store constructor via Config.AttachmentOrphanGracePeriod instead of being a package-level constant

Removed dead code

  • Removed DeleteMessages(), MessagesExpired(), AttachmentsExpired(), MarkAttachmentsDeleted(), and their per-row queries — all replaced by the batched equivalents

Test plan

  • Tested with prod data snapshot (1M messages, 14K attachments) on staging with S3 backend — zero phantoms, zero orphans after settlement
  • Tested with prod data snapshot on staging with filesystem backend — zero new phantoms, zero orphans after settlement (3 pre-existing prod phantoms from backup timing)
  • Verified manager cycle times: first cycle ~7s (with backlog), steady state ~300ms (down from 168s)
  • Verified messy restarts no longer create phantom records
  • All existing tests pass

@binwiederhier binwiederhier merged commit d183af6 into main Mar 25, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant