-
Notifications
You must be signed in to change notification settings - Fork 2k
[ENH]: GCv2: add grace period for transitioning soft deleted collections -> hard deleted #4719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
| query = query.Where("databases.tenant_id = ?", tenantID) | ||
| } | ||
| if ids != nil && len(ids) > 0 { | ||
| if ids != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small bug: this method should not return all collections if ids: []
af62c95 to
8e4c3ef
Compare
|
GCv2: Add Grace Period Before Hard Deletion of Soft-Deleted Collections This PR introduces a new configurable grace period for GCv2, so soft-deleted collections are only eligible for hard deletion after a defined time. It updates the entire stack (IDLs, Go sysdb service, Rust sysdb/types/GC) to propagate 'updated_at' timestamps and properly filter collections, ensuring compatibility and backward migrations. Key Changes: Affected Areas: Potential Impact: Functionality: Soft-deleted collections will remain recoverable for configurable grace period before permanent hard deletion. All collection APIs now expose 'updated_at'. Performance: Negligible impact; an extra proto field and filtering step in queries. May increase volume of soft-deleted but not-yet-deleted data. Security: No direct security implication. Grace period could extend the window for restoration/cancellation of accidental deletion. Scalability: May increase temporary soft-deleted collection count in large deployments until hard delete occurs. Review Focus: Testing Needed• Code Quality Assessmentidl/chromadb/proto/*: API changes well-documented; introduces wrapper types for clear intent and migration safety. rust/garbage_collector/src/garbage_collector_component.rs: Complex but well-structured; clear variable renames; thorough in updating logic and tests. rust/garbage_collector/src/garbage_collector_orchestrator_v2.rs: Well-refactored; now explicitly checks cutoff timestamp and handles errors better. go/pkg/sysdb/grpc/collection_service.go: Clean handling of new ids_filter and proto changes; careful translation of request/response. go/pkg/sysdb/metastore/db/dao/collection.go: Small but important bugfix for empty slice queries; new filter logic is correct. rust/types/src/collection.rs: Pragmatic backward/forward compatibility for updated_at; includes TODO for migration cutoff. Best PracticesBackward Compatibility: Robustness: Explicit Configuration: Potential Issues• Rolling deployment with mixed old/new binaries could result in default 'now' for missing updated_at, causing possible premature hard deletes (mitigated by TODO and migration guidance). This summary was automatically generated by @propel-code-bot |
ed8da7e to
6405256
Compare
rust/types/src/collection.rs
Outdated
| let updated_at = | ||
| proto_collection | ||
| .updated_at | ||
| .ok_or(CollectionConversionError::MissingField( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this fail queries during upgrade for e.g. if query service is on a newer version with this change and sysdb is on an older version without this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, updated
6405256 to
0a5dd83
Compare
| optional int32 limit = 6; | ||
| optional int32 offset = 7; | ||
| repeated string ids = 8; | ||
| optional CollectionIdsFilter ids_filter = 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this wrapper type is unfortunately needed because otherwise we can't distinguish the case when no IDs filter is provided and when an empty IDs filter ([]) is provided
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But is it backwards and forwards compatible? For e.g. if old sysdb receives the new type, it will error out trying to deserialize this field (vice-versa)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed offline--yes, it's ok as long as the change that introduced GetCollectionsRequest.ids doesn't get promoted to prod separately from this change
…ns -> hard deleted
0a5dd83 to
e81e954
Compare
| ) -> Result<HashSet<CollectionUuid>, GarbageCollectorError> { | ||
| let soft_delete_statuses = self | ||
| .sysdb_client | ||
| .batch_get_collection_soft_delete_status(all_collection_ids.clone()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we directly use get_collections to get the information for all of these collections? this could reduce a round trip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could, but I thought originally when we discussed this we decided to not add is_deleted to the main collection object
|
|
||
| let collections = self | ||
| .sysdb_client | ||
| .get_collections(GetCollectionsOptions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't you directly use this for all_collection_ids? why do you need a separate batch_get_collection_soft_delete_status?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we would need to add is_deleted to the Collection struct which seems a little strange
| let soft_delete_statuses = self | ||
| .sysdb_client | ||
| .batch_get_collection_soft_delete_status(all_collection_ids.clone()) | ||
| .await |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like this api is a subset of sysdb.get_collections()? It only returns the soft delete status whereas the get_collections() call returns the soft delete status as well as other info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed offline--can potentially merge the APIs in the future
…ons -> hard deleted (chroma-core#4719) ## Description of changes - Return collection `updated_at` timestamp from the SysDb. - Added a new `collection_soft_delete_grace_period` config parameter, defaults to 1 day. - GCv2 will now ignore soft-deleted collections that are newer than the grace period (instead of immediately hard deleting them). ## Test plan _How are these changes tested?_ - [x] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust Updated existing test to validate that collections are not deleted when inside grace period. ## Documentation Changes _Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the [docs section](https://github.com/chroma-core/chroma/tree/main/docs/docs.trychroma.com)?_ n/a

Description of changes
updated_attimestamp from the SysDb.collection_soft_delete_grace_periodconfig parameter, defaults to 1 day.Test plan
How are these changes tested?
pytestfor python,yarn testfor js,cargo testfor rustUpdated existing test to validate that collections are not deleted when inside grace period.
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?
n/a