Skip to content

Conversation

@codetheweb
Copy link
Contributor

@codetheweb codetheweb commented Jun 2, 2025

Description of changes

  • Return collection updated_at timestamp from the SysDb.
  • Added a new collection_soft_delete_grace_period config parameter, defaults to 1 day.
  • GCv2 will now ignore soft-deleted collections that are newer than the grace period (instead of immediately hard deleting them).

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Updated existing test to validate that collections are not deleted when inside grace period.

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

n/a

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions
Copy link

github-actions bot commented Jun 2, 2025

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@codetheweb codetheweb changed the title [ENH]: GCv2: add cutoff time for transitioning soft deleted collections -> hard deleted [ENH]: GCv2: add grace period for transitioning soft deleted collections -> hard deleted Jun 2, 2025
query = query.Where("databases.tenant_id = ?", tenantID)
}
if ids != nil && len(ids) > 0 {
if ids != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small bug: this method should not return all collections if ids: []

@codetheweb codetheweb force-pushed the feat-gc-v2-soft-delete-grace-period branch 2 times, most recently from af62c95 to 8e4c3ef Compare June 2, 2025 23:51
@codetheweb codetheweb marked this pull request as ready for review June 2, 2025 23:52
@codetheweb codetheweb requested a review from Sicheng-Pan June 2, 2025 23:52
@propel-code-bot
Copy link
Contributor

propel-code-bot bot commented Jun 2, 2025

GCv2: Add Grace Period Before Hard Deletion of Soft-Deleted Collections

This PR introduces a new configurable grace period for GCv2, so soft-deleted collections are only eligible for hard deletion after a defined time. It updates the entire stack (IDLs, Go sysdb service, Rust sysdb/types/GC) to propagate 'updated_at' timestamps and properly filter collections, ensuring compatibility and backward migrations.

Key Changes:
• Added 'collection_soft_delete_grace_period' config to garbage-collector (default: 1 day).
• Collections now have an 'updated_at' timestamp populated and wired end-to-end (database, proto, Go, Rust, tests).
• GCv2 logic updated: collections remain in soft-deleted state until grace period elapses.
• Proto API (coordinator.proto/chroma.proto) refactored -- IDs filter now uses a wrapper, and 'updated_at' is added to the Collection message.
• Go sysdb updated to support new filtering semantics and to return timestamps.
• Rust sysdb/types/GC stack updated to use the new timestamp field, including backward-compatibility code and extended tests.
• Logic in GCv2 now checks 'updated_at' before transitioning soft-deleted collections to hard-delete.
• Fixed retrieval logic to handle empty IDs filters properly (does not match all collections).
• Test coverage updated and extended for grace period cases.

Affected Areas:
• Garbage Collector (Rust)
• SysDb API (Go and Rust)
• IDLs (Protobufs: chroma.proto, coordinator.proto)
• Collection model/types
• Go sysdb GRPC server, catalog, and DB DAO
• GCv2 orchestrators and tests

Potential Impact:

Functionality: Soft-deleted collections will remain recoverable for configurable grace period before permanent hard deletion. All collection APIs now expose 'updated_at'.

Performance: Negligible impact; an extra proto field and filtering step in queries. May increase volume of soft-deleted but not-yet-deleted data.

Security: No direct security implication. Grace period could extend the window for restoration/cancellation of accidental deletion.

Scalability: May increase temporary soft-deleted collection count in large deployments until hard delete occurs.

Review Focus:
• Correctness of updated_at propagation across stack (Go, Rust, proto).
• Compatibility: does partial upgrade break any core queries?
• Semantic correctness in filtering logic (Go DAO and server).
• Correct wiring of collection_soft_delete_grace_period config everywhere it's used.
• Thoroughness of test coverage, especially for grace period scenarios.

Testing Needed

GCv2 end-to-end scenarios: soft-delete, wait less/more than grace period, verify hard delete only occurs after expiration.
• Grpc/REST APIs: collection CRUD, check updated_at returned and filters honored.
• Rolling upgrades: test compatibility with older sysdb/query nodes.
• Migration: verify manual and automated rollbacks do not cause errors.

Code Quality Assessment

idl/chromadb/proto/*: API changes well-documented; introduces wrapper types for clear intent and migration safety.

rust/garbage_collector/src/garbage_collector_component.rs: Complex but well-structured; clear variable renames; thorough in updating logic and tests.

rust/garbage_collector/src/garbage_collector_orchestrator_v2.rs: Well-refactored; now explicitly checks cutoff timestamp and handles errors better.

go/pkg/sysdb/grpc/collection_service.go: Clean handling of new ids_filter and proto changes; careful translation of request/response.

go/pkg/sysdb/metastore/db/dao/collection.go: Small but important bugfix for empty slice queries; new filter logic is correct.

rust/types/src/collection.rs: Pragmatic backward/forward compatibility for updated_at; includes TODO for migration cutoff.

Best Practices

Backward Compatibility:
• Handles absent updated_at in proto structs gracefully until full migration.
• Uses wrapper type for id filters to distinguish unset vs. empty list.

Robustness:
• Extensive test updates for edge cases and rollout scenarios.
• Error handling improved throughout stack.

Explicit Configuration:
• New grace period parameter is explicitly documented and perfectly defaults for safety.
• Code docs and in-code comments provide migration/TODO guidance.

Potential Issues

• Rolling deployment with mixed old/new binaries could result in default 'now' for missing updated_at, causing possible premature hard deletes (mitigated by TODO and migration guidance).
• Potential edge case: if sysdb or a client misses updated_at for new collections (transitional state), could get incorrect deletion timing.
• Grace period currently only checked at GC time, so immediate recoverability from soft-deletes may be affected by GC interval.

This summary was automatically generated by @propel-code-bot

@codetheweb codetheweb force-pushed the feat-gc-v2-soft-delete-grace-period branch 2 times, most recently from ed8da7e to 6405256 Compare June 3, 2025 17:16
@codetheweb codetheweb requested a review from sanketkedia June 3, 2025 18:46
let updated_at =
proto_collection
.updated_at
.ok_or(CollectionConversionError::MissingField(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this fail queries during upgrade for e.g. if query service is on a newer version with this change and sysdb is on an older version without this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, updated

@codetheweb codetheweb force-pushed the feat-gc-v2-soft-delete-grace-period branch from 6405256 to 0a5dd83 Compare June 3, 2025 19:44
optional int32 limit = 6;
optional int32 offset = 7;
repeated string ids = 8;
optional CollectionIdsFilter ids_filter = 8;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this wrapper type is unfortunately needed because otherwise we can't distinguish the case when no IDs filter is provided and when an empty IDs filter ([]) is provided

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But is it backwards and forwards compatible? For e.g. if old sysdb receives the new type, it will error out trying to deserialize this field (vice-versa)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline--yes, it's ok as long as the change that introduced GetCollectionsRequest.ids doesn't get promoted to prod separately from this change

@codetheweb codetheweb force-pushed the feat-gc-v2-soft-delete-grace-period branch from 0a5dd83 to e81e954 Compare June 3, 2025 22:18
@codetheweb codetheweb requested a review from sanketkedia June 3, 2025 22:18
) -> Result<HashSet<CollectionUuid>, GarbageCollectorError> {
let soft_delete_statuses = self
.sysdb_client
.batch_get_collection_soft_delete_status(all_collection_ids.clone())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we directly use get_collections to get the information for all of these collections? this could reduce a round trip

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could, but I thought originally when we discussed this we decided to not add is_deleted to the main collection object


let collections = self
.sysdb_client
.get_collections(GetCollectionsOptions {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't you directly use this for all_collection_ids? why do you need a separate batch_get_collection_soft_delete_status?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/chroma-core/chroma/pull/4719/files/e81e954f928ec18a7c144e4c4607939657d1500b#r2125184184

we would need to add is_deleted to the Collection struct which seems a little strange

let soft_delete_statuses = self
.sysdb_client
.batch_get_collection_soft_delete_status(all_collection_ids.clone())
.await
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like this api is a subset of sysdb.get_collections()? It only returns the soft delete status whereas the get_collections() call returns the soft delete status as well as other info?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline--can potentially merge the APIs in the future

@codetheweb codetheweb merged commit 77698dc into main Jun 6, 2025
72 checks passed
@codetheweb codetheweb deleted the feat-gc-v2-soft-delete-grace-period branch June 6, 2025 00:58
Inventrohyder pushed a commit to Inventrohyder/chroma that referenced this pull request Aug 5, 2025
…ons -> hard deleted (chroma-core#4719)

## Description of changes

- Return collection `updated_at` timestamp from the SysDb.
- Added a new `collection_soft_delete_grace_period` config parameter,
defaults to 1 day.
- GCv2 will now ignore soft-deleted collections that are newer than the
grace period (instead of immediately hard deleting them).

## Test plan

_How are these changes tested?_

- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

Updated existing test to validate that collections are not deleted when
inside grace period.

## Documentation Changes

_Are all docstrings for user-facing APIs updated if required? Do we need
to make documentation changes in the [docs
section](https://github.com/chroma-core/chroma/tree/main/docs/docs.trychroma.com)?_

n/a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants