Skip to content

Conversation

@sanketkedia
Copy link
Contributor

@sanketkedia sanketkedia commented Oct 21, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • If client passes None in schema (or don't pass in a schema) then we persist the ef as legacy. We should instead persist the ef in collection config
    • Reconcile now takes references instead of clones
    • Displays a better error msg to user showing what to do if they try to set index on special keys
  • New functionality
    • ...

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

None

Observability plan

Staging

Documentation Changes

None

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@sanketkedia sanketkedia requested a review from jairad26 October 21, 2025 21:39
@sanketkedia sanketkedia marked this pull request as ready for review October 21, 2025 21:39
@propel-code-bot
Copy link
Contributor

propel-code-bot bot commented Oct 21, 2025

Refactor Schema/Config Reconciliation and Embedding Function Persistence

This PR overhauls how schema and collection configuration are reconciled when one or both are None or default and ensures proper persistence and propagation of the embedding function (ef) and vector space (space). Major logic changes include reference-based reconciliation routines to prevent unnecessary cloning, improved validation around special key handling, and the consistent use of configuration-derived ef/space when schema is missing or none. The PR introduces expanded and more robust field-level checks for determining schema defaultness and introduces extensive new tests in both Rust and Python to validate these behaviors.

Key Changes

• Refactored Schema::reconcile_schema_and_config and related methods to operate via reference parameters instead of value, improving efficiency and clarity
• Updated reconciliation logic to ensure that when schema is None or default, ef and space are always populated from configuration instead of persisting legacy defaults
• Significantly expanded logic to determine if a schema is default using per-field and per-key checks, subsuming previously brittle equality comparisons
• Rewrote large parts of the reconciliation logic to avoid ambiguity when both config and schema are set, now erroring with clear messages if both are non-default
• Strengthened schema validation in validators.rs to enforce that the special keys #document and #embedding only allow permitted value types, and improved error messaging for user misconfigurations
• Adjusted all downstream reconciliation calls and integration points (frontend, distributed segment, sysdb provider) to use references and the new reconciliation signature
• Added and rewrote Rust unit tests and end-to-end Python tests to explicitly verify reconciliation behavior, embedding function persistence, and error conditions
• Enhanced Python client error reporting for attempts to index special keys

Affected Areas

rust/types/src/collection_schema.rs
rust/types/src/collection_configuration.rs
rust/types/src/validators.rs
chromadb/test/api/test_schema_e2e.py
rust/frontend/src/impls/service_based_frontend.rs
rust/frontend/src/get_collection_with_segments_provider.rs
rust/types/src/collection.rs
rust/segment/src/distributed_spann.rs
chromadb/api/types.py

This summary was automatically generated by @propel-code-bot

@github-actions
Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@sanketkedia sanketkedia force-pushed the 10-21-_bug_populate_ef_and_space_from_config_if_schema_is_none branch from 34b3e27 to 28b3658 Compare October 22, 2025 01:40
@blacksmith-sh blacksmith-sh bot deleted a comment from sanketkedia Oct 22, 2025
@sanketkedia sanketkedia merged commit c9e3652 into main Oct 22, 2025
118 of 122 checks passed
sanketkedia added a commit that referenced this pull request Oct 22, 2025
This PR cherry-picks the commit c9e3652
onto release/2025-10-17. If there are unresolved conflicts, please
resolve them manually.

---------

Co-authored-by: Sanket Kedia <[email protected]>
Co-authored-by: Sanket Kedia <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants