[ENH]: Add ability to set different block sizes for different blockfiles #4948
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving:
- Testing, Bugs, Errors, Logs, Documentation
- System Compatibility
- Quality
Enable Per-Blockfile Custom Block Sizes and Metadata Migration

This PR introduces support for specifying different max block sizes for individual blockfiles. The blockfile versioning and Arrow metadata schema are updated to persist block size information per file, with corresponding migration logic to maintain backward compatibility.

The primary motivator is to allow the Spann posting list blockfiles to use a distinct block size (defaulted to 5 MiB) while leaving the default 8 MiB setting for other blockfile types. Tests and migration scenarios are included to check correctness and compatibility across prior versions.

This summary was automatically generated by @propel-code-bot
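To make the per-file persistence concrete, here is a minimal sketch of writing a block size into file metadata and reading it back with a fallback. It is not the PR's implementation: a plain `HashMap<String, String>` stands in for Arrow schema metadata, and the helper names `write_block_size` and `read_block_size` are hypothetical. The field name `max_block_size_bytes`, the 8 MiB default, and the 5 MiB posting-list size come from the PR text. Using the field name as a literal metadata key is an assumption.

```rust
use std::collections::HashMap;

/// Default for blockfiles without an explicit size (8 MiB, per the PR summary).
const DEFAULT_MAX_BLOCK_SIZE_BYTES: usize = 8 * 1024 * 1024;
/// Spann posting-list block size from the diff (5 MiB).
const PL_BLOCK_SIZE: usize = 5 * 1024 * 1024;
/// Assumed metadata key; the field name appears in the PR description.
const MAX_BLOCK_SIZE_KEY: &str = "max_block_size_bytes";

/// Hypothetical helper: record the block size in a string map that stands in
/// for Arrow schema metadata.
fn write_block_size(metadata: &mut HashMap<String, String>, max_block_size_bytes: usize) {
    metadata.insert(
        MAX_BLOCK_SIZE_KEY.to_string(),
        max_block_size_bytes.to_string(),
    );
}

/// Hypothetical helper: read the block size back. Files written before the
/// key existed (or with an unparsable value) fall back to the default.
fn read_block_size(metadata: &HashMap<String, String>) -> usize {
    metadata
        .get(MAX_BLOCK_SIZE_KEY)
        .and_then(|value| value.parse::<usize>().ok())
        .unwrap_or(DEFAULT_MAX_BLOCK_SIZE_BYTES)
}
```

In this sketch a posting-list writer would call `write_block_size(&mut metadata, PL_BLOCK_SIZE)`, while a reader opening an older file with no such key would see the 8 MiB default.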
rust/index/src/spann/types.rs (outdated)
use super::utils::{rng_query, KMeansAlgorithmInput, KMeansError, RngQueryError};

const PL_BLOCK_SIZE: usize = 5 * 1024 * 1024; // 5 MiB
Should add a justification for why this
    Ok(())
}

async fn migrate_v1_to_v1_1(
should update this method name
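For context, a migration step of the kind under review could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: metadata is modelled as a `HashMap<String, String>`, the `version` key and the function name `migrate_add_block_size` are hypothetical, and the version labels are placeholders supplied by the caller. The idea is that files written before block sizes were persisted get stamped with the 8 MiB default and bumped to the new version, and that re-running the step leaves them unchanged.

```rust
use std::collections::HashMap;

/// Default block size assumed for files written before the size was persisted.
const DEFAULT_MAX_BLOCK_SIZE_BYTES: usize = 8 * 1024 * 1024;

/// Hypothetical migration step. Returns `true` if the metadata was changed.
/// Files that already carry a block size are left untouched, so the step is
/// safe to run more than once.
fn migrate_add_block_size(metadata: &mut HashMap<String, String>, target_version: &str) -> bool {
    if metadata.contains_key("max_block_size_bytes") {
        return false;
    }
    // Old files implicitly used the default, so record it explicitly.
    metadata.insert(
        "max_block_size_bytes".to_string(),
        DEFAULT_MAX_BLOCK_SIZE_BYTES.to_string(),
    );
    // "version" is an assumed key name used only for this illustration.
    metadata.insert("version".to_string(), target_version.to_string());
    true
}
```

Naming the step after what it adds, rather than after a fixed pair of version numbers, is one way to avoid renaming it whenever the version scheme changes.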
…les (chroma-core#4948)

## Description of changes

_Summarize the changes made by this PR._

- Improvements & Bug fixes
  - We now persist max_block_size_bytes in the Arrow metadata and add a new blockfile version to handle backward compatibility with old data.
  - This is needed so that the spann posting list can have a block size different from other blockfiles.
  - This PR also sets the spann PL block size to 5 MiB.
- New functionality
  - ...

## Test plan

_How are these changes tested?_

- [x] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust

## Documentation Changes

None
