Skip to content

Conversation

@sanketkedia
Copy link
Contributor

@sanketkedia sanketkedia commented Oct 21, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • Adds range validations on various params and validates them
  • New functionality
    • ...

Test plan

How are these changes tested?
Added unit tests
Tested on local tilt

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

None

Observability plan

Validate in staging once deployed

Documentation Changes

None

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions
Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@sanketkedia sanketkedia requested a review from jairad26 October 21, 2025 16:32
@sanketkedia sanketkedia marked this pull request as ready for review October 21, 2025 16:32
@propel-code-bot
Copy link
Contributor

propel-code-bot bot commented Oct 21, 2025

Add Validation to Restrict User-Supplied Vector Index Parameters

This PR introduces systematic input validation for user-supplied parameters in vector index configuration structs (HnswIndexConfig and SpannIndexConfig) in rust/types/src/collection_schema.rs. Numeric fields in these structs now have explicit range limits enforced via the validator crate. The PR updates the schema merging logic to properly invoke validation on merged index configs, ensuring users cannot apply out-of-bounds or noncompliant values for vector index tuning. Additional comprehensive unit tests have been added to verify that both valid and invalid configuration values behave as intended, and merging logic remains robust.

Key Changes

• Added validator::Validate trait to HnswIndexConfig and SpannIndexConfig with field-level range restrictions (e.g., min/max constraints on fields like batch_size, split_threshold, initial_lambda, etc.)
• Introduced new method validate_float_list_value_type() and invoked it during schema merging to ensure configs are validated after merging defaults and user-specific settings.
• Modified the merging functions for relevant structs (merge_float_list_type, merge_vector_index_type, merge_vector_index_config, etc.) to work with updated validation, and changed many Result-returning merges to direct Option/struct returns.
• Adjusted affected tests to reflect the new validation constraints, covering a full range of passing/failing cases for all new field restrictions.
• Refactored derive macros for the config structs (adding/removing derives as needed, e.g., Default and Validate).
• Expanded the unit test suite significantly (including edge and out-of-range cases for every validated field), and ensured that test data and checks reflect and enforce the new API contract.

Affected Areas

rust/types/src/collection_schema.rs (major schema config structs and logic): validation, merging, tests
• Internal merging and validation logic for schema configurations
• Unit test coverage for config validation and merging

This summary was automatically generated by @propel-code-bot

Comment on lines +975 to +988
fn validate_float_list_value_type(float_list: &FloatListValueType) -> Result<(), String> {
if let Some(vector_index) = &float_list.vector_index {
if let Some(hnsw) = &vector_index.config.hnsw {
hnsw.validate()
.map_err(|e| format!("Invalid HNSW configuration: {}", e))?;
}
if let Some(spann) = &vector_index.config.spann {
spann
.validate()
.map_err(|e| format!("Invalid SPANN configuration: {}", e))?;
}
}
Ok(())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BestPractice]

Validating the individual HNSW and SPANN configurations is a great addition. However, it appears a merged configuration could potentially have both hnsw and spann fields set to Some, which represents an invalid state as enforced later during the conversion to InternalCollectionConfiguration. It would be more robust to validate this constraint here to prevent the creation of an invalid schema.

Consider adding a check to ensure that both configurations are not present simultaneously.

Suggested Change
Suggested change
fn validate_float_list_value_type(float_list: &FloatListValueType) -> Result<(), String> {
if let Some(vector_index) = &float_list.vector_index {
if let Some(hnsw) = &vector_index.config.hnsw {
hnsw.validate()
.map_err(|e| format!("Invalid HNSW configuration: {}", e))?;
}
if let Some(spann) = &vector_index.config.spann {
spann
.validate()
.map_err(|e| format!("Invalid SPANN configuration: {}", e))?;
}
}
Ok(())
}
fn validate_float_list_value_type(float_list: &FloatListValueType) -> Result<(), String> {
if let Some(vector_index) = &float_list.vector_index {
if vector_index.config.hnsw.is_some() && vector_index.config.spann.is_some() {
return Err(
"A vector index can have HNSW or SPANN configuration, but not both".to_string(),
);
}
if let Some(hnsw) = &vector_index.config.hnsw {
hnsw.validate()
.map_err(|e| format!("Invalid HNSW configuration: {}", e))?;
}
if let Some(spann) = &vector_index.config.spann {
spann
.validate()
.map_err(|e| format!("Invalid SPANN configuration: {}", e))?;
}
}
Ok(())
}

Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Context for Agents
[**BestPractice**]

Validating the individual HNSW and SPANN configurations is a great addition. However, it appears a merged configuration could potentially have both `hnsw` and `spann` fields set to `Some`, which represents an invalid state as enforced later during the conversion to `InternalCollectionConfiguration`. It would be more robust to validate this constraint here to prevent the creation of an invalid schema.

Consider adding a check to ensure that both configurations are not present simultaneously.

<details>
<summary>Suggested Change</summary>

```suggestion
    fn validate_float_list_value_type(float_list: &FloatListValueType) -> Result<(), String> {
        if let Some(vector_index) = &float_list.vector_index {
            if vector_index.config.hnsw.is_some() && vector_index.config.spann.is_some() {
                return Err(
                    "A vector index can have HNSW or SPANN configuration, but not both".to_string(),
                );
            }
            if let Some(hnsw) = &vector_index.config.hnsw {
                hnsw.validate()
                    .map_err(|e| format!("Invalid HNSW configuration: {}", e))?;
            }
            if let Some(spann) = &vector_index.config.spann {
                spann
                    .validate()
                    .map_err(|e| format!("Invalid SPANN configuration: {}", e))?;
            }
        }
        Ok(())
    }
```

⚡ **Committable suggestion**

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

</details>

File: rust/types/src/collection_schema.rs
Line: 988

@sanketkedia sanketkedia enabled auto-merge (squash) October 21, 2025 17:26
@sanketkedia sanketkedia merged commit 197ea94 into main Oct 21, 2025
62 checks passed
@blacksmith-sh blacksmith-sh bot deleted a comment from sanketkedia Oct 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants