Fix wrong schema for Faiss SQ. #6169

Merged
rishabh6788 merged 1 commit into opensearch-project:main from 0ctopus13prime:faiss-sq-schema-fix
Apr 30, 2026

@0ctopus13prime
Contributor

Description

Fixes the invalid index schema used by the Faiss SQ benchmark entries.
The entries now use faiss-sq-1bit-index.json as the index body.

Issues Resolved

N/A

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

github-actions Bot commented Apr 30, 2026

PR Reviewer Guide 🔍

(Review updated until commit 412b743)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Schema Name Mismatch

The new SQ entries use faiss-sq-1bit-index.json as the index body schema. The file name suggests a "1bit" quantization, but the encoder is set to "encoder":"sq" (scalar quantization), not a 1-bit binary encoder. It should be verified that faiss-sq-1bit-index.json is the correct schema file for standard SQ (scalar quantization) and not specific to 1-bit quantization, to avoid a mismatch between the schema and the encoder configuration.

H 3 * * * %BUNDLE_MANIFEST_URL=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/3.7.0/latest/linux/arm64/tar/dist/opensearch/manifest.yml;TEST_WORKLOAD=vectorsearch;SINGLE_NODE_CLUSTER=false;DATA_NODE_COUNT=2;DATA_INSTANCE_TYPE=r7gd.2xlarge;USER_TAGS=run-type:nightly,segrep:disabled,arch:arm64,instance-type:r7gd.2xlarge,major-version:3x,cluster-config:arm64-r7gd.2xlarge-2-data-5-shards-1-replica-multi-faiss-sq-cohere-10m;ADDITIONAL_CONFIG=knn.algo_param.index_thread_qty:2;WORKLOAD_PARAMS={"target_index_name":"target_index","target_field_name":"target_field","target_index_body":"indices/faiss-sq-1bit-index.json","target_index_primary_shards":5,"target_index_replica_shards":1,"target_index_dimension":768,"target_index_space_type":"innerproduct","target_index_bulk_size":100,"target_index_bulk_index_data_set_format":"hdf5","target_index_bulk_index_data_set_corpus":"cohere-10m","target_index_bulk_indexing_clients":10,"target_index_max_num_segments":30,"hnsw_ef_search":256,"hnsw_ef_construction":256,"encoder":"sq","query_k":100,"query_body":{"docvalue_fields":["_id"],"stored_fields":"_none_"},"query_data_set_format":"hdf5","query_data_set_corpus":"cohere-10m","query_count":10000};CAPTURE_NODE_STAT=true;ENABLE_INSTANCE_BASED_STORAGE=true;TELEMETRY_PARAMS={"node-stats-sample-interval":5}

H 3 * * * %BUNDLE_MANIFEST_URL=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/3.7.0/latest/linux/arm64/tar/dist/opensearch/manifest.yml;TEST_WORKLOAD=vectorsearch;SINGLE_NODE_CLUSTER=false;DATA_NODE_COUNT=2;DATA_INSTANCE_TYPE=r7gd.2xlarge;USER_TAGS=run-type:nightly,segrep:disabled,arch:arm64,instance-type:r7gd.2xlarge,major-version:3x,cluster-config:arm64-r7gd.2xlarge-2-data-5-shards-1-replica-faiss-sq-cohere-10m;ADDITIONAL_CONFIG=knn.algo_param.index_thread_qty:2;WORKLOAD_PARAMS={"target_index_name":"target_index","target_field_name":"target_field","target_index_body":"indices/faiss-sq-1bit-index.json","target_index_primary_shards":5,"target_index_replica_shards":1,"target_index_dimension":768,"target_index_space_type":"innerproduct","target_index_bulk_size":100,"target_index_bulk_index_data_set_format":"hdf5","target_index_bulk_index_data_set_corpus":"cohere-10m","target_index_bulk_indexing_clients":10,"target_index_max_num_segments":1,"hnsw_ef_search":256,"hnsw_ef_construction":256,"encoder":"sq","query_k":100,"query_body":{"docvalue_fields":["_id"],"stored_fields":"_none_"},"query_data_set_format":"hdf5","query_data_set_corpus":"cohere-10m","query_count":10000};CAPTURE_NODE_STAT=true;ENABLE_INSTANCE_BASED_STORAGE=true;TELEMETRY_PARAMS={"node-stats-sample-interval":5}

H 3 * * * %BUNDLE_MANIFEST_URL=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/3.7.0/latest/linux/arm64/tar/dist/opensearch/manifest.yml;TEST_WORKLOAD=vectorsearch;SINGLE_NODE_CLUSTER=false;DATA_NODE_COUNT=2;DATA_INSTANCE_TYPE=r7gd.xlarge;USER_TAGS=run-type:nightly,segrep:disabled,arch:arm64,instance-type:r7gd.xlarge,major-version:3x,cluster-config:arm64-r7gd.xlarge-2-data-5-shards-1-replica-constrained-multi-faiss-sq-cohere-10m;ADDITIONAL_CONFIG=knn.algo_param.index_thread_qty:1;WORKLOAD_PARAMS={"target_index_name":"target_index","target_field_name":"target_field","target_index_body":"indices/faiss-sq-1bit-index.json","target_index_primary_shards":5,"target_index_replica_shards":1,"target_index_dimension":768,"target_index_space_type":"innerproduct","target_index_bulk_size":100,"target_index_bulk_index_data_set_format":"hdf5","target_index_bulk_index_data_set_corpus":"cohere-10m","target_index_bulk_indexing_clients":5,"target_index_max_num_segments":30,"hnsw_ef_search":256,"hnsw_ef_construction":256,"encoder":"sq","query_k":100,"query_body":{"docvalue_fields":["_id"],"stored_fields":"_none_"},"query_data_set_format":"hdf5","query_data_set_corpus":"cohere-10m","query_count":10000};CAPTURE_NODE_STAT=true;ENABLE_INSTANCE_BASED_STORAGE=true;TELEMETRY_PARAMS={"node-stats-sample-interval":5}

H 3 * * * %BUNDLE_MANIFEST_URL=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/3.7.0/latest/linux/arm64/tar/dist/opensearch/manifest.yml;TEST_WORKLOAD=vectorsearch;SINGLE_NODE_CLUSTER=false;DATA_NODE_COUNT=2;DATA_INSTANCE_TYPE=r7gd.xlarge;USER_TAGS=run-type:nightly,segrep:disabled,arch:arm64,instance-type:r7gd.xlarge,major-version:3x,cluster-config:arm64-r7gd.xlarge-2-data-5-shards-1-replica-constrained-faiss-sq-cohere-10m;ADDITIONAL_CONFIG=knn.algo_param.index_thread_qty:1;WORKLOAD_PARAMS={"target_index_name":"target_index","target_field_name":"target_field","target_index_body":"indices/faiss-sq-1bit-index.json","target_index_primary_shards":5,"target_index_replica_shards":1,"target_index_dimension":768,"target_index_space_type":"innerproduct","target_index_bulk_size":100,"target_index_bulk_index_data_set_format":"hdf5","target_index_bulk_index_data_set_corpus":"cohere-10m","target_index_bulk_indexing_clients":5,"target_index_max_num_segments":1,"hnsw_ef_search":256,"hnsw_ef_construction":256,"encoder":"sq","query_k":100,"query_body":{"docvalue_fields":["_id"],"stored_fields":"_none_"},"query_data_set_format":"hdf5","query_data_set_corpus":"cohere-10m","query_count":10000};CAPTURE_NODE_STAT=true;ENABLE_INSTANCE_BASED_STORAGE=true;TELEMETRY_PARAMS={"node-stats-sample-interval":5}
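
One way to check the pairing flagged above is to parse an entry and read the two fields directly. The sketch below assumes the `SCHEDULE %KEY=VALUE;KEY=VALUE` layout seen in the entries above and that no value contains a `;`. `parse_cron_entry` is a hypothetical helper, not part of the build tooling, and the sample entry is shortened for illustration.

```python
import json


def parse_cron_entry(entry: str) -> dict:
    """Split a 'SCHEDULE %KEY=VALUE;KEY=VALUE' entry into a dict.

    Assumes no value contains ';' (true for the entries shown above).
    Keys ending in _PARAMS are decoded as JSON.
    """
    _schedule, _, payload = entry.partition("%")
    fields = {}
    for pair in payload.split(";"):
        key, _, value = pair.partition("=")
        fields[key] = json.loads(value) if key.endswith("_PARAMS") else value
    return fields


# Shortened sample entry (illustrative; most fields omitted).
entry = (
    'H 3 * * * %TEST_WORKLOAD=vectorsearch;'
    'WORKLOAD_PARAMS={"target_index_body":"indices/faiss-sq-1bit-index.json",'
    '"encoder":"sq","query_k":100};'
    'TELEMETRY_PARAMS={"node-stats-sample-interval":5}'
)

params = parse_cron_entry(entry)["WORKLOAD_PARAMS"]
print(params["target_index_body"], params["encoder"])
```

This only surfaces the two values side by side. Whether `"sq"` is the right encoder for that index body still has to be checked against the index body file itself.
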
Removed mode parameter

The old SQ entries included "mode":"on_disk" in WORKLOAD_PARAMS, but the new entries do not include this parameter. It should be confirmed whether removing "mode":"on_disk" is intentional for the SQ schema fix, or if it was accidentally omitted.

H 3 * * * %BUNDLE_MANIFEST_URL=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/3.7.0/latest/linux/arm64/tar/dist/opensearch/manifest.yml;TEST_WORKLOAD=vectorsearch;SINGLE_NODE_CLUSTER=false;DATA_NODE_COUNT=2;DATA_INSTANCE_TYPE=r7gd.2xlarge;USER_TAGS=run-type:nightly,segrep:disabled,arch:arm64,instance-type:r7gd.2xlarge,major-version:3x,cluster-config:arm64-r7gd.2xlarge-2-data-5-shards-1-replica-multi-faiss-sq-cohere-10m;ADDITIONAL_CONFIG=knn.algo_param.index_thread_qty:2;WORKLOAD_PARAMS={"target_index_name":"target_index","target_field_name":"target_field","target_index_body":"indices/faiss-sq-1bit-index.json","target_index_primary_shards":5,"target_index_replica_shards":1,"target_index_dimension":768,"target_index_space_type":"innerproduct","target_index_bulk_size":100,"target_index_bulk_index_data_set_format":"hdf5","target_index_bulk_index_data_set_corpus":"cohere-10m","target_index_bulk_indexing_clients":10,"target_index_max_num_segments":30,"hnsw_ef_search":256,"hnsw_ef_construction":256,"encoder":"sq","query_k":100,"query_body":{"docvalue_fields":["_id"],"stored_fields":"_none_"},"query_data_set_format":"hdf5","query_data_set_corpus":"cohere-10m","query_count":10000};CAPTURE_NODE_STAT=true;ENABLE_INSTANCE_BASED_STORAGE=true;TELEMETRY_PARAMS={"node-stats-sample-interval":5}

H 3 * * * %BUNDLE_MANIFEST_URL=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/3.7.0/latest/linux/arm64/tar/dist/opensearch/manifest.yml;TEST_WORKLOAD=vectorsearch;SINGLE_NODE_CLUSTER=false;DATA_NODE_COUNT=2;DATA_INSTANCE_TYPE=r7gd.2xlarge;USER_TAGS=run-type:nightly,segrep:disabled,arch:arm64,instance-type:r7gd.2xlarge,major-version:3x,cluster-config:arm64-r7gd.2xlarge-2-data-5-shards-1-replica-faiss-sq-cohere-10m;ADDITIONAL_CONFIG=knn.algo_param.index_thread_qty:2;WORKLOAD_PARAMS={"target_index_name":"target_index","target_field_name":"target_field","target_index_body":"indices/faiss-sq-1bit-index.json","target_index_primary_shards":5,"target_index_replica_shards":1,"target_index_dimension":768,"target_index_space_type":"innerproduct","target_index_bulk_size":100,"target_index_bulk_index_data_set_format":"hdf5","target_index_bulk_index_data_set_corpus":"cohere-10m","target_index_bulk_indexing_clients":10,"target_index_max_num_segments":1,"hnsw_ef_search":256,"hnsw_ef_construction":256,"encoder":"sq","query_k":100,"query_body":{"docvalue_fields":["_id"],"stored_fields":"_none_"},"query_data_set_format":"hdf5","query_data_set_corpus":"cohere-10m","query_count":10000};CAPTURE_NODE_STAT=true;ENABLE_INSTANCE_BASED_STORAGE=true;TELEMETRY_PARAMS={"node-stats-sample-interval":5}

H 3 * * * %BUNDLE_MANIFEST_URL=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/3.7.0/latest/linux/arm64/tar/dist/opensearch/manifest.yml;TEST_WORKLOAD=vectorsearch;SINGLE_NODE_CLUSTER=false;DATA_NODE_COUNT=2;DATA_INSTANCE_TYPE=r7gd.xlarge;USER_TAGS=run-type:nightly,segrep:disabled,arch:arm64,instance-type:r7gd.xlarge,major-version:3x,cluster-config:arm64-r7gd.xlarge-2-data-5-shards-1-replica-constrained-multi-faiss-sq-cohere-10m;ADDITIONAL_CONFIG=knn.algo_param.index_thread_qty:1;WORKLOAD_PARAMS={"target_index_name":"target_index","target_field_name":"target_field","target_index_body":"indices/faiss-sq-1bit-index.json","target_index_primary_shards":5,"target_index_replica_shards":1,"target_index_dimension":768,"target_index_space_type":"innerproduct","target_index_bulk_size":100,"target_index_bulk_index_data_set_format":"hdf5","target_index_bulk_index_data_set_corpus":"cohere-10m","target_index_bulk_indexing_clients":5,"target_index_max_num_segments":30,"hnsw_ef_search":256,"hnsw_ef_construction":256,"encoder":"sq","query_k":100,"query_body":{"docvalue_fields":["_id"],"stored_fields":"_none_"},"query_data_set_format":"hdf5","query_data_set_corpus":"cohere-10m","query_count":10000};CAPTURE_NODE_STAT=true;ENABLE_INSTANCE_BASED_STORAGE=true;TELEMETRY_PARAMS={"node-stats-sample-interval":5}

H 3 * * * %BUNDLE_MANIFEST_URL=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/3.7.0/latest/linux/arm64/tar/dist/opensearch/manifest.yml;TEST_WORKLOAD=vectorsearch;SINGLE_NODE_CLUSTER=false;DATA_NODE_COUNT=2;DATA_INSTANCE_TYPE=r7gd.xlarge;USER_TAGS=run-type:nightly,segrep:disabled,arch:arm64,instance-type:r7gd.xlarge,major-version:3x,cluster-config:arm64-r7gd.xlarge-2-data-5-shards-1-replica-constrained-faiss-sq-cohere-10m;ADDITIONAL_CONFIG=knn.algo_param.index_thread_qty:1;WORKLOAD_PARAMS={"target_index_name":"target_index","target_field_name":"target_field","target_index_body":"indices/faiss-sq-1bit-index.json","target_index_primary_shards":5,"target_index_replica_shards":1,"target_index_dimension":768,"target_index_space_type":"innerproduct","target_index_bulk_size":100,"target_index_bulk_index_data_set_format":"hdf5","target_index_bulk_index_data_set_corpus":"cohere-10m","target_index_bulk_indexing_clients":5,"target_index_max_num_segments":1,"hnsw_ef_search":256,"hnsw_ef_construction":256,"encoder":"sq","query_k":100,"query_body":{"docvalue_fields":["_id"],"stored_fields":"_none_"},"query_data_set_format":"hdf5","query_data_set_corpus":"cohere-10m","query_count":10000};CAPTURE_NODE_STAT=true;ENABLE_INSTANCE_BASED_STORAGE=true;TELEMETRY_PARAMS={"node-stats-sample-interval":5}
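
To confirm whether a parameter such as `mode` was dropped, the old and new `WORKLOAD_PARAMS` can be compared key by key. This is a minimal sketch, assuming both have already been decoded into dicts. `diff_params` is a hypothetical helper, and the two dicts are shortened stand-ins: the old entries are not quoted in full in this thread, so only the fields mentioned in the review comments are included.

```python
def diff_params(old: dict, new: dict) -> dict:
    """Report keys removed, added, or changed between two param dicts."""
    return {
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical, shortened stand-ins for the old and new entries.
old_params = {
    "target_index_body": "indices/faiss-index.json",
    "encoder": "sq",
    "mode": "on_disk",
}
new_params = {
    "target_index_body": "indices/faiss-sq-1bit-index.json",
    "encoder": "sq",
}

print(diff_params(old_params, new_params))
# {'removed': ['mode'], 'added': [], 'changed': ['target_index_body']}
```
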

@github-actions
Contributor

github-actions Bot commented Apr 30, 2026

PR Code Suggestions ✨

Latest suggestions up to 412b743
Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Verify encoder value matches index schema file

The new index body file is named faiss-sq-1bit-index.json, but the encoder is set to
"sq" (not "1bit"). Verify that faiss-sq-1bit-index.json is the correct schema for
the generic SQ encoder, or confirm that the encoder value should be updated to match
the specific quantization type (e.g., "sq_1bit" or similar) defined in that index
body. A mismatch between the index schema file and the encoder parameter could cause
incorrect benchmark results.

jenkins/opensearch/benchmark-test-vectorsearch.jenkinsfile [76-82]

-"target_index_body":"indices/faiss-sq-1bit-index.json",...,"encoder":"sq","query_k":100
+"target_index_body":"indices/faiss-sq-1bit-index.json",...,"encoder":"sq_1bit","query_k":100
Suggestion importance[1-10]: 5

Why: The suggestion raises a valid concern about a potential mismatch between faiss-sq-1bit-index.json and encoder:"sq", but the improved code proposes "sq_1bit" which may not be a valid encoder value. The suggestion asks to verify rather than providing a definitive fix, and the actual correctness depends on the benchmark framework's encoder definitions.

Low

Previous suggestions

Suggestions up to commit d8bd438
Category | Suggestion | Impact
General
Verify correct schema file for SQ encoder

The PR title mentions fixing the wrong schema for Faiss SQ, and the fix changes
faiss-index.json to faiss-sq-1bit-index.json. However, the file name
faiss-sq-1bit-index.json implies a 1-bit scalar quantization, while the encoder
parameter is set to "sq" without specifying a bit width. Verify that
faiss-sq-1bit-index.json is the correct schema file for the default SQ encoder, or
if a more generic faiss-sq-index.json should be used instead to avoid confusion
about the bit width.

jenkins/opensearch/benchmark-test-vectorsearch.jenkinsfile [76-82]

-"target_index_body":"indices/faiss-sq-1bit-index.json"
+"target_index_body":"indices/faiss-sq-index.json"
Suggestion importance[1-10]: 5

Why: The suggestion raises a valid concern about whether faiss-sq-1bit-index.json is the correct schema file for the "sq" encoder. The new lines consistently use this filename across all 4 added configurations, suggesting it's intentional, but the naming mismatch between "1bit" and the generic "sq" encoder value warrants verification. This is a moderate concern that could affect test correctness.

Low
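
The consistency claim in the rationale above can be checked mechanically by counting the distinct (index body, encoder) pairings across the added entries. The sketch below assumes the values have been transcribed by hand from the four entries quoted in the reviewer guide; the tuple layout is an illustrative choice.

```python
from collections import Counter

# (cluster-config suffix, target_index_body, encoder) for the four added
# entries, transcribed from the entries quoted in the reviewer guide.
entries = [
    ("multi-faiss-sq-cohere-10m", "indices/faiss-sq-1bit-index.json", "sq"),
    ("faiss-sq-cohere-10m", "indices/faiss-sq-1bit-index.json", "sq"),
    ("constrained-multi-faiss-sq-cohere-10m", "indices/faiss-sq-1bit-index.json", "sq"),
    ("constrained-faiss-sq-cohere-10m", "indices/faiss-sq-1bit-index.json", "sq"),
]

# Count how many entries use each (index body, encoder) pairing.
pairings = Counter((body, encoder) for _name, body, encoder in entries)

# A single distinct pairing means all entries agree with each other.
assert len(pairings) == 1, f"inconsistent pairings: {pairings}"
print(pairings)
```

Agreement across entries shows the change was applied uniformly. It does not by itself show that the pairing is correct.
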

Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 412b743

@codecov

codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.62%. Comparing base (14cbe58) to head (412b743).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6169   +/-   ##
=======================================
  Coverage   96.62%   96.62%           
=======================================
  Files         405      405           
  Lines       18983    18983           
=======================================
  Hits        18342    18342           
  Misses        641      641           
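
As a quick sanity check, the coverage percentage in the table above follows from the hit and line counts. The variable names below are illustrative.

```python
# Figures taken from the coverage table above.
lines, hits, misses = 18983, 18342, 641

# Hits and misses should account for every coverable line.
assert hits + misses == lines

coverage = 100 * hits / lines
print(f"{coverage:.2f}%")  # 96.62%
```
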

@rishabh6788 rishabh6788 merged commit 125ce5f into opensearch-project:main Apr 30, 2026
17 checks passed
@github-project-automation github-project-automation Bot moved this from 👀 In Review to ✅ Done in Engineering Effectiveness Board Apr 30, 2026