Add BF16 scalar quantization support for FAISS-backed k-NN indices.#3190
mulugetam wants to merge 2 commits into opensearch-project:main
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##               main    #3190    +/-  ##
============================================
+ Coverage     83.10%   83.19%   +0.08%
- Complexity     4168     4198      +30
============================================
  Files           447      449       +2
  Lines         15317    15367      +50
  Branches       1965     1978      +13
============================================
+ Hits          12729    12784      +55
- Misses         1797     1799       +2
+ Partials        791      784       -7
```
Force-pushed from 415c1f9 to 5512f4e.
The table below compares bulk similarity results.

Source: https://gist.github.com/mulugetam/c04a80f048e0f42520e245cc9dd615e7
@mulugetam thanks for the PR. I could not understand the min_score and max_score in the above benchmarks. Are we getting all the scores of the top k, taking the average of those, and then looking at the difference in recall? Wondering if we can run experiments with datasets like Cohere 10M, 768D, or the datasets here. We could then see the recall difference between the two approaches: BM_FP16 (existing) and BM_BF16 (this PR).
@vamshin The scores are just the min, max, and average distances from the query vector to the database vectors in my bulk similarity benchmark. The goal is just to see the loss in precision in the bulk similarity for FP16 and BF16 against FP32, nothing more. Yeah, I’m already working on Cohere and will share the data once it’s ready. |
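To make the min/max/avg comparison concrete, here is a hypothetical standalone sketch (not the PR's Google Benchmark harness) that measures how much BF16 truncation perturbs inner-product distances between a query and random database vectors; the 300×768 sizes mirror the PR's test fixture. `bf16_roundtrip` and `measure_bf16_ip_error` are illustrative names, not code from this PR.

```cpp
// Hypothetical sketch: measure the error BF16 truncation introduces
// into inner-product distances for n random database vectors.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <random>
#include <vector>

struct ErrStats {
    float min_err, max_err, avg_err;
};

// Truncate an FP32 value to BF16 and back (keep the top 16 bits).
static float bf16_roundtrip(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFF0000u;  // drop the low 16 mantissa bits
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y;
}

static float inner_product(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Compare FP32 vs BF16-roundtripped inner products over n random vectors.
ErrStats measure_bf16_ip_error(std::size_t n, std::size_t dim, std::uint32_t seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);

    std::vector<float> query(dim);
    for (auto& v : query) v = dist(rng);

    ErrStats st{1e30f, 0.0f, 0.0f};
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<float> vec(dim), vec_bf16(dim);
        for (std::size_t j = 0; j < dim; ++j) {
            vec[j] = dist(rng);
            vec_bf16[j] = bf16_roundtrip(vec[j]);
        }
        float err = std::fabs(inner_product(query, vec) - inner_product(query, vec_bf16));
        st.min_err = std::min(st.min_err, err);
        st.max_err = std::max(st.max_err, err);
        sum += err;
    }
    st.avg_err = n ? sum / static_cast<float>(n) : 0.0f;
    return st;
}
```

Reporting the min, max, and average of these errors against the FP32 baseline gives exactly the kind of precision-loss summary discussed above.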
@mulugetam can you also share the setup for how you are running the benchmarks, so we can try reproducing it on our side? Or if you can contribute it directly to the repo, that would be awesome.
@vamshin @navneet1v Below are results from running vectordbbench (note: vectordbbench doesn’t currently support …).

Results:
- --ef-search = 256
- --ef-search = 512

Concurrent Latency Results:
- --ef-search = 256
- --ef-search = 512
The bulk similarity benchmarks are standalone Google Benchmark tests that compare the kernels of the similarity functions. I’ve updated the results to also include the benchmark harness that was used. |
Force-pushed from e09c6df to 655ffd2.
Introduce BF16 as a new scalar quantizer type alongside FP16.

- Implement bulk BF16 vector similarity for inner product and L2 distance.
- Add AVX512-BF16 SIMD kernels for BF16 vector similarity.
- Register "bf16" as a FAISS SQ encoder type in KNN constants.
- Add FaissBF16Util with validation and clipping logic for BF16 vectors.
- Update FAISS index builders, memory-optimized searchers, and reconstructors for BF16.
- Add FaissBF16Reconstructor for decoding BF16 quantized vectors.
- Add integration tests for cagra-to-hnsw BF16 index creation and search.
- Add JNI-level unit tests for BF16 similarity functions.
- Include a binary test fixture with 300 BF16 vectors of 768 dimensions.

Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>

Refactor based on recent changes.

Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
Description
This PR adds BFloat16 (BF16) scalar quantization support for FAISS-backed k-NN indices. Key changes/additions:

- New AVX512-BF16 SIMD kernels (avx512_spr, with up to a 45% speedup on IP). The BF16 bulk distance implementations live in avx512_simd_similarity_function.cpp and avx512_spr_simd_similarity_function.cpp.
- For L2, computing the distance via the expansion ||a - b||^2 = ||a||^2 - 2·a·b + ||b||^2 ends up being slower.

Below are the results comparing the BF16 IP implementation in avx512 vs avx512_spr.

Source: https://gist.github.com/mulugetam/f23317bbb9057e9798b86f4d02713fd7
Related Issues
#3189
Check List

- Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.