[ENH] Implements RRF helper expression #5499
Conversation
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving:
- Testing, Bugs, Errors, Logs, Documentation
- System Compatibility
- Quality
Implement Reciprocal Rank Fusion (RRF) Helper Expression and Tests

This PR introduces a new RRF helper expression.

This summary was automatically generated by @propel-code-bot.
Force-pushed from d28af3b to bfb4652.
        Knn(query=[0.1, 0.2], return_rank=True),
        Knn(query=sparse_vector, key="#sparse", return_rank=True)
    ],
    weights=[2.0, 1.0],  # Dense is 2x more important than sparse
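For context, the weighted fusion being configured above follows the standard RRF formula score(d) = sum_i w_i / (k + rank_i(d)). The sketch below is a standalone illustration of that formula, not the PR's actual implementation, and `rrf_fuse` is a hypothetical helper name:

```python
from typing import Dict, List


def rrf_fuse(
    rankings: List[Dict[str, int]],  # per-strategy 1-based rank of each doc
    weights: List[float],
    k: int = 60,  # smoothing constant, standard in the RRF literature
) -> Dict[str, float]:
    """Weighted Reciprocal Rank Fusion: score(d) = sum_i w_i / (k + rank_i(d))."""
    scores: Dict[str, float] = {}
    for ranking, weight in zip(rankings, weights):
        for doc, rank in ranking.items():
            scores[doc] = scores.get(doc, 0.0) + weight / (k + rank)
    return scores


dense = {"a": 1, "b": 2}
sparse = {"b": 1, "c": 2}
fused = rrf_fuse([dense, sparse], weights=[2.0, 1.0])
# "b" is rewarded for appearing in both rankings
```

Documents missing from a ranking simply contribute nothing from that strategy, which is why RRF degrades gracefully when the strategies retrieve partially disjoint sets.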
Normalizing weights to sum to 1 might make separate RRF ranking calculations easier to compare (sparse + dense + bm25 vs. sparse + dense should be comparable).
I see a use case where normalization helps, but I think it should not be the default: normalization seems to me like a separate step, and could cause confusion if always enabled. The user can already normalize with an expression using the current API. We could provide a normalize flag as an indicator that the weights should be normalized, to facilitate this use case.
From Claude:

Option 1: Always normalize weights automatically
Pros:
• User-friendly - no need to worry about normalization
• Consistent behavior across all uses
• Prevents mathematical inconsistencies
• Similar to how probabilities work (always sum to 1)
Cons:
• Loss of interpretability - weights like [2.0, 1.0] are clearer than [0.667, 0.333]
• Breaks backward compatibility if changed now
• May surprise users who expect raw weights to be used as-is
• Different from standard RRF implementations in literature

Option 2: Require normalized weights
Pros:
• Forces users to be explicit about weight distribution
• No ambiguity about weight values
• Mathematically clean
Cons:
• Poor user experience - adds burden on users
• Error-prone - users must calculate normalization manually
• Would need validation to ensure sum equals 1.0 (floating point issues)
• Different from common practice in ML libraries

Option 3: No normalization (current implementation)
Pros:
• Flexible - users can use any scale they prefer
• Intuitive relative weights (e.g., [2, 1] means "2x more important")
• Matches standard RRF implementations
• Users can normalize if they want to
Cons:
• Scale-dependent results - [2, 1] vs [200, 100] give different scores
• May lead to numerical issues with very large weights
• Less mathematically principled

Option 4: Add normalize flag (recommended)
Pros:
• Best of both worlds - flexibility and control
• Backward compatible (default normalize=False)
• Clear intent from API usage
• Allows both use cases:
  • Relative importance: weights=[2, 1], normalize=True
  • Absolute weights: weights=[0.7, 0.3], normalize=False
Cons:
• Adds API complexity (one more parameter)
• Need to document behavior clearly
• Slight increase in implementation complexity

Recommendation
I recommend Option 4 with a normalize flag for these reasons:
1. Preserves backward compatibility - existing code continues to work
2. Supports both use cases elegantly:
   • Research/experimentation may want exact weight control
   • Production use may prefer normalized weights for consistency
3. Clear semantics - the flag makes the behavior explicit
4. Common pattern in ML libraries (e.g., scikit-learn's normalize parameters)
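A minimal sketch of how the proposed normalize flag from Option 4 could resolve weights; all names here are hypothetical and not code from the PR:

```python
from typing import List, Optional


def resolve_weights(
    n_ranks: int,
    weights: Optional[List[float]] = None,
    normalize: bool = False,  # hypothetical flag from Option 4
) -> List[float]:
    """Resolve per-rank weights for RRF fusion (illustrative only)."""
    if weights is None:
        # Default: equal weight for every rank expression.
        weights = [1.0] * n_ranks
    if len(weights) != n_ranks:
        raise ValueError("weights must match the number of rank expressions")
    if normalize:
        total = sum(weights)
        if total <= 0:
            raise ValueError("weights must sum to a positive value")
        weights = [w / total for w in weights]
    return weights


# Relative importance, normalized: [2.0, 1.0] becomes [0.667, 0.333]
normalized = resolve_weights(2, [2.0, 1.0], normalize=True)
```

With the default `normalize=False`, existing callers passing raw weights are unaffected, which is the backward-compatibility point made above.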
    ranks: List of Rank expressions to fuse (must have at least one)
    k: Smoothing constant (default: 60, standard in literature)
    weights: Optional weights for each ranking strategy. If not provided,
        all ranks are weighted equally (weight=1.0 each).
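To illustrate the role of the smoothing constant k documented above (a standalone numeric sketch; `reciprocal_rank` is a hypothetical helper, not part of the PR): larger k flattens the contribution curve, shrinking the score gap between adjacent ranks.

```python
def reciprocal_rank(rank: int, k: int) -> float:
    # Contribution of one ranking strategy with weight 1.0
    return 1.0 / (k + rank)


# The gap between rank 1 and rank 2 shrinks as k grows,
# so a large k reduces the dominance of the very top hits.
gaps = {k: reciprocal_rank(1, k) - reciprocal_rank(2, k) for k in (0, 10, 60)}
```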
weight = 1/n, where n is the number of rank expressions
see above
jairad26
left a comment
tests verifying rrf works would be nice
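A couple of property-style checks along these lines could verify the fusion math against the reference formula (self-contained sketch; a real test would exercise the PR's Rrf expression instead of the hypothetical `rrf_score` helper):

```python
from typing import List


def rrf_score(ranks: List[int], weights: List[float], k: int = 60) -> float:
    # Reference weighted RRF: sum of w / (k + rank) over strategies
    return sum(w / (k + r) for r, w in zip(ranks, weights))


def test_rrf_prefers_docs_ranked_high_everywhere() -> None:
    # Doc X is rank 1 in both strategies; doc Y is rank 1 in one, rank 10 in the other.
    assert rrf_score([1, 1], [1.0, 1.0]) > rrf_score([1, 10], [1.0, 1.0])


def test_rrf_weighting_breaks_symmetry() -> None:
    # With the first strategy weighted 2x, a doc ranked first there
    # should beat a doc ranked first only in the second strategy.
    assert rrf_score([1, 5], [2.0, 1.0]) > rrf_score([5, 1], [2.0, 1.0])


test_rrf_prefers_docs_ranked_high_everywhere()
test_rrf_weighting_breaks_symmetry()
```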
# Test 1: Basic RRF with two KNN rankings (equal weight)
rrf = Rrf(
    [
        Knn(query=[0.1, 0.2], return_rank=True),
validate return_rank in RRF function
Would like to keep test_api minimal for now. Could add this test later.
rrf = Rrf(
    [
        Knn(query=[0.1, 0.2], return_rank=True),
        Knn(query=[0.3, 0.4], key="#sparse", return_rank=True),
Note for after the schema is added: having a BM25 wrapper around KNN would give more clarity.
Force-pushed from b16853f to cf123a1.

Description of changes
Summarize the changes made by this PR.

Test plan
How are these changes tested?
pytest for python, yarn test for js, cargo test for rust

Migration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan
What is the plan to instrument and monitor this change?

Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?