[ENH] Implements RRF helper expression #5499
Conversation
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving:
- Testing, Bugs, Errors, Logs, Documentation
- System Compatibility
- Quality
Implement Reciprocal Rank Fusion (RRF) Helper Expression and Tests

This PR introduces a new RRF helper expression.

This summary was automatically generated by @propel-code-bot.
Force-pushed from d28af3b to bfb4652.
        Knn(query=[0.1, 0.2], return_rank=True),
        Knn(query=sparse_vector, key="#sparse", return_rank=True)
    ],
    weights=[2.0, 1.0],  # Dense is 2x more important than sparse
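For context, the weighted fusion being configured above follows the standard RRF formula score(d) = sum_i w_i / (k + rank_i(d)). The sketch below is a standalone illustration of that formula, not the PR's actual implementation, and `rrf_fuse` is a hypothetical helper name:

```python
from typing import Dict, List


def rrf_fuse(
    rankings: List[Dict[str, int]],  # per-strategy 1-based rank of each doc
    weights: List[float],
    k: int = 60,  # smoothing constant, standard in the RRF literature
) -> Dict[str, float]:
    """Weighted Reciprocal Rank Fusion: score(d) = sum_i w_i / (k + rank_i(d))."""
    scores: Dict[str, float] = {}
    for ranking, weight in zip(rankings, weights):
        for doc, rank in ranking.items():
            scores[doc] = scores.get(doc, 0.0) + weight / (k + rank)
    return scores


dense = {"a": 1, "b": 2}
sparse = {"b": 1, "c": 2}
fused = rrf_fuse([dense, sparse], weights=[2.0, 1.0])
# "b" is rewarded for appearing in both rankings
```

Documents missing from a ranking simply contribute nothing from that strategy, which is why RRF degrades gracefully when the strategies retrieve partially disjoint sets.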
Normalizing weights to sum to 1 might make separate RRF ranking calculations easier to compare (sparse + dense + bm25 vs. sparse + dense should be comparable).
I see a use case where normalization helps, but I think it should not be the default: normalization seems to me like a separate step, and could cause confusion if always enabled. The user can already normalize with an expression using the current API. We could provide a normalize flag as an indicator that the weights should be normalized, to facilitate this use case.
From Claude:

Option 1: Always normalize weights automatically
Pros:
• User-friendly - no need to worry about normalization
• Consistent behavior across all uses
• Prevents mathematical inconsistencies
• Similar to how probabilities work (always sum to 1)
Cons:
• Loss of interpretability - weights like [2.0, 1.0] are clearer than [0.667, 0.333]
• Breaks backward compatibility if changed now
• May surprise users who expect raw weights to be used as-is
• Different from standard RRF implementations in literature

Option 2: Require normalized weights
Pros:
• Forces users to be explicit about weight distribution
• No ambiguity about weight values
• Mathematically clean
Cons:
• Poor user experience - adds burden on users
• Error-prone - users must calculate normalization manually
• Would need validation to ensure sum equals 1.0 (floating point issues)
• Different from common practice in ML libraries

Option 3: No normalization (current implementation)
Pros:
• Flexible - users can use any scale they prefer
• Intuitive relative weights (e.g., [2, 1] means "2x more important")
• Matches standard RRF implementations
• Users can normalize if they want to
Cons:
• Scale-dependent results - [2, 1] vs [200, 100] give different scores
• May lead to numerical issues with very large weights
• Less mathematically principled

Option 4: Add normalize flag (recommended)
Pros:
• Best of both worlds - flexibility and control
• Backward compatible (default normalize=False)
• Clear intent from API usage
• Allows both use cases:
  • Relative importance: weights=[2, 1], normalize=True
  • Absolute weights: weights=[0.7, 0.3], normalize=False
Cons:
• Adds API complexity (one more parameter)
• Need to document behavior clearly
• Slight increase in implementation complexity

Recommendation
I recommend Option 4 with a normalize flag for these reasons:
1. Preserves backward compatibility - existing code continues to work
2. Supports both use cases elegantly:
   • Research/experimentation may want exact weight control
   • Production use may prefer normalized weights for consistency
3. Clear semantics - the flag makes the behavior explicit
4. Common pattern in ML libraries (e.g., scikit-learn's normalize parameters)
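A minimal sketch of how the proposed normalize flag from Option 4 could resolve weights; all names here are hypothetical and not code from the PR:

```python
from typing import List, Optional


def resolve_weights(
    n_ranks: int,
    weights: Optional[List[float]] = None,
    normalize: bool = False,  # hypothetical flag from Option 4
) -> List[float]:
    """Resolve per-rank weights for RRF fusion (illustrative only)."""
    if weights is None:
        # Default: equal weight for every rank expression.
        weights = [1.0] * n_ranks
    if len(weights) != n_ranks:
        raise ValueError("weights must match the number of rank expressions")
    if normalize:
        total = sum(weights)
        if total <= 0:
            raise ValueError("weights must sum to a positive value")
        weights = [w / total for w in weights]
    return weights


# Relative importance, normalized: [2.0, 1.0] becomes [0.667, 0.333]
normalized = resolve_weights(2, [2.0, 1.0], normalize=True)
```

With the default `normalize=False`, existing callers passing raw weights are unaffected, which is the backward-compatibility point made above.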
    ranks: List of Rank expressions to fuse (must have at least one)
    k: Smoothing constant (default: 60, standard in literature)
    weights: Optional weights for each ranking strategy. If not provided,
        all ranks are weighted equally (weight=1.0 each).
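To illustrate the role of the smoothing constant k documented above (a standalone numeric sketch; `reciprocal_rank` is a hypothetical helper, not part of the PR): larger k flattens the contribution curve, shrinking the score gap between adjacent ranks.

```python
def reciprocal_rank(rank: int, k: int) -> float:
    # Contribution of one ranking strategy with weight 1.0
    return 1.0 / (k + rank)


# The gap between rank 1 and rank 2 shrinks as k grows,
# so a large k reduces the dominance of the very top hits.
gaps = {k: reciprocal_rank(1, k) - reciprocal_rank(2, k) for k in (0, 10, 60)}
```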
weight = 1/n, where n is the number of rank expressions
see above
jairad26
left a comment
tests verifying rrf works would be nice
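A couple of property-style checks along these lines could verify the fusion math against the reference formula (self-contained sketch; a real test would exercise the PR's Rrf expression instead of the hypothetical `rrf_score` helper):

```python
from typing import List


def rrf_score(ranks: List[int], weights: List[float], k: int = 60) -> float:
    # Reference weighted RRF: sum of w / (k + rank) over strategies
    return sum(w / (k + r) for r, w in zip(ranks, weights))


def test_rrf_prefers_docs_ranked_high_everywhere() -> None:
    # Doc X is rank 1 in both strategies; doc Y is rank 1 in one, rank 10 in the other.
    assert rrf_score([1, 1], [1.0, 1.0]) > rrf_score([1, 10], [1.0, 1.0])


def test_rrf_weighting_breaks_symmetry() -> None:
    # With the first strategy weighted 2x, a doc ranked first there
    # should beat a doc ranked first only in the second strategy.
    assert rrf_score([1, 5], [2.0, 1.0]) > rrf_score([5, 1], [2.0, 1.0])


test_rrf_prefers_docs_ranked_high_everywhere()
test_rrf_weighting_breaks_symmetry()
```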
# Test 1: Basic RRF with two KNN rankings (equal weight)
rrf = Rrf(
    [
        Knn(query=[0.1, 0.2], return_rank=True),
validate return_rank in RRF function
Would like to keep test_api minimal for now. Could add this test later.
rrf = Rrf(
    [
        Knn(query=[0.1, 0.2], return_rank=True),
        Knn(query=[0.3, 0.4], key="#sparse", return_rank=True),
Note for after the schema is added: having a BM25 wrapper around KNN would give more clarity.
Force-pushed from b16853f to cf123a1.

Description of changes
Summarize the changes made by this PR.

Test plan
How are these changes tested?
pytest for python, yarn test for js, cargo test for rust

Migration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan
What is the plan to instrument and monitor this change?

Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?