Add XTR scoring

[WARP](https://arxiv.org/abs/2501.17788) recently revisited and improved upon [XTR](https://arxiv.org/abs/2304.01982)'s modified inference scheme. XTR avoids loading every token embedding of the retrieved candidate documents: it scores candidates using only the retrieved tokens and imputes each missing query-document token score as the minimum score among the retrieved scores for that query token.
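To make the imputation concrete, here is a rough sketch of how I understand the inference-time scoring. The function name, the input layout (one list of `(doc_id, score)` hits per query token) and the averaging over query tokens are my own assumptions for illustration, not code from XTR, WARP, or this repo.

```python
def xtr_inference_scores(retrieved):
    """Score candidate documents from retrieved token hits only.

    `retrieved[i]` is the (assumed non-empty) list of (doc_id, score) pairs
    that token retrieval returned for query token i. Hypothetical layout.
    """
    best_per_token = []  # per query token: doc_id -> best retrieved score
    floors = []          # per query token: minimum retrieved score
    candidates = set()

    for hits in retrieved:
        per_doc = {}
        for doc_id, score in hits:
            # Keep the best-scoring retrieved token of each document.
            per_doc[doc_id] = max(score, per_doc.get(doc_id, score))
        best_per_token.append(per_doc)
        # Imputed value for documents with no retrieved token for this query token.
        floors.append(min(score for _, score in hits))
        candidates.update(per_doc)

    n_query_tokens = len(retrieved)
    return {
        doc_id: sum(
            per_doc.get(doc_id, floor)  # fall back to the imputed minimum
            for per_doc, floor in zip(best_per_token, floors)
        ) / n_query_tokens
        for doc_id in candidates
    }


# Document "b" has no hit for the second query token, so it receives that
# token's minimum retrieved score (0.6) instead of requiring its embeddings.
hits = [[("a", 0.9), ("b", 0.7)], [("a", 0.8), ("a", 0.6)]]
scores = xtr_inference_scores(hits)
```

Here `scores["a"]` comes out at roughly 0.85 and `scores["b"]` at roughly 0.65, without ever touching the non-retrieved tokens of `b`.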

This scheme, however, requires an analogous modification of the scoring function during training: a query-document token score is kept only if it is among the top `k_train` scores over all in-batch document tokens for that query token.
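As a rough sketch of the training-side change, assuming unpadded inputs and my reading of the XTR paper (masked-out scores contribute nothing, and the sum is normalized by the number of query tokens that retrieved at least one token from the document). I use NumPy here to keep it self-contained; `xtr_scores_sketch` and its argument shapes are hypothetical and not the eventual `xtr_scores` API.

```python
import numpy as np


def xtr_scores_sketch(queries, documents, k_train):
    """In-batch XTR-style scores (illustrative only, no padding handling).

    queries:   (n_queries, n_query_tokens, dim)
    documents: (n_docs, n_doc_tokens, dim)
    returns:   (n_queries, n_docs)
    """
    # Token-level similarities: (n_queries, n_query_tokens, n_docs, n_doc_tokens).
    sim = np.einsum("qid,bjd->qibj", queries, documents)

    # For each query token, find its k-th largest score over ALL in-batch
    # document tokens; this is the retrieval threshold.
    flat = sim.reshape(sim.shape[0], sim.shape[1], -1)
    k = min(k_train, flat.shape[-1])
    kth = np.partition(flat, -k, axis=-1)[..., -k]

    # Keep only scores at or above the threshold (ties may keep more than k).
    kept = np.where(sim >= kth[..., None, None], sim, -np.inf)

    # Best kept token per (query token, document); -inf means "nothing retrieved".
    best = kept.max(axis=-1)
    hit = np.isfinite(best)

    # Sum over query tokens, normalized by the number of query tokens that
    # retrieved at least one token of the document (at least 1 to avoid 0/0).
    total = np.where(hit, best, 0.0).sum(axis=1)
    z = np.maximum(hit.sum(axis=1), 1)
    return total / z
```

With `k_train` equal to the total number of in-batch document tokens, nothing is masked and this reduces to a MaxSim score averaged over query tokens; smaller values of `k_train` zero out documents whose tokens are never retrieved.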

The change should be fairly straightforward: it mainly involves adding one or more `xtr_scores` functions to `scores/scores.py`, depending on the multiplicity of documents to queries.


I'd like to take this on if that's alright with y'all, @NohTow @raphaelsty. It could be a good prelude to adding WARP support once the PLAID implementation is up and running.

Add XTR scoring #86
