Compatibility with MUVERA encoding

First off, excellent work with this project and all the hard work you guys have put into late interaction models!

I've been working to add [MUVERA to txtai](https://github.com/neuml/txtai/issues/952) and I've been having difficulty getting any PyLate models to work well with MUVERA encoding.

[I also asked this question over on the muvfde repo](https://github.com/viig99/muvfde/issues/1) and used those benchmark scripts to rule out anything I'm doing wrong (results copied here for clarity).

| Metric                       | ModernColBERT        | FDE(20,5,16) ModernColBERT   |
|------------------------------|----------------------|------------------------------|
| ndcg_at_1                    | 0.35294              | 0.05882                      |
| ndcg_at_3                    | 0.31416              | 0.05294                      |
| ndcg_at_5                    | 0.28701              | 0.05297                      |
| ndcg_at_10                   | 0.26125              | 0.05245                      |
| ndcg_at_20                   | 0.24186              | 0.05263                      |
| ndcg_at_100                  | 0.247                | 0.0671                       |
| ndcg_at_1000                 | 0.33668              | 0.16921                      |
| map_at_1                     | 0.03499              | 0.00202                      |
| map_at_3                     | 0.06291              | 0.00427                      |
| map_at_5                     | 0.0735               | 0.00554                      |
| map_at_10                    | 0.08719              | 0.00736                      |
| map_at_20                    | 0.09664              | 0.00914                      |
| map_at_100                   | 0.11057              | 0.01303                      |
| map_at_1000                  | 0.12253              | 0.02048                      |
| recall_at_1                  | 0.03499              | 0.00202                      |
| recall_at_3                  | 0.07216              | 0.00647                      |
| recall_at_5                  | 0.09391              | 0.01138                      |
| recall_at_10                 | 0.1267               | 0.01995                      |
| recall_at_20                 | 0.15834              | 0.03348                      |
| recall_at_100                | 0.26555              | 0.09853                      |
| recall_at_1000               | 0.57746              | 0.4513                       |

Curious whether anyone has tried MUVERA encoding and has insights into parameters that work, or whether there is something fundamentally different about these embeddings compared with the ones generated by models trained with the Stanford code.
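
For context, here is a minimal NumPy sketch of a MUVERA-style fixed-dimensional encoding (FDE) next to exact MaxSim scoring. It assumes the `(20, 5, 16)` in the table header means 20 repetitions, 5 SimHash bits and a 16-dimensional projection (20 × 2^5 × 16 = 10240 dimensions). The names `fde`, `maxsim`, `reps`, `ksim` and `dproj` are illustrative only and are not the muvfde, txtai or PyLate API. The real implementations may differ in details such as normalization or how empty buckets are filled.

```python
import numpy as np

def maxsim(q, d):
    """Exact late-interaction score: best document match per query token, summed."""
    return (q @ d.T).max(axis=1).sum()

def fde(vectors, reps=20, ksim=5, dproj=16, is_query=True, seed=0):
    """Toy FDE of a (n_tokens, dim) array. Hypothetical helper, not a library call."""
    rng = np.random.default_rng(seed)  # same seed => same partitions for queries and documents
    dim = vectors.shape[1]
    buckets = 2 ** ksim
    # Bit pattern of every bucket id, used to fill empty document buckets.
    codes = (np.arange(buckets)[:, None] >> np.arange(ksim)) & 1
    out = []
    for _ in range(reps):
        planes = rng.standard_normal((dim, ksim))  # SimHash hyperplanes
        proj = rng.choice([-1.0, 1.0], size=(dim, dproj)) / np.sqrt(dproj)
        bits = (vectors @ planes > 0).astype(int)
        ids = bits @ (1 << np.arange(ksim))        # bucket id per token
        block = np.zeros((buckets, dim))
        np.add.at(block, ids, vectors)             # queries: sum per bucket
        if not is_query:
            counts = np.bincount(ids, minlength=buckets)
            filled = counts > 0
            block[filled] /= counts[filled][:, None]  # documents: mean per bucket
            for b in np.flatnonzero(~filled):
                # Empty bucket: copy the token whose SimHash bits are closest.
                nearest = np.argmin((bits != codes[b]).sum(axis=1))
                block[b] = vectors[nearest]
        out.append((block @ proj).ravel())
    return np.concatenate(out)

def unit_rows(rng, n, dim):
    """Random L2-normalized token embeddings (stand-in for real model output)."""
    x = rng.standard_normal((n, dim))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(1)
query = unit_rows(rng, 32, 128)
docs = [query.copy()] + [unit_rows(rng, 180, 128) for _ in range(9)]

q_fde = fde(query, is_query=True)
exact = [maxsim(query, d) for d in docs]
approx = [float(q_fde @ fde(d, is_query=False)) for d in docs]
print("exact ranking:", np.argsort(exact)[::-1])
print("FDE ranking:  ", np.argsort(approx)[::-1])
```

Comparing the two rankings on real token embeddings from a model, rather than the random vectors used here, is one way to check whether the drop comes from the encoding parameters or from the embeddings themselves.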

Thank you!

Compatibility with MUVERA encoding #142
