
Compatibility with MUVERA encoding #142

@davidmezzetti


First off, excellent work on this project, and thanks for all the hard work you guys have put into late-interaction models!

I've been working on adding MUVERA to txtai, and I've had difficulty getting any PyLate model to work well with MUVERA encoding.

I also asked this question over on the muvfde repo and used its benchmark scripts to rule out any mistakes on my end (results copied here for clarity).

| Metric | ModernColBERT | FDE(20,5,16) ModernColBERT |
|---|---|---|
| ndcg_at_1 | 0.35294 | 0.05882 |
| ndcg_at_3 | 0.31416 | 0.05294 |
| ndcg_at_5 | 0.28701 | 0.05297 |
| ndcg_at_10 | 0.26125 | 0.05245 |
| ndcg_at_20 | 0.24186 | 0.05263 |
| ndcg_at_100 | 0.247 | 0.0671 |
| ndcg_at_1000 | 0.33668 | 0.16921 |
| map_at_1 | 0.03499 | 0.00202 |
| map_at_3 | 0.06291 | 0.00427 |
| map_at_5 | 0.0735 | 0.00554 |
| map_at_10 | 0.08719 | 0.00736 |
| map_at_20 | 0.09664 | 0.00914 |
| map_at_100 | 0.11057 | 0.01303 |
| map_at_1000 | 0.12253 | 0.02048 |
| recall_at_1 | 0.03499 | 0.00202 |
| recall_at_3 | 0.07216 | 0.00647 |
| recall_at_5 | 0.09391 | 0.01138 |
| recall_at_10 | 0.1267 | 0.01995 |
| recall_at_20 | 0.15834 | 0.03348 |
| recall_at_100 | 0.26555 | 0.09853 |
| recall_at_1000 | 0.57746 | 0.4513 |

I'm curious whether anyone has tried MUVERA encoding and has insights into parameters that work, or whether there is something fundamentally different about these embeddings compared with those generated by models trained with the Stanford code.
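For reference when comparing implementations, here is a minimal NumPy sketch of the FDE construction described in the MUVERA paper: SimHash partitioning, a random ±1 projection, summing token vectors per bucket for queries and averaging them for documents, and filling empty document buckets from the token with the closest SimHash bits. It assumes FDE(20,5,16) means 20 repetitions, 5 SimHash bits (32 buckets) and a 16-dimensional projection, i.e. 20 × 32 × 16 = 10,240 dimensions. It is not the muvfde or txtai code and skips the optional final projection; `fde_encode`, `chamfer` and `normalize` are hypothetical helper names. Comparing `q_fde @ d_fde` with the exact Chamfer (MaxSim) score on the same embeddings is one way to check whether quality is lost in the encoding step itself.

```python
import numpy as np


def fde_encode(vectors, is_query, reps=20, k_sim=5, d_proj=16, seed=0):
    """Hypothetical MUVERA-style fixed dimensional encoding (FDE) sketch.

    vectors: (n, d) array of token embeddings for one query or document.
    Returns a 1-D vector of length reps * 2**k_sim * d_proj.
    Assumes (reps, k_sim, d_proj) is what "FDE(20,5,16)" refers to.
    """
    _, d = vectors.shape
    num_buckets = 2 ** k_sim
    powers = 2 ** np.arange(k_sim)  # turns a row of SimHash bits into a bucket id
    blocks = []
    for r in range(reps):
        # Seed per repetition so queries and documents share the same random draws
        rng = np.random.default_rng(seed + r)
        gaussian = rng.standard_normal((d, k_sim))
        signs = rng.choice([-1.0, 1.0], size=(d, d_proj))

        bits = (vectors @ gaussian > 0).astype(int)    # (n, k_sim) SimHash bits
        bucket_ids = bits @ powers                      # (n,) bucket per token
        projected = vectors @ signs / np.sqrt(d_proj)   # (n, d_proj) random projection

        block = np.zeros((num_buckets, d_proj))
        for b in range(num_buckets):
            members = bucket_ids == b
            if members.any():
                # Queries sum the tokens in a bucket; documents average them
                agg = projected[members]
                block[b] = agg.sum(axis=0) if is_query else agg.mean(axis=0)
            elif not is_query:
                # Empty document bucket: borrow the token whose SimHash bits are
                # closest in Hamming distance to this bucket's bit pattern
                target = (b >> np.arange(k_sim)) & 1
                nearest = np.argmin(np.abs(bits - target).sum(axis=1))
                block[b] = projected[nearest]
        blocks.append(block.ravel())
    return np.concatenate(blocks)


def chamfer(query, doc):
    """Exact late-interaction (MaxSim) score: best doc token per query token, summed."""
    return float((query @ doc.T).max(axis=1).sum())


def normalize(x):
    """L2-normalize each row (token embedding)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Usage with random unit vectors standing in for real token embeddings
rng = np.random.default_rng(42)
query = normalize(rng.standard_normal((32, 128)))
docs = [normalize(rng.standard_normal((180, 128))) for _ in range(5)]

q_fde = fde_encode(query, is_query=True)
for i, doc in enumerate(docs):
    d_fde = fde_encode(doc, is_query=False)
    print(i, round(chamfer(query, doc), 3), round(float(q_fde @ d_fde), 3))
```

Swapping the random arrays for real PyLate query and document token embeddings turns this into a direct check of how closely the FDE score tracks the late-interaction score for a given parameter choice.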

Thank you!
