Compatibility with MUVERA encoding #142
First off, excellent work with this project and all the hard work you guys have put into late interaction models!
I've been working to add MUVERA to txtai and I've been having difficulty getting any PyLate models to work well with MUVERA encoding.
I also asked this question over on the muvfde repo and used those benchmark scripts to rule out anything I'm doing wrong (results copied here for clarity).
| Metric | ModernColBERT | FDE(20,5,16) ModernColBERT |
|---|---|---|
| ndcg_at_1 | 0.35294 | 0.05882 |
| ndcg_at_3 | 0.31416 | 0.05294 |
| ndcg_at_5 | 0.28701 | 0.05297 |
| ndcg_at_10 | 0.26125 | 0.05245 |
| ndcg_at_20 | 0.24186 | 0.05263 |
| ndcg_at_100 | 0.247 | 0.0671 |
| ndcg_at_1000 | 0.33668 | 0.16921 |
| map_at_1 | 0.03499 | 0.00202 |
| map_at_3 | 0.06291 | 0.00427 |
| map_at_5 | 0.0735 | 0.00554 |
| map_at_10 | 0.08719 | 0.00736 |
| map_at_20 | 0.09664 | 0.00914 |
| map_at_100 | 0.11057 | 0.01303 |
| map_at_1000 | 0.12253 | 0.02048 |
| recall_at_1 | 0.03499 | 0.00202 |
| recall_at_3 | 0.07216 | 0.00647 |
| recall_at_5 | 0.09391 | 0.01138 |
| recall_at_10 | 0.1267 | 0.01995 |
| recall_at_20 | 0.15834 | 0.03348 |
| recall_at_100 | 0.26555 | 0.09853 |
| recall_at_1000 | 0.57746 | 0.4513 |
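For context on the FDE column above, here is a minimal NumPy sketch of the fixed-dimensional encoding (FDE) construction as described in the MUVERA paper. It assumes the three numbers in `FDE(20,5,16)` mean repetitions, SimHash bits, and projection dimension, which is not confirmed anywhere in this thread. The function name `fde` and its parameter names are hypothetical and are not taken from muvfde, txtai, or PyLate. The sketch also leaves empty document buckets as zeros, whereas the paper describes filling them from the nearest token.

```python
import numpy as np

def fde(tokens, *, is_query, reps=20, k_sim=5, d_proj=16, seed=0):
    """Encode a (n_tokens, dim) matrix into one fixed-dimensional vector.

    Hypothetical helper for illustration; parameter meanings are assumed.
    """
    # Queries and documents must share the same seed so that they use
    # identical hyperplanes and projections.
    rng = np.random.default_rng(seed)
    n_buckets = 2 ** k_sim
    dim = tokens.shape[1]
    bit_weights = 1 << np.arange(k_sim)
    blocks = []
    for _ in range(reps):
        # SimHash: k_sim random hyperplanes split the space into 2**k_sim buckets.
        planes = rng.standard_normal((dim, k_sim))
        # Random +/-1 projection from dim down to d_proj.
        proj = rng.choice([-1.0, 1.0], size=(dim, d_proj)) / np.sqrt(d_proj)
        bits = ((tokens @ planes) > 0).astype(np.int64)
        bucket_ids = bits @ bit_weights
        block = np.zeros((n_buckets, dim))
        for b in range(n_buckets):
            members = tokens[bucket_ids == b]
            if len(members) == 0:
                continue  # simplification: empty buckets stay zero
            # Queries sum the tokens in a bucket, documents average them.
            block[b] = members.sum(axis=0) if is_query else members.mean(axis=0)
        blocks.append((block @ proj).ravel())
    # Output length is reps * 2**k_sim * d_proj.
    return np.concatenate(blocks)

def chamfer(query_tokens, doc_tokens):
    """Late-interaction (MaxSim) score that the FDE dot product is meant to approximate."""
    return (query_tokens @ doc_tokens.T).max(axis=1).sum()
```

Under these assumptions the encoded vector has 20 * 2^5 * 16 = 10240 dimensions. Documents would be ranked by `fde(q, is_query=True) @ fde(d, is_query=False)`, and comparing that ordering against `chamfer(q, d)` on a handful of examples is one way to check whether a given parameter setting preserves the model's ranking.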
Curious whether anyone has tried MUVERA encoding and has insights into parameters that work, or whether there is something fundamentally different about these embeddings compared with the ones generated by models trained with the Stanford code.
Thank you!