Hi @NohTow,
Thanks for this amazing package. I've been trying to evaluate GTE-ModernColBERT-v1 on FreshStack.
I used the fastPlaid script, but since my max lengths in FreshStack are quite large (4096 for both queries and documents), I hit an OOM error even on a single H200 GPU with ~140 GB of VRAM.
Since you have more experience with this, what would you recommend?
Should I reduce the max length to 512, which fits on the H200, or should I run the evaluation on CPU instead?
The evaluation script I'm using is here: https://github.com/fresh-stack/freshstack/blob/main/examples/evaluation/pylate_retrieval_evaluation.py
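
For context, here is a rough back-of-the-envelope sketch of how the stored multi-vector embedding size scales between the two max lengths. The 128-dimensional token vectors and fp16 storage are assumptions on my side, not values I checked against the model config, and the helper `embedding_bytes` is purely illustrative. It only counts the output embeddings, not the activations during encoding or any index overhead, so the real peak usage is likely higher.

```python
def embedding_bytes(num_tokens: int, dim: int = 128, bytes_per_value: int = 2) -> int:
    """Bytes needed to store one multi-vector embedding (one vector per token).

    dim=128 and bytes_per_value=2 (fp16) are assumed defaults, not verified
    against the actual model configuration.
    """
    return num_tokens * dim * bytes_per_value


if __name__ == "__main__":
    for max_len in (512, 4096):
        mib = embedding_bytes(max_len) / 2**20
        print(f"max_length={max_len}: {mib:.3f} MiB per fully padded sequence")
```

Under these assumptions, going from 512 to 4096 tokens makes each fully padded embedding 8x larger, which is why I suspect the sequence length is what pushes me over the limit.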
Thanks,
Nandan Thakur