CUDA OOM with large query and document lengths (4096) #183

Hi @NohTow, 

Thanks for this amazing package. I've been trying to evaluate `GTE-ModernColBERT-v1` on FreshStack. 
I used the FastPlaid script, but since the max lengths in FreshStack are quite large (4096 for both queries and documents), I run into an OOM error even on a single H200 GPU with ~140 GB of VRAM.

Since you have more experience with this, what would you suggest? 
Should I reduce the max length to 512, which fits on the H200, or should I run the evaluation on the CPU instead?
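For context on the 512 vs 4096 trade-off, here is a rough back-of-the-envelope sketch. It assumes 128-dimensional token embeddings stored in fp16, every document padded to the max length, and a standard (non-fused) attention implementation that materializes the full score matrix; the corpus size, batch size, and head count are hypothetical values chosen only for illustration, not measurements of `GTE-ModernColBERT-v1` or FreshStack. Under these assumptions, uncompressed index storage grows linearly with length (8x from 512 to 4096), while per-layer attention activations grow quadratically (64x).

```python
GIB = 1024 ** 3


def multivector_index_bytes(num_docs, tokens_per_doc, dim=128, bytes_per_value=2):
    """Upper bound on uncompressed per-token embedding storage for a corpus.

    Assumes (hypothetically) dim=128, fp16 values, and that every document
    is padded or truncated to exactly tokens_per_doc tokens.
    """
    return num_docs * tokens_per_doc * dim * bytes_per_value


def attention_scores_bytes(batch_size, num_heads, seq_len, bytes_per_value=2):
    """Size of one layer's full attention-score tensor (batch x heads x L x L).

    Only relevant when the attention kernel materializes the full matrix;
    memory-efficient fused kernels avoid storing it.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value


if __name__ == "__main__":
    NUM_DOCS = 100_000  # hypothetical corpus size
    BATCH_SIZE = 32     # hypothetical encoding batch size
    NUM_HEADS = 12      # hypothetical number of attention heads

    for length in (512, 4096):
        index_gib = multivector_index_bytes(NUM_DOCS, length) / GIB
        attn_gib = attention_scores_bytes(BATCH_SIZE, NUM_HEADS, length) / GIB
        print(f"L={length}: index ~{index_gib:.1f} GiB, attention/layer ~{attn_gib:.2f} GiB")
    # L=512: index ~12.2 GiB, attention/layer ~0.19 GiB
    # L=4096: index ~97.7 GiB, attention/layer ~12.00 GiB
```

If these assumptions roughly hold, lowering the encoding batch size mainly reduces the quadratic attention term, while the linear index term depends on the corpus size and on whether embeddings are kept on the GPU during indexing.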

The code itself is bug-free; here is the evaluation script: https://github.com/fresh-stack/freshstack/blob/main/examples/evaluation/pylate_retrieval_evaluation.py

Thanks,
Nandan Thakur
