CUDA out-of-memory (OOM) error with query and document lengths of 4096 #183

@thakur-nandan

Hi @NohTow,

Thanks for this amazing package. I've been trying to evaluate GTE-ModernColBERT-v1 on FreshStack.
I used the fastPlaid script, but since my max lengths in FreshStack are quite large (4096 tokens for both queries and documents), I run into an OOM error even on a single H200 GPU with ~140 GB of VRAM.
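One plausible reason a 4096-token max length can exhaust even ~140 GB is that self-attention memory grows quadratically with sequence length. The following is a back-of-envelope estimate, not a measurement of PyLate or fastPlaid. It assumes 12 attention heads, fp16 scores, and "eager" attention that materializes the full score matrix. Memory-efficient kernels such as FlashAttention do not store this matrix. `attention_score_bytes` is a hypothetical helper.

```python
# Rough per-layer size of attention score matrices when they are materialized
# in full (eager attention). Assumptions, not taken from the issue:
# 12 attention heads and 2-byte (fp16) scores.

GiB = 2**30


def attention_score_bytes(batch_size: int, seq_len: int,
                          num_heads: int = 12, bytes_per_value: int = 2) -> int:
    """Bytes for a batch x heads x seq_len x seq_len score tensor (one layer)."""
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value


for batch_size in (32, 4):
    for seq_len in (512, 4096):
        gib = attention_score_bytes(batch_size, seq_len) / GiB
        print(f"batch={batch_size:>2} seq_len={seq_len}: {gib:.2f} GiB per layer")
```

Under these assumptions, going from 512 to 4096 tokens multiplies this term by 64, while shrinking the batch size reduces it only linearly. A smaller encoding batch size may therefore be worth trying before truncating.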

Since you have more experience with this, what would you suggest?
Should I reduce the max length to 512 so that evaluation fits on the H200, or should I run the evaluation on the CPU instead?
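Truncating to 512 tokens also shrinks the stored multi-vector embeddings, but only linearly (8x rather than 64x), because a ColBERT-style model keeps one vector per token. The following minimal sketch estimates raw, uncompressed storage. It rests on several assumptions: 128-dimensional fp16 token embeddings, every document filling `max_length`, and a hypothetical corpus of 100,000 documents. PLAID-style indexes typically compress these vectors, so an actual index should be smaller. `corpus_embedding_bytes` is a hypothetical helper.

```python
# Upper bound on raw multi-vector storage for a corpus.
# Assumptions (not from the issue): 128-dim token embeddings, 2-byte (fp16)
# values, and every document producing exactly max_length token vectors.

GiB = 2**30


def corpus_embedding_bytes(num_docs: int, max_length: int,
                           dim: int = 128, bytes_per_value: int = 2) -> int:
    """One dim-sized vector per token, for every document."""
    return num_docs * max_length * dim * bytes_per_value


num_docs = 100_000  # hypothetical corpus size
for length in (512, 4096):
    size = corpus_embedding_bytes(num_docs, length)
    print(f"max_length={length}: {size / GiB:.1f} GiB")
# max_length=512: 12.2 GiB
# max_length=4096: 97.7 GiB
```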

The evaluation script I'm running is here: https://github.com/fresh-stack/freshstack/blob/main/examples/evaluation/pylate_retrieval_evaluation.py

Thanks,
Nandan Thakur

Labels: bug (Something isn't working)