[BUG] can't place models close to memory limits #1709

@leanzero-srl

This is really a question, possibly a bug; I am not sure. When I load something like Step 3.5 Flash, for example, I get the following right from the beginning, before any messages have been sent:

60 GB occupied on the M3 Ultra Studio (96 GB)
64 GB occupied on the M4 Max (128 GB)

In total that is 124 GB. The model I am using is the 4-bit version, which is only 107 GB. I have tried both lower and higher context settings, and even at the start the size is always around 120 GB.

After vibe coding for a while, around 30-40 messages in and at around 90k context, it reaches 150-160 GB, which I find kind of extreme. From what I've messed around with in Inferencer, for example, Step 3.5 never goes above 125 GB at full load at around 200k context.
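
To put rough numbers on this, here is a back-of-the-envelope sketch using only the figures quoted above. The function and constant names are made up for illustration. It assumes decimal GB and that all growth after load is driven by context length, which may not be true.

```python
# Figures reported in this issue, in GB (decimal GB is an assumption).
M3_ULTRA_USED_GB = 60
M4_MAX_USED_GB = 64
MODEL_WEIGHTS_GB = 107
CONTEXT_TOKENS = 90_000
LATE_USAGE_GB = (150, 160)  # reported range after 30-40 messages


def idle_overhead_gb(used_per_node, weights_gb):
    """Memory in use right after load, beyond the model weights themselves."""
    return sum(used_per_node) - weights_gb


def growth_per_token_mb(start_gb, end_gb, tokens):
    """Average memory growth per context token, in MB (hypothetical metric)."""
    return (end_gb - start_gb) * 1000 / tokens


if __name__ == "__main__":
    total_idle = M3_ULTRA_USED_GB + M4_MAX_USED_GB
    overhead = idle_overhead_gb([M3_ULTRA_USED_GB, M4_MAX_USED_GB], MODEL_WEIGHTS_GB)
    print(f"idle total: {total_idle} GB, overhead over weights: {overhead} GB")
    for late in LATE_USAGE_GB:
        rate = growth_per_token_mb(total_idle, late, CONTEXT_TOKENS)
        print(f"{late} GB at {CONTEXT_TOKENS} tokens: about {rate:.2f} MB per token")
```

So the open questions are where the extra memory at idle comes from, and whether the growth per token is expected.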

Any ideas? Is this known?
