This is really a question, possibly a bug, I am not sure. When I load a model such as Step 3.5 Flash, I see the following memory usage right from the start, before any messages have been sent:
60 GB occupied on the M3 Ultra Studio with 96 GB
64 GB occupied on the M4 Max with 128 GB
In total that is 124 GB, but the model I am using is the 4-bit version, which is only 107 GB. I have tried both lower and higher context settings, and even at the start the size is always around 120 GB.
After vibe coding for a while, around 30-40 messages in and at around 90k context, usage reaches 150-160 GB, which I find kind of extreme. From what I've seen messing around in Inferencer, for example, Step 3.5 never goes above 125 GB under full load at around 200k context.
Any idea? Is this known?
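For reference, here is a rough sketch of the arithmetic behind the numbers above. It only uses the figures from this report; the variable names are my own, and it assumes decimal GB and that all growth after load comes from context, which may not be the case.

```python
# Figures as reported in this issue, in GB (decimal units assumed).
node_usage_gb = {
    "M3 Ultra Studio (96 GB)": 60,
    "M4 Max (128 GB)": 64,
}
model_size_gb = 107            # size of the 4-bit model
usage_at_90k_gb = (150, 160)   # observed range at around 90k context
context_tokens = 90_000

# Memory in use across both machines right after loading.
total_at_load_gb = sum(node_usage_gb.values())

# How much more is occupied than the model weights alone.
overhead_at_load_gb = total_at_load_gb - model_size_gb

# Growth since load, attributed entirely to context (an assumption).
growth_gb = tuple(u - total_at_load_gb for u in usage_at_90k_gb)
growth_per_token_mb = tuple(g * 1000 / context_tokens for g in growth_gb)

print(f"total at load: {total_at_load_gb} GB")
print(f"overhead over weights at load: {overhead_at_load_gb} GB")
print(f"growth by ~90k context: {growth_gb[0]}-{growth_gb[1]} GB")
print(
    "implied growth per token: "
    f"{growth_per_token_mb[0]:.2f}-{growth_per_token_mb[1]:.2f} MB"
)
```

So roughly 17 GB is occupied beyond the weights before any message is sent, and usage then grows by another 26-36 GB by around 90k context.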