Conversation
This reverts commit 6cc6f62.
…to peteish13-augusta
```python
del eval_batches
# Eval compiles a bunch more versions, and the result is terrible. This way we get back to zero.
```
What do you mean, the result is terrible?
so this prompted me to look into this a bit more and I think I've found a better solution: just mark the model input sizes as dynamic. I tested this out in OLMo-core and it appears to work well.
allenai/OLMo-core#105
I think it compiles a bunch of versions for different batch sizes, because that's how we call it during eval, and then they stick around. In all of my early runs I had high tps until the first eval, and then low tps afterwards. This is what fixed it.
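The "get back to zero" part can be illustrated with `torch._dynamo.reset()`, which drops all cached compiled graphs. A hypothetical sketch of the failure mode and the reset, not the trainer's actual code (`backend="eager"` keeps it runnable without a compiler toolchain):

```python
import torch
import torch._dynamo as dynamo

def f(x):
    return x * 2

compiled = torch.compile(f, backend="eager")

# Eval calls the model with several distinct batch sizes; each distinct
# shape can trigger a fresh compiled graph, and those graphs stay cached
# after eval finishes.
for bs in (1, 2, 4, 8):
    compiled(torch.randn(bs, 16))

# Drop every cached graph, so the next call starts from a clean slate.
dynamo.reset()
y = compiled(torch.randn(8, 16))
```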
I tried dynamic and it was bad. I don't remember the way in which it was bad, but it didn't work. That's why I added that version in the first place.
Ok, oh well. I tested with nightly so maybe it's just better now with recent compiler advances.
Applying torch.compile() to one block at a time.
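A sketch of what per-block compilation looks like, assuming a ModuleList of transformer-style blocks (hypothetical `Block` class; `backend="eager"` just keeps the snippet self-contained). Each block gets its own small graph, which keeps compile times and recompilation blast radius down compared to compiling the whole model:

```python
import torch

class Block(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.ff = torch.nn.Linear(16, 16)

    def forward(self, x):
        return torch.relu(self.ff(x))

blocks = torch.nn.ModuleList([Block() for _ in range(4)])

# Compile each block separately instead of wrapping the whole model.
for i, block in enumerate(blocks):
    blocks[i] = torch.compile(block, backend="eager")

x = torch.randn(2, 16)
for block in blocks:
    x = block(x)
```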