Initial loss increased from 10 (v0.3.0) to 60 (v0.4.0)! #678

@Xuekai-Zhu

🐛 Describe the bug

There is a significant discrepancy in the initial loss between olmo v0.3.0 and v0.4.0, both with and without loading the step-738020 checkpoint. This suggests a potential issue with model initialization or checkpoint handling in v0.4.0. I believe the following results are reproducible; this bug has cost me a week.

Task:

  • Training from scratch / fine-tuning on BIoMed

Results

  • olmo v0.4.0 : w/ step-738020 ckpt -- initial loss is 71

  • olmo v0.4.0 : w/o step-738020 ckpt -- initial loss is 32

  • olmo v0.3.0 : w/ step-738020 ckpt -- initial loss is 2

  • olmo v0.3.0 : w/o step-738020 ckpt -- initial loss is 11
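
As a rough sanity check on these numbers, the sketch below compares each reported initial loss with the cross-entropy of a uniform prediction, ln(V), which is what a freshly initialized language model typically starts near. The vocabulary size of 50,280 is an assumption made for illustration (substitute the size of the tokenizer actually used), and the helper names `uniform_baseline_loss` and `flag_suspicious` are hypothetical, not part of olmo.

```python
import math

# Assumption: vocabulary of roughly 50k tokens; replace with your tokenizer's size.
VOCAB_SIZE = 50_280

# Initial losses as reported in this issue.
REPORTED = {
    ("v0.4.0", "w/ step-738020 ckpt"): 71.0,
    ("v0.4.0", "w/o step-738020 ckpt"): 32.0,
    ("v0.3.0", "w/ step-738020 ckpt"): 2.0,
    ("v0.3.0", "w/o step-738020 ckpt"): 11.0,
}


def uniform_baseline_loss(vocab_size: int) -> float:
    """Cross-entropy (in nats) of predicting every token with equal probability."""
    return math.log(vocab_size)


def flag_suspicious(reported: dict, vocab_size: int, tolerance: float = 1.0) -> dict:
    """Return the runs whose initial loss exceeds the uniform baseline by more than `tolerance`."""
    baseline = uniform_baseline_loss(vocab_size)
    return {run: loss for run, loss in reported.items() if loss > baseline + tolerance}


if __name__ == "__main__":
    print(f"uniform baseline: {uniform_baseline_loss(VOCAB_SIZE):.2f}")
    for (version, setup), loss in flag_suspicious(REPORTED, VOCAB_SIZE).items():
        print(f"olmo {version} {setup}: initial loss {loss} is far above the baseline")
```

Under this assumption, the v0.3.0 from-scratch loss of 11 sits close to the uniform baseline, while both v0.4.0 runs are well above it. A loss that far above ln(V) at step 0 means the model is assigning very low probability to the correct tokens, which would be consistent with a problem in initialization or checkpoint loading, though this check alone does not identify the cause.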

[Image: W&B chart, 2024_7_29 22_12_01]

Versions

Built from source

  • olmo v0.4.0
  • olmo v0.3.0


Labels: type/bug
