Comments related to training scripts #4

@MJordahn

Description

Hi @yilunliao!

I have a few more things that I discovered that may be helpful for future users of the repo. I am currently pretraining with direct force predictions on OMat24 and ran into the following errors while getting things up and running:

  1. The training script threw errors for me because the TORCHRUN variable is not set by the default environment setup. I worked around this by adding a TORCHRUN=$(which torchrun) line to the script, but a better solution would be to set the environment variable directly.
  2. I think CONFIG_PATH in the training script points to the wrong config: it currently points to direct training on mptrj, when I believe it should point to direct training on omat24.
  3. When running my_main.py I initially got an AttributeError: 'Namespace' object has no attribute 'distributed' because the distributed flag is never defined. I fixed this by adding the flag in https://github.com/atomicarchitects/equiformer_v3/blob/main/src/fairchem/core/common/flags.py.
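For point 3, the fix can be sketched as follows. This is a minimal, self-contained illustration of registering a missing argparse flag so that args.distributed exists with a sane default; the parser and flag names mirror the error message above, but this is not the exact fairchem flags.py code:

```python
import argparse

# Sketch of the fix from point 3: if the parser never registers
# --distributed, any read of args.distributed raises AttributeError.
# Registering the flag with a default removes the crash.
parser = argparse.ArgumentParser(description="training flags (illustrative)")
parser.add_argument(
    "--distributed",
    action="store_true",   # flag is False unless passed on the command line
    help="Run training with distributed data parallel",
)

# Without the flag on the command line, the attribute now exists:
args = parser.parse_args([])
print(args.distributed)  # False instead of AttributeError

# With the flag passed, it flips to True:
args = parser.parse_args(["--distributed"])
print(args.distributed)  # True
```

Since store_true defaults to False, single-process runs keep working unchanged while torchrun launches can pass --distributed explicitly.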

I hope this is useful. If not, feel free to delete the issue.
