Hi @yilunliao!
I have a few more things I discovered that may be helpful for future users of the repo. I am currently pretraining with direct force predictions on OMat24 and ran into the following errors while getting things up and running:
- The training script threw errors for me because the `TORCHRUN` variable is not set by default in the environment setup. I fixed this by adding a `TORCHRUN=$(which torchrun)` line to the script, but a better solution would be to set it as an environment variable directly.
- I think `CONFIG_PATH` in the training script points to the wrong config. It currently points to direct training on `mptrj`, when I think it should point to direct training on `omat24`.
- When running `my_main.py`, I originally got an `AttributeError: 'Namespace' object has no attribute 'distributed'` because the `distributed` flag was not defined. I fixed this by adding the `distributed` flag in https://github.com/atomicarchitects/equiformer_v3/blob/main/src/fairchem/core/common/flags.py.
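
For the last point, here is a minimal sketch of the kind of change I mean. It assumes the flags are defined with `argparse` (the error mentions `Namespace`). `build_parser` is a hypothetical stand-in, not the actual contents of `flags.py`, and the `store_true` action and help text are my guesses.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the parser defined in flags.py."""
    parser = argparse.ArgumentParser(description="training flags (sketch)")
    # Assumed form of the missing flag: a boolean switch that defaults to False.
    # Without this call, reading args.distributed raises
    # AttributeError: 'Namespace' object has no attribute 'distributed'.
    parser.add_argument(
        "--distributed",
        action="store_true",
        help="Run with distributed (multi-process) training",
    )
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch does not depend on sys.argv.
    args = build_parser().parse_args(["--distributed"])
    print(args.distributed)
```

With the flag defined, the attribute always exists on the parsed namespace, whether or not `--distributed` is passed on the command line.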
I hope this is useful. If not, feel free to delete the issue.