Bug summary
In 3.0.0b3, running multitask training or a finetune task appears to fail with a RuntimeError reporting an inconsistent type map, while the same case runs without error on code installed from the 2024Q1 branch. The original input.json is uploaded to help identify the bug.
Traceback (most recent call last):
  File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/pt/entrypoints/main.py", line 562, in main
    train(FLAGS)
  File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/pt/entrypoints/main.py", line 311, in train
    train_data = get_data(
  File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/utils/data_system.py", line 802, in get_data
    data = DeepmdDataSystem(
  File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/utils/data_system.py", line 184, in __init__
    self.type_map = self._check_type_map_consistency(type_map_list)
  File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/utils/data_system.py", line 616, in _check_type_map_consistency
    raise RuntimeError(f"inconsistent type map: {ret!s} {ii!s}")
RuntimeError: inconsistent type map: ['Ag', 'Cu'] ['Ag', 'Ni']
A possible solution to this issue is discussed in #4031, but that issue does not cover the error raised directly here.
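For readers unfamiliar with this kind of check, the failure in the traceback can be imitated with a small standalone sketch. This is not the DeePMD-kit implementation: the function name `check_type_map_consistency` and the prefix-comparison logic are assumptions inferred only from the error message format shown above, where each type map must agree element by element with the longest one seen so far.

```python
from typing import List, Optional


def check_type_map_consistency(
    type_map_list: List[Optional[List[str]]],
) -> List[str]:
    """Hypothetical check (an assumption, not the library's code).

    Each type map must match, position by position, the longest
    type map seen so far; otherwise a RuntimeError is raised.
    """
    ret: List[str] = []
    for ii in type_map_list:
        if ii is None:
            # Systems without a type map are skipped.
            continue
        # Compare the overlapping positions of the two maps.
        for known, new in zip(ret, ii):
            if known != new:
                raise RuntimeError(f"inconsistent type map: {ret!s} {ii!s}")
        if len(ii) > len(ret):
            ret = ii
    return ret


message = None
try:
    # Two systems whose second element differs, as in the report.
    check_type_map_consistency([["Ag", "Cu"], ["Ag", "Ni"]])
except RuntimeError as err:
    message = str(err)
    print(message)
```

Under this assumed logic, type maps from different branches of a multitask setup (here Ag/Cu and Ag/Ni) fail as soon as they are passed through one shared check, which matches the message in the traceback.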
DeePMD-kit Version
3.0.0b3
Backend and its version
PyTorch v2.0.0.post200, TensorFlow v2.14.0
How did you download the software?
Offline packages
Input Files, Running Commands, Error Log, etc.
input.json
Steps to Reproduce
Run a multitask training using the dataset from Domains_Cluster.
Further Information, Files, and Links
No response