Bug summary
When training an NvNMD QNN model (-s s2) with DeePMD-kit 2.2.11 in single precision (export DP_INTERFACE_PREC=low), the log shows that the data type of g_t is float64, while every other logged tensor is float32.
DEEPMD DEBUG #u: Tensor("u/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #rji: Tensor("rji/EnsureShape:0", shape=(?, 3), dtype=float32)
DEEPMD DEBUG #s_s: Tensor("s_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #h_s: Tensor("h_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #s: Tensor("s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #h: Tensor("h/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #Rxyz: Tensor("Rxyz/FltNvnmd:0", dtype=float32)
DEEPMD INFO use the compressible model with stripped type embedding
DEEPMD DEBUG #g_s: Tensor("filter_type_all/g_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #g_t: Tensor("filter_type_all/g_t/FltNvnmd:0", dtype=float64)
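It seems that g_t is never converted to the requested precision, so the later MulFltNvnmd op receives a float32 input and a float64 input and fails (see the traceback below).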
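To spot this kind of mismatch quickly in a long train.log, the DEBUG lines can be scanned for tensors whose dtype differs from the expected one. The following is only a minimal sketch in plain Python: the helper name find_dtype_mismatches and the regular expression are my own assumptions based on the log format shown above, not part of DeePMD-kit.

```python
import re

# Assumed line format, taken from the log excerpt in this report, e.g.:
# DEEPMD DEBUG #g_t: Tensor("filter_type_all/g_t/FltNvnmd:0", dtype=float64)
DEBUG_TENSOR = re.compile(r'DEEPMD DEBUG #(\w+): Tensor\("([^"]+)".*dtype=(\w+)\)')


def find_dtype_mismatches(log_text, expected="float32"):
    """Return (label, tensor_name, dtype) for each logged tensor whose dtype is not `expected`.

    Hypothetical helper for illustration; repeated log entries are reported once.
    """
    mismatches = []
    seen = set()
    for line in log_text.splitlines():
        match = DEBUG_TENSOR.search(line)
        if match is None:
            continue  # not a tensor debug line
        label, name, dtype = match.groups()
        if dtype != expected and (label, name) not in seen:
            seen.add((label, name))
            mismatches.append((label, name, dtype))
    return mismatches


sample = '''DEEPMD DEBUG #rji: Tensor("rji/EnsureShape:0", shape=(?, 3), dtype=float32)
DEEPMD DEBUG #g_s: Tensor("filter_type_all/g_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #g_t: Tensor("filter_type_all/g_t/FltNvnmd:0", dtype=float64)'''

print(find_dtype_mismatches(sample))
```

On the three sample lines only g_t is reported, which matches what the log in this report shows.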
DeePMD-kit Version
v2.2.11
Backend and its version
TensorFlow v2.14.0
How did you download the software?
docker
Input Files, Running Commands, Error Log, etc.
DEEPMD INFO training without frame parameter
DEEPMD INFO data stating... (this step may take long time)
2024-07-10 02:45:59.699397: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
DEEPMD INFO built lr
DEEPMD INFO the range of s is [-0.0, 6.388733386993408]
DEEPMD DEBUG #u: Tensor("u/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #rji: Tensor("rji/EnsureShape:0", shape=(?, 3), dtype=float32)
DEEPMD DEBUG #s_s: Tensor("s_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #h_s: Tensor("h_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #s: Tensor("s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #h: Tensor("h/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #Rxyz: Tensor("Rxyz/FltNvnmd:0", dtype=float32)
DEEPMD INFO use the compressible model with stripped type embedding
DEEPMD DEBUG #g_s: Tensor("filter_type_all/g_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #g_t: Tensor("filter_type_all/g_t/FltNvnmd:0", dtype=float64)
Traceback (most recent call last):
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/op_def_library.py", line 551, in _ExtractInputsAndAttrs
values = ops.convert_to_tensor(
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/profiler/trace.py", line 183, in wrapped
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/ops.py", line 698, in convert_to_tensor
return tensor_conversion_registry.convert(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/tensor_conversion_registry.py", line 209, in convert
return overload(dtype, name) # pylint: disable=not-callable
^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/tensor.py", line 762, in tf_tensor
raise ValueError(
ValueError: w: Tensor conversion requested dtype float32 for Tensor with dtype float64: <tf.Tensor 'filter_type_all/g_t/EnsureShape:0' shape=(?, 32) dtype=float64>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/deepmd-kit/bin/dp", line 10, in <module>
sys.exit(main())
^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd_utils/main.py", line 657, in main
deepmd_main(args)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/entrypoints/main.py", line 92, in main
train_nvnmd(**dict_args)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/nvnmd/entrypoints/train.py", line 187, in train_nvnmd
train(**jdata)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/entrypoints/train.py", line 168, in train
_do_work(jdata, run_opt, is_compress)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/entrypoints/train.py", line 280, in _do_work
model.build(train_data, stop_batch, origin_type_map=origin_type_map)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/train/trainer.py", line 308, in build
self._build_network(data, suffix)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/train/trainer.py", line 385, in _build_network
self.model_pred = self.model.build(
^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/model/ener.py", line 222, in build
dout = self.build_descrpt(
^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/model/model.py", line 290, in build_descrpt
dout = self.descrpt.build(
^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/descriptor/se_atten.py", line 626, in build
self.dout, self.qmat = self._pass_filter(
^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/descriptor/se_atten.py", line 685, in _pass_filter
layer, qmat = self._filter(
^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/common.py", line 258, in wrapper
returned_tensor = func(
^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/descriptor/se_atten.py", line 1269, in _filter
xyz_scatter_1 = self._filter_lower(
^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/descriptor/se_atten.py", line 1104, in _filter_lower
return filter_lower_R42GR(
^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/nvnmd/descriptor/se_atten.py", line 217, in filter_lower_R42GR
G = op_module.mul_flt_nvnmd(G, two_embd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 2276, in mul_flt_nvnmd
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/op_def_library.py", line 778, in _apply_op_helper
_ExtractInputsAndAttrs(op_type_name, op_def, allowed_list_attr_map,
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/op_def_library.py", line 589, in _ExtractInputsAndAttrs
raise TypeError(
TypeError: Input 'w' of 'MulFltNvnmd' Op has type float64 that does not match type float32 of argument 'x'.
Steps to Reproduce
export DP_INTERFACE_PREC=low
export OMP_NUM_THREADS=8
dp train-nvnmd cnn.json --skip-neighbor-stat -s s1 >> train.log 2>&1
dp train-nvnmd qnn.json --skip-neighbor-stat -s s2 >> train.log 2>&1
Further Information, Files, and Links
No response