Bug summary
I recently ran a grid search over my NN parameters. After a longer training run (more training steps), the resulting model will not compress. I have not had problems with model compression before.
DeePMD-kit Version
v2.1.5
TensorFlow Version
2.9.0
How did you download the software?
conda
Input Files, Running Commands, Error Log, etc.
running command: dp compress -i graph.pb -o graph_compress.pb
Input JSON:
{
"model": {
"type_map": [
"Si",
"C"
],
"type_embedding": {
"neuron": [
20,
20,
20
],
"activation_function": "relu",
"trainable": true,
"seed": 1
},
"descriptor": {
"type": "se_e2_a",
"sel": "auto:1.5",
"rcut": 6.0,
"rcut_smth": 5.5,
"neuron": [
20,
40,
80
],
"axis_neuron": 16,
"activation_function": "relu",
"trainable": true,
"seed": 1
},
"fitting_net": {
"type": "ener",
"neuron": [
160,
160,
160
],
"activation_function": "relu",
"trainable": true,
"seed": 1
}
},
"learning_rate": {
"type": "exp",
"start_lr": 0.001,
"stop_lr": 3.51e-08,
"decay_steps": 500
},
"loss": {
"type": "ener",
"start_pref_e": 0.001,
"limit_pref_e": 1,
"start_pref_f": 10,
"limit_pref_f": 10
},
"training": {
"training_data": {
"systems": "/blue/subhash/michaelmacisaac/SiC/SiC_potential/data/tempdata/trainingdata"
},
"validation_data": {
"systems": "/blue/subhash/michaelmacisaac/SiC/SiC_potential/data/tempdata/validationdata"
},
"numb_steps": 100000,
"seed": 1,
"disp_file": "lcurve.out",
"disp_freq": 100,
"save_freq": 1000,
"save_ckpt": "model_ckpt",
"disp_training": true,
"time_training": true,
"profiling": true,
"profiling_file": "profile",
"tensorboard": true,
"tensorboard_log_dir": "tensorboard_log_dir",
"tensorboard_freq": 500
}
}
Error Log:
(deepmd) [michaelmacisaac@login1 model_002]$ dp compress -i graph.pb -o graph_compress.pb
WARNING:tensorflow:From /blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS.
2022-11-17 09:03:46.433395: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm/lib64:
2022-11-17 09:03:46.454311: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
DEEPMD INFO
DEEPMD INFO stage 1: compress the model
DEEPMD INFO [DeePMD-kit ASCII art banner, garbled in extraction]
DEEPMD INFO Please read and cite:
DEEPMD INFO Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
DEEPMD INFO installed to: /home/conda/feedstock_root/build_artifacts/deepmd-kit_1663923590539/work/_skbuild/linux-x86_64-3.10/cmake-install
DEEPMD INFO source : v2.1.5
DEEPMD INFO source brach: HEAD
DEEPMD INFO source commit: 6e3d4a6
DEEPMD INFO source commit at: 2022-09-23 16:10:28 +0800
DEEPMD INFO build float prec: double
DEEPMD INFO build variant: cuda
DEEPMD INFO build with tf inc: /blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/tensorflow/include;/blue/subhash/michaelmacisaac/SiC/envs/deepmd/include
DEEPMD INFO build with tf lib:
DEEPMD INFO ---Summary of the training---------------------------------------
DEEPMD INFO running on: login1.ufhpc
DEEPMD INFO computing device: cpu:0
DEEPMD INFO CUDA_VISIBLE_DEVICES: unset
DEEPMD INFO Count of visible GPU: 0
DEEPMD INFO num_intra_threads: 0
DEEPMD INFO num_inter_threads: 0
DEEPMD INFO -----------------------------------------------------------------
DEEPMD INFO training without frame parameter
Traceback (most recent call last):
File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/bin/dp", line 10, in <module>
sys.exit(main())
File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/deepmd/entrypoints/main.py", line 572, in main
compress(**dict_args)
File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/deepmd/entrypoints/compress.py", line 119, in compress
train(
File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/deepmd/entrypoints/train.py", line 107, in train
_do_work(jdata, run_opt, is_compress)
File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/deepmd/entrypoints/train.py", line 163, in _do_work
model.build(train_data, stop_batch)
File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/deepmd/train/trainer.py", line 336, in build
self.descrpt.enable_compression(self.model_param['compress']["min_nbor_dist"], self.model_param['compress']['model_file'], self.model_param['compress']['table_config'][0], self.model_param['compress']['table_config'][1], self.model_param['compress']['table_config'][2], self.model_param['compress']['table_config'][3])
File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/deepmd/descriptor/se_a.py", line 346, in enable_compression
self.table = DPTabulate(
File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/deepmd/utils/tabulate.py", line 118, in __init__
self.data_type = self._get_data_type()
File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/deepmd/utils/tabulate.py", line 482, in _get_data_type
for item in self.matrix["layer_" + str(self.layer_size)]:
KeyError: 'layer_0'
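For readers of the traceback: the failing lookup builds its key from self.layer_size, so the key 'layer_0' means that value was 0 at that point and that self.matrix had no such entry. The sketch below only reproduces that mechanism in isolation. The empty dictionary and the helper name lookup_layer are assumptions for illustration; they are not taken from DeePMD-kit, and the sketch does not say why the graph ended up in this state.

```python
# Illustration only: values are assumptions that mirror the traceback,
# not data read from graph.pb or code copied from DeePMD-kit.
layer_size = 0   # the key 'layer_0' implies str(layer_size) == "0"
matrix = {}      # assumed: no 'layer_*' entries were collected


def lookup_layer(matrix, layer_size):
    """Hypothetical helper: build the key the same way the traceback line does."""
    key = "layer_" + str(layer_size)
    return key, matrix[key]  # raises KeyError when the key is missing


try:
    lookup_layer(matrix, layer_size)
except KeyError as err:
    # str(err) keeps the quotes around the missing key
    message = f"KeyError: {err}"
    print(message)
```

If this reading is right, the question for the maintainers is why no embedding-layer entries were found for this graph, not the lookup itself.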
Steps to Reproduce
dp compress -i graph.pb -o graph_compress.pb
graph.zip
input.zip
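In case it helps triage, here is a minimal sketch for listing the node names stored in the attached graph.pb, so they can be compared against what the compression step expects. It assumes TensorFlow is importable in the same environment. The helper names and the search pattern "matrix" are hypothetical choices for illustration, not part of the DeePMD-kit API.

```python
import os
import re


def load_node_names(pb_path):
    """Return all node names from a frozen TensorFlow graph file."""
    import tensorflow as tf  # imported lazily; only needed for a real graph

    graph_def = tf.compat.v1.GraphDef()
    with open(pb_path, "rb") as fh:
        graph_def.ParseFromString(fh.read())
    return [node.name for node in graph_def.node]


def select_names(names, pattern):
    """Keep only the names that match the given regular expression."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


if __name__ == "__main__" and os.path.isfile("graph.pb"):
    # "matrix" is an assumed search term; adjust it to the nodes of interest.
    for name in select_names(load_node_names("graph.pb"), r"matrix"):
        print(name)
```

Running it in the model directory prints the matching node names, which can be pasted into the report alongside the error log.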
Further Information, Files, and Links
No response