compress for type embedding and se_atten #2104

@michaelmacisaac

Bug summary

I recently ran a grid search over my NN parameters. After a longer training run (more training steps), the resulting model fails to compress. I have not had problems with model compression before.

DeePMD-kit Version

v2.1.5

TensorFlow Version

2.9.0

How did you download the software?

conda

Input Files, Running Commands, Error Log, etc.

running command: dp compress -i graph.pb -o graph_compress.pb

Input JSON:

{
  "model": {
    "type_map": [
      "Si",
      "C"
    ],
    "type_embedding": {
      "neuron": [
        20,
        20,
        20
      ],
      "activation_function": "relu",
      "trainable": true,
      "seed": 1
    },
    "descriptor": {
      "type": "se_e2_a",
      "sel": "auto:1.5",
      "rcut": 6.0,
      "rcut_smth": 5.5,
      "neuron": [
        20,
        40,
        80
      ],
      "axis_neuron": 16,
      "activation_function": "relu",
      "trainable": true,
      "seed": 1
    },
    "fitting_net": {
      "type": "ener",
      "neuron": [
        160,
        160,
        160
      ],
      "activation_function": "relu",
      "trainable": true,
      "seed": 1
    }
  },
  "learning_rate": {
    "type": "exp",
    "start_lr": 0.001,
    "stop_lr": 3.51e-08,
    "decay_steps": 500
  },
  "loss": {
    "type": "ener",
    "start_pref_e": 0.001,
    "limit_pref_e": 1,
    "start_pref_f": 10,
    "limit_pref_f": 10
  },
  "training": {
    "training_data": {
      "systems": "/blue/subhash/michaelmacisaac/SiC/SiC_potential/data/tempdata/trainingdata"
    },
    "validation_data": {
      "systems": "/blue/subhash/michaelmacisaac/SiC/SiC_potential/data/tempdata/validationdata"
    },
    "numb_steps": 100000,
    "seed": 1,
    "disp_file": "lcurve.out",
    "disp_freq": 100,
    "save_freq": 1000,
    "save_ckpt": "model_ckpt",
    "disp_training": true,
    "time_training": true,
    "profiling": true,
    "profiling_file": "profile",
    "tensorboard": true,
    "tensorboard_log_dir": "tensorboard_log_dir",
    "tensorboard_freq": 500
  }
}
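
The issue title points at the type embedding, so it can help to confirm whether a given input file declares one before running dp compress. The following is a minimal sketch, not part of DeePMD-kit: the helper name describe_model is hypothetical, it only inspects the JSON, and the embedded example is a trimmed copy of the input above.

```python
import json


def describe_model(config: dict) -> dict:
    """Summarize the model settings relevant to this report (hypothetical helper)."""
    model = config.get("model", {})
    return {
        # Descriptor type as written in the input file, e.g. "se_e2_a"
        "descriptor_type": model.get("descriptor", {}).get("type"),
        # True when the input declares a "type_embedding" section
        "has_type_embedding": "type_embedding" in model,
    }


# Trimmed copy of the input JSON from this report
example = json.loads("""
{
  "model": {
    "type_map": ["Si", "C"],
    "type_embedding": {"neuron": [20, 20, 20]},
    "descriptor": {"type": "se_e2_a"}
  }
}
""")

summary = describe_model(example)
print(summary)
```

To run it against a real file, replace the embedded string with json.load on the opened input file.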

Error Log:

(deepmd) [michaelmacisaac@login1 model_002]$ dp compress -i graph.pb -o graph_compress.pb
WARNING:tensorflow:From /blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS.
2022-11-17 09:03:46.433395: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm/lib64:
2022-11-17 09:03:46.454311: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
DEEPMD INFO

DEEPMD INFO stage 1: compress the model
DEEPMD INFO [DeePMD-kit ASCII-art banner]
DEEPMD INFO Please read and cite:
DEEPMD INFO Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
DEEPMD INFO installed to: /home/conda/feedstock_root/build_artifacts/deepmd-kit_1663923590539/work/_skbuild/linux-x86_64-3.10/cmake-install
DEEPMD INFO source : v2.1.5
DEEPMD INFO source brach: HEAD
DEEPMD INFO source commit: 6e3d4a6
DEEPMD INFO source commit at: 2022-09-23 16:10:28 +0800
DEEPMD INFO build float prec: double
DEEPMD INFO build variant: cuda
DEEPMD INFO build with tf inc: /blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/tensorflow/include;/blue/subhash/michaelmacisaac/SiC/envs/deepmd/include
DEEPMD INFO build with tf lib:
DEEPMD INFO ---Summary of the training---------------------------------------
DEEPMD INFO running on: login1.ufhpc
DEEPMD INFO computing device: cpu:0
DEEPMD INFO CUDA_VISIBLE_DEVICES: unset
DEEPMD INFO Count of visible GPU: 0
DEEPMD INFO num_intra_threads: 0
DEEPMD INFO num_inter_threads: 0
DEEPMD INFO -----------------------------------------------------------------
DEEPMD INFO training without frame parameter
Traceback (most recent call last):
  File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/deepmd/entrypoints/main.py", line 572, in main
    compress(**dict_args)
  File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/deepmd/entrypoints/compress.py", line 119, in compress
    train(
  File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/deepmd/entrypoints/train.py", line 107, in train
    _do_work(jdata, run_opt, is_compress)
  File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/deepmd/entrypoints/train.py", line 163, in _do_work
    model.build(train_data, stop_batch)
  File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/deepmd/train/trainer.py", line 336, in build
    self.descrpt.enable_compression(self.model_param['compress']["min_nbor_dist"], self.model_param['compress']['model_file'], self.model_param['compress']['table_config'][0], self.model_param['compress']['table_config'][1], self.model_param['compress']['table_config'][2], self.model_param['compress']['table_config'][3])
  File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/deepmd/descriptor/se_a.py", line 346, in enable_compression
    self.table = DPTabulate(
  File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/deepmd/utils/tabulate.py", line 118, in __init__
    self.data_type = self._get_data_type()
  File "/blue/subhash/michaelmacisaac/SiC/envs/deepmd/lib/python3.10/site-packages/deepmd/utils/tabulate.py", line 482, in _get_data_type
    for item in self.matrix["layer_" + str(self.layer_size)]:
KeyError: 'layer_0'
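
The last frame of the traceback builds its dictionary key by appending str(self.layer_size) to a layer prefix, and the reported key is 'layer_0'. That suggests layer_size was 0 and that no matching entry existed in self.matrix when the lookup ran. The toy sketch below reproduces only that lookup pattern with plain Python objects. The names matrix, layer_size, and get_last_layer are stand-ins, not DeePMD-kit objects, and the log does not show why no layers were found.

```python
def get_last_layer(matrix: dict, layer_size: int):
    """Look up the entry for the last layer, mirroring the key pattern in the traceback."""
    key = "layer_" + str(layer_size)
    return matrix[key]  # raises KeyError when the key is absent


# Assumed state for illustration: no layers were collected at all
matrix = {}
layer_size = 0

try:
    get_last_layer(matrix, layer_size)
    error_message = None
except KeyError as exc:
    # str() of a KeyError is the repr of the missing key
    error_message = f"KeyError: {exc}"

print(error_message)
```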

Steps to Reproduce

dp compress -i graph.pb -o graph_compress.pb

graph.zip

input.zip

Further Information, Files, and Links

No response
