
Feat: add model format for dpa1 #3211

Closed
iProzd wants to merge 7 commits into deepmodeling:devel from iProzd:rf_dpa1

Conversation

@iProzd
Member

@iProzd iProzd commented Feb 1, 2024

This PR adds a model format for the DPA1 model:

  • Add a reformatted torch implementation of the DPA1 model
  • Add a numpy implementation of the DPA1 model without the attention layer
  • Align the torch and numpy implementations

TODO:

  • Add a numpy implementation of the DPA1 model with the attention layer
  • Align the TF and numpy implementations
  • Align the smoothness implementations
  • Make filter_layers._networks in torch accessible from outside
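The backend-alignment items in the checklist above are typically verified by running both implementations on the same input and comparing elementwise. A minimal sketch, with a hypothetical helper name and trivial stand-in "backends" (not the actual deepmd API):

```python
import numpy as np

# Hypothetical sketch of the "align torch and numpy implementations" step:
# evaluate both backends on the same input and compare elementwise.
def assert_backends_aligned(np_forward, pt_forward, x, rtol=1e-10, atol=1e-10):
    out_ref = np.asarray(np_forward(x))   # numpy reference result
    out_new = np.asarray(pt_forward(x))   # torch result, converted to numpy
    np.testing.assert_allclose(out_ref, out_new, rtol=rtol, atol=atol)

# Trivial stand-in "backends" that agree exactly:
x = np.random.default_rng(0).random((4, 3))
assert_backends_aligned(lambda a: a * 2.0, lambda a: a + a, x)
```

The same pattern extends to the TF/numpy and smoothness alignment items, with the tolerances chosen per precision.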

Comment thread deepmd/pt/model/descriptor/dpa1.py Fixed
atype_embd = atype_embd_ext[:, :nloc, :]
# nf x nloc x nnei x tebd_dim
atype_embd_nnei = np.tile(atype_embd[:, :, np.newaxis, :], (1, 1, nnei, 1))
nlist_mask = nlist != -1

Check notice (Code scanning / CodeQL): Unused local variable. Variable nlist_mask is not used.
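The finding above flags a mask that is computed but never consumed; in code like this, a neighbor mask usually zeroes out the padded (-1) entries of the neighbor list. An illustrative sketch of that pattern with toy shapes, not the PR's actual tensors:

```python
import numpy as np

# nlist pads missing neighbors with -1; the mask marks real neighbors.
nlist = np.array([[0, 2, -1],
                  [1, -1, -1]])               # nloc x nnei, -1 = no neighbor
nlist_mask = nlist != -1                       # True where a real neighbor exists
weights = np.ones(nlist.shape)
weights = np.where(nlist_mask, weights, 0.0)   # padded slots contribute zero
```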
Comment thread deepmd/model_format/dpa1.py Fixed
Comment thread source/tests/pt/test_dpa1.py Fixed
Comment thread source/tests/pt/test_dpa1.py Fixed
):
dtype = PRECISION_DICT[prec]
rtol, atol = get_tols(prec)
err_msg = f"idt={idt} prec={prec}"

Check notice (Code scanning / CodeQL): Unused local variable. Variable err_msg is not used.
dd0.se_atten.mean = torch.tensor(davg, dtype=dtype, device=env.DEVICE)
dd0.se_atten.dstd = torch.tensor(dstd, dtype=dtype, device=env.DEVICE)
# dd1 = DescrptDPA1.deserialize(dd0.serialize())
model = torch.jit.script(dd0)

Check notice (Code scanning / CodeQL): Unused local variable. Variable model is not used.
resnet=False,
precision=precision,
)
self.w = self.w.squeeze(0) # keep the weight shape to be [num_in]

Check warning (Code scanning / CodeQL): Overwriting attribute in super-class or sub-class. Assignment overwrites attribute w, which was previously defined in superclass NativeLayer.
)
self.w = self.w.squeeze(0) # keep the weight shape to be [num_in]
if self.uni_init:
self.w = 1.0

Check warning (Code scanning / CodeQL): Overwriting attribute in super-class or sub-class. Assignment overwrites attribute w, which was previously defined in superclass NativeLayer.
self.w = self.w.squeeze(0) # keep the weight shape to be [num_in]
if self.uni_init:
self.w = 1.0
self.b = 0.0

Check warning (Code scanning / CodeQL): Overwriting attribute in super-class or sub-class. Assignment overwrites attribute b, which was previously defined in superclass NativeLayer.
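The three warnings above all flag the same pattern: a subclass rebinding `self.w`/`self.b` to a Python float, which changes the attribute's type from the superclass's ndarray. A generic sketch of the pitfall and a type-stable alternative, with illustrative class names standing in for deepmd's actual classes:

```python
import numpy as np

class NativeLayerLike:
    def __init__(self, num_in):
        self.w = np.zeros((1, num_in))      # superclass declares w as ndarray

class LayerNormLike(NativeLayerLike):
    def __init__(self, num_in, uni_init=True):
        super().__init__(num_in)
        self.w = self.w.squeeze(0)          # keep the weight shape [num_in]
        if uni_init:
            # Fill the array instead of rebinding to a float (w = 1.0),
            # so self.w keeps the type declared by the superclass.
            self.w = np.ones_like(self.w)

layer = LayerNormLike(4)
```

Keeping the attribute an ndarray also avoids surprises downstream when code calls array methods on `w`.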
@codecov

codecov Bot commented Feb 1, 2024

Codecov Report

Attention: 529 lines in your changes are missing coverage. Please review.

Comparison is base (afb440a) 74.39% compared to head (a96cab0) 20.72%.
Report is 2 commits behind head on devel.

| Files | Patch % | Lines |
|---|---|---|
| deepmd/pt/model/descriptor/se_atten.py | 0.00% | 200 Missing ⚠️ |
| deepmd/model_format/dpa1.py | 0.00% | 117 Missing ⚠️ |
| deepmd/model_format/network.py | 0.00% | 109 Missing ⚠️ |
| deepmd/pt/model/network/mlp.py | 0.00% | 64 Missing ⚠️ |
| deepmd/pt/model/descriptor/dpa1.py | 0.00% | 36 Missing ⚠️ |
| deepmd/model_format/__init__.py | 0.00% | 1 Missing ⚠️ |
| deepmd/pt/model/descriptor/se_a.py | 0.00% | 1 Missing ⚠️ |
| deepmd/pt/model/task/ener.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##            devel    #3211       +/-   ##
===========================================
- Coverage   74.39%   20.72%   -53.68%     
===========================================
  Files         345      346        +1     
  Lines       31981    32509      +528     
  Branches     1592     1594        +2     
===========================================
- Hits        23791     6736    -17055     
- Misses       7265    25075    +17810     
+ Partials      925      698      -227     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

embeddings = data.pop("embeddings")
type_embedding = data.pop("type_embedding")
attention_layers = data.pop("attention_layers")
env_mat = data.pop("env_mat")

Check notice (Code scanning / CodeQL): Unused local variable. Variable env_mat is not used.
Collaborator

@wanghan-iapcm wanghan-iapcm left a comment


The serialization and deserialization of model_format/dpa1 should be tested.
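A round-trip test of the kind requested usually asserts that the re-instantiated object reproduces the original output. A hedged sketch using a toy stand-in class, not the real model_format/dpa1 descriptor API:

```python
import numpy as np

# Toy stand-in for a serializable descriptor: serialize() emits a plain
# dict, deserialize() rebuilds the object, and the round trip must give
# identical results.
class ToyDescriptor:
    def __init__(self, scale):
        self.scale = np.asarray(scale, dtype=np.float64)

    def serialize(self):
        return {"scale": self.scale.tolist()}

    @classmethod
    def deserialize(cls, data):
        return cls(data["scale"])

    def call(self, x):
        return x * self.scale

d0 = ToyDescriptor([1.5, 2.0])
d1 = ToyDescriptor.deserialize(d0.serialize())
x = np.array([[2.0, 3.0]])
np.testing.assert_allclose(d0.call(x), d1.call(x))
```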

variables = data.pop("@variables")
embeddings = data.pop("embeddings")
type_embedding = data.pop("type_embedding")
attention_layers = data.pop("attention_layers", None)

Check notice (Code scanning / CodeQL): Unused local variable. Variable attention_layers is not used.
Member


Why is it popped but not used?

dd0_state_dict = dd0.se_atten.state_dict()
dd4_state_dict = dd4.se_atten.state_dict()

dd0_state_dict_attn = dd0.se_atten.dpa1_attention.state_dict()

Check notice (Code scanning / CodeQL): Unused local variable. Variable dd0_state_dict_attn is not used.
dd4_state_dict = dd4.se_atten.state_dict()

dd0_state_dict_attn = dd0.se_atten.dpa1_attention.state_dict()
dd4_state_dict_attn = dd4.se_atten.dpa1_attention.state_dict()

Check notice (Code scanning / CodeQL): Unused local variable. Variable dd4_state_dict_attn is not used.
data = copy.deepcopy(data)
variables = data.pop("@variables")
embeddings = data.pop("embeddings")
type_embedding = data.pop("type_embedding")

Check failure (Code scanning / CodeQL): Modification of parameter with default. This expression mutates a default value.
variables = data.pop("@variables")
embeddings = data.pop("embeddings")
type_embedding = data.pop("type_embedding")
attention_layers = data.pop("attention_layers", None)

Check failure (Code scanning / CodeQL): Modification of parameter with default. This expression mutates a default value.
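The two failures above flag `data.pop(...)` mutating a dict that may be a shared default argument: the mutation survives across calls. The fix, also visible in the excerpt above, is to deep-copy the parameter before popping. A minimal illustration of the pitfall and the fix (function names are illustrative):

```python
import copy

SHARED_DEFAULT = {"key": 1}

def deserialize_buggy(data=SHARED_DEFAULT):
    return data.pop("key", None)          # mutates the shared default dict

def deserialize_fixed(data=SHARED_DEFAULT):
    data = copy.deepcopy(data)            # pop on a private copy instead
    return data.pop("key", None)

assert deserialize_fixed() == 1
assert deserialize_fixed() == 1           # default untouched, works repeatedly
assert deserialize_buggy() == 1
assert deserialize_buggy() is None        # key was popped on the first call
```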
@njzjz njzjz added the Test CUDA Trigger test CUDA workflow label Feb 2, 2024
@github-actions github-actions Bot removed the Test CUDA Trigger test CUDA workflow label Feb 2, 2024
Then the scaled dot-product attention method is adopted:

.. math::
A(\mathcal{Q}^{i,l}, \mathcal{K}^{i,l}, \mathcal{V}^{i,l}, \mathcal{R}^{i,l})=\varphi\left(\mathcal{Q}^{i,l}, \mathcal{K}^{i,l},\mathcal{R}^{i,l}\right)\mathcal{V}^{i,l},
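The formula above is a gated scaled dot-product attention. A plain numpy sketch of the standard ungated version, taking φ as softmax(QKᵀ/√d); DPA-1's additional gating by the R term is omitted here for brevity:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # q, k, v: [nnei, d]; the attention map is [nnei, nnei].
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d))
    return weights @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((5, 8))
out = scaled_dot_product_attention(q, q, q)   # self-attention over neighbors
```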
Member


variables = data.pop("@variables")
embeddings = data.pop("embeddings")
type_embedding = data.pop("type_embedding")
attention_layers = data.pop("attention_layers", None)
Member


Why is it popped but not used?

Comment on lines +330 to +331
w : np.ndarray, optional
The embedding weights of the layer.
Member


This does not match the actual parameters.

Comment on lines +444 to +447
w : np.ndarray, optional
The learnable weights of the normalization scale in the layer.
b : np.ndarray, optional
The learnable biases of the normalization shift in the layer.
Member


This does not match the actual parameters.

@njzjz njzjz linked an issue Mar 19, 2024 that may be closed by this pull request
@iProzd
Member Author

iProzd commented Apr 21, 2024

This PR is merged into #3696

@iProzd iProzd closed this Apr 21, 2024
@iProzd iProzd deleted the rf_dpa1 branch April 24, 2024 09:12


Development

Successfully merging this pull request may close these issues.

[Feature Request] pt: refactor DPA-1 in the PyTorch backend

4 participants