[Feature Request] Framework-independent DP model format #2982

@njzjz

Summary

Implement a framework-independent DP model format.

Detailed Description

Background

Currently, the DP model file depends on the deep learning framework. The TensorFlow model is stored in ProtoBuf format (.pb), while the PyTorch model, which is still under development, is stored in .pt format. The two formats are hard to convert into each other. The ONNX package aims to do the conversion at the OP level, but it is limited: many TensorFlow and PyTorch OPs are unsupported, and DP models may use customized OPs.

DeePMD-kit needs a framework-independent DP model format in order to support multiple backends, as described below. Different frameworks are expected to behave similarly for the same model data.

Data structure

  1. The model data is based on the current input parameters, so that the parameters are aligned across all frameworks. Parameters that a framework has not implemented should be aligned as well; the framework raises a NotImplementedError at runtime when it encounters one.

  2. Add a @variables key to each layer's dictionary, with a type of dict[str, np.ndarray], to store the network parameters, i.e. what the current init_frz_model needs to restore (which currently ensures complete restoration). "@variables" contains the special character @; it should be treated as a reserved name and not used for anything else in the future. The keys of @variables should be aligned across all frameworks. Type embedding should be written explicitly and not hidden.

{
    "argument1": ...,
    "@variables": { 
        "variable1": ..., 
    }
}
  3. Add the following meta-information at the top level: (1) Software, version, and module used to generate the model file. (2) Generation time. (3) A unified model definition version for all frameworks.
{
    "model": ...,
    "software": ...,
    "software_version": ...,
    "time": ...,
    "model_version": ...,
}
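
As an illustration of the data structure above, here is a minimal Python sketch. The function names, the "activation" argument, the variable names "w" and "b", and all metadata values are hypothetical placeholders chosen for this example; only the "@variables" key and the five top-level keys come from the proposal.

```python
from datetime import datetime, timezone

import numpy as np


def make_layer_data(activation: str, w: np.ndarray, b: np.ndarray) -> dict:
    """Return one layer's dictionary: input arguments plus "@variables"."""
    return {
        "activation": activation,  # hypothetical input argument
        "@variables": {"w": w, "b": b},  # hypothetical variable names
    }


def make_model_file_data(model: dict) -> dict:
    """Wrap the model data with the top-level meta-information."""
    return {
        "model": model,
        "software": "deepmd-kit",  # placeholder value
        "software_version": "0.0.0",  # placeholder value
        "time": datetime.now(timezone.utc).isoformat(),
        "model_version": "1.0",  # placeholder value
    }


def check_supported(layer: dict, supported_activations: set) -> None:
    """Reject a parameter value that is aligned in the data but not implemented."""
    activation = layer["activation"]
    if activation not in supported_activations:
        raise NotImplementedError(f"activation {activation!r} is not implemented")
```

A backend that only implements some parameter values would still accept the same dictionary layout and fail in `check_supported` (or an equivalent check) at runtime, rather than defining its own layout.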

Data storage

An HDF5 file is used to store the data. h5py is a dependency of TensorFlow, PyTorch, and the existing DeePMD-kit, so this doesn't bring extra dependencies.

  1. Each variable is stored in the HDF5 file under a unique path. The json path is reserved and must not be used for variables.
  2. The JSON data is stored at the json path; there, the type of @variables is dict[str, str]. Each value in the @variables dict is the path to the corresponding variable, which may differ between platforms.
  3. Convert dict[str, np.ndarray] to dict[str, str] when saving the model and convert it back when restoring it.
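
The three storage steps above could be sketched with h5py roughly as follows. The function names `save_model` and `load_model` and the `variables/NNNNNN` path scheme are assumptions made for this example; the proposal only requires that each path be unique and that the json path be reserved.

```python
import json

import h5py
import numpy as np


def save_model(filename: str, data: dict) -> None:
    """Store arrays as HDF5 datasets and the remaining data as JSON at "json"."""
    with h5py.File(filename, "w") as f:
        count = 0

        def to_paths(node):
            # Walk the nested data; swap arrays under "@variables" for paths.
            nonlocal count
            if isinstance(node, dict):
                out = {}
                for key, value in node.items():
                    if key == "@variables":
                        refs = {}
                        for name, array in value.items():
                            path = f"variables/{count:06d}"  # hypothetical scheme
                            count += 1
                            f.create_dataset(path, data=np.asarray(array))
                            refs[name] = path
                        out[key] = refs
                    else:
                        out[key] = to_paths(value)
                return out
            if isinstance(node, list):
                return [to_paths(item) for item in node]
            return node

        f.create_dataset("json", data=json.dumps(to_paths(data)))


def load_model(filename: str) -> dict:
    """Read the JSON at "json" and turn variable paths back into arrays."""
    with h5py.File(filename, "r") as f:
        raw = f["json"][()]
        if isinstance(raw, bytes):  # h5py 3.x returns bytes for string data
            raw = raw.decode("utf-8")

        def to_arrays(node):
            if isinstance(node, dict):
                out = {}
                for key, value in node.items():
                    if key == "@variables":
                        out[key] = {name: f[path][()] for name, path in value.items()}
                    else:
                        out[key] = to_arrays(value)
                return out
            if isinstance(node, list):
                return [to_arrays(item) for item in node]
            return node

        return to_arrays(json.loads(raw))
```

Because the stored JSON only holds paths, a backend is free to pick a different path scheme as long as the keys of @variables stay the same.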

Binding with class

Add deserialize (a classmethod) and serialize to each class. The parent class should call the method of subclass. The implementation should follow dpdispatcher:

https://github.com/deepmodeling/dpdispatcher/blob/065731a60be3b58979b54f1d33562ef189800158/dpdispatcher/submission.py#L97-L166

The deserialize (classmethod) and serialize of the top class can be called by external modules.
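
A minimal sketch of this binding is shown below. The `Layer` and `Network` classes and their attributes are hypothetical. The sketch also reads "parent" as the containing object, which delegates to the objects it holds (as the linked dpdispatcher code does for its nested objects); that reading is an assumption.

```python
import numpy as np


class Layer:
    """Hypothetical leaf class holding one weight matrix and one bias."""

    def __init__(self, activation: str, w: np.ndarray, b: np.ndarray):
        self.activation = activation
        self.w = w
        self.b = b

    def serialize(self) -> dict:
        """Return the input arguments plus the variables to restore."""
        return {
            "activation": self.activation,
            "@variables": {"w": self.w, "b": self.b},
        }

    @classmethod
    def deserialize(cls, data: dict) -> "Layer":
        """Rebuild a Layer from the dictionary produced by serialize."""
        variables = data["@variables"]
        return cls(data["activation"], variables["w"], variables["b"])


class Network:
    """Hypothetical top-level class that owns a list of layers."""

    def __init__(self, layers):
        self.layers = list(layers)

    def serialize(self) -> dict:
        # The containing object delegates to each object it holds.
        return {"layers": [layer.serialize() for layer in self.layers]}

    @classmethod
    def deserialize(cls, data: dict) -> "Network":
        return cls(Layer.deserialize(item) for item in data["layers"])
```

An external module would then only need `Network.deserialize(data)` and `network.serialize()`, without knowing which classes sit below the top level.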

Progress

Further Information, Files, and Links

No response
