About the shape of inputs and the size of EarthSpecificBias table #69

@JNR000

Description

The examples and models given in the repo show that the inputs are of shape (5, 13, 721, 1440) and (4, 721, 1440), with the batch size and static masks omitted for clarity. However, the paper on arXiv and the pseudocode both describe inputs shaped (5, 13, 1440, 721) and (4, 1440, 721). I think (5, 13, 721, 1440) and (4, 721, 1440) are the correct shapes.
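For concreteness, here is a minimal sketch of the two input tensors with the shapes the repo uses (batch dimension omitted, as in the examples; the variable counts and the 0.25-degree ERA5 grid of 721 latitudes x 1440 longitudes are taken from the paper):

```python
import numpy as np

# Upper-air input: 5 variables x 13 pressure levels x 721 lat x 1440 lon
input_upper = np.zeros((5, 13, 721, 1440), dtype=np.float32)

# Surface input: 4 variables x 721 lat x 1440 lon
input_surface = np.zeros((4, 721, 1440), dtype=np.float32)

# The trailing two axes are (latitude, longitude), i.e. (721, 1440),
# which is the ordering the repo's examples use.
print(input_upper.shape, input_surface.shape)
```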

I rewrote the pseudocode and tried to train the model. The total number of parameters in my model is much smaller than the total number of parameters in your pangu_weather_24.onnx file.

I printed the parameter shapes of both models to identify the differences. Your pangu_weather_24 model contains large layers of shape (1, 124, 6, 144, 144), which I believe are the EarthSpecificBias tables. Here 124 is the number of distinct window types (type_of_windows), 6 is the number of heads, and 144 = 2*6*12 is the sequence length of a window. These EarthSpecificBias layers hold far more parameters than my model, and the main difference is the size of the EarthSpecificBias table. I wrote the EarthSpecificBias table according to the definition in your paper, giving it shape (3312, 124, 6) instead of (144*144, 124, 6). There are only 3312*124*6 distinct earth-specific biases, where 3312 = (2*window_size[2]-1) * window_size[1] * window_size[1] * window_size[0] * window_size[0], 124 is the number of distinct windows, and 6 is the number of heads. The position_index computed in the pseudocode likewise yields a tensor of length 144*144 containing only 3312 unique values. Are most of the parameters in your EarthSpecificBias table redundant?
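To illustrate the counting argument, here is a minimal NumPy sketch (not the repo's actual code; the window size (2, 6, 12) along (pressure level, latitude, longitude) is taken from the paper). The bias depends on the absolute positions along pressure level and latitude but only on the relative offset along longitude, so a pair of tokens maps to (pl_i, pl_j, lat_i, lat_j, lon_i - lon_j), and the 144*144 index matrix contains only 2^2 * 6^2 * 23 = 3312 distinct values:

```python
import numpy as np

# Window size (pressure levels, latitude, longitude), as in the paper
Wpl, Wlat, Wlon = 2, 6, 12
seq_len = Wpl * Wlat * Wlon  # 144 tokens per window

# Coordinates of every token inside one window, shape (3, 144)
coords = np.stack(
    np.meshgrid(np.arange(Wpl), np.arange(Wlat), np.arange(Wlon), indexing="ij"),
    axis=0,
).reshape(3, -1)

# Absolute positions along pressure level and latitude for each token pair
pl_i, pl_j = coords[0][:, None], coords[0][None, :]
lat_i, lat_j = coords[1][:, None], coords[1][None, :]
# Relative offset along longitude, shifted into [0, 2*Wlon - 2]
lon_rel = coords[2][:, None] - coords[2][None, :] + (Wlon - 1)

# Mixed-radix encoding of (pl_i, pl_j, lat_i, lat_j, lon_rel) into one index
position_index = ((pl_i * Wpl + pl_j) * Wlat + lat_i) * Wlat + lat_j
position_index = position_index * (2 * Wlon - 1) + lon_rel  # shape (144, 144)

# Table needs only (2*Wlon - 1) * Wlat^2 * Wpl^2 = 23 * 36 * 4 = 3312 entries
table_size = Wpl**2 * Wlat**2 * (2 * Wlon - 1)
print(position_index.shape, len(np.unique(position_index)), table_size)
# → (144, 144) 3312 3312
```

Under these assumptions, a (3312, 124, 6) table gathered through position_index reproduces the same (144, 144) bias matrix per window and head, which is why the (144*144, 124, 6) parameterization looks redundant.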

Any help would be greatly appreciated!
