Expose dispatch layer so that users can add their own local binary format #177

@edhartnett

Introduction

In order to enable the HYCOM model to be updated to the netCDF API, and to provide a general capability to users, I propose to expose the netCDF-C dispatch layer so that users can add their own local format. This feature, already available in the netCDF-C library internals, is not currently exposed to users.

Background

HYCOM Model and Data Format

The HYCOM modeling team would like to upgrade their model to netCDF, in part to take advantage of the ParallelIO library, which provides a standard way to ensure good performance in the I/O layer, particularly for asynchronous I/O in which a subset of processors uses parallel I/O to write to disk.

Many legacy tools have been written based on the existing HYCOM native binary format ("AB Format"). Converting the model to netCDF would break all existing tools, wasting a significant investment of time and effort, and would disrupt work by requiring all necessary tools to be converted at the same time.

Providing an Upgrade Path for HYCOM

With the netCDF-C dispatch layer exposed, an external plug-in can be provided for the netCDF-C library. Once this plug-in is in place, HYCOM programmers can convert model code to the netCDF API, while still reading and writing their local binary format.

All tools based on the netCDF-C library (including tools in Fortran 77, Fortran 90, C++, Python, Perl, Mathematica, etc.) will transparently read the AB Format. These tools will also be able to write the AB Format with the netCDF API, allowing them to be converted one at a time.

This will give HYCOM modelers an upgrade path for their large existing code base, which works with the local binary format. One by one, as time permits, these tools can also be upgraded to the netCDF API.

Eventually, when all tools are converted, the arguments of the nc_create commands in the HYCOM model can be changed to use one of the supported netCDF formats. All tools will transparently read netCDF files, and files in their local format. Historical data (in AB Format) will still be transparently available to netCDF programs. When a file is opened, the netCDF-C library will recognize it as an AB file, and use the plug-in code to read the file. User applications will be able to access the file through the netCDF API.

NetCDF-Java Plug-Ins

The netCDF-Java library allows users to write a plug-in that supports reads and writes of their own binary format through the netCDF-Java API. The plug-in enables the netCDF library to read and write an arbitrary local format. This feature is widely used in the netCDF-Java community.

The proposed changes will bring this capability to the netCDF-C user community.

General Applicability for the NetCDF Community

Although this work will directly benefit the HYCOM model, the need for an upgrade path from a local format to netCDF is general. I propose that this become part of the netCDF release and become available to all netCDF-C library users.

Technical Approach

The solution consists of two parts: a plug-in to read and write AB Format, and the changes to the C library necessary to support use of the plug-in, and others like it.

Changes to NetCDF-C to Expose the Dispatch Layer

The C library contains a "dispatch" layer which allows for other formats to be added to the C library. This is how netCDF-4/HDF5, OPeNDAP, HDF4, and the parallel-netcdf library are currently supported.

But the C library does not expose this plug-in capability. Adding a new format currently requires changing the C library source code.

For the purposes of this discussion, AB Format, and other such formats, will be called "user-defined formats".

Dispatch Layer Library

The user is responsible for building a dispatch layer library.

This library, when linked with netCDF and a user application, allows the netCDF library to understand the user-defined format. The user-defined format dispatch library implements all the necessary functions from the netCDF dispatch table. (This is a subset of the entire netCDF API.)

The user-defined format is registered with a new function, nc_def_user_format().
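
To make the idea of a dispatch table concrete, here is a minimal sketch of a table of function pointers plus a registry. It is not the actual netCDF-C dispatch table, and it does not show the signature of nc_def_user_format(), which this proposal does not yet define. Every name in it (toy_dispatch, toy_register_format, toy_find_format, ab_stub_open) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_FORMATS 8

/* Hypothetical dispatch table: one function pointer per operation that a
 * format must provide. A real table would have many more entries. */
typedef struct toy_dispatch {
    const char *name;
    int (*open)(const char *path);
    int (*get_vara)(int varid, const size_t *start, const size_t *count,
                    void *out);
} toy_dispatch;

static const toy_dispatch *toy_formats[TOY_MAX_FORMATS];
static size_t toy_nformats;

/* Register a user-defined format. Returns 0 on success, -1 on error. */
int toy_register_format(const toy_dispatch *d)
{
    if (!d || !d->name || !d->open || toy_nformats == TOY_MAX_FORMATS)
        return -1;
    toy_formats[toy_nformats++] = d;
    return 0;
}

/* Look up a registered format by name; NULL if it is not registered. */
const toy_dispatch *toy_find_format(const char *name)
{
    assert(name != NULL);
    for (size_t k = 0; k < toy_nformats; k++)
        if (strcmp(toy_formats[k]->name, name) == 0)
            return toy_formats[k];
    return NULL;
}

/* Stub standing in for a real AB Format open function. */
static int ab_stub_open(const char *path)
{
    return path ? 0 : -1;
}

/* The table a plug-in library would hand to the registry. */
const toy_dispatch ab_stub_dispatch = { "AB", ab_stub_open, NULL };
```

Once a table is registered, the library can route each API call through the matching function pointer without knowing anything else about the format.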

The AB Format

The local format for the HYCOM model is called "AB Format." It consists of two files, one with metadata, and one with the data stored in binary form.

The binary data file has the extension "a", and the ASCII text file with the metadata has the extension "b". (The names of the files are part of the format definition.)

The A file is IEEE big-endian, direct access, with a fixed record length.

Converting the AB Format to the NetCDF Internal Model

For the netCDF library to read and write files in AB format, the plug-in must read the metadata file, and provide functions in accordance with the internal needs of the netCDF library. Reading and writing are handled separately.

Reading the AB Format

When reading the AB format the netCDF library needs functions which will:

  • identify the file as an AB format file
  • read metadata and provide it to netCDF internals
  • read the data in subsetted arrays

Identifying AB Files

AB Files can be identified by name. When an nc_open call is made, and the file name is not found, but two files "FILENAME.a" and "FILENAME.b" are found, then the library knows it will be opening an AB file.
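
The name-based check described above could be sketched as follows. The helper names (file_exists, looks_like_ab_file) are hypothetical and not part of netCDF-C, and the sketch assumes that "found" simply means the file can be opened for reading.

```c
#include <assert.h>
#include <stdio.h>

/* Return 1 if `path` can be opened for reading, 0 otherwise. */
static int file_exists(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return 0;
    fclose(fp);
    return 1;
}

/* Hypothetical helper: return 1 when `name` itself is absent but both
 * `name.a` and `name.b` are present, 0 otherwise. */
int looks_like_ab_file(const char *name)
{
    char a_path[4096], b_path[4096];
    int n;

    assert(name != NULL);

    /* Both paths have the same length, so one truncation check suffices. */
    n = snprintf(a_path, sizeof a_path, "%s.a", name);
    if (n < 0 || (size_t)n >= sizeof a_path)
        return 0;
    snprintf(b_path, sizeof b_path, "%s.b", name);

    return !file_exists(name) && file_exists(a_path) && file_exists(b_path);
}
```

A plug-in would presumably run a check like this before trying to parse the metadata file.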

Reading AB File Metadata

Once a file has been identified as AB Format, the netCDF library will need to read and understand all the metadata pertaining to that file. This includes the names, types, and sizes of all variables, attributes, and dimensions in the file.

With AB Format, code will be written to read the .b file and provide metadata information to the netCDF library.

Reading Subsetted Arrays of AB File Data

The core data read operation of netCDF is nc_get_vara_TYPE (where TYPE varies). This call allows the user to read a subset of a data variable into an n-dimensional array. With this operation, the netCDF-C library implements most of the other read operations.
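
As an illustration of how other reads can be layered on a subset read, the sketch below uses a small in-memory 2-D array in place of a file. The functions toy_get_vara, toy_get_var1 and toy_get_var are hypothetical stand-ins, not the netCDF-C functions, and their signatures are simplified for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NY 3
#define TOY_NX 4

/* Toy 2-D variable held in memory; stands in for data stored in a file. */
static const float toy_data[TOY_NY][TOY_NX] = {
    { 0,  1,  2,  3},
    {10, 11, 12, 13},
    {20, 21, 22, 23},
};

/* Core operation: copy the block that begins at start[] and spans count[]
 * elements into out, in row-major order. Returns 0, or -1 if out of range. */
int toy_get_vara(const size_t start[2], const size_t count[2], float *out)
{
    assert(start && count && out);
    if (start[0] > TOY_NY || count[0] > TOY_NY - start[0] ||
        start[1] > TOY_NX || count[1] > TOY_NX - start[1])
        return -1;
    for (size_t j = 0; j < count[0]; j++)
        for (size_t i = 0; i < count[1]; i++)
            out[j * count[1] + i] = toy_data[start[0] + j][start[1] + i];
    return 0;
}

/* Single-element read expressed as a 1x1 subset read. */
int toy_get_var1(const size_t index[2], float *out)
{
    const size_t one[2] = {1, 1};
    return toy_get_vara(index, one, out);
}

/* Whole-variable read expressed as a subset covering every element. */
int toy_get_var(float *out)
{
    const size_t start[2] = {0, 0};
    const size_t count[2] = {TOY_NY, TOY_NX};
    return toy_get_vara(start, count, out);
}
```

Only toy_get_vara touches the data, which is why a plug-in that supplies the subset read gets the other reads for free.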

In the case of AB Format the subsetted array read will be straightforward. Based on the information in the metadata file, the exact offsets to any element of the data can be computed.
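
A minimal sketch of the offset arithmetic, under stated assumptions: each fixed-length record holds one 2-D field stored row by row as 4-byte floats, record_len_bytes already includes any padding, and the host represents float as a 32-bit IEEE-754 value. The function names are hypothetical, and the real record layout must be taken from the AB Format definition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of element (j, i) of record `rec` in the A file, assuming
 * one row-major field of 4-byte values per fixed-length record. */
uint64_t ab_element_offset(uint64_t record_len_bytes, uint64_t nx,
                           uint64_t rec, uint64_t j, uint64_t i)
{
    assert(i < nx);
    return rec * record_len_bytes + (j * nx + i) * 4u;
}

/* Decode 4 big-endian bytes as a single-precision float, independent of
 * the byte order of the host. */
float ab_decode_be_float(const unsigned char b[4])
{
    uint32_t bits = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
                    ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    float value;

    assert(sizeof value == sizeof bits);
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

A subset read would seek to the computed offset for the first element of each requested row, read the row's bytes, and decode them one value at a time.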

Writing the AB Format

In writing the AB Format, the netCDF library must:

  • Create the AB files.
  • Fill the metadata file.
  • Implement the subset array write operation.

Creating an AB File

The config file which is read at library load will contain the mode flag which indicates the AB format.

Writing the AB Metadata File

Once the nc_create call returns, no further disk access takes place until the nc_enddef() call. During the time between nc_create() and nc_enddef() the user can define variables, attributes, and dimensions for the file.

When nc_enddef() is called, the AB Plug-in code must write the B file, which contains the metadata for the file.
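
The deferred-write pattern can be sketched as below: definitions accumulate in memory, and nothing is written until the end of define mode. All names (sketch_file, sketch_def_dim, sketch_enddef) are hypothetical, only dimensions are handled, and the "dim NAME LENGTH" text layout is a placeholder, not the real B file layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_DIMS 8
#define SKETCH_NAME_LEN 32

typedef struct {
    char name[SKETCH_NAME_LEN];
    size_t len;
} sketch_dim;

/* In-memory state of a file that is being defined. */
typedef struct {
    sketch_dim dims[SKETCH_MAX_DIMS];
    size_t ndims;
    int in_define_mode;
} sketch_file;

/* Start define mode; nothing is written to disk here. */
void sketch_create(sketch_file *f)
{
    assert(f != NULL);
    memset(f, 0, sizeof *f);
    f->in_define_mode = 1;
}

/* Record a dimension in memory only. Returns 0 on success, -1 on error. */
int sketch_def_dim(sketch_file *f, const char *name, size_t len)
{
    assert(f && name);
    if (!f->in_define_mode || f->ndims == SKETCH_MAX_DIMS ||
        strlen(name) >= SKETCH_NAME_LEN)
        return -1;
    strcpy(f->dims[f->ndims].name, name);
    f->dims[f->ndims].len = len;
    f->ndims++;
    return 0;
}

/* Leave define mode and write all buffered metadata to `meta` at once. */
int sketch_enddef(sketch_file *f, FILE *meta)
{
    assert(f && meta);
    if (!f->in_define_mode)
        return -1;
    for (size_t k = 0; k < f->ndims; k++)
        if (fprintf(meta, "dim %s %zu\n", f->dims[k].name, f->dims[k].len) < 0)
            return -1;
    f->in_define_mode = 0;
    return 0;
}
```

Keeping the metadata in memory until define mode ends means the plug-in can write the metadata file in a single pass, once the full set of definitions is known.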

Only the netCDF-classic data model will be supported by the AB Format Plug-in. Other restrictions on types and objects may be necessary.

Subset Write Operation

The netCDF subset write operation nc_put_vara() is the key write operation from which other write operations are built. Once it is implemented for the AB format, all netCDF write operations will work for the files.

Conclusion

Exposing the netCDF-C library dispatch layer will benefit the HYCOM modeling group, and also other netCDF-C users (including all netCDF Fortran, Python, Perl, etc. users).

Comments and feedback welcome. Please add to this issue so that all can participate in discussion.
