get_dataset_filename_from_metadata should return full path for non-conforming datasets + couple of typos #78

@GreenK173

The SigMF format specification states that the core:dataset field for a non-conforming dataset should contain only the file name, not the path, and that the dataset must be in the same directory as the metadata file. This makes sense: otherwise we would need to change the field every time we move the data. The get_dataset_filename_from_metadata function in the sigmffile.py script correctly returns the full path for compliant datasets, but for non-conforming datasets it returns

metadata["global"].get("core:dataset", None)

which, per the above, is only a file name without the path. As a result, loading a non-conforming dataset fails unless the dataset happens to be in the current working directory.
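To illustrate the difference, here is a minimal sketch that does not use the sigmf package at all. The helper resolve_noncompliant_dataset and the file names are hypothetical and only show the idea of resolving the bare core:dataset name against the directory of the metadata file instead of the working directory.

```python
import tempfile
from os import path


def resolve_noncompliant_dataset(meta_fn, dataset_name):
    """Hypothetical helper: place a bare `core:dataset` name next to the metadata file."""
    # An empty dirname means meta_fn was given as a bare file name.
    dir_path = path.dirname(meta_fn) or "."
    return path.join(dir_path, dataset_name)


with tempfile.TemporaryDirectory() as tmp:
    # Assumed layout: metadata and non-conforming dataset share one directory.
    meta_fn = path.join(tmp, "capture_issue78.sigmf-meta")
    data_fn = path.join(tmp, "capture_issue78.bin")
    open(data_fn, "wb").close()

    # The bare name is looked up relative to the current working directory...
    bare_found = path.isfile("capture_issue78.bin")
    # ...while the resolved path points next to the metadata file.
    resolved = resolve_noncompliant_dataset(meta_fn, "capture_issue78.bin")
    resolved_found = path.isfile(resolved)
```

Unless the script is run from inside that directory, only the resolved path should find the file.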

Furthermore, line 935

      1. The file named <METAFILE_BASENAME>.sigmf-meta if it exists

should be

      1. The file named <METAFILE_BASENAME>.sigmf-data if it exists

and in line 954

"core:meatadata_only"

should be

"core:metadata_only"

So together I believe the get_dataset_filename_from_metadata function should look like this:

def get_dataset_filename_from_metadata(meta_fn, metadata=None):
    """
    Parse provided metadata and return the expected data filename. In the case of
    a metadata only distribution, or if the file does not exist, this will return
    'None'. The priority for conflicting:
      1. The file named <METAFILE_BASENAME>.sigmf-data if it exists
      2. The file in the `core:dataset` field (Non-Compliant Dataset) if it exists
      3. None (may be a metadata only distribution)
    """
    compliant_data_fn = get_sigmf_filenames(meta_fn)["data_fn"]
    noncompliant_data_fn = metadata["global"].get("core:dataset", None)
    dir_path = path.split(meta_fn)[0]
    if not dir_path:
        dir_path = "."  # sets the correct path in the case meta_fn is only a filename

    if path.isfile(compliant_data_fn):
        if noncompliant_data_fn:
            warnings.warn(
                f"Compliant Dataset `{compliant_data_fn}` exists but "
                f'"core:dataset" is also defined; using `{compliant_data_fn}`'
            )
        return compliant_data_fn

    elif noncompliant_data_fn:
        if path.isfile(f"{dir_path}/{noncompliant_data_fn}"):
            if metadata["global"].get("core:metadata_only", False):
                warnings.warn(
                    'Schema defines "core:dataset" but "core:metadata_only" '
                    f"also exists; using `{noncompliant_data_fn}`"
                )
            return f"{dir_path}/{noncompliant_data_fn}"
        else:
            warnings.warn(
                f"Non-Compliant Dataset `{noncompliant_data_fn}` is specified "
                'in "core:dataset" but does not exist!'
            )

    return None
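The priority order in the docstring can be checked without the library using a small stand-in. This is only a sketch: pick_dataset and touch are hypothetical names, the warnings are left out, and the assumption that the compliant file is the metadata file name with its extension swapped for .sigmf-data replaces the real get_sigmf_filenames call.

```python
import tempfile
from os import path


def pick_dataset(meta_fn, metadata):
    """Hypothetical stand-in mirroring the proposed priority order (no warnings)."""
    # Assumption: the compliant data file shares the metadata file's base name.
    compliant = path.splitext(meta_fn)[0] + ".sigmf-data"
    if path.isfile(compliant):
        return compliant
    name = metadata["global"].get("core:dataset")
    if name:
        # Resolve the bare name next to the metadata file, not the working directory.
        candidate = path.join(path.dirname(meta_fn) or ".", name)
        if path.isfile(candidate):
            return candidate
    return None


def touch(fn):
    """Create an empty file."""
    open(fn, "wb").close()


with tempfile.TemporaryDirectory() as tmp:
    meta_fn = path.join(tmp, "rec.sigmf-meta")
    metadata = {"global": {"core:dataset": "rec.bin"}}

    # Nothing on disk yet: falls through to None.
    case_none = pick_dataset(meta_fn, metadata)
    # Only the non-conforming dataset exists: it is found next to the metadata.
    touch(path.join(tmp, "rec.bin"))
    case_noncompliant = path.basename(pick_dataset(meta_fn, metadata))
    # Both exist: the compliant .sigmf-data file takes priority.
    touch(path.join(tmp, "rec.sigmf-data"))
    case_compliant = path.basename(pick_dataset(meta_fn, metadata))
```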

I'd open a pull request, but it seems I don't have authorization to create a new branch.
