
Possible to use verify-files on individual files from a record ID? #175

@matthewfeickert

Given the size of the files associated with a record ID, a user may not want to download all of them at once, but just one to test. Is it possible to pass an argument to cernopendata-client verify-files to indicate that only a particular file should be verified?

Example: After downloading the file mc-flavtag-ttbar-small.h5 from CERN Open Data record https://opendata.cern.ch/record/93940, I'm not able to validate its digest by passing it to the CLI:

$ pixi global install cernopendata-client --with cernopendata-client-xrootd
$ cernopendata-client version
1.0.2
$ cernopendata-client download-files --recid 93940 --protocol xrootd --filter-name mc-flavtag-ttbar-small.h5 --download-engine xrootd
==> Downloading file 1 of 1
  -> File: ./93940/mc-flavtag-ttbar-small.h5
==> Success!
$ cernopendata-client verify-files --help
Usage: cernopendata-client verify-files [OPTIONS]

  Verify downloaded data file integrity.

  Select a CERN Open Data bibliographic record by a record ID, a DOI, or a
  title and verify integrity of downloaded data files belonging to this
  record.

  Examples:

       $ cernopendata-client verify-files --recid 5500

Options:
  --recid INTEGER  Record ID (exact match)
  --doi TEXT       Digital Object Identifier (exact match)
  --title TEXT     Record title (exact match, no wildcards)
  --server TEXT    Which CERN Open Data server to query?
                   [default=http://opendata.cern.ch]
  --help           Show this message and exit.
$ cernopendata-client verify-files --recid 93940
==> Verifying number of files for record 93940... 
  -> Expected 3, found 1
==> ERROR: File count does not match.
$ cernopendata-client verify-files --recid 93940 --title mc-flavtag-ttbar-small.h5 
==> Verifying number of files for record 93940... 
  -> Expected 3, found 1
==> ERROR: File count does not match.
$ cernopendata-client verify-files --recid 93940 ./93940/mc-flavtag-ttbar-small.h5 
Usage: cernopendata-client verify-files [OPTIONS]
Try 'cernopendata-client verify-files --help' for help.

Error: Got unexpected extra argument (./93940/mc-flavtag-ttbar-small.h5)

I assume(?) that verify-files compares the digests of the downloaded files to values published on CERN Open Data. Is that correct? If so, would it be possible to have a CLI command that just returns the digests of the files associated with a record ID?

$ pixi global install openssl
$ openssl sha256 ./93940/mc-flavtag-ttbar-small.h5 
SHA2-256(./93940/mc-flavtag-ttbar-small.h5)= f7249d6b257b3ebba10acc003748db9ae26861acdf57eb92c912fd3870cb1e4a
$ openssl md5 ./93940/mc-flavtag-ttbar-small.h5 
MD5(./93940/mc-flavtag-ttbar-small.h5)= d72ab992af7ec5517297c05a7f51f7ad
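
As a stopgap, a single file could be checked by hand along the lines below. This is only a sketch and not how verify-files is implemented. It assumes the record publishes each checksum as a string of the form algorithm:hexdigest (for example adler32 followed by 8 hex digits), which is my assumption and not something confirmed above. The helper names file_digest and matches_published are hypothetical.

```python
import hashlib
import zlib

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large files do not fill memory


def file_digest(path, algorithm):
    """Return the lowercase hex digest of the file at `path`."""
    if algorithm == "adler32":
        value = 1  # Adler-32 starting value used by zlib
        with open(path, "rb") as handle:
            while chunk := handle.read(CHUNK_SIZE):
                value = zlib.adler32(chunk, value)
        return f"{value & 0xFFFFFFFF:08x}"
    # Anything else is delegated to hashlib, e.g. "md5" or "sha256".
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        while chunk := handle.read(CHUNK_SIZE):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published(path, published):
    """Compare a local file against a string such as 'adler32:0123abcd'.

    The 'algorithm:hexdigest' layout is an assumption about how the
    published checksum is written, not a documented format.
    """
    algorithm, _, expected = published.partition(":")
    return file_digest(path, algorithm.lower()) == expected.lower()
```

With this, matches_published("./93940/mc-flavtag-ttbar-small.h5", published) would return True only if the locally computed digest equals the value copied from the record page into published. Passing "sha256" or "md5" to file_digest should reproduce the openssl output above.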
