get-file-locations: adapt command to hot/cold disk/tape file storage dichotomy #160

@tiborsimko

Current behaviour

Following up the introduction of hot/cold disk/tape storage for CERN Open Data record files, we need to adapt the behaviour of cernopendata-client when retrieving them.

Example: record https://opendata.cern.ch/record/8886 has 5089 files in the dataset, but only 1 file is present on disk (and can be used immediately), whilst the remaining 5088 files are on tape (and a user needs to request their staging to disk before being able to use them).

On the web, this hot/cold disk/tape dichotomy is visible through the "Download index" button, which offers two options: list all files or list online files only.

On CLI, the cernopendata-client currently lists all files:

$ cernopendata-client get-file-locations --recid 8886  | wc -l
    5089

Expected behaviour

We should enable users to see which files are on disk (and can be used immediately in their analyses) and which are on tape (and need staging first).

This could be achieved by introducing a new command-line option to the get-file-locations command, such as --file-availability (to reuse the terminology of the availability field in the bucket file listing JSON). In the above case, the command would return only the online file:

$ cernopendata-client get-file-locations --recid 8886 --file-availability online | wc -l
    1
$ cernopendata-client get-file-locations --recid 8886 --file-availability online
http://opendata.cern.ch/eos/opendata/cms/MonteCarlo2012/Summer12_DR53X/QCD_Pt_20_MuEnrichedPt_15_TuneZ2star_8TeV_pythia6/AODSIM/PU_S10_START53_V19-v1/00000/000778BA-0C10-E411-8A00-02163E00E86A.root
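
The filtering behind such an option could look roughly like the sketch below. It is only an illustration under assumptions: the function name `filter_by_availability`, the per-file dict layout with `uri` and `availability` keys, the `ondemand` value for tape-only files, and the sample URIs are all hypothetical and not taken from the client's code.

```python
def filter_by_availability(files, file_availability="all"):
    """Return the URIs of files matching the requested availability.

    `files` is assumed to be a list of dicts with "uri" and
    "availability" keys (hypothetical layout, for illustration only).
    """
    if file_availability == "all":
        # Default: keep the current behaviour and list every file.
        return [f["uri"] for f in files]
    return [f["uri"] for f in files if f.get("availability") == file_availability]


# Hypothetical sample data: one file on disk, two on tape.
sample_files = [
    {"uri": "root://example.org//eos/a.root", "availability": "online"},
    {"uri": "root://example.org//eos/b.root", "availability": "ondemand"},
    {"uri": "root://example.org//eos/c.root", "availability": "ondemand"},
]

for uri in filter_by_availability(sample_files, "online"):
    print(uri)
```

Keeping "all" as the default value means existing invocations without the new option would keep returning the full list.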

Notes

  • The default value could be to return "all" the files, as before.

  • One could argue that the user should be warned very visibly when running the command with the default value while some files are not online, so that nobody mistakenly believes that every file in the list is ready to be used.
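
The warning suggested in the second note could be sketched as follows. This is a hedged illustration, not the client's implementation: the helper name `availability_warning`, the message wording, and the per-file `availability` key are assumptions.

```python
import sys


def availability_warning(files):
    """Return a warning string if some files are not online, else None.

    `files` is assumed to be a list of dicts carrying an "availability"
    key (hypothetical layout, for illustration only).
    """
    not_online = [f for f in files if f.get("availability") != "online"]
    if not not_online:
        return None
    return (
        f"WARNING: {len(not_online)} of {len(files)} files are not online "
        "and need to be staged to disk before they can be used."
    )


# Hypothetical usage: print the warning to stderr so that piping the
# file list on stdout (for example into `wc -l`) is not affected.
message = availability_warning(
    [{"availability": "online"}, {"availability": "ondemand"}]
)
if message is not None:
    print(message, file=sys.stderr)
```

Sending the warning to standard error rather than standard output is a design choice in this sketch: scripts that consume the list of locations would continue to work unchanged.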
