fix(searcher): adapt to the new metadata schema with file indices #147
tiborsimko merged 2 commits into master
server, str(record_json["id"]), file_[0].split("/")[-1]
for file_ in record_json["metadata"]["_file_indices"]:
if expand:
# let's unwind file indexes
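For illustration only, unwinding file indices into a flat file list could look roughly like the sketch below. The record layout used here (each `_file_indices` entry carrying a `files` list of dicts with `uri`, `size` and `checksum` keys) and the helper name `expand_file_indices` are assumptions made for this example; they are not the portal's confirmed schema nor the code changed in this PR.

```python
def expand_file_indices(record_json):
    """Flatten file indices into a list of (uri, size, checksum) tuples.

    Hypothetical layout assumed for this sketch:
    record_json["metadata"]["_file_indices"] is a list of dicts, each with
    a "files" list whose items hold "uri", "size" and "checksum".
    """
    files = []
    # Records without file indices simply yield an empty list.
    for index in record_json.get("metadata", {}).get("_file_indices", []):
        for file_ in index.get("files", []):
            files.append(
                (file_["uri"], file_.get("size"), file_.get("checksum"))
            )
    return files


# Made-up example record following the assumed layout.
example_record = {
    "id": 1,
    "metadata": {
        "_file_indices": [
            {
                "key": "example_file_index.json",
                "files": [
                    {
                        "uri": "root://eospublic.cern.ch//eos/opendata/a.root",
                        "size": 10,
                        "checksum": "adler32:00000001",
                    },
                    {
                        "uri": "root://eospublic.cern.ch//eos/opendata/b.root",
                        "size": 20,
                        "checksum": "adler32:00000002",
                    },
                ],
            }
        ]
    },
}

expanded = expand_file_indices(example_record)
```

Using `.get()` with empty defaults keeps records with directly attached files (and no indices) from raising a `KeyError` in this sketch.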
Note that the changes do not pass unit tests, e.g. see the CI report for Python 3.12:
================== 22 failed, 50 passed, 8 skipped in 45.12s ===================
After the CERN Open Data portal service update, I'm still getting failing tests locally:
$ tox -e py312
...
FAILED tests/test_cli_download_files.py::test_download_files_http_requests - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_https_requests - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_download_engine - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_with_verify - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_name - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_name_multiple_values - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_regexp_single_file - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_regexp_multiple_files - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_range - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_range_multiple_values - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_single_range_single_regexp - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_multiple_range_single_regexp - assert 1 == 0
FAILED tests/test_cli_get_file_locations.py::test_get_file_locations_from_recid_without_files - AssertionError: assert 1 == 0
FAILED tests/test_cli_verify_files.py::test_verify_files - assert 1 == 0
FAILED tests/test_cli_verify_files.py::test_verify_files_https_server - assert 1 == 0
FAILED tests/test_metadater.py::test_get_metadata_from_filter_metadata_two - assert 1 == 0
FAILED tests/test_verifier.py::test_get_file_info_local_good_input - assert 1 == 0
FAILED tests/test_verifier.py::test_get_file_info_local_good_input_wrong_count - assert 1 == 0
FAILED tests/test_verifier.py::test_get_file_info_local_good_input_wrong_checksum - assert 1 == 0
FAILED tests/test_verifier.py::test_get_file_info_local_good_input_wrong_size - assert 1 == 0
For example, this command works:
$ cernopendata-client download-files --recid 1 --no-expand
==> Downloading file 1 of 6
-> File: ./1/CMS_Run2010B_BTau_AOD_Apr21ReReco-v1_0000_file_index.json
-> Progress: 322/322 KiB (100%)
^C
$ cernopendata-client download-files --recid 1
==> Downloading file 1 of 2916
-> File 00E16FBB-9071-E011-83D3-003048673F12.root is incomplete. Resuming download.
-> File: ./1/00E16FBB-9071-E011-83D3-003048673F12.root
^C-> Progress: 124229/596996 KiB (20%)
Aborted!
Whilst this (simplest) use case of directly attached files does not work:
$ cernopendata-client download-files --recid 5500
==> Downloading file 1 of 11
==> ERROR: Download error occured. Please try again.
Traceback (most recent call last):
File "/home/tibor/.virtualenvs/cernopendata-client/bin/cernopendata-client", line 8, in <module>
sys.exit(cernopendata_client())
^^^^^^^^^^^^^^^^^^^^^
File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/click/core.py", line 1697, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/cernopendata_client/cli.py", line 377, in download_files
download_single_file(
File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/cernopendata_client/downloader.py", line 340, in download_single_file
downloader.file_downloader()
File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/cernopendata_client/downloader.py", line 80, in file_downloader
response = requests.get(self.file_location, headers=headers, stream=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/requests/api.py", line 73, in get
return request("get", url, params=params, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/requests/sessions.py", line 697, in send
adapter = self.get_adapter(url=request.url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/requests/sessions.py", line 792, in get_adapter
raise InvalidSchema(f"No connection adapters were found for {url!r}")
requests.exceptions.InvalidSchema: No connection adapters were found for 'root://eospublic.cern.ch//eos/opendata/cms/software/HiggsExample20112012/BuildFile.xml'
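The traceback shows `requests` being handed a `root://` (XRootD) URI; by default it only mounts connection adapters for `http://` and `https://`, hence the `InvalidSchema` error. A minimal sketch of rewriting such a location before an HTTP download is given below. The helper name `to_http_location` and the mapping of `root://eospublic.cern.ch/` paths onto `http://opendata.cern.ch` are assumptions for illustration, not necessarily how the fix in this PR is implemented.

```python
from urllib.parse import urlsplit


def to_http_location(uri, server="http://opendata.cern.ch"):
    """Rewrite a root:// file location so that requests can fetch it.

    Hypothetical helper: the default server and the path mapping are
    assumptions made for this sketch.
    """
    parts = urlsplit(uri)
    if parts.scheme in ("http", "https"):
        # Already usable by requests; leave untouched.
        return uri
    if parts.scheme != "root":
        raise ValueError(f"Unsupported file location scheme: {uri!r}")
    # XRootD URIs carry a double slash before the path; collapse it.
    path = "/" + parts.path.lstrip("/")
    return server.rstrip("/") + path


location = to_http_location(
    "root://eospublic.cern.ch//eos/opendata/cms/software/"
    "HiggsExample20112012/BuildFile.xml"
)
```

Raising on unknown schemes, rather than passing them through, would surface a schema change early instead of failing deep inside `requests`.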
As discussed IRL, I took over and fixed the download problem and squashed the fix with your branch. I have also added you to the AUTHORS file and fixed an independent metadata filtering test issue following the deprecation of CCID.
Fixes file download for records having files attached via file indexes. Closes cernopendata/cernopendata-portal#111 Co-authored-by: Tibor Šimko <tibor.simko@cern.ch>
Fixes metadata filtering test by moving from CCID to ORCID filtering, following the removal of author's CCID in the portal content.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #147      +/-   ##
==========================================
+ Coverage   80.65%   80.80%   +0.14%
==========================================
  Files          12       12
  Lines         729      719      -10
==========================================
- Hits          588      581       -7
+ Misses        141      138       -3
Fixes file download for records having files attached via file indexes. Closes cernopendata/cernopendata-portal#111 Closes #148 Co-authored-by: Tibor Šimko <tibor.simko@cern.ch>
Fixes file download for records having files attached via file indexes.
Closes cernopendata/cernopendata-portal#111
Co-authored-by: Tibor Šimko <tibor.simko@cern.ch>