fix(searcher): adapt to the new metadata schema with file indices #147

Merged
tiborsimko merged 2 commits into master from fix_111 on Feb 16, 2025

psaiz (Contributor) commented on Dec 19, 2024

Fixes file download for records having files attached via file indexes.

Closes cernopendata/cernopendata-portal#111

Co-authored-by: Tibor Šimko <tibor.simko@cern.ch>

server, str(record_json["id"]), file_[0].split("/")[-1]
for file_ in record_json["metadata"]["_file_indices"]:
if expand:
# let's unwind file indexes
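
The diff fragment above iterates over `record_json["metadata"]["_file_indices"]` and unwinds each index when `expand` is set. A minimal sketch of that idea follows. The function name and the `uri` and `files` keys inside each index entry are assumptions made for illustration, and the real metadata schema and the real `searcher.py` code may differ:

```python
def list_file_uris(record_json, expand=True):
    """Collect file URIs for a record whose files are attached via file indices."""
    uris = []
    for index in record_json["metadata"].get("_file_indices", []):
        if expand:
            # Unwind the index: one entry per file listed inside it
            # (hypothetical "files" and "uri" keys).
            uris.extend(entry["uri"] for entry in index.get("files", []))
        else:
            # Keep only the index file itself (hypothetical "uri" key).
            uris.append(index["uri"])
    return uris


# Made-up record with one index that lists two files.
record = {
    "id": 1,
    "metadata": {
        "_file_indices": [
            {
                "uri": "root://example.org//eos/demo_file_index.json",
                "files": [
                    {"uri": "root://example.org//eos/a.root"},
                    {"uri": "root://example.org//eos/b.root"},
                ],
            }
        ]
    },
}

print(list_file_uris(record, expand=False))
print(list_file_uris(record, expand=True))
```

With `expand=False` the sketch returns the index files only. With `expand=True` it returns the files they list, which matches the difference between the `--no-expand` and default runs of `download-files` reported later in this thread.
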
Member commented:

Note that the changes do not pass unit tests, e.g. see the CI report for Python 3.12:

================== 22 failed, 50 passed, 8 skipped in 45.12s ===================

Member commented:

After the CERN Open Data portal service update, I still get failing tests locally:

$ tox -e py312
...
FAILED tests/test_cli_download_files.py::test_download_files_http_requests - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_https_requests - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_download_engine - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_with_verify - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_name - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_name_multiple_values - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_regexp_single_file - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_regexp_multiple_files - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_range - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_range_multiple_values - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_single_range_single_regexp - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_multiple_range_single_regexp - assert 1 == 0
FAILED tests/test_cli_get_file_locations.py::test_get_file_locations_from_recid_without_files - AssertionError: assert 1 == 0
FAILED tests/test_cli_verify_files.py::test_verify_files - assert 1 == 0
FAILED tests/test_cli_verify_files.py::test_verify_files_https_server - assert 1 == 0
FAILED tests/test_metadater.py::test_get_metadata_from_filter_metadata_two - assert 1 == 0
FAILED tests/test_verifier.py::test_get_file_info_local_good_input - assert 1 == 0
FAILED tests/test_verifier.py::test_get_file_info_local_good_input_wrong_count - assert 1 == 0
FAILED tests/test_verifier.py::test_get_file_info_local_good_input_wrong_checksum - assert 1 == 0
FAILED tests/test_verifier.py::test_get_file_info_local_good_input_wrong_size - assert 1 == 0

For example, these commands work:

$ cernopendata-client download-files --recid 1 --no-expand
==> Downloading file 1 of 6
  -> File: ./1/CMS_Run2010B_BTau_AOD_Apr21ReReco-v1_0000_file_index.json
  -> Progress: 322/322 KiB (100%)
^C

$ cernopendata-client download-files --recid 1
==> Downloading file 1 of 2916
  -> File 00E16FBB-9071-E011-83D3-003048673F12.root is incomplete. Resuming download.
  -> File: ./1/00E16FBB-9071-E011-83D3-003048673F12.root
^C-> Progress: 124229/596996 KiB (20%)
Aborted!

Whereas the simplest use case, a record with directly attached files, does not work:

$ cernopendata-client download-files --recid 5500
==> Downloading file 1 of 11
==> ERROR: Download error occured. Please try again.
Traceback (most recent call last):
  File "/home/tibor/.virtualenvs/cernopendata-client/bin/cernopendata-client", line 8, in <module>
    sys.exit(cernopendata_client())
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/cernopendata_client/cli.py", line 377, in download_files
    download_single_file(
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/cernopendata_client/downloader.py", line 340, in download_single_file
    downloader.file_downloader()
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/cernopendata_client/downloader.py", line 80, in file_downloader
    response = requests.get(self.file_location, headers=headers, stream=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/requests/sessions.py", line 697, in send
    adapter = self.get_adapter(url=request.url)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/requests/sessions.py", line 792, in get_adapter
    raise InvalidSchema(f"No connection adapters were found for {url!r}")
requests.exceptions.InvalidSchema: No connection adapters were found for 'root://eospublic.cern.ch//eos/opendata/cms/software/HiggsExample20112012/BuildFile.xml'
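
The `InvalidSchema` error above arises because `requests` mounts connection adapters only for `http://` and `https://` by default, so an XRootD `root://` location cannot be passed to `requests.get`. Below is a minimal sketch of one way to guard against this. It assumes (hypothetically) that the same EOS path is also served over HTTP by the portal. The function name, the `ROOT_PREFIX` constant, and the default `server` value are illustrative and not taken from the cernopendata-client code:

```python
from urllib.parse import urlsplit

# Assumed XRootD prefix, copied from the URI shown in the traceback.
ROOT_PREFIX = "root://eospublic.cern.ch/"


def to_http_location(uri, server="http://opendata.cern.ch"):
    """Rewrite a root:// file location so that an HTTP client can fetch it."""
    if urlsplit(uri).scheme in ("http", "https"):
        return uri  # already usable by an HTTP client
    if not uri.startswith(ROOT_PREFIX):
        raise ValueError(f"Unsupported file location: {uri!r}")
    # Drop the XRootD prefix and graft the remaining path onto the server.
    return server.rstrip("/") + "/" + uri[len(ROOT_PREFIX):].lstrip("/")


uri = "root://eospublic.cern.ch//eos/opendata/cms/software/HiggsExample20112012/BuildFile.xml"
print(to_http_location(uri))
```

Raising `ValueError` for unknown schemes surfaces the problem before any network call is made, instead of failing deep inside the HTTP library.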

Member commented:

As discussed IRL, I took over and fixed the download problem and squashed the fix with your branch. I have also added you to the AUTHORS file and fixed an independent metadata filtering test issue following the deprecation of CCID.

tiborsimko added a commit that referenced this pull request Feb 16, 2025
Fixes file download for records having files attached via file indexes.

Closes cernopendata/cernopendata-portal#111

Co-authored-by: Tibor Šimko <tibor.simko@cern.ch>
tiborsimko added a commit that referenced this pull request Feb 16, 2025
Fixes metadata filtering test by moving from CCID to ORCID filtering,
following the removal of author's CCID in the portal content.
tiborsimko changed the title from "file_indices: adapt to the new schema of metadata for the file indices" to "fix(searcher): adapt to the new metadata schema with file indices" on Feb 16, 2025
codecov bot commented Feb 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.80%. Comparing base (357a719) to head (4ff86a9).
Report is 12 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #147      +/-   ##
==========================================
+ Coverage   80.65%   80.80%   +0.14%     
==========================================
  Files          12       12              
  Lines         729      719      -10     
==========================================
- Hits          588      581       -7     
+ Misses        141      138       -3     
Files with missing lines Coverage Δ
cernopendata_client/searcher.py 86.60% <100.00%> (+1.36%) ⬆️

psaiz and others added 2 commits February 16, 2025 16:19
Fixes file download for records having files attached via file indexes.

Closes cernopendata/cernopendata-portal#111
Closes #148

Co-authored-by: Tibor Šimko <tibor.simko@cern.ch>

Fixes metadata filtering test by moving from CCID to ORCID filtering,
following the removal of author's CCID in the portal content.
tiborsimko merged commit 4ff86a9 into master on Feb 16, 2025
17 checks passed
tiborsimko deleted the fix_111 branch on February 16, 2025 15:25

Development

Successfully merging this pull request may close these issues.

rest: client cannot get file locations after latest deployment
