Fix Python 3.14+ parallelism: force fork start method and fix tar.extract filter #429

Closed
Copilot wants to merge 8 commits into just-python3.14-support from copilot/force-fork-parallelism

Contributor

Copilot AI commented Feb 28, 2026

Python 3.14 introduced two breaking changes that affect zstash: on POSIX platforms other than macOS, the default multiprocessing start method changed from fork to forkserver, and the default tarfile extraction filter changed to "data", which raises AbsoluteLinkError for entries that are symlinks to absolute paths.

Changes

  • zstash/main.py: Import multiprocessing and call multiprocessing.set_start_method("fork", force=True) at the top of main(), guarded by a sys.platform != "win32" check since fork is POSIX-only.
      if sys.platform != "win32":
          multiprocessing.set_start_method("fork", force=True)
  • zstash/extract.py: Use filter="tar" for tar.extract() on Python 3.12+, which preserves the old behavior of allowing absolute symlinks while still being safer than "fully_trusted".
      if sys.version_info >= (3, 12):
          tar.extract(tarinfo, filter="tar")  # "data" is too restrictive, "fully_trusted" is too permissive.
      else:
          tar.extract(tarinfo)
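
As a rough illustration of the first change, the guard can be wrapped in a small helper that also reports which start method ends up in effect. This is only a sketch: force_fork_start_method and square are hypothetical names used for the example and are not part of zstash.

```python
import multiprocessing
import sys


def force_fork_start_method() -> str:
    """Select "fork" where it exists and return the start method now in effect.

    Hypothetical helper for illustration; zstash calls set_start_method()
    directly at the top of main().
    """
    if sys.platform != "win32":
        # force=True replaces a start method that was already fixed earlier.
        multiprocessing.set_start_method("fork", force=True)
    return multiprocessing.get_start_method()


def square(x: int) -> int:
    """Trivial worker used only to exercise the pool."""
    return x * x


if __name__ == "__main__":
    print("start method:", force_fork_start_method())
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))
```

Calling the helper before any Pool or Process is created keeps every worker on the same start method; on Windows it leaves the platform default untouched.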

Summary

Objectives:

  • Force fork as the multiprocessing start method so parallel zstash operations work correctly on Python 3.14+
  • Fix tar.extract() to use filter="tar" on Python 3.12+ to prevent AbsoluteLinkError when extracting entries with absolute symlinks

Select one: This pull request is...

  • a bug fix: increment the patch version

Small Change

  • To merge, I will use "Squash and merge". That is, this change should be a single commit.
  • Logic: I have visually inspected the entire pull request myself.
  • Pre-commit checks: All the pre-commit checks have passed.

* Initial plan

* Add .github/copilot-instructions.md and AGENTS.md

Co-authored-by: forsyth2 <30700190+forsyth2@users.noreply.github.com>

* Manual updates to agent instructions

* Address code review comments

* Address further code review comments

* Remove copilot-instructions.md

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: forsyth2 <30700190+forsyth2@users.noreply.github.com>
Co-authored-by: Ryan Forsyth <forsyth2@llnl.gov>
Copilot AI changed the title from "[WIP] Implement fork parallelism support in zstash" to "Force fork multiprocessing start method to fix Python 3.14+ parallelism" Feb 28, 2026
@forsyth2
Collaborator

@copilot Yes, that change alone does in fact keep most of the tests passing. However, `python -m unittest tests.integration.python_tests.group_by_workflow.test_cache_fs.TestCacheFs.test_hpss_none_fs_off` isn't fixed. Does that still need this fix applied?

    if sys.version_info >= (3, 12):
        tar.extract(tarinfo, filter="tar")  # "data" is too restrictive, "fully_trusted" is too permissive.
    else:
        tar.extract(tarinfo)

Full test output

+ zstash create --hpss=none --cache=/global/u1/f/forsyth/ez/zstash/zstash_work_dir/test_cache zstash_src

For help, please see https://e3sm-project.github.io/zstash. Ask questions at https://github.com/E3SM-Project/zstash/discussions/categories/q-a.
INFO: Gathering list of files to archive
INFO: 20260228_012242_069542: Creating new tar archive 000000.tar
INFO: Archiving file0.txt
INFO: Archiving file0_hard.txt
INFO: Archiving file0_soft.txt
INFO: Archiving file_empty.txt
INFO: Archiving file_not_included_hard.txt
INFO: Archiving file_not_included_soft.txt
INFO: Archiving original_was_deleted_hard.txt
INFO: Archiving dir1/file1.txt
INFO: Archiving dir2/file1_hard.txt
INFO: Archiving dir2/file1_soft.txt
INFO: Archiving empty_dir
INFO: 20260228_012242_079362: (add_files): Completed archive file 000000.tar
INFO: Contents of the cache prior to `hpss_put`: ['index.db', '000000.tar']
INFO: 20260228_012242_079588: DIVING: (add_files): Calling hpss_put to dispatch archive file 000000.tar [keep, non_blocking] = [True, False]
INFO: 20260228_012242_079641: in hpss_transfer, prev_transfers is starting as []
INFO: put: HPSS is unavailable
INFO: put: Keeping tar files locally and removing write permissions
INFO: '/global/u1/f/forsyth/ez/zstash/zstash_work_dir/test_cache/000000.tar' original mode=b"'660'"
INFO: '/global/u1/f/forsyth/ez/zstash/zstash_work_dir/test_cache/000000.tar' new mode=b"'440'"
INFO: 20260228_012242_103429: SURFACE (add_files): Called hpss_put to dispatch archive file 000000.tar
INFO: tar name=000000.tar, tar size=30720, tar md5=15a70545d3844ed93c8a1cb447e2e797
INFO: Adding 000000.tar to the database.
INFO: 20260228_012242_112759: in hpss_transfer, prev_transfers is starting as []
INFO: put: HPSS is unavailable

+ zstash extract --hpss=none --cache=/global/u1/f/forsyth/ez/zstash/zstash_work_dir/test_cache

For help, please see https://e3sm-project.github.io/zstash. Ask questions at https://github.com/E3SM-Project/zstash/discussions/categories/q-a.
INFO: /global/u1/f/forsyth/ez/zstash/zstash_work_dir/test_cache/000000.tar exists. Checking expected size matches actual size.
INFO: 000000.tar: Found a single database entry.
INFO: 000000.tar: Size check passed (30720 bytes)
INFO: Opening tar archive /global/u1/f/forsyth/ez/zstash/zstash_work_dir/test_cache/000000.tar
INFO: Extracting file0.txt
INFO: Extracting file0_hard.txt
INFO: Extracting file0_soft.txt
INFO: Extracting file_empty.txt
INFO: Extracting file_not_included_hard.txt
INFO: Extracting file_not_included_soft.txt
Traceback (most recent call last):
  File "/global/homes/f/forsyth/miniforge3/envs/zstash-force-fork-20260227/lib/python3.14/site-packages/zstash/extract.py", line 662, in extractFiles
    tar.extract(tarinfo)
    ~~~~~~~~~~~^^^^^^^^^
  File "/global/homes/f/forsyth/miniforge3/envs/zstash-force-fork-20260227/lib/python3.14/tarfile.py", line 2476, in extract
    tarinfo, unfiltered = self._get_extract_tarinfo(
                          ~~~~~~~~~~~~~~~~~~~~~~~~~^
        member, filter_function, path)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/f/forsyth/miniforge3/envs/zstash-force-fork-20260227/lib/python3.14/tarfile.py", line 2498, in _get_extract_tarinfo
    self._handle_fatal_error(e)
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/global/homes/f/forsyth/miniforge3/envs/zstash-force-fork-20260227/lib/python3.14/tarfile.py", line 2496, in _get_extract_tarinfo
    filtered = filter_function(unfiltered, path)
  File "/global/homes/f/forsyth/miniforge3/envs/zstash-force-fork-20260227/lib/python3.14/tarfile.py", line 859, in data_filter
    new_attrs = _get_filtered_attrs(member, dest_path, True)
  File "/global/homes/f/forsyth/miniforge3/envs/zstash-force-fork-20260227/lib/python3.14/tarfile.py", line 832, in _get_filtered_attrs
    raise AbsoluteLinkError(member)
tarfile.AbsoluteLinkError: 'file_not_included_soft.txt' is a link to an absolute path
ERROR: Retrieving file_not_included_soft.txt
INFO: Extracting original_was_deleted_hard.txt
INFO: Extracting dir1/file1.txt
INFO: Extracting dir2/file1_hard.txt
INFO: Extracting dir2/file1_soft.txt
Traceback (most recent call last):
  File "/global/homes/f/forsyth/miniforge3/envs/zstash-force-fork-20260227/lib/python3.14/site-packages/zstash/extract.py", line 662, in extractFiles
    tar.extract(tarinfo)
    ~~~~~~~~~~~^^^^^^^^^
  File "/global/homes/f/forsyth/miniforge3/envs/zstash-force-fork-20260227/lib/python3.14/tarfile.py", line 2476, in extract
    tarinfo, unfiltered = self._get_extract_tarinfo(
                          ~~~~~~~~~~~~~~~~~~~~~~~~~^
        member, filter_function, path)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/f/forsyth/miniforge3/envs/zstash-force-fork-20260227/lib/python3.14/tarfile.py", line 2498, in _get_extract_tarinfo
    self._handle_fatal_error(e)
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/global/homes/f/forsyth/miniforge3/envs/zstash-force-fork-20260227/lib/python3.14/tarfile.py", line 2496, in _get_extract_tarinfo
    filtered = filter_function(unfiltered, path)
  File "/global/homes/f/forsyth/miniforge3/envs/zstash-force-fork-20260227/lib/python3.14/tarfile.py", line 859, in data_filter
    new_attrs = _get_filtered_attrs(member, dest_path, True)
  File "/global/homes/f/forsyth/miniforge3/envs/zstash-force-fork-20260227/lib/python3.14/tarfile.py", line 832, in _get_filtered_attrs
    raise AbsoluteLinkError(member)
tarfile.AbsoluteLinkError: 'dir2/file1_soft.txt' is a link to an absolute path
ERROR: Retrieving dir2/file1_soft.txt
INFO: Extracting empty_dir
ERROR: Encountered an error for files:
ERROR: file_not_included_soft.txt in 000000.tar
ERROR: dir2/file1_soft.txt in 000000.tar
ERROR: The following tar archives had errors:
ERROR: 000000.tar

FRemoving test files, both locally and at the HPSS repo

======================================================================
FAIL: test_hpss_none_fs_off (tests.integration.python_tests.group_by_workflow.test_cache_fs.TestCacheFs.test_hpss_none_fs_off)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/global/u1/f/forsyth/ez/zstash/tests/integration/python_tests/group_by_workflow/test_cache_fs.py", line 152, in test_hpss_none_fs_off
    self.assertTrue(os.path.islink("dir2/file1_soft.txt"))  # DIFFERENT from fs_on
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: False is not true

----------------------------------------------------------------------
Ran 1 test in 1.278s

FAILED (failures=1)

Contributor Author

Copilot AI commented Feb 28, 2026

@copilot Yes, that change alone does in fact keep most of the tests passing. However `python -m unittest tests.integration.python_tests.group_by_workflow.test_cache_fs.TestCacheFs.te...

Yes, that test needed this fix too. Added filter="tar" for tar.extract() on Python 3.12+ in commit d41272f. The "tar" filter allows absolute symlinks (unlike the stricter "data" filter which raises AbsoluteLinkError), while still being safer than "fully_trusted".

Copilot AI changed the title from "Force fork multiprocessing start method to fix Python 3.14+ parallelism" to "Fix Python 3.14+ parallelism: force fork start method and fix tar.extract filter" Feb 28, 2026
@forsyth2
Collaborator

I've confirmed all of the Python tests pass now. I also added a commit to fix the pre-commit checks.

@forsyth2 marked this pull request as ready for review March 17, 2026 00:06
@forsyth2 force-pushed the copilot/force-fork-parallelism branch from 0a5d3fd to cb558ab March 17, 2026 00:09
@forsyth2
Collaborator

Rebased off main

@forsyth2
Collaborator

Closing in favor of #434, which uses only the relevant commits and merges to main.

@forsyth2 closed this Mar 17, 2026