[pathfinder] RTLD_DI_LINKMAP-based new implementation of abs_path_for_dynamic_library()#834
Conversation
…eed for EXPECTED_LIB_SYMBOLS
…m check_if_already_loaded_from_elsewhere(), for consistency with load_with_system_search()
…s just an oversight)
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
This comment has been minimized.
This comment has been minimized.
|
Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
| LIBDL.dladdr.argtypes = [ctypes.c_void_p, ctypes.c_void_p] | ||
| LIBDL.dladdr.restype = ctypes.c_int | ||
| LIBDL.dlinfo.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p] | ||
| LIBDL.dlinfo.restype = ctypes.c_int |
There was a problem hiding this comment.
I am a bit nervous about us using either dladdr or dlinfo since none of them is POSIX-compliant. Since dladdr has been used, switching to dlinfo does not make it more "non-compliant", so it's still a step forward. We just have to see if it runs everywhere...
There was a problem hiding this comment.
since none of them is POSIX-compliant
Do you think the ChatGPT conclusion below is reasonable?
My ChatGPT prompt:
This makes me want to ask:
Does CUDA run on any systems that are not Linux or Windows?
(Windows dynamic loading is handled separately. I.e. basically I'm only asking: does CUDA run on any Unix system that is not Linux?)
Do you know?
ChatGPT said:
Thought for 19s
Short answer: No. Today CUDA is officially supported on Linux (various distros/arches, including aarch64/Jetson) and Windows (including WSL2 Linux environments on Windows). There’s no current macOS runtime support, and no official support for other Unix-like OSes (e.g., *BSD).
So if your scope is “Linux only,” it’s reasonable to lean on glibc-specific behavior (e.g., dlinfo)—CUDA isn’t targeting musl/Alpine or non-Linux Unix platforms.
|
/ok to test |
…r arithmetic instead. Use `os.fsdecode()` instead of `l_name.decode()` to avoid `UnicodeDecodeError`
|
/ok to test |
…ame. Explain safety constraints in depth.
|
/ok to test |
|
Breaking out from this thread since it's getting a bit long and I want to leave another comment there to keep it focused.
Turns out we can. RTLD_DI_ORIGIN = 6
mylib = ctypes.CDLL(...)
path_buf = ctypes.create_string_buffer(4096)
rc = LIBDL.dlinfo(ctypes.c_void_p(mylib._handle), RTLD_DI_ORIGIN, path_buf)
if rc != 0:
err = _dl_last_error()
raise OSError(f"dlinfo failed for {libname=!r} (rc={rc})" + (f": {err}" if err else ""))
print(f"origin = {path_buf.value.decode()}")But for all practical use cases that the pathfinder covers, The only edge case where In fact, it might be a good idea to expand the new |
|
Thanks @leofang , I hadn't seen To be sure that we can rely on it being available, I looked up when it was added in the glibc repo: It's defined in That makes me think it's only a "directory name", and we'd still have to get the filename from somewhere else.
I'll play with this later tonight, to verify that a relative path is in fact not expanded to a full absolute path. If that's true, I'll add a comment in |
…931a1ef212b2826e0122f6e422
Confirmed.
Also confirmed. Putting 1 + 1 together in commit 66c5869:
@leofang as you said, currently pathfinder does not really need this, but it's easy enough to make it complete. |
|
/ok to test |
|
For completeness, here is how I confirmed the |
Yes. As mentioned earlier:
|
|
…for_dynamic_library()` (NVIDIA#834) * Use RTLD_DI_LINKMAP in abs_path_for_dynamic_library(), to eliminate need for EXPECTED_LIB_SYMBOLS * load_dl_linux.py: factor out get_candidate_sonames() and also use from check_if_already_loaded_from_elsewhere(), for consistency with load_with_system_search() * load_dl_windows.py: avoid unnecessary function-level imports (this was just an oversight) * Bump cuda-pathfinder version to `1.1.1a1` * Harden/polish the new abs_path_for_dynamic_library() implementation. * Eliminate `class LinkMap(ctypes.Structure)` and rely purely on pointer arithmetic instead. Use `os.fsdecode()` instead of `l_name.decode()` to avoid `UnicodeDecodeError` * Ensure `_dl_last_error()` does not raise `UnicodeDecodeError` * Add `validate_abs_path()` in test_load_nvidia_dynamic_lib.py * Change back to more intutive approach, using `_LinkMapLNameView` as name. Explain safety constraints in depth. * l_origin + basename(l_name) → abs_path
…and #855 Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
* Initial plan * Create Sphinx documentation infrastructure for cuda.pathfinder Co-authored-by: leofang <5534781+leofang@users.noreply.github.com> * Fix title underline and test pathfinder documentation build Co-authored-by: leofang <5534781+leofang@users.noreply.github.com> * Address all review feedback - fix references, remove unnecessary files, populate API docs Co-authored-by: leofang <5534781+leofang@users.noreply.github.com> * Add cuda-pathfinder documentation link to CI doc preview Co-authored-by: leofang <5534781+leofang@users.noreply.github.com> * [pre-commit.ci] auto code formatting * Update cuda-core and cuda-bindings README to reference nv-versions.json Co-authored-by: leofang <5534781+leofang@users.noreply.github.com> * Refactor pathfinder documentation based on review feedback - Remove lines 7-14 from api.rst (cuda.pathfinder automodule section) - Convert api.rst to use autosummary instead of direct autodoc directives following cuda-core pattern - Convert contribute.md to contribute.rst in ReST format - Remove _templates/main.html file as it's no longer needed - Update index.rst to reference contribute.rst instead of contribute.md Co-authored-by: leofang <5534781+leofang@users.noreply.github.com> * Add release notes for cuda-pathfinder versions 1.0.0 and 1.1.0 - Created release directory structure under cuda_pathfinder/docs/source/release/ - Added 1.0.0-notes.rst with initial release highlights - Added 1.1.0-notes.rst with CTK 13.0.0 compatibility and bug fixes - Added release.rst index file to organize release notes - Updated index.rst to include release notes in navigation - Follows established documentation patterns from cuda-core and cuda-bindings Co-authored-by: leofang <5534781+leofang@users.noreply.github.com> * Fix SPDX license identifiers and add 1.1.1 release notes for PRs #834 and #855 Co-authored-by: leofang <5534781+leofang@users.noreply.github.com> * Restore _templates/main.html file as requested in review feedback Co-authored-by: leofang <5534781+leofang@users.noreply.github.com> * Change format as requested by Leo #884 (comment) * Remove stray cuda/pathfinder/README.md URL in cuda_python/docs/source/index.rst * Rename release 1.1.1 to 1.X.Y * Add version 1.0.0 in cuda_pathfinder/docs/nv-versions.json * Remove unused cuda_pathfinder/docs/make.bat * Revert "Add version 1.0.0 in cuda_pathfinder/docs/nv-versions.json" This reverts commit d096d21. * Reduce divergence between cuda_bindings/docs/source/contribute.rst and cuda_pathfinder/docs/source/contribute.rst * New pre-commit fixes (related to PR #901) * Also remove version 1.1.0 from cuda_pathfinder/docs/nv-versions.json * Reduce cuda/pathfinder/README.md to a mere pointer to the sphinx-generated documentation. * Add the Search order section from the old README as a new section in the load_nvidia_dynamic_lib() docstring. * Leo's edits to new part of load_nvidia_dynamic_lib docstring Co-authored-by: Leo Fang <leof@nvidia.com> * Add more empty lines in load_nvidia_dynamic_lib docstring * Remove `**` around Linux, Windows (for consistency) * Fix existing (on main) pre-commit error * Add `*/docs/source/generated/` to .gitignore * Add toolshed/setup-docs-env.sh --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: leofang <5534781+leofang@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Leo Fang <leof@nvidia.com> Co-authored-by: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com> Co-authored-by: Ralf W. Grosse-Kunstleve <rwgkio@gmail.com>
Closes #833
Also a step towards resolving #776 — please see comments there.
This PR eliminates
supported_nvidia_libs.EXPECTED_LIB_SYMBOLSentirely, which is a major simplification in its own right.Piggy-backed: Minor fix (commit d13ad8e) and cleanup (commit e716f1c).