* Cythonize _module.py (Phase 2a: cdef classes)
Convert Kernel, ObjectCode, and KernelOccupancy to cdef classes with
proper .pxd declarations. This phase establishes the Cython structure
while maintaining Python driver module usage.
Changes:
- Rename _module.py to _module.pyx
- Create _module.pxd with cdef class declarations
- Convert Kernel, ObjectCode, KernelOccupancy to cdef class
- Remove _backend dict in favor of direct driver calls
- Add _init_py() Python-accessible factory for ObjectCode
- Update _program.py and _linker.py to use _init_py()
- Fix test to handle cdef class property descriptors
Phase 2b will convert driver calls to cydriver with nogil blocks.
Phase 2c will add RAII handles to resource_handles.
* Phase 2a refinements: hide private attrs, add public properties
- Use strong types in .pxd (ObjectCode, KernelOccupancy)
- Remove cdef public; attributes are now private to the C level
- Add Kernel.handle property for external access
- Add ObjectCode.symbol_mapping property (symmetric with input)
- Update _launcher.pyx, _linker.py, tests to use public APIs
* Convert module-level functions and Kernel._get_arguments_info to cdef
- Module globals: _inited, _py_major_ver, _py_minor_ver, _driver_ver,
  _kernel_ctypes, _paraminfo_supported -> cdef typed
- Module functions: _lazy_init, _get_py_major_ver, _get_py_minor_ver,
  _get_driver_ver, _get_kernel_ctypes, _is_paraminfo_supported,
  _make_dummy_library_handle -> cdef inline with exception specs
- Module constant: _supported_code_type -> cdef tuple
- Kernel._get_arguments_info -> cdef tuple
Note: KernelAttributes remains a regular Python class due to
segfaults when converted to cdef class (likely due to weakref
interaction with cdef class properties).
* Convert KernelAttributes to cdef class
Follow the _MemPoolAttributes pattern:
- cdef class with inline cdef attributes (_kernel_weakref, _cache)
- _init as @classmethod (not @staticmethod cdef)
- _get_cached_attribute and _resolve_device_id use except? -1
- Explicit cast when dereferencing weakref
* Add LibraryHandle and KernelHandle to resource_handles infrastructure
Extends the RAII handle system to support CUlibrary and CUkernel driver
objects used in _module.pyx. This provides automatic lifetime management
and proper cleanup for library and kernel handles.
Changes:
- Add LibraryHandle/KernelHandle types with factory functions
- Update Kernel, ObjectCode, KernelOccupancy to use typed handles
- Move KernelAttributes cdef block to .pxd for strong typing
- Update _launcher.pyx to access kernel handle directly via cdef
* Convert _module.pyx driver calls to cydriver with nogil
Replaces Python-level driver API calls with low-level cydriver calls
wrapped in nogil blocks for improved performance. This allows the GIL
to be released during CUDA driver operations.
Changes:
- Convert cuDriverGetVersion, cuKernelGetAttribute, cuKernelGetParamInfo
- Convert cuOccupancy* functions (with appropriate GIL handling for callbacks)
- Convert cuKernelGetLibrary
- Update KernelAttributes._get_cached_attribute to use cydriver types
* Fix SEGV in Kernel.from_handle with non-int types
Remove type annotation from handle parameter to prevent Cython's
automatic float-to-int coercion, which caused a segmentation fault.
The manual isinstance check properly validates all non-int types.
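For illustration only, the manual check can be sketched in pure Python.
The name validate_handle is hypothetical and is not part of the module;
the real method is written in Cython and may differ in detail. The point
is that an untyped parameter reaches the isinstance test unchanged, so a
float is rejected instead of being coerced.

```python
def validate_handle(handle):
    """Hypothetical helper: accept only Python ints as raw handle values."""
    # With no type annotation, the argument arrives exactly as the caller
    # passed it, so the check below sees floats, strings, None, and so on.
    if not isinstance(handle, int):
        raise TypeError(
            f"handle must be an int, got {type(handle).__name__}"
        )
    return handle


if __name__ == "__main__":
    print(validate_handle(0x1234))
    for bad in (1.5, "0x1234", None):
        try:
            validate_handle(bad)
        except TypeError as exc:
            print("rejected:", exc)
```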
* Refactor ObjectCode._init and add kernel lifetime test
- Change ObjectCode._init from cdef to @classmethod def, matching the
  pattern used by Buffer, Stream, Graph, and other classes
- Remove _init_py wrapper (no longer needed)
- Update callers in _program.py and _linker.py
- Add test_kernel_keeps_library_alive to verify that a Kernel keeps its
  underlying library alive after ObjectCode goes out of scope
* Simplify resource handle patterns and clean up tests
- Remove Kernel._module (ObjectCode reference no longer needed since
  KernelHandle keeps library alive via LibraryHandle dependency)
- Simplify Kernel._from_obj signature (remove unused ObjectCode param)
- KernelAttributes: store KernelHandle instead of weakref to Kernel
- Rename get_kernel_from_library to create_kernel_handle for consistency
- Remove fragile annotation introspection from test_saxpy_arguments
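The ownership chain described above can be modeled in plain Python as a
rough sketch. All class names here (FakeLibraryHandle, FakeKernelHandle,
FakeObjectCode) are hypothetical stand-ins. The real handles are C++
shared_ptr-based objects, and a Python strong reference is only an
analogy for that dependency.

```python
import gc
import weakref


class FakeLibraryHandle:
    """Hypothetical stand-in for a loaded library."""


class FakeKernelHandle:
    """Hypothetical stand-in for a kernel; it owns a reference to its library."""

    def __init__(self, library, name):
        self.library = library  # strong reference models the dependency
        self.name = name


class FakeObjectCode:
    """Hypothetical stand-in for the object that loads the library."""

    def __init__(self):
        self._library = FakeLibraryHandle()

    def get_kernel(self, name):
        return FakeKernelHandle(self._library, name)


def demo():
    code = FakeObjectCode()
    probe = weakref.ref(code._library)  # observes the library without owning it
    kernel = code.get_kernel("saxpy")

    del code  # the object code goes out of scope
    gc.collect()
    alive_while_kernel_exists = probe() is not None

    del kernel  # the last owner goes away
    gc.collect()
    alive_after_kernel_deleted = probe() is not None
    return alive_while_kernel_exists, alive_after_kernel_deleted


if __name__ == "__main__":
    print(demo())
```

In this model the kernel needs no back-reference to the object code,
because the library's lifetime is tied to the kernel handle itself.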
* Simplify _MemPoolAttributes to use direct MemoryPoolHandle
Replace weakref pattern with direct MemoryPoolHandle storage in
_MemPoolAttributes. The handle's shared_ptr keeps the underlying
pool alive, so attributes remain accessible after the MR is deleted.
Note: _MemPool retains __weakref__ because the IPC subsystem uses
WeakValueDictionary to track memory resources across processes.
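As a loose pure-Python analogy for why __weakref__ has to stay, the sketch
below uses __slots__ to mimic a class without weak-reference support. The
class names and the track helper are hypothetical, and the real IPC
registry may be organized differently.

```python
import gc
import weakref


class ResourceWithoutWeakref:
    """Hypothetical resource that cannot be weakly referenced."""

    __slots__ = ("uuid",)

    def __init__(self, uuid):
        self.uuid = uuid


class ResourceWithWeakref:
    """Hypothetical resource that opts in to weak references."""

    __slots__ = ("uuid", "__weakref__")

    def __init__(self, uuid):
        self.uuid = uuid


def track(registry, resource):
    # The registry stores only a weak reference to the resource.
    registry[resource.uuid] = resource


def demo():
    registry = weakref.WeakValueDictionary()

    try:
        track(registry, ResourceWithoutWeakref("a"))
        rejected = False
    except TypeError:
        rejected = True  # no __weakref__ slot, so it cannot be tracked

    resource = ResourceWithWeakref("b")
    track(registry, resource)
    tracked = "b" in registry

    del resource  # the registry does not keep the resource alive
    gc.collect()
    dropped = "b" not in registry
    return rejected, tracked, dropped


if __name__ == "__main__":
    print(demo())
```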
* Fix access violation in occupancy queries with uninitialized hStream
Zero-initialize CUlaunchConfig struct to prevent garbage values in
hStream field when no stream is provided. The driver dereferences
hStream even when querying occupancy, causing access violations on
some platforms (observed on Windows with RTX Pro 6000).