ROCR/Driver: add global driver for the hsakmt APIs that cannot index an agent #334

Closed
rocm-devops wants to merge 5 commits into amd-staging from honghuan/p_global_driver

@rocm-devops

The purpose of this PR is for discussion rather than review.

Many hsakmt calls cannot be indexed by agent: they are entirely global, and some of them are static methods in ROCR. Examples include the event operations, many of the IPC APIs, and the pointer info API.

This PR is split out from https://github.com/AMD-ROCm-Internal/ROCR-Runtime/pull/233, because the VIRTIO driver can work without these APIs, and as far as I can see, the design of these drivers needs much more discussion and refactoring.

Add a global driver function table to centralize event-related operations:

1. Add DriverFunctionTable structure to store event operation function pointers
2. Add static helper methods in Driver base class for event operations:
   - CreateEvent
   - DestroyEvent
   - SetEvent
   - WaitEventExt
   - WaitEventsExt
3. Implement InitGlobalDriver in KfdDriver to initialize the function table
4. Add stub implementation in XdnaDriver

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
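The event function table described above could look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the PR's code: `DriverStatus` and `DriverEvent` are hypothetical stand-ins for the hsakmt status and event types, the table field names and signatures are invented, `WaitEventsExt` is omitted for brevity, and the KFD entries are mocks rather than real hsakmt calls.

```cpp
#include <cstdint>

// Hypothetical stand-ins for hsakmt types; real names and signatures differ.
enum class DriverStatus { kSuccess, kNotSupported, kError };
struct DriverEvent {
  bool signaled = false;
};

// Event entry points for the global (non-agent-indexed) hsakmt calls.
struct DriverFunctionTable {
  DriverStatus (*create_event)(DriverEvent** event) = nullptr;
  DriverStatus (*destroy_event)(DriverEvent* event) = nullptr;
  DriverStatus (*set_event)(DriverEvent* event) = nullptr;
  DriverStatus (*wait_event_ext)(DriverEvent* event, std::uint32_t timeout_ms) = nullptr;
};

class Driver {
 public:
  virtual ~Driver() = default;

  // Static helpers: callers need neither an agent nor a driver instance.
  // An unset entry reports kNotSupported.
  static DriverStatus CreateEvent(DriverEvent** event) {
    return table_.create_event ? table_.create_event(event) : DriverStatus::kNotSupported;
  }
  static DriverStatus DestroyEvent(DriverEvent* event) {
    return table_.destroy_event ? table_.destroy_event(event) : DriverStatus::kNotSupported;
  }
  static DriverStatus SetEvent(DriverEvent* event) {
    return table_.set_event ? table_.set_event(event) : DriverStatus::kNotSupported;
  }
  static DriverStatus WaitEventExt(DriverEvent* event, std::uint32_t timeout_ms) {
    return table_.wait_event_ext ? table_.wait_event_ext(event, timeout_ms)
                                 : DriverStatus::kNotSupported;
  }

 protected:
  static DriverFunctionTable table_;
};

DriverFunctionTable Driver::table_{};

class KfdDriver : public Driver {
 public:
  // Mock entries; a real KFD driver would forward to hsakmt here.
  static void InitGlobalDriver() {
    table_.create_event = [](DriverEvent** event) {
      *event = new DriverEvent();
      return DriverStatus::kSuccess;
    };
    table_.destroy_event = [](DriverEvent* event) {
      delete event;
      return DriverStatus::kSuccess;
    };
    table_.set_event = [](DriverEvent* event) {
      event->signaled = true;
      return DriverStatus::kSuccess;
    };
    table_.wait_event_ext = [](DriverEvent* event, std::uint32_t) {
      // Non-blocking mock: succeed only if the event is already signaled.
      return event->signaled ? DriverStatus::kSuccess : DriverStatus::kError;
    };
  }
};

class XdnaDriver : public Driver {
 public:
  // Stub: registers no event entry points.
  static void InitGlobalDriver() {}
};
```

Captureless lambdas convert implicitly to plain function pointers, which keeps the table a simple aggregate that the static helpers can dispatch through.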
Add virtual memory management functions to the driver interface to improve
memory allocation and deallocation operations. The changes include:

- Add VirtualAddressReserve and VirtualAddressFree static functions
- Add alloc_mem_align and free_memory function pointers to DriverFunctionTable
- Update Runtime to use the new driver interface for virtual memory operations
- Replace direct KMT calls with driver abstraction layer functions

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
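A similar sketch for the virtual-memory commit, again hypothetical: the `alloc_mem_align` and `free_memory` field names come from the commit message, but their signatures, the `DriverStatus` enum, and the `MockVaSpace` helper are assumptions. The mock hands out aligned addresses from a fake range and only records reservations; it never maps real memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

enum class DriverStatus { kSuccess, kNotSupported, kInvalidArgument };

// Field names from the commit message; signatures are assumed.
struct DriverFunctionTable {
  DriverStatus (*alloc_mem_align)(void** address, std::size_t size, std::size_t alignment) = nullptr;
  DriverStatus (*free_memory)(void* address, std::size_t size) = nullptr;
};

class Driver {
 public:
  static DriverStatus VirtualAddressReserve(void** address, std::size_t size,
                                            std::size_t alignment) {
    if (!table_.alloc_mem_align) return DriverStatus::kNotSupported;
    return table_.alloc_mem_align(address, size, alignment);
  }
  static DriverStatus VirtualAddressFree(void* address, std::size_t size) {
    if (!table_.free_memory) return DriverStatus::kNotSupported;
    return table_.free_memory(address, size);
  }

 protected:
  static inline DriverFunctionTable table_{};
};

// Mock VA space: bump-allocates aligned fake addresses and tracks them.
class MockVaSpace {
 public:
  static DriverStatus Reserve(void** address, std::size_t size, std::size_t alignment) {
    // Alignment must be a non-zero power of two.
    if (size == 0 || alignment == 0 || (alignment & (alignment - 1)) != 0)
      return DriverStatus::kInvalidArgument;
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment - 1);
    const std::uintptr_t base = (next_ + mask) & ~mask;  // round up to alignment
    next_ = base + size;
    reservations_[base] = size;
    *address = reinterpret_cast<void*>(base);
    return DriverStatus::kSuccess;
  }
  static DriverStatus Free(void* address, std::size_t size) {
    auto it = reservations_.find(reinterpret_cast<std::uintptr_t>(address));
    if (it == reservations_.end() || it->second != size) return DriverStatus::kInvalidArgument;
    reservations_.erase(it);
    return DriverStatus::kSuccess;
  }
  static std::size_t Count() { return reservations_.size(); }

 private:
  static inline std::uintptr_t next_ = 0x100000;
  static inline std::map<std::uintptr_t, std::size_t> reservations_;
};

class KfdDriver : public Driver {
 public:
  static void InitGlobalDriver() {
    table_.alloc_mem_align = &MockVaSpace::Reserve;
    table_.free_memory = &MockVaSpace::Free;
  }
};
```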
Add IPC and graphics memory management functions to improve memory sharing
and interoperability between processes and devices. The changes include:

- Add ShareableMemoryUnmap and ShareableMemoryDeregister functions
- Add RegisterGraphicsHandle and RegisterGraphicsHandleExt functions
- Add driver type compatibility checks for IPC operations
- Update InteropMap and IPC functions to use driver interface
- Replace direct KMT calls with driver abstraction layer functions

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
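One way a driver-type compatibility check for IPC could look, assuming (hypothetically) that each shareable-memory handle records which driver type exported it and that only KFD implements the IPC path. `IpcHandle`, `DriverType`, and `SetGlobalDriverType` are illustrative names, not ROCR's.

```cpp
#include <cstdint>

enum class DriverType { KFD, XDNA, VIRTIO };
enum class DriverStatus { kSuccess, kNotSupported, kIncompatibleDriver };

// Hypothetical IPC handle that remembers which driver type exported it.
struct IpcHandle {
  DriverType exporter;
  std::uint64_t id;
};

class Driver {
 public:
  static void SetGlobalDriverType(DriverType type) { global_type_ = type; }

  // Reject handles exported by a different driver type before forwarding.
  static DriverStatus ShareableMemoryDeregister(const IpcHandle& handle) {
    if (handle.exporter != global_type_) return DriverStatus::kIncompatibleDriver;
    // In this sketch only KFD implements IPC; a real KFD path would call hsakmt here.
    if (global_type_ != DriverType::KFD) return DriverStatus::kNotSupported;
    return DriverStatus::kSuccess;
  }

 private:
  static inline DriverType global_type_ = DriverType::KFD;
};
```

Checking the exporter before dispatch turns a cross-driver mix-up into an explicit error instead of an opaque failure inside the kernel driver.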
Add functions to query pointer information and manage memory user data:

- Add query_pointer_info function to retrieve memory pointer information
- Add set_memory_user_data function to set custom metadata for memory
- Register new functions in driver function table

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
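A rough sketch of registering the pointer-info entries in the table. The `query_pointer_info` and `set_memory_user_data` field names come from the commit message; the `QueryPointerInfo` and `SetMemoryUserData` helper names, the reduced `PointerInfo` struct, and the `MockAllocations` backend are hypothetical. The mock resolves any pointer that falls inside a tracked range, which is the behavior a pointer-info query generally needs.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

enum class DriverStatus { kSuccess, kNotSupported, kInvalidArgument };

// Hypothetical subset of pointer info; the real hsakmt structure has more fields.
struct PointerInfo {
  const void* base = nullptr;
  std::size_t size = 0;
  void* user_data = nullptr;
};

struct DriverFunctionTable {
  DriverStatus (*query_pointer_info)(const void* ptr, PointerInfo* info) = nullptr;
  DriverStatus (*set_memory_user_data)(const void* ptr, void* user_data) = nullptr;
};

class Driver {
 public:
  static DriverStatus QueryPointerInfo(const void* ptr, PointerInfo* info) {
    return table_.query_pointer_info ? table_.query_pointer_info(ptr, info)
                                     : DriverStatus::kNotSupported;
  }
  static DriverStatus SetMemoryUserData(const void* ptr, void* user_data) {
    return table_.set_memory_user_data ? table_.set_memory_user_data(ptr, user_data)
                                       : DriverStatus::kNotSupported;
  }

 protected:
  static inline DriverFunctionTable table_{};
};

// Mock allocation tracker keyed by base address.
class MockAllocations {
 public:
  static void Track(const void* base, std::size_t size) {
    ranges_[Addr(base)] = PointerInfo{base, size, nullptr};
  }
  static DriverStatus Query(const void* ptr, PointerInfo* info) {
    PointerInfo* found = Find(ptr);
    if (found == nullptr) return DriverStatus::kInvalidArgument;
    *info = *found;
    return DriverStatus::kSuccess;
  }
  static DriverStatus SetUserData(const void* ptr, void* user_data) {
    PointerInfo* found = Find(ptr);
    if (found == nullptr) return DriverStatus::kInvalidArgument;
    found->user_data = user_data;
    return DriverStatus::kSuccess;
  }

 private:
  static std::uintptr_t Addr(const void* p) { return reinterpret_cast<std::uintptr_t>(p); }
  static PointerInfo* Find(const void* ptr) {
    auto it = ranges_.upper_bound(Addr(ptr));  // first range starting after ptr
    if (it == ranges_.begin()) return nullptr;
    --it;                                       // last range starting at or before ptr
    return Addr(ptr) < it->first + it->second.size ? &it->second : nullptr;
  }
  static inline std::map<std::uintptr_t, PointerInfo> ranges_;
};

class KfdDriver : public Driver {
 public:
  static void InitGlobalDriver() {
    table_.query_pointer_info = &MockAllocations::Query;
    table_.set_memory_user_data = &MockAllocations::SetUserData;
  }
};
```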
Add Shared Virtual Memory (SVM) attribute management functions to the Driver
class to improve code organization and encapsulate driver-specific operations:

- Add SVMSetAttr and SVMGetAttr static functions to Driver class
- Register SVM attribute functions in KfdDriver's global function table
- Update runtime to use Driver interface for SVM operations
- Add proper documentation for SVM interface functions
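The SVM attribute commit follows the same pattern. In this hypothetical sketch, `SVMSetAttr` and `SVMGetAttr` are the names from the commit message, while the `SvmAttrType` keys, the (type, value) attribute layout, the `svm_set_attr`/`svm_get_attr` table fields, and the `MockSvm` backend are all assumptions; the real hsakmt SVM attribute set differs.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

enum class DriverStatus { kSuccess, kNotSupported, kInvalidArgument };

// Hypothetical attribute keys.
enum class SvmAttrType : std::uint32_t { kPreferredLocation, kAccessBy, kGranularity };

struct SvmAttribute {
  SvmAttrType type;
  std::uint32_t value;
};

struct DriverFunctionTable {
  DriverStatus (*svm_set_attr)(void* start, std::size_t size,
                               const SvmAttribute* attrs, std::size_t count) = nullptr;
  DriverStatus (*svm_get_attr)(void* start, std::size_t size,
                               SvmAttribute* attrs, std::size_t count) = nullptr;
};

class Driver {
 public:
  // Set the listed attributes on [start, start + size).
  static DriverStatus SVMSetAttr(void* start, std::size_t size,
                                 const SvmAttribute* attrs, std::size_t count) {
    return table_.svm_set_attr ? table_.svm_set_attr(start, size, attrs, count)
                               : DriverStatus::kNotSupported;
  }
  // Fill in the value of each requested attribute type.
  static DriverStatus SVMGetAttr(void* start, std::size_t size,
                                 SvmAttribute* attrs, std::size_t count) {
    return table_.svm_get_attr ? table_.svm_get_attr(start, size, attrs, count)
                               : DriverStatus::kNotSupported;
  }

 protected:
  static inline DriverFunctionTable table_{};
};

// Mock backend: stores attributes per exact (start, size) range.
class MockSvm {
 public:
  static DriverStatus Set(void* start, std::size_t size,
                          const SvmAttribute* attrs, std::size_t count) {
    if (start == nullptr || size == 0) return DriverStatus::kInvalidArgument;
    auto& range = ranges_[Key(start, size)];
    for (std::size_t i = 0; i < count; ++i) range[attrs[i].type] = attrs[i].value;
    return DriverStatus::kSuccess;
  }
  static DriverStatus Get(void* start, std::size_t size,
                          SvmAttribute* attrs, std::size_t count) {
    auto range = ranges_.find(Key(start, size));
    if (range == ranges_.end()) return DriverStatus::kInvalidArgument;
    for (std::size_t i = 0; i < count; ++i) {
      auto attr = range->second.find(attrs[i].type);
      if (attr == range->second.end()) return DriverStatus::kInvalidArgument;
      attrs[i].value = attr->second;
    }
    return DriverStatus::kSuccess;
  }

 private:
  using RangeKey = std::pair<std::uintptr_t, std::size_t>;
  static RangeKey Key(void* start, std::size_t size) {
    return {reinterpret_cast<std::uintptr_t>(start), size};
  }
  static inline std::map<RangeKey, std::map<SvmAttrType, std::uint32_t>> ranges_;
};

class KfdDriver : public Driver {
 public:
  static void InitGlobalDriver() {
    table_.svm_set_attr = &MockSvm::Set;
    table_.svm_get_attr = &MockSvm::Get;
  }
};
```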
@rocm-devops (Author)

I'd say that all these functions (maybe the virtual memory reservation does not fall in this) need to be member functions of the driver instance, rather than static functions of the class. E.g., I'd expect that someone will need events in XDNA and other drivers.
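The alternative suggested here, events as virtual member functions of each driver instance, might look like this minimal sketch. The types, signatures, and mock KFD overrides are hypothetical. Drivers without event support inherit a `kNotSupported` default, and any driver can later override the methods without touching state shared with other drivers.

```cpp
enum class DriverStatus { kSuccess, kNotSupported };

struct DriverEvent {
  bool signaled = false;
};

class Driver {
 public:
  virtual ~Driver() = default;

  // Defaults for drivers that do not (yet) support events.
  virtual DriverStatus CreateEvent(DriverEvent** /*event*/) { return DriverStatus::kNotSupported; }
  virtual DriverStatus SetEvent(DriverEvent* /*event*/) { return DriverStatus::kNotSupported; }
  virtual DriverStatus DestroyEvent(DriverEvent* /*event*/) { return DriverStatus::kNotSupported; }
};

class KfdDriver final : public Driver {
 public:
  // Mock overrides; a real implementation would forward to hsakmt.
  DriverStatus CreateEvent(DriverEvent** event) override {
    *event = new DriverEvent();
    return DriverStatus::kSuccess;
  }
  DriverStatus SetEvent(DriverEvent* event) override {
    event->signaled = true;
    return DriverStatus::kSuccess;
  }
  DriverStatus DestroyEvent(DriverEvent* event) override {
    delete event;
    return DriverStatus::kSuccess;
  }
};

// Inherits the defaults; XDNA could override them later without
// affecting any other driver.
class XdnaDriver final : public Driver {};
```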

@rocm-devops (Author)

> I'd say that all these functions (maybe the virtual memory reservation does not fall in this) need to be member functions of the driver instance, rather than static functions of the class. E.g., I'd expect that someone will need events in XDNA and other drivers.

Yes, I agree. Since we're using C++, we can just virtualize the Driver API so that each driver provides its own implementation (here a driver is distinct from an agent because, e.g., multiple drivers can support the same agent, such as CPU agents).

But what I had started to design and implement was something like a driver manager, which I called the PlatformDriver. It would contain all the active driver instantiations and have its own API separate from Driver (the actual driver instantiations), so that, for example, live XDNA, KFD, and VIRTIO (for GPU) drivers could all be managed by the PlatformDriver class.

As a first step, I think topology and driver discovery should move to the PlatformDriver class.

I envision the Driver / platform driver being a distinct sub-module from the Runtime, which is why I started it in the core/driver dir as opposed to core/runtime.
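A rough sketch of the PlatformDriver idea under explicit assumptions: `Probe()` is a hypothetical discovery hook, `FakeDriver` stands in for real KFD, XDNA, and VIRTIO drivers, and the lookup-by-type API is invented for illustration. The manager owns every driver instance that probes successfully, independently of agents.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

enum class DriverType { KFD, XDNA, VIRTIO };

class Driver {
 public:
  explicit Driver(DriverType type) : type_(type) {}
  virtual ~Driver() = default;
  DriverType type() const { return type_; }
  // Hypothetical discovery hook, e.g. "is the device node present?".
  virtual bool Probe() const = 0;

 private:
  DriverType type_;
};

// Stand-in for a real driver; Probe() returns a fixed answer.
class FakeDriver final : public Driver {
 public:
  FakeDriver(DriverType type, bool present) : Driver(type), present_(present) {}
  bool Probe() const override { return present_; }

 private:
  bool present_;
};

// Owns every live driver instance.
class PlatformDriver {
 public:
  // Keep only the candidates whose device is present.
  void Discover(std::vector<std::unique_ptr<Driver>> candidates) {
    for (auto& driver : candidates) {
      if (driver && driver->Probe()) drivers_.push_back(std::move(driver));
    }
  }

  // Return the first live driver of the given type, or nullptr.
  Driver* Find(DriverType type) const {
    for (const auto& driver : drivers_) {
      if (driver->type() == type) return driver.get();
    }
    return nullptr;
  }

  std::size_t size() const { return drivers_.size(); }

 private:
  std::vector<std::unique_ptr<Driver>> drivers_;
};
```

Keeping discovery in `PlatformDriver` rather than in the runtime matches the idea of the driver layer being its own sub-module.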

@rocm-devops (Author)

> I'd say that all these functions (maybe the virtual memory reservation does not fall in this) need to be member functions of the driver instance, rather than static functions of the class. E.g., I'd expect that someone will need events in XDNA and other drivers.

Totally agreed. The VIRTIO driver cannot use events, and given how difficult refactoring the event code would be, I will not make any changes to these driver functions. This PR is only for discussion.

@rocm-devops (Author)

> > I'd say that all these functions (maybe the virtual memory reservation does not fall in this) need to be member functions of the driver instance, rather than static functions of the class. E.g., I'd expect that someone will need events in XDNA and other drivers.
>
> Yes, I agree. Since we're using C++, we can just virtualize the Driver API so that each driver provides its own implementation (here a driver is distinct from an agent because, e.g., multiple drivers can support the same agent, such as CPU agents).
>
> But what I had started to design and implement was something like a driver manager, which I called the PlatformDriver. It would contain all the active driver instantiations and have its own API separate from Driver (the actual driver instantiations), so that, for example, live XDNA, KFD, and VIRTIO (for GPU) drivers could all be managed by the PlatformDriver class.
>
> As a first step, I think topology and driver discovery should move to the PlatformDriver class.
>
> I envision the Driver / platform driver being a distinct sub-module from the Runtime, which is why I started it in the core/driver dir as opposed to core/runtime.

Totally agreed; I really hope to see these implementations.

@rocm-devops (Author)

> > I'd say that all these functions (maybe the virtual memory reservation does not fall in this) need to be member functions of the driver instance, rather than static functions of the class. E.g., I'd expect that someone will need events in XDNA and other drivers.
>
> Totally agreed. The VIRTIO driver cannot use events, and given how difficult refactoring the event code would be, I will not make any changes to these driver functions. This PR is only for discussion.

I wouldn't expect an implementation of events for VIRTIO or XDNA. I just don't think that this static table in the base class, which will be shared among all driver subclasses, is the correct approach.
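To make this objection concrete, here is a minimal sketch with hypothetical names: a static table declared in the base class is a single object shared by `Driver` and every subclass, so whichever driver initializes it last silently overwrites the other's entries.

```cpp
struct DriverFunctionTable {
  const char* (*name)() = nullptr;
};

class Driver {
 public:
  static const char* Name() { return table_.name ? table_.name() : "none"; }

 protected:
  // One static object shared by Driver and every subclass.
  static inline DriverFunctionTable table_{};
};

class KfdDriver : public Driver {
 public:
  static void InitGlobalDriver() {
    table_.name = [] { return "kfd"; };
  }
};

class XdnaDriver : public Driver {
 public:
  // Writes to the same table_ as KfdDriver, replacing its entry.
  static void InitGlobalDriver() {
    table_.name = [] { return "xdna"; };
  }
};
```

After both `KfdDriver::InitGlobalDriver()` and `XdnaDriver::InitGlobalDriver()` run, `KfdDriver::Name()` returns `"xdna"`: the subclasses cannot hold independent entries, which is why per-instance virtual functions fit a multi-driver process better.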

@amd-hsivasun

Imported to rocm-systems

@jayhawk-commits deleted the honghuan/p_global_driver branch on August 11, 2025 20:56

3 participants