Refactor logging_resource_adaptor to shared CCCL MR design #2246

rockhowse merged 8 commits into rapidsai:staging from
Conversation
Convert logging_resource_adaptor from a templated, header-only class to a non-templated class using cuda::mr::shared_resource for reference-counted ownership. Implementation is now compiled into librmm.so.

Key changes:
- Remove template parameter (was `logging_resource_adaptor<Upstream>`)
- Remove `device_memory_resource` inheritance, use CCCL resource concepts
- Add `logging_resource_adaptor_impl` in `rmm::mr::detail` namespace
- Remove `Upstream*` constructor overloads (use `device_async_resource_ref`)
- Store upstream as `any_resource` for owning semantics
- Equality now compares both upstream AND logger
- Class is now copyable with shared ownership semantics

This is a breaking API change for C++ users.
Temporarily keep device_memory_resource as a base class so existing Python/Cython bindings continue to work. The legacy compatibility layer includes using declarations to resolve ambiguity between base class methods and private do_allocate/do_deallocate overrides that forward to the shared_resource implementation.
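To illustrate the ambiguity being resolved, here is a minimal, self-contained sketch (the `legacy_base`, `shared_impl`, and `adaptor` names are hypothetical stand-ins, not the real RMM classes): two bases both declare `allocate`/`deallocate`, and `using` declarations select which overload set the derived class exposes.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins: legacy_base models device_memory_resource,
// shared_impl models the shared_resource side; both expose allocate/deallocate.
struct legacy_base {
  void* allocate(std::size_t bytes) { return std::malloc(bytes); }
  void deallocate(void* ptr, std::size_t) { std::free(ptr); }
};

struct shared_impl {
  void* allocate(std::size_t bytes) { return std::malloc(bytes); }
  void deallocate(void* ptr, std::size_t) { std::free(ptr); }
};

struct adaptor : legacy_base, shared_impl {
  // Without these, adaptor{}.allocate(16) is ambiguous because both bases
  // declare the name; the using declarations pick one overload set.
  using shared_impl::allocate;
  using shared_impl::deallocate;
};
```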
Remove template parameter from Cython declarations and usage since logging_resource_adaptor is no longer a class template.
```cpp
 * @throws rmm::logic_error if `upstream == nullptr`
 * This property declares that a `logging_resource_adaptor_impl` provides device accessible memory
 */
friend void get_property(logging_resource_adaptor_impl const&,
```
This should take the properties from the upstream. I can't remember exactly what utility cccl provides for that though
Yes, that’s a good catch. Until now, RMM had to lie and say every resource (adaptor) was device accessible. Now that we are storing an any_resource we can actually query its properties at runtime and forward those through.
This is tricky. We have to make a decision. Do we:

- Declare that all resources/adaptors are `cuda::mr::device_accessible` and hardcode that. This is the current strategy and it lets us avoid templates, lets us compile more into `librmm.so`, and is friendlier to Cython.
- Forward `<Properties...>` through the whole chain of `logging_resource_adaptor` -> `impl` -> `upstream_` member and keep it templated and header-only. For Cython we'd still have to declare everything as device-accessible because I don't think it is in scope for us to have device and host and device-host versions of every Cython class (no template support).
I think there is an "option 3" which is that we declare no resource properties and just keep any_resource<> everywhere to make it friendly to any resource type (albeit losing the static knowledge of accessibility). I can't recall if this would be fully compatible with CCCL today or if it requires more work in CCCL -- I discussed this a bit with @pciolkosz but don't remember the status quo.
Update: Option 3 isn't possible at the moment because we have many APIs requiring device_async_resource_ref which enforces the cuda::mr::device_accessible property. I think we kick this can down the road, at least until I'm able to get rid of device_memory_resource and move everything to CCCL resource (ref) classes rather than RMM wrappers.
I think option (1), declaring everything must be cuda::mr::device_accessible, is the best choice we have for the immediate term.
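As a compile-time sketch of option 1, with a stand-in tag type in place of `cuda::mr::device_accessible` (the real CCCL property is likewise found via ADL), the adaptor unconditionally advertises the property through a hidden friend `get_property`:

```cpp
#include <type_traits>
#include <utility>

struct device_accessible {};  // stand-in for cuda::mr::device_accessible

class adaptor_sketch {
  // Option 1: unconditionally advertise the property, regardless of what
  // the upstream resource actually supports.
  friend void get_property(adaptor_sketch const&, device_accessible) noexcept {}
};

// Detect the property via ADL: the friend is only visible through
// argument-dependent lookup on adaptor_sketch.
template <class R>
auto has_property(int)
  -> decltype(get_property(std::declval<R const&>(), device_accessible{}),
              std::true_type{});
template <class R>
std::false_type has_property(...);

static_assert(decltype(has_property<adaptor_sketch>(0))::value,
              "adaptor_sketch advertises device_accessible");
```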
Bradley and I chatted for a bit today and also came down on option 1 as the short term solution. We'd like to see better property support long-term, but only after device_memory_resource is history.
```cpp
 * @param auto_flush If true, flushes the log for every (de)allocation
 */
logging_resource_adaptor_impl(std::shared_ptr<rapids_logger::logger> logger,
                              device_async_resource_ref upstream,
```
question: Should this accept a cuda::mr::any_resource to explicitly indicate that the person constructing this does not need to keep the upstream alive?
The guidance I've been using is to always use ref types as parameters. That can always be reified into an owning type (any_resource, by copy or incrementing shared refcount).
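A minimal sketch of that guidance, with simplified stand-in types rather than the real CCCL classes: the parameter is a non-owning ref, and the constructor reifies it into an owning member, so the caller need not keep the upstream alive.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>

// Hypothetical stand-ins for the real types.
struct upstream_mr {
  void* allocate(std::size_t n) { return std::malloc(n); }
  void deallocate(void* p, std::size_t) { std::free(p); }
};

struct resource_ref {  // non-owning view, like device_async_resource_ref
  upstream_mr* mr;
};

struct any_resource {  // owning wrapper, like cuda::mr::any_resource
  explicit any_resource(resource_ref ref)
    : owned{std::make_shared<upstream_mr>(*ref.mr)} {}
  std::shared_ptr<upstream_mr> owned;
};

// The parameter is a ref; the member is owning. Reification happens in the
// constructor's member initializer.
class adaptor {
 public:
  explicit adaptor(resource_ref upstream) : upstream_{upstream} {}
  void* allocate(std::size_t n) { return upstream_.owned->allocate(n); }
  void deallocate(void* p, std::size_t n) { upstream_.owned->deallocate(p, n); }

 private:
  any_resource upstream_;
};
```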
Sure, that works. I am wondering about the principle of least surprise. If I think about standard &-ref types:
```cpp
// Caller knows that they are on the hook to keep `x` alive.
auto takes_ref(foo& x) {
  return something_that_uses(x);
}

// Caller knows they are on the hook to implement copy.
auto takes_value(foo x) {
  return something_that_uses(x);
}
```
In contrast here:

```cpp
// Caller must look at the implementation/documentation to know that they
// need not keep `mr` alive.
auto logging_resource_adaptor(resource_ref mr, ...) {
}
```
I agree that functions can take _ref types as parameters and the body can decide to reify. But here we know we're going to reify so why not just advertise that that is happening?
Ah, I forgot to mention the other catch: we still support device_memory_resource* inputs (currently the caller promises to keep the underlying resource alive) until that can be banished from RAPIDS. The device_async_resource_ref type supports that conversion implicitly, while any_resource won't.
I think it is worth evaluating a switch to any_resource in the future, though.
I think switching will make sense once we have excised device_memory_resource.
```cpp
 */
class RMM_EXPORT logging_resource_adaptor
  : public device_memory_resource,
    public cuda::mr::shared_resource<detail::logging_resource_adaptor_impl> {
```
question: Is it possible to move `detail::logging_resource_adaptor_impl` into a separate file and only forward-declare the struct name in this file?

If yes, then that is great, because we can split the detail impl into a header that is never included in user-facing API code and hence hide the API (and ABI).
Another question. This says "every logging_resource_adaptor is-a shared_resource<detail::logging_resource_adaptor_impl>".
Why is it not instead:
```cpp
class logging_resource_adaptor {
  ...
 private:
  cuda::mr::shared_resource<detail::logging_resource_adaptor_impl> impl_;
};
```
i.e. "every logging_resource_adaptor has-a shared_resource<detail::logging_resource_adaptor_impl>" that provides the implementation.
I'll try separating the impl class as you suggested. That would be nice.
is-a shared_resource (inheritance rather than composition) means we don't have to use PIMPL-like forwarding for the APIs that a shared_resource provides: allocation, deallocation, getting properties, copy constructors, etc.
There is the concept of private inheritance, which models more of a has-a than an is-a relationship and is recommended by Scott Meyers' Effective C++ [citation needed]
https://isocpp.org/wiki/faq/private-inheritance
Does that mean that the impl class becomes part of the ABI?
Private inheritance will make the inherited-from class part of the ABI, but in this case I think David is suggesting using private inheritance for shared_resource<detail::logging_resource_adaptor_impl>, in which case only shared_resource<detail::logging_resource_adaptor_impl> would be part of the ABI, not detail::logging_resource_adaptor_impl itself since it's hidden behind a pointer in shared_resource. In doing this you would lose the automatic availability of shared_resource methods that public inheritance currently gives you, but you can reexpose them with using declarations, which is probably the cleanest solution here.
I see. That is possible. One downside would be that it becomes impossible to cast an adaptor to a shared_resource<T>, but perhaps that is desirable given that T (the impl class) is a private implementation detail. I think I might go that route!
Note that this change means that some of the using declarations marked as only for the legacy layer will probably need to stay (but modified to be from shared_resource rather than device_memory_resource). Otherwise we won't have the shared_resource APIs like allocate and deallocate available on the adaptor itself.
Yes, that’s a pretty minor change and I don’t have any issue with doing that. Thanks for calling it out.
I retargeted this to a
Convert logging_resource_adaptor from a templated, header-only class to a non-templated class using cuda::mr::shared_resource for reference-counted ownership. Implementation is now compiled into librmm.so.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Lawrence Mitchell (https://github.com/wence-)
- David Wendt (https://github.com/davidwendt)

URL: #2246
…ared CCCL MR design

Converts both resources from header-only class templates to non-template classes backed by detail::*_impl held via cuda::mr::shared_resource, following the pattern established by logging_resource_adaptor (rapidsai#2246) and pool_memory_resource (rapidsai#2258). Part of rapidsai#2011.
…ared CCCL MR design (#2264)

## Summary

Converts `fixed_size_memory_resource` and `binning_memory_resource` from header-only class templates to non-template classes backed by a `detail::*_impl` held via `cuda::mr::shared_resource`. Follows the pattern established by `logging_resource_adaptor` (#2246) and `pool_memory_resource` (#2258). Part of #2011.

## Changes

**New files (6)**

- `cpp/include/rmm/mr/detail/fixed_size_memory_resource_impl.hpp` — impl class declaration; inherits `stream_ordered_memory_resource<impl, fixed_size_free_list>` (same CRTP pattern as pool)
- `cpp/src/mr/detail/fixed_size_memory_resource_impl.cpp` — impl member definitions
- `cpp/src/mr/fixed_size_memory_resource.cpp` — outer class constructor and delegating methods
- `cpp/include/rmm/mr/detail/binning_memory_resource_impl.hpp` — impl class declaration; includes `fixed_size_memory_resource.hpp` for `unique_ptr` member
- `cpp/src/mr/detail/binning_memory_resource_impl.cpp` — impl member definitions
- `cpp/src/mr/binning_memory_resource.cpp` — outer class constructor and delegating methods

**Modified files (11)**

- `cpp/include/rmm/mr/fixed_size_memory_resource.hpp` — de-templated; `shared_resource` inheritance; `device_async_resource_ref` constructor only; `static_assert` for concept
- `cpp/include/rmm/mr/binning_memory_resource.hpp` — de-templated; same pattern; `Upstream*` constructors removed
- `cpp/CMakeLists.txt` — four new `.cpp` files added to library sources
- `python/rmm/rmm/librmm/memory_resource.pxd` — template parameters removed from both declarations
- `python/rmm/rmm/pylibrmm/memory_resource/_memory_resource.pyx` — template instantiation syntax removed from `new` expressions
- `cpp/tests/mr/mr_ref_fixed_size_tests.cpp` — replaced string-factory suite with typed `FixedSizeMRFixture` + `CcclMrRefTest`
- `cpp/tests/mr/mr_ref_binning_tests.cpp` — replaced string-factory suites with typed `BinningMRFixture` + all three `CcclMrRef*` suites
- `cpp/tests/mr/mr_ref_test.hpp` — `make_fixed_size`/`make_binning` factory helpers rewritten without `owning_wrapper`; type aliases de-templated
- `cpp/tests/mr/binning_mr_tests.cpp` — removed explicit template instantiation and `ThrowOnNullUpstream` (null pointer constructor no longer exists); updated `ExplicitBinMR` to use `device_async_resource_ref`
- `cpp/tests/mr/cccl_adaptor_tests.cpp` — added `fixed_size_memory_resource` and `binning_memory_resource` to the shared-ownership typed test suite
- `cpp/tests/mr/thrust_allocator_tests.cu` — removed `"Binning"` from the string-dispatch parameterization (coverage moved to `BINNING_MR_REF_*` typed suites; the old dispatch path caused a dangling ref crash — the exact bug this PR fixes)

## Breaking changes

- `fixed_size_memory_resource<Upstream>` → `fixed_size_memory_resource` (template parameter removed)
- `binning_memory_resource<Upstream>` → `binning_memory_resource` (template parameter removed)
- `Upstream*` constructor overloads removed; use `device_async_resource_ref`
- Both classes become copyable with shared ownership semantics

## Testing

`build-rmm-cpp -j0 && test-rmm-cpp`: 89/89 tests pass.
…d CCCL MR design

Converts tracking_resource_adaptor, statistics_resource_adaptor, and aligned_resource_adaptor from header-only class templates to non-template classes backed by detail::*_impl held via cuda::mr::shared_resource, following the pattern established by logging_resource_adaptor (rapidsai#2246). Part of rapidsai#2011.
…d CCCL MR design (#2265)

## Summary

Converts `tracking_resource_adaptor`, `statistics_resource_adaptor`, and `aligned_resource_adaptor` from header-only class templates to non-template classes backed by a `detail::*_impl` held via `cuda::mr::shared_resource`, following the pattern established by `logging_resource_adaptor` (#2246). Part of #2011.

## Changes

### New files

- `cpp/include/rmm/mr/detail/tracking_resource_adaptor_impl.hpp`
- `cpp/include/rmm/mr/detail/statistics_resource_adaptor_impl.hpp`
- `cpp/include/rmm/mr/detail/aligned_resource_adaptor_impl.hpp`
- `cpp/src/mr/detail/tracking_resource_adaptor_impl.cpp`
- `cpp/src/mr/detail/statistics_resource_adaptor_impl.cpp`
- `cpp/src/mr/detail/aligned_resource_adaptor_impl.cpp`
- `cpp/src/mr/tracking_resource_adaptor.cpp`
- `cpp/src/mr/statistics_resource_adaptor.cpp`
- `cpp/src/mr/aligned_resource_adaptor.cpp`
- `cpp/tests/mr/mr_ref_tracking_tests.cpp` — `CcclMrRefTest` / `CcclMrRefAllocationTest` / `CcclMrRefTestMT` instantiations
- `cpp/tests/mr/mr_ref_statistics_tests.cpp` — same
- `cpp/tests/mr/mr_ref_aligned_tests.cpp` — `CcclMrRefTest` / `CcclMrRefAllocationTest`

### Modified files

- Public headers de-templated, private `shared_resource` inheritance, `get_property` friend, `static_assert` concept check
- `cpp/CMakeLists.txt` — new `.cpp` sources added
- `cpp/tests/CMakeLists.txt` — `TRACKING_MR_REF_TEST`, `STATISTICS_MR_REF_TEST`, `ALIGNED_MR_REF_TEST` targets
- `cpp/tests/mr/adaptor_tests.cpp` — removed template/pointer-based aligned tests, updated `owning_wrapper` to use `limiting_resource_adaptor`
- `cpp/tests/mr/tracking_mr_tests.cpp`, `statistics_mr_tests.cpp`, `aligned_mr_tests.cpp` — template aliases removed, null/pointer constructions replaced, stacked-adaptor tests use `device_async_resource_ref{mr}` to avoid copy-construction
- `cpp/tests/mr/cccl_adaptor_tests.cpp` — all three new adaptors added to the typed shared-ownership suite
- `python/rmm/rmm/librmm/memory_resource.pxd` — template parameters removed from `statistics_resource_adaptor` and `tracking_resource_adaptor`
- `python/rmm/rmm/pylibrmm/memory_resource/_memory_resource.pyx` — template instantiation syntax removed

## Checklist

- [x] Create `detail/*_impl.hpp` (class declaration)
- [x] Create `src/mr/detail/*_impl.cpp` (member definitions)
- [x] Create `src/mr/*_adaptor.cpp` (outer class definitions)
- [x] Modify public headers (de-template, private inheritance, `get_property`, `static_assert`)
- [x] Update `CMakeLists.txt`
- [x] Update Cython `.pxd` and `.pyx`
- [x] Update tests (remove template instantiation, add non-template fixtures)
- [x] `pre-commit run --all-files`
- [x] `build-rmm-cpp -j0 && test-rmm-cpp`
## Summary

- De-template `arena_memory_resource` by removing the `Upstream` template parameter
- Split implementation into `detail::arena_memory_resource_impl` held via `cuda::mr::shared_resource` for reference-counted, copyable ownership
- Retain the `device_memory_resource` legacy compatibility layer (`do_allocate`, `do_deallocate`, `do_is_equal`)
- Update benchmarks, C++ tests, and Python (Cython) bindings to use the non-template `arena_memory_resource`

This follows the same pattern established in #2246, #2258, #2264, and #2265 for the other memory resources.

### New files

| File | Contents |
|------|----------|
| `cpp/include/rmm/mr/detail/arena_memory_resource_impl.hpp` | `detail::arena_memory_resource_impl` class declaration |
| `cpp/src/mr/detail/arena_memory_resource_impl.cpp` | Impl member function definitions |
| `cpp/src/mr/arena_memory_resource.cpp` | Outer class constructor + delegating method definitions |

### Modified files

| File | Change |
|------|--------|
| `cpp/include/rmm/mr/arena_memory_resource.hpp` | De-template, `shared_resource` wrapping |
| `cpp/CMakeLists.txt` | Add new `.cpp` source files |
| `cpp/tests/mr/arena_mr_tests.cpp` | `arena_memory_resource<device_memory_resource>` → `arena_memory_resource` |
| `cpp/tests/mr/mr_ref_arena_tests.cpp` | Add `ArenaMRFixture` + `CcclMrRefTest`/`CcclMrRefAllocationTest`/`CcclMrRefTestMT` instantiations |
| `cpp/tests/mr/mr_ref_test.hpp` | Update `make_arena()` and `arena_mr` type alias |
| `cpp/tests/mr/cccl_adaptor_tests.cpp` | Add arena `static_assert` + `ArenaMRAdaptorTest` |
| `cpp/benchmarks/multi_stream_allocations/multi_stream_allocations_bench.cu` | Update `make_arena()` |
| `cpp/benchmarks/random_allocations/random_allocations.cpp` | Update `make_arena()` |
| `cpp/benchmarks/replay/replay.cpp` | Update `make_arena()` |
| `python/rmm/rmm/librmm/memory_resource.pxd` | Remove `[Upstream]` template from `arena_memory_resource` |
| `python/rmm/rmm/pylibrmm/memory_resource/_memory_resource.pyx` | Remove template instantiation syntax for arena |
## Summary

Convert `logging_resource_adaptor` from a templated, header-only class to a non-templated class using `cuda::mr::shared_resource` for reference-counted ownership. Implementation is now compiled into `librmm.so`.

Thanks to @wence- and @davidwendt for input on the design. If this seems reasonable, I will make similar changes for all adaptors. This is a breaking change, and I do not plan to have a deprecation period (there is not a good way to provide a transition for downstream users, though I will help refactor all the RAPIDS libraries using RMM).

This replaces the initial design from #2246. Part of #2011.

## Key changes

- Remove template parameter (was `logging_resource_adaptor<Upstream>`)
- Add `logging_resource_adaptor_impl` in `rmm::mr::detail` namespace
- Remove `Upstream*` constructor overloads (use `device_async_resource_ref`)
- Store upstream as `any_resource` for owning semantics
- Temporarily keep `device_memory_resource` inheritance for Python compatibility

## Breaking API changes

- `logging_resource_adaptor<Upstream>` → `logging_resource_adaptor`
- Removed `Upstream*` overloads, use `device_async_resource_ref`

## Future changes

Once we migrate everything from using `device_memory_resource` to the CCCL MR design, we can drop the pieces of code labeled as `legacy device_memory_resource compatibility layer`.