Add CodeInstance support to kernel renaming #2475

Merged
imreddyTeja merged 2 commits into main from tr/renaming1.12 on Apr 6, 2026


@imreddyTeja imreddyTeja commented Mar 30, 2026

In Julia 1.12, kernel renaming does not work correctly when used interactively in the REPL. For example, profiling update_jacobian! results in 213 calls to kernels that are all named __repl_entry_eval_expanded_with_loc_FILE_src_REPL_jl_L301. It looks like in 1.12 many of the stack frames that used to be MethodInstance(s) are now CodeInstance(s). This commit adds support for CodeInstance frames.
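
As a rough illustration of what handling both frame types can look like (a minimal sketch, not the code in this PR): the `linfo` field of a `StackFrame` may hold a `Core.MethodInstance` or a `Core.CodeInstance`, and a `CodeInstance` normally points back at its `MethodInstance` through its `def` field. The helper name `frame_method` is hypothetical and exists only for this example.

```julia
using Base.StackTraces: StackFrame

# Hypothetical helper (not part of ClimaCore): return the Method behind a
# stack frame, or `nothing` if the frame carries no usable method info.
function frame_method(frame::StackFrame)
    linfo = frame.linfo
    if linfo isa Core.CodeInstance
        # Assumption: `def` holds the owning MethodInstance. If it holds
        # anything else, the checks below fall through to `nothing`.
        linfo = linfo.def
    end
    if linfo isa Core.MethodInstance
        def = linfo.def  # a Method, or a Module for top-level thunks
        return def isa Method ? def : nothing
    end
    return linfo isa Method ? linfo : nothing
end

# Usage: list the methods that can be recovered from the current stack.
for fr in stacktrace()
    m = frame_method(fr)
    m === nothing || println(m.name, " at ", m.file, ":", m.line)
end
```

Frames for which no method can be recovered (C frames, top-level thunks) simply yield `nothing`, so a caller can skip them and keep walking the stack.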

Before:

julia> CUDA.@profile CA.update_jacobian!(alg, cache, Y, p, 0.2f0, 1.0f0)
Profiler ran for 97.52 s, capturing 10563 events.

Host-side activity: calling CUDA APIs took 34.89 ms (0.04% of the trace)
┌──────────┬────────────┬───────┬──────────────────────────────────────┬─────────────────────┐
│ Time (%) │ Total time │ Calls │ Time distribution                    │ Name                │
├──────────┼────────────┼───────┼──────────────────────────────────────┼─────────────────────┤
│    0.01% │   13.06 ms │   100 │ 130.61 µs ± 77.73  ( 60.08 ‥ 759.84) │ cuModuleGetFunction │
│    0.01% │    12.4 ms │   100 │ 123.96 µs ± 66.33  (  75.1 ‥ 697.37) │ cuModuleLoadDataEx  │
│    0.00% │     4.5 ms │   213 │  21.14 µs ± 14.89  (  3.81 ‥ 72.96)  │ cuLaunchKernel      │
│    0.00% │     1.4 ms │   100 │  13.96 µs ± 2.02   ( 10.25 ‥ 21.93)  │ cuCtxSynchronize    │
│    0.00% │  428.44 µs │     2 │ 214.22 µs ± 109.41 (136.85 ‥ 291.59) │ cuMemFree           │
│    0.00% │  953.67 ns │     1 │                                      │ cuCtxSetCurrent     │
│    0.00% │  476.84 ns │     1 │                                      │ cuCtxGetDevice      │
│    0.00% │  238.42 ns │     1 │                                      │ cuDeviceGetCount    │
└──────────┴────────────┴───────┴──────────────────────────────────────┴─────────────────────┘

Device-side activity: GPU was busy for 12.0 ms (0.01% of the trace)
┌──────────┬────────────┬───────┬──────────────────────────────────────┬───────────────────────────────────────────────────────────┐
│ Time (%) │ Total time │ Calls │ Time distribution                    │ Name                                                      │
├──────────┼────────────┼───────┼──────────────────────────────────────┼───────────────────────────────────────────────────────────┤
│    0.01% │    12.0 ms │   213 │  56.32 µs ± 30.3   (  24.8 ‥ 184.54) │ __repl_entry_eval_expanded_with_loc_FILE_src_REPL_jl_L301 │
└──────────┴────────────┴───────┴──────────────────────────────────────┴───────────────────────────────────────────────────────────┘

After:


julia> CUDA.@profile CA.update_jacobian!(alg, cache, Y, p, 0.2f0, 1.0f0)
Profiler ran for 76.36 s, capturing 8548 events.

Host-side activity: calling CUDA APIs took 6.71 ms (0.01% of the trace)
┌──────────┬────────────┬───────┬──────────────────────────────────────┬────────────────┐
│ Time (%) │ Total time │ Calls │ Time distribution                    │ Name           │
├──────────┼────────────┼───────┼──────────────────────────────────────┼────────────────┤
│    0.01% │    5.74 ms │   213 │  26.97 µs ± 22.01  (   3.1 ‥ 120.16) │ cuLaunchKernel │
└──────────┴────────────┴───────┴──────────────────────────────────────┴────────────────┘

Device-side activity: GPU was busy for 11.97 ms (0.02% of the trace)
┌──────────┬────────────┬───────┬──────────────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Time (%) │ Total time │ Calls │ Time distribution                    │ Name                                                                                                  │
├──────────┼────────────┼───────┼──────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────┤
│    0.00% │    2.81 ms │    46 │   61.0 µs ± 20.0   ( 25.99 ‥ 112.06) │ update_jacobian__FILE_ClimaAtmos_jl_src_prognostic_equations_implicit_manual_sparse_jacobian_jl_L414  │
│    0.00% │    1.28 ms │     8 │  159.5 µs ± 25.82  (134.71 ‥ 184.54) │ _FILE_ClimaAtmos_jl_src_prognostic_equations_implicit_manual_sparse_jacobian_jl_L1338                 │
│    0.00% │  563.86 µs │     8 │  70.48 µs ± 0.18   ( 70.33 ‥ 70.81)  │ _FILE_ClimaAtmos_jl_src_prognostic_equations_implicit_manual_sparse_jacobian_jl_L1344                 │
│    0.00% │  405.79 µs │     4 │ 101.45 µs ± 0.46   (101.09 ‥ 102.04) │ _FILE_ClimaAtmos_jl_src_prognostic_equations_implicit_manual_sparse_jacobian_jl_L1009                 │
│    0.00% │  387.43 µs │     4 │  96.86 µs ± 0.3    ( 96.56 ‥ 97.27)  │ _FILE_ClimaAtmos_jl_src_prognostic_equations_implicit_manual_sparse_jacobian_jl_L650                  │
  • Code follows the style guidelines OR N/A.
  • Unit tests are included OR N/A.
  • Code is exercised in an integration test OR N/A.
  • Documentation has been added/updated OR N/A.

@imreddyTeja imreddyTeja marked this pull request as ready for review March 30, 2026 19:41
@imreddyTeja imreddyTeja requested a review from petebachant March 30, 2026 19:41

@petebachant petebachant left a comment

LGTM. I don't think I ever added a unit test for this, but if it were possible, that would be nice to ensure we can do kernel renaming on Julia 1.10--1.12.

@imreddyTeja

> LGTM. I don't think I ever added a unit test for this, but if it were possible, that would be nice to ensure we can do kernel renaming on Julia 1.10--1.12.

That is a good point. I'll add a unit test to our current 1.10 Buildkite pipeline. The GPU CI is currently only run with 1.10, so 1.11 and 1.12 won't be covered.

Comment thread test/gpu/kernel_renaming.jl Outdated
@testset "kernel renaming" begin
    ext = Base.get_extension(ClimaCore, :ClimaCoreCUDAExt)
    @assert !isnothing(ext) # CUDA must be loaded to test this extension
    ext.NAME_KERNELS_FROM_STACK_TRACE[] = true

Might want to move this out into the buildkite env vars if that's the typical usage. Not sure how you usually do it, but I typically go through the env vars.

In 1.12, kernel renaming does not function correctly when used interactively in the REPL.
It looks like in 1.12 many of the frames that used to be
MethodInstance(s) are now CodeInstance(s). This commit adds support for CodeInstance frames.
@imreddyTeja imreddyTeja merged commit b849fea into main Apr 6, 2026
36 checks passed
@imreddyTeja imreddyTeja deleted the tr/renaming1.12 branch April 6, 2026 17:13