Add gpu latency regression test by imreddyTeja · Pull Request #2490 · CliMA/ClimaCore.jl

imreddyTeja · 2026-04-13T19:33:14Z

Adds a test for kernel launch latency from the CUDA extension. This does not include the time between the end of the CUDA launch API call and the actual start of the kernel.

Code follows the style guidelines OR N/A.
Unit tests are included OR N/A.
Code is exercised in an integration test OR N/A.
Documentation has been added/updated OR N/A.

petebachant

Overall LGTM with a few inline comments. Additionally:

Does it make sense to sync the GPU right before each benchmark?
Should we measure allocations as well?
Has this latency ever been identified as a problem, i.e., has there ever been a change merged in that blew it up and dropped some simulation's SYPD significantly? Just curious.
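
To make the first two questions concrete, here is a minimal sketch of timing a broadcast without waiting for its result, while draining earlier work before each sample and recording host allocations. `launch_stats` and its `sync` keyword are hypothetical names and not part of this PR. The sketch uses plain `time_ns()` instead of `@benchmark` so it runs on the CPU with only the standard library. On a GPU one would presumably pass device arrays and `sync = CUDA.synchronize`, which assumes CUDA.jl is loaded.

```julia
using Statistics

# Hypothetical helper, not part of the PR: time `f!` without waiting for its
# result, calling `sync` before each sample so that previously queued work is
# not attributed to the call being measured.
function launch_stats(f!; sync = () -> nothing, samples = 100)
    f!()    # warm-up call so compilation time is excluded
    sync()
    times_ns = Vector{Float64}(undef, samples)
    for i in 1:samples
        sync()                          # drain earlier work before timing
        t0 = time_ns()
        f!()                            # intentionally no sync after the call
        times_ns[i] = time_ns() - t0
    end
    sync()
    bytes = @allocated f!()             # host-side bytes allocated by one call
    return (latency_ns = median(times_ns), bytes = bytes)
end

# CPU stand-ins for the scalar fields used in the test
a = ones(Float32, 1024)
b = ones(Float32, 1024)
stats = launch_stats(() -> (a .= a .+ b); samples = 50)
println(stats)
```

Syncing before the timed call, and not after it, keeps the measurement limited to the launch path that the test is meant to guard.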

petebachant · 2026-04-14T15:53:03Z

+    # intentionally benchmark without a sync
+    latency = median(@benchmark $scalar_field_1 .= $scalar_field_1 .+ $scalar_field_2).time
+    @test latency ≈ 20500 atol = 3000
+    # update this value if the kernel launch time changes significantly and it is expected


I would usually expect this comment above the line where it's defined

petebachant · 2026-04-14T15:53:16Z

+    lazy_sum_3 = @. lazy(lazy_sum_2 + lazy_sum_2)
+    latency = median(@benchmark $scalar_field_1 .= $lazy_sum_3).time
+    @test latency ≈ 29000 atol = 3000
+    # update this value if the kernel launch time changes significantly and it is expected


Same comment here about being above the definition line

petebachant · 2026-04-14T15:56:30Z

+    # basic expression
+    # intentionally benchmark without a sync
+    latency = median(@benchmark $scalar_field_1 .= $scalar_field_1 .+ $scalar_field_2).time
+    @test latency ≈ 20500 atol = 3000


Should we print something in the logs here, such as the absolute latency, the difference from the baseline, and the percent change? That way we'd know whether we've made an improvement, and by how much, so we can reset the baseline.
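
One possible shape for that log line, as a rough sketch only: `latency_report` and `log_latency` are hypothetical helpers that do not exist in the PR, and the numbers in the usage line are made up for illustration.

```julia
using Printf

# Hypothetical helper: compare a measured latency against a stored baseline.
function latency_report(latency_ns, baseline_ns)
    diff_ns = latency_ns - baseline_ns
    pct = 100 * diff_ns / baseline_ns
    return (; latency_ns, baseline_ns, diff_ns, pct)
end

# Hypothetical helper: log absolute latency, difference, and percent change.
function log_latency(name, latency_ns, baseline_ns)
    r = latency_report(latency_ns, baseline_ns)
    @info @sprintf(
        "%s: %.0f ns (baseline %.0f ns, diff %+.0f ns, %+.1f%%)",
        name, r.latency_ns, r.baseline_ns, r.diff_ns, r.pct,
    )
    return r
end

# Illustrative values only; in the test, `latency` would come from the benchmark.
r = log_latency("basic expression", 21_500.0, 20_500.0)
```

Calling this right before the `@test` would put the measured value in the CI log even when the test passes, which is what makes it possible to notice an improvement and reset the baseline.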

imreddyTeja marked this pull request as ready for review April 13, 2026 19:37

Add gpu latency regression test

9768c4f

imreddyTeja force-pushed the tr/latency-tests branch from 5400bb2 to 9768c4f Compare April 13, 2026 19:43

imreddyTeja requested review from dennisYatunin and petebachant April 13, 2026 20:32

petebachant approved these changes Apr 14, 2026

imreddyTeja wants to merge 1 commit into main from tr/latency-tests
