Skip to content

Add gpu latency regression test#2490

Open
imreddyTeja wants to merge 1 commit intomainfrom
tr/latency-tests
Open

Add gpu latency regression test#2490
imreddyTeja wants to merge 1 commit intomainfrom
tr/latency-tests

Conversation

@imreddyTeja
Copy link
Copy Markdown
Member

@imreddyTeja imreddyTeja commented Apr 13, 2026

Adds a test for kernel latency from the cuda ext. This does not include the time between the end of the CUDA launch api call end the actual start of the kernel.

  • Code follows the style guidelines OR N/A.
  • Unit tests are included OR N/A.
  • Code is exercised in an integration test OR N/A.
  • Documentation has been added/updated OR N/A.

@imreddyTeja imreddyTeja marked this pull request as ready for review April 13, 2026 19:37
Copy link
Copy Markdown
Member

@petebachant petebachant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM with a few inline comments. Additionally:

  1. Does it make sense to sync the GPU right before each benchmark?
  2. Should we measure allocations as well?
  3. Has this latency ever been identified as a problem, i.e., has there ever been a change merged in that blew it up and dropped some simulation's SYPD significantly? Just curious.

# intentionally benchmark without a sync
latency = median(@benchmark $scalar_field_1 .= $scalar_field_1 .+ $scalar_field_2).time
@test latency ≈ 20500 atol = 3000
# update this value if the kernel launch time changes significantly and it is expected
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would usually expect this comment above the line where it's defined

lazy_sum_3 = @. lazy(lazy_sum_2 + lazy_sum_2)
latency = median(@benchmark $scalar_field_1 .= $lazy_sum_3).time
@test latency ≈ 29000 atol = 3000
# update this value if the kernel launch time changes significantly and it is expected
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about being above the definition line

# basic expression
# intentionally benchmark without a sync
latency = median(@benchmark $scalar_field_1 .= $scalar_field_1 .+ $scalar_field_2).time
@test latency ≈ 20500 atol = 3000
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we print something here in the logs like the absolute latency, difference from baseline, and percent change, so we'll know if we've made an improvement and by how much, so we can reset the baseline?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants