Add TRITON_PROFILE_COMPILE knob for compilation time profiling #905

Open
dshi7 wants to merge 1 commit into main from daohang/profile_compiling

dshi7 (Contributor) commented Feb 13, 2026

When enabled, prints a per-stage compilation time breakdown to stderr for each kernel compilation (ir_init, ttir, ttgir, llir, ptx, cubin, store), plus a benchmark timing summary after autotuning completes. Off by default.
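
To illustrate the idea of an environment-gated per-stage breakdown, here is a minimal sketch. It is not the code in this PR: `profiling_enabled`, `StageTimer`, the `[compile]` line format, and the rule that only the value `1` enables the knob are all assumptions made for the example.

```python
import os
import sys
import time
from contextlib import contextmanager


def profiling_enabled(env=os.environ):
    # Assumption: the knob counts as "on" only when set to "1".
    return env.get("TRITON_PROFILE_COMPILE", "0") == "1"


class StageTimer:
    """Hypothetical helper that records wall-clock time per named stage."""

    def __init__(self, enabled, out=sys.stderr):
        self.enabled = enabled
        self.out = out
        self.stages = []  # (stage name, elapsed seconds), in execution order

    @contextmanager
    def stage(self, name):
        if not self.enabled:
            # Knob off: run the stage without any timing overhead.
            yield
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            self.stages.append((name, time.perf_counter() - start))

    def report(self, kernel_name):
        # Emit one summary line per compilation; silent when disabled.
        if not self.enabled or not self.stages:
            return
        total = sum(t for _, t in self.stages)
        parts = ", ".join(f"{n}={t * 1e3:.1f}ms" for n, t in self.stages)
        print(f"[compile] {kernel_name}: {parts}, total={total * 1e3:.1f}ms",
              file=self.out)
```

A caller would wrap each stage in `with timer.stage("ttir"): ...` and call `timer.report(kernel_name)` once the final stage completes. When the knob is off, nothing is recorded or printed.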

Test plan:

TRITON_ALWAYS_COMPILE=1 TRITON_PROFILE_COMPILE=1 CUDA_VISIBLE_DEVICES=7 third_party/tlx/denoise.sh python third_party/tlx/tutorials/testing/test_blackwell_gemm_perf.py --version ws

LOG: https://www.internalfb.com/intern/paste/P2186293125/

Authored with Claude.

meta-cla bot added the CLA Signed label Feb 13, 2026
dshi7 force-pushed the daohang/profile_compiling branch from 6bca499 to 68c8e8a Feb 13, 2026 21:24

When enabled, prints per-stage compilation time breakdowns to stderr
for each kernel compilation (ir_init, ttir, ttgir, llir, ptx, cubin,
store) along with config info (block sizes, warps, stages), and
benchmark timing summaries with compile vs bench breakdown after
autotuning completes. Default off.

Test plan:
- Unit tests: `pytest python/test/unit/runtime/test_compilation_listener.py`
  - test_profile_compile: verifies stage breakdowns on cache miss,
    "cache hit" label on cache hit
  - test_profile_compile_off_by_default: verifies no output when knob
    is off
- Perf test: `TRITON_PROFILE_COMPILE=1 CUDA_VISIBLE_DEVICES=<gpu> \
    third_party/tlx/denoise.sh python \
    third_party/tlx/tutorials/testing/test_blackwell_gemm_perf.py \
    --version ws`
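
The behavior the two unit tests check can be sketched as follows. This is not the content of `test_compilation_listener.py`: `fake_compile`, its dict cache, and the output format are hypothetical stand-ins that only illustrate the cache-miss, cache-hit, and knob-off cases.

```python
import io

STAGES = ("ir_init", "ttir", "ttgir", "llir", "ptx", "cubin", "store")


def fake_compile(key, cache, profile=False, out=None):
    """Hypothetical stand-in for a cached compile step; not Triton's API."""
    hit = key in cache
    if not hit:
        cache[key] = f"binary-for-{key}"
    if profile:
        # Placeholder timings; a real run would report measured durations.
        detail = "cache hit" if hit else ", ".join(f"{s}=0.0ms" for s in STAGES)
        print(f"[compile] {key}: {detail}", file=out)
    return cache[key]


def test_profile_compile():
    cache, out = {}, io.StringIO()
    fake_compile("kernel", cache, profile=True, out=out)  # cache miss
    fake_compile("kernel", cache, profile=True, out=out)  # cache hit
    miss_line, hit_line = out.getvalue().splitlines()
    assert all(f"{s}=" in miss_line for s in STAGES)
    assert "cache hit" in hit_line and "ttir=" not in hit_line


def test_profile_compile_off_by_default():
    cache, out = {}, io.StringIO()
    fake_compile("kernel", cache, out=out)
    assert out.getvalue() == ""
```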

Authored with Claude.
dshi7 force-pushed the daohang/profile_compiling branch from 68c8e8a to 85be855 Feb 13, 2026 21:30

meta-codesync bot commented Feb 13, 2026

@dshi7 has imported this pull request. If you are a Meta employee, you can view this in D93278081.
