[compiler-rt] Improve ubsan-minimal runtime for GPU use #193597

Open
jhuber6 wants to merge 2 commits into llvm:main from jhuber6:UBSanImproveGPU

jhuber6 (Contributor) commented Apr 22, 2026

Summary:
GPUs are resource constrained, and we don't want the runtime to incur
much more overhead than the trap version. Unlike the other targets, we
have a cheap `printf` because all the formatting complexity is handled
on the host, so we use that directly instead of the custom formatting.
Additionally, we split the main function out into a helper function.
Marking it `[[gnu::cold]]` prevents spurious inlining that massively
bloated register usage (this is an error reporting mechanism, so
hopefully that's not controversial).
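
The helper split described above can be sketched roughly as follows. This is a minimal illustration rather than the code in the patch: `report_error_sketch`, `example_report_error`, and the `reports_emitted` counter are hypothetical names introduced here, and the weak-definition macro used by the real runtime is left out.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical counter, present only so the sketch has observable state.
static int reports_emitted = 0;

// The cold attribute tells the optimizer this function is unlikely to run,
// which discourages inlining it into the callers.
[[gnu::cold]] static void report_error_sketch(const char *kind,
                                              uintptr_t caller) {
  if (caller == 0)
    return;
  ++reports_emitted;
  std::fprintf(stderr, "ubsan: %s by %p\n", kind,
               reinterpret_cast<void *>(caller));
}

// Thin entry point (hypothetical name): it only forwards to the helper, so
// the reporting logic stays out of line.
extern "C" void example_report_error(const char *kind, uintptr_t caller) {
  report_error_sketch(kind, caller);
}
```

Calling `example_report_error("add-overflow", 0x1234)` prints one line to stderr, while a zero caller is ignored.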

In practice, for some basic tests, this cuts the register usage by more
than half, and the stack size is no longer dynamically sized. The only
stack usage I saw was for the PC-relative checks. This could be lowered
further with more intelligent PC caching, but this is a good, minimally
invasive start.
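
For readers unfamiliar with the PC checks, here is a rough sketch of a caller-PC deduplication table. It assumes the checks refer to the loop over `caller_pcs` visible in the diff. The names `seen_pcs`, `seen_count`, `should_report`, and the capacity are hypothetical, and `std::atomic` stands in for the sanitizer's own atomics, so treat it as an illustration of the idea only.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical capacity; the real runtime chooses its own limit.
constexpr unsigned kMaxPcs = 20;
static std::atomic<uintptr_t> seen_pcs[kMaxPcs];
static std::atomic<unsigned> seen_count{0};

// Returns true the first time a caller PC is seen, and false for repeats,
// for a zero PC, or once the table is full.
static bool should_report(uintptr_t pc) {
  if (pc == 0)
    return false;
  for (;;) {
    unsigned n = seen_count.load(std::memory_order_relaxed);
    if (n >= kMaxPcs)
      return false;
    // Linear scan over the PCs recorded so far.
    for (unsigned i = 0; i < n; ++i)
      if (seen_pcs[i].load(std::memory_order_relaxed) == pc)
        return false;
    // Claim slot n; if another thread claimed it first, rescan.
    if (seen_count.compare_exchange_strong(n, n + 1,
                                           std::memory_order_relaxed)) {
      seen_pcs[n].store(pc, std::memory_order_relaxed);
      return true;
    }
  }
}
```

This simplified version does not handle a slot that has been claimed but not yet written, so a concurrent duplicate could slip through. That is acceptable for a sketch but worth noting.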

llvmbot (Member) commented Apr 22, 2026

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Joseph Huber (jhuber6)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/193597.diff

1 File Affected:

  • (modified) compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp (+25-9)
diff --git a/compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp b/compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp
index c8a72c23c9f79..a77773757dd5a 100644
--- a/compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp
+++ b/compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp
@@ -9,7 +9,10 @@ extern "C" void ubsan_message(const char *msg);
 static void message(const char *msg) { ubsan_message(msg); }
 #elif defined(SANITIZER_AMDGPU) || defined(SANITIZER_NVPTX)
 #include <stdio.h>
-static void message(const char *msg) { fprintf(stderr, "%s", msg); }
+template <typename... Args>
+static void message(const char *msg, Args &&...args) {
+  fprintf(stderr, msg, args...);
+}
 #else
 #include <unistd.h>
 static void message(const char *msg) { (void)write(2, msg, strlen(msg)); }
@@ -65,8 +68,18 @@ static void format_msg(const char *kind, uintptr_t caller, char *buf,
   *buf = '\0';
 }
 
-SANITIZER_INTERFACE_WEAK_DEF(void, __ubsan_report_error, const char *kind,
-                             uintptr_t caller) {
+static void format(const char *kind, uintptr_t caller) {
+#if defined(SANITIZER_AMDGPU) || defined(SANITIZER_NVPTX)
+  (void)format_msg;
+  message("ubsan: %s by %p\n", kind, reinterpret_cast<void *>(caller));
+#else
+  char msg_buf[128];
+  format_msg(kind, caller, msg_buf, msg_buf + sizeof(msg_buf));
+  message(msg_buf);
+#endif
+}
+
+[[gnu::cold]] static void report_error(const char *kind, uintptr_t caller) {
   if (caller == 0)
     return;
   while (true) {
@@ -98,28 +111,31 @@ SANITIZER_INTERFACE_WEAK_DEF(void, __ubsan_report_error, const char *kind,
     }
     __sanitizer::atomic_store_relaxed(&caller_pcs[sz], caller);
 
-    char msg_buf[128];
-    format_msg(kind, caller, msg_buf, msg_buf + sizeof(msg_buf));
-    message(msg_buf);
+    format(kind, caller);
   }
 }
 
+SANITIZER_INTERFACE_WEAK_DEF(void, __ubsan_report_error, const char *kind,
+                             uintptr_t caller) {
+  report_error(kind, caller);
+}
+
 #if PRESERVE_HANDLERS
 SANITIZER_INTERFACE_WEAK_DEF(void, __ubsan_report_error_preserve,
                              const char *kind, uintptr_t caller)
 [[clang::preserve_all]] {
-  // Additional indirecton so the user can override this with their own
+  // Additional indirection so the user can override this with their own
   // preserve_all function. This would allow, e.g., a function that reports the
   // first error only, so for all subsequent calls we can skip the register save
   // / restore.
-  __ubsan_report_error(kind, caller);
+  report_error(kind, caller);
 }
 #endif
 
 SANITIZER_INTERFACE_WEAK_DEF(void, __ubsan_report_error_fatal, const char *kind,
                              uintptr_t caller) {
   // Use another handlers, in case it's already overriden.
-  __ubsan_report_error(kind, caller);
+  report_error(kind, caller);
 }
 
 #if defined(__ANDROID__)

github-actions (Bot) commented Apr 22, 2026

🐧 Linux x64 Test Results

  • 6506 tests passed
  • 1284 tests skipped

✅ The build succeeded and all tests passed.
