[compiler-rt] Improve ubsan-minimal runtime for GPU use #193597
@llvm/pr-subscribers-compiler-rt-sanitizer
Author: Joseph Huber (jhuber6)
Changes
Summary: In practice, for some basic tests, this cuts the register usage by more than half.
Full diff: https://github.com/llvm/llvm-project/pull/193597.diff
1 file affected:
diff --git a/compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp b/compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp
index c8a72c23c9f79..a77773757dd5a 100644
--- a/compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp
+++ b/compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp
@@ -9,7 +9,10 @@ extern "C" void ubsan_message(const char *msg);
static void message(const char *msg) { ubsan_message(msg); }
#elif defined(SANITIZER_AMDGPU) || defined(SANITIZER_NVPTX)
#include <stdio.h>
-static void message(const char *msg) { fprintf(stderr, "%s", msg); }
+template <typename... Args>
+static void message(const char *msg, Args &&...args) {
+ fprintf(stderr, msg, args...);
+}
#else
#include <unistd.h>
static void message(const char *msg) { (void)write(2, msg, strlen(msg)); }
@@ -65,8 +68,18 @@ static void format_msg(const char *kind, uintptr_t caller, char *buf,
*buf = '\0';
}
-SANITIZER_INTERFACE_WEAK_DEF(void, __ubsan_report_error, const char *kind,
- uintptr_t caller) {
+static void format(const char *kind, uintptr_t caller) {
+#if defined(SANITIZER_AMDGPU) || defined(SANITIZER_NVPTX)
+ (void)format_msg;
+ message("ubsan: %s by %p\n", kind, reinterpret_cast<void *>(caller));
+#else
+ char msg_buf[128];
+ format_msg(kind, caller, msg_buf, msg_buf + sizeof(msg_buf));
+ message(msg_buf);
+#endif
+}
+
+[[gnu::cold]] static void report_error(const char *kind, uintptr_t caller) {
if (caller == 0)
return;
while (true) {
@@ -98,28 +111,31 @@ SANITIZER_INTERFACE_WEAK_DEF(void, __ubsan_report_error, const char *kind,
}
__sanitizer::atomic_store_relaxed(&caller_pcs[sz], caller);
- char msg_buf[128];
- format_msg(kind, caller, msg_buf, msg_buf + sizeof(msg_buf));
- message(msg_buf);
+ format(kind, caller);
}
}
+SANITIZER_INTERFACE_WEAK_DEF(void, __ubsan_report_error, const char *kind,
+ uintptr_t caller) {
+ report_error(kind, caller);
+}
+
#if PRESERVE_HANDLERS
SANITIZER_INTERFACE_WEAK_DEF(void, __ubsan_report_error_preserve,
const char *kind, uintptr_t caller)
[[clang::preserve_all]] {
- // Additional indirecton so the user can override this with their own
+ // Additional indirection so the user can override this with their own
// preserve_all function. This would allow, e.g., a function that reports the
// first error only, so for all subsequent calls we can skip the register save
// / restore.
- __ubsan_report_error(kind, caller);
+ report_error(kind, caller);
}
#endif
SANITIZER_INTERFACE_WEAK_DEF(void, __ubsan_report_error_fatal, const char *kind,
uintptr_t caller) {
// Use another handlers, in case it's already overriden.
- __ubsan_report_error(kind, caller);
+ report_error(kind, caller);
}
#if defined(__ANDROID__)
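The GPU branch of `format` in the diff above reduces to a single printf-style call. A minimal host-side sketch of that idea follows, assuming nothing about the real GPU runtime: `format_with_printf` and `format_report` are hypothetical names, and `snprintf` into a buffer stands in for `fprintf(stderr, ...)` so the resulting text can be inspected. The diff prints the caller with `%p`, whose output is implementation-defined, so the sketch uses `PRIxPTR` to get a predictable string.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the variadic GPU `message` overload in the diff:
// forwards a format string and its arguments to a printf-family function.
template <typename... Args>
static std::string format_with_printf(const char *fmt, Args &&...args) {
  assert(fmt != nullptr && "format string must not be null");
  char buf[128]; // same buffer size the non-GPU path uses for msg_buf
  int n = std::snprintf(buf, sizeof(buf), fmt, args...);
  if (n < 0)
    return std::string(); // encoding error: report nothing
  return std::string(buf); // snprintf NUL-terminates; long output is truncated
}

// Mirrors the shape of the GPU branch of `format`: one formatted call and no
// hand-written hex conversion.
static std::string format_report(const char *kind, std::uintptr_t caller) {
  return format_with_printf("ubsan: %s by 0x%" PRIxPTR "\n", kind, caller);
}
```

For example, `format_report("add-overflow", 0x1234)` produces the text `ubsan: add-overflow by 0x1234` followed by a newline.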
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed.
Summary: GPUs are resource constrained, and we don't want to incur too much overhead for using the runtime over a trap version. Unlike the other targets, GPUs have a cheap `printf` because all of the formatting complexity is handled on the host, so we call it directly instead of using the custom formatting.
Additionally, we split the main function out into a helper function. Marking it `[[gnu::cold]]` prevents spurious inlining that massively bloated register usage (this is an error-reporting mechanism, so hopefully that's not controversial).
In practice, for some basic tests, this cuts the register usage by more than half, and the stack size is no longer dynamically sized. The only stack usage I saw was for the PC relative checks. This could be lowered further with more intelligent PC caching, but this is a good, minimally invasive start.
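As a rough illustration of the helper split described in the summary, here is a minimal sketch with one cold, out-of-line implementation and thin entry points that forward to it. The `demo_*` entry points, `g_reports`, and `reports_emitted` are hypothetical and exist only to make the sketch observable; the real entry points are declared with `SANITIZER_INTERFACE_WEAK_DEF`, and the real helper also deduplicates caller PCs, which is omitted here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical counter so the behavior can be checked; not part of the PR.
static int g_reports = 0;

// One shared implementation. [[gnu::cold]] tells GCC and Clang that this path
// is unlikely to run, which discourages inlining it into every caller.
[[gnu::cold]] static void report_error(const char *kind, std::uintptr_t caller) {
  assert(kind != nullptr);
  if (caller == 0)
    return; // same early-out as the helper in the diff
  ++g_reports;
  std::fprintf(stderr, "ubsan: %s by %p\n", kind,
               reinterpret_cast<void *>(caller));
}

// Thin entry points: each one only forwards to the cold helper, so the
// reporting logic is not duplicated per handler.
extern "C" void demo_report_error(const char *kind, std::uintptr_t caller) {
  report_error(kind, caller);
}

extern "C" void demo_report_error_fatal(const char *kind,
                                        std::uintptr_t caller) {
  report_error(kind, caller);
}

// Accessor for the hypothetical counter.
static int reports_emitted() { return g_reports; }
```

Calling either entry point with a nonzero `caller` emits one report, and a zero `caller` is ignored. Whether this layout actually reduces register pressure depends on the target and optimizer; the figures in the summary come from the author's own tests.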