Allow printing task backtraces via profiling peek mechanism #56043
Drvi wants to merge 1 commit into JuliaLang:master
base/Base.jl
    while _trywait(cond)
        profile = @something(profile, require_stdlib(PkgId(UUID("9abbd945-dff8-562f-b5e8-e1ebf5ef1b79"), "Profile")))::Module
        invokelatest(profile.peek_report[])
        if Base.get_bool_env("JULIA_PROFILE_PEEK_TASK_BACKTRACES", false) === true
I think it is even worth making this the default:
-        if Base.get_bool_env("JULIA_PROFILE_PEEK_TASK_BACKTRACES", false) === true
+        if Base.get_bool_env("JULIA_PROFILE_PEEK_TASK_BACKTRACES", true) === true
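To make the effect of flipping the default concrete, here is a minimal sketch of the check with a default of `true`. The helper `peek_backtraces_enabled` is hypothetical and is not Base's `get_bool_env`; the accepted spellings and the choice to treat unrecognized values as disabled (mirroring the `=== true` guard) are assumptions made for illustration.

```julia
# Hypothetical stand-in for the guard in the diff; NOT Base.get_bool_env.
# Assumption: unset -> default, unrecognized value -> disabled.
function peek_backtraces_enabled(env::AbstractDict=ENV; default::Bool=true)
    name = "JULIA_PROFILE_PEEK_TASK_BACKTRACES"
    haskey(env, name) || return default          # variable not set at all
    raw = lowercase(strip(env[name]))
    raw in ("1", "true", "yes", "t", "y") && return true
    raw in ("0", "false", "no", "f", "n") && return false
    return false                                 # unrecognized value (assumed disabled)
end
```

Under these assumptions, a user who finds the extra output too slow or too noisy would opt out by setting `JULIA_PROFILE_PEEK_TASK_BACKTRACES=0` before starting Julia.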
It would also potentially be great to put this directly in signal_listener in signals-unix.c, so that it operates on fatal signals too.
I've changed the PR so that we print the backtraces by default 👍 As for calling this from the signal handler directly: do you mean we should also print these for signals other than SIGUSR1/SIGINFO? I think I'd need some guidance with that, as I'm not sure which signals would be suitable to include.
FYI: at least for us, with our huge sysimg, this can be really slow. Like >5 minutes slow.
So if that impacts others too, do we want to do this by default?
Why does system image size affect this? As for the signals: yes, that would be the list of all "critical" signals defined there. There's already a place where we call jl_print_bt_entry_codeloc on the threads, so it would just tack on right after that.
We believe the reason is that symbolizing the traces is super slow with our giant binary size. But to be honest, I think we don't know why it's so slow.
Ah, gotcha, that does make sense. Symbolizing something big (such as the LLVM debug info when asserts are on) can take quite some time.
Although that happens in parallel with the runtime, so I think that still might be okay for something the user has to request pretty explicitly.
        invokelatest(profile.peek_report[])
        if Base.get_bool_env("JULIA_PROFILE_PEEK_TASK_BACKTRACES", false) === true
            println(stderr, "Printing Julia task backtraces...")
            ccall(:jl_print_task_backtraces, Cvoid, ())
This prints to a different abstraction over stderr than the other println calls here, so it may need explicit flush calls around it to ensure the output doesn't get jumbled.
As an aside, it can also be better to use print(stderr, "<text>\n") instead of println where reliable async behavior is desired, since it ensures the whole text prints in the same syscall.
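A minimal sketch of that advice, assuming the native printer writes through C stdio rather than Julia's `stderr` stream. The wrapper name `print_with_flush` and its `native_print` argument are hypothetical; in the PR the callback would correspond to the `ccall(:jl_print_task_backtraces, Cvoid, ())` line. `flush` and `Libc.flush_cstdio` are existing Base functions.

```julia
# Hypothetical wrapper: emit a header through Julia's IO, then run a printer
# that writes through a different layer (e.g. C stdio), flushing on both sides.
function print_with_flush(native_print::Function, io::IO, header::AbstractString)
    print(io, string(header, "\n"))  # header and newline go out as one string
    flush(io)                        # push Julia-side output before the native call
    native_print()                   # e.g. () -> ccall(:jl_print_task_backtraces, Cvoid, ())
    Libc.flush_cstdio()              # flush the C stdout/stderr streams afterwards
    return nothing
end
```

Building the header with `string(header, "\n")` keeps it a single argument to `print`, which is the point of the `print(stderr, "<text>\n")` suggestion.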
This is very helpful, thanks!
vtjnash left a comment
Conditional approval: it would probably be better to print this asynchronously (in signal_listener) immediately, instead of waiting for 1 second and then printing only once the next yield point finally returns to the current Task (here), which could be a very long delay.
This would be useful e.g. for debugging stuck tests run via ReTestItems.jl (which uses multiple Julia processes to run the tests).
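For the stuck-worker scenario, a rough sketch of how one might request the peek report from another process, assuming a Unix-like system with a `kill` utility and that the worker's process ID is known. The helper `peek_signal_cmd` is hypothetical; it only builds the command and picks SIGINFO on BSD/macOS and SIGUSR1 elsewhere, the two signals mentioned in this thread.

```julia
# Hypothetical helper: build (but do not run) the command that sends the
# profile-peek signal to a Julia process with the given PID.
function peek_signal_cmd(pid::Integer; bsd::Bool=Sys.isbsd())
    sig = bsd ? "INFO" : "USR1"   # SIGINFO on BSD/macOS, SIGUSR1 otherwise
    return `kill -$sig $pid`
end
```

Passing the result to `run` would deliver the signal; with this PR applied, the target process would then print its task backtraces alongside the usual peek report.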