Commit 2f4bb6e
authored
Rollup merge of rust-lang#153936 - danielzgtg:perf/immediateAbortAvoidPthreadGetattrNp, r=Mark-Simulacrum
Skip stack_start_aligned for immediate-abort
This improves startup performance by 16%, shown by an optimized hello-world program. glibc's `pthread_getattr_np` performs expensive syscalls when reading `/proc/self/maps`. That is all wasted with `panic = immediate-abort` active because `init()` immediately discards the return value from `install_main_guard()`. A similar improvement can be seen in environments that don't have `/proc`. This change is safe because the immediately succeeding comment says that we rely on Linux's "own stack-guard mechanism".
Tracking issue: rust-lang#147286
# Benchmark
Set it up with `cargo new hello-world2`, and replace these files:
```toml
# Cargo.toml
cargo-features = ["panic-immediate-abort"]
[package]
name = "hello-world"
version = "0.1.0"
edition = "2024"
[profile.release]
lto = true
panic = "immediate-abort"
codegen-units = 1
opt-level = "z"
strip = true
# .cargo/config.toml
[unstable]
build-std = ["std"]
```
## Before
```console
home@daniel-desktop3:~/CLionProjects/hello-world2$ hyperfine -N target/release/hello-world2
Benchmark 1: target/release/hello-world2
Time (mean ± σ): 524.8 µs ± 65.1 µs [User: 276.1 µs, System: 187.0 µs]
Range (min … max): 446.4 µs … 975.5 µs 3996 runs
home@daniel-desktop3:~/CLionProjects/hello-world2$ hyperfine -N target/release/hello-world2
Benchmark 1: target/release/hello-world2
Time (mean ± σ): 519.4 µs ± 65.8 µs [User: 282.1 µs, System: 177.7 µs]
Range (min … max): 443.2 µs … 830.5 µs 3612 runs
home@daniel-desktop3:~/CLionProjects/hello-world2$ hyperfine -N target/release/hello-world2
Benchmark 1: target/release/hello-world2
Time (mean ± σ): 520.0 µs ± 64.3 µs [User: 277.1 µs, System: 182.1 µs]
Range (min … max): 447.1 µs … 1001.3 µs 3804 runs
```
For a visualization of the problem, run `cargo +stage1 build --release && perf record --call-graph dwarf -F max ./target/release/hello-world2 && perf script | inferno-collapse-perf | inferno-flamegraph > flamegraph.svg`:
<img width="3832" height="1216" alt="flamegraph with 17.41% __pthread_getattr_np" src="https://github.com/user-attachments/assets/acc2286e-1582-4772-9e3b-68b5c35e3e70" />
## After
```console
home@daniel-desktop3:~/CLionProjects/hello-world2$ hyperfine -N target/release/hello-world2Benchmark 1: target/release/hello-world2
Time (mean ± σ): 444.7 µs ± 57.3 µs [User: 257.4 µs, System: 130.2 µs]
Range (min … max): 379.4 µs … 1289.3 µs 3893 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
home@daniel-desktop3:~/CLionProjects/hello-world2$ hyperfine -N target/release/hello-world2
Benchmark 1: target/release/hello-world2
Time (mean ± σ): 452.3 µs ± 60.7 µs [User: 261.5 µs, System: 133.5 µs]
Range (min … max): 374.9 µs … 1512.4 µs 4177 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
home@daniel-desktop3:~/CLionProjects/hello-world2$ hyperfine -N target/release/hello-world2
Benchmark 1: target/release/hello-world2
Time (mean ± σ): 441.2 µs ± 56.1 µs [User: 256.2 µs, System: 128.8 µs]
Range (min … max): 375.0 µs … 760.4 µs 4032 runs
```1 file changed
Lines changed: 13 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
429 | 429 | | |
430 | 430 | | |
431 | 431 | | |
| 432 | + | |
| 433 | + | |
| 434 | + | |
| 435 | + | |
| 436 | + | |
432 | 437 | | |
433 | 438 | | |
434 | 439 | | |
| |||
456 | 461 | | |
457 | 462 | | |
458 | 463 | | |
| 464 | + | |
| 465 | + | |
| 466 | + | |
| 467 | + | |
459 | 468 | | |
460 | 469 | | |
461 | 470 | | |
| |||
489 | 498 | | |
490 | 499 | | |
491 | 500 | | |
| 501 | + | |
| 502 | + | |
| 503 | + | |
| 504 | + | |
492 | 505 | | |
493 | 506 | | |
494 | 507 | | |
| |||
0 commit comments