perf (tsort) : avoid reading the whole input into memory and intern strings #9872
perf (tsort) : avoid reading the whole input into memory and intern strings #9872sylvestre merged 3 commits intouutils:mainfrom
Conversation
CodSpeed Performance ReportMerging #9872 will degrade performance by 5.29%Comparing Summary
Benchmarks breakdown
Footnotes
|
|
GNU testsuite comparison: |
81ece9b to
0a56fe6
Compare
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
ddb2361 to
b98950c
Compare
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
any idea why the perf regressed here? |
|
Oh, @sylvestre , I just added a few changes :) |
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
@sylvestre I believe the regression noticed are due to cache misses because the metadata that goes with the token is just slightly bigger than usize. Here's another benchmark that maybe of interest: graph.tsort, 1 Gb random graph
Massif output:
Gnu tsort:
MB (Memory)
248.6^ #
| ::::::::::@::#
| ::::::::::::::::::: ::: ::::@: #
| :::::@:::: ::::: : ::::: ::: ::: ::::@: #
| @@:: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| ::::@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| @@::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| ::::::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| :::::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| :::: :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| :::: : :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| :: :: : :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| :::: :: : :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| : :: :: : :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| :: :: :: : :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| :: :: :: : :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| @@:: :: :: : :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| @ :: :: :: : :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
0 +----------------------------------------------------------------------->Gi (Giga Instructions)
0 23.47
Number of snapshots: 56
Detailed snapshots: [1, 2, 16, 21, 26, 52, 55 (peak)]
uutils tsort
MB (Memory)
200.0^ #
| @::::@:::::::::::@::::@::::@::::@:::#:
| :::::::::::@::::::@::::@::: :::::: @::::@::::@::::@:::#:
| ::::::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:
| :::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#::
| ::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#::
| @:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| @::@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@::@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@::@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@::@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@::@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| ::@::@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| ::@::@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| ::@::@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
0 +----------------------------------------------------------------------->Gi (Giga Instructions)
0 35.83
Number of snapshots: 98
Detailed snapshots: [3, 6, 24, 31, 37, 50, 51, 52, 62, 72, 82, 91 (peak)]
Hyperfine :
Benchmark 1: target/release/coreutils tsort /home/bench_vm/graph.tsort
Time (mean ± σ): 6.044 s ± 0.071 s [User: 5.602 s, System: 0.394 s]
Range (min … max): 5.974 s … 6.168 s 10 runs
Benchmark 2: tsort /home/bench_vm/graph.tsort
Time (mean ± σ): 7.115 s ± 0.147 s [User: 6.738 s, System: 0.328 s]
Range (min … max): 6.910 s … 7.390 s 10 runs
`
Summary: 'target/release/coreutils tsort /home/bench_vm/graph.tsort'
ran 1.18 ± 0.03 times faster than 'tsort /home/bench_vm/graph.tsort'
` |
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
7e7fa49 to
8c49b1a
Compare
|
GNU testsuite comparison: |
756b865 to
b9604fe
Compare
|
GNU testsuite comparison: |
8ac2d2a to
51b542f
Compare
|
windows fails with: |
3cd87f8 to
a8a1860
Compare
|
@sylvestre fixed, the rest of the work will be put in another PR. |
|
GNU testsuite comparison: |
1 similar comment
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
@sylvestre done :) |
|
The performance regression is inevitable here if we want to support the same input range (2^64 -1 unique input tokens). There's also a slight buffering cost at the given bench graph size. We are still way more memory efficient, so I'd still consider it a net positive. |
This PR is a WIP to get more performance out of tsort.
Move to usize is complete. Next step is to move to vec instead of hashmap