join: consider locale collation in field comparison #9982
sylvestre merged 2 commits into uutils:main
|
Did you run some benchmarks? |
|
I have a few optimization ideas:
Initial testing shows this could bring the C locale overhead from ~16% down to ~1%. Happy to iterate on this based on your guidance! @sylvestre |
|
Why not both :) |
Yeah, that's exactly what I did. |
|
I don't see the change :) Note that we will need a benchmark for join |
87f207e to 41bce04
Sorry, I just pushed.
Sure! |
|
In the future, please avoid screenshots; they are terrible for accessibility and search :) Thanks, and please compare with GNU too. |
|
100_000 lines file

coreutils$ LC_ALL=C hyperfine --warmup 5 --runs 10 -n "GNU join" '/usr/bin/join file1.txt file2.txt' -n "uutils join" 'target/release/join file1.txt file2.txt'
Benchmark 1: GNU join
Time (mean ± σ): 20.8 ms ± 1.8 ms [User: 18.2 ms, System: 2.5 ms]
Range (min … max): 18.8 ms … 24.2 ms 10 runs
Benchmark 2: uutils join
Time (mean ± σ): 22.9 ms ± 2.2 ms [User: 19.8 ms, System: 3.0 ms]
Range (min … max): 20.4 ms … 27.4 ms 10 runs
Summary
GNU join ran
1.10 ± 0.14 times faster than uutils join

coreutils$ LC_ALL=en_US.UTF-8 hyperfine --warmup 5 --runs 10 -n "GNU join" '/usr/bin/join file1.txt file2.txt' -n "uutils join" 'target/release/join file1.txt file2.txt'
Benchmark 1: GNU join
Time (mean ± σ): 31.0 ms ± 3.5 ms [User: 27.1 ms, System: 3.7 ms]
Range (min … max): 27.6 ms … 38.6 ms 10 runs
Benchmark 2: uutils join
Time (mean ± σ): 63.1 ms ± 3.3 ms [User: 59.8 ms, System: 3.1 ms]
Range (min … max): 60.1 ms … 70.4 ms 10 runs
Summary
GNU join ran
2.04 ± 0.26 times faster than uutils join |
41bce04 to 8bcc415
|
GNU testsuite comparison: |
|
Some benchmarks (the way I am expecting to see them). The performance regressed a bit much :/ |
CodSpeed Performance Report
Merging this PR will degrade performance by 4.2%
|
Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>


Fixes: #9971
GNU join uses LC_COLLATE for field comparison. This PR (ref expr) implements locale-aware string comparison using uucore's i18n::collator module.

Reproduce:
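
As a rough illustration of why the comparison order matters for join (this is not the PR's code, and it does not use uucore's i18n::collator), here is a minimal Rust sketch. `KeyOrder`, `compare_keys`, and `join_sorted` are hypothetical names, the whole line is treated as the join key, and the `Collated` variant is only a toy stand-in for real locale collation: it compares ASCII case-insensitively and breaks ties by bytes, which roughly mimics how a locale such as en_US.UTF-8 interleaves upper and lower case.

```rust
use std::cmp::Ordering;

/// How join keys are compared. Hypothetical type for illustration only;
/// a real implementation would delegate to a locale collator.
#[derive(Clone, Copy)]
pub enum KeyOrder {
    /// C/POSIX locale: plain byte-wise comparison.
    Bytes,
    /// Toy collation (assumption): ASCII case-insensitive first,
    /// then byte order as a tie-break.
    Collated,
}

/// Compare two keys under the chosen ordering.
pub fn compare_keys(order: KeyOrder, a: &[u8], b: &[u8]) -> Ordering {
    match order {
        KeyOrder::Bytes => a.cmp(b),
        KeyOrder::Collated => {
            let fold = |s: &[u8]| s.to_ascii_lowercase();
            fold(a).cmp(&fold(b)).then_with(|| a.cmp(b))
        }
    }
}

/// Simplified merge join over two inputs that are assumed to be sorted
/// in `order`. Each line is its own key; duplicate keys are not handled.
pub fn join_sorted<'a>(order: KeyOrder, left: &[&'a str], right: &[&'a str]) -> Vec<&'a str> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < left.len() && j < right.len() {
        match compare_keys(order, left[i].as_bytes(), right[j].as_bytes()) {
            Ordering::Less => i += 1,    // left key is behind, advance left
            Ordering::Greater => j += 1, // right key is behind, advance right
            Ordering::Equal => {
                out.push(left[i]);
                i += 1;
                j += 1;
            }
        }
    }
    out
}
```

With `left = ["a", "B", "c"]` and `right = ["B", "c"]` (both sorted the way a case-interleaving locale would sort them), `join_sorted(KeyOrder::Collated, ...)` pairs both `B` and `c`, while `join_sorted(KeyOrder::Bytes, ...)` only pairs `c`: under byte order `a` compares greater than `B`, so the merge skips past `B` on the right side. This is the kind of mismatch that appears when the input was sorted under one collation and joined under another.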