tr: fix high memory use, possible heap exhaustion #8435
sylvestre merged 2 commits into uutils:main
Conversation
Somehow test_cp is failing with an invalid file descriptor in cp, in the l10n tests. 😕

I think we may want a larger buffer size, as it could be beneficial for performance, say 4 MB at a time, but I just went with the GNU buffer size for now.

Please rerun the l10n test; I wonder if there is a race condition with pipe_in of some sort.

GNU testsuite comparison:

Yeah, since you're at it, a larger buffer seems worth it... Mind doing a few simple benchmarks with
drinkcat left a comment
Thanks! The change looks fine and we can merge it first, but if you want to do more benchmarking/optimization, here are some ideas.
```rust
let filtered = buf[..length]
    .iter()
    .filter_map(|&c| translator.translate(c));
output_buf.extend(filtered);
```
Maybe for a follow-up, but if you want to do benchmarking (and fine-tuning) here: I wonder if we can just directly write_all the filtered bytes, instead of collecting the data into output_buf first?
(But then you might want to restore a buffered writer on stdout.)
I think we should deal with performance separately; this is a surprisingly big issue:

```
$ hyperfine 'gnutr a b < zeroes'
Benchmark 1: gnutr a b < zeroes
  Time (mean ± σ):  558.3 ms ± 80.6 ms  [User: 356.6 ms, System: 201.5 ms]
  Range (min … max): 528.3 ms … 787.5 ms  10 runs
$ hyperfine './target/release/tr a b < zeroes'
Benchmark 1: ./target/release/tr a b < zeroes
  ⠴ Current estimate: 5.681 s
```

(main is at 6.8 ish)
You should run hyperfine this way:

```
hyperfine 'gnutr a b < zeroes' './target/release/tr a b < zeroes'
```

You get much better output.
Ouch. Yes, sounds good; it would be great if you can file an issue for the performance problem.
(I didn't look at our code for tr, but we use some vector instructions in wc to find newline chars that may also be helpful here.)
Read the input into a statically sized buffer (8192 bytes, matching GNU) instead of reading until the end of the line, since reading until the end of the line in a file with no end-of-line would result in reading the entire file into memory. Confusingly, GNU tr seems to write the 8192 bytes in two chunks of 1024 and 7168 bytes, but I can't figure out why it would do that; I don't see any line buffering in GNU tr.

Bug-Ubuntu: https://launchpad.net/bugs/2119520
Our stdin that we transform is already buffered (using 8192-byte buffers in the previous commit), so avoid buffering our output needlessly.
This effectively changes the code to write complete lines immediately. For example, in

```
( echo a; sleep 1 ) | tr a b
```

we receive

```
read(0, "a\n", 8192) = 2
write(1, "b\n", 2) = 2
read(0, "", 8192) = 0
```

instead of

```
read(0, "a\n", 8192) = 2
read(0, "", 8192) = 0
write(1, "b\n", 2) = 2
```

which matches the GNU coreutils behavior.
GNU testsuite comparison:
First commit changes the code to use a fixed-size buffer to avoid heap exhaustion when reading a file with no newlines.
Second commit removes the buffering on stdout, since we already buffer our reads on stdin (we read in 8192-byte chunks now). This was needed to match the observed GNU buffering behavior. It's not entirely exact, in that GNU writes a first chunk of 1024 bytes and then the remaining 7168 bytes, when running on my file created with

truncate file -s 96G

(to simulate heap exhaustion). We could add a test case in the form of `ulimit -v 128M` and generating a file that has, say, 256M, but I'm not sure how meaningful it is.