Closed
Component
comm
Description
uutils comm converts each output line to UTF-8 using String::from_utf8_lossy() before printing, which replaces invalid UTF-8 byte sequences with U+FFFD. GNU comm writes raw bytes directly to stdout without UTF-8 conversion, preserving byte-exact input.
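For illustration, the difference between the two behaviors can be sketched as follows. This is a minimal standalone example, not the actual uutils code: the helper names `emit_lossy` and `emit_raw` are hypothetical, and only `String::from_utf8_lossy` and `Write::write_all` are real standard-library calls.

```rust
use std::io::{self, Write};

/// Lossy path (hypothetical helper): decodes the line as UTF-8 first,
/// so every invalid byte sequence becomes U+FFFD (bytes ef bf bd).
fn emit_lossy<W: Write>(out: &mut W, line: &[u8]) -> io::Result<()> {
    out.write_all(String::from_utf8_lossy(line).as_bytes())
}

/// Byte-exact path (hypothetical helper): writes the input bytes unchanged,
/// which is the behavior the report attributes to GNU comm.
fn emit_raw<W: Write>(out: &mut W, line: &[u8]) -> io::Result<()> {
    out.write_all(line)
}

fn main() -> io::Result<()> {
    // One input line containing a byte that is never valid in UTF-8.
    let line: &[u8] = b"\xfe\n";

    let mut lossy: Vec<u8> = Vec::new();
    emit_lossy(&mut lossy, line)?;

    let mut raw: Vec<u8> = Vec::new();
    emit_raw(&mut raw, line)?;

    // Print both results as hex bytes for comparison.
    println!("lossy: {:02x?}", lossy);
    println!("raw:   {:02x?}", raw);
    Ok(())
}
```

Under these assumptions, a fix would route output lines through something like the byte-exact path, keeping them as `&[u8]` end to end instead of converting to `String`.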
Test / Reproduction Steps
echo -ne "\xfe\n\xff\n" > /tmp/a
echo -ne "\xff\n\xfe\n" > /tmp/b
comm /tmp/a /tmp/b | od -An -tx1
GNU output:
09 09 ff 0a 09 09 fe 0a
uutils output:
ef bf bd 0a 09 09 ef bf bd 0a 09 ef bf bd 0a
Impact
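Non-UTF-8 text is silently corrupted in the output: each invalid byte sequence is replaced with U+FFFD (ef bf bd), so the original bytes cannot be recovered.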