printf: accept non-UTF-8 input in FORMAT and ARGUMENT arguments #6812
andrewliebenow wants to merge 4 commits into uutils:main
Probably needs some refinement. If merged, will resolve #6804 (the
GNU testsuite comparison:
pub fn bytes_from_os_str(input: &OsStr) -> UResult<&[u8]> {
Could you please add a unit test for this function, and for get_str_or_exit_with_error if possible?
The commit I just pushed cleans up the error handling. I renamed this function to `try_get_bytes_from_os_str`. I'm not sure how to test it, because on most platforms this operation never fails. I did add a test for this error, which is much easier to trigger:
0f3bc3f#diff-6ea56f426af08b614e48fe26ab7345fd0f2901e87b6b77d5ccf37e4eff6dd3d1
❯ ./target/release/printf '%d' "$(coreutils printf 'Swer an rehte g\xFCete')"
./target/release/printf: invalid (non-UTF-8) argument like 'Swer an rehte g�ete' encountered
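To illustrate why the conversion "never fails" on most platforms, here is a minimal sketch of such a helper. It is not the code from this PR: the plain `Result<&[u8], String>` return type stands in for `UResult`, and the error text is only modeled on the message shown above. On Unix an `OsStr` is already an arbitrary byte sequence, so only the non-Unix fallback can produce an error.

```rust
use std::ffi::OsStr;
#[cfg(unix)]
use std::os::unix::ffi::OsStrExt;

/// Hypothetical stand-in for the helper under review. The real function
/// returns a `UResult`; a plain `Result` is used here to stay self-contained.
fn try_get_bytes_from_os_str(input: &OsStr) -> Result<&[u8], String> {
    // On Unix, an OsStr is just bytes, so this branch cannot fail.
    #[cfg(unix)]
    let bytes = input.as_bytes();

    // Assumption: on other platforms, fall back to requiring valid UTF-8.
    #[cfg(not(unix))]
    let bytes = input
        .to_str()
        .ok_or_else(|| {
            format!(
                "invalid (non-UTF-8) argument like '{}' encountered",
                input.to_string_lossy()
            )
        })?
        .as_bytes();

    Ok(bytes)
}

fn main() {
    // A valid UTF-8 argument converts on every platform.
    let bytes = try_get_bytes_from_os_str(OsStr::new("%d\n")).unwrap();
    assert_eq!(bytes, &b"%d\n"[..]);
}
```

Because the Unix branch is infallible, a unit test for the error path would only be meaningful on a non-Unix target, which matches the difficulty described above.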
Force-pushed from 0f3bc3f to d592d16.
I am working on the handling of non-UTF-8 inputs in See the related commit I think we should organize ourselves to not duplicate code.
If you think your PR will be merged soonish, I will convert this PR to a draft, wait for your PR to be merged, and then update this branch to use those conversion functions.
It was just merged 👍
Force-pushed from 6a041b9 to a65a474.
@RenjiSann I've included some changes to the code you just added. Please let me know if you have any issues with the changes.
I'm good with all of it. The precaution regarding the potentially overlapping
Force-pushed from a65a474 to a7ec92c.
A few lines are not covered by unit tests (low code coverage).
printf: accept non-UTF-8 input in FORMAT and ARGUMENT arguments
Other implementations of `printf` permit arbitrary data to be passed to `printf`. The only restriction is that a null byte terminates FORMAT and ARGUMENT argument strings (since they are C strings).
The current implementation only accepts FORMAT and ARGUMENT arguments that are valid UTF-8 (this is being enforced by clap).
This commit removes the UTF-8 validation by switching to OsStr and OsString.
This allows users to use `printf` to transmit or reformat null-safe but not UTF-8-safe data, such as text encoded in an 8-bit text encoding. See the `non_utf_8_input` test for an example (ISO-8859-1 text).
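The difference between `str`-based and `OsStr`-based argument handling can be sketched with the standard library alone. The example below is Unix-only and is not taken from the PR: the helper names `iso_8859_1_argument` and `emit` are hypothetical, and the sample text is borrowed from the shell session earlier in this thread. It shows that an ISO-8859-1 argument has no `&str` view, that a lossy conversion alters the data, and that the raw bytes survive when written through the `OsStr` byte view.

```rust
use std::ffi::{OsStr, OsString};
use std::io::{self, Write};
use std::os::unix::ffi::{OsStrExt, OsStringExt};

/// Hypothetical helper: builds an argument containing ISO-8859-1 text.
/// The byte 0xFC is "ü" in ISO-8859-1 and is not valid UTF-8 on its own.
fn iso_8859_1_argument() -> OsString {
    OsString::from_vec(b"Swer an rehte g\xFCete".to_vec())
}

/// Hypothetical helper: writes the argument's bytes unchanged to `out`.
fn emit(arg: &OsStr, out: &mut impl Write) -> io::Result<()> {
    out.write_all(arg.as_bytes())
}

fn main() -> io::Result<()> {
    let arg = iso_8859_1_argument();

    // A str-based pipeline cannot represent this argument at all.
    assert!(arg.to_str().is_none());

    // A lossy conversion substitutes U+FFFD, so the original byte is lost.
    assert!(arg.to_string_lossy().contains('\u{FFFD}'));

    // The OsStr byte view passes the data through untouched.
    let mut stdout = io::stdout().lock();
    emit(&arg, &mut stdout)?;
    stdout.write_all(b"\n")
}
```

Programs that collect arguments with `std::env::args_os()` instead of `std::env::args()` receive `OsString` values like the one built here, which is the general approach the commit message describes.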
Force-pushed from a7ec92c to f2d050e.
@andrewliebenow Any updates on this? Is it ready yet?
Sorry, it needs to be rebased. Sorry again.
@andrewliebenow: You can take my rebase from #7209 and move it here, or I can take it over the finish line there. Either way is fine with me.