stat: fix mount point handling for non-UTF8 paths#8538
stat: fix mount point handling for non-UTF8 paths#8538sylvestre merged 11 commits intouutils:mainfrom
Conversation
|
GNU testsuite comparison: |
|
please ask your AI to cleanup a bit the patch, too much duplicated code
|
|
Hello @sylvestre |
src/uu/stat/src/stat.rs
Outdated
| for _ in 0..padding_needed { | ||
| print!(" "); | ||
| } | ||
| } else { | ||
| for _ in 0..padding_needed { | ||
| print!(" "); | ||
| } |
There was a problem hiding this comment.
this is duplicated code here
src/uu/stat/src/stat.rs
Outdated
| io::stdout().write_all(display_bytes)?; | ||
| } | ||
| } else { | ||
| io::stdout().write_all(display_bytes)?; |
There was a problem hiding this comment.
okay thanks i will edit it
please look at the other tests for this :) |
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
1 similar comment
|
GNU testsuite comparison: |
src/uu/stat/src/stat.rs
Outdated
| io::stdout().write_all(display_bytes)?; | ||
|
|
||
| if left && padding_needed > 0 { | ||
| for _ in 0..padding_needed { |
src/uu/stat/src/stat.rs
Outdated
| let bytes = s.as_bytes(); | ||
|
|
||
| if pad_and_print_bytes(bytes, flags.left, width, precision).is_err() { | ||
| let lossy_string = s.to_string_lossy(); |
There was a problem hiding this comment.
rename the variable to explain what it is, not the style
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
src/uu/stat/src/stat.rs
Outdated
| if path.starts_with(root) { | ||
| // TODO: This is probably wrong, we should pass the OsString | ||
| return Some(root.to_string_lossy().into_owned()); | ||
| return Some(root.clone()); |
There was a problem hiding this comment.
can we avoid the clone here ?
There was a problem hiding this comment.
yes we can by annotating the function and the enum OutputType by a lifetime to be like this
fn find_mount_point<'a, P: AsRef<Path>>(&'a self, p: P) -> Option<&'a OsString>
and the outputtype be generic over 'a but in line
1064: OutputType::OsStr(self.find_mount_point(file).unwrap())
i can't use unwrap or default i don't know how
There was a problem hiding this comment.
if it errors we can fall back to normal outputType::str with a default value of "" is that acceptable ?
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
src/uu/stat/src/stat.rs
Outdated
| /// | ||
| /// On Unix systems, this preserves non-UTF8 data by printing raw bytes | ||
| /// On other platforms, falls back to lossy string conversion | ||
| fn pad_and_print_bytes( |
There was a problem hiding this comment.
we have unit tests in this file, maybe test also this function ? thanks
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
Hello @sylvestre, is there any updates on this pr |
|
GNU testsuite comparison: |
a0fd51e to
300fbdb
Compare
|
GNU testsuite comparison: |
src/uu/stat/src/stat.rs
Outdated
| (padding_needed, 0) | ||
| }; | ||
|
|
||
| writer.write_all(&vec![b' '; left_pad])?; |
There was a problem hiding this comment.
try you try using using repeat() iterator or a pre-allocated buffer ?
There was a problem hiding this comment.
something like
std::io::repeat(b' ').take(n)
There was a problem hiding this comment.
okay will take a look at it
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
Hello @sylvestre , is there any updates on this pr |
Summary
This PR addresses a TODO in
find_mount_point()by properly handling mount point paths that contain non-UTF8 bytes. Previously,to_string_lossy()was used, which could corrupt mount point names with invalid UTF-8 sequences. The fix ensures raw bytes are preserved on Unix-like systems and safely handled on Windows.Changes Made
find_mount_point()return type updatedChanged from
Option<String>→Option<OsString>to preserve original bytes.Added
OsStrvariant toOutputTypeenumAllows proper handling of non-UTF8 data through the output pipeline.
Implemented
print_os_str().to_string_lossy(), safely handling UTF-16 strings.Added
pad_and_print_bytes()helperHandles alignment, width, and padding for raw byte output.
Updated
'm'format specifierReturns
OutputType::OsStrinstead of a lossy string, preserving original data.Platform Behavior
.to_string_lossy()safely, since Windows internally uses UTF-16.Related Issues
Fixes the TODO in
stat.rs:it is mentioned in this issue