fix versionsort chunk split on non-ASCII numerics#6407
fix versionsort chunk split on non-ASCII numerics#6407ytmimi merged 2 commits intorust-lang:masterfrom
Conversation
ytmimi
left a comment
There was a problem hiding this comment.
The Style Guide mentions that numeric chunks are defined by ascii digits so using is_ascii_digit is definitely the right function call to use.
I've left a few comments inline. I think we should rework the current test case and also add a style_edition=2015 case so that we can compare the sort order when dealing with non ascii numeric characters.
| fn main() { | ||
| first_print(); | ||
| second_print(); | ||
| third_print(); | ||
|
|
||
| assert_eq!("print๙msg".cmp("printémsg"), Ordering::Greater); | ||
| } | ||
|
|
||
| /// '๙' = 0E59;THAI DIGIT NINE;Nd; | ||
| mod print๙msg { | ||
| pub fn print() { | ||
| println!("Non-ASCII Decimal_Number") | ||
| } | ||
| } | ||
|
|
||
| /// '0' = 0030;DIGIT ZERO;Nd; | ||
| mod print0msg { | ||
| pub fn print() { | ||
| println!("ASCII Decimal_Number") | ||
| } | ||
| } | ||
|
|
||
| /// 'é' = 00E9;LATIN SMALL LETTER E WITH ACUTE;Ll; | ||
| mod printémsg { | ||
| pub fn print() { | ||
| println!("Lowercase_Letter") | ||
| } | ||
| } No newline at end of file |
There was a problem hiding this comment.
Let's remove any code that isn't an import since it's not needed for the test case. Though, It would be great to keep the explanatory comments you added about '๙', '0', and 'é' to help document the test case.
There was a problem hiding this comment.
Updated and expanded on the comments to explain why they sort in that order.
| use std::cmp::Ordering; | ||
| use print๙msg::print as first_print; | ||
| use print0msg::print as second_print; | ||
| use printémsg::print as third_print; |
There was a problem hiding this comment.
version sort is only be applied when using style_edition=2024. You'll need to configure that for this test as follows:
| use std::cmp::Ordering; | |
| use print๙msg::print as first_print; | |
| use print0msg::print as second_print; | |
| use printémsg::print as third_print; | |
| // rustfmt-style_edition: 2024 | |
| use std::cmp::Ordering; | |
| use print๙msg::print as first_print; | |
| use print0msg::print as second_print; | |
| use printémsg::print as third_print; |
You can read more about the comment configuration in test cases here.
Also, can you also add a style_edition=2015 import sorting test case so that we can compare the pre version sort ordering.
There was a problem hiding this comment.
Added. The 2015 edition actually sorts the same way since U+0030 < U+00E9 < U+0E59, so the bug only affected 2024 edition (and can't be reproduced on earlier versions because ASCII digits are the earliest Unicode numerics).
There was a problem hiding this comment.
Thanks for adding that additional test case!
ytmimi
left a comment
There was a problem hiding this comment.
Thanks for your help on this one!
|
I think we're good to go here, but just want to do a quick sanity check by running the updated rustfmt on some larger rust codebases. Here's a link to the diff check job. Edit: Job ran successfully ✅ |
|
@ytmimi - I was trying to determine whether we needed to account for this one in the release notes and have two questions:
|
The way I see it, yes, this change is aligned with the clause around non-ascii unicode characters.
I opened #6432 to get the ball rolling on the release notes and removed the label for anything that I already added to that draft PR. I made a note here specifically for this change. |
Description
Fixes a bug with 2024 Edition versionsort, where a string chunk will split on all numeric characters. For example, this would cause imports containing non-ASCII numeric characters to be incorrectly sorted.
Unformatted (Playground link):
Formatted imports (nightly 2024-11-29, Playground link):
Here,
printémsgshould be sorted beforeprint๙msg, but since๙is a non-ASCII numeric, that import is split into two short string chunks, which will sort before the one longer chunk.Changes & Notes
char::is_numeric()check with the correctchar::is_ascii_digit()call inVersionChunkIter::parse_str_chunk().rustfmtagainst the repo and found no changes, so there isn't a separate commit for that.