Clarify ascii whitespace exclusion of vertical tab in the doc #154765
rust-bors[bot] merged 2 commits into rust-lang:main
Conversation
This especially means that for `c: char`, `c.is_ascii() && c.is_whitespace()` does **not** imply `c.is_ascii_whitespace()`.
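A minimal sketch of the claim above, using only stable `char` methods from the standard library. The helper name `is_surprising` is hypothetical and exists only for this illustration:

```rust
/// Returns true when `c` is ASCII and Unicode whitespace, yet is rejected
/// by `char::is_ascii_whitespace`. (Hypothetical helper for illustration.)
fn is_surprising(c: char) -> bool {
    c.is_ascii() && c.is_whitespace() && !c.is_ascii_whitespace()
}

fn main() {
    let vt = '\x0B'; // vertical tab, U+000B

    assert!(vt.is_ascii());
    assert!(vt.is_whitespace());
    assert!(!vt.is_ascii_whitespace());
    assert!(is_surprising(vt));

    // An ordinary space satisfies both predicates, so it is not surprising.
    assert!(!is_surprising(' '));
}
```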
|
rustbot has assigned @Mark-Simulacrum.
|
|
I kind of wonder if this might be something we could get away with fixing since it seems like we're not meeting the Unicode-specified behavior. Maybe worth a PR fixing things that we can crater? |
|
It is indeed conforming to the documentation where it links to https://infra.spec.whatwg.org/#ascii-whitespace, but it's extremely odd that we're using a different definition for specifically |
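For reference, the WHATWG Infra definition linked above lists five code points as ASCII whitespace: TAB, LF, FF, CR and SPACE. The sketch below checks `u8::is_ascii_whitespace` against that list and collects the ASCII bytes where it disagrees with `char::is_whitespace`. The constant and function names are hypothetical and chosen only for this example:

```rust
/// The five WHATWG Infra "ASCII whitespace" code points:
/// U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, U+0020 SPACE.
const WHATWG_ASCII_WHITESPACE: [u8; 5] = [0x09, 0x0A, 0x0C, 0x0D, 0x20];

/// Checks that `u8::is_ascii_whitespace` agrees with the list above
/// for every ASCII byte.
fn matches_whatwg() -> bool {
    (0u8..=0x7F).all(|b| b.is_ascii_whitespace() == WHATWG_ASCII_WHITESPACE.contains(&b))
}

/// Collects every ASCII byte on which `char::is_whitespace` and
/// `u8::is_ascii_whitespace` disagree.
fn disagreements() -> Vec<u8> {
    (0u8..=0x7F)
        .filter(|b| char::from(*b).is_whitespace() != b.is_ascii_whitespace())
        .collect()
}

fn main() {
    assert!(matches_whatwg());
    // Only the vertical tab (0x0B) is treated differently.
    assert_eq!(disagreements(), vec![0x0B]);
}
```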
|
In Zulip from @ChrisDenton
If this is the case, this should probably be explicitly noted in the documentation as well. |
|
Rust's lexer includes vertical tab as whitespace: It matches Common Markdown's definition, but not GitHub Markdown. |
Given that we currently explicitly document that we don't include
Which means people who checked that their format's definition of whitespace matches ours would be broken if we changed it. I also think that even in programs where changing this results in a bug, it's one that's unlikely to be covered by all but the most thorough of test suites, so I doubt crater would reveal the extent of the issues. |
|
This was discussed in today's @rust-lang/libs-api meeting; the situation is unfortunate, but we're in favour of updating the docs and leaving behaviour as-is.
/// [`u8::is_ascii_whitespace`]. Importantly, this definition excludes
/// the `\0x0B` byte even though it has the unicode WhiteSpace property
/// and is removed by [`str::trim_end`].

/// [`u8::is_ascii_whitespace`]. Importantly, this definition excludes
/// the `\0x0B` byte even though it has the Unicode [`White_Space`] property
/// and is removed by [`str::trim_end`].
///
/// [`White_Space`]: https://www.unicode.org/reports/tr44/#White_Space
… and so on for other references to the property.
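To illustrate the contrast the doc comment draws with `str::trim_end`, here is a small sketch. The helper `trim_end_ascii_ws` is hypothetical; it is built from `str::trim_end_matches` and `char::is_ascii_whitespace` and is not the standard library's own trimming routine:

```rust
/// Trims trailing characters accepted by `char::is_ascii_whitespace`.
/// (Hypothetical helper for illustration.)
fn trim_end_ascii_ws(s: &str) -> &str {
    s.trim_end_matches(|c: char| c.is_ascii_whitespace())
}

fn main() {
    let s = "abc \x0B"; // ends with a space followed by a vertical tab

    // `trim_end` uses the Unicode White_Space property, so it strips both.
    assert_eq!(s.trim_end(), "abc");

    // The ASCII predicate rejects the vertical tab, so nothing is trimmed.
    assert_eq!(trim_end_ascii_ws(s), "abc \x0B");
}
```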
Maybe "Importantly, unlike is_whitespace, this definition..." to draw attention to the contrast between them?
@asquared31415 I like your idea, but I feel this is already implied by highlighting the difference with the various trim_* methods, and I don't want to make the doc too long, so I have elected not to incorporate your suggestion for now.
|
r=me with @Jules-Bertholet's comment addressed. |
|
@rustbot ready |
|
@bors r=Mark-Simulacrum,WaffleLapkin rollup |
…-Simulacrum,WaffleLapkin Clarify ascii whitespace exclusion of vertical tab in the doc This especially means that for `c: char`, `c.is_ascii() && c.is_whitespace()` does **not** imply `c.is_ascii_whitespace()`, which can cause bug and is highly counterintuitive.
…uwer Rollup of 9 pull requests

Successful merges:

- #153536 (Add `const_param_ty_unchecked` gate)
- #153815 (Fix ICE when Self is used in enum discriminant of a generic enum)
- #154882 (Gate tuple const params behind `min_adt_const_params` feature)
- #155293 (fix arch names in cfg pretty printer)
- #154765 (Clarify ascii whitespace exclusion of vertical tab in the doc)
- #155172 (Some small nits for supertrait_item_shadowing, and additional testing)
- #155279 (Test/lexer unicode pattern white space)
- #155280 (Tests for precise-capture through RPIT and TAIT)
- #155304 (remove PointeeParser)
Rollup of 13 pull requests

Successful merges:

- #154882 (Gate tuple const params behind `min_adt_const_params` feature)
- #155259 (explicit-tail-calls: disable two tests on LoongArch)
- #155293 (fix arch names in cfg pretty printer)
- #155314 (`BorrowedBuf`: Update outdated safety comments in `set_init` users.)
- #153469 (docs: clarify path search behavior in std::process::Command::new)
- #154765 (Clarify ascii whitespace exclusion of vertical tab in the doc)
- #155172 (Some small nits for supertrait_item_shadowing, and additional testing)
- #155279 (Test/lexer unicode pattern white space)
- #155280 (Tests for precise-capture through RPIT and TAIT)
- #155301 (Delete unused `rustc_trait_selection` errors.)
- #155303 (remove ibraheemdev from review rotation)
- #155304 (remove PointeeParser)
- #155319 (Remove dead diagnostic structs.)
Rollup merge of #154765 - krtab:doc_ascii_whitespace, r=Mark-Simulacrum,WaffleLapkin Clarify ascii whitespace exclusion of vertical tab in the doc This especially means that for `c: char`, `c.is_ascii() && c.is_whitespace()` does **not** imply `c.is_ascii_whitespace()`, which can cause bug and is highly counterintuitive.