Clarify ascii whitespace exclusion of vertical tab in the doc #154765

Merged
rust-bors[bot] merged 2 commits into rust-lang:main from krtab:doc_ascii_whitespace
Apr 15, 2026

Conversation

@krtab
Contributor

@krtab krtab commented Apr 3, 2026

In particular, for `c: char`, `c.is_ascii() && c.is_whitespace()` does **not** imply `c.is_ascii_whitespace()`, which can cause bugs and is highly counterintuitive.
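A minimal sketch of the mismatch described above, using only standard `char` methods. The helper names `is_disputed_ascii_whitespace` and `disputed_ascii_chars` are hypothetical, introduced here for illustration and not part of this PR.

```rust
/// Hypothetical helper: true for ASCII characters that `char::is_whitespace`
/// accepts but `char::is_ascii_whitespace` rejects.
fn is_disputed_ascii_whitespace(c: char) -> bool {
    c.is_ascii() && c.is_whitespace() && !c.is_ascii_whitespace()
}

/// Hypothetical helper: collects every ASCII character on which the two
/// predicates disagree.
fn disputed_ascii_chars() -> Vec<char> {
    (0u8..=0x7F)
        .map(char::from)
        .filter(|&c| is_disputed_ascii_whitespace(c))
        .collect()
}

fn main() {
    let vt = '\x0B'; // vertical tab, U+000B

    // ASCII and Unicode whitespace, yet not "ASCII whitespace".
    assert!(vt.is_ascii());
    assert!(vt.is_whitespace());
    assert!(!vt.is_ascii_whitespace());

    // Vertical tab is the only ASCII character where the predicates differ.
    assert_eq!(disputed_ascii_chars(), vec!['\x0B']);
}
```

Scanning the whole ASCII range rather than checking a single character shows that the disagreement is limited to vertical tab.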

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 3, 2026
@rustbot
Collaborator

rustbot commented Apr 3, 2026

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: @scottmcm, libs
  • @scottmcm, libs expanded to 8 candidates
  • Random selection from Mark-Simulacrum, jhpratt, scottmcm

@tgross35
Contributor

tgross35 commented Apr 4, 2026

I kind of wonder if this might be something we could get away with fixing since it seems like we're not meeting the Unicode-specified behavior. Maybe worth a PR fixing things that we can crater?

@RalfJung RalfJung added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Apr 4, 2026
@asquared31415
Contributor

It does indeed conform to the documentation, which links to https://infra.spec.whatwg.org/#ascii-whitespace, but it's extremely odd that `is_ascii_whitespace` specifically uses a different definition from the Unicode property definitions that we use everywhere else.

@asquared31415
Contributor

From @ChrisDenton on Zulip:

> I think it was designed more for e.g. rust lang-like use which doesn't consider vertical tab to be whitespace.

If this is the case, this should probably be explicitly noted in the documentation as well.

@teor2345
Contributor

teor2345 commented Apr 4, 2026

Rust's lexer includes vertical tab as whitespace:
https://doc.rust-lang.org/reference/whitespace.html

The `is_ascii_whitespace` definition matches CommonMark's definition of whitespace, but not GitHub Flavored Markdown's.

@thomcc
Member

thomcc commented Apr 5, 2026

> I kind of wonder if this might be something we could get away with fixing since it seems like we're not meeting the Unicode-specified behavior. Maybe worth a PR fixing things that we can crater?

Given that we currently explicitly document that we don't include \v, I think it would be poorly behaved on our part to change this, even if it is a bit confusing. The documentation even says:

> If you are writing a program that will process an existing file format, check what that format’s definition of whitespace is before using this function.

Which means people who checked that their format's definition of whitespace matches ours would be broken if we changed it.

I also think that even in programs where changing this results in a bug, it's one that's unlikely to be covered by all but the most thorough of test suites, so I doubt crater would reveal the extent of the issues.
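As an illustration of the documentation's advice to check a format's own whitespace definition, here is a rough sketch for a hypothetical format that also counts vertical tab as whitespace. The names `is_format_whitespace` and `trim_format_end` are assumptions made for this example; only `u8::is_ascii_whitespace` and `<[u8]>::trim_ascii_end` come from the standard library.

```rust
/// Hypothetical predicate for a format whose whitespace set is the five
/// characters accepted by `u8::is_ascii_whitespace` plus vertical tab (0x0B).
fn is_format_whitespace(b: u8) -> bool {
    b.is_ascii_whitespace() || b == 0x0B
}

/// Hypothetical helper: trims trailing bytes that the format above treats
/// as whitespace.
fn trim_format_end(input: &[u8]) -> &[u8] {
    // Index just past the last non-whitespace byte, or 0 if there is none.
    let end = input
        .iter()
        .rposition(|&b| !is_format_whitespace(b))
        .map_or(0, |i| i + 1);
    &input[..end]
}

fn main() {
    let line = b"key \x0B\n";

    // The format-specific trim removes the vertical tab and what precedes it.
    assert_eq!(trim_format_end(line), &b"key"[..]);

    // The standard ASCII trim stops at the vertical tab.
    assert_eq!(line.trim_ascii_end(), &b"key \x0B"[..]);
}
```

Code written this way keeps its behaviour regardless of how the standard predicates are defined, which is the kind of program the compatibility concern above is about.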

@nia-e
Member

nia-e commented Apr 7, 2026

This was discussed in today's @rust-lang/libs-api meeting; the situation is unfortunate, but we're in favour of updating the docs and leaving behaviour as-is. +1

@nia-e nia-e removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Apr 7, 2026
Comment on lines +256 to +258
/// [`u8::is_ascii_whitespace`]. Importantly, this definition excludes
/// the `\0x0B` byte even though it has the unicode WhiteSpace property
/// and is removed by [`str::trim_end`].
Contributor

@Jules-Bertholet Jules-Bertholet Apr 8, 2026

Suggested change
- /// [`u8::is_ascii_whitespace`]. Importantly, this definition excludes
- /// the `\0x0B` byte even though it has the unicode WhiteSpace property
- /// and is removed by [`str::trim_end`].
+ /// [`u8::is_ascii_whitespace`]. Importantly, this definition excludes
+ /// the `\0x0B` byte even though it has the Unicode [`White_Space`] property
+ /// and is removed by [`str::trim_end`].
+ ///
+ /// [`White_Space`]: https://www.unicode.org/reports/tr44/#White_Space

… and so on for other references to the property.
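The contrast with `str::trim_end` mentioned in the doc comment can be sketched as follows. This assumes a toolchain where `str::trim_ascii_end` is available (stable since Rust 1.80); the helper `trimmed_lengths` is a hypothetical name used only for this example.

```rust
/// Hypothetical helper: returns the byte lengths left after trimming the end
/// of `s` with the Unicode-aware trim and with the ASCII-only trim.
fn trimmed_lengths(s: &str) -> (usize, usize) {
    (s.trim_end().len(), s.trim_ascii_end().len())
}

fn main() {
    let s = "value\x0B";

    // `trim_end` uses the Unicode White_Space property, so it removes U+000B.
    assert_eq!(s.trim_end(), "value");

    // `trim_ascii_end` uses `u8::is_ascii_whitespace`, so U+000B is kept.
    assert_eq!(s.trim_ascii_end(), "value\x0B");

    assert_eq!(trimmed_lengths(s), (5, 6));
}
```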

Contributor

Maybe "Importantly, unlike is_whitespace, this definition..." to draw attention to the contrast between them?

Contributor Author

@asquared31415 I like your idea, but I feel this is already implied by highlighting the difference with the various trim_* methods, and I don't want to make the doc too long, so I have elected not to incorporate your suggestion for now.

@Mark-Simulacrum
Member

r=me with @Jules-Bertholet's comment addressed.

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 11, 2026
@krtab
Contributor Author

krtab commented Apr 13, 2026

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 13, 2026
@WaffleLapkin
Member

@bors r=Mark-Simulacrum,WaffleLapkin rollup

@rust-bors
Contributor

rust-bors Bot commented Apr 14, 2026

📌 Commit b52a38d has been approved by Mark-Simulacrum,WaffleLapkin

It is now in the queue for this repository.

@rust-bors rust-bors Bot added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 14, 2026
JonathanBrouwer added a commit to JonathanBrouwer/rust that referenced this pull request Apr 14, 2026
…-Simulacrum,WaffleLapkin

Clarify ascii whitespace exclusion of vertical tab in the doc

This especially means that for `c: char`, `c.is_ascii() && c.is_whitespace()` does **not** imply `c.is_ascii_whitespace()`, which can cause bugs and is highly counterintuitive.
rust-bors Bot pushed a commit that referenced this pull request Apr 14, 2026
…uwer

Rollup of 9 pull requests

Successful merges:

 - #153536 (Add `const_param_ty_unchecked` gate)
 - #153815 (Fix ICE when Self is used in enum discriminant of a generic enum)
 - #154882 (Gate tuple const params behind `min_adt_const_params` feature)
 - #155293 (fix arch names in cfg pretty printer)
 - #154765 (Clarify ascii whitespace exclusion of vertical tab in the doc)
 - #155172 (Some small nits for supertrait_item_shadowing, and additional testing)
 - #155279 (Test/lexer unicode pattern white space)
 - #155280 (Tests for precise-capture through RPIT and TAIT)
 - #155304 (remove PointeeParser)
rust-bors Bot pushed a commit that referenced this pull request Apr 15, 2026
Rollup of 13 pull requests

Successful merges:

 - #154882 (Gate tuple const params behind `min_adt_const_params` feature)
 - #155259 (explicit-tail-calls: disable two tests on LoongArch)
 - #155293 (fix arch names in cfg pretty printer)
 - #155314 (`BorrowedBuf`: Update outdated safety comments in `set_init` users.)
 - #153469 (docs: clarify path search behavior in std::process::Command::new)
 - #154765 (Clarify ascii whitespace exclusion of vertical tab in the doc)
 - #155172 (Some small nits for supertrait_item_shadowing, and additional testing)
 - #155279 (Test/lexer unicode pattern white space)
 - #155280 (Tests for precise-capture through RPIT and TAIT)
 - #155301 (Delete unused `rustc_trait_selection` errors.)
 - #155303 (remove ibraheemdev from review rotation)
 - #155304 (remove PointeeParser)
 - #155319 (Remove dead diagnostic structs.)
@rust-bors rust-bors Bot merged commit 78a1300 into rust-lang:main Apr 15, 2026
11 checks passed
@rustbot rustbot added this to the 1.97.0 milestone Apr 15, 2026
rust-timer added a commit that referenced this pull request Apr 15, 2026
Rollup merge of #154765 - krtab:doc_ascii_whitespace, r=Mark-Simulacrum,WaffleLapkin

Clarify ascii whitespace exclusion of vertical tab in the doc

This especially means that for `c: char`, `c.is_ascii() && c.is_whitespace()` does **not** imply `c.is_ascii_whitespace()`, which can cause bugs and is highly counterintuitive.
@krtab krtab deleted the doc_ascii_whitespace branch April 15, 2026 10:18
Labels

S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
