uppercase/lowercase functions are not portable? #11471

Since @ScottPJones mentioned upper/lowercase functions recently, I took a quick look at them and noticed that we call `towupper` and `towlower`, which are C99 functions that take a wide-character argument (formally `wint_t`, which must be able to hold any `wchar_t` value) rather than a full Unicode codepoint.
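
For illustration, here is a minimal sketch of what calling `towupper` on a full codepoint entails. The name `guarded_towupper` is hypothetical (not an existing Julia or libc function): the codepoint has to fit in `wchar_t`, so anything above `WCHAR_MAX` must be handled some other way, and for the codepoints that do fit, the result still depends on the platform's tables and the current locale.

```c
#include <stdint.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: uppercase a Unicode codepoint with towupper,
   but only if the codepoint fits in wchar_t. Where wchar_t is 16 bits
   (e.g. Windows), larger codepoints would otherwise be truncated, so
   this sketch returns them unchanged instead. */
uint32_t guarded_towupper(uint32_t codepoint) {
    if (codepoint > (uint32_t)WCHAR_MAX) {
        return codepoint; /* not representable as a wide character */
    }
    /* Locale- and platform-dependent mapping from the C library. */
    return (uint32_t)towupper((wint_t)codepoint);
}
```

In the default "C" locale, `guarded_towupper('a')` returns `'A'`; what it returns for non-ASCII input varies by system.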

Unfortunately, this means they are broken on Windows (where `wchar_t` is 16 bits) for any character outside the Basic Multilingual Plane (BMP). Even on platforms with a 32-bit `wchar_t`, they will return different results on different systems, and many systems will have out-of-date Unicode tables. They are also locale-dependent; I'm not sure that is desirable for us.
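
To make the Windows failure concrete, this sketch only simulates what a 16-bit wide-character type does to a codepoint; it does not call `towlower` itself, and `to_16bit_wchar` is a hypothetical helper. Keeping only the low 16 bits turns U+10428 DESERET SMALL LETTER LONG I into U+0428 CYRILLIC CAPITAL LETTER SHA, so the C library would end up case-mapping an entirely different character.

```c
#include <stdint.h>

/* Hypothetical helper modelling the conversion of a Unicode codepoint
   to a 16-bit wchar_t/wint_t: only the low 16 bits survive. */
uint16_t to_16bit_wchar(uint32_t codepoint) {
    return (uint16_t)(codepoint & 0xFFFFu);
}
```

For example, `to_16bit_wchar(0x10400)` (DESERET CAPITAL LETTER LONG I) yields `0x0400` (CYRILLIC CAPITAL LETTER IE WITH GRAVE), while BMP codepoints such as `0x00E9` pass through unchanged.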

utf8proc already has up-to-date upper/lower/titlecase mapping data in its "database" (generated from http://www.unicode.org/Public/UNIDATA/UnicodeData.txt), so maybe we should just add a `utf8proc_toupper` function (etc.) to utf8proc to expose it. We could then call that, probably with a fast path for the common case of ASCII codepoints.
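
A rough sketch of that shape, assuming a lookup in utf8proc-style generated data: ASCII is handled inline, and everything else goes through a table. The names `portable_toupper`, `case_pair`, and `upper_table` are hypothetical, and the four-entry table is only a placeholder for the full mapping generated from UnicodeData.txt; it is not utf8proc's actual API or data layout.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for generated case-mapping data:
   (codepoint, uppercase) pairs sorted by codepoint for bsearch. */
typedef struct {
    uint32_t codepoint;
    uint32_t upper;
} case_pair;

static const case_pair upper_table[] = {
    {0x00E9, 0x00C9},   /* LATIN SMALL E WITH ACUTE -> CAPITAL */
    {0x03B1, 0x0391},   /* GREEK SMALL ALPHA -> CAPITAL ALPHA */
    {0x0448, 0x0428},   /* CYRILLIC SMALL SHA -> CAPITAL SHA */
    {0x10428, 0x10400}, /* DESERET SMALL LONG I -> CAPITAL (non-BMP) */
};

/* Orders a codepoint key against a table entry. */
static int compare_pair(const void *key, const void *elem) {
    uint32_t cp = *(const uint32_t *)key;
    uint32_t other = ((const case_pair *)elem)->codepoint;
    return (cp > other) - (cp < other);
}

/* Hypothetical portable toupper on a full 32-bit codepoint:
   independent of wchar_t width and of the current locale. */
uint32_t portable_toupper(uint32_t c) {
    if (c < 0x80) {
        /* ASCII fast path: no table lookup needed. */
        return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
    }
    const case_pair *hit = bsearch(&c, upper_table,
                                   sizeof upper_table / sizeof upper_table[0],
                                   sizeof upper_table[0], compare_pair);
    return hit ? hit->upper : c; /* unmapped codepoints are unchanged */
}
```

Because the argument is a `uint32_t` codepoint rather than a `wint_t`, a non-BMP character such as U+10428 maps correctly to U+10400 on every platform.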
