fix encoding of unicode URL fragments #194

Open
haarg wants to merge 2 commits into perl-pod:master from haarg:fix-unicode-fragment-encoding


haarg commented Feb 10, 2026

URL fragments are percent encoded. Percent encoding is a byte encoding, so its input must be encoded as UTF-8. Even for documents using a different encoding in their source, such as ISO-8859-1, browsers expect the URL fragments to be UTF-8 encoded.

Since percent encoding only ever deals with byte values 0 to 255, we can precalculate a mapping from each byte to its encoded form. Encode the URL as UTF-8, then use the mapping to percent encode the result.
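
For illustration, a minimal sketch of that approach might look like the following. The names `%ESCAPE` and `encode_fragment` are hypothetical, and the set of characters left unescaped is an assumption (the RFC 3986 unreserved set); neither is taken from the patch itself.

```perl
use strict;
use warnings;

# Hypothetical names for illustration; not the identifiers used in the patch.

# Precalculate the percent-encoded form of every possible byte value.
my %ESCAPE;
$ESCAPE{ chr($_) } = sprintf('%%%02X', $_) for 0 .. 255;

sub encode_fragment {
    my ($fragment) = @_;

    # Turn the character string into UTF-8 bytes, in place.
    # utf8::encode is always available on perl 5.8 and later.
    utf8::encode($fragment);

    # Assumption: leave only the RFC 3986 unreserved characters as-is
    # and escape every other byte via the lookup table.
    $fragment =~ s/([^A-Za-z0-9\-._~])/$ESCAPE{$1}/g;

    return $fragment;
}

print encode_fragment("caf\x{E9}"), "\n";       # caf%C3%A9
print encode_fragment("\x{263A} smile"), "\n";  # %E2%98%BA%20smile
```

Because the string is already bytes by the time the substitution runs, every match is a single byte and the 256-entry table covers all cases. Note that U+00E9 comes out as `%C3%A9` (its UTF-8 bytes), not `%E9` (its ISO-8859-1 byte).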

No fallback is provided if utf8::encode is not available. If you try to use this with non-ASCII characters on perl 5.6, you get to keep both pieces.

Ensure we aren't using HTML::Entities, for consistency.

Correct order of got vs expected.

Indent pod template to avoid editor warnings on invalid pod.

Use a consistent name suffix for each test.