Contributor

@hata6502 hata6502 commented Nov 8, 2022

Resolves jsdom/jsdom#3461.

https://github.com/jsdom/w3c-xmlserializer/blob/master/lib/serialize.js#L151
w3c-xmlserializer checks whether the given XML node is well-formed.
Emoji characters (such as 🔍) do not match the XML_CHAR regular expression.
However, emoji characters are allowed in well-formed XML according to the W3C specification.
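
For reference, the Char production in XML 1.0 allows #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD], and [#x10000-#x10FFFF]. The following is a minimal sketch of that rule checked one code point at a time. The helper names `isXmlChar` and `isXmlCharString` are hypothetical and are not part of w3c-xmlserializer.

```javascript
// Hypothetical helper (not from w3c-xmlserializer): checks a single code point
// against the XML 1.0 Char production.
function isXmlChar(codePoint) {
  return (
    codePoint === 0x9 ||
    codePoint === 0xa ||
    codePoint === 0xd ||
    (codePoint >= 0x20 && codePoint <= 0xd7ff) ||
    (codePoint >= 0xe000 && codePoint <= 0xfffd) ||
    (codePoint >= 0x10000 && codePoint <= 0x10ffff)
  );
}

// Hypothetical helper: for...of iterates a string by code point, so a
// surrogate pair such as an emoji is visited once, as one character.
function isXmlCharString(text) {
  for (const ch of text) {
    if (!isXmlChar(ch.codePointAt(0))) {
      return false;
    }
  }
  return true;
}

console.log(isXmlCharString("🔍")); // true: U+1F50D is inside [#x10000-#x10FFFF]
console.log(isXmlCharString("\u0000")); // false: NUL is not an XML character
console.log(isXmlCharString("\uD83D")); // false: a lone surrogate is not allowed
```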

https://github.com/jsdom/w3c-xmlserializer/blob/master/lib/serialize.js#L8
XML_CHAR is defined as /^(\x09|\x0A|\x0D|[\x20-\uD7FF]|[\uE000-\uFFFD]|(?:[\uD800-\uDBFF][\uDC00-\uDFFF]))*$/u
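Because this regular expression uses the u flag, the string is matched by code point rather than by UTF-16 code unit. A surrogate pair (such as an emoji) is then a single code point above U+FFFF, so the [\uD800-\uDBFF][\uDC00-\uDFFF] alternative, which expects two separate surrogate characters, does not match it.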

> (/^(\x09|\x0A|\x0D|[\x20-\uD7FF]|[\uE000-\uFFFD]|(?:[\uD800-\uDBFF][\uDC00-\uDFFF]))*$/u).test("🔍")
   false

Replacing that alternative with the [\u{10000}-\u{10FFFF}] range makes characters encoded as surrogate pairs match.

> (/^(\x09|\x0A|\x0D|[\x20-\uD7FF]|[\uE000-\uFFFD]|[\u{10000}-\u{10FFFF}])*$/u).test("🔍")
   true
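
The effect of the u flag can be seen by comparing the same pattern with and without it. This is only an illustrative sketch: the constant names below are made up for this comment and do not appear in serialize.js.

```javascript
// Illustrative names only; serialize.js defines a single XML_CHAR constant.

// Original pattern with the u flag: matching is done by code point.
const pairsWithFlag =
  /^(\x09|\x0A|\x0D|[\x20-\uD7FF]|[\uE000-\uFFFD]|(?:[\uD800-\uDBFF][\uDC00-\uDFFF]))*$/u;

// Same pattern without the u flag: matching is done by UTF-16 code unit.
const pairsWithoutFlag =
  /^(\x09|\x0A|\x0D|[\x20-\uD7FF]|[\uE000-\uFFFD]|(?:[\uD800-\uDBFF][\uDC00-\uDFFF]))*$/;

// Proposed pattern: the supplementary planes are listed as a code point range.
const astralRange =
  /^(\x09|\x0A|\x0D|[\x20-\uD7FF]|[\uE000-\uFFFD]|[\u{10000}-\u{10FFFF}])*$/u;

console.log("🔍".length); // 2 (UTF-16 code units)
console.log([..."🔍"].length); // 1 (code point)

console.log(pairsWithoutFlag.test("🔍")); // true: two code units match the pair alternative
console.log(pairsWithFlag.test("🔍")); // false: one code point, no alternative covers it
console.log(astralRange.test("🔍")); // true
console.log(astralRange.test("\uD83D")); // false: a lone surrogate is still rejected
```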

Member

domenic commented Nov 18, 2022

Thanks for working on this! Can you add a test to https://github.com/jsdom/w3c-xmlserializer/blob/master/test/test.js that fails before this change, and passes afterward?

@hata6502
Contributor Author

Thank you for reviewing!
I added the test case in f37176a.
It passes on this pull request and fails when 8ef9d81 is reverted.

@domenic domenic merged commit bfe23db into jsdom:master Nov 20, 2022
@hata6502 hata6502 deleted the allow-emoji branch November 20, 2022 08:20

Successfully merging this pull request may close these issues.

[DOMParser] Can't get innerHTML of XML document with emoji
