ycmd embeds its Unicode support files and tests (currently for version 13), and a script (`update_unicode.py`) is provided to update them to the latest Unicode version. This worked for upgrading to version 14, but no longer works for version 15. The tests fail, for example, with:
```
[ RUN ] UnicodeTest/WordTest.BreakIntoCharacters/1186
./cpp/ycm/tests/Word_test.cpp:60: Failure
Value of: Word( word_.text_ ).Characters()
Expected: { *{ "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA5\x8D\xE0\xA4\xA4"
    As Text: "क््त", "\xE0\xA4\x95\xE0\xA4\xA4"
    As Text: "कत", "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA5\x8D\xE0\xA4\xA4"
    As Text: "क््त", "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA5\x8D\xE0\xA4\xA4"
    As Text: "क््त", false, true, false, false } }
Actual: { *{ "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA5\x8D"
    As Text: "क््", "\xE0\xA4\x95"
    As Text: "क", "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA5\x8D"
    As Text: "क््", "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA5\x8D"
    As Text: "क््", false, true, false, false }, *{ "\xE0\xA4\xA4"
    As Text: "त", "\xE0\xA4\xA4"
    As Text: "त", "\xE0\xA4\xA4"
    As Text: "त", "\xE0\xA4\xA4"
    As Text: "त", true, true, false, false } }
[ FAILED ] UnicodeTest/WordTest.BreakIntoCharacters/1186, where GetParam() = { "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA5\x8D\xE0\xA4\xA4"
    As Text: "क््त", { "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA5\x8D\xE0\xA4\xA4"
    As Text: "क््त" } (0 ms)
```
The reason is that Unicode 15.1 introduces a new grapheme cluster boundary rule, GB9c, which forbids breaking inside Indic conjuncts (a consonant, one or more viramas acting as linkers, then another consonant), and the new tests exercising this rule now fail. In the failing case above, ycmd splits KA + VIRAMA + VIRAMA + TA into two characters instead of one.
Prior art implementing this rule elsewhere: JuliaStrings/utf8proc#253
It would be nice if support for newer Unicode versions could be added to ycmd.