Support Unicode 15.1 new GB9c break rule #1718

ycmd embeds its Unicode support files and tests (currently for version 13) and provides a script, `update_unicode.py`, to update them to the latest Unicode version. The script worked for upgrading to version 14, but it no longer does for version 15.1. The tests fail, for example, with:
```
[ RUN      ] UnicodeTest/WordTest.BreakIntoCharacters/1186
./cpp/ycm/tests/Word_test.cpp:60: Failure
Value of: Word( word_.text_ ).Characters()
Expected: { *{ "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA5\x8D\xE0\xA4\xA4"
    As Text: "क््त", "\xE0\xA4\x95\xE0\xA4\xA4"
    As Text: "कत", "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA5\x8D\xE0\xA4\xA4"
    As Text: "क््त", "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA5\x8D\xE0\xA4\xA4"
    As Text: "क््त", false, true, false, false } }
  Actual: { *{ "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA5\x8D"
    As Text: "क््", "\xE0\xA4\x95"
    As Text: "क", "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA5\x8D"
    As Text: "क््", "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA5\x8D"
    As Text: "क््", false, true, false, false }, *{ "\xE0\xA4\xA4"
    As Text: "त", "\xE0\xA4\xA4"
    As Text: "त", "\xE0\xA4\xA4"
    As Text: "त", "\xE0\xA4\xA4"
    As Text: "त", true, true, false, false } }

[  FAILED  ] UnicodeTest/WordTest.BreakIntoCharacters/1186, where GetParam() = { "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA5\x8D\xE0\xA4\xA4"
    As Text: "क््त", { "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA5\x8D\xE0\xA4\xA4"
    As Text: "क््त" } (0 ms)
```

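The reason is that Unicode 15.1 introduces a new rule, [GB9c](https://unicode.org/reports/tr29/#GB9c), that forbids breaking inside certain Indic conjuncts (a consonant, then one or more linkers such as a virama, then another consonant). The new tests exercising this rule therefore fail.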

Prior art implementing this rule elsewhere: https://github.com/JuliaStrings/utf8proc/pull/253

It would be nice if ycmd could support newer Unicode versions.
