CLDR-19258 Add a clean CLDR implementation of DateTimePatternGenerator#5394
CLDR-19258 Add a clean CLDR implementation of DateTimePatternGenerator#5394sffc wants to merge 8 commits intounicode-org:mainfrom
Conversation
|
@pedberg-icu Let me know if you like what I'm trying to do with this PR. |
sffc
left a comment
There was a problem hiding this comment.
I reviewed the changes and prompted the AI to fix some things. This is ready for someone else's review now.
| private String getStringValueWithFallback(String path) { | ||
| Output<String> localeWhereFound = new Output<>(); | ||
| String val = file.getStringValueWithBailey(path, null, localeWhereFound); | ||
|
|
||
| if (!calendarID.equals("gregorian")) { | ||
| // If value is null, or it was only found at root/code-fallback, we must | ||
| // attempt to inherit from the specific locale's Gregorian calendar first. | ||
| if (val == null || (localeWhereFound.value != null && (localeWhereFound.value.equals("root") || localeWhereFound.value.equals(XMLSource.CODE_FALLBACK_ID)))) { | ||
| String gregPath = path.replaceFirst("\\[@type=\"[^\"]+\"\\]", "[@type=\"gregorian\"]"); | ||
| String gregVal = file.getStringValueWithBailey(gregPath, null, localeWhereFound); | ||
| if (gregVal != null && localeWhereFound.value != null && !localeWhereFound.value.equals("root") && !localeWhereFound.value.equals(XMLSource.CODE_FALLBACK_ID)) { | ||
| return gregVal; | ||
| } | ||
| } | ||
| } | ||
| return val; | ||
| } |
There was a problem hiding this comment.
Would like one of the reviewers more familiar with CLDRFile to look at this code. It is AI-generated and tries to obey the mixed vertical and horizontal fallback.
|
I very much like this plan. However, I'd like to see a test added for it, that compares the results over all locales between the ICU implementation and the proposed CLDR one. It should be exhaustive: testing all the reasonable combinations of input skeletons, over all locales and calendars. While this is a large number of combinations, it should be relatively quick to program and we don't care if it takes some minutes to run. I think the simplest way to do that is to take all of the semantic skeletons (plus all singletons) and using all the width combinations, which I think includes the following: G - 1..4 (plus u, U, r??) It does mean using ICUs code by overriding with CLDR data. That's fairly straightforward and I can point you to examples. I'd suggest starting with just gregorian and generic calendars, and first just English and Japanese, then expanding the calendars and locales.
Of course, once we replace the use of ICU's code inside of CLDR, we can drop the test at the same time. @pedberg-icu are there other symbols used in some other calendars that we should add? |
@macchiati @sffc For Chinese calendar we should add U and r. Chinese calendar is also an interesting test because it does not inherit formats sideways, it inherits only up. |
|
I'm happy to add more tests, noting that:
I'm assuming that you are asking more for a spreadsheet to illustrate the differences rather than a test that actually lands in CLDR. That seems like something reasonable to generate. |
|
I'm saying something different. The spec ( as in some other cases) is a somewhat hand wavy capture of what ICU is doing. I'm worried that a simple implementation of the spec will not catch cases that should be caught. By comparing icu's output to your implantation., we can figure out whether any of the differences are ones that should be accounted for by enhancing the spec and your implementation. |
CLDR-19258
I used Gemini 3.1 Pro to write most of this code. My prompt:
After the first round, it produced working code, but it was failing some tests. It said its code was more correct than the old code, which I verified was incorrect. My second prompt:
It worked for another hour and eventually made code that passed the test. It seems the bug was using
getWinningValueinstead ofgetStringValueWithBailey.Then I asked it to add more comments and docs.
ALLOW_MANY_COMMITS=true