Skip to content

CLDR-19258 Add a clean CLDR implementation of DateTimePatternGenerator#5394

Open
sffc wants to merge 8 commits intounicode-org:mainfrom
sffc:cldr-dtpg
Open

CLDR-19258 Add a clean CLDR implementation of DateTimePatternGenerator#5394
sffc wants to merge 8 commits intounicode-org:mainfrom
sffc:cldr-dtpg

Conversation

@sffc
Copy link
Member

@sffc sffc commented Feb 24, 2026

CLDR-19258

  • This PR completes the ticket. (not yet; we should migrate more call sites to use it)

I used Gemini 3.1 Pro to write most of this code. My prompt:

In DateTimeFormats.java, it imports DateTimePatternGenerator from ICU4J. Please write a new DateTimePatternGenerator that exactly follows the specification, replace the imports in DateTimeFormats.java, and ensure the relevant tests pass, such as GenerateDateTimeTestData. You can find the specification in tr35-dates.md under the heading "Elements availableFormats, appendItems".

After the first round, it produced working code, but it was failing some tests. It said its code was more correct than the old code, which I verified was incorrect. My second prompt:

You are wrong: (1) The short dateTimeFormat in English Gregorian has a comma. (2) The skeleton has a j, which means select the default hour cycle, which is h23 for Japanese. (3) The appendItem for time zone in Arabic inherits from root, which does not have ├ markers. I committed your java code but not the test output. Please fix the errors in the java code.

It worked for another hour and eventually made code that passed the test. It seems the bug was using getWinningValue instead of getStringValueWithBailey.

Then I asked it to add more comments and docs.

ALLOW_MANY_COMMITS=true

@sffc sffc requested a review from pedberg-icu February 24, 2026 04:56
@sffc sffc marked this pull request as ready for review February 24, 2026 04:56
@sffc
Copy link
Member Author

sffc commented Feb 24, 2026

@pedberg-icu Let me know if you like what I'm trying to do with this PR.

Copy link
Member Author

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the changes and prompted the AI to fix some things. This is ready for someone else's review now.

Comment on lines 126 to 142
private String getStringValueWithFallback(String path) {
Output<String> localeWhereFound = new Output<>();
String val = file.getStringValueWithBailey(path, null, localeWhereFound);

if (!calendarID.equals("gregorian")) {
// If value is null, or it was only found at root/code-fallback, we must
// attempt to inherit from the specific locale's Gregorian calendar first.
if (val == null || (localeWhereFound.value != null && (localeWhereFound.value.equals("root") || localeWhereFound.value.equals(XMLSource.CODE_FALLBACK_ID)))) {
String gregPath = path.replaceFirst("\\[@type=\"[^\"]+\"\\]", "[@type=\"gregorian\"]");
String gregVal = file.getStringValueWithBailey(gregPath, null, localeWhereFound);
if (gregVal != null && localeWhereFound.value != null && !localeWhereFound.value.equals("root") && !localeWhereFound.value.equals(XMLSource.CODE_FALLBACK_ID)) {
return gregVal;
}
}
}
return val;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like one of the reviewers more familiar with CLDRFile to look at this code. It is AI-generated and tries to obey the mixed vertical and horizontal fallback.

@sffc sffc requested a review from macchiati February 25, 2026 23:44
@macchiati
Copy link
Member

macchiati commented Feb 26, 2026

I very much like this plan. However, I'd like to see a test added for it, that compares the results over all locales between the ICU implementation and the proposed CLDR one.

It should be exhaustive: testing all the reasonable combinations of input skeletons, over all locales and calendars. While this is a large number of combinations, it should be relatively quick to program and we don't care if it takes some minutes to run.

I think the simplest way to do that is to take all of the semantic skeletons (plus all singletons) and using all the width combinations, which I think includes the following:

G - 1..4 (plus u, U, r??)
y - 1..4
M - 1..5
d - 1..2
E - 1..4
h - 1..2 (plus H)
m - 1..2
s - 1..2
z - 1..5 (plus Z, O, v, V, X, x)

It does mean using ICUs code by overriding with CLDR data. That's fairly straightforward and I can point you to examples.

I'd suggest starting with just gregorian and generic calendars, and first just English and Japanese, then expanding the calendars and locales.

  • Where there are differences, we can then determine whether they are material or not, and either tune the algorithm or exclude them from the test before moving to more combinations.

Of course, once we replace the use of ICU's code inside of CLDR, we can drop the test at the same time.

@pedberg-icu are there other symbols used in some other calendars that we should add?

@pedberg-icu
Copy link
Contributor

pedberg-icu commented Feb 27, 2026

@pedberg-icu are there other symbols used in some other calendars that we should add?

@macchiati @sffc For Chinese calendar we should add U and r. Chinese calendar is also an interesting test because it does not inherit formats sideways, it inherits only up.

@sffc
Copy link
Member Author

sffc commented Feb 27, 2026

I'm happy to add more tests, noting that:

  1. Conformance is intended to cover this already; this implementation aligns for the ones covered in datetime test data (which isn't as comprehensive as I'd like it to be, but it does cover a number of edge cases)
  2. It is expected that there might be slight differences between this code and the ICU4J code. The objective here is to implement the spec based on a clean reading of the spec. If ICU4J behavior differs, it does not necessarily indicate that there is a bug in the CLDR code; it could be a bug in ICU4J code or perhaps ill-defined spec language.
  3. Other implementations like ICU4X also exist.

I'm assuming that you are asking more for a spreadsheet to illustrate the differences rather than a test that actually lands in CLDR. That seems like something reasonable to generate.

@macchiati
Copy link
Member

I'm saying something different. The spec ( as in some other cases) is a somewhat hand wavy capture of what ICU is doing. I'm worried that a simple implementation of the spec will not catch cases that should be caught. By comparing icu's output to your implantation., we can figure out whether any of the differences are ones that should be accounted for by enhancing the spec and your implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants