Creating lists of intermittently failing tests by adamfarley · Pull Request #6472 · adoptium/aqa-tests

adamfarley · 2025-07-31T12:30:16Z

This change lists known unreliable tests in a consistent format.

Resolves #6471

adamfarley · 2025-08-11T09:51:51Z

Requesting reviews from @ShelleyLambert and @sophia-guo please.

smlambert · 2025-08-11T11:38:12Z

Thanks @adamfarley, a heads up that my active GitHub handle is @smlambert, but I did see this review request in any event.

smlambert

Thank you for compiling this list @adamfarley. Only openjdk testcase failures are handled by the jtreg exclude ProblemList format. The other test groups that are part of AQAvit are excluded via playlist exclusions.

I would have expected the perf and system targets to be listed via playlist files (which do not use a different file for each version as the problem lists do).

I guess it depends on what we intend to do with these lists. If we are adding a feature to exclude unreliable targets on the fly, or post-process failures against a list, we have to consider what format to use, and how to organize them within the directory (as different vendors may eventually have different lists).

adamfarley · 2025-08-11T14:03:19Z

> Thank you for compiling this list @adamfarley. Only openjdk testcase failures are handled by the jtreg exclude ProblemList format. The other test groups that are part of AQAvit are excluded via playlist exclusions.

I noticed that. For Perf and Systemtest, I used the jtreg format for consistency rather than for direct usability (though a consistent format does make any future parsing easier).

> I would have expected the perf and system targets to be listed via playlist files (which do not use a different file for each version as the problem lists do).

I considered that, but opted not to, because I'm not aware of a mechanism for disabling targets via the playlist files only some of the time. As far as I know, for a given vendor, platform, and version, a target is either enabled or disabled. An openjdk test run can include an extra exclude file dynamically, so we can choose to exclude a test in some runs and not in others.
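To illustrate the mechanism described above, here is a minimal Python sketch of assembling a jtreg command line in which the extra exclude file is optional. It assumes jtreg's `-exclude:<file>` option can be given more than once (as OpenJDK's own test makefiles do when adding extra problem lists); the helper name and file paths are hypothetical and not taken from the aqa-tests scripts.

```python
from typing import List, Optional


def build_jtreg_args(problem_list: str, extra_exclude: Optional[str], tests: List[str]) -> List[str]:
    """Assemble a jtreg argument list (hypothetical helper, for illustration only).

    The standard ProblemList is always applied. The extra exclude file (for
    example, a list of intermittently failing tests) is only added when the
    caller asks for it, so the same test can be excluded in some runs and
    run in others.
    """
    args = ["jtreg", f"-exclude:{problem_list}"]
    if extra_exclude is not None:
        # Assumption: jtreg accepts repeated -exclude options and applies all of them.
        args.append(f"-exclude:{extra_exclude}")
    args.extend(tests)
    return args


# A normal run versus a rerun that also skips known-unreliable tests.
normal = build_jtreg_args("ProblemList_openjdk21.txt", None, ["java/net"])
rerun = build_jtreg_args("ProblemList_openjdk21.txt", "unreliable/ProblemList_openjdk21.txt", ["java/net"])
print(" ".join(normal))
print(" ".join(rerun))
```

The design point is that the decision to exclude lives in the invocation, not in the test definition, which is what makes per-run exclusion possible.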

> I guess it depends on what we intend to do with these lists. If we are adding a feature to exclude unreliable targets on the fly, or post-process failures against a list, we have to consider what format to use, and how to organize them within the directory (as different vendors may eventually have different lists).

My first priority with this task was to provide a unified list of all intermittently failing tests.

The lists should use a single format, to make them easy to parse and read.
The lists should be stored in the repository, so they can be used at runtime.

Specific uses were a secondary priority. My first thought was that they could be used as exclude files during releases (or during reruns, or reruns during a release), but I didn't want to sidetrack myself by looking too far into applications. I figured that whichever application we eventually pick for this data, the best thing I could do now was to use a consistent format across the board, to make later conversion and parsing easier.

I'll add a note to the General Retrospective to start discussion into some of our options here.

smlambert · 2025-08-11T14:25:10Z

openjdk jtreg ProblemLists are the 'odd man out'; no other test group uses that format.

I do not want to store non-openjdk test material in problemlist format as it might lead people to think that is the commonly used format, which it is not.

Agree that we need to have a requirements gathering session and design discussion on this (in terms of formats, and how to structure and where to locate such files). If this is merely a form of documentation, and not a plan for using for debug / temp exclude, then likely all could be in a doc folder.

adamfarley · 2025-08-11T14:52:57Z

> openjdk jtreg ProblemLists are the 'odd man out'; no other test group uses that format.

> I do not want to store non-openjdk test material in problemlist format as it might lead people to think that is the commonly used format, which it is not.

Fair. Perhaps I was over-focused on the OpenJDK ProblemList format due to the majority of the intermittent failures being OpenJDK tests.

> Agree that we need to have a requirements gathering session and design discussion on this (in terms of formats, and how to structure and where to locate such files). If this is merely a form of documentation, and not a plan for using for debug / temp exclude, then likely all could be in a doc folder.

In my mind, this is a debug/exclude tool first. Mind if I get the ball rolling on Slack?

smlambert · 2025-09-10T14:20:10Z

Thanks for focussing on the intermittent tests for openjdk @adamfarley ! One last request, can you please put the unreliables dir under the excludes dir (alongside vendors and alpine dirs), thanks! (You'll need to update the link in your README once reorganized).

adamfarley · 2025-09-10T14:26:36Z

The perf and system tests are a goal for later, so I've split those files out into #6571.

The work to exclude these tests at runtime (and the discussion around that) will be in a third PR so we can get this PR merged and ready for reference.

adamfarley · 2025-09-10T14:31:10Z

> Thanks for focussing on the intermittent tests for openjdk @adamfarley ! One last request, can you please put the unreliables dir under the excludes dir (alongside vendors and alpine dirs), thanks! (You'll need to update the link in your README once reorganized).

Sure.

Note: I see this displeases the disableTestsLinter. Looking into this now.

adamfarley · 2025-09-10T14:48:03Z

@sophia-guo

Heya. I see that exclude_parser.py requires every uncommented line to have exactly 3 elements.

The jtreg FAQ allows a 4th and any further elements on a line to be treated as comments.

What do you think about me changing that code to be a minimum of 3, rather than exactly 3?

(I've added this change to the PR. Let me know your thoughts.)
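The two parser rules under discussion can be sketched as follows. This is a hypothetical Python helper, not the actual code in exclude_parser.py: it splits a ProblemList line on whitespace, skips blank and `#` comment lines, and either requires exactly three fields (test, bug link, platforms) or, in the relaxed mode proposed here, treats a 4th and later field as a trailing comment. The test name and issue URL in the usage example are placeholders.

```python
from typing import List, Optional


def parse_problemlist_line(line: str, strict: bool = True) -> Optional[List[str]]:
    """Split one ProblemList line into [test, bug_link, platforms] (hypothetical helper).

    Returns None for blank lines and '#' comment lines.
    strict=True  -> the line must have exactly three whitespace-separated fields.
    strict=False -> at least three fields; anything after the third is ignored
                    as a comment, as the jtreg format permits.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    if strict and len(fields) != 3:
        raise ValueError(f"expected exactly 3 fields, got {len(fields)}: {line!r}")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}: {line!r}")
    return fields[:3]


line = "java/net/Example.java https://github.com/adoptium/aqa-tests/issues/0000 linux-all"
print(parse_problemlist_line(line))
# A line with a trailing date is only accepted in relaxed mode.
print(parse_problemlist_line(line + " 2025-07-31", strict=False))
```

Keeping `strict=True` corresponds to the outcome reached later in this thread: three fields per line, with the linked issue carrying the description.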

adamfarley · 2025-09-10T14:50:09Z

Our other option, of course, is to strip off the date. I think it's useful information and saves anyone from having to open every link to get a sense of which intermittent tests are the oldest, but perhaps that information doesn't have as much significance as I presume it has.

sophia-guo · 2025-09-10T15:53:39Z

@adamfarley yes, you can do that, though personally I think the description item is unnecessary: the second element, the link (in Adoptium we use links as bug information), should already carry enough information, including a description.

adamfarley · 2025-09-11T12:59:40Z

> @adamfarley yes, you can do that, though personally I think the description item is unnecessary: the second element, the link (in Adoptium we use links as bug information), should already carry enough information, including a description.

Okie dokie. I've spoken with Shelley as well, and her take was that keeping each line to 3 elements is a good way to keep the files clean, so I'll undo the change to the parser and strip out the dates.

karianna · 2025-09-16T08:29:12Z

@adamfarley will need a small rebase

adamfarley · 2025-09-16T11:01:30Z

@karianna - The conflict has been resolved. Thanks for pointing it out.

I've also renamed the alpine exclude files and corrected other pieces of code so this doesn't break anything.

sophia-guo · 2025-11-14T20:19:06Z

I noticed a few ProblemList files under unreliable/alpine/ are blank. Are those files just placeholders? Placeholders aren't necessary; see my earlier comment on #6462.

karianna · 2025-11-17T02:59:35Z

@adamfarley will need a rebase also

adamfarley · 2025-11-17T09:25:26Z

> I noticed a few ProblemList files under unreliable/alpine/ are blank. Are those files just placeholders? Placeholders aren't necessary; see my earlier comment on #6462.

Good spot. I've removed the placeholder files.

This change lists known unreliable tests in a consistent format. It also retrieves all ProblemLists we may need at runtime, and limits the files we retrieve to the versions we need. Otherwise we're fetching 70+ exclude files. Signed-off-by: Adam Farley <adfarley@ibm.com>
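As a rough illustration of limiting retrieval to the versions needed, the Python sketch below filters a list of candidate paths down to those for a single JDK version. The `ProblemList_openjdk<N>.txt` naming convention, the helper name, and the example paths are assumptions for illustration; the real retrieval logic in this PR may match files differently.

```python
import re
from typing import Iterable, List

# Assumed naming convention, for illustration only: ProblemList_openjdk<N>.txt
VERSION_PATTERN = re.compile(r"ProblemList_openjdk(\d+)\.txt$")


def select_problemlists(paths: Iterable[str], jdk_version: int) -> List[str]:
    """Keep only the ProblemList paths whose version number equals jdk_version (hypothetical helper)."""
    selected = []
    for path in paths:
        match = VERSION_PATTERN.search(path)
        # Compare whole version numbers, so "11" never matches a request for 1.
        if match and int(match.group(1)) == jdk_version:
            selected.append(path)
    return selected


candidates = [
    "openjdk/excludes/ProblemList_openjdk21.txt",
    "openjdk/excludes/ProblemList_openjdk17.txt",
    "openjdk/excludes/unreliable/ProblemList_openjdk21.txt",
    "openjdk/excludes/ProblemList_openjdk8.txt",
]
print(select_problemlists(candidates, 21))
```

Filtering before fetching is what keeps a run from pulling down every exclude file in the repository.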

adamfarley · 2025-11-18T15:32:38Z

Thanks @sophia-guo and @smlambert 😎


adamfarley changed the title ~~WIP: Creating unreliables files and adding first set of infrequent fails~~ Creating lists of intermittent tests Aug 11, 2025

adamfarley changed the title ~~Creating lists of intermittent tests~~ Creating lists of intermittently failing tests Aug 11, 2025

smlambert requested changes Aug 11, 2025


adamfarley mentioned this pull request Aug 11, 2025: General Retrospective for July 2025 Release adoptium/temurin#84 (Closed, 8 tasks)

adamfarley added this to 2025 Adoptium Plan Sep 9, 2025

adamfarley removed this from 2025 Adoptium Plan Sep 9, 2025

adamfarley mentioned this pull request Sep 10, 2025: WIP: Documenting intermittently failing tests in perf and system #6571 (Open, 3 tasks)

adamfarley requested a review from sophia-guo September 10, 2025 15:40

adamfarley mentioned this pull request Sep 10, 2025: check duplicate exclusion cases in ProblemList files #5874 (Open)

karianna requested a review from smlambert September 10, 2025 21:21

smlambert approved these changes Sep 16, 2025


adamfarley force-pushed the intermittent_test_failures_list branch 2 times, most recently from 13bf5f0 to fa5d3eb November 17, 2025 11:09

adamfarley force-pushed the intermittent_test_failures_list branch from 69cf1e0 to c190a75 November 17, 2025 13:41

sophia-guo approved these changes Nov 18, 2025


sophia-guo merged commit 04af55d into adoptium:master Nov 18, 2025
3 checks passed

keithc-ca added a commit to keithc-ca/aqa-tests that referenced this pull request Nov 18, 2025

Restore wildcard in ProblemList pattern

92a59ad

Inadvertently removed by adoptium#6472. Signed-off-by: Keith W. Campbell <keithc@ca.ibm.com>

keithc-ca mentioned this pull request Nov 18, 2025: Restore wildcard in ProblemList pattern #6732 (Merged)

smlambert pushed a commit that referenced this pull request Nov 18, 2025

Restore wildcard in ProblemList pattern (#6732)

7a2dd1a

Inadvertently removed by #6472. Signed-off-by: Keith W. Campbell <keithc@ca.ibm.com>


4 participants
