
[META] Eliminate flakiness in :server:internalClusterTest #18108

@andrross

Description


Please describe the end goal of this project

We have no shortage of issues related to flaky tests. I'm creating this new issue with a narrower focus (only :server:internalClusterTest) and a specific goal (eliminate current flakiness). The intent is to get the most problematic flakiness back under control, making merging PRs a less miserable experience, while we continue to iterate on new mechanisms in #17974.

Supporting References

Flakiness of :server:internalClusterTest is simple to measure: run the task repeatedly and count how many runs fail. I'll continue posting updates on this issue to track progress.

Test Environment:

  • OS: Ubuntu 24.04.2 LTS
  • Host type: m8g.4xlarge (EC2)
  • JDK: Temurin-21.0.5+11

Test procedure:

% export RESULT_DIR=~/test-results-$(date +"%Y-%m-%d")-$(git rev-parse --verify HEAD --short=8)
% mkdir $RESULT_DIR
% for i in $(seq 0 100); do ./gradlew ':server:internalClusterTest' 2> "$RESULT_DIR/server_internalClusterTest-$(date +"%Y-%m-%d_%H-%M-%S")"; done
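The loop above decides pass/fail later by parsing the last line of each stderr log. A minimal sketch of a variant that also records each run's exit status, so failures can be counted without relying on Gradle's output format. `run_repeatedly` and the `exit-codes` file name are hypothetical, not part of the original procedure.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the loop: run a command N times, keep each run's
# stderr in its own log file, and append "<run> <exit status>" to exit-codes.
run_repeatedly() {
  local n="$1" dir="$2" i status
  shift 2
  mkdir -p "$dir"
  for i in $(seq 1 "$n"); do
    # Zero-padded run index keeps the log files sorted in run order.
    if "$@" 2> "$dir/run-$(printf '%03d' "$i").log"; then
      status=0
    else
      status=$?
    fi
    echo "$i $status" >> "$dir/exit-codes"
  done
}
```

For example, `run_repeatedly 100 "$RESULT_DIR" ./gradlew ':server:internalClusterTest'` followed by `awk '$2 != 0' "$RESULT_DIR/exit-codes" | wc -l` counts the failed runs. The extra `exit-codes` file is also counted by `ls $RESULT_DIR | wc -l`, so subtract one (or keep it elsewhere) when counting runs that way.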

Count failures:

tail -n1 $RESULT_DIR/* | grep FAILED | wc -l

Count number of runs:

ls $RESULT_DIR | wc -l
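The two counts above combine into a failure rate, which is the number worth tracking over time. A minimal sketch, assuming the same layout as the procedure (one stderr log per run, with a failed run's last line containing FAILED); `failure_rate` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "failures/runs (percent%)" for a results directory.
# Applies the same check as `tail -n1 ... | grep FAILED`, one file at a time.
failure_rate() {
  local dir="$1" runs=0 failures=0 f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    runs=$((runs + 1))
    if tail -n1 "$f" | grep -q FAILED; then
      failures=$((failures + 1))
    fi
  done
  if [ "$runs" -eq 0 ]; then
    echo "no runs found in $dir" >&2
    return 1
  fi
  # Integer arithmetic only, so the percentage is rounded down.
  echo "$failures/$runs ($((failures * 100 / runs))%)"
}
```

Usage: `failure_rate "$RESULT_DIR"`.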

Display test failures by count:

grep '^REPROD' $RESULT_DIR/* | cut -d ' ' -f6 | sort | uniq -c | sort -rn
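The command above picks the sixth space-separated field, which depends on grep's `filename:` prefix and on the exact layout of the REPRODUCE line. A minimal sketch of an alternative that extracts the test name by pattern instead, assuming each REPRODUCE line contains `--tests "<test name>"`; `failures_by_test` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count failures per test from REPRODUCE lines.
# Assumes each line contains --tests "<test name>" and extracts that value,
# so the result does not depend on a fixed field position.
failures_by_test() {
  grep -h '^REPROD' "$1"/* \
    | sed -n 's/.*--tests "\([^"]*\)".*/\1/p' \
    | sort | uniq -c | sort -rn
}
```

Usage: `failures_by_test "$RESULT_DIR"`. `grep -h` drops the filename prefix, so the output matches the original command's shape (count, then test name) when the assumption about `--tests` holds.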

Issues

Related to #17974

Related component

Build

Labels: :test, Meta, flaky-test
