Skip to content

Comments

Fix TestNodeStateManager flakiness#28055

Merged
findepi merged 1 commit intotrinodb:masterfrom
findepi:findepi/fix-testnodestatemanager-flakiness-2805e7
Jan 30, 2026
Merged

Fix TestNodeStateManager flakiness#28055
findepi merged 1 commit intotrinodb:masterfrom
findepi:findepi/fix-testnodestatemanager-flakiness-2805e7

Conversation

@findepi
Copy link
Member

@findepi findepi commented Jan 30, 2026

TestNodeStateManager.testDrain was flaky when run together with TestHashJoinOperator. TestHashJoinOperator creates executor and schedules some tasks. Some of these tasks schedule further tasks. Some tasks fail with RejectedExecutionException. TestNodeStateManager used (a shaded version of) awaitility library. This library, by default, installs global uncaught exception handler and was intercepting failures occurring in threads spawned by TestHashJoinOperator and propagating them to unsuspecting TestNodeStateManager.

@brybacki
Copy link
Contributor

Nice work, Awesome!

It also means, that a few blind fixes that we did in the past (like extending timeouts), were probably not needed.

() -> nodeStateManager.getServerState().equals(DRAINED))
.until(() -> nodeStateManager.getServerState().equals(DRAINING));
AtomicBoolean everDrained = new AtomicBoolean(false);
assertEventually(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a second look at this condition. I do not remember what was my intention. But probably it does not need to be complex.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During means: value must be maintained CONTINUOUSLY for the period given to "during" method.
So I think your code skips that part. But as I mentioned before, we probably do not need to verify such state.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how should i have written this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is actually ok. Sorry for the fuss.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline, it's apparently fine as is

@brybacki
Copy link
Contributor

I think we could achieve the same by changing from the default config in Awaitility catchUncaughtExceptionsByDefault, doNotCatchUncaughtExceptionsByDefault or ignoreExceptionsByDefault. But if there is a tool in trino that we have full control of, then lets just use it.

LGTM!

`TestNodeStateManager.testDrain` was flaky when run together with
`TestHashJoinOperator`. `TestHashJoinOperator` creates executor and
schedules some tasks. Some of these tasks schedule further tasks. Some
tasks fail with `RejectedExecutionException`. `TestNodeStateManager`
used (a shaded version of) awaitility library. This library, by default,
installs global uncaught exception handler and was intercepting failures
occurring in threads spawned by `TestHashJoinOperator` and propagating
them to unsuspecting `TestNodeStateManager`.
@findepi findepi force-pushed the findepi/fix-testnodestatemanager-flakiness-2805e7 branch from ab7ac82 to 6a0602a Compare January 30, 2026 13:24
@findepi
Copy link
Member Author

findepi commented Jan 30, 2026

I think we could achieve the same by changing from the default config in Awaitility catchUncaughtExceptionsByDefault, doNotCatchUncaughtExceptionsByDefault or ignoreExceptionsByDefault.

yes, but

  • we should not be using Awaitility shaded in testcontainers anyway, so a change is due anyway
  • we need to somehow protect against Awaitility default behavior. Denying use of Awaitility is the simplest way to achieve that. I actually don't know how we could do better.

@findepi findepi merged commit d48da09 into trinodb:master Jan 30, 2026
86 of 97 checks passed
@findepi findepi deleted the findepi/fix-testnodestatemanager-flakiness-2805e7 branch January 30, 2026 13:53
@github-actions github-actions bot added this to the 480 milestone Jan 30, 2026
@ebyhr ebyhr mentioned this pull request Jan 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Flaky test TestNodeStateManager

3 participants