[Tests] Make testEngineGCDeletesSetting deterministic#38942
[Tests] Make testEngineGCDeletesSetting deterministic#38942matriv merged 7 commits intoelastic:masterfrom
Conversation
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool` so it needs, the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: elastic#38874 Co-authored-by: Boaz Leskes <b.leskes@gmail.com>
|
Pinging @elastic/es-distributed |
jasontedor
left a comment
There was a problem hiding this comment.
Thanks for diving into this. I left some questions.
| @Override | ||
| protected Settings nodeSettings(int nodeOrdinal) { | ||
| return Settings.builder().put(super.nodeSettings(nodeOrdinal)) | ||
| .put("thread_pool.estimated_time_interval", TimeValue.timeValueMillis(1)) |
There was a problem hiding this comment.
This effectively makes this thread busy spin for this entire test suite. On a machine with a low core count or that is otherwise busy (think how we fork tests across JVMs) it might be too much?
There was a problem hiding this comment.
We can increase it a bit I guess. The reason of changing the setting here, is to avoid adding a public test-only method to force update the cached time -> to avoid too many iterations in the while loop.
There was a problem hiding this comment.
Maybe we can change the logic of the cache - if this is 0, it disables caching.
| } | ||
|
|
||
| // delete is should not be in cache | ||
| assertThrows(client().prepareIndex("test", "type", "1").setSource("f", 3).setIfSeqNo(seqNo).setIfPrimaryTerm(primaryTerm), |
There was a problem hiding this comment.
Wouldn’t it be enough to assert busy and remove the sleep?
There was a problem hiding this comment.
I guess so, this solution just makes more visible of what's happening and less "brute force".
There was a problem hiding this comment.
The goal of the test is to make sure that once time moves the delete is forgotten. If we busy spin on the indexing request (instead of on time - which is what I think Jason refers to with sleep), we will have different semantics as some indexing ops may succeed, changing the dynamics of the test (it now will check that a CASed index operation fails if it's base is an index op, rather than a delete op).
There was a problem hiding this comment.
ah, yeah. Didn't even think that the assertBusy would potentially execute the prepareIndex request multiple times.
There was a problem hiding this comment.
@bleskes Sorry to be unclear. I meant busy spin waiting for the cached time to advance. So instead of sleeping for it to happen, assert that it has happened, busily since it happens in the background.
There was a problem hiding this comment.
@jasontedor You mean this: 9e8c531#diff-2f68a8c77a0935a6ec75d0cc4878e86aR466
as also proposed by @DaveCTurner ?
DaveCTurner
left a comment
There was a problem hiding this comment.
I left some suggestions, but this looks generally good to me.
| */ | ||
| long relativeTimeInMillis() { | ||
| return relativeMillis; | ||
| if (running) { |
There was a problem hiding this comment.
I'd slightly prefer
| if (running) { | |
| if (0 < interval) { |
since this is what the Javadoc says.
There was a problem hiding this comment.
I chose to use the boolean running (properly set in the constructor) to avoid the comparison on every call, and it's used in the same way to avoid having a "running" thread that updates those values.
There was a problem hiding this comment.
I don't clearly understand why avoiding the comparison is a good thing. Is it a performance question? Are you sure that a comparison between a constant and a final field is worse than a second volatile read?
There was a problem hiding this comment.
My thinking was yes the performance, but didn't think about the volatile.
But also to have the logic that when interval == 0 -> running = false so that we avoid having a running thread and we have a common check in all 3 places (the methods and the run()).
If you think that the interval check is better, I'll happily change. And if we do that, should the code in the run() change like: while(running && 0 < interval) ? So that the running flag is just an external means of controlling the thread and not mixed up with the interval value?
There was a problem hiding this comment.
Yep, sounds like a good idea, thanks.
| */ | ||
| long absoluteTimeInMillis() { | ||
| return absoluteMillis; | ||
| if (running) { |
There was a problem hiding this comment.
Similarly, I'd prefer
| if (running) { | |
| if (0 < interval) { |
| for (ThreadPool tPool : internalCluster().getInstances(ThreadPool.class)) { | ||
| long time1 = tPool.relativeTimeInMillis(); | ||
| long time2 = tPool.relativeTimeInMillis(); | ||
| while (time1 == time2) { |
There was a problem hiding this comment.
I think I'd have written this loop as the following, since there's no need for it to be a tight loop:
assertBusy(() -> assertThat(tPool.relativeTimeInMillis(), greaterThan(time1)));
It is, admittedly, somewhat of a question of taste.
|
@jasontedor @bleskes @DaveCTurner Thank you for the feedback! Please check again. |
| } | ||
|
|
||
| // delete is should not be in cache | ||
| assertThrows(client().prepareIndex("test", "type", "1").setSource("f", 3).setIfSeqNo(seqNo).setIfPrimaryTerm(primaryTerm), |
There was a problem hiding this comment.
@bleskes Sorry to be unclear. I meant busy spin waiting for the cached time to advance. So instead of sleeping for it to happen, assert that it has happened, busily since it happens in the background.
| Thread.sleep(300); // wait for cache time to change TODO: this needs to be solved better. To be discussed. | ||
|
|
||
| // Make sure the time has advanced for InternalEngine#resolveDocVersion() | ||
| for (ThreadPool tPool : internalCluster().getInstances(ThreadPool.class)) { |
There was a problem hiding this comment.
Can we please use names consistent with the style in the codebase? For example, this would be threadPool.
|
|
||
| // Make sure the time has advanced for InternalEngine#resolveDocVersion() | ||
| for (ThreadPool tPool : internalCluster().getInstances(ThreadPool.class)) { | ||
| long time1 = tPool.relativeTimeInMillis(); |
There was a problem hiding this comment.
Can we use a clearer name than time1?
|
@elasticmachine run elasticsearch-ci/2 |
|
@jasontedor please check again |
| * Return the current time used for relative calculations. This is | ||
| * {@link System#nanoTime()} truncated to milliseconds. | ||
| * <p> | ||
| * If {@link ThreadPool#ESTIMATED_TIME_INTERVAL_SETTING} is set to 0 |
There was a problem hiding this comment.
Should we make ThreadPool#ESTIMATED_TIME_INTERVAL_SETTING a setting that has 0 as an inclusive lower bound? I am fine with that in a follow up.
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool` so it needs, the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: elastic#38874 Co-authored-by: Boaz Leskes <b.leskes@gmail.com>
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool` so it needs, the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: elastic#38874 Co-authored-by: Boaz Leskes <b.leskes@gmail.com>
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool` so it needs, the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: #38874 Co-authored-by: Boaz Leskes <b.leskes@gmail.com>
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool` so it needs, the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: #38874 Co-authored-by: Boaz Leskes <b.leskes@gmail.com>
* elastic/master: Ensure index commit released when testing timeouts (elastic#39273) Avoid using TimeWarp in TransformIntegrationTests. (elastic#39277) Fixed missed stopping of SchedulerEngine (elastic#39193) [CI] Mute CcrRetentionLeaseIT.testRetentionLeaseIsRenewedDuringRecovery (elastic#39269) Muting AutoFollowIT.testAutoFollowManyIndices (elastic#39264) Clarify the use of sleep in CCR test Fix testCannotShrinkLeaderIndex (elastic#38529) Fix CCR tests that manipulate transport requests Align generated release notes with doc standards (elastic#39234) Mute test (elastic#39248) ReadOnlyEngine should update translog recovery state information (elastic#39238) Wrap accounting breaker check in assertBusy (elastic#39211) Simplify and Fix Synchronization in InternalTestCluster (elastic#39168) [Tests] Make testEngineGCDeletesSetting deterministic (elastic#38942) Extend nextDoc to delegate to the wrapped doc-value iterator for date_nanos (elastic#39176) Change ShardFollowTask to reuse common serialization logic (elastic#39094) Replace superfluous usage of Counter with Supplier (elastic#39048) Disable bwc tests for elastic#39094
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool` so it needs, the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: elastic#38874 Co-authored-by: Boaz Leskes <b.leskes@gmail.com>
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool` so it needs, the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: elastic#38874 Co-authored-by: Boaz Leskes <b.leskes@gmail.com>
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool` so it needs, the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: #38874 Co-authored-by: Boaz Leskes <b.leskes@gmail.com>
InternalEngine.resolveDocVersion()usesrelativeTimeInMillis()fromThreadPoolso it needs, the cached time to be advanced. Add a checkto ensure that and decrease the
thread_pool.estimated_time_intervalto 1msec to prevent long running times for the test.
Fixes: #38874