Exclude tasks based on serverless status#1760
Conversation
gbanasiak
left a comment
There was a problem hiding this comment.
The overall approach looks great. I did first round of testing and found one issue with custom operations (they are filtered, but shouldn't).
esrally/rally.py
Outdated
| cfg.add(config.Scope.applicationOverride, "system", "list.to_date", args.to_date) | ||
| configure_mechanic_params(args, cfg, command_requires_car=False) | ||
| configure_track_params(arg_parser, args, cfg, command_requires_track=False) | ||
| configure_default_serverless_params(cfg) |
There was a problem hiding this comment.
Is there a benefit of calling configure_default_serverless_params 3 times in list, delete and info instead of calling it once at the top of dispatch_sub_command ?
There was a problem hiding this comment.
Not really, it felt weird to set those values for commands where it's not needed. Happy to change this.
There was a problem hiding this comment.
I'm happy to keep current form as long as all sub-commands have been tested. Our test coverage is not complete hence the thought of moving it to the top.
|
Forgot to mention I also went through all operation types and compared with serverless protection status as of July 27th and found no discrepancies. |
This is not required as it's the whole tuple that needs to be unique, but it's less confusing that way.
dliappis
left a comment
There was a problem hiding this comment.
I am probably missing some background which is not covered in the PR description, so I may be barking at the wrong tree, but I had issues testing this PR.
I've tested this with the PMC track that has a challenge with a parallel operation by explicitly setting
serverless.operatortoFalse:
It's unclear to me how to reproduce what you did here.
Do I have to set the corresponding setting in rally.ini as supported since #1756?
Looking at https://github.com/elastic/rally/pull/1760/files#diff-9cd59b2fffb801ee46e1819d3dc79a407b1b5784a09d535b3ff68820f9c4102dR202 it appears it's not needed to be done explicitly any more.
But in any case with (see below) or without this setting
$ tail -4 ~/.rally/rally.ini
[driver]
serverless.operator=False
serverless.mode=True
and targeting a serverless ES using the pmc track I see:
[INFO] Race id is [8aebdb5b-d810-4a82-aa50-b96b9590ef3f]
[INFO] Treating parallel task in challenge [indexing-querying] as public.
[INFO] Downloading track data (5.5 GB total size) [100.0%]
[INFO] Decompressing track data from [/home/esbench/.rally/benchmarks/data/pmc/documents.json.bz2] to [/home/esbench/.rally/benchmarks/data/pmc/documents.json] (resulting size: [21.66] GB) ... [OK]
[INFO] Preparing file offset table for [/home/esbench/.rally/benchmarks/data/pmc/documents.json] ... [OK]
[INFO] Racing on track [pmc], challenge [indexing-querying] and car ['external'] with version [8.10.0].
which differs from your output, namely doesn't show:
[INFO] Excluding [put-settings], [check-cluster-health], [force-merge], [wait-until-merges-finish] as challenge [indexing-querying] is run on serverless.
Do I also need a modification in the track and/or pass the track parameter run-on-serverless that is introduced in this PR?
|
You do need to manually set |
Where, in |
Ok after explicitly setting It's unclear to me why explicitly setting that value in |
|
I fixed reading from rally.ini. Please take another look! |
tests/track/loader_test.py
Outdated
| } | ||
| reader = loader.TrackSpecificationReader() | ||
| full_track = reader("unittest", copy.deepcopy(track_specification), "/mappings") | ||
| filtered = self.filter(full_track, serverless_mode=True, serverless_operator=True) |
There was a problem hiding this comment.
Nit: can we use filtered_track consistently between the tests?
b-deam
left a comment
There was a problem hiding this comment.
Nice work! I really like how readable it is, given the complexities here.
I tested it using this invocation:
$ esrally race --pipeline=benchmark-only --client-options='{"default": { "use_ssl": true, "verify_certs": false, "basic_auth_user":"elastic","basic_auth_password":"changeme", "timeout": 60, "headers": { "X-Found-Cluster": "b1ec3121-f6d4-4f62-9e57-422581b3bb65.es"}}}' --target-hosts="https://my.elb.eu-west-1.amazonaws.com" --track-params="number_of_replicas:1,bulk_indexing_clients:4" --track nyc_taxis --challenge append-no-conflicts --kill-running-processes --telemetry=blob-store-stats --test-mode
With serverless.operator=True:
[...]
[INFO] Race id is [8a0017a4-fb1a-4392-9ab2-dc9e505c403d]
[INFO] Racing on track [nyc_taxis], challenge [append-no-conflicts] and car ['external'] with version [8.10.0].
Running delete-index [100% done]
Running create-index [100% done]
Running check-cluster-health [100% done]
Running index [100% done]
Running refresh-after-index [100% done]
Running force-merge [100% done]
Running refresh-after-force-merge [100% done]
Running wait-until-merges-finish [100% done]
Running default_no_target [100% done]
Running default [100% done]
Running default_1k [100% done]
Running range [100% done]
Running distance_amount_agg [100% done]
Running autohisto_agg [100% done]
Running date_histogram_agg [100% done]
[...]
and with serverless.operator=False:
[...]
[INFO] Race id is [66e95a15-78b9-4815-a710-ad6c0d298865]
[INFO] Excluding [check-cluster-health], [force-merge], [wait-until-merges-finish] as challenge [append-no-conflicts] is run on serverless.
[INFO] Racing on track [nyc_taxis], challenge [append-no-conflicts] and car ['external'] with version [8.10.0].
Running delete-index [100% done]
Running create-index [100% done]
Running index [100% done]
Running refresh-after-index [100% done]
Running refresh-after-force-merge [100% done]
Running default_no_target [100% done]
Running default [100% done]
Running default_1k [100% done]
Running range [100% done]
Running distance_amount_agg [100% done]
Running autohisto_agg [100% done]
Running date_histogram_agg [100% done]
[...]
When running Rally benchmarks on Elasticsearch Serverless, some operations do not make sense depending on the operator status, such as force merge. To make running benchmarks simpler, this pull requests skips some operations automatically based on the operator status of the user.
I've tested this with the PMC track that has a challenge with a parallel operation by explicitly setting
serverless.operatortoFalse:It prints the following at the beginning of the race:
TODO:
Sub commands to test: