Include a new downsampling operation#1574
Include a new downsampling operation#1574salvatore-campagna merged 28 commits intoelastic:masterfrom
Conversation
The new downsampling operation allows Rally to hit the Elasticsearch /<source-index>/_rollup/<target-index> endpoint to start a downsampling operation on the source index. Expected parameters include: * fixed_interval: the aggregation granularity using the same interval definition used by date histograms * source-index: the index to downsample applying the aggregation * target-index: the index created as a result of the aggregation on the @timestamp field
We would like to benchmark the downsample operation so we make it a non-administrative operation. Also we don't want it to be retried if it fails to avoid latency to be affected by retries.
DJRickyB
left a comment
There was a problem hiding this comment.
Left some suggestions.
One optional thing to consider that might be cool/useful is to add a parameter source for this operation, that can provide defaults for source_index and target_index for ease-of-use. We have helper function params.get_target(track, params) for this purpose in param sources, and your target-index could just default to some fixed pattern like f"{source-index}_{fixed-interval}"
Co-authored-by: Rick Boyd <boyd.richardj@gmail.com>
Co-authored-by: Rick Boyd <boyd.richardj@gmail.com>
|
@elasticmachine run rally/rally-tracks-compat |
|
The CI is failing with |
|
This means that test is now passing! It should be removed from elastic/rally-tracks@117f306 after merging this pull request. Or we could change that line in rally-tracks to See https://docs.pytest.org/en/7.1.x/how-to/skipping.html for details on that pytest fixture. |
Co-authored-by: Rick Boyd <boyd.richardj@gmail.com>
docs/track.rst
Outdated
| """""""""" | ||
|
|
||
| * ``fixed_interval`` (optional, defaults to ``1h``): The aggregation interval key defined as in `https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html#fixed_intervals`. | ||
| * ``source-index`` (optional, defaults to ``tsdb-index``): The index containing data to aggregate which includes a ``@timestamp`` field. Note that this index should be marked read-only prior to the execution of this operation. |
There was a problem hiding this comment.
| * ``source-index`` (optional, defaults to ``tsdb-index``): The index containing data to aggregate which includes a ``@timestamp`` field. Note that this index should be marked read-only prior to the execution of this operation. | |
| * ``source-index`` (optional): The index containing data to aggregate which includes a ``@timestamp`` field. Note that this index should be marked read-only prior to the execution of this operation. If there is only one index defined in the ``indices`` of the track definition, that will be used as the default. |
| assert p["source-index"] == "tsdb" | ||
| assert p["target-index"] == "tsdb-1m" | ||
|
|
||
| def test_downsample_empty_params(self): |
There was a problem hiding this comment.
Not sure what this test buys us? Checked with a debugger and captured the value of p:
{'fixed-interval': '1h', 'source-index': None, 'target-index': 'None-1h', 'request-timeout': None, 'headers': None, 'opaque-id': None}Might be better instead to have the Track constructor call include an indices param with value like [{"name": "test-source-index", "body": "index.json"}] and assert that source-index gets provided as test-source-index
tests/track/params_test.py
Outdated
|
|
||
| p = source.params() | ||
|
|
||
| assert p["fixed-interval"] != "" |
There was a problem hiding this comment.
This should be the documented default, and should need changing if the default changes
DJRickyB
left a comment
There was a problem hiding this comment.
Still have 3 "unresolved conversations" on this PR, relating to documentation update and hardening testing the defaults
tests/track/params_test.py
Outdated
| p = source.params() | ||
|
|
||
| assert p["fixed-interval"] == "1h" | ||
| assert p["source-index"] != "" |
There was a problem hiding this comment.
To put my above comment slightly differently, this assert only passes because the source-index is None instead of "", technically different but not a going concern. You've covered my larger concern in test_downsample_default_index_param, where you ensure there's a default when there is supposed to be a default, so perhaps this can just be deleted and the test is left to assert the default fixed-interval?
DJRickyB
left a comment
There was a problem hiding this comment.
This LGTM, thank you for iterating
The new downsampling operation allows Rally to hit the Elasticsearch
/<source-index>/_downsample/<target-index>endpoint to start a downsampling operation on the source index. Expected parameters include:fixed_interval: the aggregation granularity using the same interval definition used by date histogramssource-index: the index to downsample applying the aggregationtarget-index: the index created as a result of the aggregation on the timestamp fieldExample: aggregate the index
test-source-indexon the timestamp field using a 1 minute interval and store the result in a new index calledtest-target-index.