Update track-shared-logsdb-mode component template for elastic/logs and elastic/security tracks #1097

Merged
martijnvg merged 11 commits into elastic:master from martijnvg:use_large_binary_blocks_serverless
Mar 31, 2026
Conversation

@martijnvg
Member

@martijnvg martijnvg commented Mar 19, 2026

  1. Removed the dependency of other track params on the index_mode track param, which I think was never intended and is undocumented. The track params used in these files are valid outside index_mode.
  2. Allow index.use_time_series_doc_values_format_large_binary_block_size also when serverless_operator == true, so that we can see the effect in serverless.
  3. Added the use_time_series_doc_values_format track param.
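Item 2 amounts to widening the template guard roughly as follows (an illustrative Jinja sketch, not the actual diff; `use_large_binary_blocks` and `large_binary_block_size` are hypothetical param names):

```jinja
{# Sketch: emit the setting both when explicitly enabled and in serverless operator mode #}
{% if use_large_binary_blocks or serverless_operator %}
  ,"use_time_series_doc_values_format_large_binary_block_size": {{ large_binary_block_size | tojson }}
{% endif %}
```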

@martijnvg
Member Author

martijnvg commented Mar 19, 2026

I verified locally that this works by running:

  • esrally race --track-path=/Users/mvg/dev/code/rally-tracks/elastic/security --preserve-install --on-error=abort --kill-running-processes --pipeline=benchmark-only --target-host=localhost:9200 --track-params="wait_for_status:yellow" --test-mode
  • esrally race --track-path=/Users/mvg/dev/code/rally-tracks/elastic/logs --preserve-install --on-error=abort --kill-running-processes --challenge=logging-querying --track-params="wait_for_status:yellow, bulk_start_date:2020-01-01, bulk_end_date:2020-01-02, raw_data_volume_per_day:10GB, max_generated_corpus_size:4GB, max_total_download_gb:4, number_of_replicas:0, number_of_shards:1" --pipeline=benchmark-only --target-host=localhost:9200 --test-mode

{% if index_mode %}
"index": {
"mode": {{ index_mode | tojson }}
{% if use_doc_values_skipper | default(true) %}
Member Author


Moving this setting to the top, which avoids the if conditions that add a , in front of each setting. The current logic always prints the mapping.use_doc_values_skipper index setting.

Member


OK, that makes sense - but be aware that you would not be able to use any other index settings if use_doc_values_skipper were set to false. You could extend the endif to the end of the index settings, so that invalid JSON couldn't be generated. I wonder, though, whether we should investigate (outside of this PR) if there's a way we can make this easier - we have similar issues in other tracks, though we tend to get around it by making the final item in the index settings one that is always printed, so that you can include a comma at the end of the other lines -> https://github.com/elastic/rally-tracks/blob/master/github_archive/index-template.json#L38

Member Author


I think I need to update this again; use_doc_values_skipper isn't a public setting in serverless.
I will add logic to conditionally add a ,, like in the other file.
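The conditional-comma pitfall under discussion can be illustrated outside Jinja with a small stdlib sketch (a simplification of the template logic, not the actual rendering code; the setting names mirror the snippet above):

```python
import json

def render(index_mode, use_doc_values_skipper):
    """Build the "index" settings object line by line, joining with commas,
    so a disabled optional setting never leaves a dangling comma behind."""
    lines = ['"mode": %s' % json.dumps(index_mode)]
    if use_doc_values_skipper:
        lines.append('"mapping.use_doc_values_skipper": true')
    return '{"index": {%s}}' % ", ".join(lines)

# Both variants stay valid JSON because the commas come from the join,
# not from each conditional line.
print(json.loads(render("logsdb", True)))
print(json.loads(render("logsdb", False)))
```

The same idea applies inside the template: as long as exactly one setting is unconditionally printed last (or commas are produced by a join), no combination of track params can produce invalid JSON.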

@martijnvg martijnvg changed the title Allow index.use_time_series_doc_values_format_large_binary_block_size when serverless_operator == true Update track-shared-logsdb-mode component template for elastic/logs and elastic/security track.s Mar 19, 2026
@martijnvg martijnvg requested a review from gareth-ellis March 19, 2026 16:02
Member

@gareth-ellis gareth-ellis left a comment


LGTM - it would be nice if we could find a way to avoid requiring certain parameters to be set in a certain way to ensure we end up with valid JSON. There's some CI issues we need to resolve, too; I'll take a quick look and see if I can work out what's going wrong.

@martijnvg
Member Author

@gareth-ellis I think I fixed the invalid JSON issues. However, I don't understand why the current CI jobs have failed.

For example:


self = <it_tracks_serverless.test_logs.TestLogs object at 0x7c16c1b92990>, operator = True, rally = <pytest_rally.rally.Rally object at 0x7c16c1bf06e0>
project_config = ServerlessProjectConfig(target_host='rally-tracks-it-serverless-3775-ba1c72.es.eu-west-1.aws.qa.elastic.cloud:443', us...er.json'), operator_client_options_file=local('/tmp/pytest-of-buildkite-agent/pytest-0/client-options0/operator.json'))
 
def test_logs_default(self, operator, rally, project_config: ServerlessProjectConfig):
ret = rally.race(
track="elastic/logs",
challenge="logging-indexing",
track_params="number_of_replicas:1",
client_options=project_config.get_client_options_file(operator),
target_hosts=project_config.target_host,
)
>       assert ret == 0
E       assert 64 == 0
 
it_tracks_serverless/test_logs.py:68: AssertionError

@gareth-ellis
Member

You need to scroll up slightly.

An example:

[ERROR] Cannot race. Error in load generator [0]
2026-03-24 08:09:07 UTC | Cannot run task [create-all-component-templates]: Request returned an error. Error type: api, Description: illegal_argument_exception ({'error': {'root_cause': [{'type': 'illegal_argument_exception', 'reason': 'unknown setting [index.use_time_series_doc_values_format_large_binary_block_size] did you mean [index.use_time_series_doc_values_format_large_block_size]?'}], 'type': 'illegal_argument_exception', 'reason': 'unknown setting [index.use_time_series_doc_values_format_large_binary_block_size] did you mean [index.use_time_series_doc_values_format_large_block_size]?'}, 'status': 400}), HTTP Status: 400

@martijnvg
Member Author

Thanks for pointing this out.

So it looks like this setting is unknown in serverless. How does this test verify these templates?
The only limitation is that the setting is currently behind a feature flag. Is, for some reason, a released serverless version started instead of a snapshot?

@martijnvg
Member Author

@gareth-ellis I've removed the or serverless_operator == true condition again. There are two other improvements in this PR that are still worth getting in.

Member

@gareth-ellis gareth-ellis left a comment


LGTM

@martijnvg
Member Author

Hey @gareth-ellis, the 3.10 and 3.13 compat PR CI jobs keep failing with:

Error:  Cannot race. Error in load generator [0]
	Cannot run task [compression-stats]: Request returned an error. Error type: transport, Description: network connection timed out

Do you have an idea why this is happening?

@gareth-ellis
Member

gareth-ellis commented Mar 26, 2026

It seems to be compression-stats is timing out:

INFO     pytest_rally.rally:rally.py:147 Running command: [esrally race --track="elastic/logs" --challenge="logging-indexing" --track-repository="/home/runner/work/rally-tracks/rally-tracks" --track-revision="de6985c574edc26dcf751baa9b7241c4c8f16d8d" --configuration-name="pytest" --enable-assertions --kill-running-processes --on-error="abort" --pipeline="benchmark-only" --target-hosts="127.0.0.1:19200" --test-mode --track-params="number_of_replicas:0"]
FAILED

Specifically this step:
https://github.com/elastic/rally-tracks/blob/master/elastic/logs/challenges/logging-indexing.json#L26

We probably have a 10 second timeout, could it be the changes in this PR have made that slower? I'll try and reproduce locally and should be able to see what is actually happening

from logs:

2026-03-25 19:52:04,131 ActorAddr-(T|:41181)/PID:4399 esrally.driver.driver INFO Worker[0] executing tasks: ['compression-stats']
2026-03-25 19:53:36,600 ActorAddr-(T|:41181)/PID:4399 elastic_transport.node_pool WARNING Node <RallyAiohttpHttpNode(http://127.0.0.1:19200)> has failed for 1 times in a row, putting on 1 second timeout
2026-03-25 19:53:36,601 ActorAddr-(T|:41181)/PID:4399 esrally.driver.driver ERROR Could not execute schedule
Traceback (most recent call last):

  File "/home/runner/.local/share/hatch/env/virtual/rally-tracks/jBbSvtJB/it/lib/python3.10/site-packages/esrally/driver/driver.py", line 1940, in __call__
    total_ops, total_ops_unit, request_meta_data = await execute_single(runner, self.es, params, self.on_error)

  File "/home/runner/.local/share/hatch/env/virtual/rally-tracks/jBbSvtJB/it/lib/python3.10/site-packages/esrally/driver/driver.py", line 2154, in execute_single
    raise exceptions.RallyAssertionError(msg)

esrally.exceptions.RallyAssertionError: Request returned an error. Error type: transport, Description: network connection timed out

2026-03-25 19:53:36,601 ActorAddr-(T|:41181)/PID:4399 esrally.driver.driver INFO Worker[0] finished executing tasks ['compression-stats'] in 92.470416 seconds
2026-03-25 19:53:36,827 ActorAddr-(T|:41181)/PID:4399 esrally.driver.driver ERROR Worker[0] has detected a benchmark failure. Notifying master...
Traceback (most recent call last):

  File "/opt/hostedtoolcache/Python/3.10.20/x64/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)

  File "/home/runner/.local/share/hatch/env/virtual/rally-tracks/jBbSvtJB/it/lib/python3.10/site-packages/esrally/driver/driver.py", line 1785, in __call__
    loop.run_until_complete(self.run())

  File "/opt/hostedtoolcache/Python/3.10.20/x64/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()

  File "/home/runner/.local/share/hatch/env/virtual/rally-tracks/jBbSvtJB/it/lib/python3.10/site-packages/esrally/driver/driver.py", line 1837, in run
    _ = await asyncio.gather(*awaitables)

  File "/home/runner/.local/share/hatch/env/virtual/rally-tracks/jBbSvtJB/it/lib/python3.10/site-packages/esrally/driver/driver.py", line 2002, in __call__
    raise exceptions.RallyError(f"Cannot run task [{self.task}]: {e}") from None

esrally.exceptions.RallyError: Cannot run task [compression-stats]: Request returned an error. Error type: transport, Description: network connection timed out

2026-03-25 19:53:36,828 ActorAddr-(T|:41007)/PID:4377 esrally.driver.driver ERROR Main driver received a fatal exception from a load generator. Shutting down.

The equivalent from a run on master:

2026-03-25 14:36:59,43 ActorAddr-(T|:44733)/PID:4782 esrally.driver.driver INFO Creating time-period based schedule with [None] distribution for [compression-stats] with a warmup period of [0] seconds and a time period of [None] seconds.
2026-03-25 14:36:59,43 ActorAddr-(T|:44733)/PID:4782 esrally.client.factory INFO Creating ES client connected to [{'host': '127.0.0.1', 'port': 19200}] with options [{'timeout': 60}]
2026-03-25 14:36:59,44 ActorAddr-(T|:44733)/PID:4782 esrally.driver.driver INFO Worker[0] executing tasks: ['compression-stats']
2026-03-25 14:37:50,122 ActorAddr-(T|:44733)/PID:4782 esrally.driver.driver INFO Worker[0] finished executing tasks ['compression-stats'] in 51.077770 seconds

It suggests that probably this PR (or something else) has caused the compression stats to take a bit longer - so we now go over the timeout. We were at 51 seconds before for the entire task - that should be three calls I believe.

@martijnvg
Member Author

We probably have a 10 second timeout, could it be the changes in this PR have made that slower?

The main change is that, without index modes, other track params can be used as well. But those track params do have to be enabled, and even if that is the case, I don't see how that would cause timeouts - maybe I'm missing something here. What is compression-stats actually doing?

@gareth-ellis
Member

@martijnvg
Member Author

martijnvg commented Mar 26, 2026

Thanks, looking at the python method, a few ES APIs are being invoked. But the error doesn't say which API invocation times out.

Also, I think the compression-stats task is often excluded from benchmark runs?

@gareth-ellis
Member

I reran locally, from master, then from your branch, then from your branch with a longer timeout:

Master:
2026-03-26 09:05:28,610 ActorAddr-(T|:63819)/PID:70845 elastic_transport.transport INFO POST https://127.0.0.1:9200/logs-k8-application.log-default/_search [status:200 duration:0.344s]
2026-03-26 09:05:29,158 ActorAddr-(T|:63819)/PID:70845 elastic_transport.transport INFO POST https://127.0.0.1:9200/logs-k8-application.log-default/_search [status:200 duration:0.272s]

60s timeout:
2026-03-26 09:14:19,46 ActorAddr-(T|:52366)/PID:76722 elastic_transport.transport INFO POST https://127.0.0.1:9200/logs-k8-application.log-default/_search [status:N/A duration:60.646s]

240s timeout:
2026-03-26 09:37:22,535 ActorAddr-(T|:58252)/PID:82257 elastic_transport.transport INFO POST https://127.0.0.1:9200/logs-k8-application.log-default/_search [status:200 duration:105.167s]
2026-03-26 09:40:45,63 ActorAddr-(T|:58252)/PID:82257 elastic_transport.transport INFO POST https://127.0.0.1:9200/logs-k8-application.log-default/_search [status:200 duration:101.032s]
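For anyone wanting to reproduce these reruns, Rally's client-side request timeout can be raised via --client-options. A sketch of such an invocation (assumes esrally is installed and a cluster is listening on 127.0.0.1:19200; adjust to taste):

```shell
# Raise the client request timeout (in seconds) so the slow _search calls
# from compression-stats can complete instead of timing out.
esrally race --track="elastic/logs" --challenge="logging-indexing" \
  --pipeline=benchmark-only --target-hosts=127.0.0.1:19200 --test-mode \
  --track-params="number_of_replicas:0" \
  --client-options="timeout:240"
```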

@martijnvg
Member Author

Thanks @gareth-ellis, then there must be something wrong here :)

I don't know what these rally PR jobs do. Would you be able to share how you reproduced this? I'm curious what the exact search request is and with what settings we run. And does this run against the main branch of Elasticsearch, or are we running against an older ES version here?

@martijnvg martijnvg changed the title Update track-shared-logsdb-mode component template for elastic/logs and elastic/security track.s Update track-shared-logsdb-mode component template for elastic/logs and elastic/security tracks Mar 27, 2026
@martijnvg
Member Author

The performance issue was fixed via elastic/elasticsearch#145077

@martijnvg martijnvg merged commit 7fa6019 into elastic:master Mar 31, 2026
15 checks passed
@esbenchmachine esbenchmachine added the backport pending Awaiting backport to stable release branch label Mar 31, 2026
@esbenchmachine
Collaborator

@martijnvg
A backport is pending for this PR.
Apply all the labels that correspond to Elasticsearch minor versions expected to work with this PR, but select only from the available ones.
If intended for future releases, apply label for next minor

When a vX.Y label is added, a new pull request will be automatically created, unless merge conflicts are detected or if the label supplied points to the next Elasticsearch minor version. If successful, a link to the newly opened backport PR will be provided in a comment.

In case of merge conflicts during backporting, create the backport PR manually following the steps from README:
Final steps to complete the backporting process:

  1. Ensure the correct version labels exist in this PR.
  2. Ensure each backport pull request is labeled with backport.
  3. Review and merge each backport pull request into the appropriate version branch.
  4. Remove backport pending label from this PR once all backport PRs are merged.

Thank you!

