Stabilize integration tests and fail fast on persistent REST protocol errors#2082
Stabilize integration tests and fail fast on persistent REST protocol errors#2082fressi-elastic wants to merge 18 commits intoelastic:masterfrom
Conversation
Add TestCluster.is_cluster_running_on_port() to detect our cluster by cluster name. EsMetricsStore.start() reuses a running node on HTTP_PORT and only waits for yellow health; otherwise behavior is unchanged. Document that wait_for_cluster_health needs http_port set. Made-with: Cursor
…ation-tests-stability
Added TestCluster.is_cluster_running_on_port() method to check if a cluster is active on a specified port. Updated EsMetricsStore.start() to utilize a running node on HTTP_PORT and only wait for yellow health, while maintaining existing behavior otherwise. Updated documentation for wait_for_cluster_health to require http_port setting.
Add isolate_git_global_config autouse session fixture that sets GIT_CONFIG_GLOBAL to it/resources/gitconfig-empty for all integration tests. shared_setup depends on it so isolation runs first. Remove per-test GIT_CONFIG_GLOBAL handling from test_run_without_http_connection. Made-with: Cursor
There was a problem hiding this comment.
Pull request overview
This PR aims to improve integration test stability by reducing flakiness from transient Elasticsearch startup errors and by isolating integration tests from developer-specific Git configuration.
Changes:
- Retry transient-looking protocol connection errors in
wait_for_rest_layer()before raising a scheme-mismatchSystemSetupError. - Reuse an already-running Elasticsearch “metrics store” cluster (on a fixed port) and add an explicit wait for cluster health.
- Isolate integration tests from
~/.gitconfigby settingGIT_CONFIG_GLOBALto an empty config file.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
esrally/client/factory.py |
Adjusts connection error handling to retry protocol errors during startup and improves debug logging. |
tests/client/factory_test.py |
Updates unit test to match the new retry-then-fail behavior for protocol errors and avoids real sleeping. |
it/conftest.py |
Reuses an existing metrics-store cluster on port 10200 and waits for yellow health. |
it/__init__.py |
Adds helpers to detect an already-running cluster by port and to wait for cluster health. |
it/basic_test.py |
Ensures integration test subprocesses ignore user git config by setting GIT_CONFIG_GLOBAL. |
it/resources/gitconfig-empty |
Adds an empty gitconfig resource intended to be used for GIT_CONFIG_GLOBAL isolation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ation-tests-stability
Use request_timeout=timeout+30s so HTTP client limits stay above the Elasticsearch cluster.health wait parameter. Express timeout in seconds (float) for callers; metrics store IT uses defaults. Made-with: Cursor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Retry is_cluster_running_on_port with bounded attempts, request timeout, and debug logging on failure (integration test stability). - Cap consecutive protocol-like ConnectionErrors in wait_for_rest_layer so scheme misconfiguration fails faster than full max_attempts. Made-with: Cursor
1a6d4ee to
8e8d5a7
Compare
TarFile.extractall(filter=...) is only available from Python 3.12 (PEP 706). Pass the filter only when supported so decompression works on 3.10 and 3.11. Made-with: Cursor
Replace **params dict with explicit Python 3.12+ branches so filter is typed as a literal and mypy accepts TarFile.extractall. Made-with: Cursor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Elasticsearch returns HTTP 200 with timed_out:true when the server-side wait window expires without reaching wait_for_status. Treat that as a failed wait and raise AssertionError with status context. Made-with: Cursor
Document and annotate wait_for_rest_layer protocol-error handling, _do_tar_decompress PEP 706 behavior, TestCluster probe/health helpers, and session pytest fixtures (Generator return types). Made-with: Cursor
Use an absolute it/ directory for GIT_CONFIG_GLOBAL so subprocesses find gitconfig-empty regardless of cwd. Expose cluster probing as probe_cluster_on_port (returns cluster name or None); is_cluster_running_on_port compares to cfg. EsMetricsStore.start fails fast when the port serves another cluster or a non-ES listener before install. Remove a useless pylint suppression on configure_file_handler. Made-with: Cursor
Explain in the docstring and inline comment that extractall(filter=) is only available from 3.12 (PEP 706) and that the legacy path lacks member-path filtering, so untrusted archives could escape target_directory. Made-with: Cursor
Add helpers to clear Rally-owned Elasticsearch on the IT benchmark port (19200) and expose it via pytest fixtures that yield the port number. stop_rally_provisioned_es_on_port returns None; ensure_benchmark_http_port_free must not assign that result back to http_port, which previously made fixtures yield None and broke --target-hosts. Update distribution, sources, tracker, and related tests to use the fixture port consistently. Made-with: Cursor
~/.gitconfigcannot skew git/network behavior in tests.TarFile.extractall'sfilter=argument to 3.12+ for runtime and typing.wait_for_rest_layertests for the new protocol-error cap.