tests/kgo: handle errors when polling for verifier status #27013
Merged
nvartolomei merged 2 commits into redpanda-data:dev on Jul 30, 2025
Conversation
3322956 to
9c8ca51
Compare
nvartolomei
previously approved these changes
Jul 28, 2025
Contributor
I'm thinking about the case where kgo crashes or doesn't start; this change will probably cause longer waits there, i.e. a minute instead of failing in a few seconds.
Should we limit the number of retries? E.g. if we can't reach the status in 3 tries one second apart, then it isn't worth trying any more. Or something like that.
Same for any HTTP error codes: we shouldn't retry them at all because we don't expect any.
From ChatGPT, which looks suspicious, but I wanted to show the connect/read params for the Retry object.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry configuration
retry_strategy = Retry(
    total=3,  # total number of retries
    connect=3,  # retries on connection errors
    read=3,  # retries on read errors (socket errors)
    status=0,  # no retries based on HTTP status codes
    backoff_factor=1,  # base factor for exponential backoff between retries
    allowed_methods=["GET", "POST"],  # HTTP methods that may be retried
    raise_on_status=False,  # leave status handling to raise_for_status() below
)

adapter = HTTPAdapter(max_retries=retry_strategy)
session = requests.Session()
session.mount("http://", adapter)
session.mount("https://", adapter)

try:
    response = session.get("https://example.com", timeout=5)
    response.raise_for_status()
    print(response.status_code)
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
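The retry policy proposed in this comment (a few attempts one second apart, and no retry at all for HTTP error codes) could also be sketched without urllib3's `Retry`. This is only an illustration, not code from the PR: `fetch_with_bounded_retries` and its parameters are hypothetical names, and the built-in `ConnectionError` stands in for whatever the HTTP client raises (with `requests` that would be `requests.exceptions.ConnectionError`).

```python
import time
from typing import Callable


def fetch_with_bounded_retries(
    fetch: Callable[[], dict],
    attempts: int = 3,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() up to `attempts` times, retrying only connection errors.

    Any other exception (for example one raised for an unexpected HTTP
    status) is not caught here, so it propagates on the first occurrence.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ConnectionError as e:
            last_error = e
            # Wait between attempts, but not after the final one.
            if attempt < attempts:
                sleep(delay_s)
    raise RuntimeError(
        f"status unreachable after {attempts} attempts"
    ) from last_error
```

With the defaults, a verifier that never starts is reported after roughly two seconds of waiting rather than after the full polling timeout.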
9c8ca51 to
0a3b78e
Compare
nvartolomei
previously approved these changes
Jul 29, 2025
0a3b78e to
67598f8
Compare
The verifier status request sometimes returns an error because of a transient network issue or because the application is slow to start. In that case the test failed outright, since the producer status was never reported even if the producer finished successfully. Added error handling to the status loop so that these errors no longer fail the tests.
Signed-off-by: Michał Maślanka <michal@redpanda.com>
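A status loop of the kind this commit message describes might look roughly like the sketch below. It is not the code from this PR: `wait_for_completion`, `get_status`, `is_done`, and the error budget are hypothetical, and the built-in `ConnectionError`/`TimeoutError` stand in for the HTTP client's exceptions. The idea is that a failed poll is counted and skipped rather than aborting the wait, while a run of consecutive failures still ends the loop early.

```python
import time
from typing import Callable


def wait_for_completion(
    get_status: Callable[[], dict],
    is_done: Callable[[dict], bool],
    max_consecutive_errors: int = 3,
    poll_interval_s: float = 1.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll get_status() until is_done(status) holds, tolerating transient errors."""
    deadline = clock() + timeout_s
    consecutive_errors = 0
    while clock() < deadline:
        try:
            status = get_status()
        except (ConnectionError, TimeoutError) as e:
            # Transient failure: count it and keep polling, unless the
            # service has been unreachable several times in a row.
            consecutive_errors += 1
            if consecutive_errors >= max_consecutive_errors:
                raise RuntimeError(
                    f"{consecutive_errors} consecutive status errors"
                ) from e
        else:
            # A successful poll resets the error budget.
            consecutive_errors = 0
            if is_done(status):
                return status
        sleep(poll_interval_s)
    raise TimeoutError(f"status not complete after {timeout_s}s")
```

Resetting the counter on success means isolated failures during a long run are ignored, whereas a process that crashed or never started is detected after a few polls.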
67598f8 to
c7237ba
Compare
nvartolomei
approved these changes
Jul 30, 2025
joe-redpanda
approved these changes
Jul 30, 2025
Collaborator
CI test results
test results on build#69963
Collaborator
/backport v25.2.x
Collaborator
/backport v25.1.x
Collaborator
/backport v24.3.x
Collaborator
Failed to create a backport PR to v25.1.x branch. I tried:
Collaborator
Failed to create a backport PR to v24.3.x branch. I tried:
This was referenced Jul 30, 2025
The verifier status request sometimes returns an error because of a transient network issue or because the application is slow to start. In that case the test failed outright, since the producer status was never reported even if the producer finished successfully. Added error handling to the status loop so that these errors no longer fail the tests.
Backports Required
Release Notes