Retry incomplete HTTP downloads #1504
Closed
Labels: :Usability (Makes Rally easier to use), bug (Something's wrong), good first issue (Small, contained changes that are good for newcomers), help wanted (We'd be happy about a community contribution)
We have an internal track that downloads many files individually with esrally.track.loader.Downloader.download, which eventually calls esrally.utils.net.download, which in turn uses urllib3 to download the data. This download function checks that we downloaded all the bytes specified in the Content-Length header. If not, it simply fails.

In the failing instance, the data was downloaded from https://rally-tracks.elastic.co/observability/logging/system/infra-stats/system.syslog/raw/document-50.json.bz2. This is a proxy maintained by Elastic, and apparently it sometimes serves us incomplete responses.
But why isn't urllib3 covering this for us? https://blog.petrzemek.net/2018/04/22/on-incomplete-http-reads-and-the-requests-library-in-python/ has all the details. The 3.0 branch of requests has died since then, but thankfully we use urllib3 directly, and this post taught me that there is an undocumented flag in urllib3 to cover our use case: urllib3/urllib3#949. (It will become the default in urllib3 v2).
So it appears that simply setting enforce_content_length and removing the custom checks will fix our issue.
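
A rough sketch of what that could look like, not Rally's actual code: the function names download and with_retries, the chunk size, and the retry policy are all made up for illustration. The only urllib3 pieces used are PoolManager.request with preload_content=False and enforce_content_length=True, and HTTPResponse.stream. The retry wrapper is an assumption taken from the issue title rather than from the proposal above.

```python
import time


def download(url, local_path, chunk_size=64 * 1024):
    """Stream url to local_path and let urllib3 verify the body length."""
    # Third-party dependency, imported here so that the retry helper below
    # can be used and tested without urllib3 installed.
    import urllib3

    http = urllib3.PoolManager()
    # preload_content=False streams the body instead of reading it at once.
    # enforce_content_length=True asks urllib3 to raise an error when fewer
    # bytes arrive than the Content-Length header announced.
    response = http.request(
        "GET", url, preload_content=False, enforce_content_length=True
    )
    try:
        # Opening with "wb" truncates any partial file left by a failed attempt.
        with open(local_path, "wb") as out:
            for chunk in response.stream(chunk_size):
                out.write(chunk)
    finally:
        response.release_conn()


def with_retries(func, attempts=3, delay_seconds=1.0, retry_on=(Exception,)):
    """Call func() up to `attempts` times, re-raising the last failure.

    Hypothetical helper: the retry policy is an assumption, not taken
    from Rally.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)
```

A caller could then wrap a single download as with_retries(lambda: download(url, path), retry_on=(urllib3.exceptions.HTTPError,)). As far as I can tell, an incomplete body surfaces as a ProtocolError, which is a subclass of HTTPError, so catching the base class should cover it. The sketch does not check the HTTP status code.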