Specify base-10 for int() for known strings and bytes#1397
Specify base-10 for int() for known strings and bytes#1397inqueue wants to merge 1 commit intoelastic:masterfrom
Conversation
dliappis
left a comment
There was a problem hiding this comment.
Thank you for raising this PR and going through the code base for potential improvements based on #1379.
Looking at the files here, it seems to me that none of these uses are related to runners or operations in the hot code path that can potentially cause loaddriver bottlenecks; they are rather used in setup tasks, before the benchmark starts, like downloading the corpora, configuring/starting elasticsearch and parsing cli arguments.
Therefore, I don't think they add value. If we don't have any uses of int() in hot code paths, I am leaning towards closing this PR and the corresponding issue. WDYT?
pquentin
left a comment
There was a problem hiding this comment.
I'm a bit on the fence here. Explicit is better than implicit, but this is also more likely to raise questions than anything else, and the performance benefit is zero: I'd lean towards rejecting this.
By the way, Jason, do you have examples of cases where we don't always send strings to int()?
|
It occurred to me that we could at least document this best practice (among others in the future perhaps) in a section that is linked in all the warnings callouts we currently have for custom parameter sources, custom runners and custom schedulers. WDYT? |
Take all of my +1s @dliappis. Expanding this a bit, I'd love a section in the docs that explains what we mean by the "hot path"/"performance-critical code-path", but also more generally a high-level overview of why this is so important in the first place. Reference material on client-side bottlenecks would be great: why they're so crucial to avoid, how they manifest themselves, what we do in Rally to avoid them, etc. You can piece some of this together with what's documented already, but a conceptual overview would help a lot. |
I like the idea of a dedicated section explaining the hot path which outlines best practices around it. While I appreciate the warning to be careful, it would be useful to know how and why.
Input parameters could be a string from CLI arguments or read as a string or integer from JSON: rally/esrally/driver/runner.py Line 852 in 605da76 Input is read from rally/esrally/driver/driver.py Line 639 in 605da76 rally/esrally/driver/driver.py Line 1135 in 605da76 I agree we are not gaining any value from this change. For a few cases, there may be some refactoring we could do to avoid |
Would you like to work on this?
Thanks for capturing the issue. The reason I asked is that mypy won't be happy about it eventually, so we might as well fix if by making sure a given value is always a string or always an int. But leavings things as they are currently is also fine, and in any case we all agree that we can close this pull request. |
Sure, I will work on the docs section about the hot path. Closing this PR. |
|
I opened a new issue to discuss the docs improvement: #1421 |
With this commit, we explicitly specify base-10 for string and bytes
int()conversions. Specifying the base is known to boost performance forint()conversions as mentioned in #1379. This optimization can only beapplied when we know the input to be a string or bytes.
While I would have liked to have added the optimization to more code paths, there is a limitation when we cannot guarantee non-numeric inputs. Python will throw a
TypeErrorwhen attempting to convert a non-string with an explicit base, therefore we only designate base-10 when we know for certain the input will be a non-numeric type. A full review of every invocation ofint()in the code base revealed we cannot always make that guarantee.Closes #1379