Add delay and duration parameters to jfr telmetry device#1630
Add delay and duration parameters to jfr telmetry device#1630pquentin merged 3 commits intoelastic:masterfrom
Conversation
|
@elasticmachine test this please |
dliappis
left a comment
There was a problem hiding this comment.
Thank you for the quick and meticulous PR!
I left a question/suggestion about how we expose the options as telemetry parameters (sorry the issue should have thought about it) and that preferably we should namespace them as we do for other telemetry params e.g. node-stats-sample-internal etc.
I haven't tested the PR myself, but triggered a CI run.
esrally/telemetry.py
Outdated
| def java_opts(self, log_file): | ||
| recording_template = self.telemetry_params.get("recording-template") | ||
| delay = self.telemetry_params.get("delay") | ||
| duration = self.telemetry_params.get("duration") |
There was a problem hiding this comment.
I wonder if we should prefix these (and possible also the recording-template) with jfr i.e. jfr-delay and jfr-duration.
The rationale is that telemetry device parameters are not namespaced and the meaning of delay and duration is too wide to be dedicated only for the jfr telemetry device.
While at it, we could also rename recording-template but a) it'd be a breaking change for some and should also be documented in migrate.rst and b) recording-template is less generic and thus not so important to change right away.
WDYT?
There was a problem hiding this comment.
I agree that adding the jfr prefix would make it more consistent with the existing parameter naming convention.
If we are not adding it to the recording-template then wouldn't it break this consistency?
There was a problem hiding this comment.
I talked to @dliappis offline. It's OK to keep it like this for now. As PEP 8 says, a foolish consistency is the hobgoblin of little minds. :)
That said, in the future we'll want to switch. This requires a deprecation cycle, much like was done for relative-time (search for it in https://esrally.readthedocs.io/en/stable/migrate.html). Can you please create an issue about migrating to jfr-recording-template so that we don't forget to do it?
There was a problem hiding this comment.
Hey @pquentin, just to confirm the todos:
- Add the prefix
jfrtodelayanddurationmaking themjfr-delayandjfr-durationrespectively. - Leave
recording-templateas is. - Create a separate issue for
jfr-recording-template
There was a problem hiding this comment.
I've pushed the changes in the latest commit. Additionally #1631 has been raised for the jfr-recording-template migration.
esrally/telemetry.py
Outdated
| self.logger.info("jfr: Using default recording template.") | ||
| if delay: | ||
| self.logger.info("jfr: Using delay [%s].", delay) | ||
| jfr_cmd += ",delay={}".format(delay) |
There was a problem hiding this comment.
| jfr_cmd += ",delay={}".format(delay) | |
| jfr_cmd += f",delay={delay}" |
There was a problem hiding this comment.
Had used the string.format as the existing code was using the same.
Using f string would be a better alternative. Will be updating the same for all.
There was a problem hiding this comment.
Using f string would be a better alternative. Will be updating the same for all.
FYI we are hoping to migrate to f-strings in the future as much as possible throughout the codebase , especially in "hot" code paths, because they provide the best performance
esrally/telemetry.py
Outdated
| jfr_cmd += ",delay={}".format(delay) | ||
| if duration: | ||
| self.logger.info("jfr: Using duration [%s].", duration) | ||
| jfr_cmd += ",duration={}".format(duration) |
There was a problem hiding this comment.
| jfr_cmd += ",duration={}".format(duration) | |
| jfr_cmd += f",duration={duration}" |
esrally/telemetry.py
Outdated
| jfr_cmd += ",duration={}".format(duration) | ||
| if recording_template: | ||
| self.logger.info("jfr: Using recording template [%s].", recording_template) | ||
| jfr_cmd += ",settings={}".format(recording_template) |
There was a problem hiding this comment.
| jfr_cmd += ",settings={}".format(recording_template) | |
| jfr_cmd += f",settings={recording_template}" |
I've left my replies. Shall I proceed with the changes? Additionally just wanted to know if there should be separate commits or a single squashed commit? |
pquentin
left a comment
There was a problem hiding this comment.
Looks great! Left two comments, one about code style and the other about testing strategy.
tests/telemetry_test.py
Outdated
| def test_sets_options_for_pre_java_9_custom_delay(self): | ||
| jfr = telemetry.FlightRecorder(telemetry_params={"delay": "10s"}, log_root="/var/log", java_major_version=random.randint(0, 8)) | ||
| assert jfr.java_opts("/var/log/test-recording.jfr") == [ | ||
| "-XX:+UnlockDiagnosticVMOptions", | ||
| "-XX:+DebugNonSafepoints", | ||
| "-XX:+UnlockCommercialFeatures", | ||
| "-XX:+FlightRecorder", | ||
| "-XX:FlightRecorderOptions=disk=true,maxage=0s,maxsize=0,dumponexit=true,dumponexitpath=/var/log/test-recording.jfr", | ||
| "-XX:StartFlightRecording=defaultrecording=true,delay=10s", | ||
| ] | ||
|
|
||
| def test_sets_options_for_java_9_or_10_custom_delay(self): | ||
| jfr = telemetry.FlightRecorder(telemetry_params={"delay": "10s"}, log_root="/var/log", java_major_version=random.randint(9, 10)) | ||
| assert jfr.java_opts("/var/log/test-recording.jfr") == [ | ||
| "-XX:+UnlockDiagnosticVMOptions", | ||
| "-XX:+DebugNonSafepoints", | ||
| "-XX:+UnlockCommercialFeatures", | ||
| "-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true,filename=/var/log/test-recording.jfr,delay=10s", | ||
| ] | ||
|
|
||
| def test_sets_options_for_java_11_or_above_custom_delay(self): | ||
| jfr = telemetry.FlightRecorder(telemetry_params={"delay": "10s"}, log_root="/var/log", java_major_version=random.randint(11, 999)) | ||
| assert jfr.java_opts("/var/log/test-recording.jfr") == [ | ||
| "-XX:+UnlockDiagnosticVMOptions", | ||
| "-XX:+DebugNonSafepoints", | ||
| "-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true,filename=/var/log/test-recording.jfr,delay=10s", | ||
| ] | ||
|
|
||
| def test_sets_options_for_pre_java_9_custom_duration(self): | ||
| jfr = telemetry.FlightRecorder(telemetry_params={"duration": "20m"}, log_root="/var/log", java_major_version=random.randint(0, 8)) | ||
| assert jfr.java_opts("/var/log/test-recording.jfr") == [ | ||
| "-XX:+UnlockDiagnosticVMOptions", | ||
| "-XX:+DebugNonSafepoints", | ||
| "-XX:+UnlockCommercialFeatures", | ||
| "-XX:+FlightRecorder", | ||
| "-XX:FlightRecorderOptions=disk=true,maxage=0s,maxsize=0,dumponexit=true,dumponexitpath=/var/log/test-recording.jfr", | ||
| "-XX:StartFlightRecording=defaultrecording=true,duration=20m", | ||
| ] | ||
|
|
||
| def test_sets_options_for_java_9_or_10_custom_duration(self): | ||
| jfr = telemetry.FlightRecorder(telemetry_params={"duration": "20m"}, log_root="/var/log", java_major_version=random.randint(9, 10)) | ||
| assert jfr.java_opts("/var/log/test-recording.jfr") == [ | ||
| "-XX:+UnlockDiagnosticVMOptions", | ||
| "-XX:+DebugNonSafepoints", | ||
| "-XX:+UnlockCommercialFeatures", | ||
| "-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true,filename=/var/log/test-recording.jfr,duration=20m", | ||
| ] | ||
|
|
||
| def test_sets_options_for_java_11_or_above_custom_duration(self): | ||
| jfr = telemetry.FlightRecorder( | ||
| telemetry_params={"duration": "20m"}, log_root="/var/log", java_major_version=random.randint(11, 999) | ||
| ) | ||
| assert jfr.java_opts("/var/log/test-recording.jfr") == [ | ||
| "-XX:+UnlockDiagnosticVMOptions", | ||
| "-XX:+DebugNonSafepoints", | ||
| "-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true,filename=/var/log/test-recording.jfr,duration=20m", | ||
| ] | ||
|
|
||
| def test_sets_options_for_pre_java_9_custom_delay_and_duration(self): | ||
| jfr = telemetry.FlightRecorder( | ||
| telemetry_params={"delay": "10s", "duration": "20m"}, log_root="/var/log", java_major_version=random.randint(0, 8) | ||
| ) | ||
| assert jfr.java_opts("/var/log/test-recording.jfr") == [ | ||
| "-XX:+UnlockDiagnosticVMOptions", | ||
| "-XX:+DebugNonSafepoints", | ||
| "-XX:+UnlockCommercialFeatures", | ||
| "-XX:+FlightRecorder", | ||
| "-XX:FlightRecorderOptions=disk=true,maxage=0s,maxsize=0,dumponexit=true,dumponexitpath=/var/log/test-recording.jfr", | ||
| "-XX:StartFlightRecording=defaultrecording=true,delay=10s,duration=20m", | ||
| ] | ||
|
|
||
| def test_sets_options_for_java_9_or_10_custom_delay_and_duration(self): | ||
| jfr = telemetry.FlightRecorder( | ||
| telemetry_params={"delay": "10s", "duration": "20m"}, log_root="/var/log", java_major_version=random.randint(9, 10) | ||
| ) | ||
| assert jfr.java_opts("/var/log/test-recording.jfr") == [ | ||
| "-XX:+UnlockDiagnosticVMOptions", | ||
| "-XX:+DebugNonSafepoints", | ||
| "-XX:+UnlockCommercialFeatures", | ||
| "-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true," | ||
| "filename=/var/log/test-recording.jfr,delay=10s,duration=20m", | ||
| ] | ||
|
|
||
| def test_sets_options_for_java_11_or_above_custom_delay_and_duration(self): | ||
| jfr = telemetry.FlightRecorder( | ||
| telemetry_params={"delay": "10s", "duration": "20m"}, log_root="/var/log", java_major_version=random.randint(11, 999) | ||
| ) | ||
| assert jfr.java_opts("/var/log/test-recording.jfr") == [ | ||
| "-XX:+UnlockDiagnosticVMOptions", | ||
| "-XX:+DebugNonSafepoints", | ||
| "-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true," | ||
| "filename=/var/log/test-recording.jfr,delay=10s,duration=20m", | ||
| ] | ||
|
|
||
| def test_sets_options_for_pre_java_9_custom_delay_duration_recording_template(self): | ||
| jfr = telemetry.FlightRecorder( | ||
| telemetry_params={"recording-template": "profile", "duration": "20m", "delay": "10s"}, | ||
| log_root="/var/log", | ||
| java_major_version=random.randint(0, 8), | ||
| ) | ||
| assert jfr.java_opts("/var/log/test-recording.jfr") == [ | ||
| "-XX:+UnlockDiagnosticVMOptions", | ||
| "-XX:+DebugNonSafepoints", | ||
| "-XX:+UnlockCommercialFeatures", | ||
| "-XX:+FlightRecorder", | ||
| "-XX:FlightRecorderOptions=disk=true,maxage=0s,maxsize=0,dumponexit=true,dumponexitpath=/var/log/test-recording.jfr", | ||
| "-XX:StartFlightRecording=defaultrecording=true,delay=10s,duration=20m,settings=profile", | ||
| ] | ||
|
|
||
| def test_sets_options_for_java_9_or_10_custom_delay_duration_recording_template(self): | ||
| jfr = telemetry.FlightRecorder( | ||
| telemetry_params={"recording-template": "profile", "duration": "20m", "delay": "10s"}, | ||
| log_root="/var/log", | ||
| java_major_version=random.randint(9, 10), | ||
| ) | ||
| assert jfr.java_opts("/var/log/test-recording.jfr") == [ | ||
| "-XX:+UnlockDiagnosticVMOptions", | ||
| "-XX:+DebugNonSafepoints", | ||
| "-XX:+UnlockCommercialFeatures", | ||
| "-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true," | ||
| "filename=/var/log/test-recording.jfr,delay=10s,duration=20m,settings=profile", | ||
| ] | ||
|
|
||
| def test_sets_options_for_java_11_or_above_custom_delay_duration_recording_template(self): | ||
| jfr = telemetry.FlightRecorder( | ||
| telemetry_params={"recording-template": "profile", "duration": "20m", "delay": "10s"}, | ||
| log_root="/var/log", | ||
| java_major_version=random.randint(11, 999), | ||
| ) | ||
| assert jfr.java_opts("/var/log/test-recording.jfr") == [ | ||
| "-XX:+UnlockDiagnosticVMOptions", | ||
| "-XX:+DebugNonSafepoints", | ||
| "-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true," | ||
| "filename=/var/log/test-recording.jfr,delay=10s,duration=20m,settings=profile", | ||
| ] | ||
|
|
There was a problem hiding this comment.
Thanks for being that thorough in your tests. However, I think it's a bit too much. Each test should test for a specific thing, and each change should only break one test. Case in point, to change duration to jfr-duration and delay to jfr-delay, you now have to change twelve tests!
In practice, you don't have to test specifically for the Java version, it's done already. And you don't have to test for the "no parameters case", it's done already. You can reduce your twelve tests to one with all the settings enabled (params, delay and duration).
There was a problem hiding this comment.
Makes sense, will be updating this to just have a single test will all params.
Feel free to proceed based on the feedback of @pquentin and mine. No particular preferences for the commits as anyway when it's time to merge we'll be squashing. |
|
@elasticmachine test this please |
pquentin
left a comment
There was a problem hiding this comment.
This looks great! We only need to test that it actually works. Have you tried?
Also left a small question.
docs/telemetry.rst
Outdated
| * ``jfr-delay``: (Optional) Length of time to wait before starting to record (INTEGER followed by 's' for seconds 'm' for minutes or 'h' for hours, 0s). It is up to you to correctly pass in the value. | ||
| * ``jfr-duration``: (Optional) Length of time to record. Note that 0s means forever (INTEGER followed by 's' for seconds 'm' for minutes or 'h' for hours, 0s). It is up to you to correctly pass in the value. |
There was a problem hiding this comment.
Why do we we include , 0s at the end? I don't understand what this means.
There was a problem hiding this comment.
0s is default value for both the parameters. I've updated the documentation to reflect this properly.
…parameters for jfr
|
@elasticmachine test this please |
Although I have checked that the final java cmd is as expected, I didn't get the opportunity to actually test this out. |
pquentin
left a comment
There was a problem hiding this comment.
Thanks! This works.
I tried esrally race --distribution-version=8.5.1 --test-mode --track=geonames --telemetry jfr --telemetry-params="jfr-delay:10s,jfr-duration:10s" and did get a 10 seconds record starting after 10 seconds.
We don't usually merge on Fridays, and tomorrow is a Shut it Down day at Elastic, so we'll merge on Monday.

make check-allsuccessfully? - YESThis closes: #1617
Two additional parameters have been added for JFR
delaydurationAdditionally, the code block for the existing parameter
recording-templatehas been refactored to avoid code duplication.