Skip to content

Add delay and duration parameters to jfr telmetry device#1630

Merged
pquentin merged 3 commits intoelastic:masterfrom
debanjanc01:fix-1617-jfr-parameters
Nov 29, 2022
Merged

Add delay and duration parameters to jfr telmetry device#1630
pquentin merged 3 commits intoelastic:masterfrom
debanjanc01:fix-1617-jfr-parameters

Conversation

@debanjanc01
Copy link
Copy Markdown
Contributor


This closes: #1617

Two additional parameters have been added for JFR

  1. delay
  2. duration

Additionally, the code block for the existing parameter recording-template has been refactored to avoid code duplication.

@dliappis dliappis added enhancement Improves the status quo :Telemetry Telemetry Devices that gather additional metrics labels Nov 21, 2022
@dliappis dliappis added this to the 2.7.1 milestone Nov 21, 2022
@dliappis
Copy link
Copy Markdown
Contributor

@elasticmachine test this please

Copy link
Copy Markdown
Contributor

@dliappis dliappis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the quick and meticulous PR!

I left a question/suggestion about how we expose the options as telemetry parameters (sorry the issue should have thought about it) and that preferably we should namespace them as we do for other telemetry params e.g. node-stats-sample-internal etc.

I haven't tested the PR myself, but triggered a CI run.

def java_opts(self, log_file):
recording_template = self.telemetry_params.get("recording-template")
delay = self.telemetry_params.get("delay")
duration = self.telemetry_params.get("duration")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should prefix these (and possible also the recording-template) with jfr i.e. jfr-delay and jfr-duration.
The rationale is that telemetry device parameters are not namespaced and the meaning of delay and duration is too wide to be dedicated only for the jfr telemetry device.

While at it, we could also rename recording-template but a) it'd be a breaking change for some and should also be documented in migrate.rst and b) recording-template is less generic and thus not so important to change right away.

WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that adding the jfr prefix would make it more consistent with the existing parameter naming convention.
If we are not adding it to the recording-template then wouldn't it break this consistency?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked to @dliappis offline. It's OK to keep it like this for now. As PEP 8 says, a foolish consistency is the hobgoblin of little minds. :)

That said, in the future we'll want to switch. This requires a deprecation cycle, much like was done for relative-time (search for it in https://esrally.readthedocs.io/en/stable/migrate.html). Can you please create an issue about migrating to jfr-recording-template so that we don't forget to do it?

Copy link
Copy Markdown
Contributor Author

@debanjanc01 debanjanc01 Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @pquentin, just to confirm the todos:

  1. Add the prefix jfr to delay and duration making them jfr-delay and jfr-duration respectively.
  2. Leave recording-template as is.
  3. Create a separate issue for jfr-recording-template

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed the changes in the latest commit. Additionally #1631 has been raised for the jfr-recording-template migration.

self.logger.info("jfr: Using default recording template.")
if delay:
self.logger.info("jfr: Using delay [%s].", delay)
jfr_cmd += ",delay={}".format(delay)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
jfr_cmd += ",delay={}".format(delay)
jfr_cmd += f",delay={delay}"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had used the string.format as the existing code was using the same.
Using f string would be a better alternative. Will be updating the same for all.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using f string would be a better alternative. Will be updating the same for all.

FYI we are hoping to migrate to f-strings in the future as much as possible throughout the codebase , especially in "hot" code paths, because they provide the best performance

jfr_cmd += ",delay={}".format(delay)
if duration:
self.logger.info("jfr: Using duration [%s].", duration)
jfr_cmd += ",duration={}".format(duration)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
jfr_cmd += ",duration={}".format(duration)
jfr_cmd += f",duration={duration}"

jfr_cmd += ",duration={}".format(duration)
if recording_template:
self.logger.info("jfr: Using recording template [%s].", recording_template)
jfr_cmd += ",settings={}".format(recording_template)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
jfr_cmd += ",settings={}".format(recording_template)
jfr_cmd += f",settings={recording_template}"

@debanjanc01
Copy link
Copy Markdown
Contributor Author

debanjanc01 commented Nov 21, 2022

Thank you for the quick and meticulous PR!

I left a question/suggestion about how we expose the options as telemetry parameters (sorry the issue should have thought about it) and that preferably we should namespace them as we do for other telemetry params e.g. node-stats-sample-internal etc.

I haven't tested the PR myself, but triggered a CI run.

I've left my replies. Shall I proceed with the changes? Additionally just wanted to know if there should be separate commits or a single squashed commit?

Copy link
Copy Markdown
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Left two comments, one about code style and the other about testing strategy.

Comment on lines +298 to +433
def test_sets_options_for_pre_java_9_custom_delay(self):
jfr = telemetry.FlightRecorder(telemetry_params={"delay": "10s"}, log_root="/var/log", java_major_version=random.randint(0, 8))
assert jfr.java_opts("/var/log/test-recording.jfr") == [
"-XX:+UnlockDiagnosticVMOptions",
"-XX:+DebugNonSafepoints",
"-XX:+UnlockCommercialFeatures",
"-XX:+FlightRecorder",
"-XX:FlightRecorderOptions=disk=true,maxage=0s,maxsize=0,dumponexit=true,dumponexitpath=/var/log/test-recording.jfr",
"-XX:StartFlightRecording=defaultrecording=true,delay=10s",
]

def test_sets_options_for_java_9_or_10_custom_delay(self):
jfr = telemetry.FlightRecorder(telemetry_params={"delay": "10s"}, log_root="/var/log", java_major_version=random.randint(9, 10))
assert jfr.java_opts("/var/log/test-recording.jfr") == [
"-XX:+UnlockDiagnosticVMOptions",
"-XX:+DebugNonSafepoints",
"-XX:+UnlockCommercialFeatures",
"-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true,filename=/var/log/test-recording.jfr,delay=10s",
]

def test_sets_options_for_java_11_or_above_custom_delay(self):
jfr = telemetry.FlightRecorder(telemetry_params={"delay": "10s"}, log_root="/var/log", java_major_version=random.randint(11, 999))
assert jfr.java_opts("/var/log/test-recording.jfr") == [
"-XX:+UnlockDiagnosticVMOptions",
"-XX:+DebugNonSafepoints",
"-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true,filename=/var/log/test-recording.jfr,delay=10s",
]

def test_sets_options_for_pre_java_9_custom_duration(self):
jfr = telemetry.FlightRecorder(telemetry_params={"duration": "20m"}, log_root="/var/log", java_major_version=random.randint(0, 8))
assert jfr.java_opts("/var/log/test-recording.jfr") == [
"-XX:+UnlockDiagnosticVMOptions",
"-XX:+DebugNonSafepoints",
"-XX:+UnlockCommercialFeatures",
"-XX:+FlightRecorder",
"-XX:FlightRecorderOptions=disk=true,maxage=0s,maxsize=0,dumponexit=true,dumponexitpath=/var/log/test-recording.jfr",
"-XX:StartFlightRecording=defaultrecording=true,duration=20m",
]

def test_sets_options_for_java_9_or_10_custom_duration(self):
jfr = telemetry.FlightRecorder(telemetry_params={"duration": "20m"}, log_root="/var/log", java_major_version=random.randint(9, 10))
assert jfr.java_opts("/var/log/test-recording.jfr") == [
"-XX:+UnlockDiagnosticVMOptions",
"-XX:+DebugNonSafepoints",
"-XX:+UnlockCommercialFeatures",
"-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true,filename=/var/log/test-recording.jfr,duration=20m",
]

def test_sets_options_for_java_11_or_above_custom_duration(self):
jfr = telemetry.FlightRecorder(
telemetry_params={"duration": "20m"}, log_root="/var/log", java_major_version=random.randint(11, 999)
)
assert jfr.java_opts("/var/log/test-recording.jfr") == [
"-XX:+UnlockDiagnosticVMOptions",
"-XX:+DebugNonSafepoints",
"-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true,filename=/var/log/test-recording.jfr,duration=20m",
]

def test_sets_options_for_pre_java_9_custom_delay_and_duration(self):
jfr = telemetry.FlightRecorder(
telemetry_params={"delay": "10s", "duration": "20m"}, log_root="/var/log", java_major_version=random.randint(0, 8)
)
assert jfr.java_opts("/var/log/test-recording.jfr") == [
"-XX:+UnlockDiagnosticVMOptions",
"-XX:+DebugNonSafepoints",
"-XX:+UnlockCommercialFeatures",
"-XX:+FlightRecorder",
"-XX:FlightRecorderOptions=disk=true,maxage=0s,maxsize=0,dumponexit=true,dumponexitpath=/var/log/test-recording.jfr",
"-XX:StartFlightRecording=defaultrecording=true,delay=10s,duration=20m",
]

def test_sets_options_for_java_9_or_10_custom_delay_and_duration(self):
jfr = telemetry.FlightRecorder(
telemetry_params={"delay": "10s", "duration": "20m"}, log_root="/var/log", java_major_version=random.randint(9, 10)
)
assert jfr.java_opts("/var/log/test-recording.jfr") == [
"-XX:+UnlockDiagnosticVMOptions",
"-XX:+DebugNonSafepoints",
"-XX:+UnlockCommercialFeatures",
"-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true,"
"filename=/var/log/test-recording.jfr,delay=10s,duration=20m",
]

def test_sets_options_for_java_11_or_above_custom_delay_and_duration(self):
jfr = telemetry.FlightRecorder(
telemetry_params={"delay": "10s", "duration": "20m"}, log_root="/var/log", java_major_version=random.randint(11, 999)
)
assert jfr.java_opts("/var/log/test-recording.jfr") == [
"-XX:+UnlockDiagnosticVMOptions",
"-XX:+DebugNonSafepoints",
"-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true,"
"filename=/var/log/test-recording.jfr,delay=10s,duration=20m",
]

def test_sets_options_for_pre_java_9_custom_delay_duration_recording_template(self):
jfr = telemetry.FlightRecorder(
telemetry_params={"recording-template": "profile", "duration": "20m", "delay": "10s"},
log_root="/var/log",
java_major_version=random.randint(0, 8),
)
assert jfr.java_opts("/var/log/test-recording.jfr") == [
"-XX:+UnlockDiagnosticVMOptions",
"-XX:+DebugNonSafepoints",
"-XX:+UnlockCommercialFeatures",
"-XX:+FlightRecorder",
"-XX:FlightRecorderOptions=disk=true,maxage=0s,maxsize=0,dumponexit=true,dumponexitpath=/var/log/test-recording.jfr",
"-XX:StartFlightRecording=defaultrecording=true,delay=10s,duration=20m,settings=profile",
]

def test_sets_options_for_java_9_or_10_custom_delay_duration_recording_template(self):
jfr = telemetry.FlightRecorder(
telemetry_params={"recording-template": "profile", "duration": "20m", "delay": "10s"},
log_root="/var/log",
java_major_version=random.randint(9, 10),
)
assert jfr.java_opts("/var/log/test-recording.jfr") == [
"-XX:+UnlockDiagnosticVMOptions",
"-XX:+DebugNonSafepoints",
"-XX:+UnlockCommercialFeatures",
"-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true,"
"filename=/var/log/test-recording.jfr,delay=10s,duration=20m,settings=profile",
]

def test_sets_options_for_java_11_or_above_custom_delay_duration_recording_template(self):
jfr = telemetry.FlightRecorder(
telemetry_params={"recording-template": "profile", "duration": "20m", "delay": "10s"},
log_root="/var/log",
java_major_version=random.randint(11, 999),
)
assert jfr.java_opts("/var/log/test-recording.jfr") == [
"-XX:+UnlockDiagnosticVMOptions",
"-XX:+DebugNonSafepoints",
"-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true,"
"filename=/var/log/test-recording.jfr,delay=10s,duration=20m,settings=profile",
]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for being that thorough in your tests. However, I think it's a bit too much. Each test should test for a specific thing, and each change should only break one test. Case in point, to change duration to jfr-duration and delay to jfr-delay, you now have to change twelve tests!

In practice, you don't have to test specifically for the Java version, it's done already. And you don't have to test for the "no parameters case", it's done already. You can reduce your twelve tests to one with all the settings enabled (params, delay and duration).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, will be updating this to just have a single test will all params.

@dliappis
Copy link
Copy Markdown
Contributor

Shall I proceed with the changes? Additionally just wanted to know if there should be separate commits or a single squashed commit?

Feel free to proceed based on the feedback of @pquentin and mine. No particular preferences for the commits as anyway when it's time to merge we'll be squashing.

@dliappis
Copy link
Copy Markdown
Contributor

@elasticmachine test this please

Copy link
Copy Markdown
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! We only need to test that it actually works. Have you tried?

Also left a small question.

Comment on lines +63 to +64
* ``jfr-delay``: (Optional) Length of time to wait before starting to record (INTEGER followed by 's' for seconds 'm' for minutes or 'h' for hours, 0s). It is up to you to correctly pass in the value.
* ``jfr-duration``: (Optional) Length of time to record. Note that 0s means forever (INTEGER followed by 's' for seconds 'm' for minutes or 'h' for hours, 0s). It is up to you to correctly pass in the value.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we we include , 0s at the end? I don't understand what this means.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0s is default value for both the parameters. I've updated the documentation to reflect this properly.

@dliappis
Copy link
Copy Markdown
Contributor

@elasticmachine test this please

@debanjanc01
Copy link
Copy Markdown
Contributor Author

This looks great! We only need to test that it actually works. Have you tried?

Also left a small question.

Although I have checked that the final java cmd is as expected, I didn't get the opportunity to actually test this out.

Copy link
Copy Markdown
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This works.

I tried esrally race --distribution-version=8.5.1 --test-mode --track=geonames --telemetry jfr --telemetry-params="jfr-delay:10s,jfr-duration:10s" and did get a 10 seconds record starting after 10 seconds.

image

We don't usually merge on Fridays, and tomorrow is a Shut it Down day at Elastic, so we'll merge on Monday.

@debanjanc01
Copy link
Copy Markdown
Contributor Author

Thanks for the quick reviews and suggestions @dliappis @pquentin !

@pquentin pquentin merged commit b5bc1b6 into elastic:master Nov 29, 2022
@pquentin pquentin changed the title added delay and duration parameters to jfr telmetry device Add delay and duration parameters to jfr telmetry device Mar 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Improves the status quo :Telemetry Telemetry Devices that gather additional metrics

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add delay and duration parameters for the JFR telemetry device

3 participants