feat: Add a parallel JSON log by default#1892
Conversation
|
Need to merge #1890 in for linter to pass. |
gbanasiak
left a comment
There was a problem hiding this comment.
I did a quick functional test and I'm not getting timestamp field in the logs:
% esrally list races
% cat ~/.rally/logs/rally.log
2024-11-18 13:12:31,530 -not-actor-/PID:16574 esrally.rally INFO OS [uname_result(system='Darwin', node='pc-913.home', release='23.6.0', version='Darwin Kernel Version 23.6.0: Thu Sep 12 23:35:10 PDT 2024; root:xnu-10063.141.1.701.1~1/RELEASE_ARM64_T6030', machine='arm64')]
2024-11-18 13:12:31,530 -not-actor-/PID:16574 esrally.rally INFO Python [namespace(name='cpython', cache_tag='cpython-312', version=sys.version_info(major=3, minor=12, micro=2, releaselevel='final', serial=0), hexversion=51118832, _multiarch='darwin')]
2024-11-18 13:12:31,557 -not-actor-/PID:16574 esrally.rally INFO Rally version [2.11.1.dev0 (git revision: 066bb7587468b4bef74b3b5978042d1dfd074d97)]
2024-11-18 13:12:31,557 -not-actor-/PID:16574 esrally.utils.net INFO Connecting directly to the Internet (no proxy support) for [all_proxy].
2024-11-18 13:12:31,557 -not-actor-/PID:16574 esrally.utils.net INFO Connecting directly to the Internet (no proxy support) for [all_proxy].
2024-11-18 13:12:31,557 -not-actor-/PID:16574 esrally.rally INFO Cleaning track dependency directory [/Users/grzegorz/.rally/libs]...
2024-11-18 13:12:31,557 -not-actor-/PID:16574 esrally.metrics INFO Creating file race store
% cat ~/.rally/logs/rally.json
{"message": "OS [uname_result(system='Darwin', node='pc-913.home', release='23.6.0', version='Darwin Kernel Version 23.6.0: Thu Sep 12 23:35:10 PDT 2024; root:xnu-10063.141.1.701.1~1/RELEASE_ARM64_T6030', machine='arm64')]", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Python [namespace(name='cpython', cache_tag='cpython-312', version=sys.version_info(major=3, minor=12, micro=2, releaselevel='final', serial=0), hexversion=51118832, _multiarch='darwin')]", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Rally version [2.11.1.dev0 (git revision: 066bb7587468b4bef74b3b5978042d1dfd074d97)]", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Connecting directly to the Internet (no proxy support) for [all_proxy].", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Connecting directly to the Internet (no proxy support) for [all_proxy].", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Cleaning track dependency directory [/Users/grzegorz/.rally/libs]...", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Creating file race store", "taskName": null, "actorAddress": "-not-actor-"}
% python --version
Python 3.12.2
% pip list | grep python-json
python-json-logger 2.0.7
|
Let's also update https://github.com/elastic/rally/blob/master/docs/configuration.rst#logging, second paragraph. |
066bb75 to
378d078
Compare
Care to give it a try again @gbanasiak? I've added the |
32ebe9c to
d70823e
Compare
|
That's better, now I'm getting this after formatting: There's a new To make this easier to ingest by Filebeat / Elastic Agent we could change This could be addressed by proper Filebeat configuration too. I was thinking about taking this one step further and modifying log attributes to be ECS compliant (also ECS fields in Filebeat docs). Perhaps this could be done with object translator? But I don't find obvious candidates for some of the fields: |
We have to change |
|
I've used the I've also subclassed the formatter in order to move the thespian log fields into their own |
|
Example result: |
gbanasiak
left a comment
There was a problem hiding this comment.
Great idea to use ecs-logging. I did another round of e2e tests and found no collisions when ingesting with Filebeat.
Can we drop log.original entirely? It's useful when documents are processed with ES ingest pipeline, but I don't think we need this (at least not yet) as message field has exactly the same content.
I left some comments and questions.
esrally/log.py
Outdated
| if log_record.get("actorAddress"): | ||
| actor["address"] = log_record.pop("actorAddress") | ||
| if log_record.get("taskName"): | ||
| actor["task"] = log_record.pop("taskName") | ||
| if actor: | ||
| collections.deep_update(log_record, {"rally": {"actor": actor}}) |
There was a problem hiding this comment.
I don't see rally.actor.task populated ever, so I think taskName is lost. Once this is explained we need to think about the right placement for this info. As per docs it should correspond to asyncio.Task name so this is not related to Thespian actors. I would put it outside of rally.actor, rally.asyncio maybe? But then maybe not rally.actor but rather rally.thespian? Ugh, naming...
What's the rationale for the new collections.deep_update() method? Couldn't we simply set rally key in log record dict? It's a new top-level element, so I think that's safe?
There was a problem hiding this comment.
Or we can just drop taskName for now.
There was a problem hiding this comment.
Changed in version 3.12: taskName was added
Huh, turns out that's fairly new.
We can add %(taskName)s to the format string we pass to the Formatter.
There was a problem hiding this comment.
Huh, turns out that's fairly new.
Ah, yes, most recently I've tested with 3.11.7 from esbench env. After switching to 3.12.2 and:
diff --git a/esrally/resources/logging.json b/esrally/resources/logging.json
index c1e4b6da..7651ebc5 100644
--- a/esrally/resources/logging.json
+++ b/esrally/resources/logging.json
@@ -2,12 +2,12 @@
"version": 1,
"formatters": {
"normal": {
- "format": "%(asctime)s,%(msecs)d %(actorAddress)s/PID:%(process)d %(name)s %(levelname)s %(message)s",
+ "format": "%(asctime)s,%(msecs)d %(actorAddress)s/PID:%(process)d %(name)s %(taskName)s %(levelname)s %(message)s",
"datefmt": "%Y-%m-%d %H:%M:%S",
"()": "esrally.log.configure_utc_formatter"
},
"json": {
- "format": "%(message)s",
+ "format": "%(message)s %(taskName)s",
"()": "esrally.log.configure_ecs_formatter"
},
"profile": {
I can see taskName populated in rally.log but not in rally.json. I think it gets swallowed in here. We could override format_to_ecs() but I'm also fine just skipping it for now. We can revisit once we find a use for this information. What we're adding in process object is already very useful.
Still interested in hearing the rationale behind deep_update() :)
There was a problem hiding this comment.
deep_update is so that just in case we have something already there, like someone adds a future mutator, we don't clobber any dict that is already defined. Like would happen with dict.update()
Absolutely, I'm just going to drop that in |
For ease of programmatic parsing, and uploading to Kibana with filebeat
These take the log record, and add additional fields to it. This is configurable via the `mutators` property in `~/.rally/logging.json`.
5959cfb to
14abad5
Compare
|
Okay! It's been a while since I remembered this, but I've added the ability to change mutators and such. The current format: Note: |
Removed old stuff, and pulled out MutatorType to make the RallyEcsFormatter definition shorter
gbanasiak
left a comment
There was a problem hiding this comment.
Many thanks for adding this. LGTM. See remark regarding protected access.
This fixes a lint error that we got from trying to use a protected method. Thanks @gbanasiak for the catch.
For ease of programmatic parsing, and uploading to Kibana with filebeat
make check-allsuccessfully?