Skip to content

Add resource stats to task framework#2089

Merged
dblock merged 3 commits intoopensearch-project:mainfrom
sruti1312:feature/resource-stats
Mar 28, 2022
Merged

Add resource stats to task framework#2089
dblock merged 3 commits intoopensearch-project:mainfrom
sruti1312:feature/resource-stats

Conversation

@sruti1312
Copy link
Copy Markdown
Contributor

@sruti1312 sruti1312 commented Feb 11, 2022

Signed-off-by: sruti1312 srutiparthiban@gmail.com

Description

Add resource stats to task framework. The task expose interfaces to allow updating the resource stats.

Sample task response:

"rh6TkaWmTwCp3Tb0zrjHKQ:8120" : {
     "node" : "rh6TkaWmTwCp3Tb0zrjHKQ",
     "id" : 8120,
     "type" : "direct",
     "action" : "indices:data/read/search[phase/query]",
     "start_time_in_millis" : 1637004446880,
     "running_time_in_nanos" : 9010172401,
     "cancellable" : true,
     "parent_task_id" : "rh6TkaWmTwCp3Tb0zrjHKQ:8119",
     "headers" : { },
     "resource_stats" : {
            "total_resource_consumption" : {
              "cpu_time_in_nanos" : 0,
              "memory_in_bytes" : 0
            }
     }
}     

Issues Resolved

This is used for task resource tracking.
#1009
#1329

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@sruti1312 sruti1312 requested a review from a team as a code owner February 11, 2022 06:04
@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

Can one of the admins verify this patch?

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

✅   Gradle Check success fa63613384c4b3e2e19f41b58cd76f175396ad62
Log 2334

Reports 2334

Copy link
Copy Markdown
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it simplify things to make ResourceStats a class that is composed of a collection of ResourceStat? From a cursory overview, It feels like it would move serialization into the class along with other helper methods that operate on the collection of stats, simplify usage, and make some of the enums child classes of ResourceStat. It would also enable extending resource stats in the future into something more than just a map of string/value.

@andrross
Copy link
Copy Markdown
Member

Would it simplify things to make ResourceStats a class that is composed of a collection of ResourceStat?

I agree. In general, when you find yourself creating things like Map<String, Map<String, Long>> it's often a sign that you can improve readability and maintainability by creating dedicated classes with more meaningful names and types.

@sruti1312
Copy link
Copy Markdown
Contributor Author

sruti1312 commented Feb 16, 2022

@dblock @andrross
TaskInfo.class includes a parser that is used by REST high level client to parse the task response. This is different as compared to other stats objects like NodeStats or JvmStats. The complexity with creating an object like ResourceStats is that we will not be able to use inbuilt object parsers and create custom parsing logic (token by token) if we want to use the above structure. Different approaches that does not need custom parsers are as follows,

  1. Using a list - List<ResourceStats>:
ResourceStats: {
     stats_type::String
     resource_metadata::Map<String, Long>
}

The structure would look like this,

"resource_stats" : {
            "stats_type": "total_resource_consumption",
            "resource_metadata": {
              "cpu_time_in_nanos" : 100,
              "memory_in_bytes" : 100
            },  
            "stats_type": "response_resource_consumption",
             "resource_metadata": {
              "cpu_time_in_nanos" : 50,
              "memory_in_bytes" : 50
            },  
            "stats_type": "operator_resource_consumption",
            "resource_metadata": {
              "cpu_time_in_nanos" : 50,
              "memory_in_bytes" : 50
            }
     }

Note: stats_type's value cannot be used as the key instead of resource_metadata as parser does not support dynamic key values. If we need to support this, we will need custom parsing logic.

  1. Current approach - Map<String, Map<String, Long>>
"resource_stats" : {
            "total_resource_consumption" : {
              "cpu_time_in_nanos" : 100,
              "memory_in_bytes" : 100
            },  
            "response_resource_consumption" : {
              "cpu_time_in_nanos" : 50,
              "memory_in_bytes" : 50
            },  
            "operator_resource_consumption" : {
              "cpu_time_in_nanos" : 50,
              "memory_in_bytes" : 50
            }
     }

Any thoughts?

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure 8e9f1bbb2d5b9b8116831fa021c756051cecc170
Log 2465

Reports 2465

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure da4b58b61e789b2a18474956212ae37610a99eec
Log 2472

Reports 2472

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure e8ecfd06c0206c9a52e33fe4aa7dd3f718fe6111
Log 2475

Reports 2475

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure 67d10ce640623c17d9fb5fafd52c47fc2444739e
Log 2477

Reports 2477

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure 6f2ebcd240d4dbc6cdb530bb33ab9cc8f5799534
Log 2479

Reports 2479

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure 5da6e8a5983215f35ebe928581363815a6fdc8c0
Log 2481

Reports 2481

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure 0bbad3d8edb27fa524c3fb9800a8709a21045fe6
Log 2482

Reports 2482

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure 9cca9ea49d21a82450fd73dc1fb6b025502d4779
Log 2484

Reports 2484

@sruti1312 sruti1312 requested review from andrross and dblock February 17, 2022 16:39
@andrross
Copy link
Copy Markdown
Member

The complexity with creating an object like ResourceStats is that we will not be able to use inbuilt object parsers and create custom parsing logic

Is it a bad thing to write custom parsing logic? The option you presented above as "current approach" is I believe the correct way to structure the JSON. The question here is how to model Java code. The approaches I would consider are:

  1. Current approach: Map<String, Map<String, Long>>

  2. Map of Resource stats: Map<String, ResourceUsage>

ResourceUsage {
  long cpuTimeInNanos;
  long memoryInBytes;
}
  1. Full custom data structure:
ResourceStats {
  ResourceUsage totalResourceConsumption;
  ResourceUsage responseResourceConsumption;
  ResourceUsage operatorResourceConsumption;
}

ResourceUsage {
  long cpuTimeInNanos;
  long memoryInBytes;
}

Copy link
Copy Markdown
Contributor

@Bukhtawar Bukhtawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall structure might need to be optimised in favour of performance, since every update might need some synchronisation based on whether the actual worker thread is updating its resource overhead or the refresh thread.

Note: We can easily make it lock-free if we update the thread context and avoid the refresh. In which case the overhead can be tracked at discrete checkpoints on-demand(high resource consumption of CPU/memory) something similar to task cancellation instead of a refresh thread.
I would recommend simplifying this in favour of performance and maintainability whilst sacrificing accuracy.

If we go with refresh let's do all the computing without any contention with the common resource structure and do a CAS at the end to swap current and older structures.

@xjtushilei
Copy link
Copy Markdown

I am curious how to get cpu time and memory consumption for each query? It's a very difficult job

@Bukhtawar
Copy link
Copy Markdown
Contributor

Bukhtawar commented Feb 24, 2022

@xjtushilei ThreadMXBean#getThreadAllocatedBytes​ is the underlying mechanism that would be used to compute the query cpu time and memory

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure b11eb0402c18162027f75e9086285f6847199c50
Log 2787

Reports 2787

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure 8071f4e56e5e1d379734432319525318d287cdac
Log 2800

Reports 2800

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure 09972f21d74f82b3549bae8f28c5ba3897e26da3
Log 2810

Reports 2810

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure fcfac4eb15ddf9fcd8921474d1dd29ae940bbf01
Log 2889

Reports 2889

Signed-off-by: sruti1312 <srutiparthiban@gmail.com>
@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure 78880ef84769b0309430c34d34032f5b3dbf81e2
Log 3713

Reports 3713

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

✅   Gradle Check success b64ce26
Log 3715

Reports 3715

@sruti1312 sruti1312 requested a review from nknize March 23, 2022 23:27
@sruti1312
Copy link
Copy Markdown
Contributor Author

@dblock and @nknize I have addressed the comments. Can you help with reviewing this PR?

@dblock
Copy link
Copy Markdown
Member

dblock commented Mar 28, 2022

@Bukhtawar you're good with this?

@Bukhtawar
Copy link
Copy Markdown
Contributor

We will revisit as needed, no immediate blockers for merge

@dblock dblock dismissed nknize’s stale review March 28, 2022 17:00

All changes were made.

@dblock dblock merged commit 8b997c1 into opensearch-project:main Mar 28, 2022
@dblock
Copy link
Copy Markdown
Member

dblock commented Mar 29, 2022

@sruti1312 Do we want this in 2.0? 2.x? Let's label backports if so.

@sruti1312
Copy link
Copy Markdown
Contributor Author

@dblock We will want this in 2.0. This PR has the v2.0.0 tag. Is there another tag that we need to included specifically for backports? If so can you help with adding it as I do not have permissions to add/update labels.

@sruti1312 sruti1312 deleted the feature/resource-stats branch March 29, 2022 17:59
@dblock dblock added backport 2.x Backport to 2.x branch backport 2.0 Backport to 2.0 branch labels Mar 29, 2022
@dblock
Copy link
Copy Markdown
Member

dblock commented Mar 29, 2022

I labeled to backport 2.x and 2.0, we have auto-backport going on.

opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 29, 2022
* Add resource stats to task framework

Signed-off-by: sruti1312 <srutiparthiban@gmail.com>

* Update thread resource info and add tests

Signed-off-by: sruti1312 <srutiparthiban@gmail.com>
(cherry picked from commit 8b997c1)
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 29, 2022
* Add resource stats to task framework

Signed-off-by: sruti1312 <srutiparthiban@gmail.com>

* Update thread resource info and add tests

Signed-off-by: sruti1312 <srutiparthiban@gmail.com>
(cherry picked from commit 8b997c1)
return taskInfo(localNodeId, description, status);
if (excludeStats == false) {
resourceStats = new TaskResourceStats(new HashMap<>() {
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should not create a new class for that: new TaskResourceStats(Collections.singletonMap(TOTAL, getTotalResourceStats()))

nknize pushed a commit that referenced this pull request Apr 5, 2022
Adds resource stats to task framework

Signed-off-by: sruti1312 <srutiparthiban@gmail.com>
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
(cherry picked from commit 8b997c1)
nknize pushed a commit that referenced this pull request Apr 5, 2022
Adds resource stats to task framework

Signed-off-by: sruti1312 <srutiparthiban@gmail.com>
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
(cherry picked from commit 8b997c1)
PritLadani added a commit to PritLadani/OpenSearch that referenced this pull request Sep 6, 2022
Backporting pull requests opensearch-project#2089 and opensearch-project#3982

Signed-off-by: PritLadani <pritkladani@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.x Backport to 2.x branch backport 2.0 Backport to 2.0 branch feature New feature or request v2.0.0 Version 2.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants