
[Optimization] Reduce ShardId key size in AsyncShardFetch Revamp strategy #12010

@amkhar

Description


Describe the bug

The AsyncShardFetch Revamp strategy (explained here) uses the ShardId object directly as the key when storing the metadata of every shard received from every node. As a result, overall memory usage grows on the order of:

ShardId object size * shard_count * node_count

This reaches several GB, because a ShardId carries more than a shard number: it also references its index (name and UUID).
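As a rough illustration of how the formula above scales, the following Java sketch multiplies a per-entry cost by the shard and node counts from the reproduction steps. The 100-byte per-entry figure is a hypothetical placeholder, not a measured ShardId size; a real value would have to come from a heap dump.

```java
// Back-of-the-envelope estimate of the metadata footprint described in this issue.
// The per-entry byte cost used in main() is a hypothetical placeholder, not a measurement.
final class ShardFetchMemoryEstimate {

    /** Total bytes when every (shard, node) response is stored under its own key. */
    static long estimateBytes(long bytesPerEntry, long shardCount, long nodeCount) {
        return bytesPerEntry * shardCount * nodeCount;
    }

    public static void main(String[] args) {
        long shards = 500_000L;    // shard count from the reproduction steps
        long nodes = 500L;         // node count from the reproduction steps
        long bytesPerEntry = 100L; // hypothetical per-entry cost (key plus map overhead)
        long total = estimateBytes(bytesPerEntry, shards, nodes);
        System.out.printf("%d entries, ~%.1f GB%n", shards * nodes, total / 1e9);
    }
}
```

Even a modest per-entry cost is multiplied by 250 million (shard, node) pairs at this cluster size, which is why the key size matters so much.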

Related component

Cluster Manager

To Reproduce

  1. Use the latest code of this project.
  2. Spin up a new cluster with 500 nodes and 500K shards.
  3. Restart all cluster manager nodes at once.
  4. The heap fills up; usage can reach 50 GB.

Expected behavior

The key size should be reduced: instead of the full ShardId object, a smaller key should be used to lower overall heap usage.
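One possible way to shrink the per-node key, sketched here under stated assumptions, is to intern each shard once per batch: a batch-level map assigns every shard a small integer ordinal, and each node's responses are stored in an array indexed by that ordinal instead of in a map keyed by the full shard object. ShardKey, BatchShardStore, and their methods are hypothetical names; ShardKey only mimics the shape of ShardId (index name, index UUID, shard number) and is not the OpenSearch class. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ShardId: index name, index UUID, and shard number.
record ShardKey(String indexName, String indexUuid, int shardNumber) {}

// Hypothetical batch-level store: each ShardKey is mapped to a small int once per batch,
// and per-node responses live in arrays indexed by that int rather than in maps keyed
// by the full shard object.
final class BatchShardStore<V> {
    private final Map<ShardKey, Integer> ordinals = new HashMap<>();
    private final List<ShardKey> shards = new ArrayList<>();
    private final Map<String, Object[]> responsesByNode = new HashMap<>();

    BatchShardStore(List<ShardKey> batch) {
        for (ShardKey shard : batch) {
            if (!ordinals.containsKey(shard)) { // ignore duplicate shards in the batch
                ordinals.put(shard, shards.size());
                shards.add(shard);
            }
        }
    }

    /** Records one node's response for one shard, stored by ordinal instead of by ShardKey. */
    void put(String nodeId, ShardKey shard, V response) {
        Integer ordinal = ordinals.get(shard);
        if (ordinal == null) {
            throw new IllegalArgumentException("shard not in batch: " + shard);
        }
        responsesByNode.computeIfAbsent(nodeId, n -> new Object[shards.size()])[ordinal] = response;
    }

    /** Returns the stored response, or null if the node or shard has none. */
    @SuppressWarnings("unchecked")
    V get(String nodeId, ShardKey shard) {
        Integer ordinal = ordinals.get(shard);
        Object[] row = responsesByNode.get(nodeId);
        return (ordinal == null || row == null) ? null : (V) row[ordinal];
    }
}

class BatchShardStoreDemo {
    public static void main(String[] args) {
        ShardKey s0 = new ShardKey("logs", "uuid-1", 0);
        ShardKey s1 = new ShardKey("logs", "uuid-1", 1);
        BatchShardStore<String> store = new BatchShardStore<>(List.of(s0, s1));
        store.put("node-a", s1, "started");
        System.out.println(store.get("node-a", s1)); // started
        System.out.println(store.get("node-a", s0)); // null
    }
}
```

With this layout, the cost of the full shard object is paid once per batch rather than once per (shard, node) pair; each node then contributes one array slot per shard. Whether this maps cleanly onto the Revamp strategy's actual data structures is an assumption, not something this issue confirms.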

Additional Details

Screenshots
[Image: async-shard-fetch-batch-dominator-tree]

Heap dump collected from a cluster with a batch size of 4000, where a single batch occupies more than 500 MB.
