Skip to content

Fix running tracks across multiple load driver machines#1763

Merged
b-deam merged 5 commits intoelastic:masterfrom
b-deam:multi-loaddriver-support
Aug 22, 2023
Merged

Fix running tracks across multiple load driver machines#1763
b-deam merged 5 commits intoelastic:masterfrom
b-deam:multi-loaddriver-support

Conversation

@b-deam
Copy link
Copy Markdown
Member

@b-deam b-deam commented Aug 17, 2023

This commit adds a 'new' Bootstrap message handler to the Worker actor, which is responsible for ensuring that the new actor has the correct configuration and track modules loaded into its system path. This restores the ability to run tracks across multiple load driver machines.

The reason tracks worked across Worker actors on coordinator machine (where the race was invoked) is because the default behaviour for new actors created with multiprocTCPBase is that they fork (in the Unix sense) from their parent process. When a process is forked, it inherits the memory and other properties from the parent, including the system path.

When a new Worker was created on a remote load driver machine the existing parent was missing paths, meaning there were missing modules for the load path.

Notes:

Fixes #1752
Relates #1206

@b-deam b-deam added bug Something's wrong :Usability Makes Rally easier to use :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc. labels Aug 17, 2023
@b-deam b-deam added this to the 2.10.0 milestone Aug 17, 2023
@b-deam b-deam self-assigned this Aug 17, 2023
@b-deam b-deam removed the :Usability Makes Rally easier to use label Aug 17, 2023
Copy link
Copy Markdown
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's a nice catch! I left a small nit about naming.

Comment on lines +1218 to +1220
self.driver_actor = sender
# load node-specific config to have correct paths available
self.cfg = load_local_config(msg.config)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to attach those to self? If we do we should set them to None in __init__ but it looks they're only used locally.

Additionally, I think having both cfg and msg.config is confusing. What do you think about node_config instead of cfg?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points! I refactored these attributes in 34566f1 as I noticed we were needlessly setting the config twice otherwise.

Copy link
Copy Markdown
Member

@gareth-ellis gareth-ellis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@b-deam
Copy link
Copy Markdown
Member Author

b-deam commented Aug 21, 2023

buildkite test this please

@b-deam
Copy link
Copy Markdown
Member Author

b-deam commented Aug 21, 2023

Hmm, there's some race condition here related to multiple workers on the same host all loading the track (i.e. running git operations) in parallel:

2023-08-21 04:42:59,804 ActorAddr-(T|:61276)/PID:44024 esrally.utils.process INFO fatal: Unable to create '/Users/bradleydeam/.rally/benchmarks/tracks/rally-tracks-compat/.git/index.lock': File exists.

I'll work out a fix.

@b-deam b-deam requested a review from pquentin August 21, 2023 06:43
@b-deam
Copy link
Copy Markdown
Member Author

b-deam commented Aug 22, 2023

Confirmed working as of latest commit:
image

@b-deam b-deam merged commit 181485a into elastic:master Aug 22, 2023
@pquentin pquentin changed the title Add 'Bootstrap' message handler to 'Worker' actor Fix running tracks across multiple load driver machines Aug 24, 2023
@pquentin pquentin modified the milestones: 2.10.0, 2.9.0 Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something's wrong :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Track execution can hang with multiple load generator machines

3 participants