Fix parallelization and a couple of small updates for the WebVoyager benchmark #479
Open
…om eval client other than gpt
Contributor: This looks good overall to me. Does everything still work for GAIA with this fix?
Fixes https://github.com/microsoft/magentic-ui2.0/issues/24
This PR includes multiple fixes:
Fix --parallel:
Currently, the WebVoyager benchmark crashes when --parallel is greater than 1. The error is:
Root cause of the bug:
Python's multiprocessing has to pickle objects to send them to worker processes. The code was passing system and benchmark instances that held unpicklable attributes, such as the Azure token provider and browser instances.
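A minimal sketch of the failure mode, assuming a hypothetical `FakeSystem` class in which a `threading.Lock` stands in for live resources like a token provider or browser handle. The instance cannot be pickled, but the class itself can, because pickle stores module-level classes by reference.

```python
import pickle
import threading


class FakeSystem:
    """Hypothetical stand-in for a system object that holds live resources."""

    def __init__(self) -> None:
        # A lock stands in for unpicklable state such as a token provider
        # or a browser handle; pickle raises TypeError on locks.
        self._lock = threading.Lock()


def is_picklable(obj) -> bool:
    """Return True if `obj` survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError, AttributeError):
        return False


print(is_picklable(FakeSystem()))  # False: the instance carries a lock
print(is_picklable(FakeSystem))    # True: the class is pickled by reference
```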
This PR passes constructors instead of instances in parallel mode and creates a fresh system for each task.
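A rough sketch of the constructor-passing pattern, not the PR's actual code: `FakeSystem`, `run_one`, and `run_parallel` are hypothetical names. Only the class, plain keyword arguments, and task identifiers cross the process boundary; each worker builds its own system for every task, so no live resource is ever pickled.

```python
import threading
from concurrent.futures import ProcessPoolExecutor


class FakeSystem:
    """Hypothetical system that opens unpicklable resources in __init__."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._lock = threading.Lock()  # stand-in for a browser / token provider

    def run(self, task: str) -> str:
        with self._lock:
            return f"{self.name}:{task}"


def run_one(system_cls, system_kwargs: dict, task: str) -> str:
    # Executed inside the worker: build a fresh system for this task so the
    # unpicklable lock is created in the worker, never sent from the parent.
    system = system_cls(**system_kwargs)
    return system.run(task)


def run_parallel(system_cls, system_kwargs: dict, tasks: list[str], workers: int) -> list[str]:
    # Only picklable values are submitted: the function and class (by
    # reference), a plain dict of kwargs, and the task strings.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_one, system_cls, system_kwargs, t) for t in tasks]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # The guard keeps spawn/forkserver workers from re-running this block.
    print(run_parallel(FakeSystem, {"name": "sys"}, ["t1", "t2", "t3"], workers=2))
    # ['sys:t1', 'sys:t2', 'sys:t3']
```

Building the system inside the worker trades some per-task setup cost for isolation: each task starts from clean state, and a crashed browser in one task cannot leak into the next.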
Other fixes: