deflake app level autoscaling test #57967
Signed-off-by: abrar <abrar@anyscale.com>
Code Review
This pull request addresses a flaky test in the app-level autoscaling logic. The core issue, where num_waiter == 0 was used to check for request completion, is correctly identified and fixed by instead waiting on the ObjectRefs of the remote calls. This is a solid fix that should improve test stability.
I've found one issue where a necessary wait_for_condition call was accidentally removed, which could introduce a new race condition. I've left a comment with a suggestion to add it back.
The rest of the changes correctly apply the fix across several test cases. While there is some code repetition that could be refactored into a helper function in a follow-up, the current changes are focused and effectively solve the flakiness problem.
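To illustrate why a waiter count of zero is a weaker signal than the request handles themselves, here is a minimal standard-library sketch. It is an analogy, not the test code from this PR and not Ray's API: `Gate`, `handle_request`, and `run_demo` are hypothetical names, and `concurrent.futures.Future` objects stand in for the ObjectRefs. Each request blocks on a gate and then still has work left after the gate opens, so the waiter count reaches zero before any request has finished.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, wait


class Gate:
    """Hypothetical signal primitive that counts callers blocked in wait()."""

    def __init__(self):
        self._event = threading.Event()
        self._lock = threading.Lock()
        self.num_waiters = 0

    def wait(self):
        with self._lock:
            self.num_waiters += 1
        self._event.wait()
        with self._lock:
            self.num_waiters -= 1

    def open(self):
        self._event.set()


def handle_request(gate, finish):
    gate.wait()    # the request is counted as a waiter only while blocked here
    finish.wait()  # remaining work after the gate opens (held open by the demo)
    return "ok"


def run_demo(num_requests=3):
    gate = Gate()
    finish = threading.Event()
    with ThreadPoolExecutor(max_workers=num_requests) as pool:
        futures = [
            pool.submit(handle_request, gate, finish) for _ in range(num_requests)
        ]
        # Wait until every request is blocked on the gate, then release them.
        while gate.num_waiters < num_requests:
            time.sleep(0.01)
        gate.open()
        # Unreliable signal: no waiters left, yet no request has completed.
        while gate.num_waiters > 0:
            time.sleep(0.01)
        done_when_no_waiters = sum(f.done() for f in futures)
        # Reliable signal: wait on the request handles themselves.
        finish.set()
        wait(futures)
        done_after_wait = sum(f.done() for f in futures)
    return done_when_no_waiters, done_after_wait
```

In this sketch `run_demo()` observes zero completed requests at the moment the waiter count drops to zero, and all of them completed only after waiting on the futures. A test that asserts on post-request state right after seeing zero waiters would hit exactly that window.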
num_waiter == 0 does not necessarily mean that the request has been completed. --------- Signed-off-by: abrar <abrar@anyscale.com>
num_waiter == 0 does not necessarily mean that the request has been completed. --------- Signed-off-by: abrar <abrar@anyscale.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
num_waiter == 0 does not necessarily mean that the request has been completed. --------- Signed-off-by: abrar <abrar@anyscale.com> Signed-off-by: Future-Outlier <eric901201@gmail.com>
num_waiter == 0 does not necessarily mean that the request has been completed. --------- Signed-off-by: abrar <abrar@anyscale.com> Signed-off-by: peterxcli <peterxcli@gmail.com>