[Data][Flaky] pytorch_resnet_batch_prediction is flaky #58546

@bveeramani

Test File

pytorch_resnet_batch_prediction.ipynb

Issue Description

The test fails intermittently with an assertion error indicating that the internal input queue for a MapBatches operator is not empty when it's expected to be. This suggests a race condition or timing issue in the streaming executor's queue management.

Root Cause

During execution, the streaming executor validates that operator queues are empty at certain checkpoints. The assertion fails when the MapBatches(ResnetModel) operator still has 1 bundle in its internal input queue when the validation occurs:

AssertionError: Expected Internal Input Queue for MapBatches(ResnetModel) to be empty, but found 1 bundles
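
The check described above can be illustrated with a toy model. This is a minimal sketch, not Ray Data's implementation: `ToyOperator`, `internal_input_queue_num_bundles`, and `validate_queue_empty` are hypothetical names invented for illustration, and only the error message text is taken from the failure in this issue. It shows how the assertion trips when one bundle is still queued at the moment validation runs.

```python
from collections import deque


class ToyOperator:
    """Hypothetical stand-in for an operator; NOT Ray's actual class."""

    def __init__(self, name):
        self.name = name
        self._internal_input_queue = deque()

    def add_input(self, bundle):
        # Bundles wait here until the operator picks them up.
        self._internal_input_queue.append(bundle)

    def process_one(self):
        # Drain a single bundle, if any is queued.
        if self._internal_input_queue:
            return self._internal_input_queue.popleft()
        return None

    def internal_input_queue_num_bundles(self):
        return len(self._internal_input_queue)


def validate_queue_empty(op):
    # Mirrors the shape of the failing assertion; the helper itself is an assumption.
    error_msg = "Expected Internal Input Queue for {} to be empty, but found {} bundles"
    count = op.internal_input_queue_num_bundles()
    assert count == 0, error_msg.format(op.name, count)


op = ToyOperator("MapBatches(ResnetModel)")
op.add_input("bundle-0")
op.add_input("bundle-1")
op.process_one()  # only one of the two bundles is drained before validation

try:
    validate_queue_empty(op)
    failure = None
except AssertionError as exc:
    failure = str(exc)

print(failure)
```

In this sketch the validation runs while one bundle is still pending, which is the state the real assertion reports. Whether the real failure comes from validating too early or from a bundle being enqueued too late is not established in this issue.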

Example Failure

[2025-11-11T06:47:23Z]   File "/rayci/python/ray/data/_internal/execution/streaming_executor.py", line 522, in _scheduling_loop_step
[2025-11-11T06:47:23Z]     self._validate_operator_queues_empty(op, state)
[2025-11-11T06:47:23Z]   File "/rayci/python/ray/data/_internal/execution/streaming_executor.py", line 564, in _validate_operator_queues_empty
[2025-11-11T06:47:23Z]     assert op.internal_input_queue_num_blocks() == 0, error_msg.format(
[2025-11-11T06:47:23Z] AssertionError: Expected Internal Input Queue for MapBatches(ResnetModel) to be empty, but found 1 bundles

This appears to be an internal Ray Data execution bug rather than a test issue.

Metadata

Labels

P1: Issue that should be fixed within a few weeks
bug: Something that is supposed to be working; but isn't
data: Ray Data-related issues
good-first-issue: Great starter issue for someone just starting to contribute to Ray
