Commit 14f1bb6

Fix divide by zero error when all URI size estimations fail
- Modified _sample_sizes to append 0 for failed URLs instead of discarding them
- Failed URLs (returning None or raising exceptions) now contribute 0 to size estimates
- Ensures file_sizes list length matches number of sampled URIs
- Allows avg_nbytes_per_row == 0 check to catch the all-failures case
- Maintains empty row_sizes check for edge case when no URIs are sampled
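
The arithmetic behind the points above can be sketched in isolation. This is a minimal illustration, not the Ray implementation: `average_nbytes` and the sample lists are hypothetical names introduced here, and `None` stands in for a failed size lookup.

```python
from typing import List, Optional


def average_nbytes(sizes: List[int]) -> Optional[float]:
    """Return the mean of `sizes`, or None when no usable estimate exists."""
    if not sizes:  # nothing was sampled at all
        return None
    avg = sum(sizes) / len(sizes)
    if avg == 0:  # every sampled entry was 0
        return None
    return avg


# Hypothetical lookup results: None marks a failed size lookup.
all_failed = [None, None, None]
partly_failed = [300, None, None]

# Discarding failures leaves an empty list, so an unguarded
# sum(x) / len(x) would raise ZeroDivisionError.
discarded = [s for s in all_failed if s is not None]

# Recording failures as 0 keeps one entry per sampled URI.
padded_all = [s if s is not None else 0 for s in all_failed]
padded_partly = [s if s is not None else 0 for s in partly_failed]

print(len(discarded), len(padded_all))
print(average_nbytes(padded_all))
print(average_nbytes(padded_partly))
```

One consequence of this choice is that a failed lookup counted as 0 lowers the average: in the sketch, one 300-byte success and two failures average to 100 bytes, not 300.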
1 parent 47131b2 commit 14f1bb6

1 file changed: +3 -2 lines changed

python/ray/data/_internal/planner/plan_download_op.py

Lines changed: 3 additions & 2 deletions
@@ -281,6 +281,7 @@ def _estimate_nrows_per_partition(self, block: pa.Table) -> int:
         ]
 
         target_nbytes_per_partition = self._data_context.target_max_block_size
+
         avg_nbytes_per_row = sum(row_sizes) / len(row_sizes)
         if avg_nbytes_per_row == 0:
             logger.warning(
@@ -325,9 +326,9 @@ def get_file_size(uri_path, fs):
             for future in as_completed(futures):
                 try:
                     size = future.result()
-                    if size is not None:
-                        file_sizes.append(size)
+                    file_sizes.append(size if size is not None else 0)
                 except Exception as e:
                     logger.warning(f"Error fetching file size for download: {e}")
+                    file_sizes.append(0)
 
         return file_sizes
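
The pattern in the second hunk can be tried outside Ray with a small standalone sketch. Everything here other than `as_completed`, `ThreadPoolExecutor`, and the warning text is an assumption made for illustration: `sample_sizes`, `fake_get_size`, the worker count, and the example URIs are hypothetical and do not reflect the signatures in `plan_download_op.py`.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


def sample_sizes(
    uris: List[str], get_size: Callable[[str], Optional[int]]
) -> List[int]:
    """Return one size per URI, recording 0 for any lookup that fails."""
    file_sizes: List[int] = []
    if not uris:
        return file_sizes
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(get_size, uri) for uri in uris]
        for future in as_completed(futures):
            try:
                size = future.result()
                # A None result is treated as a failure and counted as 0.
                file_sizes.append(size if size is not None else 0)
            except Exception as e:
                logger.warning(f"Error fetching file size for download: {e}")
                # An exception also contributes 0 so the list length
                # matches the number of sampled URIs.
                file_sizes.append(0)
    return file_sizes


def fake_get_size(uri: str) -> Optional[int]:
    """Hypothetical stand-in for a filesystem size lookup."""
    if uri.endswith("missing"):
        return None
    if uri.endswith("broken"):
        raise OSError("unreachable")
    return 1024


sizes = sample_sizes(
    ["s3://bucket/a", "s3://bucket/missing", "s3://bucket/broken"],
    fake_get_size,
)
print(sorted(sizes))
```

Because `as_completed` yields futures in completion order, the returned list is not aligned with the input order; only its length and the values it contains are meaningful.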
