Bound object spilling file size to avoid disk increase pressure #60098
edoakes merged 7 commits into ray-project:master
Conversation
Signed-off-by: Mao Yancan <yancan.mao@bytedance.com>
Code Review
This pull request introduces a valuable feature to restrict the maximum size of object spill files, addressing potential disk-full issues in large-scale scenarios. The implementation is clean, adding a new configuration max_spilling_file_size and correctly modifying the object spilling logic in TryToSpillObjects. The changes are well-documented and include a new test case that validates the core functionality. My only suggestion is to enhance the test coverage to include a scenario where a single object's size exceeds the new limit, to fully verify the intended behavior.
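The suggested extra test case can be sketched in isolation (a hedged sketch: `pick_spill_batch` is a hypothetical stand-in for the selection logic in `TryToSpillObjects`, not Ray's actual API):

```python
def pick_spill_batch(object_sizes, max_file_size):
    """Greedily pick objects for one spill file, stopping at the byte cap.

    Mirrors the PR's rule: stop adding objects once the accumulated size
    would exceed the cap, but always spill at least one object.
    """
    batch, total = [], 0
    for size in object_sizes:
        if batch and max_file_size > 0 and total + size > max_file_size:
            break
        batch.append(size)
        total += size
    return batch

# Single object larger than the cap must still be spilled on its own.
assert pick_spill_batch([300, 50], max_file_size=100) == [300]
# Normal case: batching stops before exceeding the cap.
assert pick_spill_batch([40, 40, 40], max_file_size=100) == [40, 40]
# Cap disabled (-1): all objects may be fused.
assert pick_spill_batch([40, 40, 40], max_file_size=-1) == [40, 40, 40]
```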
This pull request has been automatically marked as stale because it has not had recent activity. You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed.
Thanks for another contribution @yancanmao and apologies for the delay reviewing. @Kunchd will review shortly.
Kunchd
left a comment
Thanks for the contribution! The content is very helpful.
I left a couple of nits and questions, but we're almost there.
>     break;
>   }
>   bytes_to_spill += object_size;
Out of curiosity, if you limit the amount of bytes that can be spilled each time we attempt to spill, would that result in the system being unable to spill fast enough resulting in significant amount of OOMs in plasma? Wondering if experiments were done for this scenario.
Thanks for the question.
We haven’t run a full “OOM frequency vs. cap size” study yet. Capping spill bytes per attempt (via max_spilling_file_size) reduces spill fusion and can lower effective throughput due to higher per-file overhead.
In a small local benchmark (fixed object_store=75MiB, min_spilling_size=10MiB), a small cap (e.g., 10MB) led to more fragmented spill output and increased end-to-end ray.get / ray.put time by ~20% compared to the unlimited case.
This is an expected trade-off: smaller caps bound spill file sizes and smooth disk usage but may hurt throughput, while larger or unlimited caps favor efficiency. The default remains unlimited, with the cap provided as a tuning knob for users who need tighter control over spill I/O or disk usage.
Thanks for the explanation! It makes sense that the default is -1. I'm wondering if this should be documented in the ray_config_def or some other place, what do you think?
> In a small local benchmark (fixed object_store=75MiB, min_spilling_size=10MiB), a small cap (e.g., 10MB) led to more fragmented spill output and increased end-to-end ray.get / ray.put time by ~20% compared to the unlimited case.
We should dissect this a little bit more. It would be ideal to have it on by default at a reasonably tuned value. 20% is a lot of overhead for file creation.
@yancanmao We believe that the tradeoff here might reveal an area that could deliver impactful improvement to the system.
Could you try the benchmark with a higher number of I/O workers (for example export RAY_max_io_workers=8) and see if the overhead drops. This will help identify whether or not the 20% drop in throughput is caused by an artificial bottleneck within the system.
We really appreciate the work!
@israbbani @Kunchd Thanks for the great suggestions!
I realized my previous benchmark was a bit too "quick and dirty" and represented an extreme edge case (very small cap). To address your concerns about the overhead and whether it's an artificial bottleneck, I re-ran a more rigorous benchmark with a fixed high-pressure workload (Writing 2GB data continuously).
Here are the results with max_io_workers=4:

As you can see in Groups B, C, and D (10MB - 50MB), the throughput is comparable to (or even slightly better than) the Baseline. The "20% overhead" I mentioned earlier only appears in Group E (5MB), where the system is forced to create hundreds of tiny files, causing syscall overheads to dominate. Interestingly, even in Group A (Unlimited), Ray generated small files (~8.8MB) instead of large chunks. This confirms that Ray's current batching (min_spilling_size) is "optimistic"—under memory pressure, it abandons batching to prevent OOM.
Since Ray naturally fragments files under pressure anyway (as seen in Baseline), setting a max_spilling_file_size doesn't necessarily hurt performance. Instead, it provides a predictable upper bound. This is crucial for disk space management—ensuring that we don't end up with massive "zombie files" that can't be deleted because a few small objects inside are still alive.
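The "predictable upper bound" argument can be made concrete with a back-of-the-envelope check (a sketch with illustrative numbers, not measured data): since reclamation is file-granular, one live object pins its whole spill file, so with a per-file cap a single live object pins at most one cap-sized file, whereas with unlimited fusion it can pin an arbitrarily large fused file.

```python
def max_pinned_bytes(file_sizes, live_objects_per_file):
    """Worst-case disk that cannot be reclaimed: any file with at least
    one live object stays on disk in full (reclamation is file-granular)."""
    return sum(size
               for size, live in zip(file_sizes, live_objects_per_file)
               if live > 0)

GiB, MiB = 1024 ** 3, 1024 ** 2

# Unlimited fusion: one 2 GiB fused file with a single live object
# pins the whole 2 GiB on disk.
assert max_pinned_bytes([2 * GiB], [1]) == 2 * GiB

# With a 100 MiB cap, the same data lands in 20 files, and the single
# live object pins at most one 100 MiB file.
assert max_pinned_bytes([100 * MiB] * 20, [1] + [0] * 19) == 100 * MiB
```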
Here are the results with max_io_workers=8:

We can also see that increasing the number of IO workers effectively increases throughput.
My benchmark code is as follows:
```python
import ray
import numpy as np
import time
import os
import shutil
import gc
from tqdm import tqdm

# --- Global configuration ---
TOTAL_SIZE_GB = 2
OBJECT_SIZE_MB = 5
NUM_OBJECTS = (TOTAL_SIZE_GB * 1024) // OBJECT_SIZE_MB
SPILL_DIR_ROOT = "/tmp/ray_spill_experiment"

# Experiment configurations
EXPERIMENTS = [
    {
        "name": "A: Splitting 100",
        "min_spill": 100 * 1024 * 1024,
        "max_cap": 100 * 1024 * 1024,
        "desc": "Accumulate 100MB -> Write 100MB file"
    },
    {
        "name": "B: Splitting 50",
        "min_spill": 50 * 1024 * 1024,
        "max_cap": 50 * 1024 * 1024,
        "desc": "Accumulate 50MB -> Write 50MB file"
    },
    {
        "name": "C: Splitting 20",
        "min_spill": 20 * 1024 * 1024,
        "max_cap": 20 * 1024 * 1024,
        "desc": "Accumulate 20MB -> Write 20MB file"
    },
    {
        "name": "D: Splitting 10",
        "min_spill": 10 * 1024 * 1024,
        "max_cap": 10 * 1024 * 1024,
        "desc": "Accumulate 10MB -> Write 10MB file"
    },
    {
        "name": "E: Splitting 5",
        "min_spill": 5 * 1024 * 1024,
        "max_cap": 5 * 1024 * 1024,
        "desc": "Accumulate 5MB -> Write 5MB file (More wakeups)"
    }
]


def get_real_spill_dir(root_dir):
    """Return the actual spill directory under the given root.

    Uses startswith("ray_spilled_objects") to match the per-node spill directory.
    """
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for dirname in dirnames:
            if dirname.startswith("ray_spilled_objects"):
                return os.path.join(dirpath, dirname)
    return None


def get_file_stats(directory):
    if not directory or not os.path.exists(directory):
        return 0, 0
    total_size = 0
    file_count = 0
    for dirpath, dirnames, filenames in os.walk(directory):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            if not os.path.islink(fp):
                total_size += os.path.getsize(fp)
                file_count += 1
    return file_count, total_size / 1024 / 1024


def run_experiment(config):
    print(f"\n>>> Running: {config['name']} <<<")
    print(f"  ({config['desc']})")
    # 1. Clean up the environment
    if os.path.exists(SPILL_DIR_ROOT):
        shutil.rmtree(SPILL_DIR_ROOT)
    os.makedirs(SPILL_DIR_ROOT, exist_ok=True)
    if ray.is_initialized():
        ray.shutdown()
    # 2. Ray configuration
    sys_config = {
        "automatic_object_spilling_enabled": True,
        "object_spilling_threshold": 0.8,
        "min_spilling_size": config['min_spill'],
        "max_spilling_file_size_bytes": config['max_cap'],
        "max_io_workers": 8
    }
    # 3. Start Ray
    mem_limit = 100 * 1024 * 1024
    print(f"  [System] ObjStore: {mem_limit//1024//1024} MB | MinSpill: {config['min_spill']//1024//1024} MB")
    ray.init(
        object_store_memory=mem_limit,
        _system_config=sys_config,
        _temp_dir=SPILL_DIR_ROOT,
        logging_level="ERROR"
    )
    data_shape = (OBJECT_SIZE_MB * 1024 * 1024 // 8, )
    refs = []
    # Reuse a shared payload object to reduce Python GC noise
    payload_ref = ray.put(np.zeros(data_shape, dtype=np.int64))
    payload = ray.get(payload_ref)
    # --- Phase 1: Write Stress ---
    start_time = time.time()
    # Put as fast as possible
    for _ in tqdm(range(NUM_OBJECTS), desc="  Writing", leave=False):
        ref = ray.put(payload)
        refs.append(ref)
    # Important: give Ray enough time to finish the last async flush.
    # This matters more when there are many small spill files and the IO queue is long.
    time.sleep(5)
    end_time = time.time()
    write_duration = end_time - start_time
    write_bw = (TOTAL_SIZE_GB * 1024) / write_duration
    # --- Spill file stats ---
    real_dir = get_real_spill_dir(SPILL_DIR_ROOT)
    print(f"  [Debug] Found Spill Dir: {real_dir}")  # Confirm the directory was found
    file_count, disk_usage = get_file_stats(real_dir)
    avg_file_size = disk_usage / file_count if file_count > 0 else 0
    print(f"  [Result] Throughput: {write_bw:.2f} MB/s | Files: {file_count} | AvgSize: {avg_file_size:.2f} MB")
    # --- Phase 2: Cleanup ---
    del refs
    gc.collect()
    time.sleep(2)
    ray.shutdown()
    return {
        "Group": config['name'],
        "Write Speed (MB/s)": write_bw,
        "File Count": file_count,
        "Avg File Size (MB)": avg_file_size
    }


if __name__ == "__main__":
    results = []
    for exp_config in EXPERIMENTS:
        try:
            res = run_experiment(exp_config)
            results.append(res)
        except Exception as e:
            print(f"Experiment failed: {e}")
            import traceback
            traceback.print_exc()
    print("\n" + "=" * 90)
    print(f"{'Group':<30} | {'Speed (MB/s)':<15} | {'Files':<10} | {'Avg Size (MB)':<15}")
    print("-" * 90)
    for res in results:
        print(f"{res['Group']:<30} | {res['Write Speed (MB/s)']:<15.2f} | {res['File Count']:<10} | {res['Avg File Size (MB)']:<15.2f}")
    print("=" * 90)
```
Thanks for the in-depth analysis! I think this is an indicator of a potential artificial bottleneck in how the system spills today, which could be a good project for improvement.
Signed-off-by: Mao Yancan <yancan.mao@bytedance.com>
Thanks @Kunchd for reviewing this PR! I have revised the PR according to your comments.
Kunchd
left a comment
Thanks for addressing all my previous comments! I've left two quick nits, but we should be close to done.
>     break;
>   }
>   bytes_to_spill += object_size;

Thanks for the explanation! It makes sense that the default is -1. I'm wondering if this should be documented in the ray_config_def or some other place, what do you think?
Signed-off-by: Mao Yancan <yancan.mao@bytedance.com>
Thanks @Kunchd for the comment! I have made the nit updates accordingly.
Kunchd
left a comment
Looks good! Thanks for the contribution!
I'll approve the PR once we pass pre-merge tests. You may want to update the branch to the latest version of master.
@yancanmao this is an interesting proposal. I think your approach makes sense. I wonder if we can take it further still. If I summarize the problem statement as: infinite spilling has 20% higher throughput (which may not be accurate because of artificial bottlenecks) than a small spill file cap, but infinite spilling will run out of disk, then I'm inclined to believe that compaction of spill files might be a good compromise, especially if you have SSD throughput available and CPU (worker) cores stalling because of a lack of available memory.
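The compaction idea can be sketched roughly as follows (a hypothetical illustration, not Ray code: `SpillFile`, the liveness threshold, and the rewrite step are all assumptions for the sake of the example):

```python
from dataclasses import dataclass


@dataclass
class SpillFile:
    # object_id -> size in bytes; only live objects remain in the map
    live_objects: dict
    total_bytes: int

    @property
    def live_bytes(self):
        return sum(self.live_objects.values())


def compact(files, min_liveness=0.5):
    """Rewrite files whose live fraction fell below min_liveness.

    Live objects from sparse files are copied into a fresh fused file,
    and the old mostly-dead files can then be deleted, reclaiming disk.
    Returns (kept_files, bytes_reclaimed).
    """
    kept, to_merge, reclaimed = [], {}, 0
    for f in files:
        if f.total_bytes and f.live_bytes / f.total_bytes < min_liveness:
            to_merge.update(f.live_objects)
            reclaimed += f.total_bytes - f.live_bytes
        else:
            kept.append(f)
    if to_merge:
        kept.append(SpillFile(live_objects=to_merge,
                              total_bytes=sum(to_merge.values())))
    return kept, reclaimed


# A 100-byte file with only 10 live bytes gets compacted: 90 bytes reclaimed.
files = [SpillFile({"a": 10}, 100), SpillFile({"b": 60, "c": 30}, 100)]
kept, freed = compact(files)
assert freed == 90 and kept[-1].total_bytes == 10
```

The trade-off is extra read/write bandwidth for the rewrite, which is why it fits best when SSD throughput is spare while memory is the constrained resource.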
src/ray/common/ray_config_def.h
Outdated
> /// due to higher per-file overhead. If spilling cannot keep up with allocation under
> /// memory pressure, this may increase the likelihood of object store OOMs.
> /// Set to -1 to disable this limit.
> RAY_CONFIG(int64_t, max_spilling_file_size, -1)

Can you please add the unit to the name of the config? E.g. max_spilling_file_size_bytes. It makes it self-documenting for readers. I know min_spilling_size doesn't follow this, but that's not a good thing.
> RAY_LOG(DEBUG) << absl::StrFormat(
>     "Choosing objects to spill with minimum total size %lld, max fused file size %s "
>     "or with total # of objects = %lld",
>     static_cast<long long>(min_spilling_size_),
>     max_spilling_file_size_str,
>     static_cast<long long>(max_fused_object_count_));

Since you're improving the code, can you move this to the ctor of LocalObjectManager? It's good to log static configuration once on startup instead of every time we try to spill.
> /// Maximum bytes to include in a single spill request (i.e. fused spill file).
> /// If <= 0, the limit is disabled.
> int64_t max_spilling_file_size_;

Same comment as above. Please add the unit to the variable name.
Signed-off-by: Mao Yancan <yancan.mao@bytedance.com>
Thanks @Kunchd @israbbani for your insightful comments and discussion! This helped me understand the spilling mechanisms more deeply. I have updated the code accordingly.
Kunchd
left a comment
@yancanmao Thank you for the contribution, and we would love to collaborate on further improving the object spilling logic!
From our discussions and your investigation, it's clear that there are many opportunities for improving the object spilling logic. We will be discussing more on a comprehensive effort to improve object spilling offline, but if you're interested, the most immediate action item would be to investigate the existing implementation of IO workers (e.g. how it performs the write to disk and how many workers are kept around) and whether that's causing an artificial bottleneck. Compaction, like you've mentioned, would be an optimization further down the line.
If you are on the ray slack, we can also loop you in on any discussions we have regarding object spilling improvements there.
Thanks @Kunchd @israbbani! I'll review the current IO worker implementation and run some isolated benchmarks on file read/write performance as suggested. I would definitely love to join the discussion on Slack. I'll ping you there to follow up!
Bound object spilling file size to avoid disk increase pressure (ray-project#60098)

### Problem

Ray object spilling fuses multiple objects into a single spill file. Fusion is bounded only by `min_spilling_size_` and `max_fused_object_count_`, not by total spill file size. In practice, individual objects can be very large, so a fused spill file can still grow to many GBs even with a small object count.

In large-scale training setups where memory and local disk are comparable in size (e.g., ~2 TB RAM / ~2 TB local disk per node), sustained memory pressure triggers frequent spilling and makes disk a constrained resource.

Spill file reclamation is file-granular and driven by object reference counts. When large objects are fused together, a spill file is often referenced by multiple long-lived objects. As a result, the file cannot be reclaimed until all references are released. Under sustained spilling, new large spill files keep being created while old ones remain pinned, causing disk usage to monotonically increase and eventually leading to disk-full failures.

While a strict one-object-per-file approach would avoid this retention coupling, it is not practical due to excessive file counts and I/O overhead.

### Proposal

We propose to add an optional upper bound on fused spill file size.

- Introduce `max_spilling_file_size` (bytes, default disabled with value `-1`).
- When forming a spill batch, stop adding objects once the accumulated size would exceed the size cap.
- Always allow spilling at least one object, even if it exceeds the cap.

This complements `max_fused_object_count_` by bounding spill files in terms of bytes, not just object count. Large objects can naturally spill into smaller or single-object files, allowing more flexible and timely disk reclamation.

## Related issues

Closes ray-project#60097

## Additional information

- Added a new config `max_spilling_file_size`, default `-1`.

Signed-off-by: Mao Yancan <yancan.mao@bytedance.com>
Co-authored-by: Mao Yancan <yancan.mao@bytedance.com>
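For users tuning this knob, enabling the cap could look roughly like the following (an illustrative config fragment with made-up sizes; note that the review discussion suggests the config may be renamed `max_spilling_file_size_bytes`, so check `ray_config_def.h` for the exact key in your Ray build):

```python
import ray

# Hedged sketch: bound each fused spill file to ~200 MiB. The key below
# follows the rename suggested in review (max_spilling_file_size_bytes);
# earlier drafts of this PR used max_spilling_file_size.
ray.init(
    object_store_memory=1 * 1024**3,  # 1 GiB object store
    _system_config={
        "min_spilling_size": 50 * 1024 * 1024,  # still batch small objects
        "max_spilling_file_size_bytes": 200 * 1024 * 1024,  # per-file cap
    },
)
```

With these values, small objects are still fused up to at least 50 MiB per file, but no fused file grows beyond roughly 200 MiB, so a single live object pins at most that much disk.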