With tq + mooncake-store as the backend, text scenarios move relatively little data, but multi-modal scenarios move much more. The volume grows linearly with the GBS and the number of images, so the amount of data to put may approach 10 GB. With Mooncake-store as the backend, kv_batch_put uses the non-zero-copy API, which requires a relatively large local buffer to copy the data into. Only the put operation needs this buffer, but the get client inherits the same configuration, so each machine ends up holding nearly gpu_per_node * local buffer size of wasted buffer memory. The problem is most visible in multi-modal scenarios.
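
To make the scale of the waste concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of tq or Mooncake-store: the helper name `wasted_buffer_bytes`, the assumption of one get client per GPU, and the example values (8 GPUs per node, a 10 GiB local buffer) are all illustrative.

```python
GIB = 1024 ** 3  # bytes in one GiB


def wasted_buffer_bytes(gpu_per_node: int, local_buffer_size: int) -> int:
    """Estimate the buffer memory reserved on one machine by get clients.

    Hypothetical helper: assumes one get client per GPU, each inheriting
    the put-side local buffer size without ever using it for copies.
    """
    if gpu_per_node < 0 or local_buffer_size < 0:
        raise ValueError("gpu_per_node and local_buffer_size must be non-negative")
    return gpu_per_node * local_buffer_size


if __name__ == "__main__":
    # Illustrative values only: 8 GPUs per node, 10 GiB buffer sized for put.
    wasted = wasted_buffer_bytes(gpu_per_node=8, local_buffer_size=10 * GIB)
    print(f"Approximate wasted buffer memory per machine: {wasted / GIB:.0f} GiB")
```

Under these assumed values, the buffer sized for the put path is multiplied by the number of get clients on the node, which is why the cost grows quickly once multi-modal payloads push the buffer size up.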