Benchmark test over RoCE network #82

@yanminjia

We ran test_internode.py over a RoCE network on 4 H800 servers with 8 GPUs each. The results are much worse than those from the same test on 4 H800 servers over an IB network.

Case #1: 4 H800 servers on the IB network

[Image: benchmark results for the IB case]

Case #2: 4 H800 servers on the RoCE network

[tuning] Best dispatch (FP8): SMs 24, NVL chunk 8, RDMA chunk 8: 29.92 GB/s (RDMA), 60.35 GB/s (NVL)
[tuning] Best dispatch (BF16): SMs 24, NVL chunk 12, RDMA chunk 4: 29.54 GB/s (RDMA), 59.58 GB/s (NVL)
[tuning] Best combine: SMs 24, NVL chunk 1, RDMA chunk 16: 13.59 GB/s (RDMA), 27.41 GB/s (NVL)
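
To make the two cases easier to compare side by side, the `[tuning]` lines can be parsed into numbers. The following is only a minimal sketch: the regular expression is inferred from the three log lines above, not from any documented output format, and `parse_tuning` and `ROCE_LOG` are hypothetical names introduced here for illustration.

```python
import re

# Pattern inferred from the "[tuning] Best ..." lines above (an assumption,
# not a documented format). The "(FP8)" / "(BF16)" part is optional because
# the combine line has no data type.
TUNING_RE = re.compile(
    r"\[tuning\] Best (?P<op>\w+)(?: \((?P<dtype>\w+)\))?: "
    r"SMs (?P<sms>\d+), NVL chunk (?P<nvl_chunk>\d+), "
    r"RDMA chunk (?P<rdma_chunk>\d+): "
    r"(?P<rdma>[\d.]+) GB/s \(RDMA\), (?P<nvl>[\d.]+) GB/s \(NVL\)"
)


def parse_tuning(log_text):
    """Return {operation: {"rdma_gbps": float, "nvl_gbps": float}}."""
    results = {}
    for m in TUNING_RE.finditer(log_text):
        # Key is "dispatch (FP8)", "dispatch (BF16)", or plain "combine".
        key = m["op"] if m["dtype"] is None else f'{m["op"]} ({m["dtype"]})'
        results[key] = {
            "rdma_gbps": float(m["rdma"]),
            "nvl_gbps": float(m["nvl"]),
        }
    return results


# The RoCE results from case #2, copied verbatim from the log above.
ROCE_LOG = """\
[tuning] Best dispatch (FP8): SMs 24, NVL chunk 8, RDMA chunk 8: 29.92 GB/s (RDMA), 60.35 GB/s (NVL)
[tuning] Best dispatch (BF16): SMs 24, NVL chunk 12, RDMA chunk 4: 29.54 GB/s (RDMA), 59.58 GB/s (NVL)
[tuning] Best combine: SMs 24, NVL chunk 1, RDMA chunk 16: 13.59 GB/s (RDMA), 27.41 GB/s (NVL)
"""

if __name__ == "__main__":
    for op, bw in parse_tuning(ROCE_LOG).items():
        print(f"{op}: RDMA {bw['rdma_gbps']} GB/s, NVL {bw['nvl_gbps']} GB/s")
```

Running the same function on the IB log text would give a second dictionary with matching keys, so the per-operation ratio between the two networks can be computed directly.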

I am not sure whether there are reference benchmark results for a RoCE network. Any comments would be highly appreciated.

Many thanks.
