
[Roadmap] TransferQueue Q2 roadmap #49

@0oshowero0


Performance

MooncakeStore [Call for Contribution]

  • Use MooncakeStore's put_from and get_into together with TQ's serial_utils.py to remove the complex conditional branching between tensor and non-tensor data, and use zero-copy transfers to improve performance. The current branching:
    tensor_keys = []
    tensor_values = []
    non_tensor_keys = []
    non_tensor_values = []
    for key, value in zip(keys, values, strict=True):
        if isinstance(value, torch.Tensor):
            tensor = value.contiguous()
            # TODO: use gpu direct rdma instead
            if tensor.device.type == "cuda":
                tensor = tensor.cpu()
            tensor_keys.append(key)
            tensor_values.append(tensor)
        else:
            non_tensor_keys.append(key)
            non_tensor_values.append(pickle.dumps(value))
    if tensor_keys:
        self._batch_put_tensors(tensor_keys, tensor_values)
    if non_tensor_keys:
        self._batch_put_bytes(non_tensor_keys, non_tensor_values)
  • Support GPU tensor transfer using GPUDirect RDMA (GDR)

Optimization

MooncakeStore [Call for Contribution]

Yuanrong

General

  • Refactor BatchMeta into an ordinary class to avoid the extra transformation before sending @0oshowero0
  • Extract common optimizations from the Yuanrong backend into the upper layer

Stress Test

Integrations

verl

  • Follow up on upstream requirements

ROLL

slime

  • Discuss TQ integration with the community

Documentation & Tutorial

  • Website: Build a dedicated documentation website.
  • API Reference: Generate a comprehensive API list.
  • Onboarding: Provide guides that simplify user onboarding.
  • Tutorial: Add a pluggable-backend tutorial that demonstrates how to switch between supported backends and illustrates custom_backend_meta usage. @tianyi-ge
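The backend-switching part of the tutorial could be illustrated with a name-based registry. This is only a sketch of the general pattern: TransferQueue's actual backend interface and the exact semantics of custom_backend_meta are not described in this issue. Here custom_backend_meta is assumed to be an opaque per-backend config dict, and StorageBackend, register_backend, create_backend and InMemoryBackend are hypothetical names.

```python
from typing import Any, Callable, Optional, Protocol


class StorageBackend(Protocol):
    """Hypothetical minimal interface every pluggable backend implements."""

    def put(self, key: str, value: Any) -> None: ...
    def get(self, key: str) -> Any: ...


# Maps a backend name to a factory that accepts that backend's metadata dict.
_BACKENDS: dict[str, Callable[[dict[str, Any]], StorageBackend]] = {}


def register_backend(name: str):
    """Class decorator that makes a backend selectable by name."""
    def decorator(cls):
        _BACKENDS[name] = cls
        return cls
    return decorator


def create_backend(
    name: str, custom_backend_meta: Optional[dict[str, Any]] = None
) -> StorageBackend:
    """Instantiate a registered backend, passing through its backend-specific metadata."""
    try:
        factory = _BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown backend {name!r}; registered: {sorted(_BACKENDS)}") from None
    return factory(custom_backend_meta or {})


@register_backend("in_memory")
class InMemoryBackend:
    """Toy backend that keeps values in a dict and records the metadata it received."""

    def __init__(self, meta: dict[str, Any]) -> None:
        self.meta = meta
        self._data: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str) -> Any:
        return self._data[key]
```

With this shape, switching backends means changing only the name string (and the metadata dict) passed to create_backend; the calling code stays the same.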

CI/CD

  • Unified Workflow: Unify CI experience across GitCode and GitHub (addressing missing tests on GitCode).
  • Performance Checkpoint: Add periodic performance testing to detect regressions in new PRs.
  • Refactoring: Refactor unit test logic for better maintainability.
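The performance checkpoint could gate PRs with a relative-threshold comparison against a stored baseline. A minimal sketch; the metric names, the 5% default tolerance, and how the baseline is produced (for example, from the last periodic run on the main branch) are assumptions, not part of this roadmap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    value: float
    higher_is_better: bool = True  # e.g. throughput; set False for latency


def find_regressions(
    baseline: dict[str, float],
    current: list[Metric],
    tolerance: float = 0.05,
) -> list[str]:
    """Return one message per metric that is worse than its baseline by more than `tolerance` (relative)."""
    regressions = []
    for metric in current:
        ref = baseline.get(metric.name)
        if ref is None or ref == 0:
            continue  # no usable baseline for this metric yet
        change = (metric.value - ref) / ref
        # Normalize so that a negative `delta` always means "got worse".
        delta = change if metric.higher_is_better else -change
        if delta < -tolerance:
            regressions.append(f"{metric.name}: {ref:g} -> {metric.value:g} ({change:+.1%})")
    return regressions
```

A CI step could run the benchmarks, call find_regressions, and exit non-zero when the returned list is non-empty, printing the messages so the PR author sees which metric moved.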
