Analytical tools and benchmarking framework for DualPipe and DualPipeV bidirectional pipeline parallelism algorithms from the DeepSeek-V3 Technical Report.
Pure Python • Zero Dependencies • No GPU Required
This enhanced version provides three modules for analyzing, simulating, and benchmarking bidirectional pipeline parallelism strategies—all in pure Python with no external dependencies. Understand DualPipe scheduling, compare strategies, estimate memory, and find optimal configurations without running distributed training.
DualPipe is an innovative bidirectional pipeline parallelism algorithm that achieves full overlap of forward and backward computation-communication phases while reducing pipeline bubbles. Unlike traditional pipeline parallelism (GPipe, 1F1B), DualPipe divides the model vertically into two halves and allows data to flow bidirectionally.
For detailed information, refer to the DeepSeek-V3 Technical Report and profile data.
DualPipe Scheduling:
Example DualPipe scheduling for 8 PP ranks and 20 micro-batches in two directions. Two cells enclosed by a shared black border have mutually overlapped computation and communication.
DualPipeV Scheduling:
Example DualPipeV scheduling for 4 PP ranks (8 PP stages) and 10 micro-batches. DualPipeV is a concise V-shape schedule derived from DualPipe using a "cut-in-half" procedure (thanks to Sea AI Lab for the blog post).
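The "cut-in-half" idea can be illustrated with a small stage-to-rank mapping. This is a sketch under the assumption that rank r holds stage r on the downward leg of the V and stage 2P-1-r on the upward leg, mirroring the schedule figure; it is not the library's internal layout:

```python
# Assumed V-shape stage-to-rank mapping for a DualPipeV-style schedule:
# with P ranks and 2P pipeline stages, rank r holds stage r (down leg)
# and stage 2P-1-r (up leg), so each rank carries two stages.

def v_shape_stages(num_ranks):
    total_stages = 2 * num_ranks
    return {r: (r, total_stages - 1 - r) for r in range(num_ranks)}

for rank, (down, up) in v_shape_stages(4).items():  # 4 ranks -> 8 stages
    print(f"rank {rank}: stages {down} and {up}")
# rank 0: stages 0 and 7
# rank 1: stages 1 and 6
# rank 2: stages 2 and 5
# rank 3: stages 3 and 4
```

Note that under this mapping rank 0 holds both the first and last stage, so the model's input and output sit on the same device, and the tip of the V lands on the last rank.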
Computational analysis of pipeline scheduling without a distributed runtime.
Key Classes:
- ScheduleAnalyzer: Generate operation timelines for DualPipe/DualPipeV, compute bubble ratios, find optimal chunk counts
- BubbleCalculator: Calculate and compare bubble ratios across all strategies (DualPipe, DualPipeV, 1F1B, GPipe)
- CommunicationAnalyzer: Count P2P operations, estimate communication volume, analyze bandwidth requirements
Pure Python simulation of full pipeline execution across ranks and microbatches.
Key Classes:
- PipelineEvent: Dataclass representing individual pipeline events (forward, backward, weight, communication)
- ScheduleSimulator: Simulate complete schedules for DualPipe, DualPipeV, 1F1B, and GPipe strategies
- ScheduleComparator: Compare strategies by makespan, bubble ratio, and memory factor with ASCII reports
- ConfigOptimizer: Find optimal chunk counts, run sensitivity analysis, determine communication thresholds
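To give a feel for what a schedule simulator computes, here is a deliberately simplified, self-contained sketch: unit-time forward/backward chunks, no communication cost, and a GPipe-style full flush. The enhanced simulator uses a more detailed event model, so its numbers differ from this toy version:

```python
# Toy model (not dualpipe_enhanced's implementation): a GPipe schedule is a
# forward wave that fills the pipeline, a full flush, then a backward wave
# that drains it. Each wave takes (num_chunks + num_ranks - 1) time slots.

def gpipe_makespan(num_ranks, num_chunks, f=1.0, b=1.0):
    return (num_chunks + num_ranks - 1) * f + (num_chunks + num_ranks - 1) * b

def bubble_ratio(makespan, num_chunks, f=1.0, b=1.0):
    busy = num_chunks * (f + b)          # useful compute per rank
    return (makespan - busy) / makespan  # fraction of time spent idle

span = gpipe_makespan(num_ranks=4, num_chunks=8)
print(span, round(bubble_ratio(span, num_chunks=8), 3))  # 22.0 0.273
```

The same shape of calculation, with per-event start/end times instead of closed-form waves, is what ScheduleSimulator and ScheduleComparator report.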
Estimate memory usage, throughput, and scalability characteristics.
Key Classes:
- MemoryEstimator: Estimate activation, parameter, and total memory per rank for each strategy
- ThroughputEstimator: Estimate step time, model FLOPs utilization (MFU), samples/tokens per second
- ScalabilityAnalyzer: Perform weak and strong scaling analysis, find scaling limits
- ConfigurationSearch: Search for optimal (num_ranks, num_chunks) configurations under memory constraints
Clone the repository (no pip install needed):

```shell
git clone https://github.com/deepseek-ai/DualPipe.git dualpipe-enhanced
cd dualpipe-enhanced
```

Verify the enhanced modules are present:

```shell
ls dualpipe_enhanced/
# Expected: __init__.py, analysis.py, simulator.py, benchmark.py
```

```python
from dualpipe_enhanced.analysis import BubbleCalculator

# Compare bubble ratios for 4 pipeline stages, 16 microbatches
calc = BubbleCalculator(num_ranks=4)
num_chunks = 16

dualpipe_result = calc.dualpipe_bubble(num_chunks)
dualpipev_result = calc.dualpipev_bubble(num_chunks)
gpipe_result = calc.gpipe_bubble(num_chunks)

print(f"DualPipe bubble_ratio={dualpipe_result['bubble_ratio']:.3f}")
print(f"DualPipeV bubble_ratio={dualpipev_result['bubble_ratio']:.3f}")
print(f"GPipe bubble_ratio={gpipe_result['bubble_ratio']:.3f}")
```

Output:

```
DualPipe bubble_ratio=0.108
DualPipeV bubble_ratio=0.125
GPipe bubble_ratio=0.375
```
```python
from dualpipe_enhanced.simulator import ScheduleComparator

# Simulate all strategies for 4 ranks, 8 chunks
comparator = ScheduleComparator(num_ranks=4, num_chunks=8)
results = comparator.compare_strategies()

# Print ASCII report
print(comparator.generate_report())
```

Output:

```
Strategy Performance Comparison
================================

Makespan (lower is better):
  1. dualpipev : 17.0
  2. dualpipe  : 18.0
  3. 1f1b      : 20.0
  4. gpipe     : 30.0

Bubble Ratio (lower is better):
  1. dualpipev : 0.059
  2. dualpipe  : 0.067
  3. 1f1b      : 0.150
  4. gpipe     : 0.400
```
```python
from dualpipe_enhanced.benchmark import MemoryEstimator

# Estimate memory for 4 ranks, 16 chunks
# Model: 1B parameters total (250M per rank), hidden_dim=1024,
# seq_len=2048, batch_size=64
mem = MemoryEstimator(
    num_ranks=4,
    num_chunks=16,
    model_params_per_rank=250_000_000,
    hidden_dim=1024,
    seq_len=2048,
    batch_size=64,
)

for strategy in ["dualpipe", "1f1b", "gpipe"]:
    activation = mem.estimate_activation_memory(strategy)
    print(f"{strategy}: {activation['peak_mb']:.1f} MB activation")
```

```python
from dualpipe_enhanced.benchmark import ConfigurationSearch

# Search for optimal (num_ranks, num_chunks) under 80 GB GPU memory
searcher = ConfigurationSearch(
    model_params=10_000_000,
    hidden_dim=1024,
    seq_len=512,
    batch_size=64,
    gpu_memory_gb=80.0,
)

configs = searcher.search_optimal_config(min_ranks=2, max_ranks=8)
for i, cfg in enumerate(configs[:3], 1):
    print(f"{i}. Ranks={cfg['num_ranks']}, Chunks={cfg['num_chunks']}, "
          f"Throughput={cfg['throughput']:.2f} samples/s")
```

The original DualPipe implementation is available in the dualpipe/ directory:
```shell
python examples/example_dualpipe.py
python examples/example_dualpipev.py
```

Note: For real-world applications, you will need to implement a custom overlapped_forward_backward method tailored to your specific module.
Run all 132 tests:

```shell
python3 -m pytest tests/ -v
```

Expected output:

```
tests/test_analysis.py ......................................... [ 37%]
tests/test_simulator.py .......................................... [ 75%]
tests/test_benchmark.py ........................................... [100%]
======================== 132 passed in 0.09s ========================
```
Test Coverage:
- test_analysis.py: 49 tests for ScheduleAnalyzer, BubbleCalculator, CommunicationAnalyzer
- test_simulator.py: 43 tests for PipelineEvent, ScheduleSimulator, ScheduleComparator, ConfigOptimizer
- test_benchmark.py: 40 tests for MemoryEstimator, ThroughputEstimator, ScalabilityAnalyzer, ConfigurationSearch
```
dualpipe_enhanced/           # Pure Python analytical modules
├── __init__.py
├── analysis.py              # Pipeline analysis (49 tests)
├── simulator.py             # Schedule simulation (43 tests)
└── benchmark.py             # Performance estimation (40 tests)
dualpipe/                    # Original DualPipe implementation
├── dualpipe.py
├── dualpipev.py
├── comm.py
└── utils.py
examples/                    # Usage examples for original DualPipe
├── example_dualpipe.py
└── example_dualpipev.py
tests/                       # Test suite (132 tests)
├── test_analysis.py
├── test_simulator.py
└── test_benchmark.py
```
All core analytical functionality is pure Python—no PyTorch, no CUDA, no distributed runtime needed. Uses only Python's standard library:
- dataclasses (for PipelineEvent)
- typing (for type hints)
- math (for calculations)
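As an illustration of the stdlib-only approach, a PipelineEvent-style record can be built entirely from dataclasses. The field names below are assumptions for the sketch, not the module's actual definition (see dualpipe_enhanced/simulator.py for that):

```python
from dataclasses import dataclass

# Hypothetical sketch of a PipelineEvent-like dataclass; field names
# are assumptions, not the library's real signature.
@dataclass(order=True)  # order=True lets events sort by start time first
class PipelineEvent:
    start: float
    rank: int
    kind: str       # "forward" | "backward" | "weight" | "comm"
    chunk: int
    duration: float = 1.0

    @property
    def end(self):
        return self.start + self.duration

ev = PipelineEvent(start=0.0, rank=0, kind="forward", chunk=0)
print(ev.end)  # 1.0
```

Because dataclasses generates __init__, __repr__, and ordering methods automatically, a simulator built this way needs no third-party packages.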
Theoretical comparison of pipeline strategies (same number of PP stages):
| Method | Bubble | Parameter Per Device | Activation Per Device | #Devices |
|---|---|---|---|---|
| 1F1B | (PP-1)(𝐹+𝐵) | 1× | PP | PP |
| ZB1P | (PP-1)(𝐹+𝐵-2𝑊) | 1× | PP | PP |
| DualPipe | (PP/2-1)(𝐹&𝐵+𝐵-3𝑊) | 2× | PP+1 | PP |
| DualPipeV | (PP/2-1)(𝐹&𝐵+𝐵-3𝑊) | 2× | PP+1 | PP/2 |
PP denotes the number of pipeline (PP) stages, assumed even. 𝐹 denotes the execution time of a forward chunk, 𝐵 the execution time of a full backward chunk, 𝑊 the execution time of a "backward for weights" chunk, and 𝐹&𝐵 the execution time of two mutually overlapped forward and backward chunks.
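The bubble expressions in the table can be evaluated directly. The chunk times below are illustrative assumptions, not profiled values, and F&B is written as FB in code:

```python
# Evaluate the theoretical bubble formulas from the table above.
# Chunk times are illustrative assumptions, not measurements.
F, B, W = 1.0, 2.0, 1.0   # forward, full backward, backward-for-weights
FB = 3.0                  # F&B: overlapped forward + backward chunk
PP = 8                    # number of pipeline stages (even)

bubbles = {
    "1F1B":      (PP - 1) * (F + B),
    "ZB1P":      (PP - 1) * (F + B - 2 * W),
    "DualPipe":  (PP / 2 - 1) * (FB + B - 3 * W),
    "DualPipeV": (PP / 2 - 1) * (FB + B - 3 * W),
}
for name, bubble in bubbles.items():
    print(f"{name:10s} bubble = {bubble:.1f}")
# 1F1B       bubble = 21.0
# ZB1P       bubble = 7.0
# DualPipe   bubble = 6.0
# DualPipeV  bubble = 6.0
```

With these example times, DualPipe and DualPipeV share the same bubble; the difference is that DualPipeV achieves it with half the devices (PP/2), as the #Devices column shows.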
- Algorithm Research: Understand DualPipe's scheduling properties without distributed training
- Configuration Planning: Find optimal rank and chunk counts before launching training
- Strategy Comparison: Compare DualPipe vs. 1F1B vs. GPipe for your hardware setup
- Memory Analysis: Estimate memory requirements under different configurations
- Scaling Studies: Analyze weak/strong scaling characteristics across cluster sizes
- DeepSeek-V3 Technical Report: https://arxiv.org/abs/2412.19437
- Original DualPipe: github.com/deepseek-ai/DualPipe
- Related Work:
- GPipe (Huang et al., 2019)
- 1F1B (Narayanan et al., 2021)
- Sea AI Lab "Cut-in-half" procedure for DualPipeV
For Enhanced Modules:
- Python 3.7+
- No external dependencies (pure Python)
For Original DualPipe:
- PyTorch 2.0 and above
Original DualPipe: Created and developed by Jiashi Li, Chengqi Deng, and Wenfeng Liang (DeepSeek-AI)
Enhanced Modules: Analytical framework for pipeline parallelism research and optimization
```bibtex
@misc{deepseekai2025deepseekv3technicalreport,
      title={DeepSeek-V3 Technical Report},
      author={DeepSeek-AI},
      year={2025},
      eprint={2412.19437},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.19437},
}
```

MIT License. See LICENSE file for details.

