Welcome to VenusSim! This is a simulator specifically designed for LLM training, developed by the Infrawaves team. We focus on performance and flexibility, aiming to provide a lightweight and accurate simulation environment to help developers efficiently conduct research and development work. Let's work together to enhance productivity and explore more possibilities.
The usage process of VenusSim is as follows:
- Install VenusSim.
- Modify the configuration file.
- Run the simulation.
- View the simulation results.
- Create Container
# Create Base Container
docker run -it --name llm-sim ubuntu:latest /bin/bash
- Copy the following content to /home/build.sh
apt update
apt install \
build-essential \
python3 python3-dev python3-pip python3-venv \
gcc g++ make cmake \
libboost-dev libboost-program-options-dev \
libprotobuf-dev protobuf-compiler \
git
#Pull VenusSim
git clone git@github.com:cq-eng/VenusSim.git
#astra-sim
cd /home/VenusSim
pip3 install --upgrade pip
pip3 install protobuf==3.6.1 pydot
cd llm-simulator
chmod 777 ./build/astra_analytical/build.sh
./build/astra_analytical/build.sh
# chakra_linker_visualizer
cd /home/VenusSim
python3 -m venv chakra_env
source chakra_env/bin/activate
cd chakra_linker_visualizer
pip install .
cd extern/et_replay
pip install .
- Run build.sh
bash build.sh
After completing the container construction, the /home directory should have the following file structure:
/home/VenusSim
# Python virtual environment, used for chakra_lingker_visualizer and generator.
/chakra_env
# Chakra native trace_linker and timeline_visualizer.
/chakra_linker_visualizer
# astra-sim
/llm-simulator
# trace generator
/trace-generatorAfter confirming the file structure is complete, the installation is finished.
The configuration file is located in trace-generator/demo/llm-sim-example.
- llm-sim-config.json
- Sim-config
- system.json
- network.yml
- memory.json
- comm-group.json
The llm-sim-config.json consists of three parts: trace_generator_config, simulator_config and time_line_visualizer_config
trace_generator_config
This configuration file is used to configure the generation of traces. The specific parameter descriptions are as follows:
| Megatron config params | |
|---|---|
mlp-worker-num |
Machine Count |
world-size |
Total GPU Count |
pipeline-model-parallel-size |
PP size |
context-parallel-size |
CP size |
tensor-model-parallel-size |
TP size |
sequence-parallel |
SP size |
vpp-enable |
0: Disabled 1: Enabled |
num-layers-per-virtual-pipeline-stage |
Number of model layers included in each PP stage |
global-batch-size-mlp-multiplier |
global-batch-size = mlp-worker-num * global-batch-size-mlp-multiplier |
micro-batch-size |
Micro batch size |
forward_only |
0: Disabled 1: Enabled |
no-overlap-p2p-communication |
1: Disabled 0: Enabled |
use-distributed-optimizer |
0: Disabled 1: Enabled |
overlap-grad-reduce |
0: Disabled 1: Enabled |
no-delay-grad-reduce |
1: Disabled 0: Enabled |
overlap-param-gather |
0: Disabled 1: Enabled |
delay-param-gather |
0: Disabled 1: Enabled |
train-iters |
Number of training iterations |
| Model config params | |
model-type |
|
seq-length |
Sequence Length |
decoder-seq-length |
Decoder Sequence Length |
hidden-size |
Hidden Layer Size |
ffn-hidden-size |
FFN Hidden Layer Size |
num-attention-heads |
Number of Attention Heads |
num-query-groups |
Number of MQA Groups |
num-layers |
Number of Model Layers |
vocab-size |
Vocabulary Size |
swiglu |
Swish-Gated Linear Unit 0: Disabled 1: Enabled |
untie-embeddings-and-output-weights |
Whether Input embedding and Output embedding Share Parameters 0: Not Shared 1: Shared |
| Trace generation config params | |
output_filepath |
Trace output path |
output_filename |
Trace output name |
log_filename |
Log file name during trace generation |
trace_generate |
When set to 0, the trace generation phase will be skipped, typically used when the trace has already been generated and the simulator needs to be run multiple times |
use_chakra |
Use Chakra as the trace input for the simulator. When using it, you need to uncomment the line #define Use_Chakra in llm-simulator/astra-sim/workload/ETDefinition.hh from the llm-simulator and recompile the llm-simulator. |
use_simplified_trace |
Use simplified traces as the trace input for the simulator (saving significant memory compared to Chakra). When using it, you need to comment out the line #define Use_Chakra in llm-simulator/astra-sim/workload/ETDefinition.hh from the llm-simulator and recompile the llm-simulator (this configuration is the default for the project and has been compiled accordingly during installation). |
use_simplified_template_trace |
Use simplified template traces as the trace input for the simulator (saving memory compared to simplified traces but slightly increasing runtime). When using it, you need to enable use_simplified_trace and meet the runtime conditions for simplified traces. |
total_file_size_gb_limit |
The total size limit for trace files. If this size is exceeded, the system will delete the generated JSON files and retain only the .et files used for simulation. |
keep_json_files |
Ignore total_file_size_gb_limit during use and force saving JSON files. |
use_step_snapshot |
When set to 1, it will forcibly use the data in step_snapshot_data as the computation time for forward and backward passes. |
step_snapshot_data |
There are three data points, representing the sum of forward and backward times for the pp-0, pp-mid, and pp-end stages (forward: backward = 1:2). |
simulator_config
| binary | The compiled executable file of llm-simulator, usually located in the llm-simulator/build/astra_analytical/build/bin path. |
|---|---|
| system | The absolute path of system.json in sim-config. |
| network | The absolute path of system.json in sim-config |
| remote_memory | The absolute path of memory.json in sim-config |
| comm_group | The absolute path of comm_group.json in sim-config |
| trace_output | The output file path for llm-simulator's execution results |
time_line_visualizer_config
| output_filename | The output file path after visualization |
|---|---|
| single_pp_group | When set to 1, only the visualization result of one PP group will be output. When there are too many ranks, this will significantly reduce runtime and minimize redundant output. |
- system.json
- network.yml
- memory.json (No modification required)
- comm-group.json (Automatically generated, no modification required)
For the above files, only system.json and network.yml need to be modified according to the actual simulation scenario. system.json contains some system parameters of the simulator, while network.yml is used to configure the network parameters of the simulator. system.json generally remains in its default configuration. For the specific meaning of network.yml, please refer to Astra-Sim.
Below is a specific demo:
# Dim0 FullyConnected, Dim1 Ring.
topology: [ FullyConnected, Ring ]
# 16 NPUs:Dim0 8 NPUs,Dim1 2 NPUs
npus_count: [ 8, 2 ] # number of NPUs
# link Bandwidth (per direction)
bandwidth: [ 450.0, 50.0 ] # GB/s
# link Latency
latency: [ 1000.0, 6000.0 ] # nsAfter confirming the installation and configuration files are correct, you can enter the following command to run the demo simulation.
# Enter the virtual environment
source /home/VenusSim/chakra_env/bin/activate
# Switch to the trace-generator directory
cd /home/VenusSim/trace-generator
# Run the simulation program, and fill in the path to llm-sim-config.json according to the actual path.
python3 llm_sim.py --config-file demo/llm-sim-example/llm-sim-config.json-
Open https://ui.perfetto.dev/ in your browser.
-
Open the visualization output file specified by
time_line_visualizer_config:output_filenameinllm-sim-config.jsonat the above URL to view the visual simulation results.