@TaekyungHeo TaekyungHeo commented May 8, 2024

Summary

Fix bugs in et_converter

  • Fix formatting issues reported by ruff in et_converter/pytorch2chakra_converter.py
  • Remove UniqueIdAssigner from et_converter/pytorch2chakra_converter.py
  • Update pytorch2chakra_converter to support various versions
  • Ignore et_converter/ for pyre checks
  • Use GPU operator's name for identifying communication type
  • Fix communication size calculation logic
  • Remove inter-stream dependencies
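
To make the "Use GPU operator's name for identifying communication type" item concrete, here is a minimal sketch of name-based classification. It is not the converter's actual code: the function name `infer_comm_type`, the keyword table, the returned labels, and the rule of only considering names that contain "nccl" are all assumptions made for illustration, and the kernel names in the usage note are illustrative examples.

```python
from typing import Optional

# Hypothetical keyword table: lowercase substring -> collective label.
# More specific keywords come first so that "allreduce" and "reducescatter"
# are matched before the plain "reduce" entry.
_COMM_KEYWORDS = [
    ("reducescatter", "REDUCE_SCATTER"),
    ("allreduce", "ALL_REDUCE"),
    ("allgather", "ALL_GATHER"),
    ("alltoall", "ALL_TO_ALL"),
    ("broadcast", "BROADCAST"),
    ("reduce", "REDUCE"),
]


def infer_comm_type(gpu_op_name: str) -> Optional[str]:
    """Guess the collective type from a GPU operator (kernel) name.

    Returns None when the name does not look like a communication kernel.
    """
    # Normalize so "All_Reduce", "AllReduce", and "allreduce" compare equal.
    normalized = gpu_op_name.replace("_", "").lower()

    # Assumption: only NCCL kernels are treated as communication operators,
    # so compute kernels such as a generic "reduce_kernel" are ignored.
    if "nccl" not in normalized:
        return None

    for keyword, label in _COMM_KEYWORDS:
        if keyword in normalized:
            return label
    return None
```

With this sketch, a name such as `ncclKernel_AllReduce_RING_LL_Sum_float` maps to `ALL_REDUCE`, while a compute kernel name without "nccl" in it yields `None`.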

Test Plan

1. Run trace_link

chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_0.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_0.json --output-file ~/megatron_0.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_1.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_1.json --output-file ~/megatron_1.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_2.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_2.json --output-file ~/megatron_2.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_3.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_3.json --output-file ~/megatron_3.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_4.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_4.json --output-file ~/megatron_4.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_5.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_5.json --output-file ~/megatron_5.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_6.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_6.json --output-file ~/megatron_6.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_7.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_7.json --output-file ~/megatron_7.json &

2. Run et_converter

chakra_converter --input_filename ~/megatron_0.json --output_filename megatron_0.chakra --input_type PyTorch > /tmp/rank_0 &
chakra_converter --input_filename ~/megatron_1.json --output_filename megatron_1.chakra --input_type PyTorch > /tmp/rank_1 &
chakra_converter --input_filename ~/megatron_2.json --output_filename megatron_2.chakra --input_type PyTorch > /tmp/rank_2 &
chakra_converter --input_filename ~/megatron_3.json --output_filename megatron_3.chakra --input_type PyTorch > /tmp/rank_3 &
chakra_converter --input_filename ~/megatron_4.json --output_filename megatron_4.chakra --input_type PyTorch > /tmp/rank_4 &
chakra_converter --input_filename ~/megatron_5.json --output_filename megatron_5.chakra --input_type PyTorch > /tmp/rank_5 &
chakra_converter --input_filename ~/megatron_6.json --output_filename megatron_6.chakra --input_type PyTorch > /tmp/rank_6 &
chakra_converter --input_filename ~/megatron_7.json --output_filename megatron_7.chakra --input_type PyTorch > /tmp/rank_7 &
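
The two steps above repeat the same command for ranks 0 through 7. A small driver along the following lines could generate them instead. This is only a sketch: the helper names (`trace_link_cmd`, `converter_cmd`, `run_parallel`) and the directory parameters are hypothetical, while the command names, flags, and file-name patterns are copied from the commands above.

```python
import subprocess
from pathlib import Path
from typing import List


def trace_link_cmd(rank: int, trace_dir: Path, linked_dir: Path) -> List[str]:
    """Build the step 1 command for one rank."""
    return [
        "chakra_trace_link",
        "--pytorch-et-file", str(trace_dir / f"megatron_et_{rank}.json"),
        "--kineto-file", str(trace_dir / f"megatron_kineto_{rank}.json"),
        "--output-file", str(linked_dir / f"megatron_{rank}.json"),
    ]


def converter_cmd(rank: int, linked_dir: Path, out_dir: Path) -> List[str]:
    """Build the step 2 command for one rank."""
    return [
        "chakra_converter",
        "--input_filename", str(linked_dir / f"megatron_{rank}.json"),
        "--output_filename", str(out_dir / f"megatron_{rank}.chakra"),
        "--input_type", "PyTorch",
    ]


def run_parallel(cmds: List[List[str]], log_paths: List[Path]) -> List[int]:
    """Start every command at once, send stdout to its log, wait for all."""
    running = []
    for cmd, log_path in zip(cmds, log_paths):
        log = open(log_path, "w")
        running.append((subprocess.Popen(cmd, stdout=log), log))

    exit_codes = []
    for proc, log in running:
        exit_codes.append(proc.wait())  # block until this rank finishes
        log.close()
    return exit_codes
```

Because step 2 reads the files that step 1 writes, a driver like this would call `run_parallel` on all `trace_link_cmd` results and let it return before building and launching the `converter_cmd` batch. When running the backgrounded shell commands by hand, the equivalent is to wait for the step 1 jobs to finish before starting step 2.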

3. Results
[Screenshot of results, 2024-05-08 7:42:27 PM]

@TaekyungHeo TaekyungHeo requested a review from a team as a code owner May 8, 2024 23:55
github-actions bot commented May 8, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@TaekyungHeo TaekyungHeo force-pushed the et-converter-bugfix branch 2 times, most recently from 319e63d to af4e466 May 9, 2024 00:09
@TaekyungHeo TaekyungHeo changed the title from "Bugfix for et_converter" to "Fix bugs in et_converter" May 9, 2024
@TaekyungHeo TaekyungHeo force-pushed the et-converter-bugfix branch from af4e466 to 0d2c3c4 May 9, 2024 00:12
Inter-stream dependencies result in bugs because of false dependencies,
especially when a single CPU operator issues multiple GPU operators.
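
The false-dependency problem described in this commit message can be illustrated with a toy dependency builder. Everything here is an assumption made for illustration: the `GpuOp` record, the `build_deps` function, and in particular the inter-stream rule (each kernel waits for the latest kernel on every other stream) are hypothetical and are not taken from the converter's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class GpuOp:
    node_id: int
    stream: int
    cpu_parent: int  # id of the CPU operator that launched this kernel


def build_deps(gpu_ops: List[GpuOp], add_inter_stream: bool) -> Dict[int, Set[int]]:
    """Return node_id -> set of parent ids; gpu_ops must be in launch order."""
    deps: Dict[int, Set[int]] = {}
    last_on_stream: Dict[int, int] = {}

    for op in gpu_ops:
        # A kernel always depends on the CPU operator that issued it.
        parents = {op.cpu_parent}

        # Kernels on the same stream execute in order.
        if op.stream in last_on_stream:
            parents.add(last_on_stream[op.stream])

        if add_inter_stream:
            # Hypothetical inter-stream rule: also wait for the most recent
            # kernel on every other stream.
            for stream, node_id in last_on_stream.items():
                if stream != op.stream:
                    parents.add(node_id)

        deps[op.node_id] = parents
        last_on_stream[op.stream] = op.node_id

    return deps
```

Suppose CPU operator 1 issues kernel 10 on stream 7 and kernel 11 on stream 20. Without inter-stream edges both kernels depend only on operator 1 and may overlap. With the hypothetical rule enabled, kernel 11 also depends on kernel 10, which serializes two kernels that have no real ordering constraint between them.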
@TaekyungHeo TaekyungHeo force-pushed the et-converter-bugfix branch from 0d2c3c4 to c06a14e May 9, 2024 00:14
@srinivas212 srinivas212 merged commit f316efc into main May 9, 2024
@github-actions github-actions bot locked and limited conversation to collaborators May 9, 2024
@TaekyungHeo TaekyungHeo deleted the et-converter-bugfix branch May 9, 2024 18:28