[cm4mlops] development plan #1023

@gfursin

Description

Based on feedback from the MLCommons TF on automation and reproducibility, we plan to extend CM workflows to support the following MLC projects:

  • check how to add network and multi-node code to MLPerf inference and CM automation (collaboration with MLC Network TF)

    • extend MLPerf inference with Flask code that glues our reference client/server code (Python first, later C++) together with CM wrapping (see the sketch after this list)
    • address suggestions from Nvidia
      • --network-server=IP1,IP2...
      • --network-client
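
A minimal sketch of what the Flask glue could look like, assuming a hypothetical `/predict` endpoint, a placeholder `run_inference()` backend and a JSON payload format; the real CM network code may structure all of this differently:

```python
# Sketch of the proposed Flask glue (endpoint name and payload format
# are assumptions, not a final CM network API).
import itertools

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_inference(samples):
    # Placeholder for the reference/vendor backend call.
    return samples

# --- server side: started on each node passed via --network-server ---
@app.route("/predict", methods=["POST"])
def predict():
    samples = request.get_json()["samples"]
    return jsonify({"results": run_inference(samples)})

# --- client side (--network-client): loadgen issues queries locally
# and forwards them round-robin to the configured servers ---
servers = itertools.cycle(["http://IP1:8000", "http://IP2:8000"])

def issue_query(samples):
    url = next(servers) + "/predict"
    return requests.post(url, json={"samples": samples}, timeout=60).json()

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```
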
  • continue improving unified CM interface to run MLPerf inference implementations from different vendors

    • Optimized MLPerf inference implementations
      • Intel submissions (see Intel docs)
        • Support installation of conda packages in CM
      • Qualcomm submission
        • Add CM scripts to preprocess, calibrate and compile QAIC models for ResNet50, RetinaNet and BERT
        • Test in AWS
        • Test on Thundercomm RB6
          • Automatic model installation from a host device
        • Automatic detection and usage of quantization parameters
      • Nvidia submission
      • Google submission
      • NeuralMagic submission
    • Add the possibility to run any MLPerf implementation, including the reference one (see the sketch below)
    • Add the possibility to change the target device (e.g., a GeForce GPU instead of an A100)
    • Expose batch sizes from all existing MLPerf inference reference implementations (when applicable) in the edge category in a unified way for ONNX, PyTorch and TF via the CM interface; report implementations with hard-wired batch sizes
    • Request from Miro: improve MLPerf inference docs for various backends
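
For illustration, running any implementation on any device through the unified interface could look like this via the CM Python API (`cmind.access` is the existing entry point; the exact set of flag names follows the current CM-MLPerf docs and should be treated as an assumption here):

```python
import cmind

r = cmind.access({
    "action": "run",
    "automation": "script",
    "tags": "run-mlperf,inference,_find-performance",
    "model": "resnet50",
    "implementation": "reference",  # or "nvidia", "intel", "qualcomm", ...
    "device": "cuda",               # e.g. switch from an A100 box to a GeForce one
    "backend": "onnxruntime",       # or "pytorch", "tf"
    # a unified batch-size input would be exposed here too (see item above)
    "quiet": True,
})
if r["return"] > 0:
    raise RuntimeError(r["error"])
```
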
  • Develop a universal CM-MLPerf Docker container to run any implementation with a local dataset and model (similar to the Nvidia and Intel containers, but with a unified CM interface)

  • Prototype a new universal CM workflow to run any app on any target (via C++/Android/SSH), as sketched below
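
A minimal sketch of the SSH path of such a workflow, using paramiko; the host name and user are placeholders that CM would supply:

```python
import paramiko

# Placeholder target; in a real workflow CM would supply host/credentials.
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("target-host", username="user")

# Probe the target before deploying/running the app.
stdin, stdout, stderr = client.exec_command("uname -a && nproc")
print(stdout.read().decode())
client.close()
```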

  • Add support for testing any ONNX model with loadgen, including tuning (already prototyped); see the sketch below
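
A rough sketch of what such a generic ONNX+loadgen harness could look like, using the `mlperf_loadgen` Python bindings; the model path, input shape and sample count are placeholders that a real harness would detect:

```python
import numpy as np
import onnxruntime as ort
import mlperf_loadgen as lg

# Placeholder model and input shape; a generic harness would detect these.
sess = ort.InferenceSession("model.onnx")
input_name = sess.get_inputs()[0].name
samples = np.random.rand(64, 3, 224, 224).astype(np.float32)

def issue_queries(query_samples):
    responses = []
    for qs in query_samples:
        sess.run(None, {input_name: samples[qs.index:qs.index + 1]})
        responses.append(lg.QuerySampleResponse(qs.id, 0, 0))
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(len(samples), len(samples),
                      lambda idx: None, lambda idx: None)
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```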

  • Improve CM docs (basic CM message and tutorials/notes for "users" and "developers")

  • Update and improve the list of all reusable, portable and tech-agnostic CM-MLOps scripts

  • Improve CM logging (stdout and stderr)
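
One possible direction, shown only as a sketch and not the current CM behaviour: route the stdout/stderr of wrapped scripts through Python's `logging` module so both streams get consistent levels and formatting:

```python
import logging
import subprocess

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(name)s %(levelname)s %(message)s")
log = logging.getLogger("cm.script")

# Run a wrapped script and forward its output with proper levels.
proc = subprocess.run(["echo", "hello from a CM script"],
                      capture_output=True, text=True)
for line in proc.stdout.splitlines():
    log.info(line)
for line in proc.stderr.splitlines():
    log.error(line)
```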

  • Visualize CM script dependencies
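
For example, a dependency graph could be bootstrapped by walking each script's `_cm.yaml` meta and emitting Graphviz DOT (the repo layout is assumed; some scripts keep their meta in `_cm.json` instead):

```python
from pathlib import Path

import yaml

print("digraph cm_deps {")
for meta in Path("script").glob("*/_cm.yaml"):
    name = meta.parent.name
    deps = yaml.safe_load(meta.read_text()).get("deps", [])
    for dep in deps:
        # Each dep is resolved by tags rather than a fixed script name.
        print(f'  "{name}" -> "{dep.get("tags", "?")}";')
print("}")
```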

  • Check other suggestions from the SCC'23 student teams

  • Start adding FAQ/notes from Discord/GitHub discussions about CM-MLPerf

  • prototype/reuse the above universal CM workflow with ABTF for:

    • inference
      • support different targets (host, remote embedded, Android)
      • get all info about target
      • add Python and C++ code for loadgen with different backends (PyTorch, ONNX, TF, TFLite, QAIC)
      • add object detection with COCO and the trained model from Rod (without accuracy checks for now)
      • connect with training CM workflow
    • training (https://github.com/mlcommons/abtf-ssd-pytorch)
      • present CM-MLPerf at Croissant TF and discuss possible collaboration (doc)
      • add a CM script to get Croissant
      • add datasets via Croissant (see the sketch after this list)
      • train and save model in CM cache to be loaded to inference
      • test with Rod
    • present prototype progress in next ABTF meeting (Grigori)
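
A sketch of the Croissant step mentioned above, using the mlcroissant package; the dataset URL and record-set name are placeholders:

```python
import mlcroissant as mlc

# Placeholder URL and record-set name.
ds = mlc.Dataset(jsonld="https://example.org/dataset/croissant.json")
for i, record in enumerate(ds.records(record_set="default")):
    print(record)
    if i >= 2:  # peek at a few records only
        break
```
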
  • unify experiments and visualization

    • prepare high-level meta to run the whole experiment
    • aggregate and visualize results
    • if an MLPerf run is very short, we need to calibrate it, for example by multiplying N by 10, similar to what I did in CK (see the sketch below)
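
A sketch of that calibration idea: keep multiplying the query count by 10 until the measured run exceeds a minimum duration (the threshold and the `run_once`/`calibrated_run` names are illustrative):

```python
import time

MIN_SECONDS = 60.0  # illustrative threshold

def calibrated_run(run_once, n=1):
    """Repeat the benchmark with 10x more queries until it runs long enough."""
    while True:
        start = time.time()
        run_once(n)
        if time.time() - start >= MIN_SECONDS:
            return n
        n *= 10  # too short: multiply N by 10 and retry
```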
