merging #1

Merged
ksingla025 merged 61 commits into WhissleAI:main from NVIDIA-NeMo:main
Jul 1, 2024

Conversation

@ksingla025

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

ashors1 and others added 30 commits June 13, 2024 10:38
* add nsys callback

* Apply isort and black reformatting

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>

---------

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
Co-authored-by: ashors1 <ashors1@users.noreply.github.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
* Trying to add support for mcore

* Introducing OptimizerModule & LRSchedulerModule

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Remove some un-used code

* Make design more robust

* Trying to fix failing megatron_parallel tests

* Introducing OptimizerModule & LRSchedulerModule

* Removing un-used import

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Adding lr-schedulers

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Fix bug with setting finalize_model_grads

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

---------

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* Initial reference code commit, unchanged

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Hyena code changes for NeMo compatibility

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* MCore spec override functionality + example config w. hyena

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Additional changes - now working on char-level TinyShakespeare

* Add missing input LayerNorm to spec (in the default attention
  spec it's fused with the projection Linear layer, so not
  explicitly defined)
* Shape conversion at start and end of Hyena forward

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Add fftconv cuda impl from safari

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Workaround for shape error in fftconv

See: HazyResearch/safari#26 (comment)
Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Explicitly convert kernel to FP32

(torch.fft doesn't support bf16)

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Working run configs

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Remove sharded_state_dict from HyenaOperator

(made redundant by the default implementation in Megatron)

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Update configs

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Testing TE Linear classes in HyenaOperator

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Revert to FusedDense for in/out projections after merging with 24.01.01

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Fix bug (use fused LNorm+Linear), bring back TE layers

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Configs rename + cleanup

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* FlashFFTConv, Multi-head, some cleanup

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Bug fix - init FlashFFTConv with 2*seq_len

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* ModuleSpec + replace nn.Conv1d with causal_conv1d

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Remove unneeded arguments

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* More cleanup, remove fftconv ref functions

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Refactor HyenaFilter + more cleanup

* Refactor in spirit of implementation in MAD-Lab repo:
  https://github.com/athms/mad-lab/blob/main/mad/model/layers/hyena.py

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Add missing attributions

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Remove fftconv sources

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Bug fixes

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Remove d_model from external API, take from TransformerConfig

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* cleanup config

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Remove spec override logic (possibly push separately)

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Add tests

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Keep only megatron_gpt_config_hyena (w. 153m parameters)

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Black + isort formatting changes

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Fixes following PR review

* Clearer names + more documentation for config params
* Clearer README
* Check seq len < 8K with safari-fftconv
* Avoid 0*bias op during forward

Signed-off-by: Guy Jacob <guyj@nvidia.com>

* Fix tests following param name changes

Signed-off-by: Guy Jacob <guyj@nvidia.com>

---------

Signed-off-by: Guy Jacob <guyj@nvidia.com>
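
Note on the "init FlashFFTConv with 2*seq_len" fix in the Hyena commits above: an FFT-based convolution of size n computes a circular convolution, so outputs wrap around unless the FFT size is at least twice the sequence length. The sketch below illustrates this in pure Python. It is not the FlashFFTConv API; the function names are made up for illustration, and the circular convolution is computed directly rather than with an FFT.

```python
def causal_conv(x, k):
    """Direct causal convolution: y[i] = sum over j <= i of x[j] * k[i - j].

    Assumes the kernel k is at least as long as the input x.
    """
    return [sum(x[j] * k[i - j] for j in range(i + 1)) for i in range(len(x))]


def circular_conv(x, k, n):
    """Circular convolution of size n.

    This is what multiplying two size-n FFTs and inverting the result computes.
    Inputs shorter than n are zero-padded.
    """
    xp = list(x) + [0] * (n - len(x))
    kp = list(k) + [0] * (n - len(k))
    return [sum(xp[j] * kp[(i - j) % n] for j in range(n)) for i in range(n)]


x = [1, 2, 3, 4]
k = [1, 1, 1, 1]
seq_len = len(x)

# Size seq_len: later kernel taps wrap around and pollute early outputs.
wrapped = circular_conv(x, k, seq_len)[:seq_len]

# Size 2 * seq_len: the zero padding absorbs the wrap-around.
padded = circular_conv(x, k, 2 * seq_len)[:seq_len]
```

With the padded size, the first `seq_len` outputs agree with the direct causal convolution, while the unpadded version does not.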
* Update build_dataset.py

fix bug during eval

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* Update build_dataset.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* Update build_dataset.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* Update build_dataset.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: stevehuang52 <stevehuang52@users.noreply.github.com>

---------

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: stevehuang52 <stevehuang52@users.noreply.github.com>
Co-authored-by: stevehuang52 <stevehuang52@users.noreply.github.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
* Refactor Quantizer for reusing in QAT

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

* Address more reviewer comments

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

* update yaml config

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* added pipeline_dtype for pipeline parallelism to megatron strategy and parallelism calls

* fix typos

* Apply isort and black reformatting

Signed-off-by: skothenhill-nv <skothenhill-nv@users.noreply.github.com>

---------

Signed-off-by: skothenhill-nv <skothenhill-nv@users.noreply.github.com>
Co-authored-by: skothenhill-nv <skothenhill-nv@users.noreply.github.com>

* [WIP] move experiment manager features into PTL

* cleanup and minor refactoring

* add async checkpointing support, some cleanup of modelcheckpoint and setup_nemo

* more cleanup

* cleanup, reorganization, minor debugging

* Apply isort and black reformatting

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>

* Proposal to have AutoResume & Experiment

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* small fix

* small bug fixes and cleanup

* Apply isort and black reformatting

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>

* remove async checkpointing support. Support will be added in a subsequent PR

* Apply isort and black reformatting

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>

* remove unneeded import

* bug fix

* remove deprecated prefix

* rename Experiment to NeMoLogger

* add option to instantiate model checkpoint callback inside of nemo_logger setup

* Apply isort and black reformatting

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>

* Proposal to move ModelCheckpoint into NeMoLogger

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* minor fixes

* fix merge conflict

* Apply isort and black reformatting

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>

* remove unused imports

---------

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: ashors1 <ashors1@users.noreply.github.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* Add S3 dirpath and asynchronous uploading support for basic checkpointing

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Update megtron_gpt_pretraining config to support S3 checkpointing

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Removed unused imports

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* move s3_checkpoint_io into callbacks. consolidate checkpoint_file_utils into s3_utils.py

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Update setup() in nemo_model_checkpoint to broadcast checkpoint path and work with upstreamed implementation of removing unfinished checkpoints

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Add boto3 dependency for testing

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Remove redundant setup() in nemo_model_checkpoint

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Remove comment line from import

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Removed explicit CRT calls since boto[crt] automatically uses CRT for file upload and download

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Style fix

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* remove un-used s3transfer import

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* add s3 prefix for s3-related checkpointing config

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* dummy sleep function lowered from 1 to 0.01 seconds

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Remove local_rank checking for rank, and use is_global_rank_zero.

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Style fix

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Apply isort and black reformatting

Signed-off-by: alxzhang-amazon <alxzhang-amazon@users.noreply.github.com>

* add tenacity dependency

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Apply isort and black reformatting

Signed-off-by: alxzhang-amazon <alxzhang-amazon@users.noreply.github.com>

* Add filtering of unfinished checkpoint to non-s3 checkpoint resuming

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* isort black reformatting

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Apply isort and black reformatting

Signed-off-by: alxzhang-amazon <alxzhang-amazon@users.noreply.github.com>

* Remove dependency requirement for checking if dirpath is an s3 path

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Make dependencies fully optional; allow exp_manager to optionally import S3Utils depending on whether dirpath is an S3 address or not

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Add rst doc for s3 checkpointing

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Remove unneeded assert

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Removed dependencies

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Apply isort and black reformatting

Signed-off-by: alxzhang-amazon <alxzhang-amazon@users.noreply.github.com>

* Updated documentation on async save to S3

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Apply isort and black reformatting

Signed-off-by: alxzhang-amazon <alxzhang-amazon@users.noreply.github.com>

* Update S3 checkpointing doc and fix visibility on website. Update the nlp_overrides DDP initializer to properly assign updated checkpoint io to base class.

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

* Apply isort and black reformatting

Signed-off-by: alxzhang-amazon <alxzhang-amazon@users.noreply.github.com>

* Slight fix in s3 checkpoint doc

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>

---------

Signed-off-by: Alexander Zhang <alxzhang@amazon.com>
Signed-off-by: alxzhang-amazon <166076199+alxzhang-amazon@users.noreply.github.com>
Signed-off-by: alxzhang-amazon <alxzhang-amazon@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: alxzhang-amazon <alxzhang-amazon@users.noreply.github.com>
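
Note on the "Make dependencies fully optional" commit above: the usual pattern is to decide from the path string alone whether S3 is involved, and only then import the S3 client library. The sketch below is a minimal illustration of that pattern, not the code from this PR; `is_s3_url`, `load_optional_backend`, and the `module_name` parameter are hypothetical names.

```python
import importlib

S3_PREFIX = "s3://"


def is_s3_url(path) -> bool:
    """Check for an S3 address using only the string, so no S3 library is needed."""
    return isinstance(path, str) and path.startswith(S3_PREFIX)


def load_optional_backend(dirpath, module_name="boto3"):
    """Import the S3 client module only when dirpath is an S3 address.

    Returns None for local paths, so users who never touch S3 do not need
    the optional dependency installed.
    """
    if not is_s3_url(dirpath):
        return None
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{dirpath}' is an S3 path, but the optional dependency "
            f"'{module_name}' is not installed."
        ) from exc
```

A local path such as `/tmp/ckpt` returns `None` without attempting any import; an `s3://` path triggers the import and fails with a clear message if the library is missing.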

* move load state dict after initialize parallel state

Signed-off-by: Ryan Li <rynli@amazon.com>

* delay sharded_state_dict in save_to

Signed-off-by: Ryan Li <rynli@amazon.com>

---------

Signed-off-by: Ryan Li <rynli@amazon.com>
Co-authored-by: Ryan Li <rynli@amazon.com>
* Add python_requires

Prevents people from getting unexpected syntax errors when they
install on a python version too old.

Signed-off-by: Daniel Galvez <dgalvez@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: galv <galv@users.noreply.github.com>

---------

Signed-off-by: Daniel Galvez <dgalvez@nvidia.com>
Signed-off-by: galv <galv@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
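
Note on the "Add python_requires" commit above: `python_requires` is a setuptools metadata field that lets pip refuse to install a package on an interpreter that is too old, instead of failing later with a syntax error. The sketch below shows the same version comparison done at runtime. The minimum version used here is an assumption for illustration; the value NeMo actually declares is not stated in this PR.

```python
import sys

# Hypothetical minimum version, chosen only for illustration.
MIN_PYTHON = (3, 10)

# In setup.py the equivalent declaration would look like:
#     setuptools.setup(..., python_requires=">=3.10")


def is_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)
```

Declaring the constraint in package metadata is preferable to a runtime check because the installer can reject the package before any of its code is executed.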
* Enable user to optionally upgrade megatron

* restore missing args for the older version of megatron

* Apply isort and black reformatting

Signed-off-by: jstjohn <jstjohn@users.noreply.github.com>

---------

Signed-off-by: jstjohn <jstjohn@users.noreply.github.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
…#9476)

* Fixing imports of NeMoLogging, AutoResume & ModelCheckpoint

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

---------

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* modelopt refactor

* refactor all ammo occurrences to modelopt

* Apply isort and black reformatting

Signed-off-by: suiyoubi <suiyoubi@users.noreply.github.com>

* rename atq->mtq ato->mto

---------

Signed-off-by: suiyoubi <suiyoubi@users.noreply.github.com>
Co-authored-by: suiyoubi <suiyoubi@users.noreply.github.com>
* Fixing defaults in llm.train & Mistral7BModel

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Fix calling super.init inside Mistral7BModel

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Remove fit_kwargs from llm.train

* Fix bugs in lr-schedules

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Only pass first optimizer when there's 1

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Adding zero_grad to training_step

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Fix bugs in OptimizerModule

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Fix bugs in OptimizerModule

* Expose ModelCheckpoint in nemo.lightning

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

---------

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* fix minor import bug

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* deploy in-framework model with script

* make query_llm work with in framework models

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* added in framework test

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

* fix codeql issues

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>

* rename test filename to avoid nemo ci issues

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

---------

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* Integrate tokenizer import into model.import_ckpt

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Fixing bug in ModelConnector.nemo_save

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Default to ddp=pytorch inside ModelConnector

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

---------

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* fix unwrap model

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add O2 to ci test

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix ci test

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix ci test

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix ci test

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Malay Nagda <malayn@malayn-mlt.client.nvidia.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
* add nemotron news

Signed-off-by: eharper <eharper@nvidia.com>

* add nemotron news

Signed-off-by: eharper <eharper@nvidia.com>

---------

Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Add CICD test for Stable Diffusion

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Update cicd-main.yml

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Use single gpu runner

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

---------

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
* use default collate if dataset does not have one

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* mixtral config

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add convert_state

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix StateDictTransform for 2D layers, e.g. MoE

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* pass num_moe_experts to specs

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update MixtralModel

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* mini docstring

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* update mcoreddp call

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update mcore commits

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
* add llama

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* add llama

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* add llama3

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* fix typo

Signed-off-by: Chen Cui <chcui@nvidia.com>

* enable importers with multiple models

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* add gemma

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* checks

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
pzelasko and others added 21 commits June 26, 2024 12:29
* Fix lhotse tests for v1.24.0

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* Fix RIR test

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

---------

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* add reset_lr functionality

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix reset_lr logic

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* move reset_lr from optim section

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* add reset_lr value to config

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* set reset_lr False by default

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* remove extra line

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add reset_lr test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add reset_lr test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* remove extra quote

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add ability to reset schedule's max_steps and decay_steps

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* change scheduler's first step logic when using reset_lr

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* revert config

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix reset_lr logic

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* revert config

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* revert config

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update reset_lr comments

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add use cases for reset_lr feature

Signed-off-by: dimapihtar <dpihtar@gmail.com>

---------

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
Co-authored-by: dimapihtar <dimapihtar@users.noreply.github.com>
* Add Python AIStore SDK to requirements and bump min Lhotse version

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* Move AIStore Python SDK to Dockerfile, remove matplotlib/ipywidgets deps

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

---------

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
…tead of onnx.export() (#9147)

* Initial WARs to implement dynamo option for export

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* including weights in .onnx

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* dynamo_export works for many small models

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* External weights behaviour fixed

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Cleanup

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: borisfom <borisfom@users.noreply.github.com>

* print cleaned up

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Added overloadable dynamic_shapes_for_export

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Addressing code review

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Fixing CI issues

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Fixing CI test failure

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Eliminated test cross-contamination

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

---------

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: borisfom <borisfom@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
* Adding tokenizer to io-test + making it pass

* Handling tokenizer correctly inside dump_io

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Removing not used import

---------

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* Move mistral_7b.py to mistral.py

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* rename MixtralConfig to MixtralConfig8x7B

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* mistral rename: mistralconfig7b & mistralmodel

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Use closed-formula to round by multiple

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
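
Note on the "Use closed-formula to round by multiple" commit above: rounding an integer to a multiple can be done with integer arithmetic instead of a loop. The sketch below assumes the rounding is upward, which the commit message does not state; `round_up_to_multiple` is a hypothetical name.

```python
def round_up_to_multiple(value: int, multiple: int) -> int:
    """Round a non-negative integer up to the nearest multiple, without a loop."""
    if multiple <= 0:
        raise ValueError("multiple must be a positive integer")
    # Adding (multiple - 1) before floor division pushes any remainder
    # over the next boundary, while exact multiples stay unchanged.
    return ((value + multiple - 1) // multiple) * multiple


padded_vocab = round_up_to_multiple(50257, 128)
```

A typical use is padding a vocabulary size so that it divides evenly across devices or matches hardware-friendly tile sizes.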
* ci: Do not attempt to send slack on fork

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* test

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

---------

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* fix minor import bug

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* fix export test

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>

---------

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* Initial straggler det impl

Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com>

* Fixed CI code checks

Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com>

* Removed unused import

Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com>

* remove submodule

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* Updated documentation; Updated callback params; Cosmetic changes

Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com>

* Fixed straggler det config; Added basic test

Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com>

* Fixes in test_straggler_det.py

Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com>

* Updated straggler callback API

Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com>

* stop_if_detected=False by default

Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com>

---------

Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com>
Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Co-authored-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com>
Co-authored-by: Maanu Grover <maanug@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
* fix checkpoint loading

* fix

* fixes

* another fix

* Apply isort and black reformatting

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>

---------

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
Co-authored-by: ashors1 <ashors1@users.noreply.github.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
* fix minor import bug

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* fix export test

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>

* remove n_gpus param

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* add and fix parameters

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* fix deploy script

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>

* rename tps and pps params

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

---------

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
* Consolidate gpt continue training with pretraining

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix default config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Add github action cicd

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* extract _integrate_original_checkpoint_data as a method

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix getattr

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Revert "Add github action cicd"

This reverts commit a453f16.

* Update comments in nlp_overrides.py

Signed-off-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Signed-off-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Add support to change Multi task model prompt

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add support to change Multi task model prompt

Signed-off-by: smajumdar <titu1994@gmail.com>

* Apply isort and black reformatting

Signed-off-by: titu1994 <titu1994@users.noreply.github.com>

* Update nemo/collections/common/prompts/formatter.py

Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Address comments

Signed-off-by: smajumdar <titu1994@gmail.com>

* Apply isort and black reformatting

Signed-off-by: titu1994 <titu1994@users.noreply.github.com>

* Address comments

Signed-off-by: smajumdar <titu1994@gmail.com>

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: titu1994 <titu1994@users.noreply.github.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
* Add video-neva TRT export

* Add TRT inference

* Change config

* Apply isort and black reformatting

Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>

* Change export params

* Remove unused import

* Add neva export

* Apply isort and black reformatting

Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>

* Change unpack nemo

* Apply isort and black reformatting

Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>

* Add trt infer config

* Fix neva trt inference

* Apply isort and black reformatting

Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>

* Add exporter

* Apply isort and black reformatting

Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>

* Fix infer

* Add PyTriton

* Apply isort and black reformatting

Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>

* Fix deploy wrong dim

* Apply isort and black reformatting

Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>

* Change to pass PIL Image

* Apply isort and black reformatting

Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>

* Fix video neva deploy

* Change query

* Change deploy

* Remove unused import

* Change ptuning

* Change to mm exporter

* Add script

* Apply isort and black reformatting

Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>

* Fix script

---------

Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>
Co-authored-by: meatybobby <meatybobby@users.noreply.github.com>
* Fix assertions for adapter types

Signed-off-by: smajumdar <titu1994@gmail.com>

* Apply isort and black reformatting

Signed-off-by: titu1994 <titu1994@users.noreply.github.com>

* Cleanup

Signed-off-by: smajumdar <titu1994@gmail.com>

* Apply isort and black reformatting

Signed-off-by: titu1994 <titu1994@users.noreply.github.com>

* Finalize support for decoder adapters

Signed-off-by: smajumdar <titu1994@gmail.com>

* Apply isort and black reformatting

Signed-off-by: titu1994 <titu1994@users.noreply.github.com>

* fix the freeze/unfreeze problem by replacing as_frozen with torch.inference_mode

* Apply isort and black reformatting

Signed-off-by: weiqingw4ng <weiqingw4ng@users.noreply.github.com>

* Update tests to new generic way of module update

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize code for update module

Signed-off-by: smajumdar <titu1994@gmail.com>

* Apply isort and black reformatting

Signed-off-by: titu1994 <titu1994@users.noreply.github.com>

* Fix variable name

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize projection support for transformer mha adapters

Signed-off-by: smajumdar <titu1994@gmail.com>

* Apply isort and black reformatting

Signed-off-by: titu1994 <titu1994@users.noreply.github.com>

* Correct implementation of freeze restore

Signed-off-by: smajumdar <titu1994@gmail.com>

* Apply isort and black reformatting

Signed-off-by: titu1994 <titu1994@users.noreply.github.com>

* Corrects the implementation of replace_adapter_modules to limit to just the top level modules

Signed-off-by: smajumdar <titu1994@gmail.com>

* Apply isort and black reformatting

Signed-off-by: titu1994 <titu1994@users.noreply.github.com>

* Remove registration of Transformer MHA

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove registration of Transformer MHA

Signed-off-by: smajumdar <titu1994@gmail.com>

* Address reviewer comments

Signed-off-by: smajumdar <titu1994@gmail.com>

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: titu1994 <titu1994@users.noreply.github.com>
Signed-off-by: weiqingw4ng <weiqingw4ng@users.noreply.github.com>
Co-authored-by: Weiqing Wang <weiqingw@nvidia.com>
Co-authored-by: weiqingw4ng <weiqingw4ng@users.noreply.github.com>

github-advanced-security bot left a comment:

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
