Add tests for ft launcher and straggler detection
Signed-off-by: Shriya Palsamudram <[email protected]>

Fix FaultTolerancePlugin

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Add StragglerDetection callback to all NeMo2.0 recipes

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Add missing imports and remove unused ones

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Add ft launcher test

Signed-off-by: Shriya Palsamudram <[email protected]>

fix typo

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

fix more typos

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

add ft launcher using nemo-run for llama3 test

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

fix serialization errors

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

create separate ft test

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

change github actions test

Signed-off-by: Shriya Palsamudram <[email protected]>

draft crash simulation

Signed-off-by: Shriya Balaji Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Simulate a crash using step, disable checkpointing

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Add a straggler detection test as well

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Revert enabling straggler_detection by default in all recipes

Signed-off-by: Shriya Palsamudram <[email protected]>

Remove unused imports

Signed-off-by: Shriya Palsamudram <[email protected]>

Remove extra check in ConfigValidationPlugin

Signed-off-by: Shriya Palsamudram <[email protected]>

Address pylinter issues

Signed-off-by: Shriya Palsamudram <[email protected]>

Improve straggler detection testing and add doc string

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

fix paths

Signed-off-by: Shriya Palsamudram <[email protected]>

Add assert for crash

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Append run logs to a file after a crash

Signed-off-by: Shriya Palsamudram <[email protected]>

Set FAULT_TOL_FINISHED_FLAG_FILE and FAULT_TOL_CFG_PATH

Signed-off-by: Shriya Palsamudram <[email protected]>
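
The crash-simulation commits above ("Simulate a crash using step, disable checkpointing", "Add assert for crash", "Append run logs to a file after a crash") can be illustrated with a small, framework-free sketch. It is not the test code from this commit: `SimulatedCrash`, `CrashAtStep`, and `run_with_restarts` are hypothetical names, and the restart loop only stands in for what a fault-tolerant launcher would do.

```python
class SimulatedCrash(RuntimeError):
    """Raised on purpose to imitate a training process dying mid-run."""


class CrashAtStep:
    """Hypothetical callback-like helper: fail once when `crash_step` is reached."""

    def __init__(self, crash_step: int) -> None:
        self.crash_step = crash_step
        # In a real multi-process test this flag would have to live outside
        # the process (for example in a file); here one object is reused.
        self.has_crashed = False

    def on_train_batch_end(self, global_step: int) -> None:
        if not self.has_crashed and global_step == self.crash_step:
            self.has_crashed = True
            raise SimulatedCrash(f"simulated crash at step {global_step}")


def run_with_restarts(max_steps: int, crash: CrashAtStep, max_restarts: int = 1) -> list[str]:
    """Toy launcher loop: rerun the job after a crash and keep appending to one log."""
    log: list[str] = []
    for attempt in range(max_restarts + 1):
        try:
            # Checkpointing is disabled in this sketch, so every attempt
            # starts again from step 1.
            for step in range(1, max_steps + 1):
                crash.on_train_batch_end(step)
            log.append(f"attempt {attempt}: finished {max_steps} steps")
            return log
        except SimulatedCrash as err:
            log.append(f"attempt {attempt}: {err}")
    return log


if __name__ == "__main__":
    for line in run_with_restarts(max_steps=5, crash=CrashAtStep(crash_step=3)):
        print(line)
```

A test built this way can assert both that the crash happened and that a later attempt completed, by inspecting the accumulated log.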

Add openai-gelu in gated activation (#11293)

Fixes per comments (#11280)

* Fixes per comments

Signed-off-by: Gomathy Venkata Krishnan <[email protected]>

* Update README

Signed-off-by: Gomathy Venkata Krishnan <[email protected]>

---------

Signed-off-by: Gomathy Venkata Krishnan <[email protected]>

Add T5TTS (#11193)

* added training and inference recipes for T5-TTS.
* fix some attention errors
* add copyright headers.
* added TODO and detailed error log info.
* fixed a missed corner case.
* added classes to __all__
* fixed to return either self-attention scores or cross-attention scores in ParallelTransformerLayer_ class.

Signed-off-by: XuesongYang <[email protected]>

---------

Signed-off-by: Jason <[email protected]>
Signed-off-by: blisc <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: XuesongYang <[email protected]>
Co-authored-by: blisc <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: XuesongYang <[email protected]>

ci: Exclude CPU machines from scan (#11300)

Signed-off-by: Oliver Koenig <[email protected]>

Revert "fix(export): GPT models w/ bias=False convert properly (#11255)" (#11301)

This reverts commit 2d4f4953881b9e2d118d3ffeba7e64625d827d11.

remove redundant docs (#11302)

Create phi3mini.py (#11281)

* Create phi3mini.py

Signed-off-by: mayani-nv <[email protected]>

Apply isort and black reformatting

Signed-off-by: mayani-nv <[email protected]>

Update __init__.py

Signed-off-by: mayani-nv <[email protected]>

Update __init__.py

Signed-off-by: mayani-nv <[email protected]>

Apply isort and black reformatting

Signed-off-by: mayani-nv <[email protected]>

* Create phi3_mini_4k_instruct.py for adding to recipe

Signed-off-by: mayani-nv <[email protected]>

Apply isort and black reformatting

Signed-off-by: mayani-nv <[email protected]>

Update phi3_mini_4k_instruct.py and removed Performant recipe

Signed-off-by: mayani-nv <[email protected]>

Update phi3_mini_4k_instruct.py and removing performant condition

Signed-off-by: mayani-nv <[email protected]>

Update phi3_mini_4k_instruct.py with docstring changes

Signed-off-by: mayani-nv <[email protected]>

* Update __init__.py

Signed-off-by: mayani-nv <[email protected]>

* fixing pylint warnings

* Apply isort and black reformatting

Signed-off-by: mayani-nv <[email protected]>

* correcting typos and adding working recipe files

---------

Signed-off-by: mayani-nv <[email protected]>
Signed-off-by: mayani-nv <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: mayani-nv <[email protected]>

Integrate lm-eval-harness for evaluations in NeMo (#10621)

* Add evaluate method and other minor fixes

Signed-off-by: Abhishree <[email protected]>

* Add inference params to evaluate method

Signed-off-by: Abhishree <[email protected]>

* Add wait_for_rest_service fn to evaluate method

Signed-off-by: Abhishree <[email protected]>

* Apply isort and black reformatting

Signed-off-by: athitten <[email protected]>

* Add logprobs to be returned by Pytriton for trtllm models

Signed-off-by: Abhishree <[email protected]>

* Increase max_retries in wait_for_rest_service method

Signed-off-by: Abhishree <[email protected]>

* Apply isort and black reformatting

Signed-off-by: athitten <[email protected]>

* Add unset slurm vars and use env vars for Triton args

Signed-off-by: Abhishree <[email protected]>

* Add logic to get logProbs from logits

Signed-off-by: Abhishree <[email protected]>

* Refactor, clean and organize the code

1) Refactors the code and creates an evaluation folder where all util methods live
2) Add docstrings and comments
3) Expose gather_context_logits and gather_generation_logits in trtllm, add an output_generation_logits flag to return generation logits, and remove output_logporbs as it is no longer used

Signed-off-by: Abhishree <[email protected]>

* Add copyright and initialize special_tokens_kwargs in eval_utils.py

Signed-off-by: Abhishree <[email protected]>

* Add the following changes

1) Move get_trtllm_deployable and unset_environment_variables to deploy base.py
2) Rename eval_utils.py to base.py
3) Restore scripts/export/convert_nemo2_for_export.py

Signed-off-by: Abhishree <[email protected]>

* Fix a minor typo

Signed-off-by: Abhishree <[email protected]>

* Revert output_log_probs and all_probs arg in tensorrt_llm_run.py

Signed-off-by: Abhishree <[email protected]>

* Fix docstrings formatting

Signed-off-by: Abhishree <[email protected]>

* Pylint and other minor fixes

Signed-off-by: Abhishree <[email protected]>

* Fix pylint and typos

Signed-off-by: Abhishree <[email protected]>

* Apply isort and black reformatting

Signed-off-by: athitten <[email protected]>

* Avoid multiple calls for tokenizer_type

Co-authored-by: Ananth Subramaniam <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>

* Replace print statements with logging statements

Signed-off-by: Abhishree <[email protected]>

* Apply isort and black reformatting

Signed-off-by: athitten <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: athitten <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: athitten <[email protected]>
Co-authored-by: Ananth Subramaniam <[email protected]>
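
The "Add logic to get logProbs from logits" step in the lm-eval-harness integration above relies on a standard identity: the log-probability of a token is its logit minus the log-sum-exp of all logits at that position. The sketch below shows only that identity in plain Python. The function names are hypothetical and this is not the code added by the PR, which operates on tensors returned by the deployed model.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    # Subtracting the maximum before exponentiating avoids overflow.
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_logprobs(step_logits, token_ids):
    """Pick the log-probability of each chosen token, one per position."""
    return [log_softmax(row)[tok] for row, tok in zip(step_logits, token_ids)]


if __name__ == "__main__":
    logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
    print(token_logprobs(logits, [0, 2]))
```

Summing the per-token values gives the sequence log-likelihood that multiple-choice style evaluations typically compare across candidate continuations.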

ci: Fix release workflow (#11286)

* ci: Fix release workflow

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* Update .github/workflows/release.yml

Signed-off-by: oliver könig <[email protected]>

---------

Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: oliver könig <[email protected]>

Update import 'pytorch_lightning' -> 'lightning.pytorch' (#11252)

* update import in collections/llm

Signed-off-by: Maanu Grover <[email protected]>

* update import in lightning

Signed-off-by: Maanu Grover <[email protected]>

* update fabric import in lightning

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in collections/asr

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/tts

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update requirements

Signed-off-by: Maanu Grover <[email protected]>

* unused imports

Signed-off-by: Maanu Grover <[email protected]>

* update import in tests

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in collections/common

Signed-off-by: Maanu Grover <[email protected]>

* update import in core

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in utils

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in collections/nlp

Signed-off-by: Maanu Grover <[email protected]>

* update fabric import in collections/nlp

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update fabric import in utils

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in nlp examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in asr examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in llm examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in tts examples

Signed-off-by: Maanu Grover <[email protected]>

* update fabric import in nlp examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in deploy

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in slu examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in speaker_tasks examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/audio

Signed-off-by: Maanu Grover <[email protected]>

* update import in audio examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/llm

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in collections/vlm

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/diffusion

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/vision

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/multimodal

Signed-off-by: Maanu Grover <[email protected]>

* update import in multimodal examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in vision examples

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in scripts

Signed-off-by: Maanu Grover <[email protected]>

* Update baseline

Signed-off-by: maanug-nv <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* revert bad change

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>
Signed-off-by: maanug-nv <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: maanug-nv <[email protected]>
Co-authored-by: artbataev <[email protected]>
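
The import migration above ('pytorch_lightning' -> 'lightning.pytorch', plus the fabric imports) is a mechanical rename of the top-level package in `import` and `from ... import` statements. A minimal sketch of such a rewrite is shown below. It is an illustration only, not the tooling used for this PR; the helper name `rewrite_lightning_imports` is hypothetical, and the `lightning_fabric` to `lightning.fabric` mapping is an assumption about what the "fabric import" commits refer to.

```python
import re

# Old top-level package -> new location inside the unified `lightning` package.
_RENAMES = {
    "pytorch_lightning": "lightning.pytorch",
    "lightning_fabric": "lightning.fabric",  # assumed mapping, see lead-in
}

# Match only at the start of an import statement, so string literals and
# comments that merely mention the old name are left untouched.
_PATTERN = re.compile(
    r"^(\s*(?:from|import)\s+)(pytorch_lightning|lightning_fabric)\b",
    re.MULTILINE,
)


def rewrite_lightning_imports(source: str) -> str:
    """Return `source` with old-style Lightning imports rewritten."""
    return _PATTERN.sub(lambda m: m.group(1) + _RENAMES[m.group(2)], source)


if __name__ == "__main__":
    old = (
        "import pytorch_lightning as pl\n"
        "from pytorch_lightning.callbacks import Callback\n"
    )
    print(rewrite_lightning_imports(old))
```

A bare `import pytorch_lightning` without an alias would still need manual attention, because the rewritten statement binds the name `lightning` rather than `pytorch_lightning`.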

fix perf plugin CUDA_DEVICE_MAX_CONNECTIONS setting (#11299)

* fix

Signed-off-by: Jimmy Zhang <[email protected]>

* Docstrings

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>

PTQ via NeMo-Run CLI (#10984)

* PTQ support in nemo CLI

Signed-off-by: Jan Lasek <[email protected]>

* Naming engine vs checkpoint

Signed-off-by: Jan Lasek <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>

PTQ memory optimization (#11257)

* Initial commit

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Add sample generate

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Nemotron quantization, reduce diff

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Reduce diff

Signed-off-by: Piotr Kaminski <[email protected]>

* code review suggestions

Signed-off-by: Piotr Kaminski <[email protected]>

* Bug fixes

Signed-off-by: Piotr Kaminski <[email protected]>

* remove not needed import

Signed-off-by: Piotr Kaminski <[email protected]>

* fix model type and allow ddp/optim setup

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

---------

Signed-off-by: Piotr Kaminski <[email protected]>
Signed-off-by: Laplasjan107 <[email protected]>
Signed-off-by: Piotr Kamiński <[email protected]>
Co-authored-by: Piotr Kaminski <[email protected]>
Co-authored-by: Laplasjan107 <[email protected]>
Co-authored-by: Jan Lasek <[email protected]>

update README.md (#11223)

Signed-off-by: yaoyu-33 <[email protected]>

Add `attention_bias` argument in transformer block and transformer layer modules, addressing change in MCore (#11289)

* fix api

Signed-off-by: yaoyu-33 <[email protected]>

* fix ci

Signed-off-by: yaoyu-33 <[email protected]>

* add docstring

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix docstring2

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix line too long

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>

Remove pytorch-lightning (#11306)

* update import in docs

Signed-off-by: Maanu Grover <[email protected]>

* update import in tutorials

Signed-off-by: Maanu Grover <[email protected]>

* remove pl requirement

Signed-off-by: Maanu Grover <[email protected]>

* missed import updates

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>

Adding multimodal examples (#11279)

* Adding multimodal examples

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

---------

Signed-off-by: shanmugamr1992 <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: shanmugamr1992 <[email protected]>

Update T5 attention-mask shapes to be compatible with all attention-backend in new TE versions (#11059)

* initial commits

* updating cicd test

* commit for FlashFused T5 from Mcore

* testing CICD

* update code for data/mock, update mcore commit for dockerfile

* fix error

* fix error

* fix error in nemo/collections/llm/inference/base.py

* update t5/data/mock.py

* fix cicd error

* remove unused libs

* address Yu Yao's comments

* Apply isort and black reformatting

Signed-off-by: huvunvidia <[email protected]>

---------

Signed-off-by: huvunvidia <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: huvunvidia <[email protected]>

Add HF untrusted code toggle (#11313)

* add trust_remote_code toggle

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

P2p chunk size setting in nemo 2.0 (#11312)

* NCCL P2P communication chunk size

Signed-off-by: Sangkug Lym <[email protected]>

* NCCL P2P communication chunk size

Signed-off-by: Sangkug Lym <[email protected]>

---------

Signed-off-by: Sangkug Lym <[email protected]>

Nemo2 batcheval (#11158)

* initial draft for eval api

Signed-off-by: HuiyingLi <[email protected]>

* add dp to generate

Signed-off-by: HuiyingLi <[email protected]>

* Apply isort and black reformatting

Signed-off-by: HuiyingLi <[email protected]>

* add top_k=1 to default inference params to get deterministic output

Signed-off-by: HuiyingLi <[email protected]>

* change name

Signed-off-by: HuiyingLi <[email protected]>

* add eval ds and write to file to llm.generate

Signed-off-by: HuiyingLi <[email protected]>

* support standalone input jsonl

Signed-off-by: HuiyingLi <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: HuiyingLi <[email protected]>
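
The `top_k=1` default mentioned in the batch-eval commits above makes generation deterministic because top-k sampling with k=1 leaves exactly one candidate token, the highest-scoring one, so the random draw has no effect. The sketch below demonstrates this with a hypothetical `sample_top_k` helper in plain Python; it is not the sampling code used by `llm.generate`.

```python
import math
import random


def sample_top_k(logits, top_k, rng):
    """Sample a token index from the `top_k` highest-scoring logits."""
    # Indices of the top_k largest logits (ties keep the lower index first).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    m = max(logits[i] for i in top)
    # Unnormalised softmax weights over the surviving candidates.
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [0.1, 2.0, 1.9]
    # With top_k=1 the result does not depend on the random seed.
    print({sample_top_k(logits, 1, random.Random(seed)) for seed in range(10)})
    # With top_k=2 different seeds may pick different tokens.
    print({sample_top_k(logits, 2, random.Random(seed)) for seed in range(10)})
```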

DoRA (#11104)

* initial commit for DoRA

Signed-off-by: Chen Cui <[email protected]>

* clean up code

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* clean up

Signed-off-by: Chen Cui <[email protected]>

* fix TP

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* add dropout correction term

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* add copyright and doc strings

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* fix

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* docstrings

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* docstrings

Signed-off-by: Chen Cui <[email protected]>

* add ci test

Signed-off-by: Chen Cui <[email protected]>

* add ci test

Signed-off-by: Chen Cui <[email protected]>

* typo

Signed-off-by: Chen Cui <[email protected]>

* remove unused code

Signed-off-by: Chen Cui <[email protected]>

* remove commented out code

Signed-off-by: Chen Cui <[email protected]>

* fix

Signed-off-by: Chen Cui <[email protected]>

* bug

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Co-authored-by: cuichenx <[email protected]>

Profiling - support Chakra & Kineto trace dumping (#11115)

* Support chakra trace dumping by cfg

Signed-off-by: Lily Wang <[email protected]>

remove the manual recording of process::init

Signed-off-by: Lily Wang <[email protected]>

1. Remove unnecessary kineto config  2. Fix typo

Signed-off-by: Lily Wang <[email protected]>

Change warning to exception when nsys is enabled with chakra profiling

Signed-off-by: Lily Wang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: pablo-garay <[email protected]>

* fix bug in identifying profiling start step

Signed-off-by: Lily Wang <[email protected]>

* Update baseline

Signed-off-by: lilyw97 <[email protected]>

* [1] remove unused import [2] switch to isinstance instead of type() [3] move torch.profiling to function

Signed-off-by: Lily Wang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: lilyw97 <[email protected]>

---------

Signed-off-by: Lily Wang <[email protected]>
Signed-off-by: pablo-garay <[email protected]>
Signed-off-by: lilyw97 <[email protected]>
Signed-off-by: Maanu Grover <[email protected]>
Co-authored-by: Lily Wang <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: pablo-garay <[email protected]>
Co-authored-by: lilyw97 <[email protected]>
Co-authored-by: Maanu Grover <[email protected]>

NeMo 2.0 SFT PEFT notebooks (#10874)

* nemo2-sft notebook initial draft

Signed-off-by: HuiyingLi <[email protected]>

* remove mixtral info

Signed-off-by: HuiyingLi <[email protected]>

* minor fixes

Signed-off-by: HuiyingLi <[email protected]>

* minor fixes

Signed-off-by: HuiyingLi <[email protected]>

* minor fixes

Signed-off-by: HuiyingLi <[email protected]>

* add import_ckpt script and minor changes

Signed-off-by: HuiyingLi <[email protected]>

* Random read for tar files in lhotse dataloaders (#10536)

* Random read for tar files in lhotse dataloaders

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* Solve failed tests

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* Adding a testcase

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* Some changes in tests

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* removing import

Signed-off-by: Nune <[email protected]>

---------

Signed-off-by: Nune <[email protected]>
Signed-off-by: nune-tadevosyan <[email protected]>
Co-authored-by: nune-tadevosyan <[email protected]>

* training code for hybrid-autoregressive inference model (#10841)

* training code for hybrid-autoregressive inference model

Signed-off-by: Hainan Xu <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hainan-xv <[email protected]>

---------

Signed-off-by: Hainan Xu <[email protected]>
Signed-off-by: hainan-xv <[email protected]>
Co-authored-by: Hainan Xu <[email protected]>
Co-authored-by: hainan-xv <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 772faca ! (#10871)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>

* Use trainer.local_rank/global_rank (#10860)

* fix global_rank calculation

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* use trainer's global/local rank

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove stacking operation from batched functions (#10524)

* remove stacking operations

Signed-off-by: lilithgrigoryan <[email protected]>

* fixes in base class

Signed-off-by: lilithgrigoryan <[email protected]>

* clean up

Signed-off-by: lilithgrigoryan <[email protected]>

* Apply isort and black reformatting

Signed-off-by: lilithgrigoryan <[email protected]>

* remove potentially uninitialized local variable

Signed-off-by: lilithgrigoryan <[email protected]>

* restore batch_intilize states funcname

Signed-off-by: lilithgrigoryan <[email protected]>

* fix typo

Signed-off-by: lilithgrigoryan <[email protected]>

* fix potentially uninitialized local variable

Signed-off-by: lilithgrigoryan <[email protected]>

* fix potentially uninitialized local variable in stateless transducer

Signed-off-by: lilithgrigoryan <[email protected]>

* fix test

Signed-off-by: lilithgrigoryan <[email protected]>

* Apply isort and black reformatting

Signed-off-by: lilithgrigoryan <[email protected]>

* fix docstring, rm comment

Signed-off-by: lilithgrigoryan <[email protected]>

* fix docstrings

Signed-off-by: lilithgrigoryan <[email protected]>

---------

Signed-off-by: lilithgrigoryan <[email protected]>
Signed-off-by: lilithgrigoryan <[email protected]>
Co-authored-by: lilithgrigoryan <[email protected]>
Co-authored-by: lilithgrigoryan <[email protected]>

* [NeMo-UX] Add llm.generate to nemo.collections.llm (#10471)

* Add llm.generate

Signed-off-by: Hemil Desai <[email protected]>

* Remove comment

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix launching with python

Signed-off-by: Hemil Desai <[email protected]>

* PR feedback

Signed-off-by: Hemil Desai <[email protected]>

* PR feedback

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Add assert cp

Signed-off-by: Hemil Desai <[email protected]>

* Add example script

Signed-off-by: Hemil Desai <[email protected]>

* Fix

Signed-off-by: Hemil Desai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Co-authored-by: hemildesai <[email protected]>

* Adding support for LightningDataModule inside Fabric-API (#10879)

* Make FabricMegatronMixedPrecision match MegatronMixedPrecision

Signed-off-by: Marc Romeijn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

* Supporting DataModule in fabric-API

Signed-off-by: Marc Romeijn <[email protected]>

* Adding support for LightningDataModule inside Fabric-API

Signed-off-by: Marc Romeijn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

* Remove import in mock.py

Signed-off-by: Marc Romeijn <[email protected]>

---------

Signed-off-by: Marc Romeijn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>

* initial draft

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Save yaml config for model in nemo.lightning.io (#10765)

* Save yaml config for model in nemo.lightning.io

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix bug

Signed-off-by: Hemil Desai <[email protected]>

* Fix bug

Signed-off-by: Hemil Desai <[email protected]>

* fix bug

Signed-off-by: Hemil Desai <[email protected]>

* Add explicit yaml comparison

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* relax test

Signed-off-by: Hemil Desai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Co-authored-by: hemildesai <[email protected]>

* Move collections.nlp imports inline for t5 (#10877)

* Move collections.nlp imports inline for t5

Signed-off-by: Marc Romeyn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

---------

Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>

* add world_size/pp_size runtime check (#10842)

* add world_size/pp_size runtime check

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix msg precision

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix test_init_parallel_ranks ws=3 pp=3

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix peft resume (#10887)

Signed-off-by: Chen Cui <[email protected]>

* Update engine build step for TRT-LLM 0.13.0 (#10880)

* Setting use_fused_mlp for TRT-LLM >= 0.13.0

Signed-off-by: Jan Lasek <[email protected]>

* Unused import removal

Signed-off-by: Jan Lasek <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>

* Akoumparouli/nemo ux moe loss logging (#10128)

* Move across pipeline loss reduction to a separate function

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Add support for MoE loss logging

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove unused function

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* enable vboost and set LM SM margin (#10853)

* enable vboost

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* env vars

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* add perf plugin

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

* revert default executor

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

* fix typo

Signed-off-by: Jimmy Zhang <[email protected]>

* fix more typos

Signed-off-by: Jimmy Zhang <[email protected]>

* ln margin knob

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

* specify lm margin

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

---------

Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: malay-nagda <[email protected]>
Signed-off-by: malay-nagda <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: JimmyZhang12 <[email protected]>
Co-authored-by: malay-nagda <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>

* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_k… (#10608)

* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_kwargs & overwrite device)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* Use torch sdpa implementation in ASR mha (#9590)

* use pytorch sdpa

Signed-off-by: WoodieDudy <[email protected]>

* sdpa work

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: titu1994 <[email protected]>

* sdpa flag to false & sdpa_backend arg

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* change arg name

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* fix config args

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* add condition on version

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* update condition on version

Signed-off-by: WoodieDudy <[email protected]>

* remove condition on torch version

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* move code to init

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* refactor

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* refactor

Signed-off-by: WoodieDudy <[email protected]>

---------

Signed-off-by: WoodieDudy <[email protected]>
Signed-off-by: titu1994 <[email protected]>
Signed-off-by: WoodieDudy <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: titu1994 <[email protected]>
Co-authored-by: WoodieDudy <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>

* Add registry to register all needed classes with artifacts in nemo.lightning.io (#10861)

* Add registry to register all needed classes with artifacts in nemo.lightning.io

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fixes

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix

Signed-off-by: Hemil Desai <[email protected]>

* comments

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Remove cyclic import

Signed-off-by: Hemil Desai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: hemildesai <[email protected]>
Co-authored-by: artbataev <[email protected]>

* call __post_init__ after altering config values (#10885)

* call __post_init__ after altering config values

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* test fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* turn off SP

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Nemo 2.0 ckpt support in TRT-LLM export (#10891)

* fix minor import bug

Signed-off-by: Onur Yilmaz <[email protected]>

* Add registry to register all needed classes with artifacts in nemo.lightning.io

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fixes

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix

Signed-off-by: Hemil Desai <[email protected]>

* nemo 2.0 support in export to trt-llm

Signed-off-by: Onur Yilmaz <[email protected]>

* get mixing from main

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* fix style

Signed-off-by: Onur Yilmaz <[email protected]>

---------

Signed-off-by: Onur Yilmaz <[email protected]>
Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Signed-off-by: oyilmaz-nvidia <[email protected]>
Co-authored-by: Hemil Desai <[email protected]>
Co-authored-by: hemildesai <[email protected]>
Co-authored-by: oyilmaz-nvidia <[email protected]>

* [Docs] Fix doc warnings, focus on feature and multimodal sections (#10171)

* various simple docs source fixes

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix docstrings and typing with forward reference

Signed-off-by: Elena Rastorgueva <[email protected]>

* Apply isort and black reformatting

Signed-off-by: erastorgueva-nv <[email protected]>

* fix typing forward reference for PromptedAudioToTextLhotseDataset

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix feature warnings

Signed-off-by: yaoyu-33 <[email protected]>

* Try fix some model part errors

Signed-off-by: yaoyu-33 <[email protected]>

* try add requirements

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try add requirements

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix indent in docstring

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* update

Signed-off-by: yaoyu-33 <[email protected]>

* handle duplicate issue

Signed-off-by: yaoyu-33 <[email protected]>

* handle duplicate issue

Signed-off-by: yaoyu-33 <[email protected]>

* fix imagen cite

* fix ratio issues

Signed-off-by: yaoyu-33 <[email protected]>

* fix Dreambooth

Signed-off-by: yaoyu-33 <[email protected]>

* Fix activation recomputation

Signed-off-by: yaoyu-33 <[email protected]>

* fix sequence packing

Signed-off-by: yaoyu-33 <[email protected]>

* fix asr_language_modeling_and_customization

Signed-off-by: yaoyu-33 <[email protected]>

* fixes wip

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: erastorgueva-nv <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Yu Yao <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: erastorgueva-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Ao Tang <[email protected]>
Co-authored-by: Huiying Li <[email protected]>

* calculate step time batch end-batch end (#10202)

* log step time at end

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* use nemo logging

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* cleanup

Signed-off-by: Malay Nagda <[email protected]>

* check remove

Signed-off-by: Malay Nagda <[email protected]>

* delta timing callback

Signed-off-by: Malay Nagda <[email protected]>

* comment and name change

Signed-off-by: Malay Nagda <[email protected]>

---------

Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: malay-nagda <[email protected]>
Co-authored-by: malay-nagda <[email protected]>

* late import prettytable (#10912)

Signed-off-by: Maanu Grover <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 0d89fc4 ! (#10919)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Warning for missing FP8 checkpoint support for vLLM deployment (#10906)

Signed-off-by: Jan Lasek <[email protected]>

* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10821)

* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10787)

* Add lhotse fixes for rnnt model training and WER hanging issue with fuse batching

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* Apply isort and black reformatting

Signed-off-by: nithinraok <[email protected]>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: nithinraok <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nithinraok <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: nithinraok <[email protected]>
Co-authored-by: artbataev <[email protected]>

* Fix ASR tests (#10794)

* Make tests required

Signed-off-by: Vladimir Bataev <[email protected]>

* Debug torch.load issue

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Run only necessary tests

Signed-off-by: Vladimir Bataev <[email protected]>

* Try fix loading

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Avoid caching fixture

Signed-off-by: Vladimir Bataev <[email protected]>

* Try restore model several times

Signed-off-by: Vladimir Bataev <[email protected]>

* Try customize temporary directory

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Reorder tests

Signed-off-by: Vladimir Bataev <[email protected]>

* Disable one test

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Avoid xxlarge model

Signed-off-by: Vladimir Bataev <[email protected]>

* Disable test

Signed-off-by: Vladimir Bataev <[email protected]>

* Revert changes

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Magic fix

Signed-off-by: Vladimir Bataev <[email protected]>

* Revert unnecessary changes

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Disable all jobs except L0

Signed-off-by: Vladimir Bataev <[email protected]>

* RNNT alignments - merge with unit tests

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix CUDA graph frame-looping decoder to handle non-CUDA inputs

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix config

Signed-off-by: Vladimir Bataev <[email protected]>

* Log test results

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Use fewer audio files for tests

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: artbataev <[email protected]>

* Integrating mcore export (#10238)

* Integrating mcore export

* Integrating mcore export

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Move trt imports in nemo.collections.llm inside respective functions (#10234)

Signed-off-by: Hemil Desai <[email protected]>

* Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest (#10198)

* Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest

Signed-off-by: Piotr Żelasko <[email protected]>

* Address code review

Signed-off-by: Piotr Żelasko <[email protected]>

* fix tests

Signed-off-by: Piotr Żelasko <[email protected]>

* fix tests

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* [NeMo-UX] Fix a serialization bug that prevents users from moving checkpoints (#9939)

* perform serialization using relative paths to allow users to move checkpoints after they're saved

Signed-off-by: ashors1 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ashors1 <[email protected]>

* remove unused import

Signed-off-by: ashors1 <[email protected]>

* fix artifact load

Signed-off-by: ashors1 <[email protected]>

* fix path artifact

Signed-off-by: ashors1 <[email protected]>

* remove unused import

Signed-off-by: ashors1 <[email protected]>

---------

Signed-off-by: ashors1 <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Co-authored-by: ashors1 <[email protected]>

* Add MemoryProfileCallback (#10166)

* Add MemoryProfileCallback

Signed-off-by: Shriya Palsamudram <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

* Remove reference cycles, save snapshot on specific ranks

Signed-off-by: Shriya Palsamudram <[email protected]>

* Remove unnecessary imports

Signed-off-by: Shriya Palsamudram <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

* Update docstring

Signed-off-by: Shriya Palsamudram <[email protected]>

---------

Signed-off-by: Shriya Palsamudram <[email protected]>
Signed-off-by: ShriyaPalsamudram <[email protected]>
Signed-off-by: Shriya Rishab <[email protected]>
Co-authored-by: ShriyaPalsamudram <[email protected]>

* Lower bound transformers to support nemotron (#10240)

Signed-off-by: Dong Hyuk Chang <[email protected]>
Co-authored-by: Dong Hyuk Chang <[email protected]>

* [Audio] SSL Pretraining framework for flow-matching model for audio processing (#10052)

Flow matching generative model with SSL pretraining framework

Signed-off-by: Pin-Jui Ku <[email protected]>
Co-authored-by: Kuray107 <[email protected]>

* Revert torchrun fix for model import (#10251)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [NeMo-UX] Move nemotron imports inline (#10255)

* Move nemotron transformers + tokenizer imports inline to reduce number of required deps

Signed-off-by: Marc Romeyn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

---------

Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>

* Wrap CPU model init with megatron_lazy_init_context (#10219)

* Wrap CPU model init with megatron_lazy_init_context

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Cleanup checkpoint-dir if saving fails

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* Bump `Dockerfile.ci` (2024-08-22) (#10227)

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 124bcff !

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix bert flags

Signed-off-by: Oliver Koenig <[email protected]>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <[email protected]>
Co-authored-by: pablo-garay <[email protected]>

* salm export trtllm (#10245)

Signed-off-by: slyne deng <[email protected]>
Co-authored-by: slyne deng <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to ef85bc9 ! (#10250)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 01ca03f ! (#10266)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: oliver könig <[email protected]>
Co-authored-by: pablo-garay <[email protected]>

* Load model in the target export precision by default in PTQ (#10267)

* Load model in the target export precision by default

Signed-off-by: Jan Lasek <[email protected]>

* Enable megatron_amp_O2=true to actually use half-precision

Signed-off-by: Jan Lasek <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>

* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins (#10223)

* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Remove duplicate

Signed-off-by: Hemil Desai <[email protected]>

* Add entity to wandb logger

Signed-off-by: Hemil Desai <[email protected]>

* Add documentation

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Add warning

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* PR feedback

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Add comments

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Co-authored-by: hemildesai <[email protected]>

* [NeMo-UX] Handle absolute logger directories in nemo_logger (#10259)

* handle absolute and relative logger directories

Signed-off-by: Anna Shors <[email protected]>

* merge lines

Signed-off-by: ashors1 <[email protected]>

---------

Signed-off-by: Anna Shors <[email protected]>
Signed-off-by: ashors1 <[email protected]>

* Add sdxl notebook (#10139)

* Add sdxl notebook

Signed-off-by: mingyuanm <[email protected]>

* Rename

Signed-off-by: mingyuanm <[email protected]>

* final Update SDXL notebook

Signed-off-by: mingyuanm <[email protected]>

---------

Signed-off-by: mingyuanm <[email protected]>

* Updating some comments

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Updating some comments

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Updating some comments

* Small change

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* ADD support for layernorm1p

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Update Dockerfile.ci

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Dockerfile.ci

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Dockerfile.ci

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: shanmugamr1992 <[email protected]>
Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Signed-off-by: Shriya Palsamudram <[email protected]>
Signed-off-by: ShriyaPalsamudram <[email protected]>
Signed-off-by: Shriya Rishab <[email protected]>
Signed-off-by: Dong Hyuk Chang <[email protected]>
Signed-off-by: Pin-Jui Ku <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: slyne deng <[email protected]>
Signed-off-by: oliver könig <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Signed-off-by: Anna Shors <[email protected]>
Signed-off-by: mingyuanm <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: shanmugamr1992 <[email protected]>
Co-authored-by: Hemil Desai <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Anna Shors <[email protected]>
Co-authored-by: ashors1 <[email protected]>
Co-authored-by: Shriya Rishab <[email protected]>
Co-authored-by: ShriyaPalsamudram <[email protected]>
Co-authored-by: Dong Hyuk Chang <[email protected]>
Co-authored-by: Dong Hyuk Chang <[email protected]>
Co-authored-by: Kuray107 <[email protected]>
Co-authored-by: Kuray107 <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Marc Romeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: oliver könig <[email protected]>
Co-authored-by: pablo-garay <[email protected]>
Co-authored-by: Slyne Deng <[email protected]>
Co-authored-by: slyne deng <[email protected]>
Co-authored-by: Jan Lasek <[email protected]>
Co-authored-by: hemildesai <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>

* Fix artifact saving (#10914)

Signed-off-by: Hemil Desai <[email protected]>

* Lora improvement (#10918)

* pull out freeze model

Signed-off-by: Chen Cui <[email protected]>

* add wildcard match to lora target modules

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Huvu/t5 nemo2.0 peft (#10916)

* adding peft test and cicd

* add setting mcore model to train in peft.py

* adding test for T5 lora

* fix follow Chen's fix

* restore cicd-main.yml

---------

Co-authored-by: Huy Vu2 <[email protected]>

* Add tie_word_embeddings=True (#10710)

Signed-off-by: Yoshi Suhara <[email protected]>

* Use a context-manager when opening files (#10895)

* Use a context-manager when opening files

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: artbataev <[email protected]>

* long context performance numbers in doc (#10784)

* long context perf

Signed-off-by: Youngeun Kwon <[email protected]>

* update the long context perf

Signed-off-by: Youngeun Kwon <[email protected]>

* Akoumparouli/mcore microbatch calculator fix (#10780)

* move tests/lightning/{,_}io

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add microbatch calculator context manager

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* use microbatch calculator context manager

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove unused var

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* remove 8x3b recipes (#10764)

* remove 8x3b recipes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove 8x3b from test_nemo_run

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* rm from __init__

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* change the figure file name

Signed-off-by: Youngeun Kwon <[email protected]>

* Accommodating the reviewer's comment

Signed-off-by: Youngeun Kwon <[email protected]>

* update the y-axis title

Signed-off-by: Youngeun Kwon <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 3f90b98 ! (#10789)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* Add ModelOpt transformer model pruning example for Llama models, default to llama3.1-8b-base (#10294)

* Add ModelOpt transformer model pruning example for Llama3 model

Signed-off-by: Shengliang Xu <[email protected]>

* Apply isort and black reformatting

Signed-off-by: shengliangxu <[email protected]>
Signed-off-by: Shengliang Xu <[email protected]>

* examples code is at wrong dir, move them

Signed-off-by: Shengliang Xu <[email protected]>

* changes as suggested in comment

remove some logging and unused config code, update example model to
llama3.1

Signed-off-by: Shengliang Xu <[email protected]>

* Add pruning of hidden_size into example

Signed-off-by: Shengliang Xu <[email protected]>

* Apply isort and black reformatting

Signed-off-by: shengliangxu <[email protected]>
Signed-off-by: Shengliang Xu <[email protected]>

* Update examples/nlp/language_modeling/conf/megatron_gpt_prune.yaml

Signed-off-by: Keval Morabia <[email protected]>

* Add pruning test to cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <2891698…
ShriyaPalsamudram committed Dec 2, 2024
1 parent ed244d9 commit 05dac7e
Showing 684 changed files with 19,005 additions and 3,085 deletions.
283 changes: 251 additions & 32 deletions .github/workflows/cicd-main.yml

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion .github/workflows/monitor-vms.yml
@@ -27,7 +27,7 @@ jobs:
 | jq -c '[
 .runners[]
 | select(.status == "online")
-| select(.name | contains("gpu"))
+| select(.name | contains("cpu") | not)
 | {
 "vm": .name,
 "n_gpus": [
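As a rough illustration of what the changed runner filter selects, the sketch below mirrors the jq expression in Python. Only the `status` and `name` fields come from the hunk above; the function name `select_gpu_runners` and the sample runner records are hypothetical.

```python
from typing import Dict, List


def select_gpu_runners(runners: List[Dict[str, str]]) -> List[str]:
    """Return names of online runners whose name does not contain "cpu".

    Mirrors: select(.status == "online") | select(.name | contains("cpu") | not)
    """
    return [
        r["name"]
        for r in runners
        if r.get("status") == "online" and "cpu" not in r.get("name", "")
    ]


# Hypothetical sample data, for illustration only.
SAMPLE_RUNNERS = [
    {"name": "azure-gpu-vm-1", "status": "online"},
    {"name": "azure-cpu-vm-1", "status": "online"},
    {"name": "h100-runner-2", "status": "online"},
    {"name": "azure-gpu-vm-3", "status": "offline"},
]

if __name__ == "__main__":
    print(select_gpu_runners(SAMPLE_RUNNERS))
```

Under the previous filter (`contains("gpu")`), a runner named like the hypothetical `h100-runner-2` would have been dropped; the new filter keeps every online runner that is not explicitly named as a CPU machine.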
3 changes: 2 additions & 1 deletion .github/workflows/release.yml
@@ -23,7 +23,7 @@ on:

 jobs:
 release:
-uses: NVIDIA/NeMo-FW-CI-templates/.github/workflows/_release_library.yml@v0.10.0
+uses: NVIDIA/NeMo-FW-CI-templates/.github/workflows/_release_library.yml@v0.12.3
 with:
 release-ref: ${{ inputs.release-ref }}
 image-name: nemo_container
@@ -39,3 +39,4 @@ jobs:
 TWINE_USERNAME: ${{ secrets.TWINE_USERNAME }}
 TWINE_PASSWORD: ${{ secrets.TWINE_PASSWORD }}
 SLACK_RELEASE_ENDPOINT: ${{ secrets.SLACK_RELEASE_ENDPOINT }}
+PAT: ${{ secrets.PAT }}
10 changes: 7 additions & 3 deletions .secrets.baseline
@@ -90,6 +90,10 @@
 {
 "path": "detect_secrets.filters.allowlist.is_line_allowlisted"
 },
+{
+"path": "detect_secrets.filters.common.is_baseline_file",
+"filename": ".secrets.baseline"
+},
 {
 "path": "detect_secrets.filters.common.is_ignored_due_to_verification_policies",
 "min_level": 2
@@ -273,7 +277,7 @@
 "filename": "scripts/checkpoint_converters/convert_mistral_7b_hf_to_nemo.py",
 "hashed_secret": "e0308bd21bffc156d79208f9ecf130370a015002",
 "is_verified": false,
-"line_number": 460
+"line_number": 471
 }
 ],
 "scripts/dataset_processing/nlp/intent_and_slot/assistant_utils.py": [
@@ -1929,7 +1933,7 @@
 "filename": "tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb",
 "hashed_secret": "80903ddedcf4ec0a2ee5911cefa7e1ad52419dcc",
 "is_verified": false,
-"line_number": 989
+"line_number": 990
 }
 ],
 "tutorials/tools/DefinedCrowd_x_NeMo_ASR_Training_Tutorial.ipynb": [
@@ -2083,5 +2087,5 @@
 }
 ]
 },
-"generated_at": "2024-10-25T13:43:17Z"
+"generated_at": "2024-11-14T09:37:19Z"
 }
5 changes: 4 additions & 1 deletion Dockerfile.ci
@@ -54,7 +54,7 @@ RUN pip install nemo_run@git+https://github.com/NVIDIA/NeMo-Run.git@${NEMO_RUN_T
 # Install NeMo requirements
 ARG TE_TAG=7d576ed25266a17a7b651f2c12e8498f67e0baea
 ARG MODELOPT_VERSION=0.19.0
-ARG MCORE_TAG=aded519cfb1de2abf96f36ca059f992294b7876f
+ARG MCORE_TAG=c1728c12f1f1cdbb786e52f1ffe512295d76bef3

 ARG APEX_TAG=810ffae374a2b9cb4b5c5e28eaeca7d7998fca0c
 RUN \
@@ -84,6 +84,9 @@ git checkout ${MCORE_TAG} && \
 popd
 export PYTHONPATH="${PYTHONPATH}:/workspace/Megatron-LM"

+# Install nvidia-resiliency-ext
+pip install --no-cache-dir "git+https://github.com/NVIDIA/nvidia-resiliency-ext.git@97aad77609d2e25ed38ac5c99f0c13f93c48464e"
+
 EOF

 # Copy over NeMo code
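The hunk above adds a pinned `pip install` of nvidia-resiliency-ext to the CI image. A minimal sketch of how a script might check that the package is present before enabling fault-tolerance features is shown below. It assumes the installed distribution is named `nvidia-resiliency-ext` (taken from the repository name, not confirmed by the diff), and the helper `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution name; adjust if the package is published under another name.
    name = "nvidia-resiliency-ext"
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```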
58 changes: 0 additions & 58 deletions docs/source/nlp/distillation.rst

This file was deleted.

67 changes: 0 additions & 67 deletions docs/source/nlp/nemo_megatron/model_distillation/drop_layers.rst

This file was deleted.

2 changes: 1 addition & 1 deletion docs/source/nlp/punctuation_and_capitalization.rst
@@ -240,7 +240,7 @@ An example of a config file is
 - trainer config
 -
 - Parameters of
-`pytorch_lightning.Trainer <https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html#trainer-class-api>`_.
+`lightning.pytorch.Trainer <https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html#trainer-class-api>`_.
 * - **exp_manager**
 - exp manager config
 -
2 changes: 1 addition & 1 deletion docs/source/starthere/fundamentals.rst
@@ -116,7 +116,7 @@ Below is an example training script for our ``ExampleEncDecModel`` model. We hig
 :linenos:
 :emphasize-lines: 10, 11, 12
-import pytorch_lightning as pl
+import lightning.pytorch as pl
 from nemo.collections.path_to_model_class import ExampleEncDecModel
 from nemo.core.config import hydra_runner
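The change repeated across these files swaps the legacy `pytorch_lightning` import for `lightning.pytorch`. For scripts that must run in environments where only one of the two packages is installed, one possible pattern is a guarded import. This is a sketch under that assumption and not something the commit itself does; the helper name `resolve_lightning_module` is hypothetical.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Optional, Sequence


def resolve_lightning_module(
    candidates: Sequence[str] = ("lightning.pytorch", "pytorch_lightning"),
) -> Optional[ModuleType]:
    """Import and return the first available module from `candidates`, or None."""
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return importlib.import_module(name)
        except ModuleNotFoundError:
            # find_spec raises when a parent package (e.g. `lightning`) is absent.
            continue
    return None


if __name__ == "__main__":
    pl = resolve_lightning_module()
    print(pl.__name__ if pl is not None else "no Lightning package installed")
```

Listing `lightning.pytorch` first matches the direction of the migration, so the newer namespace wins when both packages are present.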
2 changes: 1 addition & 1 deletion examples/asr/asr_adapters/eval_asr_adapter.py
@@ -36,7 +36,7 @@
 """

-import pytorch_lightning as pl
+import lightning.pytorch as pl
 from omegaconf import OmegaConf, open_dict

 from nemo.collections.asr.models import ASRModel
2 changes: 1 addition & 1 deletion examples/asr/asr_adapters/train_asr_adapter.py
@@ -84,7 +84,7 @@
 import os
 from dataclasses import is_dataclass

-import pytorch_lightning as pl
+import lightning.pytorch as pl
 from omegaconf import DictConfig, OmegaConf, open_dict

 from nemo.collections.asr.models import ASRModel
@@ -49,7 +49,7 @@
 from dataclasses import dataclass
 from typing import Optional

-import pytorch_lightning as pl
+import lightning.pytorch as pl
 import torch
 from omegaconf import OmegaConf

@@ -42,7 +42,7 @@
 from dataclasses import dataclass
 from typing import Optional

-import pytorch_lightning as pl
+import lightning.pytorch as pl
 import torch
 from omegaconf import OmegaConf

@@ -64,7 +64,7 @@
 from dataclasses import dataclass
 from typing import Optional

-import pytorch_lightning as pl
+import lightning.pytorch as pl
 import torch
 from omegaconf import OmegaConf, open_dict

2 changes: 1 addition & 1 deletion examples/asr/asr_ctc/speech_to_text_ctc.py
@@ -68,7 +68,7 @@
 """

-import pytorch_lightning as pl
+import lightning.pytorch as pl
 from omegaconf import OmegaConf

 from nemo.collections.asr.models import EncDecCTCModel
2 changes: 1 addition & 1 deletion examples/asr/asr_ctc/speech_to_text_ctc_bpe.py
@@ -64,7 +64,7 @@
 """

-import pytorch_lightning as pl
+import lightning.pytorch as pl
 from omegaconf import OmegaConf

 from nemo.collections.asr.models.ctc_bpe_models import EncDecCTCModelBPE
@@ -58,7 +58,7 @@
 """

-import pytorch_lightning as pl
+import lightning.pytorch as pl
 from omegaconf import OmegaConf

 from nemo.collections.asr.models import EncDecHybridRNNTCTCBPEModel
@@ -69,7 +69,7 @@
 """

-import pytorch_lightning as pl
+import lightning.pytorch as pl
 from omegaconf import OmegaConf

 from nemo.collections.asr.models import EncDecHybridRNNTCTCModel
2 changes: 1 addition & 1 deletion examples/asr/asr_transducer/speech_to_text_rnnt.py
@@ -67,7 +67,7 @@
 """

-import pytorch_lightning as pl
+import lightning.pytorch as pl
 from omegaconf import OmegaConf

 from nemo.collections.asr.models import EncDecRNNTModel
2 changes: 1 addition & 1 deletion examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py
@@ -59,7 +59,7 @@
 """

-import pytorch_lightning as pl
+import lightning.pytorch as pl
 from omegaconf import OmegaConf

 from nemo.collections.asr.models import EncDecRNNTBPEModel
2 changes: 1 addition & 1 deletion examples/asr/asr_with_tts/speech_to_text_bpe_with_text.py
@@ -49,7 +49,7 @@
 """


-import pytorch_lightning as pl
+import lightning.pytorch as pl
 from omegaconf import OmegaConf

 from nemo.collections.asr.models.hybrid_asr_tts_models import ASRWithTTSModel
@@ -45,7 +45,7 @@
 """


-import pytorch_lightning as pl
+import lightning.pytorch as pl
 from omegaconf import OmegaConf

 from nemo.collections.asr.models.hybrid_asr_tts_models import ASRWithTTSModel
2 changes: 1 addition & 1 deletion examples/asr/conf/asr_adapters/asr_adaptation.yaml
@@ -182,7 +182,7 @@ trainer:
 val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
 accelerator: auto
 strategy:
-_target_: pytorch_lightning.strategies.DDPStrategy
+_target_: lightning.pytorch.strategies.DDPStrategy
 gradient_as_bucket_view: true
 accumulate_grad_batches: 1
 gradient_clip_val: null
2 changes: 1 addition & 1 deletion examples/asr/conf/asr_adapters/asr_adaptation_hp.yaml
@@ -182,7 +182,7 @@ trainer:
 val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
 accelerator: auto
 strategy:
-_target_: pytorch_lightning.strategies.DDPStrategy
+_target_: lightning.pytorch.strategies.DDPStrategy
 gradient_as_bucket_view: true
 accumulate_grad_batches: 1
 gradient_clip_val: null
@@ -81,7 +81,7 @@ trainer:
 val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
 accelerator: auto
 strategy:
-_target_: pytorch_lightning.strategies.DDPStrategy
+_target_: lightning.pytorch.strategies.DDPStrategy
 gradient_as_bucket_view: true
 accumulate_grad_batches: 1
 gradient_clip_val: 0.0
@@ -145,7 +145,7 @@ trainer:
 val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
 accelerator: auto
 strategy:
-_target_: pytorch_lightning.strategies.DDPStrategy
+_target_: lightning.pytorch.strategies.DDPStrategy
 gradient_as_bucket_view: true
 accumulate_grad_batches: 1
 gradient_clip_val: 0.0
@@ -172,7 +172,7 @@ trainer:
 val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
 accelerator: auto
 strategy:
-_target_: pytorch_lightning.strategies.DDPStrategy
+_target_: lightning.pytorch.strategies.DDPStrategy
 gradient_as_bucket_view: true
 accumulate_grad_batches: 1
 gradient_clip_val: 1.0
@@ -177,7 +177,7 @@ trainer:
 val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
 accelerator: auto
 strategy:
-_target_: pytorch_lightning.strategies.DDPStrategy
+_target_: lightning.pytorch.strategies.DDPStrategy
 gradient_as_bucket_view: true
 accumulate_grad_batches: 1
 gradient_clip_val: 1.0

0 comments on commit 05dac7e
