Commit
Add tests for ft launcher and straggler detection
Signed-off-by: Shriya Palsamudram <[email protected]>

Fix FaultTolerencePlugin
Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>

Add StragglerDetection callback to all NeMo2.0 recipes
Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>

Add missing and remove unused imports
Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>

Add ft launcher test
Signed-off-by: Shriya Palsamudram <[email protected]>

fix typo
Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>

fix more typos
Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>

add ft launcher using nemo-run for llama3 test
Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>

fix serialization errors
Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>

create separate ft test
Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>

change github actions test
Signed-off-by: Shriya Palsamudram <[email protected]>

draft crash simulation
Signed-off-by: Shriya Balaji Palsamudram <[email protected]>

Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>

Simulate a crash using step, disable checkpointing
Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>

Add a straggler detection test as well
Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>

Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>

Revert enabling straggler_detection by default in all recipes
Signed-off-by: Shriya Palsamudram <[email protected]>

Remove unused imports
Signed-off-by: Shriya Palsamudram <[email protected]>

Remove extra check in ConfigValidationPlugin
Signed-off-by: Shriya Palsamudram <[email protected]>

Address pylinter issues
Signed-off-by: Shriya Palsamudram <[email protected]>

Improve straggler detection testing and add doc string
Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>

fix paths
Signed-off-by: Shriya Palsamudram <[email protected]>

Add assert for crash
Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>

Append run logs to a file after a crash
Signed-off-by: Shriya Palsamudram <[email protected]>

Set FAULT_TOL_FINISHED_FLAG_FILE and FAULT_TOL_CFG_PATH
Signed-off-by: Shriya Palsamudram <[email protected]>

Add openai-gelu in gated activation (#11293)

Fixes per comments (#11280)
* Fixes per comments
Signed-off-by: Gomathy Venkata Krishnan <[email protected]>
* Update README
Signed-off-by: Gomathy Venkata Krishnan <[email protected]>
---------
Signed-off-by: Gomathy Venkata Krishnan <[email protected]>

Add T5TTS (#11193)
* added training and inference recipes for T5-TTS.
* fix some attention errors
* add copyright headers.
* added TODO and detail error log info.
* fixed missing a corner case.
* added classes to __all__
* fixed to return either self-attention scores or cross-attention scores in ParallelTransformerLayer_ class.
Signed-off-by: XuesongYang <[email protected]> --------- Signed-off-by: Jason <[email protected]> Signed-off-by: blisc <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: XuesongYang <[email protected]> Co-authored-by: blisc <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: XuesongYang <[email protected]> ci: Exclude CPU machines from scan (#11300) Signed-off-by: Oliver Koenig <[email protected]> Revert "fix(export): GPT models w/ bias=False convert properly (#11255)" (#11301) This reverts commit 2d4f4953881b9e2d118d3ffeba7e64625d827d11. remove redundant docs (#11302) Create phi3mini.py (#11281) * Create phi3mini.py Signed-off-by: mayani-nv <[email protected]> Apply isort and black reformatting Signed-off-by: mayani-nv <[email protected]> Update __init__.py Signed-off-by: mayani-nv <[email protected]> Update __init__.py Signed-off-by: mayani-nv <[email protected]> Apply isort and black reformatting Signed-off-by: mayani-nv <[email protected]> * Create phi3_mini_4k_instruct.py for adding to recipe Signed-off-by: mayani-nv <[email protected]> Apply isort and black reformatting Signed-off-by: mayani-nv <[email protected]> Update phi3_mini_4k_instruct.py and removed Performant recipe Signed-off-by: mayani-nv <[email protected]> Update phi3_mini_4k_instruct.py and removing performant condition Signed-off-by: mayani-nv <[email protected]> Update phi3_mini_4k_instruct.py with docstring changes Signed-off-by: mayani-nv <[email protected]> * Update __init__.py Signed-off-by: mayani-nv <[email protected]> * fixing pylint warnings * Apply isort and black reformatting Signed-off-by: mayani-nv <[email protected]> * correcting typos and adding working recipe files --------- Signed-off-by: mayani-nv <[email protected]> Signed-off-by: mayani-nv <[email protected]> Co-authored-by: Chen Cui <[email protected]> 
Co-authored-by: mayani-nv <[email protected]> Integrate lm-eval-harness for evaluations in NeMo (#10621) * Add evaluate method and other minor fixes Signed-off-by: Abhishree <[email protected]> * Add inference params to evaluate method Signed-off-by: Abhishree <[email protected]> * Add wait_for_rest_service fn to evaluate method Signed-off-by: Abhishree <[email protected]> * Apply isort and black reformatting Signed-off-by: athitten <[email protected]> * Add logprobs to be returned by Pytriton for trtllm models Signed-off-by: Abhishree <[email protected]> * Increase max_retries in wait_for_rest_service method Signed-off-by: Abhishree <[email protected]> * Apply isort and black reformatting Signed-off-by: athitten <[email protected]> * Add unset slurm vars and use env vars for Triton args Signed-off-by: Abhishree <[email protected]> * Add logic to get logProbs from logits Signed-off-by: Abhishree <[email protected]> * Refactor, clean and organize the code 1) Refactors the code and creates an evaluation folder where all util methods live 2) Add doctsrings, comments 3) Expose gather_context_logits, gather_generation_logits in trtllm and add output_generation_logits flag to return generation logits and remove output_logporbs as its not getting used anymore Signed-off-by: Abhishree <[email protected]> * Add copyright and initialize special_tokens_kwargs in eval_utils.py Signed-off-by: Abhishree <[email protected]> * Add the following chanes 1) Move get_trtllm_deployable and unset_environment_variables to deploy base.py 2) Rename eval_utils.py to base.py 3) REstore scripts/export/convert_nemo2_for_export.py Signed-off-by: Abhishree <[email protected]> * Fix a minor typo Signed-off-by: Abhishree <[email protected]> * Revert output_log_probs and all_probs arg in tensorrt_llm_run.py Signed-off-by: Abhishree <[email protected]> * Fix docstrings formatting Signed-off-by: Abhishree <[email protected]> * Pylint and other minor fixes Signed-off-by: Abhishree <[email protected]> 
* Fix pylint and typos Signed-off-by: Abhishree <[email protected]> * Apply isort and black reformatting Signed-off-by: athitten <[email protected]> * Avoid multiple calls for tokenizer_type Co-authored-by: Ananth Subramaniam <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> * Replace print statements with logging statements Signed-off-by: Abhishree <[email protected]> * Apply isort and black reformatting Signed-off-by: athitten <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: athitten <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: athitten <[email protected]> Co-authored-by: Ananth Subramaniam <[email protected]> ci: Fix release workflow (#11286) * ci: Fix release workflow Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * Update .github/workflows/release.yml Signed-off-by: oliver könig <[email protected]> --------- Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: oliver könig <[email protected]> Update import 'pytorch_lightning' -> 'lightning.pytorch' (#11252) * update import in collections/llm Signed-off-by: Maanu Grover <[email protected]> * update import in lightning Signed-off-by: Maanu Grover <[email protected]> * update fabric import in lightning Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update import in collections/asr Signed-off-by: Maanu Grover <[email protected]> * update import in collections/tts Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update requirements Signed-off-by: Maanu Grover <[email protected]> * unused imports 
Signed-off-by: Maanu Grover <[email protected]> * update import in tests Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update import in collections/common Signed-off-by: Maanu Grover <[email protected]> * update import in core Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update import in utils Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update import in collections/nlp Signed-off-by: Maanu Grover <[email protected]> * update fabric import in collections/nlp Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update fabric import in utils Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update import in nlp examples Signed-off-by: Maanu Grover <[email protected]> * update import in asr examples Signed-off-by: Maanu Grover <[email protected]> * update import in llm examples Signed-off-by: Maanu Grover <[email protected]> * update import in tts examples Signed-off-by: Maanu Grover <[email protected]> * update fabric import in nlp examples Signed-off-by: Maanu Grover <[email protected]> * update import in deploy Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update import in slu examples Signed-off-by: Maanu Grover <[email protected]> * update import in speaker_tasks examples Signed-off-by: Maanu Grover <[email protected]> * update import in collections/audio Signed-off-by: Maanu Grover <[email protected]> * update import in audio examples Signed-off-by: Maanu Grover <[email protected]> * update import in collections/llm Signed-off-by: Maanu Grover <[email protected]> * 
Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update import in collections/vlm Signed-off-by: Maanu Grover <[email protected]> * update import in collections/diffusion Signed-off-by: Maanu Grover <[email protected]> * update import in collections/vision Signed-off-by: Maanu Grover <[email protected]> * update import in collections/multimodal Signed-off-by: Maanu Grover <[email protected]> * update import in multimodal examples Signed-off-by: Maanu Grover <[email protected]> * update import in vision examples Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update import in scripts Signed-off-by: Maanu Grover <[email protected]> * Update baseline Signed-off-by: maanug-nv <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * revert bad change Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: maanug-nv <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: maanug-nv <[email protected]> Co-authored-by: artbataev <[email protected]> fix perf plugin CUDA_DEVICE_MAX_CONNECTIONS setting (#11299) * fix Signed-off-by: Jimmy Zhang <[email protected]> * Docstrings Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> --------- Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> PTQ via NeMo-Run CLI (#10984) * PTQ support in nemo CLI Signed-off-by: Jan Lasek <[email protected]> * Naming engine vs checkpoint Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> PTQ memory optimization (#11257) * Initial commit Signed-off-by: Piotr Kaminski <[email protected]> 
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]> * Add sample generate Signed-off-by: Piotr Kaminski <[email protected]> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]> * Nemotron quantization, reduce diff Signed-off-by: Piotr Kaminski <[email protected]> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]> * Reduce diff Signed-off-by: Piotr Kaminski <[email protected]> * code review suggestions Signed-off-by: Piotr Kaminski <[email protected]> * Bug fixes Signed-off-by: Piotr Kaminski <[email protected]> * remove not needed import Signed-off-by: Piotr Kaminski <[email protected]> * fix model type and allow ddp/optim setup Signed-off-by: Piotr Kaminski <[email protected]> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]> --------- Signed-off-by: Piotr Kaminski <[email protected]> Signed-off-by: Laplasjan107 <[email protected]> Signed-off-by: Piotr Kamiński <[email protected]> Co-authored-by: Piotr Kaminski <[email protected]> Co-authored-by: Laplasjan107 <[email protected]> Co-authored-by: Jan Lasek <[email protected]> update README.md (#11223) Signed-off-by: yaoyu-33 <[email protected]> Add `attention_bias` argument in transformer block and transformer layer modules, addressing change in MCore (#11289) * fix api Signed-off-by: yaoyu-33 <[email protected]> * fix ci Signed-off-by: yaoyu-33 <[email protected]> * add docstring Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * fix docstring2 Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * fix line too long Signed-off-by: yaoyu-33 <[email protected]> --------- Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> 
Co-authored-by: yaoyu-33 <[email protected]> Remove pytorch-lightning (#11306) * update import in docs Signed-off-by: Maanu Grover <[email protected]> * update import in tutorials Signed-off-by: Maanu Grover <[email protected]> * remove pl requirement Signed-off-by: Maanu Grover <[email protected]> * missed import updates Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Adding multimodal examples (#11279) * Adding multimodal examples * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> --------- Signed-off-by: shanmugamr1992 <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: shanmugamr1992 <[email protected]> Update T5 attention-mask shapes to be compatible with all attention-backend in new TE versions (#11059) * initial commits * updating cicd test * commit for FlashFused T5 from Mcore * testing CICD * update code for data/mock, update mcore commit for dockerfile * fix error * fix error * fix error in nemo/collections/llm/inference/base.py * update t5/data/mock.py * fix cicd erorr * remove unused libs * address Yu Yao's comments * Apply isort and black reformatting Signed-off-by: huvunvidia <[email protected]> --------- Signed-off-by: huvunvidia <[email protected]> Co-authored-by: Huy Vu2 <[email protected]> Co-authored-by: huvunvidia <[email protected]> Add HF untrusted code toggle (#11313) * add trust_remote_code toggle Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> P2p chunk size setting in nemo 2.0 (#11312) * NCCL P2P communication chunk size Signed-off-by: Sangkug Lym <[email protected]> * NCCL P2P communication chunk size Signed-off-by: Sangkug Lym <[email protected]> --------- 
Signed-off-by: Sangkug Lym <[email protected]> Nemo2 batcheval (#11158) * initial draft for eval api Signed-off-by: HuiyingLi <[email protected]> * add dp to generate Signed-off-by: HuiyingLi <[email protected]> * Apply isort and black reformatting Signed-off-by: HuiyingLi <[email protected]> * add top_k=1 to defaul inf param to get deterministic output Signed-off-by: HuiyingLi <[email protected]> * change name Signed-off-by: HuiyingLi <[email protected]> * add eval ds and write to file to llm.generate Signed-off-by: HuiyingLi <[email protected]> * support standalone input jsonl Signed-off-by: HuiyingLi <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: HuiyingLi <[email protected]> DoRA (#11104) * initial commit for DoRA Signed-off-by: Chen Cui <[email protected]> * clean up code Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * clean up Signed-off-by: Chen Cui <[email protected]> * fix TP Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * add dropout correction term Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * add copyright and doc strings Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * fix Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * docstrings Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * docstrings Signed-off-by: Chen Cui <[email protected]> * add ci test Signed-off-by: Chen Cui <[email protected]> * add ci test Signed-off-by: Chen Cui <[email protected]> * typo Signed-off-by: Chen Cui <[email protected]> * 
remove unused code Signed-off-by: Chen Cui <[email protected]> * remove commented out code Signed-off-by: Chen Cui <[email protected]> * fix Signed-off-by: Chen Cui <[email protected]> * bug Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: cuichenx <[email protected]> Co-authored-by: cuichenx <[email protected]> Profiling - support Chakra & Kineto trace dumping (#11115) * Support chakra trace dumping by cfg Signed-off-by: Lily Wang <[email protected]> remove the manual recording of process::init Signed-off-by: Lily Wang <[email protected]> 1. Remove unnecessary kineto config 2. Fix typo Signed-off-by: Lily Wang <[email protected]> Change warning to exception when nsys is enabled with chakra profiling Signed-off-by: Lily Wang <[email protected]> * Apply isort and black reformatting Signed-off-by: pablo-garay <[email protected]> * fix bug in identifying profiling start step Signed-off-by: Lily Wang <[email protected]> * Update baseline Signed-off-by: lilyw97 <[email protected]> * [1]remove unused import [2]switch to use isinstance instead of type() [3]move torch.profiling to function Signed-off-by: Lily Wang <[email protected]> * Apply isort and black reformatting Signed-off-by: lilyw97 <[email protected]> --------- Signed-off-by: Lily Wang <[email protected]> Signed-off-by: pablo-garay <[email protected]> Signed-off-by: lilyw97 <[email protected]> Signed-off-by: Maanu Grover <[email protected]> Co-authored-by: Lily Wang <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: pablo-garay <[email protected]> Co-authored-by: lilyw97 <[email protected]> Co-authored-by: Maanu Grover <[email protected]> NeMo 2.0 SFT PEFT notebooks (#10874) * nemo2-sft notebook initial draft Signed-off-by: HuiyingLi <[email protected]> * remove mixtral info Signed-off-by: HuiyingLi <[email protected]> * minor fixes Signed-off-by: HuiyingLi <[email protected]> * minor fixes Signed-off-by: HuiyingLi 
<[email protected]> * minor fixes Signed-off-by: HuiyingLi <[email protected]> * add import_ckpt script and minor changes Signed-off-by: HuiyingLi <[email protected]> * Random read for tarr files in lhotse dataloaders (#10536) * Random read for tarr files in lhotse dataloaders Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * Solve failled tests Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * Adding a testcase Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * Some changs in tests Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * removing import Signed-off-by: Nune <[email protected]> --------- Signed-off-by: Nune <[email protected]> Signed-off-by: nune-tadevosyan <[email protected]> Co-authored-by: nune-tadevosyan <[email protected]> * training code for hybrid-autoregressive inference model (#10841) * training code for hybrid-autoregressive inference model Signed-off-by: Hainan Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: hainan-xv <[email protected]> --------- Signed-off-by: Hainan Xu <[email protected]> Signed-off-by: hainan-xv <[email protected]> Co-authored-by: Hainan Xu <[email protected]> Co-authored-by: hainan-xv <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 772faca ! 
(#10871) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> * Use trainer.local_rank/global_rank (#10860) * fix global_rank calculation Signed-off-by: Alexandros Koumparoulis <[email protected]> * use trainer's global/local rank Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove stacking operation from batched functions (#10524) * remove stacking operations Signed-off-by: lilithgrigoryan <[email protected]> * fixes im base class Signed-off-by: lilithgrigoryan <[email protected]> * clean up Signed-off-by: lilithgrigoryan <[email protected]> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <[email protected]> * remove potentially uninitialized local variable Signed-off-by: lilithgrigoryan <[email protected]> * restore batch_intilize states funcname Signed-off-by: lilithgrigoryan <[email protected]> * fix typo Signed-off-by: lilithgrigoryan <[email protected]> * fix potentially uninitialized local variable Signed-off-by: lilithgrigoryan <[email protected]> * fix potentially uninitialized local variable in stateless transduser Signed-off-by: lilithgrigoryan <[email protected]> * fix test Signed-off-by: lilithgrigoryan <[email protected]> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <[email protected]> * fix docstring, rm comment Signed-off-by: lilithgrigoryan <[email protected]> * fix dosctrings Signed-off-by: lilithgrigoryan <[email protected]> --------- Signed-off-by: lilithgrigoryan <[email protected]> Signed-off-by: lilithgrigoryan <[email protected]> Co-authored-by: lilithgrigoryan <[email protected]> Co-authored-by: lilithgrigoryan <[email protected]> * [NeMo-UX] Add llm.generate to nemo.collections.llm (#10471) * Add llm.generate Signed-off-by: Hemil Desai <[email protected]> * Remove comment Signed-off-by: Hemil Desai <[email protected]> * Apply 
isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fix launching with python Signed-off-by: Hemil Desai <[email protected]> * PR feedback Signed-off-by: Hemil Desai <[email protected]> * PR feedback Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Add assert cp Signed-off-by: Hemil Desai <[email protected]> * Add example script Signed-off-by: Hemil Desai <[email protected]> * Fix Signed-off-by: Hemil Desai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: hemildesai <[email protected]> * Adding support for LightningDataModule inside Fabric-API (#10879) * Make FabricMegatronMixedPrecision match MegatronMixedPrecision Signed-off-by: Marc Romeijn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> * Supporting DataModule in fabric-API Signed-off-by: Marc Romeijn <[email protected]> * Adding support for LightningDataModule inside Fabric-API Signed-off-by: Marc Romeijn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> * Remove import in mock.py Signed-off-by: Marc Romeijn <[email protected]> --------- Signed-off-by: Marc Romeijn <[email protected]> Signed-off-by: marcromeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> * initial draft Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Save yaml config for model in nemo.lightning.io (#10765) * Save yaml config for model in nemo.lightning.io Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai 
<[email protected]> * Fix bug Signed-off-by: Hemil Desai <[email protected]> * Fix bug Signed-off-by: Hemil Desai <[email protected]> * fix bug Signed-off-by: Hemil Desai <[email protected]> * Add explicit yaml comparison Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * relax test Signed-off-by: Hemil Desai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: hemildesai <[email protected]> * Move collectiob.nlp imports inline for t5 (#10877) * Move collectiob.nlp imports inline for t5 Signed-off-by: Marc Romeyn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> --------- Signed-off-by: Marc Romeyn <[email protected]> Signed-off-by: marcromeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> * add world_size/pp_size runtime check (#10842) * add world_size/pp_size runtime check Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix msg precision Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix test_init_parallel_ranks ws=3 pp=3 Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix peft resume (#10887) Signed-off-by: Chen Cui <[email protected]> * Update engine build step for TRT-LLM 0.13.0 (#10880) * Setting use_fused_mlp for TRT-LLM >= 0.13.0 Signed-off-by: Jan Lasek <[email protected]> * Unused import removal Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * Akoumparouli/nemo ux moe loss logging (#10128) * Move across pipeline loss reduction to a separate function Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add support for MoE loss logging Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email 
protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove unused function Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * enable vboost and set LM SM margin (#10853) * enable vboost Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * env vars Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * add perf plugin Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> * revert default executor Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> * fix typo Signed-off-by: Jimmy Zhang <[email protected]> * fix more typo Signed-off-by: Jimmy Zhang <[email protected]> * ln margin knob Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> * specify lm margin Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> --------- Signed-off-by: Malay Nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: JimmyZhang12 <[email protected]> Co-authored-by: malay-nagda <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> * use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_k… (#10608) * use _get_extra_te_kwargs_meta in fabric (call mcore's 
_get_extra_te_kwargs & overwrite device) Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * Use torch sdpa implementation in ASR mha (#9590) * use pytorch sdpa Signed-off-by: WoodieDudy <[email protected]> * sdpa work Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: titu1994 <[email protected]> * sdpa flag to false & sdpa_backend arg Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * change arg name Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * fix config args Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * add condition on version Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * update condition on version Signed-off-by: WoodieDudy <[email protected]> * remove condition on torch version Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * move code to init Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * refactor Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * refactor Signed-off-by: WoodieDudy <[email protected]> --------- Signed-off-by: WoodieDudy <[email protected]> Signed-off-by: titu1994 <[email protected]> Signed-off-by: WoodieDudy <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> 
Co-authored-by: titu1994 <[email protected]> Co-authored-by: WoodieDudy <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * Add registry to register all needed classes with artifacts in nemo.lightning.io (#10861) * Add registry to register all needed classes with artifacts in nemo.lightning.io Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fixes Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fix Signed-off-by: Hemil Desai <[email protected]> * comments Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Remove cyclic import Signed-off-by: Hemil Desai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: artbataev <[email protected]> * call __post_init__ after altering config values (#10885) * call __post_init__ after altering config values Signed-off-by: Alexandros Koumparoulis <[email protected]> * test fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * turn off SP Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> * Nemo 2.0 ckpt support in TRT-LLM export (#10891) * fix minor import bug Signed-off-by: Onur Yilmaz <[email protected]> * Add registry to register all needed classes with artifacts in nemo.lightning.io Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fixes Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai 
<[email protected]> * Fix Signed-off-by: Hemil Desai <[email protected]> * nemo 2.0 support in export to trt-llm Signed-off-by: Onur Yilmaz <[email protected]> * get mixing from main Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * fix style Signed-off-by: Onur Yilmaz <[email protected]> --------- Signed-off-by: Onur Yilmaz <[email protected]> Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: oyilmaz-nvidia <[email protected]> Co-authored-by: Hemil Desai <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: oyilmaz-nvidia <[email protected]> * [Docs] Fix doc warnings, focus on feature and multimodal sections (#10171) * various simple docs source fixes Signed-off-by: Elena Rastorgueva <[email protected]> * fix docstrings and typing with forward reference Signed-off-by: Elena Rastorgueva <[email protected]> * Apply isort and black reformatting Signed-off-by: erastorgueva-nv <[email protected]> * fix typing forward reference for PromptedAudioToTextLhotseDataset Signed-off-by: Elena Rastorgueva <[email protected]> * fix feature warnings Signed-off-by: yaoyu-33 <[email protected]> * Try fix some model part errors Signed-off-by: yaoyu-33 <[email protected]> * try add requirements Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try add requirements Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix indent in docstring Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * update Signed-off-by: yaoyu-33 <[email protected]> * handle duplicate issue Signed-off-by: yaoyu-33 <[email protected]> * handle duplicate issue Signed-off-by: 
yaoyu-33 <[email protected]> * fix imagen cite * fix ratio issues Signed-off-by: yaoyu-33 <[email protected]> * fix Dreambooth Signed-off-by: yaoyu-33 <[email protected]> * Fix activation recomputation Signed-off-by: yaoyu-33 <[email protected]> * fix sequence packing Signed-off-by: yaoyu-33 <[email protected]> * fix asr_language_modeling_and_customization Signed-off-by: yaoyu-33 <[email protected]> * fixes wip Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: erastorgueva-nv <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Yu Yao <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> Co-authored-by: erastorgueva-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Ao Tang <[email protected]> Co-authored-by: Huiying Li <[email protected]> * calculate step time batch end-batch end (#10202) * log step time at end Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * use nemo logging Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * cleanup Signed-off-by: Malay Nagda <[email protected]> * check remove Signed-off-by: Malay Nagda <[email protected]> * delta timing callback Signed-off-by: Malay Nagda <[email protected]> * comment and name change Signed-off-by: Malay Nagda <[email protected]> --------- Signed-off-by: Malay Nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Co-authored-by: malay-nagda <[email protected]> * late import prettytable (#10912) Signed-off-by: Maanu Grover <[email protected]> * [🤠]: 
Howdy folks, let's bump `Dockerfile.ci` to 0d89fc4 ! (#10919) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Warning for missing FP8 checkpoint support for vLLM deployment (#10906) Signed-off-by: Jan Lasek <[email protected]> * Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10821) * Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10787) * Add lhotse fixes for rnnt model training and WER hanging issue with fuse batching Signed-off-by: Nithin Rao Koluguri <nithinraok> * Apply isort and black reformatting Signed-off-by: nithinraok <[email protected]> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: nithinraok <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: nithinraok <[email protected]> * Apply isort and black reformatting Signed-off-by: nithinraok <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: nithinraok <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: nithinraok <[email protected]> Co-authored-by: artbataev <[email protected]> * Fix ASR tests (#10794) * Make tests required Signed-off-by: Vladimir Bataev <[email protected]> * Debug torch.load issue Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Run only necessary tests Signed-off-by: Vladimir Bataev <[email protected]> * Try fix loading Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Avoid caching fixture Signed-off-by: Vladimir Bataev <[email protected]> * Try restore model several times Signed-off-by: Vladimir Bataev <[email protected]> * Try customize temporary directory Signed-off-by: Vladimir Bataev <[email protected]> * Apply 
isort and black reformatting Signed-off-by: artbataev <[email protected]> * Reorder tests Signed-off-by: Vladimir Bataev <[email protected]> * Disable one test Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Avoid xxlarge model Signed-off-by: Vladimir Bataev <[email protected]> * Disable test Signed-off-by: Vladimir Bataev <[email protected]> * Revert changes Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Magic fix Signed-off-by: Vladimir Bataev <[email protected]> * Revert unnecessary changes Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Disable all jobs except L0 Signed-off-by: Vladimir Bataev <[email protected]> * RNNT alignments - merge with unit tests Signed-off-by: Vladimir Bataev <[email protected]> * Fix CUDA graph frame-looping decoder to handle non-CUDA inputs Signed-off-by: Vladimir Bataev <[email protected]> * Fix config Signed-off-by: Vladimir Bataev <[email protected]> * Log test results Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Use less audio files for tests Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: artbataev <[email protected]> * Integrating mcore export (#10238) * Integrating mcore export * Integrating mcore export * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Move trt imports in nemo.collections.llm inside respective functions (#10234) Signed-off-by: Hemil Desai <[email protected]> * Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest (#10198) * Add tests for LazyNeMoIterator and fix case with 
manifest_only=True and offsets in manifest Signed-off-by: Piotr Żelasko <[email protected]> * Address code review Signed-off-by: Piotr Żelasko <[email protected]> * fix tests Signed-off-by: Piotr Żelasko <[email protected]> * fix tests Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> * [NeMo-UX] Fix a serialization bug that prevents users from moving checkpoints (#9939) * perfor serialization using relative paths to allow users to move checkpoints after they're saved Signed-off-by: ashors1 <[email protected]> * Apply isort and black reformatting Signed-off-by: ashors1 <[email protected]> * remove unused import Signed-off-by: ashors1 <[email protected]> * fix artifact load Signed-off-by: ashors1 <[email protected]> * fix path artifact Signed-off-by: ashors1 <[email protected]> * remove unused import Signed-off-by: ashors1 <[email protected]> --------- Signed-off-by: ashors1 <[email protected]> Signed-off-by: ashors1 <[email protected]> Co-authored-by: ashors1 <[email protected]> * Add MemoryProfileCallback (#10166) * Add MemoryProfileCallback Signed-off-by: Shriya Palsamudram <[email protected]> * Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> * Remove reference cycles, save snapshot on specific ranks Signed-off-by: Shriya Palsamudram <[email protected]> * Remove unnecessary imports Signed-off-by: Shriya Palsamudram <[email protected]> * Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> * Update docstring Signed-off-by: Shriya Palsamudram <[email protected]> --------- Signed-off-by: Shriya Palsamudram <[email protected]> Signed-off-by: ShriyaPalsamudram <[email protected]> Signed-off-by: Shriya Rishab <[email protected]> Co-authored-by: ShriyaPalsamudram <[email protected]> * Lower bound transformers to support nemotron (#10240) Signed-off-by: Dong Hyuk Chang <[email protected]> Co-authored-by: Dong Hyuk Chang <[email protected]> 
* [Audio] SSL Pretraining framework for flow-matching model for audio processing (#10052) Flow matching generative model with SSL pretraining framework Signed-off-by: Pin-Jui Ku <[email protected]> Co-authored-by: Kuray107 <[email protected]> * Revert torchrun fix for model import (#10251) Signed-off-by: Alexandros Koumparoulis <[email protected]> * [NeMo-UX[ Move nemotron imports inline (#10255) * Move nemotron transformers + tokenizer imports inline to reduce number of required deps Signed-off-by: Marc Romeyn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> --------- Signed-off-by: Marc Romeyn <[email protected]> Signed-off-by: marcromeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> * Wrap CPU model init with megatron_lazy_init_context (#10219) * Wrap CPU model init with megatron_lazy_init_context Signed-off-by: Alexandros Koumparoulis <[email protected]> * Cleanup checkpoint-dir if saving fails Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * Bump `Dockerfile.ci` (2024-08-22) (#10227) * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 124bcff ! Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix bert flags Signed-off-by: Oliver Koenig <[email protected]> --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Oliver Koenig <[email protected]> Co-authored-by: pablo-garay <[email protected]> * salm export trtllm (#10245) Signed-off-by: slyne deng <[email protected]> Co-authored-by: slyne deng <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to ef85bc9 ! 
(#10250) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 01ca03f ! (#10266) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: oliver könig <[email protected]> Co-authored-by: pablo-garay <[email protected]> * Load model in the target export precision by default in PTQ (#10267) * Load model in the target export precision by default Signed-off-by: Jan Lasek <[email protected]> * Enable megatron_amp_O2=true to actually use half-precision Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Jan Lasek <[email protected]> * Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins (#10223) * Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Remove duplicate Signed-off-by: Hemil Desai <[email protected]> * Add entity to wandb logger Signed-off-by: Hemil Desai <[email protected]> * Add documentation Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Add warning Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * PR feedback Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Add comments Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: hemildesai <[email protected]> * [NeMo-UX] Handle absolute logger 
directories in nemo_logger (#10259) * handle absolute and relative logger directories Signed-off-by: Anna Shors <[email protected]> * merge lines Signed-off-by: ashors1 <[email protected]> --------- Signed-off-by: Anna Shors <[email protected]> Signed-off-by: ashors1 <[email protected]> * Add sdxl notebook (#10139) * Add sdxl notebook Signed-off-by: mingyuanm <[email protected]> * Rename Signed-off-by: mingyuanm <[email protected]> * final Update SDXL notebook Signed-off-by: mingyuanm <[email protected]> --------- Signed-off-by: mingyuanm <[email protected]> * Updating some coments * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Updating some coments * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Updating some coments * Small change * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * ADD support for layernorm1p * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Update Dockerfile.ci Signed-off-by: Shanmugam Ramasamy <[email protected]> * Update Dockerfile.ci Signed-off-by: Shanmugam Ramasamy <[email protected]> * Update Dockerfile.ci Signed-off-by: Shanmugam Ramasamy <[email protected]> --------- Signed-off-by: shanmugamr1992 <[email protected]> Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ashors1 <[email protected]> Signed-off-by: ashors1 <[email protected]> Signed-off-by: Shriya Palsamudram <[email protected]> Signed-off-by: ShriyaPalsamudram <[email protected]> Signed-off-by: Shriya Rishab <[email protected]> Signed-off-by: Dong Hyuk Chang <[email protected]> Signed-off-by: Pin-Jui Ku <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> Signed-off-by: marcromeyn <[email protected]> 
Signed-off-by: akoumpa <[email protected]> Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: slyne deng <[email protected]> Signed-off-by: oliver könig <[email protected]> Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: Anna Shors <[email protected]> Signed-off-by: mingyuanm <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: shanmugamr1992 <[email protected]> Co-authored-by: Hemil Desai <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Anna Shors <[email protected]> Co-authored-by: ashors1 <[email protected]> Co-authored-by: Shriya Rishab <[email protected]> Co-authored-by: ShriyaPalsamudram <[email protected]> Co-authored-by: Dong Hyuk Chang <[email protected]> Co-authored-by: Dong Hyuk Chang <[email protected]> Co-authored-by: Kuray107 <[email protected]> Co-authored-by: Kuray107 <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: oliver könig <[email protected]> Co-authored-by: pablo-garay <[email protected]> Co-authored-by: Slyne Deng <[email protected]> Co-authored-by: slyne deng <[email protected]> Co-authored-by: Jan Lasek <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: Ming <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> * Fix artifact saving (#10914) Signed-off-by: Hemil Desai <[email protected]> * Lora improvement (#10918) * pull out freeze model Signed-off-by: Chen Cui <[email protected]> * add wildcard match to lora target modules Signed-off-by: Chen Cui <[email protected]> 
--------- Signed-off-by: Chen Cui <[email protected]> * Huvu/t5 nemo2.0 peft (#10916) * adding peft test and cicd * add setting mcore model to train in peft.py * adding test for T5 lora * fix follow Chen's fix * restore cicd-main.yml --------- Co-authored-by: Huy Vu2 <[email protected]> * Add tie_word_embeddings=True (#10710) Signed-off-by: Yoshi Suhara <[email protected]> * Use a context-manager when opening files (#10895) * Use a context-manager when opening files Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: artbataev <[email protected]> * long context performance numbers in doc (#10784) * long context perf Signed-off-by: Youngeun Kwon <[email protected]> * update the long context perf Signed-off-by: Youngeun Kwon <[email protected]> * Akoumparouli/mcore microbatch calculator fix (#10780) * move tests/lightning/{,_}io Signed-off-by: Alexandros Koumparoulis <[email protected]> * add microbatch calculator context manager Signed-off-by: Alexandros Koumparoulis <[email protected]> * use microbatch calculator context manager Signed-off-by: Alexandros Koumparoulis <[email protected]> * add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove unused var Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email 
protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * remove 8x3b recipes (#10764) * remove 8x3b recipes Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove 8x3b from test_nemo_run Signed-off-by: Alexandros Koumparoulis <[email protected]> * rm from __init__ Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * change the figure file name Signed-off-by: Youngeun Kwon <[email protected]> * Accommodating the reviewer's comment Signed-off-by: Youngeun Kwon <[email protected]> * update the y-axis title Signed-off-by: Youngeun Kwon <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 3f90b98 ! (#10789) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * Add ModelOpt transformer model pruning example for Llama models, default to llama3.1-8b-base (#10294) * Add ModelOpt transformer model pruning example for Llama3 model Signed-off-by: Shengliang Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: shengliangxu <[email protected]> Signed-off-by: Shengliang Xu <[email protected]> * examples code is at wrong dir, move them Signed-off-by: Shengliang Xu <[email protected]> * changes as suggested in comment remove some logging and unused config code, update example model to llama3.1 Signed-off-by: Shengliang Xu <[email protected]> * Add pruning of hidden_size into example Signed-off-by: Shengliang Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: shengliangxu <[email protected]> Signed-off-by: Shengliang Xu <[email protected]> * Update examples/nlp/language_modeling/conf/megatron_gpt_prune.yaml Signed-off-by: Keval Morabia <[email protected]> * 
Add pruning test to cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <2891698…