Export & deploy updates (part I) #10941

janekl · 2024-10-18T10:00:43Z

What does this PR do ?

Cleanup-only changes related to #10904, split out here so that the main functionality in that PR is easier to review.

Collection: NLP

Changelog

Add specific line by line info of high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

Related to # (issue)

Signed-off-by: Jan Lasek <[email protected]>

Signed-off-by: janekl <[email protected]>

janekl · 2024-10-29T09:20:16Z

tests/export/nemo_export.py

+    output_deployed = output_deployed["sentences"]
+    # MegatronLLMDeployable will return the prompt + generated output, so cut off the prompt
+    for i, output in enumerate(output_deployed):
+        output = output[len(prompts[i]) :]


This is indeed an issue: the prompt is not actually cut off here. Fixed in 093aa94
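The loop in the hunk above is a no-op because slicing builds a new string and binds it to the loop variable `output`; the element stored in `output_deployed` is never reassigned. A minimal sketch of that behavior, using made-up prompts and a plain list in place of the real deployment response (both are assumptions for illustration):

```python
# Hypothetical prompts and "prompt + generated text" outputs.
prompts = ["Hello, ", "The capital of France is "]
output_deployed = ["Hello, world!", "The capital of France is Paris."]

before = list(output_deployed)

# Same pattern as the reviewed loop: only the local name is rebound.
for i, output in enumerate(output_deployed):
    output = output[len(prompts[i]):]

# The local variable holds the last stripped string...
last_local = output
# ...but the container still holds the prompts.
unchanged = output_deployed == before
```

After the loop, `last_local` is `"Paris."` while `output_deployed` still contains the full prompt-prefixed strings, which is why the result has to be written back through an index.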


janekl · 2024-10-28T16:29:44Z

scripts/deploy/nlp/deploy_vllm_triton.py

-        )
-        return exporter
-    except Exception as error:
-        raise RuntimeError("An error has occurred during the model export. Error message: " + str(error))


Suggestion: just propagate the original exception (i.e., remove the try/except block).
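One way to see the trade-off behind this suggestion: re-raising everything as a `RuntimeError` built from `str(error)` means callers can no longer catch the specific exception type, whereas letting the exception propagate keeps it. The sketch below uses a hypothetical `export_model` stand-in and hypothetical wrapper names; it is not the real exporter API.

```python
def export_model(path):
    # Hypothetical stand-in for the export call; it always fails here.
    raise FileNotFoundError(f"checkpoint not found: {path}")


def get_deployable_wrapped(path):
    # Pattern removed in this PR: every failure becomes a RuntimeError.
    try:
        return export_model(path)
    except Exception as error:
        raise RuntimeError(
            "An error has occurred during the model export. Error message: " + str(error)
        )


def get_deployable_direct(path):
    # Suggested pattern: no try/except, the original exception propagates.
    return export_model(path)


def raised_type(fn, path):
    # Helper for the demo: report which exception type the caller sees.
    try:
        fn(path)
    except Exception as error:
        return type(error).__name__
    return None


wrapped_type = raised_type(get_deployable_wrapped, "/tmp/model.nemo")
direct_type = raised_type(get_deployable_direct, "/tmp/model.nemo")
```

Here `wrapped_type` is `"RuntimeError"` and `direct_type` is `"FileNotFoundError"`, so only the direct version lets a caller handle a missing checkpoint specifically.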

janekl · 2024-10-28T16:30:20Z

tests/export/nemo_export.py

@@ -811,7 +808,7 @@ def str_to_bool(name: str, s: str, optional: bool = False) -> Optional[bool]:
    args.test_cpp_runtime = str_to_bool("test_cpp_runtime", args.test_cpp_runtime)
    args.test_deployment = str_to_bool("test_deployment", args.test_deployment)
    args.functional_test = str_to_bool("functional_test", args.functional_test)
-    args.save_trt_engine = str_to_bool("save_trt_engin", args.save_trt_engine)
+    args.save_engine = str_to_bool("save_engine", args.save_engine)


Renamed as this flag is for both TRT-LLM and vLLM engines.
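The body of `str_to_bool` is not shown in this diff. As a rough sketch only, a helper matching the signature in the hunk header might look like the following; the accepted spellings and the error message are assumptions. It illustrates why the first argument should match the flag name: in this sketch the name is used only in the error message shown to the user.

```python
from typing import Optional


def str_to_bool(name: str, s: str, optional: bool = False) -> Optional[bool]:
    # Hypothetical body: the real implementation may accept other spellings.
    value = s.strip().lower()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    if optional and value == "none":
        return None
    # `name` appears only here, so a misspelled name yields a misleading message.
    raise ValueError(f"Invalid boolean value for argument --{name}: '{s}'")
```

For example, `str_to_bool("save_engine", "True")` returns `True`, and an unrecognized value raises a `ValueError` that mentions `--save_engine`.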

janekl · 2024-10-28T16:30:43Z

tests/export/nemo_export.py

@@ -497,9 +497,6 @@ def run_existing_checkpoints(
    else:
        use_embedding_sharing = False

-    if trt_llm_export_kwargs is None:
-        trt_llm_export_kwargs = {}


Not needed here

Signed-off-by: Jan Lasek <[email protected]>

Signed-off-by: janekl <[email protected]>

janekl · 2024-10-29T11:28:41Z

tests/export/nemo_export.py

@@ -591,7 +588,7 @@ def run_in_framework_inference(
        output_deployed = output_deployed["sentences"]
        # MegatronLLMDeployable will return the prompt + generated output, so cut off the prompt
        for i, output in enumerate(output_deployed):
-            output = output[len(prompts[i]) :]
+            output_deployed[i, :] = output[0][len(prompts[i]) :]


Explained here #10941 (comment)
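The `output_deployed[i, :]` indexing in the fix suggests a two-dimensional array with one generated string per row; the array type is not shown in this diff, so that is an assumption. A sketch of the same write-back, with nested lists standing in for the real array so that it runs without extra dependencies:

```python
# Hypothetical prompts; each row of output_deployed holds one
# "prompt + generated text" string, mimicking an (N, 1) array.
prompts = ["Hello, ", "The capital of France is "]
output_deployed = [["Hello, world!"], ["The capital of France is Paris."]]

for i, output in enumerate(output_deployed):
    # `output` is the whole row, so the string is output[0].
    # Assigning into the container (not the loop variable) makes the change stick.
    output_deployed[i][:] = [output[0][len(prompts[i]):]]
```

After the loop each row contains only the generated continuation (`"world!"` and `"Paris."` in this made-up data).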

github-actions · 2024-10-29T16:37:07Z

[🤖]: Hi @janekl 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully

So it might be time to merge this PR or get some approvals

I'm just a bot so I'll leave it to you what to do next.

//cc @pablo-garay @ko3n1g

* Update vLLMExporter docstring
  Signed-off-by: Jan Lasek <[email protected]>
* No need to create empty kwargs here
  Signed-off-by: Jan Lasek <[email protected]>
* Use debug from command line
  Signed-off-by: Jan Lasek <[email protected]>
* Param save_engine for both both vLLM and TRT-LLM
  Signed-off-by: Jan Lasek <[email protected]>
* Unused backend param in run_trt_llm_inference
  Signed-off-by: Jan Lasek <[email protected]>
* Reindent files for non-existent checkpoint check
  Signed-off-by: Jan Lasek <[email protected]>
* Docs for lora_checkpoints
  Signed-off-by: Jan Lasek <[email protected]>
* Improve config readability
  Signed-off-by: Jan Lasek <[email protected]>
* Raise error directly in get_vllm_deployable
  Signed-off-by: Jan Lasek <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: janekl <[email protected]>
* Revert "Reindent files for non-existent checkpoint check"
  This reverts commit 8499d50.
  Signed-off-by: Jan Lasek <[email protected]>
* Cut off prompt for real
  Signed-off-by: Jan Lasek <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: janekl <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: janekl <[email protected]>
Co-authored-by: janekl <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

janekl added 9 commits October 18, 2024 11:52

Update vLLMExporter docstring

e1edab1

Signed-off-by: Jan Lasek <[email protected]>

No need to create empty kwargs here

f581024

Signed-off-by: Jan Lasek <[email protected]>

Use debug from command line

a4fc46e

Signed-off-by: Jan Lasek <[email protected]>

Param save_engine for both both vLLM and TRT-LLM

3ef9c3a

Signed-off-by: Jan Lasek <[email protected]>

Unused backend param in run_trt_llm_inference

82e4f60

Signed-off-by: Jan Lasek <[email protected]>

Reindent files for non-existent checkpoint check

8499d50

Signed-off-by: Jan Lasek <[email protected]>

Docs for lora_checkpoints

dcfe806

Signed-off-by: Jan Lasek <[email protected]>

Improve config readability

7552f65

Signed-off-by: Jan Lasek <[email protected]>

Raise error directly in get_vllm_deployable

f56fadf

Signed-off-by: Jan Lasek <[email protected]>

janekl assigned oyilmaz-nvidia Oct 18, 2024

Apply isort and black reformatting

a6bedab

Signed-off-by: janekl <[email protected]>

github-advanced-security bot found potential problems Oct 18, 2024

janekl mentioned this pull request Oct 21, 2024

Basic online dynamic FP8 quantization with vLLM #10904

Merged

8 tasks

janekl requested a review from oyilmaz-nvidia October 21, 2024 13:13

janekl unassigned oyilmaz-nvidia Oct 21, 2024

janekl added 2 commits October 28, 2024 17:09

Revert "Reindent files for non-existent checkpoint check"

0170673

This reverts commit 8499d50. Signed-off-by: Jan Lasek <[email protected]>

Merge branch 'main' into jlasek/vllm_extra

bc522cb

Signed-off-by: Jan Lasek <[email protected]>

janekl commented Oct 28, 2024

janekl and others added 2 commits October 29, 2024 10:15

Cut off prompt for real

093aa94

Signed-off-by: Jan Lasek <[email protected]>

Apply isort and black reformatting

2aa441d

Signed-off-by: janekl <[email protected]>

janekl added the Run CICD label Oct 29, 2024

janekl commented Oct 29, 2024

oyilmaz-nvidia approved these changes Oct 29, 2024

oyilmaz-nvidia merged commit a8832b8 into main Oct 29, 2024
156 of 158 checks passed

oyilmaz-nvidia deleted the jlasek/vllm_extra branch October 29, 2024 17:01
