forked from pytorch/FBGEMM
Commit
Refactor stacked version of FP8 Grouped Gemm for reduced overhead (pytorch#3699)

Summary:
Pull Request resolved: pytorch#3699

X-link: facebookresearch/FBGEMM#780

Currently, the stacked version of FP8 grouped gemm accepts lists of tensor inputs and produces a single tensor output. This reduces quite a bit of overhead when CUDA graphs are used, but it still requires splitting input tensors in prefill, which can be costly. This diff updates the input types of stacked grouped gemm to support single tensors.

Notably, since M varies across groups and we do no padding, this change requires a new input tensor called `M_offsets` that indicates the row at which each group begins within the first input. We create `M_offsets` by taking the cumulative sum of M for each group, which we may be able to optimize further.

This diff also includes a long overdue refactor of grouped gemm setup for NVIDIA so that we launch only a single kernel rather than one per group. This should reduce overhead by quite a bit in some cases.

Differential Revision: D69544396
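To make the `M_offsets` idea concrete, here is a minimal pure-Python sketch of deriving group boundaries from per-group row counts with a cumulative sum. It is not the FBGEMM implementation: the helper names (`m_offsets_from_sizes`, `group_row_ranges`) and the sample sizes are hypothetical, and whether the real kernel expects the inclusive cumulative sum (group end rows) or a zero-prefixed variant (group start rows) is an assumption not settled by the commit message.

```python
from itertools import accumulate


def m_offsets_from_sizes(m_sizes):
    """Inclusive cumulative sum of per-group row counts (hypothetical helper).

    Entry g is the row index just past the end of group g in the stacked input.
    """
    return list(accumulate(m_sizes))


def group_row_ranges(m_sizes):
    """Return (start, end) row ranges for each group in the stacked input.

    Assumption: groups are stacked back to back with no padding, so the start
    of group g is the end of group g - 1, and group 0 starts at row 0.
    """
    ends = m_offsets_from_sizes(m_sizes)
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


# Hypothetical per-group M values; a group may be empty (M == 0).
m_sizes = [3, 0, 5, 2]
offsets = m_offsets_from_sizes(m_sizes)
ranges = group_row_ranges(m_sizes)

print(offsets)  # [3, 3, 8, 10]
print(ranges)   # [(0, 3), (3, 3), (3, 8), (8, 10)]
```

With boundaries like these, a single stacked tensor of `sum(m_sizes)` rows can be addressed per group by slicing rows `start:end`, which avoids splitting the input into a list of tensors first.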
1 parent 610ea2e, commit f5c437a
Showing 73 changed files with 459 additions and 348 deletions.