
Fix ReorderShardedAxis and MakeReshardingContiguous for DID loop split. #3900

@wujingyue

Description

They currently assume DID appears only in the logical domain and will break when we switch to DID loop split.

The solution I have in mind, which I'm open to discussing, is:

  1. Change ReorderShardedAxis to move DID to the front of the loop domain. We may even want to do this universally, because many schedulers already rely on that assumption, but that can come separately.
  2. Change MakeReshardingContiguous to set the allocation domain around each resharding expression (which has been decomposed and can therefore be lowered to communication) to match the loop domain and be contiguous.
  3. Change existing Communication IRs to respect the allocation order. I don't know exactly what needs to be fixed, but I think
    output_tensors[0].push_back(output_tensor.slice(0, j, j + 1));
    should be fixed so it doesn't assume the scattered axis is always 0 (see the sketch after this list).
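
For illustration, here is a minimal sketch of item 3. It is not the actual nvFuser lowering code; `scattered_axis` and `splitForScatter` are hypothetical names standing in for whatever the communication lowering would derive from the output TensorView's allocation domain.

    #include <ATen/ATen.h>
    #include <vector>

    // Sketch: build per-device output slices for a scatter-like communication,
    // slicing along the scattered axis instead of hardcoding dimension 0.
    std::vector<std::vector<at::Tensor>> splitForScatter(
        const at::Tensor& output_tensor,
        int64_t scattered_axis, // hypothetical: derived from the allocation domain
        int64_t num_devices) {
      std::vector<std::vector<at::Tensor>> output_tensors(1);
      for (int64_t j = 0; j < num_devices; ++j) {
        // Each device j receives the j-th slice along the scattered axis.
        output_tensors[0].push_back(
            output_tensor.slice(scattered_axis, j, j + 1));
      }
      return output_tensors;
    }
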

That said, even without DID loop split, I think this is a good cleanup for ReorderShardedAxis. For example, it would no longer need to insert Set.Permute, making the fusion IR easier to follow.

For #2563
