
With the latest transformers library that fixes the severe multi-GPU gradient accumulation bug (and the matching trl library), the DPO training loss becomes several times higher than before #5747

Closed
@JianbangZ

Description

Reminder

  • I have read the README and searched the existing issues.

System Info

8x H100

Reproduction

After updating to the latest transformers & trl libraries on the master branch, the DPO training loss changed from the previous 1.0 -> 0.3 to 9 -> 3.
See huggingface/transformers#34191 for details.
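For context, a minimal sketch of one plausible explanation for the roughly 10x jump, assuming gradient_accumulation_steps is around 10 in this run (the value is not stated in the report): if the fixed code sums the per-microbatch losses across accumulation steps to produce a correct gradient, but logs that sum without dividing by the number of steps, the reported loss appears about gradient_accumulation_steps times larger even though optimization behaves the same. The snippet below is illustrative only and does not reproduce the actual transformers/trl implementation.

```python
import torch

# Illustrative only: why a gradient accumulation fix can inflate the
# *reported* loss by roughly the number of accumulation steps.
# (Hypothetical aggregation logic, not the real transformers/trl code.)

torch.manual_seed(0)
accum_steps = 10  # assumed value; not stated in the original report

# Fake per-microbatch DPO losses, each already a mean over its microbatch.
micro_losses = [torch.rand(()) for _ in range(accum_steps)]

# Old-style reporting: mean of per-microbatch means -> stays around O(1).
old_reported = (sum(micro_losses) / accum_steps).item()

# New-style reporting: per-step losses are summed (what the accumulated
# gradient needs); if the logged value skips the final division by
# accum_steps, it shows up ~accum_steps times larger.
new_reported = sum(micro_losses).item()

print(f"old-style reported loss: {old_reported:.3f}")
print(f"new-style reported loss: {new_reported:.3f} "
      f"(~{accum_steps}x larger; gradient scale is unchanged)")
```

Consistent with this reading, both endpoints of the reported curve scale together (1.0 -> 9 and 0.3 -> 3), which points at a logging/normalization change rather than an actual training divergence.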

Expected behavior

No response

Others

No response

Labels

solved: This problem has been already solved
