KTO training error #3847

Closed
@wxp1127

Description

Reminder

  • I have read the README and searched the existing issues.

Reproduction

Training script:
deepspeed --include localhost:0,1,2,3,4,5,6,7 src/train.py \
    --model_name_or_path $model_path \
    --stage kto \
    --do_train \
    --preprocessing_num_workers 16 \
    --dataset $dataset \
    --finetuning_type lora \
    --output_dir $output_dir \
    --per_device_train_batch_size $batch_num \
    --per_device_eval_batch_size $batch_num \
    --gradient_accumulation_steps 1 \
    --learning_rate $learning_rate \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 10 \
    --save_steps 2000 \
    --num_train_epochs 15 \
    --plot_loss \
    --fp16 \
    --cutoff_len 1024 \
    --lora_target $lora_target \
    --template $template \
    --overwrite_output_dir \
    --deepspeed s3_config.json \
    --save_only_model \
    --use_dora \
    --eval_steps 2000 \
    --evaluation_strategy steps \
    --val_size 0.02 \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_dropout 0.05

The error message is:
  File "/raid_0609/wxp/tmp/LLaMA-Factory-main/src/train.py", line 14, in <module>
    main()
  File "/raid_0609/wxp/tmp/LLaMA-Factory-main/src/train.py", line 5, in main
    run_exp()
  File "/raid_0609/wxp/tmp/LLaMA-Factory-main/src/llamafactory/train/tuner.py", line 42, in run_exp
    run_kto(model_args, data_args, training_args, finetuning_args, callbacks)
  File "/raid_0609/wxp/tmp/LLaMA-Factory-main/src/llamafactory/train/kto/workflow.py", line 59, in run_kto
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/raid/miniforge3/envs/llama-factory/lib/python3.10/site-packages/transformers/trainer.py", line 1837, in train
    return inner_training_loop(
  File "/raid/miniforge3/envs/llama-factory/lib/python3.10/site-packages/transformers/trainer.py", line 2181, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/raid/miniforge3/envs/llama-factory/lib/python3.10/site-packages/transformers/trainer.py", line 3116, in training_step
    loss = self.compute_loss(model, inputs)
  File "/raid/miniforge3/envs/llama-factory/lib/python3.10/site-packages/trl/trainer/kto_trainer.py", line 1028, in compute_loss
    loss, metrics = self.get_batch_loss_metrics(model, inputs, train_eval="train")
TypeError: CustomKTOTrainer.get_batch_loss_metrics() got an unexpected keyword argument 'train_eval'

Training fails with this error. What is the cause?
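
For readers hitting the same traceback: the last two frames show the installed trl `compute_loss` passing `train_eval="train"` to `get_batch_loss_metrics`, while the `CustomKTOTrainer` override does not accept that keyword. That pattern usually points to a mismatch between the installed trl version and the one this LLaMA-Factory checkout was written against, though the issue itself does not confirm the versions. The sketch below reproduces the mechanism with stand-in classes; `BaseTrainer`, `CustomTrainer`, `accepts_keyword`, and `reproduce` are hypothetical names for illustration, not trl or LLaMA-Factory code.

```python
import inspect


class BaseTrainer:
    """Stand-in for a library trainer (hypothetical, not trl's real class)."""

    def get_batch_loss_metrics(self, model, batch, train_eval="train"):
        return 0.0, {}

    def compute_loss(self, model, inputs):
        # The base class calls the hook with a keyword its own version accepts.
        loss, _metrics = self.get_batch_loss_metrics(model, inputs, train_eval="train")
        return loss


class CustomTrainer(BaseTrainer):
    """Stand-in for a subclass written against an older hook signature."""

    def get_batch_loss_metrics(self, model, batch):
        # No `train_eval` parameter, so the base-class call above fails.
        return 0.0, {}


def accepts_keyword(method, name):
    """Return True if `method` can be called with keyword argument `name`."""
    params = inspect.signature(method).parameters
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def reproduce():
    """Trigger the mismatch and return the TypeError message, or None."""
    try:
        CustomTrainer().compute_loss(model=None, inputs={})
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce())
    print(accepts_keyword(CustomTrainer().get_batch_loss_metrics, "train_eval"))
```

A check like `accepts_keyword` can be run against the real trainer class before launching a long job. Comparing the installed trl version with the one the checkout's requirements expect is the more direct diagnosis.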

Expected behavior

No response

System Info

No response

Others

No response

Labels: solved (This problem has already been solved)
