Description
Reminder
- I have read the README and searched the existing issues.
Reproduction
Training script:
deepspeed --include localhost:0,1,2,3,4,5,6,7 src/train.py \
    --model_name_or_path $model_path \
    --stage kto \
    --do_train \
    --preprocessing_num_workers 16 \
    --dataset $dataset \
    --finetuning_type lora \
    --output_dir $output_dir \
    --per_device_train_batch_size $batch_num \
    --per_device_eval_batch_size $batch_num \
    --gradient_accumulation_steps 1 \
    --learning_rate $learning_rate \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 10 \
    --save_steps 2000 \
    --num_train_epochs 15 \
    --plot_loss \
    --fp16 \
    --cutoff_len 1024 \
    --lora_target $lora_target \
    --template $template \
    --overwrite_output_dir \
    --deepspeed s3_config.json \
    --save_only_model \
    --use_dora \
    --eval_steps 2000 \
    --evaluation_strategy steps \
    --val_size 0.02 \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_dropout 0.05
The error message is:
  File "/raid_0609/wxp/tmp/LLaMA-Factory-main/src/train.py", line 14, in <module>
    main()
  File "/raid_0609/wxp/tmp/LLaMA-Factory-main/src/train.py", line 5, in main
    run_exp()
  File "/raid_0609/wxp/tmp/LLaMA-Factory-main/src/llamafactory/train/tuner.py", line 42, in run_exp
    run_kto(model_args, data_args, training_args, finetuning_args, callbacks)
  File "/raid_0609/wxp/tmp/LLaMA-Factory-main/src/llamafactory/train/kto/workflow.py", line 59, in run_kto
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/raid/miniforge3/envs/llama-factory/lib/python3.10/site-packages/transformers/trainer.py", line 1837, in train
    return inner_training_loop(
  File "/raid/miniforge3/envs/llama-factory/lib/python3.10/site-packages/transformers/trainer.py", line 2181, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/raid/miniforge3/envs/llama-factory/lib/python3.10/site-packages/transformers/trainer.py", line 3116, in training_step
    loss = self.compute_loss(model, inputs)
  File "/raid/miniforge3/envs/llama-factory/lib/python3.10/site-packages/trl/trainer/kto_trainer.py", line 1028, in compute_loss
    loss, metrics = self.get_batch_loss_metrics(model, inputs, train_eval="train")
TypeError: CustomKTOTrainer.get_batch_loss_metrics() got an unexpected keyword argument 'train_eval'
Training fails with this error. What is the cause?
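For context, the last two frames of the traceback show `compute_loss` in the installed `trl` package calling `get_batch_loss_metrics(model, inputs, train_eval="train")`, while the `CustomKTOTrainer` override does not accept a `train_eval` argument. That pattern usually points to a version mismatch between the two packages, though this is an assumption and not confirmed here. Below is a minimal sketch of how such a mismatch produces this `TypeError`. All class names are hypothetical stand-ins, not the real `trl` or LLaMA-Factory code.

```python
class BaseKTOTrainer:
    """Hypothetical stand-in for a library trainer whose API gained a keyword."""

    def compute_loss(self, model, inputs):
        # The base class passes `train_eval` to whatever override is in place.
        return self.get_batch_loss_metrics(model, inputs, train_eval="train")

    def get_batch_loss_metrics(self, model, inputs, train_eval="train"):
        return 0.0, {"split": train_eval}


class OutdatedCustomTrainer(BaseKTOTrainer):
    """Hypothetical override written against the older two-argument signature."""

    def get_batch_loss_metrics(self, model, inputs):
        return 0.0, {}


class CompatibleCustomTrainer(BaseKTOTrainer):
    """Hypothetical override whose signature matches what the base class passes."""

    def get_batch_loss_metrics(self, model, inputs, train_eval="train"):
        return 0.0, {"split": train_eval}


def try_compute_loss(trainer):
    """Return "ok" on success, or the TypeError message on a signature mismatch."""
    try:
        trainer.compute_loss(model=None, inputs={})
        return "ok"
    except TypeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(try_compute_loss(OutdatedCustomTrainer()))
    print(try_compute_loss(CompatibleCustomTrainer()))
```

If this is indeed the cause, the override and the library need to agree on the signature, either by installing the `trl` version that this LLaMA-Factory checkout expects or by updating LLaMA-Factory to a revision written against the installed `trl`.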
Expected behavior
No response
System Info
No response
Others
No response