Description
Reminder
- I have read the README and searched the existing issues.
System Info
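In full + reward mode (`--finetuning_type full`, `--stage rm`), fine-tuning Qwen1.5-0.5B-Chat fails with the following error unless `--save_safetensors False` is passed: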
RuntimeError: Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'model.embed_tokens.weight', 'lm_head.weight'}].
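With `--save_safetensors False` the error no longer occurs, but when the model is saved, `os.remove(path_to_checkpoint)` inside the function `fix_valuehead_checkpoint` in src/llamafactory/train/callbacks.py deletes pytorch_model.bin, so the saved model cannot be used. How can this be resolved? Thanks.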
Reproduction
llamafactory-cli train --stage rm --do_train True --model_name_or_path models/Qwen1.5-0.5B-Chat --preprocessing_num_workers 16 --finetuning_type full --quantization_method bitsandbytes --template qwen --flash_attn auto --dataset_dir data --dataset dpo_en_demo --cutoff_len 256 --learning_rate 0.0002 --num_train_epochs 3.0 --max_samples 500 --per_device_train_batch_size 2 --gradient_accumulation_steps 4 --lr_scheduler_type cosine --max_grad_norm 1.0 --logging_steps 5 --save_steps 100 --warmup_steps 0 --optim adamw_torch --packing False --report_to none --output_dir saves/Qwen1.5-0.5B-Chat/full_rm --bf16 True --plot_loss True --ddp_timeout 180000000 --include_num_input_tokens_seen True --save_safetensors False
Expected behavior
No response
Others
No response