
Single node with 2 GPUs, DeepSpeed ZeRO-2/3 can't log steps #3559

Closed
@xxll88

Description

Reminder

  • I have read the README and searched the existing issues.

Reproduction

When running with '--logging_steps 20':
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/extras/callbacks.py", line 137, in on_log
    current_steps=self.cur_steps,
AttributeError: 'LogCallback' object has no attribute 'cur_steps'
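One plausible cause, offered as an assumption rather than a reading of the LLaMA-Factory source: `on_log` reads `self.cur_steps`, but that attribute is only created inside some other hook, and under a multi-GPU DeepSpeed launch that hook may not have run on the same callback object before the first log event. The sketch below uses hypothetical classes (`FragileLogCallback`, `SafeLogCallback`) with plain methods, not the real `transformers.TrainerCallback` signatures, to show the failure mode and the usual defensive fix of initializing every attribute `on_log` reads in `__init__`.

```python
# Hypothetical, simplified stand-ins; NOT the actual LLaMA-Factory LogCallback.


class FragileLogCallback:
    def on_train_begin(self, max_steps: int) -> None:
        # cur_steps is created here and nowhere else.
        self.max_steps = max_steps
        self.cur_steps = 0

    def on_log(self, logs: dict) -> dict:
        # Raises AttributeError if on_train_begin never ran on this object.
        return {"current_steps": self.cur_steps, **logs}


class SafeLogCallback:
    def __init__(self) -> None:
        # Initialize everything on_log reads, so hook order (or which
        # process calls which hook) cannot leave the attribute missing.
        self.max_steps = 0
        self.cur_steps = 0

    def on_train_begin(self, max_steps: int) -> None:
        self.max_steps = max_steps
        self.cur_steps = 0

    def on_step_end(self) -> None:
        self.cur_steps += 1

    def on_log(self, logs: dict) -> dict:
        return {"current_steps": self.cur_steps, **logs}


if __name__ == "__main__":
    try:
        FragileLogCallback().on_log({"loss": 1.0})
    except AttributeError as err:
        print("fragile:", err)

    safe = SafeLogCallback()
    print("safe:", safe.on_log({"loss": 1.0}))
```

The fragile version reproduces the same kind of `AttributeError` seen in the traceback; the safe version logs `current_steps` as 0 until steps are counted, instead of crashing.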

DS_SKIP_CUDA_CHECK=1 deepspeed --num_gpus 2 src/train.py \
--deepspeed ds_config.json \
--stage sft \
--do_train \
--model_name_or_path ../Meta-Llama-3-8B-Instruct \
--dataset Law-Pair,Law-Triplet \
--dataset_dir data \
--overwrite_cache False \
--template llama3 \
--finetuning_type full \
--output_dir /home/ubuntu/sft-law/llama_factory_law_llama3_8B_full \
--overwrite_output_dir \
--cutoff_len 1024 \
--preprocessing_num_workers 16 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 1 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--warmup_steps 20 \
--save_steps 200 \
--save_total_limit 5 \
--eval_steps 200 \
--evaluation_strategy steps \
--load_best_model_at_end \
--learning_rate 5e-5 \
--num_train_epochs 3.0 \
--val_size 0.0001 \
--plot_loss \
--bf16
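The contents of `ds_config.json` are not included in the report. For anyone trying to reproduce the issue, here is an assumed minimal ZeRO-2 config, written out by a small Python script. It relies on the Hugging Face Trainer's DeepSpeed integration resolving `"auto"` values from the matching command-line arguments; the specific `zero_optimization` options chosen here are illustrative, not the reporter's settings.

```python
import json

# Assumed minimal config; the reporter's real ds_config.json is not shown.
# "auto" values are resolved by the Hugging Face Trainer's DeepSpeed
# integration from the corresponding training arguments.
DS_CONFIG = {
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "bf16": {"enabled": "auto"},  # follows --bf16
    "zero_optimization": {
        "stage": 2,  # change to 3 to test the ZeRO-3 case from the title
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}


def write_config(path: str = "ds_config.json") -> None:
    """Write DS_CONFIG as pretty-printed JSON to `path`."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(DS_CONFIG, f, indent=2)


if __name__ == "__main__":
    write_config()
```

With `--per_device_train_batch_size 8`, `--gradient_accumulation_steps 1` and 2 GPUs, the Trainer would resolve `train_batch_size` to 8 × 1 × 2 = 16.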

Expected behavior

No response

System Info

No response

Others

No response


Metadata

Assignees

No one assigned

Labels

solved: This problem has already been solved