
Is flash_attn mandatory for training models like InternLM2? #4398

Closed
@gaoyang07

Description

Reminder

  • I have read the README and searched the existing issues.

System Info

This question was originally raised in InternLM/InternLM#747.

Reproduction

```yaml
### model
model_name_or_path: internlm/internlm2-chat-7b

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: data
template: intern2
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/internlm2-chat-7b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true

### eval
val_size: 0.1
per_device_eval_batch_size: 1
evaluation_strategy: steps
eval_steps: 500
```
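If the goal is to fine-tune InternLM2 without installing the flash-attn package, recent LLaMA-Factory versions expose a `flash_attn` option in the model section of the config. The sketch below assumes the installed version supports that option (older releases may not), and is not a confirmed resolution of this issue:

```yaml
### model
model_name_or_path: internlm/internlm2-chat-7b
# Assumption: the installed LLaMA-Factory version supports the `flash_attn` option.
# "disabled" keeps the default attention implementation, so the flash-attn package
# is not required; "fa2" enables FlashAttention-2 and does require it.
flash_attn: disabled
```

With this setting the rest of the configuration above can stay unchanged; training simply falls back to the standard attention kernels, typically at some cost in speed and memory.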

Expected behavior

No response

Others

No response


Metadata

Labels

solved (This problem has been already solved)
