Description
Reminder
- I have read the README and searched the existing issues.
System Info
OS: WSL2
CUDA: 12.3
Latest LLaMA-Factory, run via docker compose
Reproduction
Command line:
llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path /home/xx/.cache/modelscope/hub/ZhipuAI/glm-4-9b-chat/ \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template glm4 \
    --dataset_dir data \
    --dataset test \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 3.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 50 \
    --warmup_steps 0 \
    --optim adamw_torch \
    --packing False \
    --report_to none \
    --output_dir saves/GLM-4-9B-Chat/lora/train_2024-06-27-13-02-26 \
    --fp16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all
The command was run both with and without --flash_attn auto,
and also as llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml.
The .yaml file:
### model
model_name_or_path: modles/ZhipuAI/glm-4-9b-chat/
### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
### dataset
dataset: test
template: glm4
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/GLM-4-9B-Chat/lora/train_2024-06-27-13-02-26
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
ddp_timeout: 180000000
### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
Both approaches fail with the error below:
##################################
Traceback (most recent call last):
File "/usr/local/bin/llamafactory-cli", line 8, in
sys.exit(main())
File "/app/src/llamafactory/cli.py", line 111, in main
run_exp()
File "/app/src/llamafactory/train/tuner.py", line 50, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/app/src/llamafactory/train/sft/workflow.py", line 49, in run_sft
model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
File "/app/src/llamafactory/model/loader.py", line 152, in load_model
model = AutoModelForCausalLM.from_pretrained(**init_kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 550, in from_pretrained
model_class = get_class_from_dynamic_module(
File "/usr/local/lib/python3.10/dist-packages/transformers/dynamic_module_utils.py", line 501, in get_class_from_dynamic_module
final_module = get_cached_module_file(
File "/usr/local/lib/python3.10/dist-packages/transformers/dynamic_module_utils.py", line 326, in get_cached_module_file
modules_needed = check_imports(resolved_module_file)
File "/usr/local/lib/python3.10/dist-packages/transformers/dynamic_module_utils.py", line 181, in check_imports
raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run pip install flash_attn
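
For reference, the exact install command used afterwards is not shown in the report; the flash-attn project's README recommends installing along these lines (an assumption here, not taken from the report):

pip install flash-attn --no-build-isolation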
#####################################################################
After installing flash_attn, the following error occurs:
Traceback (most recent call last):
File "/usr/local/bin/llamafactory-cli", line 5, in
from llamafactory.cli import main
File "/app/src/llamafactory/init.py", line 17, in
from .cli import VERSION
File "/app/src/llamafactory/cli.py", line 21, in
from . import launcher
File "/app/src/llamafactory/launcher.py", line 15, in
from llamafactory.train.tuner import run_exp
File "/app/src/llamafactory/train/tuner.py", line 27, in
from ..model import load_model, load_tokenizer
File "/app/src/llamafactory/model/init.py", line 15, in
from .loader import load_config, load_model, load_tokenizer
File "/app/src/llamafactory/model/loader.py", line 28, in
from .patcher import patch_config, patch_model, patch_tokenizer, patch_valuehead_model
File "/app/src/llamafactory/model/patcher.py", line 30, in
from .model_utils.longlora import configure_longlora
File "/app/src/llamafactory/model/model_utils/longlora.py", line 25, in
from transformers.models.llama.modeling_llama import (
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 54, in
from flash_attn import flash_attn_func, flash_attn_varlen_func
File "/usr/local/lib/python3.10/dist-packages/flash_attn/init.py", line 3, in
from flash_attn.flash_attn_interface import (
File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 10, in
import flash_attn_2_cuda as flash_attn_cuda
ImportError: /usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda14ExchangeDeviceEa。
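
A quick way to compare the installed builds, since this undefined-symbol error usually indicates a flash_attn wheel compiled against a different torch/CUDA ABI than the torch inside the container (that reading is an assumption, not confirmed in the report):

# print the torch version and the CUDA version it was built against
python -c "import torch; print(torch.__version__, torch.version.cuda)"
# print the installed flash_attn version
python -c "import flash_attn; print(flash_attn.__version__)"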
Expected behavior
No response
Others
No response