Merging a LoRA into a model results in an error thrown by the safetensors package #3238

Closed
@jim-plus

Description

Reminder

  • I have read the README and searched the existing issues.

Reproduction

python src/export_model.py --model_name_or_path "basemodel1" --adapter_name_or_path "checkpoint1" --template default --export_dir "export1" --export_size 2

[INFO|modeling_utils.py:1417] 2024-04-11 22:12:34,190 >> Instantiating MistralForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:928] 2024-04-11 22:12:34,190 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2
}

Loading checkpoint shards: 100%|█████████████████| 3/3 [00:18<00:00,  6.04s/it]
[INFO|modeling_utils.py:4024] 2024-04-11 22:12:53,408 >> All model checkpoint weights were used when initializing MistralForCausalLM.
...
[INFO|modeling_utils.py:3573] 2024-04-11 22:12:53,476 >> Generation config file not found, using a generation config created from the model config.
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
04/11/2024 22:12:54 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
INFO:llmtuner.model.adapter:Fine-tuning method: LoRA
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
04/11/2024 22:13:03 - INFO - llmtuner.model.adapter - Merged 1 adapter(s).
INFO:llmtuner.model.adapter:Merged 1 adapter(s).
04/11/2024 22:13:03 - INFO - llmtuner.model.adapter - Loaded adapter(s): checkpoint1
INFO:llmtuner.model.adapter:Loaded adapter(s): checkpoint1
04/11/2024 22:13:03 - INFO - llmtuner.model.loader - all params: 7241732096
INFO:llmtuner.model.loader:all params: 7241732096
[INFO|configuration_utils.py:697] 2024-04-11 22:13:03,367 >> Configuration saved in export1\generation_config.json
[WARNING|logging.py:329] 2024-04-11 22:13:03,372 >> Removed shared tensor {'model.layers.31.self_attn.v_proj.weight', 'model.layers.26.input_layernorm.weight', 'model.layers.29.self_attn.v_proj.weight', 'model.layers.26.self_attn.q_proj.weight', 'model.layers.31.mlp.down_proj.weight', 'model.layers.26.post_attention_layernorm.weight', 'model.layers.28.input_layernorm.weight', 'model.layers.2.self_attn.o_proj.weight', 'model.layers.6.self_attn.o_proj.weight', 'model.layers.30.self_attn.q_proj.weight', 'model.layers.1.post_attention_layernorm.weight', 'model.layers.17.mlp.down_proj.weight', 'model.layers.7.post_attention_layernorm.weight', 'model.layers.29.mlp.down_proj.weight', 'model.layers.26.mlp.gate_proj.weight', 'model.norm.weight', 'model.layers.28.mlp.down_proj.weight', 'model.layers.4.self_attn.k_proj.weight', 'model.layers.16.self_attn.o_proj.weight', 'model.layers.11.mlp.down_proj.weight', 'model.layers.13.mlp.up_proj.weight', 'model.layers.4.mlp.up_proj.weight', 'model.layers.28.self_attn.k_proj.weight', 'model.layers.2.input_layernorm.weight', 'model.layers.19.mlp.down_proj.weight', 'model.layers.6.post_attention_layernorm.weight', 'model.layers.20.self_attn.k_proj.weight', 'model.layers.31.post_attention_layernorm.weight', 'model.layers.7.self_attn.q_proj.weight', 'model.layers.3.self_attn.k_proj.weight', 'model.layers.30.self_attn.v_proj.weight', 'model.layers.29.self_attn.k_proj.weight', 'model.layers.19.self_attn.k_proj.weight', 'model.layers.10.post_attention_layernorm.weight', 'model.layers.27.self_attn.o_proj.weight', 'model.layers.6.mlp.gate_proj.weight', 'model.layers.6.self_attn.q_proj.weight', 'model.layers.15.mlp.down_proj.weight', 'model.layers.19.mlp.up_proj.weight', 'model.layers.2.self_attn.v_proj.weight', 'model.layers.26.mlp.down_proj.weight', 'model.layers.25.post_attention_layernorm.weight', 'model.layers.23.self_attn.o_proj.weight', 'model.layers.12.self_attn.o_proj.weight', 'model.layers.28.self_attn.o_proj.weight', 'model.layers.25.mlp.gate_proj.weight', 'model.layers.3.mlp.down_proj.weight', 'model.layers.4.mlp.gate_proj.weight', 'model.layers.30.mlp.gate_proj.weight', 'model.layers.21.self_attn.v_proj.weight', 'model.layers.9.self_attn.q_proj.weight', 'model.layers.16.self_attn.k_proj.weight', 'model.layers.22.mlp.gate_proj.weight', 'model.layers.17.self_attn.o_proj.weight', 'model.layers.2.self_attn.k_proj.weight', 'model.layers.15.input_layernorm.weight', 'model.layers.22.self_attn.q_proj.weight', 'model.layers.12.self_attn.k_proj.weight', 'model.layers.25.self_attn.v_proj.weight', 'model.layers.8.input_layernorm.weight', 'model.layers.10.mlp.gate_proj.weight', 'model.layers.25.self_attn.q_proj.weight', 'model.layers.3.self_attn.q_proj.weight', 'model.layers.28.mlp.up_proj.weight', 'model.layers.23.self_attn.k_proj.weight', 'model.layers.5.self_attn.k_proj.weight', 'model.layers.8.self_attn.v_proj.weight', 'model.layers.23.post_attention_layernorm.weight', 'model.layers.17.self_attn.v_proj.weight', 'model.layers.9.self_attn.k_proj.weight', 'model.layers.21.mlp.up_proj.weight', 'model.layers.27.self_attn.v_proj.weight', 'model.layers.24.mlp.down_proj.weight', 'model.layers.8.mlp.up_proj.weight', 'model.layers.15.mlp.gate_proj.weight', 'model.layers.18.self_attn.v_proj.weight', 'model.layers.14.mlp.up_proj.weight', 'model.layers.23.mlp.up_proj.weight', 'model.layers.18.self_attn.k_proj.weight', 'model.layers.5.mlp.up_proj.weight', 'model.layers.14.self_attn.o_proj.weight', 'model.layers.10.self_attn.o_proj.weight', 
'model.layers.21.self_attn.q_proj.weight', 'model.layers.23.mlp.gate_proj.weight', 'model.layers.10.input_layernorm.weight', 'model.layers.20.self_attn.v_proj.weight', 'model.layers.13.mlp.gate_proj.weight', 'model.layers.10.self_attn.k_proj.weight', 'model.layers.12.input_layernorm.weight', 'model.layers.25.input_layernorm.weight', 'model.layers.16.input_layernorm.weight', 'model.layers.3.input_layernorm.weight', 'model.layers.26.mlp.up_proj.weight', 'model.layers.27.post_attention_layernorm.weight', 'model.layers.21.post_attention_layernorm.weight', 'model.layers.12.mlp.down_proj.weight', 'model.layers.25.mlp.up_proj.weight', 'model.layers.9.mlp.down_proj.weight', 'model.layers.4.post_attention_layernorm.weight', 'model.layers.15.self_attn.o_proj.weight', 'model.layers.1.self_attn.v_proj.weight', 'model.layers.21.self_attn.o_proj.weight', 'model.layers.12.mlp.gate_proj.weight', 'model.layers.17.mlp.up_proj.weight', 'model.layers.19.mlp.gate_proj.weight', 'model.layers.14.post_attention_layernorm.weight', 'model.layers.7.input_layernorm.weight', 'model.layers.25.self_attn.o_proj.weight', 'model.layers.17.self_attn.q_proj.weight', 'model.layers.18.post_attention_layernorm.weight', 'model.layers.9.self_attn.v_proj.weight', 'model.layers.2.self_attn.q_proj.weight', 'model.layers.23.self_attn.q_proj.weight', 'model.layers.10.self_attn.v_proj.weight', 'model.layers.14.self_attn.v_proj.weight', 'model.layers.24.mlp.up_proj.weight', 'model.layers.5.post_attention_layernorm.weight', 'model.layers.2.mlp.gate_proj.weight', 'model.layers.18.self_attn.q_proj.weight', 'model.layers.10.self_attn.q_proj.weight', 'model.layers.20.self_attn.o_proj.weight', 'model.layers.25.mlp.down_proj.weight', 'model.layers.21.input_layernorm.weight', 'model.layers.13.mlp.down_proj.weight', 'model.layers.15.post_attention_layernorm.weight', 'model.layers.22.self_attn.o_proj.weight', 'model.layers.26.self_attn.k_proj.weight', 'model.layers.12.self_attn.v_proj.weight', 'model.layers.1.self_attn.o_proj.weight', 'model.layers.15.self_attn.q_proj.weight', 'model.layers.11.mlp.up_proj.weight', 'model.layers.30.mlp.up_proj.weight', 'model.layers.26.self_attn.o_proj.weight', 'model.layers.28.self_attn.q_proj.weight', 'model.layers.12.self_attn.q_proj.weight', 'model.layers.11.input_layernorm.weight', 'model.layers.8.self_attn.q_proj.weight', 'model.layers.18.input_layernorm.weight', 'model.layers.31.mlp.up_proj.weight', 'model.layers.16.self_attn.v_proj.weight', 'model.layers.9.mlp.up_proj.weight', 'model.layers.9.input_layernorm.weight', 'model.layers.31.mlp.gate_proj.weight', 'model.layers.12.post_attention_layernorm.weight', 'model.layers.17.mlp.gate_proj.weight', 'model.layers.23.mlp.down_proj.weight', 'model.layers.8.mlp.down_proj.weight', 'model.layers.30.self_attn.o_proj.weight', 'model.layers.18.mlp.up_proj.weight', 'model.layers.28.self_attn.v_proj.weight', 'model.layers.22.mlp.down_proj.weight', 'model.layers.17.self_attn.k_proj.weight', 'model.layers.27.mlp.down_proj.weight', 'model.layers.19.self_attn.o_proj.weight', 'model.layers.14.mlp.gate_proj.weight', 'model.layers.20.mlp.up_proj.weight', 'model.layers.4.self_attn.v_proj.weight', 'model.layers.20.input_layernorm.weight', 'model.layers.15.mlp.up_proj.weight', 'model.layers.1.mlp.up_proj.weight', 'model.layers.24.post_attention_layernorm.weight', 'model.layers.10.mlp.up_proj.weight', 'model.layers.13.post_attention_layernorm.weight', 'model.layers.19.post_attention_layernorm.weight', 'model.layers.18.mlp.gate_proj.weight', 
'model.layers.15.self_attn.v_proj.weight', 'model.layers.11.self_attn.o_proj.weight', 'model.layers.14.self_attn.q_proj.weight', 'model.layers.22.self_attn.v_proj.weight', 'model.layers.28.mlp.gate_proj.weight', 'model.layers.8.mlp.gate_proj.weight', 'model.layers.21.mlp.down_proj.weight', 'model.layers.12.mlp.up_proj.weight', 'model.layers.27.mlp.up_proj.weight', 'model.layers.7.self_attn.v_proj.weight', 'model.layers.18.self_attn.o_proj.weight', 'model.layers.19.self_attn.q_proj.weight', 'model.layers.20.self_attn.q_proj.weight', 'model.layers.29.mlp.gate_proj.weight', 'model.layers.24.self_attn.o_proj.weight', 'model.layers.3.mlp.gate_proj.weight', 'model.layers.7.self_attn.o_proj.weight', 'model.layers.27.self_attn.q_proj.weight', 'model.layers.2.mlp.up_proj.weight', 'model.layers.7.mlp.gate_proj.weight', 'model.layers.6.mlp.up_proj.weight', 'model.layers.4.self_attn.o_proj.weight', 'model.layers.11.mlp.gate_proj.weight', 'model.layers.16.mlp.down_proj.weight', 'model.layers.20.mlp.gate_proj.weight', 'model.layers.20.mlp.down_proj.weight', 'model.layers.13.self_attn.q_proj.weight', 'model.layers.28.post_attention_layernorm.weight', 'model.layers.22.input_layernorm.weight', 'model.layers.29.input_layernorm.weight', 'model.layers.13.self_attn.v_proj.weight', 'model.layers.7.mlp.down_proj.weight', 'model.layers.29.mlp.up_proj.weight', 'model.layers.14.input_layernorm.weight', 'model.layers.29.self_attn.o_proj.weight', 'model.layers.30.self_attn.k_proj.weight', 'model.layers.3.mlp.up_proj.weight', 'model.layers.3.self_attn.v_proj.weight', 'model.layers.27.self_attn.k_proj.weight', 'model.layers.6.input_layernorm.weight', 'model.layers.4.self_attn.q_proj.weight', 'model.layers.8.self_attn.k_proj.weight', 'model.layers.24.self_attn.q_proj.weight', 'model.layers.19.input_layernorm.weight', 'model.layers.14.self_attn.k_proj.weight', 'model.layers.1.mlp.down_proj.weight', 'model.layers.2.post_attention_layernorm.weight', 'model.layers.18.mlp.down_proj.weight', 'model.layers.31.self_attn.k_proj.weight', 'model.layers.6.self_attn.v_proj.weight', 'model.layers.11.self_attn.k_proj.weight', 'model.layers.22.mlp.up_proj.weight', 'model.layers.7.self_attn.k_proj.weight', 'model.layers.23.input_layernorm.weight', 'model.layers.24.self_attn.k_proj.weight', 'model.layers.5.mlp.gate_proj.weight', 'model.layers.21.self_attn.k_proj.weight', 'model.layers.8.self_attn.o_proj.weight', 'model.layers.11.self_attn.q_proj.weight', 'model.layers.15.self_attn.k_proj.weight', 'model.layers.16.mlp.up_proj.weight', 'model.layers.24.self_attn.v_proj.weight', 'model.layers.30.mlp.down_proj.weight', 'model.layers.3.post_attention_layernorm.weight', 'model.layers.24.input_layernorm.weight', 'model.layers.31.self_attn.o_proj.weight', 'model.layers.5.input_layernorm.weight', 'model.layers.29.post_attention_layernorm.weight', 'model.layers.11.self_attn.v_proj.weight', 'model.layers.4.mlp.down_proj.weight', 'model.layers.30.input_layernorm.weight', 'model.layers.27.mlp.gate_proj.weight', 'model.layers.14.mlp.down_proj.weight', 'model.layers.20.post_attention_layernorm.weight', 'model.layers.29.self_attn.q_proj.weight', 'model.layers.23.self_attn.v_proj.weight', 'model.layers.22.post_attention_layernorm.weight', 'model.layers.9.post_attention_layernorm.weight', 'model.layers.16.post_attention_layernorm.weight', 'model.layers.5.self_attn.v_proj.weight', 'model.layers.2.mlp.down_proj.weight', 'model.layers.5.self_attn.q_proj.weight', 'model.layers.6.mlp.down_proj.weight', 'model.layers.8.post_attention_layernorm.weight', 
'model.layers.16.self_attn.q_proj.weight', 'model.layers.19.self_attn.v_proj.weight', 'model.layers.21.mlp.gate_proj.weight', 'model.layers.4.input_layernorm.weight', 'model.layers.17.post_attention_layernorm.weight', 'model.layers.9.self_attn.o_proj.weight', 'model.layers.13.self_attn.o_proj.weight', 'model.layers.26.self_attn.v_proj.weight', 'model.layers.10.mlp.down_proj.weight', 'model.layers.24.mlp.gate_proj.weight', 'model.layers.6.self_attn.k_proj.weight', 'model.layers.25.self_attn.k_proj.weight', 'model.layers.31.input_layernorm.weight', 'model.layers.31.self_attn.q_proj.weight', 'model.layers.11.post_attention_layernorm.weight', 'model.layers.17.input_layernorm.weight', 'model.layers.5.mlp.down_proj.weight', 'model.layers.27.input_layernorm.weight', 'model.layers.30.post_attention_layernorm.weight', 'model.layers.3.self_attn.o_proj.weight', 'model.layers.13.self_attn.k_proj.weight', 'model.layers.16.mlp.gate_proj.weight', 'model.layers.5.self_attn.o_proj.weight', 'model.layers.7.mlp.up_proj.weight', 'model.layers.13.input_layernorm.weight', 'model.layers.22.self_attn.k_proj.weight', 'model.layers.9.mlp.gate_proj.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
Traceback (most recent call last):
  File "C:\cygwin64\home\Jim\chat\LLaMA-Factory\src\export_model.py", line 9, in <module>
    main()
  File "C:\cygwin64\home\Jim\chat\LLaMA-Factory\src\export_model.py", line 5, in main
    export_model()
  File "C:\cygwin64\home\Jim\chat\LLaMA-Factory\src\llmtuner\train\tuner.py", line 71, in export_model
    model.save_pretrained(
  File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\transformers\modeling_utils.py", line 2468, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\safetensors\torch.py", line 281, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
                   ^^^^^^^^^^^^^^^^^
  File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\safetensors\torch.py", line 485, in _flatten
    return {
           ^
  File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\safetensors\torch.py", line 489, in <dictcomp>
    "data": _tobytes(v, k),
            ^^^^^^^^^^^^^^
  File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\safetensors\torch.py", line 411, in _tobytes
    tensor = tensor.to("cpu")
             ^^^^^^^^^^^^^^^^
NotImplementedError: Cannot copy out of meta tensor; no data!

Expected behavior

I expected the 7B LoRA to merge into the base model, since I was able to train a small LoRA within 16 GB of VRAM.

The warnings indicate that some parameters were offloaded to the CPU and left on the meta device, and safetensors cannot serialize meta tensors (hence the NotImplementedError above). I'm not sure how to work around this.

Although the files live on a Cygwin filesystem, I ran the script from the Windows command line. The repo was cloned yesterday or so. I am using current gaming Nvidia drivers, with the option to fall back to system memory enabled (so allocations swap slowly instead of crashing).
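As a possible workaround (not verified here), forcing the merge to run entirely on CPU should keep every weight materialized in RAM, so nothing ends up on the meta device when safetensors serializes the shards. A minimal sketch using PEFT directly, reusing the placeholder paths "basemodel1", "checkpoint1", and "export1" from the reproduction command:

```python
# Hypothetical CPU-only merge sketch; paths are the placeholders from the
# reproduction command above, not real directories.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "basemodel1",
    torch_dtype=torch.bfloat16,
    device_map={"": "cpu"},   # keep all weights in system RAM, none on the meta device
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained("basemodel1")

# Attach the LoRA adapter and fold its weights into the base model.
model = PeftModel.from_pretrained(base, "checkpoint1")
model = model.merge_and_unload()

# Save sharded safetensors; with everything on CPU this should not hit meta tensors.
model.save_pretrained("export1", safe_serialization=True, max_shard_size="2GB")
tokenizer.save_pretrained("export1")
```

Reloading the exported folder afterwards with AutoModelForCausalLM.from_pretrained("export1") and confirming that no missing or shared-tensor warnings appear would verify the save was clean, as the "Removed shared tensor" warning itself suggests.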

System Info

  • transformers version: 4.39.3
  • Platform: Windows-10-10.0.22631-SP0 (I have since upgraded to Windows 11; this is what the original Python install reports)
  • Python version: 3.11.5
  • Huggingface_hub version: 0.22.1
  • Safetensors version: 0.4.2
  • Accelerate version: 0.27.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed

Others

No response
