Reminder
- I have read the README and searched the existing issues.
Reproduction
python src/export_model.py --model_name_or_path "basemodel1" --adapter_name_or_path "checkpoint1" --template default --export_dir "export1" --export_size 2
[INFO|modeling_utils.py:1417] 2024-04-11 22:12:34,190 >> Instantiating MistralForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:928] 2024-04-11 22:12:34,190 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Loading checkpoint shards: 100%|█████████████████| 3/3 [00:18<00:00, 6.04s/it]
[INFO|modeling_utils.py:4024] 2024-04-11 22:12:53,408 >> All model checkpoint weights were used when initializing MistralForCausalLM.
...
[INFO|modeling_utils.py:3573] 2024-04-11 22:12:53,476 >> Generation config file not found, using a generation config created from the model config.
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
04/11/2024 22:12:54 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
INFO:llmtuner.model.adapter:Fine-tuning method: LoRA
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
04/11/2024 22:13:03 - INFO - llmtuner.model.adapter - Merged 1 adapter(s).
INFO:llmtuner.model.adapter:Merged 1 adapter(s).
04/11/2024 22:13:03 - INFO - llmtuner.model.adapter - Loaded adapter(s): checkpoint1
INFO:llmtuner.model.adapter:Loaded adapter(s): checkpoint1
04/11/2024 22:13:03 - INFO - llmtuner.model.loader - all params: 7241732096
INFO:llmtuner.model.loader:all params: 7241732096
[INFO|configuration_utils.py:697] 2024-04-11 22:13:03,367 >> Configuration saved in export1\generation_config.json
[WARNING|logging.py:329] 2024-04-11 22:13:03,372 >> Removed shared tensor {'model.layers.31.self_attn.v_proj.weight', 'model.layers.26.input_layernorm.weight', 'model.layers.29.self_attn.v_proj.weight', 'model.layers.26.self_attn.q_proj.weight', 'model.layers.31.mlp.down_proj.weight', 'model.layers.26.post_attention_layernorm.weight', 'model.layers.28.input_layernorm.weight', 'model.layers.2.self_attn.o_proj.weight', 'model.layers.6.self_attn.o_proj.weight', 'model.layers.30.self_attn.q_proj.weight', 'model.layers.1.post_attention_layernorm.weight', 'model.layers.17.mlp.down_proj.weight', 'model.layers.7.post_attention_layernorm.weight', 'model.layers.29.mlp.down_proj.weight', 'model.layers.26.mlp.gate_proj.weight', 'model.norm.weight', 'model.layers.28.mlp.down_proj.weight', 'model.layers.4.self_attn.k_proj.weight', 'model.layers.16.self_attn.o_proj.weight', 'model.layers.11.mlp.down_proj.weight', 'model.layers.13.mlp.up_proj.weight', 'model.layers.4.mlp.up_proj.weight', 'model.layers.28.self_attn.k_proj.weight', 'model.layers.2.input_layernorm.weight', 'model.layers.19.mlp.down_proj.weight', 'model.layers.6.post_attention_layernorm.weight', 'model.layers.20.self_attn.k_proj.weight', 'model.layers.31.post_attention_layernorm.weight', 'model.layers.7.self_attn.q_proj.weight', 'model.layers.3.self_attn.k_proj.weight', 'model.layers.30.self_attn.v_proj.weight', 'model.layers.29.self_attn.k_proj.weight', 'model.layers.19.self_attn.k_proj.weight', 'model.layers.10.post_attention_layernorm.weight', 'model.layers.27.self_attn.o_proj.weight', 'model.layers.6.mlp.gate_proj.weight', 'model.layers.6.self_attn.q_proj.weight', 'model.layers.15.mlp.down_proj.weight', 'model.layers.19.mlp.up_proj.weight', 'model.layers.2.self_attn.v_proj.weight', 'model.layers.26.mlp.down_proj.weight', 'model.layers.25.post_attention_layernorm.weight', 'model.layers.23.self_attn.o_proj.weight', 'model.layers.12.self_attn.o_proj.weight', 'model.layers.28.self_attn.o_proj.weight', 'model.layers.25.mlp.gate_proj.weight', 'model.layers.3.mlp.down_proj.weight', 'model.layers.4.mlp.gate_proj.weight', 'model.layers.30.mlp.gate_proj.weight', 'model.layers.21.self_attn.v_proj.weight', 'model.layers.9.self_attn.q_proj.weight', 'model.layers.16.self_attn.k_proj.weight', 'model.layers.22.mlp.gate_proj.weight', 'model.layers.17.self_attn.o_proj.weight', 'model.layers.2.self_attn.k_proj.weight', 'model.layers.15.input_layernorm.weight', 'model.layers.22.self_attn.q_proj.weight', 'model.layers.12.self_attn.k_proj.weight', 'model.layers.25.self_attn.v_proj.weight', 'model.layers.8.input_layernorm.weight', 'model.layers.10.mlp.gate_proj.weight', 'model.layers.25.self_attn.q_proj.weight', 'model.layers.3.self_attn.q_proj.weight', 'model.layers.28.mlp.up_proj.weight', 'model.layers.23.self_attn.k_proj.weight', 'model.layers.5.self_attn.k_proj.weight', 'model.layers.8.self_attn.v_proj.weight', 'model.layers.23.post_attention_layernorm.weight', 'model.layers.17.self_attn.v_proj.weight', 'model.layers.9.self_attn.k_proj.weight', 'model.layers.21.mlp.up_proj.weight', 'model.layers.27.self_attn.v_proj.weight', 'model.layers.24.mlp.down_proj.weight', 'model.layers.8.mlp.up_proj.weight', 'model.layers.15.mlp.gate_proj.weight', 'model.layers.18.self_attn.v_proj.weight', 'model.layers.14.mlp.up_proj.weight', 'model.layers.23.mlp.up_proj.weight', 'model.layers.18.self_attn.k_proj.weight', 'model.layers.5.mlp.up_proj.weight', 'model.layers.14.self_attn.o_proj.weight', 'model.layers.10.self_attn.o_proj.weight', 
'model.layers.21.self_attn.q_proj.weight', 'model.layers.23.mlp.gate_proj.weight', 'model.layers.10.input_layernorm.weight', 'model.layers.20.self_attn.v_proj.weight', 'model.layers.13.mlp.gate_proj.weight', 'model.layers.10.self_attn.k_proj.weight', 'model.layers.12.input_layernorm.weight', 'model.layers.25.input_layernorm.weight', 'model.layers.16.input_layernorm.weight', 'model.layers.3.input_layernorm.weight', 'model.layers.26.mlp.up_proj.weight', 'model.layers.27.post_attention_layernorm.weight', 'model.layers.21.post_attention_layernorm.weight', 'model.layers.12.mlp.down_proj.weight', 'model.layers.25.mlp.up_proj.weight', 'model.layers.9.mlp.down_proj.weight', 'model.layers.4.post_attention_layernorm.weight', 'model.layers.15.self_attn.o_proj.weight', 'model.layers.1.self_attn.v_proj.weight', 'model.layers.21.self_attn.o_proj.weight', 'model.layers.12.mlp.gate_proj.weight', 'model.layers.17.mlp.up_proj.weight', 'model.layers.19.mlp.gate_proj.weight', 'model.layers.14.post_attention_layernorm.weight', 'model.layers.7.input_layernorm.weight', 'model.layers.25.self_attn.o_proj.weight', 'model.layers.17.self_attn.q_proj.weight', 'model.layers.18.post_attention_layernorm.weight', 'model.layers.9.self_attn.v_proj.weight', 'model.layers.2.self_attn.q_proj.weight', 'model.layers.23.self_attn.q_proj.weight', 'model.layers.10.self_attn.v_proj.weight', 'model.layers.14.self_attn.v_proj.weight', 'model.layers.24.mlp.up_proj.weight', 'model.layers.5.post_attention_layernorm.weight', 'model.layers.2.mlp.gate_proj.weight', 'model.layers.18.self_attn.q_proj.weight', 'model.layers.10.self_attn.q_proj.weight', 'model.layers.20.self_attn.o_proj.weight', 'model.layers.25.mlp.down_proj.weight', 'model.layers.21.input_layernorm.weight', 'model.layers.13.mlp.down_proj.weight', 'model.layers.15.post_attention_layernorm.weight', 'model.layers.22.self_attn.o_proj.weight', 'model.layers.26.self_attn.k_proj.weight', 'model.layers.12.self_attn.v_proj.weight', 'model.layers.1.self_attn.o_proj.weight', 'model.layers.15.self_attn.q_proj.weight', 'model.layers.11.mlp.up_proj.weight', 'model.layers.30.mlp.up_proj.weight', 'model.layers.26.self_attn.o_proj.weight', 'model.layers.28.self_attn.q_proj.weight', 'model.layers.12.self_attn.q_proj.weight', 'model.layers.11.input_layernorm.weight', 'model.layers.8.self_attn.q_proj.weight', 'model.layers.18.input_layernorm.weight', 'model.layers.31.mlp.up_proj.weight', 'model.layers.16.self_attn.v_proj.weight', 'model.layers.9.mlp.up_proj.weight', 'model.layers.9.input_layernorm.weight', 'model.layers.31.mlp.gate_proj.weight', 'model.layers.12.post_attention_layernorm.weight', 'model.layers.17.mlp.gate_proj.weight', 'model.layers.23.mlp.down_proj.weight', 'model.layers.8.mlp.down_proj.weight', 'model.layers.30.self_attn.o_proj.weight', 'model.layers.18.mlp.up_proj.weight', 'model.layers.28.self_attn.v_proj.weight', 'model.layers.22.mlp.down_proj.weight', 'model.layers.17.self_attn.k_proj.weight', 'model.layers.27.mlp.down_proj.weight', 'model.layers.19.self_attn.o_proj.weight', 'model.layers.14.mlp.gate_proj.weight', 'model.layers.20.mlp.up_proj.weight', 'model.layers.4.self_attn.v_proj.weight', 'model.layers.20.input_layernorm.weight', 'model.layers.15.mlp.up_proj.weight', 'model.layers.1.mlp.up_proj.weight', 'model.layers.24.post_attention_layernorm.weight', 'model.layers.10.mlp.up_proj.weight', 'model.layers.13.post_attention_layernorm.weight', 'model.layers.19.post_attention_layernorm.weight', 'model.layers.18.mlp.gate_proj.weight', 
'model.layers.15.self_attn.v_proj.weight', 'model.layers.11.self_attn.o_proj.weight', 'model.layers.14.self_attn.q_proj.weight', 'model.layers.22.self_attn.v_proj.weight', 'model.layers.28.mlp.gate_proj.weight', 'model.layers.8.mlp.gate_proj.weight', 'model.layers.21.mlp.down_proj.weight', 'model.layers.12.mlp.up_proj.weight', 'model.layers.27.mlp.up_proj.weight', 'model.layers.7.self_attn.v_proj.weight', 'model.layers.18.self_attn.o_proj.weight', 'model.layers.19.self_attn.q_proj.weight', 'model.layers.20.self_attn.q_proj.weight', 'model.layers.29.mlp.gate_proj.weight', 'model.layers.24.self_attn.o_proj.weight', 'model.layers.3.mlp.gate_proj.weight', 'model.layers.7.self_attn.o_proj.weight', 'model.layers.27.self_attn.q_proj.weight', 'model.layers.2.mlp.up_proj.weight', 'model.layers.7.mlp.gate_proj.weight', 'model.layers.6.mlp.up_proj.weight', 'model.layers.4.self_attn.o_proj.weight', 'model.layers.11.mlp.gate_proj.weight', 'model.layers.16.mlp.down_proj.weight', 'model.layers.20.mlp.gate_proj.weight', 'model.layers.20.mlp.down_proj.weight', 'model.layers.13.self_attn.q_proj.weight', 'model.layers.28.post_attention_layernorm.weight', 'model.layers.22.input_layernorm.weight', 'model.layers.29.input_layernorm.weight', 'model.layers.13.self_attn.v_proj.weight', 'model.layers.7.mlp.down_proj.weight', 'model.layers.29.mlp.up_proj.weight', 'model.layers.14.input_layernorm.weight', 'model.layers.29.self_attn.o_proj.weight', 'model.layers.30.self_attn.k_proj.weight', 'model.layers.3.mlp.up_proj.weight', 'model.layers.3.self_attn.v_proj.weight', 'model.layers.27.self_attn.k_proj.weight', 'model.layers.6.input_layernorm.weight', 'model.layers.4.self_attn.q_proj.weight', 'model.layers.8.self_attn.k_proj.weight', 'model.layers.24.self_attn.q_proj.weight', 'model.layers.19.input_layernorm.weight', 'model.layers.14.self_attn.k_proj.weight', 'model.layers.1.mlp.down_proj.weight', 'model.layers.2.post_attention_layernorm.weight', 'model.layers.18.mlp.down_proj.weight', 'model.layers.31.self_attn.k_proj.weight', 'model.layers.6.self_attn.v_proj.weight', 'model.layers.11.self_attn.k_proj.weight', 'model.layers.22.mlp.up_proj.weight', 'model.layers.7.self_attn.k_proj.weight', 'model.layers.23.input_layernorm.weight', 'model.layers.24.self_attn.k_proj.weight', 'model.layers.5.mlp.gate_proj.weight', 'model.layers.21.self_attn.k_proj.weight', 'model.layers.8.self_attn.o_proj.weight', 'model.layers.11.self_attn.q_proj.weight', 'model.layers.15.self_attn.k_proj.weight', 'model.layers.16.mlp.up_proj.weight', 'model.layers.24.self_attn.v_proj.weight', 'model.layers.30.mlp.down_proj.weight', 'model.layers.3.post_attention_layernorm.weight', 'model.layers.24.input_layernorm.weight', 'model.layers.31.self_attn.o_proj.weight', 'model.layers.5.input_layernorm.weight', 'model.layers.29.post_attention_layernorm.weight', 'model.layers.11.self_attn.v_proj.weight', 'model.layers.4.mlp.down_proj.weight', 'model.layers.30.input_layernorm.weight', 'model.layers.27.mlp.gate_proj.weight', 'model.layers.14.mlp.down_proj.weight', 'model.layers.20.post_attention_layernorm.weight', 'model.layers.29.self_attn.q_proj.weight', 'model.layers.23.self_attn.v_proj.weight', 'model.layers.22.post_attention_layernorm.weight', 'model.layers.9.post_attention_layernorm.weight', 'model.layers.16.post_attention_layernorm.weight', 'model.layers.5.self_attn.v_proj.weight', 'model.layers.2.mlp.down_proj.weight', 'model.layers.5.self_attn.q_proj.weight', 'model.layers.6.mlp.down_proj.weight', 'model.layers.8.post_attention_layernorm.weight', 
'model.layers.16.self_attn.q_proj.weight', 'model.layers.19.self_attn.v_proj.weight', 'model.layers.21.mlp.gate_proj.weight', 'model.layers.4.input_layernorm.weight', 'model.layers.17.post_attention_layernorm.weight', 'model.layers.9.self_attn.o_proj.weight', 'model.layers.13.self_attn.o_proj.weight', 'model.layers.26.self_attn.v_proj.weight', 'model.layers.10.mlp.down_proj.weight', 'model.layers.24.mlp.gate_proj.weight', 'model.layers.6.self_attn.k_proj.weight', 'model.layers.25.self_attn.k_proj.weight', 'model.layers.31.input_layernorm.weight', 'model.layers.31.self_attn.q_proj.weight', 'model.layers.11.post_attention_layernorm.weight', 'model.layers.17.input_layernorm.weight', 'model.layers.5.mlp.down_proj.weight', 'model.layers.27.input_layernorm.weight', 'model.layers.30.post_attention_layernorm.weight', 'model.layers.3.self_attn.o_proj.weight', 'model.layers.13.self_attn.k_proj.weight', 'model.layers.16.mlp.gate_proj.weight', 'model.layers.5.self_attn.o_proj.weight', 'model.layers.7.mlp.up_proj.weight', 'model.layers.13.input_layernorm.weight', 'model.layers.22.self_attn.k_proj.weight', 'model.layers.9.mlp.gate_proj.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
Traceback (most recent call last):
File "C:\cygwin64\home\Jim\chat\LLaMA-Factory\src\export_model.py", line 9, in <module>
main()
File "C:\cygwin64\home\Jim\chat\LLaMA-Factory\src\export_model.py", line 5, in main
export_model()
File "C:\cygwin64\home\Jim\chat\LLaMA-Factory\src\llmtuner\train\tuner.py", line 71, in export_model
model.save_pretrained(
File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\transformers\modeling_utils.py", line 2468, in save_pretrained
safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\safetensors\torch.py", line 281, in save_file
serialize_file(_flatten(tensors), filename, metadata=metadata)
^^^^^^^^^^^^^^^^^
File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\safetensors\torch.py", line 485, in _flatten
return {
^
File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\safetensors\torch.py", line 489, in <dictcomp>
"data": _tobytes(v, k),
^^^^^^^^^^^^^^
File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\safetensors\torch.py", line 411, in _tobytes
tensor = tensor.to("cpu")
^^^^^^^^^^^^^^^^
NotImplementedError: Cannot copy out of meta tensor; no data!
Expected behavior
I was hoping the 7B model's LoRA would merge with the base model, since I was able to train a small LoRA within 16 GB of VRAM.
The warnings indicate that some parameters were offloaded to the CPU and left on the meta device, and safetensors apparently can't copy a meta tensor out when saving. I'm not sure how to work around this; a rough sketch of what I have in mind is below.
Although the filesystem is under Cygwin, I ran the script from the Windows command line. I cloned the repo yesterday or so. I am using current gaming NVIDIA drivers, with the option to fall back to system memory enabled (so VRAM overflow swaps slowly instead of crashing).
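For reference, this is a minimal sketch of the workaround I had in mind, not the export_model.py code path: load the base model entirely on CPU so nothing is offloaded to the meta device, merge the adapter with PEFT's merge_and_unload, and save the result. The paths are the same placeholders as in the reproduction command, and I'm assuming the failure really does come from the offloaded parameters.

```python
# Sketch only: merge the LoRA adapter on CPU to avoid meta-device tensors.
# "basemodel1", "checkpoint1", and "export1" are the placeholder paths from
# the reproduction command above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "basemodel1",
    torch_dtype=torch.bfloat16,
    device_map={"": "cpu"},   # keep every weight on CPU, no offloading
    low_cpu_mem_usage=True,
)
model = PeftModel.from_pretrained(base, "checkpoint1")
model = model.merge_and_unload()  # fold the LoRA weights into the base model

tokenizer = AutoTokenizer.from_pretrained("basemodel1")
model.save_pretrained("export1", max_shard_size="2GB", safe_serialization=True)
tokenizer.save_pretrained("export1")
```

If merging on CPU works, the remaining question would be why the export path ends up with weights on the meta device on this machine in the first place.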
System Info
- transformers version: 4.39.3
- Platform: Windows-10-10.0.22631-SP0 (I've upgraded to Windows 11, but this was the version during the original Python install)
- Python version: 3.11.5
- Huggingface_hub version: 0.22.1
- Safetensors version: 0.4.2
- Accelerate version: 0.27.2
- Accelerate config: not found
- PyTorch version (GPU?): 2.2.1+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
Others
No response