
ValueError: Trying to set a tensor of shape torch.Size([128256, 3072]) in "weight" (which has shape torch.Size([128003, 3072])), this looks incorrect #36350

Open
@yourtiger

Description

System Info

Windows 11
Python 3.12.6

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I downloaded the meta-llama/Llama-3.2-3B-Instruct model from https://www.llama.com/, then downloaded convert_llama_weights_to_hf.py from https://github.com/huggingface/transformers. Running the conversion produced an error:
python convert_llama_weights_to_hf.py --input_dir llama-v3p2-3b-base --model_size 3B --output_dir llama-v3p2-3b-base-huggingface
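
A likely cause (my assumption, not confirmed here): convert_llama_weights_to_hf.py defaults to --llama_version 1, so the target model's vocab size gets derived from the tokenizer (128003) instead of the 128256 rows the Llama 3.2 embedding actually has. If so, passing the version explicitly should make the shapes line up (flag names taken from the script's argparse setup; double-check them against your copy of the script):

python convert_llama_weights_to_hf.py --input_dir llama-v3p2-3b-base --model_size 3B --output_dir llama-v3p2-3b-base-huggingface --llama_version 3.2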

Expected behavior

Converting the tokenizer.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in #24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Saving a LlamaTokenizerFast to llama-v3p2-3b-base-huggingface.
Converting the model.
Fetching all parameters from the checkpoint at llama-v3p2-3b-base.
Loading the checkpoint in a Llama model.
Loading checkpoint shards: 72%|███████████████████████████████████████▊ | 21/29 [00:02<00:00, 9.50it/s]
Traceback (most recent call last):
  File "C:\WORK\AITools\convert_llama_weights_to_hf.py", line 601, in <module>
    main()
  File "C:\WORK\AITools\convert_llama_weights_to_hf.py", line 587, in main
    write_model(
  File "C:\WORK\AITools\convert_llama_weights_to_hf.py", line 417, in write_model
    model = LlamaForCausalLM.from_pretrained(tmp_model_path, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\Lib\site-packages\transformers\modeling_utils.py", line 262, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\Lib\site-packages\transformers\modeling_utils.py", line 4319, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\Lib\site-packages\transformers\modeling_utils.py", line 4897, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\Lib\site-packages\transformers\modeling_utils.py", line 896, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "C:\ProgramData\anaconda3\Lib\site-packages\accelerate\utils\modeling.py", line 287, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([128256, 3072]) in "weight" (which has shape torch.Size([128003, 3072])), this looks incorrect.
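
To confirm which side is wrong, a minimal diagnostic sketch. The tmp/config.json location and the consolidated.00.pth / tok_embeddings.weight names are assumptions based on how the converter and Meta's checkpoint layout usually look, so adjust the paths to your setup:

import json
import torch

# Vocab size the HF model was built with (the 128003 side of the error).
# Assumed path: the converter writes an intermediate config under <output_dir>/tmp.
with open("llama-v3p2-3b-base-huggingface/tmp/config.json") as f:
    print("config vocab_size:", json.load(f)["vocab_size"])

# Embedding shape actually stored in the original Meta checkpoint
# (the 128256 side of the error). Key name assumed from Meta's format.
state_dict = torch.load(
    "llama-v3p2-3b-base/consolidated.00.pth", map_location="cpu", weights_only=True
)
print("checkpoint tok_embeddings:", tuple(state_dict["tok_embeddings.weight"].shape))

If the config reports 128003 while the checkpoint tensor is (128256, 3072), the converter built the model with the wrong vocab size, which points at the version flag suggested above.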
