llava - RuntimeError: Index put requires the source and destination dtypes match, got Half for the destination and Float for the source.

### Reminder

- [X] I have read the README and searched the existing issues.

### Reproduction

Training broke after updating to 2bec28e.

```
set CUDA_VISIBLE_DEVICES=0 && llamafactory-cli train --stage sft --do_train True --model_name_or_path llava-hf/llava-1.5-13b-hf --preprocessing_num_workers 16 --finetuning_type lora --quantization_bit 8 --template vicuna --rope_scaling linear --flash_attn fa2 --visual_inputs True --dataset_dir data --dataset pokemon_1k --cutoff_len 4096 --learning_rate 2e-05 --num_train_epochs 3.0 --max_samples 100000 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --max_grad_norm 1.0 --logging_steps 5 --save_steps 1000 --warmup_steps 0 --optim adamw_torch --packing False --upcast_layernorm True --report_to none --output_dir saves\LLaVA1.5-13B-Chat\lora\LLaVA1.5-13B-Chat_pokemon --fp16 True --plot_loss True --lora_rank 256 --lora_alpha 512 --lora_dropout 0 --create_new_adapter True --lora_target all

:LLaMA-Factory\venv\lib\site-packages\bitsandbytes\autograd\_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Traceback (most recent call last):
  File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\LLaMA-Factory\venv\Scripts\llamafactory-cli.exe\__main__.py", line 7, in <module>
    sys.exit(main())
  File "C:\LLaMA-Factory\src\llamafactory\cli.py", line 65, in main
    run_exp()
  File "C:\LLaMA-Factory\src\llamafactory\train\tuner.py", line 34, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "C:\LLaMA-Factory\src\llamafactory\train\sft\workflow.py", line 73, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "C:\LLaMA-Factory\venv\lib\site-packages\transformers\trainer.py", line 1859, in train
    return inner_training_loop(
  File "C:\LLaMA-Factory\venv\lib\site-packages\transformers\trainer.py", line 2203, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "C:\LLaMA-Factory\venv\lib\site-packages\transformers\trainer.py", line 3138, in training_step
    loss = self.compute_loss(model, inputs)
  File "C:\LLaMA-Factory\venv\lib\site-packages\transformers\trainer.py", line 3161, in compute_loss
    outputs = model(**inputs)
  File "C:\LLaMA-Factory\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\LLaMA-Factory\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\LLaMA-Factory\venv\lib\site-packages\accelerate\utils\operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "C:\LLaMA-Factory\venv\lib\site-packages\accelerate\utils\operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "C:\LLaMA-Factory\venv\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "C:\LLaMA-Factory\venv\lib\site-packages\peft\peft_model.py", line 1129, in forward
    return self.base_model(
  File "C:\LLaMA-Factory\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\LLaMA-Factory\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\LLaMA-Factory\venv\lib\site-packages\peft\tuners\tuners_utils.py", line 161, in forward
    return self.model.forward(*args, **kwargs)
  File "C:\LLaMA-Factory\venv\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\LLaMA-Factory\venv\lib\site-packages\transformers\models\llava\modeling_llava.py", line 438, in forward
    inputs_embeds, attention_mask, labels, position_ids = self._merge_input_ids_with_image_features(
  File "C:\LLaMA-Factory\venv\lib\site-packages\transformers\models\llava\modeling_llava.py", line 340, in _merge_input_ids_with_image_features
    final_embedding[image_to_overwrite] = image_features.contiguous().reshape(-1, embed_dim).to(target_device)
RuntimeError: Index put requires the source and destination dtypes match, got Half for the destination and Float for the source.
```
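For anyone trying to narrow this down: the last frame of the traceback is an indexed assignment where the destination (`final_embedding`) is Half and the source (`image_features`) is Float. The snippet below is only a minimal sketch of that mismatch on small dummy tensors, not the actual fix in transformers or LLaMA-Factory. The helper name `merge_image_features` and the tensor shapes are made up for illustration; only the variable names come from the traceback. It assumes PyTorch is installed and runs on CPU.

```python
import torch


def merge_image_features(final_embedding, image_to_overwrite, image_features):
    """Hypothetical helper: cast the source to the destination dtype before the masked write."""
    embed_dim = final_embedding.shape[-1]
    source = image_features.contiguous().reshape(-1, embed_dim)
    # Casting to final_embedding.dtype avoids the Half/Float mismatch in the indexed assignment.
    final_embedding[image_to_overwrite] = source.to(
        device=final_embedding.device, dtype=final_embedding.dtype
    )
    return final_embedding


# Dummy tensors (illustrative shapes): fp16 destination, fp32 image features.
final_embedding = torch.zeros(1, 4, 8, dtype=torch.float16)
image_to_overwrite = torch.tensor([[True, True, False, False]])
image_features = torch.ones(1, 2, 8, dtype=torch.float32)

try:
    # Same pattern as the failing line in the traceback, with no dtype cast.
    final_embedding[image_to_overwrite] = image_features.reshape(-1, 8)
except RuntimeError as err:
    print(f"mismatched dtypes: {err}")

merged = merge_image_features(final_embedding, image_to_overwrite, image_features)
print(merged.dtype)
```

This only shows where the dtypes diverge. Why `image_features` ends up as float32 in this run (for example, an interaction between `--fp16`, `--upcast_layernorm` and 8-bit quantization) is not something the sketch answers.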

### Expected behavior

_No response_

### System Info

_No response_

### Others

_No response_
