Image count mismatch error when fine-tuning llava1.5; I suspect an issue with how llava_plugin handles img_token #5344

Closed
@wwwbq

Description

My understanding of how LlavaPlugin handles img tokens is: (1) find every <image> placeholder in the content (i.e., the corresponding img token) and replace it with {{image}}; (2) replace each {{image}} with image_seqlen img tokens. For llava, image_seqlen is 576, so content that originally holds a single img token ends up with 576 img tokens after LlavaPlugin processing. However, reading the LlavaForConditionalGeneration source, the model seems to assume that only one img token is passed in per image: it locates that token's position and then inserts all 576 image feature tokens there in one step. Doesn't this conflict with the logic in the llama-factory source?
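
For clarity, here is a minimal sketch of the two-step expansion I am describing (illustrative only; IMAGE_PLACEHOLDER, IMAGE_TOKEN, image_seqlen and expand_image_tokens are placeholder names, not the actual LlavaPlugin code):

```python
# Illustrative sketch of the expansion described above, not the real LlavaPlugin implementation.
IMAGE_PLACEHOLDER = "<image>"   # placeholder written in the dataset
IMAGE_TOKEN = "<image>"         # llava's special image token
image_seqlen = 576              # 24 x 24 visual patches for llava-1.5

def expand_image_tokens(content: str) -> str:
    # Step 1: normalize every image placeholder to "{{image}}".
    content = content.replace(IMAGE_PLACEHOLDER, "{{image}}")
    # Step 2: replace each "{{image}}" with image_seqlen img tokens.
    return content.replace("{{image}}", IMAGE_TOKEN * image_seqlen)

prompt = "<image>\nDescribe this picture."
expanded = expand_image_tokens(prompt)
print(expanded.count(IMAGE_TOKEN))  # 576, even though there is only one image
```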

The actual error I hit:

File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/llava/modeling_llava.py", line 339, in _merge_input_ids_with_image_features
    raise ValueError(
ValueError: The input provided to the model are wrong. The number of image tokens is 576 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.
I suppose this can be read as: 576 img tokens were fed in, which should correspond to 576 images, yet the dataset supplied only one image, hence the error. Before the qwen-vl update, the img token handling seemed different from the current llama-factory version. Was this change made for qwen-vl compatibility?
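
In other words, the check that fails seems to boil down to something like the following (a simplified sketch of my reading of the merge step, not the exact transformers code; check_image_inputs is a made-up name):

```python
import torch

def check_image_inputs(input_ids: torch.Tensor, pixel_values: torch.Tensor, image_token_index: int) -> None:
    # Count the special image tokens present in the text after the plugin expansion.
    num_image_tokens = int((input_ids == image_token_index).sum())  # 576 in my case
    # Count the images actually passed to the vision tower.
    num_images = pixel_values.shape[0]                              # 1 in my dataset sample
    if num_image_tokens != num_images:
        raise ValueError(
            f"The number of image tokens is {num_image_tokens} while the number "
            f"of image given to the model is {num_images}."
        )
```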
