Latest LLaMA Factory: error when running vLLM inference #5384

Closed
@yecphaha

Description

Reminder

  • I have read the README and searched the existing issues.

System Info

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/middleware/cors.py", line 93, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/middleware/cors.py", line 144, in simple_response
    await self.app(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    raise exc
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    await app(scope, receive, sender)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    raise exc
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    await app(scope, receive, sender)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/routing.py", line 73, in app
    response = await f(request)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
    return await dependant.call(**values)
  File "/sie/yecp/code/llama_factory_main/src/llamafactory/api/app.py", line 100, in create_chat_completion
    return await create_chat_completion_response(request, chat_model)
  File "/sie/yecp/code/llama_factory_main/src/llamafactory/api/chat.py", line 149, in create_chat_completion_response
    responses = await chat_model.achat(
  File "/sie/yecp/code/llama_factory_main/src/llamafactory/chat/chat_model.py", line 76, in achat
    return await self.engine.chat(messages, system, tools, image, video, **input_kwargs)
TypeError: VllmEngine.chat() takes from 2 to 5 positional arguments but 6 were given
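
The TypeError at the bottom of the trace means that chat_model.py forwards six positional arguments (self, messages, system, tools, image, video) while VllmEngine.chat() accepts at most five. A minimal illustration of how this kind of mismatch arises (the signatures below are invented for illustration only, not the actual LLaMA-Factory code):

# Illustrative signatures only; the real ones live in
# src/llamafactory/chat/chat_model.py and the vLLM engine module.
class VllmEngine:
    async def chat(self, messages, system=None, tools=None, image=None, **kwargs):
        ...  # at most 5 positional parameters; `video` is not one of them

class ChatModel:
    def __init__(self):
        self.engine = VllmEngine()

    async def achat(self, messages, system=None, tools=None, image=None, video=None, **kwargs):
        # Forwarding `video` positionally makes 6 positional arguments,
        # which raises exactly this TypeError.
        return await self.engine.chat(messages, system, tools, image, video, **kwargs)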

Reproduction

CUDA_VISIBLE_DEVICES=0 API_PORT=8082 llamafactory-cli api yaml_data/pcb/qwen2-7b_lora_sft_vllm.yaml

The qwen2-7b_lora_sft_vllm.yaml config is as follows:
model_name_or_path: /qwen2_7b_pcb_100_sft_all_1016_merge
template: qwen
infer_backend: vllm
max_new_tokens: 32768
vllm_maxlen: 32768
vllm_enforce_eager: true
vllm_gpu_util: 0.27
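
The vllm_* keys above are LLaMA-Factory inference options that are forwarded to the vLLM engine. As a rough sketch only (argument names taken from vLLM's public AsyncEngineArgs; the exact mapping inside LLaMA-Factory may differ), the config corresponds to something like:

from vllm import AsyncEngineArgs, AsyncLLMEngine

# Rough sketch: how the YAML options above would map onto vLLM engine
# arguments. The real wiring lives inside LLaMA-Factory's VllmEngine.
engine_args = AsyncEngineArgs(
    model="/qwen2_7b_pcb_100_sft_all_1016_merge",
    max_model_len=32768,          # vllm_maxlen
    gpu_memory_utilization=0.27,  # vllm_gpu_util
    enforce_eager=True,           # vllm_enforce_eager
)
engine = AsyncLLMEngine.from_engine_args(engine_args)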

The request code is as follows:
import json
import time

import requests

# Target URL
url = "http://192.168.174.38:8082/v1/chat/completions"

# Request headers
headers = {
    "accept": "application/json",
    "Content-Type": "application/json",
}

# Request body data
data = {
    "model": "string",
    "messages": [
        {
            "role": "user",
            "content": """测试数据"""
        }
    ],
    "tools": [],
    "do_sample": True,
    "temperature": 0,
    "top_p": 0,
    "n": 1,
    "max_tokens": 0,
    "stream": False
}

# Send the POST request
s_time = time.time()
response = requests.post(url, headers=headers, data=json.dumps(data))
print((time.time()-s_time)*1000)
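
Since the endpoint follows the OpenAI chat-completions schema, the reply can be inspected like this once the server responds successfully (assuming HTTP 200 and the standard response layout):

# Parse the OpenAI-compatible response and print the assistant message.
result = response.json()
print(result["choices"][0]["message"]["content"])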


The main dependency versions are as follows:
torch==2.4.0
torchaudio==2.4.0
torchvision==0.19.0
transformers==4.45.0.dev0
transformers-stream-generator==0.0.4
triton==3.0.0
trl==0.9.6
vllm==0.5.4
vllm-flash-attn==2.6.1
xformers==0.0.27.post2

Expected behavior

Inference should run normally.

Others

No response
