Latest LLaMA Factory: error when running vLLM inference #5384

Closed
@yecphaha

Description

Reminder

  • I have read the README and searched the existing issues.

System Info

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/middleware/cors.py", line 93, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/middleware/cors.py", line 144, in simple_response
    await self.app(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    raise exc
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    await app(scope, receive, sender)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    raise exc
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    await app(scope, receive, sender)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/routing.py", line 73, in app
    response = await f(request)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
    return await dependant.call(**values)
  File "/sie/yecp/code/llama_factory_main/src/llamafactory/api/app.py", line 100, in create_chat_completion
    return await create_chat_completion_response(request, chat_model)
  File "/sie/yecp/code/llama_factory_main/src/llamafactory/api/chat.py", line 149, in create_chat_completion_response
    responses = await chat_model.achat(
  File "/sie/yecp/code/llama_factory_main/src/llamafactory/chat/chat_model.py", line 76, in achat
    return await self.engine.chat(messages, system, tools, image, video, **input_kwargs)
TypeError: VllmEngine.chat() takes from 2 to 5 positional arguments but 6 were given
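
The TypeError at the bottom of the trace means that chat_model.py forwards six positional arguments (self, messages, system, tools, image, video) while VllmEngine.chat() accepts at most five. A minimal illustration of how this kind of mismatch arises (the signatures below are invented for illustration only, not the actual LLaMA-Factory code):

# Illustrative signatures only; the real ones live in
# src/llamafactory/chat/chat_model.py and the vLLM engine module.
class VllmEngine:
    async def chat(self, messages, system=None, tools=None, image=None, **kwargs):
        ...  # at most 5 positional parameters; `video` is not one of them

class ChatModel:
    def __init__(self):
        self.engine = VllmEngine()

    async def achat(self, messages, system=None, tools=None, image=None, video=None, **kwargs):
        # Forwarding `video` positionally makes 6 positional arguments,
        # which raises exactly this TypeError.
        return await self.engine.chat(messages, system, tools, image, video, **kwargs)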

Reproduction

CUDA_VISIBLE_DEVICES=0 API_PORT=8082 llamafactory-cli api yaml_data/pcb/qwen2-7b_lora_sft_vllm.yaml

The qwen2-7b_lora_sft_vllm.yaml config is as follows:
model_name_or_path: /qwen2_7b_pcb_100_sft_all_1016_merge
template: qwen
infer_backend: vllm
max_new_tokens: 32768
vllm_maxlen: 32768
vllm_enforce_eager: true
vllm_gpu_util: 0.27
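
The vllm_* keys above are LLaMA-Factory inference options that are forwarded to the vLLM engine. As a rough sketch only (argument names taken from vLLM's public AsyncEngineArgs; the exact mapping inside LLaMA-Factory may differ), the config corresponds to something like:

from vllm import AsyncEngineArgs, AsyncLLMEngine

# Rough sketch: how the YAML options above would map onto vLLM engine
# arguments. The real wiring lives inside LLaMA-Factory's VllmEngine.
engine_args = AsyncEngineArgs(
    model="/qwen2_7b_pcb_100_sft_all_1016_merge",
    max_model_len=32768,          # vllm_maxlen
    gpu_memory_utilization=0.27,  # vllm_gpu_util
    enforce_eager=True,           # vllm_enforce_eager
)
engine = AsyncLLMEngine.from_engine_args(engine_args)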

The request code is as follows:
import json
import time

import requests

# Target URL
url = "http://192.168.174.38:8082/v1/chat/completions"

# Request headers
headers = {
    "accept": "application/json",
    "Content-Type": "application/json",
}

# Request body data
data = {
    "model": "string",
    "messages": [
        {
            "role": "user",
            "content": """测试数据"""
        }
    ],
    "tools": [],
    "do_sample": True,
    "temperature": 0,
    "top_p": 0,
    "n": 1,
    "max_tokens": 0,
    "stream": False
}

# Send the POST request
s_time = time.time()
response = requests.post(url, headers=headers, data=json.dumps(data))
print((time.time()-s_time)*1000)
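
Since the endpoint follows the OpenAI chat-completions schema, the reply can be inspected like this once the server responds successfully (assuming HTTP 200 and the standard response layout):

# Parse the OpenAI-compatible response and print the assistant message.
result = response.json()
print(result["choices"][0]["message"]["content"])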


The main dependency versions are as follows:
torch==2.4.0
torchaudio==2.4.0
torchvision==0.19.0
transformers==4.45.0.dev0
transformers-stream-generator==0.0.4
triton==3.0.0
trl==0.9.6
vllm==0.5.4
vllm-flash-attn==2.6.1
xformers==0.0.27.post2

Expected behavior

Inference should run normally.

Others

No response
