Description
Reminder
- I have read the README and searched the existing issues.
System Info
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/applications.py", line 113, in __call__
await self.middleware_stack(scope, receive, send)
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/middleware/errors.py", line 187, in __call__
raise exc
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/middleware/errors.py", line 165, in __call__
await self.app(scope, receive, _send)
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/middleware/cors.py", line 93, in __call__
await self.simple_response(scope, receive, send, request_headers=headers)
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/middleware/cors.py", line 144, in simple_response
await self.app(scope, receive, send)
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
raise exc
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
await app(scope, receive, sender)
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
await self.middleware_stack(scope, receive, send)
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
await route.handle(scope, receive, send)
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
await self.app(scope, receive, send)
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
raise exc
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
await app(scope, receive, sender)
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/starlette/routing.py", line 73, in app
response = await f(request)
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/fastapi/routing.py", line 301, in app
raw_response = await run_endpoint_function(
File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
return await dependant.call(**values)
File "/sie/yecp/code/llama_factory_main/src/llamafactory/api/app.py", line 100, in create_chat_completion
return await create_chat_completion_response(request, chat_model)
File "/sie/yecp/code/llama_factory_main/src/llamafactory/api/chat.py", line 149, in create_chat_completion_response
responses = await chat_model.achat(
File "/sie/yecp/code/llama_factory_main/src/llamafactory/chat/chat_model.py", line 76, in achat
return await self.engine.chat(messages, system, tools, image, video, **input_kwargs)
TypeError: VllmEngine.chat() takes from 2 to 5 positional arguments but 6 were given
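The final TypeError indicates that chat_model.py line 76 passes five positional arguments (messages, system, tools, image, video), while the VllmEngine.chat() in this checkout declares at most four parameters after self, which suggests the two files come from different revisions of the code. A minimal sketch of the failure mode (hypothetical signatures, not the actual LLaMA-Factory code):

import asyncio

class VllmEngine:
    # Hypothetical: only four parameters after `self`, so the method
    # accepts 2 to 5 positional arguments counting `self`.
    async def chat(self, messages, system=None, tools=None, image=None):
        return ["ok"]

async def main():
    engine = VllmEngine()
    # Mirrors chat_model.py line 76: five positionals plus `self` is six,
    # one more than chat() accepts, raising the same TypeError as above.
    await engine.chat([], "", "", None, None)

asyncio.run(main())

Updating to a revision where chat_model.py and the vLLM engine agree on the chat() signature should avoid the error.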
Reproduction
CUDA_VISIBLE_DEVICES=0 API_PORT=8082 llamafactory-cli api yaml_data/pcb/qwen2-7b_lora_sft_vllm.yaml
The contents of qwen2-7b_lora_sft_vllm.yaml are as follows:
model_name_or_path: /qwen2_7b_pcb_100_sft_all_1016_merge
template: qwen
infer_backend: vllm
max_new_tokens: 32768
vllm_maxlen: 32768
vllm_enforce_eager: true
vllm_gpu_util: 0.27
The request code is as follows:
import json
import time

import requests

# Target URL
url = "http://192.168.174.38:8082/v1/chat/completions"
# Request headers
headers = {
    "accept": "application/json",
    "Content-Type": "application/json",
}
# Request body
data = {
    "model": "string",
    "messages": [
        {
            "role": "user",
            "content": """test data"""
        }
    ],
    "tools": [],
    "do_sample": True,
    "temperature": 0,
    "top_p": 0,
    "n": 1,
    "max_tokens": 0,
    "stream": False
}
# Send the POST request and print the latency in milliseconds
s_time = time.time()
response = requests.post(url, headers=headers, data=json.dumps(data))
print((time.time() - s_time) * 1000)
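Incidentally, requests can serialize the body and set the JSON Content-Type header itself; an equivalent call (same url and data as above, with a hypothetical timeout added so a hung server does not block forever):

response = requests.post(url, json=data, timeout=600)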
The main dependency versions are as follows:
torch==2.4.0
torchaudio==2.4.0
torchvision==0.19.0
transformers==4.45.0.dev0
transformers-stream-generator==0.0.4
triton==3.0.0
trl==0.9.6
vllm==0.5.4
vllm-flash-attn==2.6.1
xformers==0.0.27.post2
Expected behavior
I expect inference to run normally.
Others
No response