Reminder
- I have read the README and searched the existing issues.
System Info
System: Ubuntu 20.04.2 LTS
GPU: NVIDIA A100-SXM4-80GB
Docker: 24.0.0
Docker Compose: v2.17.3
llamafactory: 0.8.2.dev0
vllm: 0.5.1
Reproduction
Build Command:
docker build -f ./Dockerfile \
    --build-arg INSTALL_BNB=true \
    --build-arg INSTALL_VLLM=true \
    --build-arg INSTALL_DEEPSPEED=true \
    --build-arg INSTALL_FLASHATTN=true \
    --build-arg PIP_INDEX=https://pypi.tuna.tsinghua.edu.cn/simple \
    -t llamafactory:latest .
Launch Command:
docker run -dit --gpus=all \
    -v ./hf_cache:/root/.cache/huggingface \
    -v ./ms_cache:/root/.cache/modelscope \
    -v ./data:/app/data \
    -v ./output:/app/output \
    -p 7860:7860 \
    -p 8000:8000 \
    --shm-size 16G \
    --name llamafactory \
    llamafactory:latest
docker exec -it llamafactory bash
llamafactory-cli webui
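For reference, the same model can presumably also be loaded without the webui via the chat subcommand. The sketch below is my assumption based on the argument names in LLaMA-Factory's example inference configs (model_name_or_path, template, infer_backend); it is not a command from the original report:
# Hypothetical non-webui repro; flags assumed from LLaMA-Factory's example configs
CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli chat \
    --model_name_or_path Qwen/Qwen2-7B-Instruct \
    --template qwen \
    --infer_backend vllm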
The following error occurs when loading Qwen2-7B-Instruct in the Chat tab of the webui with the vllm backend on multiple GPUs:
(VllmWorkerProcess pid=263) Process VllmWorkerProcess:
(VllmWorkerProcess pid=263) Traceback (most recent call last):
(VllmWorkerProcess pid=263)   File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(VllmWorkerProcess pid=263)     self.run()
(VllmWorkerProcess pid=263)   File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
(VllmWorkerProcess pid=263)     self._target(*self._args, **self._kwargs)
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/multiproc_worker_utils.py", line 210, in _run_worker_process
(VllmWorkerProcess pid=263)     worker = worker_factory()
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 68, in _create_worker
(VllmWorkerProcess pid=263)     wrapper.init_worker(**self._get_worker_kwargs(local_rank, rank,
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 334, in init_worker
(VllmWorkerProcess pid=263)     self.worker = worker_class(*args, **kwargs)
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 85, in __init__
(VllmWorkerProcess pid=263)     self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 217, in __init__
(VllmWorkerProcess pid=263)     self.attn_backend = get_attn_backend(
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/selector.py", line 45, in get_attn_backend
(VllmWorkerProcess pid=263)     backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/selector.py", line 151, in which_attn_to_use
(VllmWorkerProcess pid=263)     if torch.cuda.get_device_capability()[0] < 8:
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 430, in get_device_capability
(VllmWorkerProcess pid=263)     prop = get_device_properties(device)
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 444, in get_device_properties
(VllmWorkerProcess pid=263)     _lazy_init() # will define _get_device_properties
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 279, in _lazy_init
(VllmWorkerProcess pid=263)     raise RuntimeError(
(VllmWorkerProcess pid=263) RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
ERROR 07-11 13:53:53 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 263 died, exit code: 1
INFO 07-11 13:53:53 multiproc_worker_utils.py:123] Killing local vLLM worker processes
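Note that the RuntimeError itself points at the cause: CUDA is initialized in the parent process before vLLM forks its workers, so the forked children cannot re-initialize it and must be started with the 'spawn' method instead. A possible workaround, assuming vllm 0.5.x honors its VLLM_WORKER_MULTIPROC_METHOD environment variable for the multiproc executor, would be to force spawn before launching the webui (a sketch, not a confirmed fix):
# Ask vLLM to spawn (rather than fork) its worker processes, so the
# children do not inherit an already-initialized CUDA context
export VLLM_WORKER_MULTIPROC_METHOD=spawn
llamafactory-cli webui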
Expected behavior
The model loads successfully with the vllm backend on multiple GPUs.
Others
No response