Description
I am trying to use cuStreamWriteValue32, which is part of the CUDA driver API (context: #3894). The build succeeds, but at runtime I get the error CUDA_ERROR_NOT_SUPPORTED. The call should be supported, since I am using a DGX H100 node with CUDA 12.8, inside the latest pjnl Docker image.
Repro:
- error: https://gitlab-master.nvidia.com/dl/pytorch/fuser-gh-mirror/-/jobs/125237761
- PR: [WIP] test with cuStreamWriteValue32 #3496
- docker image: gitlab-master.nvidia.com:5005/dl/pytorch/update-scripts:pjnl-latest
Driver Version: 550.90.07 CUDA Version: 12.8
(I also tried with more recent drivers)
The source of the problem can be narrowed down to lazily loading /usr/local/cuda/compat/lib.real/libcuda.so.1 in the pjnl container: the bug comes either from the lazy loading or from the library itself.
As evidence, the following patch, which links against cuda explicitly instead of loading it lazily, makes the error go away:
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 9d7d7b32..3e51bce8 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -334,6 +334,7 @@ target_link_libraries(codegen_internal PUBLIC
${LIBCUPTI}
${TORCH_LIBRARIES}
dl
+ cuda
)
add_library(nvfuser_codegen SHARED $<TARGET_OBJECTS:codegen_internal>)
diff --git a/csrc/driver_api.h b/csrc/driver_api.h
index 41072a22..b8c413a4 100644
--- a/csrc/driver_api.h
+++ b/csrc/driver_api.h
@@ -37,7 +37,6 @@ namespace nvfuser {
#if (CUDA_VERSION >= 12000)
#define ALL_DRIVER_API_WRAPPER(fn) \
ALL_DRIVER_API_WRAPPER_CUDA11(fn); \
- fn(cuStreamWriteValue32); \
fn(cuTensorMapEncodeTiled)
#else
#define ALL_DRIVER_API_WRAPPER ALL_DRIVER_API_WRAPPER_CUDA11
diff --git a/tests/cpp/test_gpu3.cpp b/tests/cpp/test_gpu3.cpp
index 9570bb9b..a7236c6a 100644
--- a/tests/cpp/test_gpu3.cpp
+++ b/tests/cpp/test_gpu3.cpp
@@ -56,7 +56,8 @@
#include <sstream>
#include "parallel_dimension_map.h"
-#include <driver_api.h>
+// #include <driver_api.h>
+#include <cuda.h>
namespace nvfuser {
using namespace at::indexing;
Note also that cuda-gdb gives the following backtrace at the point of the error:
#0 0x00007fff37f740f0 in cudbgReportDriverApiError () from /usr/local/cuda/compat/lib.real/libcuda.so.1
#1 0x00007fff381e312b in ?? () from /usr/local/cuda/compat/lib.real/libcuda.so.1
#2 0x00007fff2f4c0d47 in ?? () from /usr/local/cuda/compat/lib.real/libcudadebugger.so.1
#3 0x00007fff2f49c29e in ?? () from /usr/local/cuda/compat/lib.real/libcudadebugger.so.1
#4 0x00007fff2f4af56d in ?? () from /usr/local/cuda/compat/lib.real/libcudadebugger.so.1
#5 0x00007fff2f5aebd6 in ?? () from /usr/local/cuda/compat/lib.real/libcudadebugger.so.1
#6 0x00007fff380c05d0 in ?? () from /usr/local/cuda/compat/lib.real/libcuda.so.1
#7 0x0000555555a67b3e in lazilyLoadAndInvoke (args#0=0x7fff2ad0d618, args#1=140724802682880, args#2=3, args#3=0) at /opt/pytorch/Fuser2/csrc/driver_api.cpp:95
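For readers unfamiliar with the pattern that frame #7 (lazilyLoadAndInvoke) suggests, here is a minimal sketch of lazy symbol loading with dlopen/dlsym. It is not the code in csrc/driver_api.cpp. To stay runnable without a GPU it resolves cos from libm.so.6 (assuming Linux with glibc) as a stand-in for a driver entry point in libcuda.so.1. The names loadSymbol, lazyCos, lazyCosStub and cos_ptr are hypothetical.

```cpp
#include <dlfcn.h>

#include <stdexcept>
#include <string>

// Hypothetical helper: open `library` and return the address of `symbol`.
// Throws std::runtime_error if the library or the symbol cannot be found.
inline void* loadSymbol(const char* library, const char* symbol) {
  void* handle = dlopen(library, RTLD_LAZY | RTLD_LOCAL);
  if (handle == nullptr) {
    const char* msg = dlerror();
    throw std::runtime_error(
        std::string("dlopen failed: ") + (msg ? msg : "unknown error"));
  }
  dlerror();  // clear any stale error state before calling dlsym
  void* address = dlsym(handle, symbol);
  if (address == nullptr) {
    const char* msg = dlerror();
    throw std::runtime_error(
        std::string("dlsym failed: ") + (msg ? msg : "unknown error"));
  }
  return address;
}

using UnaryFn = double (*)(double);

namespace {

double lazyCosStub(double x);

// The pointer starts out aimed at the stub, so the first call triggers the
// lookup. Afterwards it points directly at the resolved function.
UnaryFn cos_ptr = lazyCosStub;

double lazyCosStub(double x) {
  // Stand-in library and symbol. A driver API wrapper would resolve a
  // libcuda.so.1 entry point here instead.
  cos_ptr = reinterpret_cast<UnaryFn>(loadSymbol("libm.so.6", "cos"));
  return cos_ptr(x);
}

}  // namespace

// Public wrapper: callers never link against the library directly.
double lazyCos(double x) {
  return cos_ptr(x);
}
```

With this pattern, which library gets opened and which symbol name gets looked up is decided at runtime by the wrapper, not by the linker. That is why linking against cuda directly, as in the patch above, can behave differently from the lazy path. On older glibc versions the sketch needs to be linked with dl.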