Fix CUDA version detection in CUB #170

Artem-B · 2019-09-23T18:21:27Z

This fixes the problem of CUB using the deprecated shfl/vote instructions (the variants without .sync) when CUB
is compiled with clang (e.g., some TensorFlow builds).
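Below is a minimal host-side sketch of how such a version check can misfire. It assumes, as this fix implies, that clang's CUDA mode does not define nvcc's __CUDACC_VER_MAJOR__, while the toolkit's cuda.h still provides CUDA_VERSION. The nvcc-only check is a simplification rather than CUB's exact pre-fix code, and the CUDA_VERSION value 10010 (CUDA 10.1) is an illustrative placeholder. Compile it with a plain host compiler such as g++, not nvcc.

```cpp
// Emulated toolkit macro. In a real build, cuda.h defines CUDA_VERSION;
// 10010 (CUDA 10.1) is an illustrative placeholder, not a required value.
#ifndef CUDA_VERSION
#define CUDA_VERSION 10010
#endif

// Simplified nvcc-only check (an assumption, not CUB's exact code).
// Inside #if, an identifier that is not a defined macro is replaced by 0,
// so without nvcc this evaluates as 0 >= 9 and the legacy path is chosen.
#if __CUDACC_VER_MAJOR__ >= 9
constexpr bool kNvccOnlyCheckSelectsSync = true;
#else
constexpr bool kNvccOnlyCheckSelectsSync = false;
#endif

// The toolkit version information that the check above ignores.
constexpr bool kToolkitSupportsSync = CUDA_VERSION >= 9000;
```

Under these assumptions, the toolkit clearly supports the .sync variants, but the nvcc-only check still selects the deprecated instructions.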

Artem-B · 2019-09-23T18:25:30Z

@dumerrill This is a follow-up to https://github.com/NVlabs/cub/pull/164#issuecomment-495732330
@chsigg: This should fix the TF build issues with clang.

RaulPPelaez · 2019-09-24T08:11:50Z

Might be related #137

Artem-B · 2019-09-24T16:20:50Z

> Might be related #137

I don't think so. #137 is about the "shfl" instructions used by CUB not being available on Fermi GPUs. CUB uses a different mechanism to select kernels that use shared memory instead of shfl. AFAICT, that is independent of the choice between shfl and shfl.sync made here.
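The independence argument can be illustrated with a simplified model. Here the target architecture decides whether shfl is usable at all, and only then does the toolkit version choose between legacy shfl and shfl.sync. WarpShuffleStrategy, SelectStrategy, and the helper functions are hypothetical names, not CUB's actual dispatch code. Architectures are encoded like __CUDA_ARCH__ (e.g., 350 for sm_35).

```cpp
// Hypothetical names; a simplified model, not CUB's dispatch code.
enum class WarpShuffleStrategy { kSharedMemory, kLegacyShfl, kShflSync };

// Per-architecture decision: shfl exists on sm_30 (Kepler) and newer,
// so Fermi (sm_2x) targets need a shared-memory fallback.
constexpr bool TargetHasShfl(int ptx_arch) { return ptx_arch >= 300; }

// Per-toolkit decision: the *_sync intrinsics require CUDA 9 or newer.
constexpr bool ToolkitHasSyncVariants(int cuda_version) {
  return cuda_version >= 9000;
}

// The architecture picks shfl vs. shared memory first; the toolkit version
// only matters once shfl is available.
constexpr WarpShuffleStrategy SelectStrategy(int ptx_arch, int cuda_version) {
  return !TargetHasShfl(ptx_arch) ? WarpShuffleStrategy::kSharedMemory
         : ToolkitHasSyncVariants(cuda_version)
             ? WarpShuffleStrategy::kShflSync
             : WarpShuffleStrategy::kLegacyShfl;
}
```

In this model, no value of cuda_version moves an architecture below 300 off the shared-memory path, which mirrors the argument above.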

Artem-B · 2019-10-14T02:30:09Z

@dumerrill: ping

andrewcorrigan · 2020-10-23T17:34:27Z

I'm trying this out with the conflict resolved as:

#if ((__CUDACC_VER_MAJOR__ >= 9) || defined(__NVCOMPILER_CUDA__) || CUDA_VERSION >= 9000) && \
        !defined(CUB_USE_COOPERATIVE_GROUPS)
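The resolved guard accepts any of three signals. A small constexpr model makes the resulting truth table explicit. It assumes, as the !defined(CUB_USE_COOPERATIVE_GROUPS) clause suggests, that the guarded body is #define CUB_USE_COOPERATIVE_GROUPS. The Toolchain struct and the function name are hypothetical.

```cpp
// Hypothetical model of the guard quoted above; field names are illustrative.
struct Toolchain {
  int cudacc_ver_major;  // value of __CUDACC_VER_MAJOR__, or 0 if not defined
  bool nvhpc;            // whether __NVCOMPILER_CUDA__ is defined
  int cuda_version;      // value of CUDA_VERSION from cuda.h, or 0 if absent
};

// Assuming the body only defines CUB_USE_COOPERATIVE_GROUPS when it is not
// already defined, the macro ends up defined if the user predefined it or
// if any of the three version signals passes.
constexpr bool CooperativeGroupsMacroDefined(Toolchain t,
                                             bool predefined_by_user) {
  return predefined_by_user || t.cudacc_ver_major >= 9 || t.nvhpc ||
         t.cuda_version >= 9000;
}
```

For example, Toolchain{0, false, 10010} models a clang build against CUDA 10.1 headers. Only the CUDA_VERSION clause catches that case.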

andrewcorrigan · 2020-10-23T18:25:08Z

Can this please be merged?

alliepiper · 2020-10-23T20:29:45Z

I can bump this up to the 1.11.0 release. Our internal integration and testing workflows are currently blocked, but this should be able to go in quickly once things start moving again.


alliepiper · 2020-10-23T20:40:30Z

Rebased and resolved conflict as described.

alliepiper · 2020-10-30T16:46:26Z

DVS CL: 29265335


Artem-B force-pushed the cuda-version-check branch from fd6e7a6 to e93d821 on October 25, 2019 17:21

alliepiper added this to the 1.11.1 milestone Oct 20, 2020

alliepiper changed the base branch from master to main October 21, 2020 13:30

alliepiper mentioned this pull request Oct 23, 2020

Instruction 'shfl' without '.sync' is deprecated NVIDIA/thrust#1327 (Closed)

alliepiper modified the milestones: 1.11.1, 1.11.0 Oct 23, 2020

Fix CUDA version detection in CUB

286570e

This fixes the problem with CUB using deprecated shfl/vote instructions when CUB is compiled with clang (e.g. some TensorFlow builds).

alliepiper force-pushed the cuda-version-check branch from e93d821 to 286570e on October 23, 2020 20:40

alliepiper added the "testing: gpuCI in progress" and "testing: gpuCI passed" labels and removed the "testing: gpuCI in progress" label Oct 23, 2020

alliepiper added the "testing: internal ci in progress" label (internal NVIDIA CI, DVS) Oct 30, 2020

alliepiper added the "testing: internal ci passed" label and removed the "testing: internal ci in progress" label Nov 3, 2020

alliepiper approved these changes Nov 5, 2020


alliepiper merged commit daaa127 into NVIDIA:main Nov 5, 2020

daquexian added a commit to Oneflow-Inc/oneflow that referenced this pull request Nov 17, 2021

upgrade cub to 1.11.0 for NVIDIA/cub#170

fceaa1e

Signed-off-by: daquexian <[email protected]>

daquexian mentioned this pull request Nov 17, 2021

upgrade cub to 1.11.0 for https://github.com/NVIDIA/cub/pull/170 Oneflow-Inc/oneflow#6795 (Merged)

oneflow-ci-bot added a commit to Oneflow-Inc/oneflow that referenced this pull request Nov 18, 2021

upgrade cub to 1.11.0 for NVIDIA/cub#170 (#6795)

47e85b0

Signed-off-by: daquexian <[email protected]> Co-authored-by: oneflow-ci-bot <[email protected]>

masterleinad mentioned this pull request May 18, 2022

Update CUDA base images in CI arborx/ArborX#676 (Merged)
