Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Add BLOCK_LOAD_STRIPED and BLOCK_STORE_STRIPED #274

Merged
merged 8 commits into from
Jun 11, 2021
Merged

Conversation

mnicely
Copy link
Collaborator

@mnicely mnicely commented Mar 25, 2021

This PR adds the following to BlockLoadAlgorithm

  1. BLOCK_LOAD_STRIPED
    It's basically BLOCK_LOAD_TRANSPOSE without the BlockExchange

This PR adds the following to BlockStoreAlgorithm

  1. BLOCK_STORE_STRIPED
    It's basically BLOCK_STORE_TRANSPOSE without the BlockExchange

@mnicely mnicely requested a review from alliepiper March 25, 2021 13:23
@mnicely mnicely self-assigned this Mar 25, 2021
@mnicely
Copy link
Collaborator Author

mnicely commented Mar 25, 2021

@llukas & @jszuppe for vis

@mnicely mnicely changed the title Add BLOCK_LOAD_STRIPED and BLOCK_STORE_STRIPED [WIP] Add BLOCK_LOAD_STRIPED and BLOCK_STORE_STRIPED Mar 25, 2021
@alliepiper
Copy link
Collaborator

This looks good, I'll plan to take a closer look sometime next week.

Can you update the test_block_load_store.cuh test? It should just take an extra line in this block that exercises the new BlockAlgorithm.

@alliepiper alliepiper added the type: enhancement New feature or request. label Mar 26, 2021
@alliepiper alliepiper added this to the 1.13.0 milestone Mar 26, 2021
@mnicely
Copy link
Collaborator Author

mnicely commented Mar 26, 2021

Thanks @allisonvacanti. Looks like everything passes.

...
      Start  3: cub.cpp14.test.block_load_store.thorough
 3/52 Test  #3: cub.cpp14.test.block_load_store.thorough ..............   Passed    0.14 sec
      Start  4: cub.cpp14.test.block_load_store.benchmark
 4/52 Test  #4: cub.cpp14.test.block_load_store.benchmark .............   Passed    0.10 sec
...

This PR covers our C2C scenarios. I need to think more about how R2C and C2R might work

@mnicely mnicely changed the title [WIP] Add BLOCK_LOAD_STRIPED and BLOCK_STORE_STRIPED Add BLOCK_LOAD_STRIPED and BLOCK_STORE_STRIPED Mar 26, 2021
@brycelelbach brycelelbach added the P1: should have Necessary, but not critical. label Mar 29, 2021
@alliepiper alliepiper assigned alliepiper and unassigned mnicely May 10, 2021
@alliepiper
Copy link
Collaborator

LGTM -- Starting tests.

alliepiper added a commit to alliepiper/thrust that referenced this pull request May 10, 2021
@alliepiper
Copy link
Collaborator

DVS CL: 29946764
gpuCI: NVIDIA/thrust#1427

@alliepiper alliepiper added testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). testing: gpuCI in progress Started gpuCI testing. labels May 10, 2021
@alliepiper
Copy link
Collaborator

gpuCI identified some issues building with nvcc+clang. @mnicely can you take a look? Looks like a missing member variable.

../dependencies/cub/cub/block/block_store.cuh(628): error #245: a nonstatic member reference must be relative to a specific object
          detected during:
            instantiation of "void cub::BlockStore<T, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH>::StoreInternal<cub::BLOCK_STORE_STRIPED, DUMMY>::Store(OutputIteratorT, T (&)[ITEMS_PER_THREAD], int) [with T=char, BLOCK_DIM_X=15, ITEMS_PER_THREAD=1, ALGORITHM=cub::BLOCK_STORE_STRIPED, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, PTX_ARCH=800, DUMMY=0, OutputIteratorT=cub::DiscardOutputIterator<std::ptrdiff_t>]" 
(1048): here
            instantiation of "void cub::BlockStore<T, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH>::Store(OutputIteratorT, T (&)[ITEMS_PER_THREAD], int) [with T=char, BLOCK_DIM_X=15, ITEMS_PER_THREAD=1, ALGORITHM=cub::BLOCK_STORE_STRIPED, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, PTX_ARCH=800, OutputIteratorT=cub::DiscardOutputIterator<std::ptrdiff_t>]" 
../dependencies/cub/test/test_block_load_store.cu(137): here
            instantiation of "void Kernel<BLOCK_THREADS,ITEMS_PER_THREAD,LOAD_ALGORITHM,STORE_ALGORITHM,InputIteratorT,OutputIteratorT>(InputIteratorT, OutputIteratorT, OutputIteratorT, int) [with BLOCK_THREADS=15, ITEMS_PER_THREAD=1, LOAD_ALGORITHM=cub::BLOCK_LOAD_STRIPED, STORE_ALGORITHM=cub::BLOCK_STORE_STRIPED, InputIteratorT=const char *, OutputIteratorT=cub::DiscardOutputIterator<std::ptrdiff_t>]" 
../dependencies/cub/test/test_block_load_store.cu(180): here
            instantiation of "void TestKernel<T,BLOCK_THREADS,ITEMS_PER_THREAD,LOAD_ALGORITHM,STORE_ALGORITHM,InputIteratorT,OutputIteratorT>(T *, InputIteratorT, OutputIteratorT, OutputIteratorT, T *, T *, int, int) [with T=char, BLOCK_THREADS=15, ITEMS_PER_THREAD=1, LOAD_ALGORITHM=cub::BLOCK_LOAD_STRIPED, STORE_ALGORITHM=cub::BLOCK_STORE_STRIPED, InputIteratorT=const char *, OutputIteratorT=char *]" 
../dependencies/cub/test/test_block_load_store.cu(261): here
            instantiation of "void TestNative<T,BLOCK_THREADS,ITEMS_PER_THREAD,LOAD_ALGORITHM,STORE_ALGORITHM>(int, float, cub::Int2Type<1>) [with T=char, BLOCK_THREADS=15, ITEMS_PER_THREAD=1, LOAD_ALGORITHM=cub::BLOCK_LOAD_STRIPED, STORE_ALGORITHM=cub::BLOCK_STORE_STRIPED]" 
../dependencies/cub/test/test_block_load_store.cu(403): here
            instantiation of "void TestPointerType<T,BLOCK_THREADS,ITEMS_PER_THREAD,LOAD_ALGORITHM,STORE_ALGORITHM>(int, float) [with T=char, BLOCK_THREADS=15, ITEMS_PER_THREAD=1, LOAD_ALGORITHM=cub::BLOCK_LOAD_STRIPED, STORE_ALGORITHM=cub::BLOCK_STORE_STRIPED]" 
../dependencies/cub/test/test_block_load_store.cu(442): here
            instantiation of "void TestStrategy<T,BLOCK_THREADS,ITEMS_PER_THREAD>(int, float, cub::Int2Type<0>) [with T=char, BLOCK_THREADS=15, ITEMS_PER_THREAD=1]" 
../dependencies/cub/test/test_block_load_store.cu(476): here
            instantiation of "void TestItemsPerThread<T,BLOCK_THREADS>(int, float) [with T=char, BLOCK_THREADS=15]" 
../dependencies/cub/test/test_block_load_store.cu(491): here
            instantiation of "void TestThreads<T>(int, float) [with T=char]" 
../dependencies/cub/test/test_block_load_store.cu(534): here

@alliepiper alliepiper assigned mnicely and unassigned alliepiper May 12, 2021
@alliepiper alliepiper removed testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). testing: gpuCI in progress Started gpuCI testing. labels May 12, 2021
@mnicely
Copy link
Collaborator Author

mnicely commented Jun 8, 2021

@canonizer I didn't need those for my use case, but I might not be hard to add...
@allisonvacanti If you have not objections, I'll see if I can get everything in today.

@mnicely
Copy link
Collaborator Author

mnicely commented Jun 8, 2021

I'm getting a bunch of errors trying to build CUB and test this go-around. Maybe it's best we wait till after 1.13 has been released and revisit.

...
/home/belt/workStuff/git_examples/cub/thrust/dependencies/cub/cub/iterator/tex_ref_input_iterator.cuh(133): error #1215-D: function "tex1Dfetch(texture<T, 1, cudaReadModeElementType>, int) [with T=uint2]"
/usr/local/cuda/bin/../targets/x86_64-linux/include/texture_fetch_functions.h(132): here was declared deprecated
          detected during:
            instantiation of "T cub::<unnamed>::IteratorTexRef<T>::TexId<UNIQUE_ID>::Fetch(Distance) [with T=TestBar, UNIQUE_ID=554, Distance=ptrdiff_t]" 
(299): here
            instantiation of "cub::TexRefInputIterator<T, UNIQUE_ID, OffsetT>::reference cub::TexRefInputIterator<T, UNIQUE_ID, OffsetT>::operator*() const [with T=TestBar, UNIQUE_ID=554, OffsetT=ptrdiff_t]" 
/home/belt/workStuff/git_examples/cub/thrust/dependencies/cub/test/test_iterator.cu(113): here
            instantiation of "void Kernel(InputIteratorT, T *, InputIteratorT *) [with InputIteratorT=cub::TexRefInputIterator<TestBar, 554, ptrdiff_t>, T=TestBar]" 
/home/belt/workStuff/git_examples/cub/thrust/dependencies/cub/test/test_iterator.cu(160): here
            instantiation of "void Test(InputIteratorT, T (&)[TEST_VALUES]) [with InputIteratorT=cub::TexRefInputIterator<TestBar, 554, ptrdiff_t>, T=TestBar, TEST_VALUES=8]" 
/home/belt/workStuff/git_examples/cub/thrust/dependencies/cub/test/test_iterator.cu(561): here
            instantiation of "void TestTexRef<T,CastT>() [with T=TestBar, CastT=TestBar]" 
/home/belt/workStuff/git_examples/cub/thrust/dependencies/cub/test/test_iterator.cu(694): here
            instantiation of "void Test<T,CastT>(cub::Int2Type<0>) [with T=TestBar, CastT=TestBar]" 
/home/belt/workStuff/git_examples/cub/thrust/dependencies/cub/test/test_iterator.cu(726): here
            instantiation of "void Test<T>() [with T=TestBar]" 
/home/belt/workStuff/git_examples/cub/thrust/dependencies/cub/test/test_iterator.cu(797): here
...
60 errors detected in the compilation of "/home/belt/workStuff/git_examples/cub/thrust/dependencies/cub/test/test_iterator.cu".
make[2]: *** [dependencies/cub/test/CMakeFiles/cub.cpp14.test.iterator.dir/build.make:76: dependencies/cub/test/CMakeFiles/cub.cpp14.test.iterator.dir/test_iterator.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:18203: dependencies/cub/test/CMakeFiles/cub.cpp14.test.iterator.dir/all] Error 2

@alliepiper
Copy link
Collaborator

Hmm, strange -- that error should be suppressed by our build system. It's just a deprecation warning and can be ignored.

What platform are you seeing that on?

@mnicely
Copy link
Collaborator Author

mnicely commented Jun 8, 2021

I'm using CTK 11.3U1 and Ubuntu 20.04.

I've merged in the latest from main into my branch and I'm using the updated build instructions.

@mnicely
Copy link
Collaborator Author

mnicely commented Jun 8, 2021

Should one of these be ON?

THRUST_IGNORE_CUB_VERSION_CHECK:BOOL=OFF
THRUST_IGNORE_DEPRECATED_COMPILER:BOOL=OFF
THRUST_IGNORE_DEPRECATED_CPP_11:BOOL=OFF
THRUST_IGNORE_DEPRECATED_CPP_DIALECT:BOOL=OFF

@alliepiper
Copy link
Collaborator

No, those affect other things. I'll look into it, but that's a known issue that shouldn't hold things up, and 1.13 probably won't fix it.

Let me know if you are planning to add the other load algorithms Andy suggested, or if you want to leave that as future work. If you're finished with this patch for now, I'll restart the tests -- our CI seems to suppress the warning as intended, so we can just validate there.

@mnicely
Copy link
Collaborator Author

mnicely commented Jun 8, 2021

I think it'll be easier to hold off on BLOCK_*_WARP_STRIDED.

We can revisit it later. I'll create a new PR and branch off 1.13 when it's released.

@alliepiper
Copy link
Collaborator

Sounds good 👍🏻 I'll go ahead and start the tests.

Copy link
Collaborator

@alliepiper alliepiper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a bit more work -- @mnicely can you add the missing member variable?

TempStorage temp_storage;

needs to be added as a member to the new StoreInternal specialization.

int valid_items) ///< [in] Number of valid items to write
{
if (linear_tid == 0)
temp_storage.valid_items = valid_items; // Move through volatile smem as a workaround to prevent RF spilling on subsequent loads
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That last commit only added an initialization to the constructor, there still needs to be a temp_storage member variable defined.

@alliepiper
Copy link
Collaborator

Thanks! This looks good to me, I should be able to land it in time for 1.13 👍

@alliepiper alliepiper assigned alliepiper and unassigned mnicely Jun 9, 2021
alliepiper added a commit to alliepiper/thrust that referenced this pull request Jun 9, 2021
@alliepiper
Copy link
Collaborator

alliepiper commented Jun 9, 2021

DVS CL: 30058071
gpuCI: NVIDIA/thrust#1427

@alliepiper alliepiper added testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). testing: gpuCI in progress Started gpuCI testing. labels Jun 9, 2021
@alliepiper alliepiper added testing: internal ci passed Passed internal NVIDIA CI (DVS). testing: gpuCI passed Passed gpuCI testing. and removed testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). testing: gpuCI in progress Started gpuCI testing. labels Jun 11, 2021
@alliepiper
Copy link
Collaborator

Tests look good, merging.

@alliepiper alliepiper merged commit d056a9a into NVIDIA:main Jun 11, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
P1: should have Necessary, but not critical. testing: gpuCI passed Passed gpuCI testing. testing: internal ci passed Passed internal NVIDIA CI (DVS). type: enhancement New feature or request.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants