Support more types in decoupled lookback fastpath by bernhardmgruber · Pull Request #7575 · NVIDIA/cccl

bernhardmgruber · 2026-02-09T14:04:46Z

The goal is to reduce the use of is_primitive<T> in the long term.

cub/cub/agent/single_pass_scan_operators.cuh

fbusato · 2026-02-11T17:05:21Z

cub/cub/util_arch.cuh

 static constexpr ::cuda::std::size_t max_smem_per_block = 48 * 1024;

+// The size in bytes of the largest machine word that can be atomically read/written in a single instruction.
+inline static constexpr int largest_atomic_word_size = 16;


I don't think this is true for all architectures.

I don't know, TBH. We hardcoded 16 in the past, and I know the built-in atomics like __nv_atomic_load support up to 16 bytes. Also, cub::detail::store_release etc. are only implemented for up to 16 bytes. I think we are fine wrt. this PR, but we may revisit this in the future.

I believe this requires SM90; otherwise it's only 8.

I think the name of the variable may not be accurate then. We have ld.acquire.gpu.v2.u64 on SM70, and ld.cg.v2.u64 + __threadfence() before that (I am looking at the implementation of cub::detail::load_acquire). What this variable means is the largest message size in bytes that we can pass from one thread to another using a store release and a load acquire.
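To make the "message passed with a store release and a load acquire" idea concrete, here is a minimal host-side sketch. It is an analogue only: it uses std::atomic from standard C++, not cub::detail::store_release or cub::detail::load_acquire, and the tile_descriptor type, its status values, and the publish/observe helpers are hypothetical names introduced for illustration.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical descriptor: a status flag and a payload that together
// occupy exactly one 8-byte word (layout and values are assumptions).
struct tile_descriptor
{
  std::uint32_t status; // e.g. 0 = not ready, 1 = ready
  float value;
};
static_assert(sizeof(tile_descriptor) == sizeof(std::uint64_t),
              "descriptor must fit the word that is passed atomically");

// Producer side: pack the descriptor into one word and publish it
// with release semantics, so status and value become visible together.
inline void publish(std::atomic<std::uint64_t>& slot, tile_descriptor d)
{
  std::uint64_t word;
  std::memcpy(&word, &d, sizeof(word));
  slot.store(word, std::memory_order_release);
}

// Consumer side: read the whole word with acquire semantics and unpack it.
inline tile_descriptor observe(const std::atomic<std::uint64_t>& slot)
{
  const std::uint64_t word = slot.load(std::memory_order_acquire);
  tile_descriptor d;
  std::memcpy(&d, &word, sizeof(d));
  return d;
}
```

Because status and payload travel in a single word, a reader never sees a "ready" status paired with a stale value. The constant under discussion bounds how large such a word may be.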

fbusato · 2026-02-11T17:07:27Z

cub/cub/agent/single_pass_scan_operators.cuh

-  ::cuda::std::_If<is_primitive<ValueT>::value && (sizeof(ValueT) + sizeof(KeyT) < 16),
+  // TODO(bgruber): remove the check for is_primitive<ValueT> in CCCL 4.0
+  ::cuda::std::_If<(is_primitive<ValueT>::value || ::cuda::std::is_trivially_copyable_v<ValueT>)
+                     && (sizeof(ValueT) + sizeof(KeyT) < largest_atomic_word_size),


(sizeof(ValueT) + sizeof(KeyT) < largest_atomic_word_size) or
(sizeof(ValueT) + sizeof(KeyT) <= largest_atomic_word_size)?

I think it must be less than, since we also need to fit the status word into the 16 bytes.
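The strict "less than" can be checked at compile time. The following is a rough sketch, not CUB's implementation: fits_single_word is a hypothetical helper, the constant is redeclared locally to mirror the one added in this PR, and the is_primitive part of the real condition is left out because it is CUB-internal.

```cpp
#include <cstdint>
#include <type_traits>

// Local copy of the constant introduced in this PR (redeclared for illustration).
inline constexpr int largest_atomic_word_size = 16;

// Hypothetical helper: true if a (value, key) pair is trivially copyable and
// leaves at least one byte of the atomic word free for the status.
template <typename ValueT, typename KeyT>
inline constexpr bool fits_single_word =
  std::is_trivially_copyable_v<ValueT>
  && (static_cast<int>(sizeof(ValueT) + sizeof(KeyT)) < largest_atomic_word_size);

struct small_value // 8 bytes, trivially copyable
{
  float a;
  float b;
};

struct non_trivial // 4 bytes, but the user-provided destructor makes it non-trivially copyable
{
  int x;
  ~non_trivial() {}
};

// 8 + 4 = 12 bytes: 4 bytes remain for the status.
static_assert(fits_single_word<small_value, std::int32_t>);
// 8 + 8 = 16 bytes: the pair alone fills the word, so no room for the status.
static_assert(!fits_single_word<double, double>);
// Small enough, but not trivially copyable.
static_assert(!fits_single_word<non_trivial, int>);
```

With `<=` instead of `<`, the double/double pair would be accepted even though nothing is left over for the status.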

bernhardmgruber · 2026-02-11T20:29:29Z

cub/test/catch2_test_device_scan_invalid.cu

+static_assert(!cub::detail::is_primitive_v<segment>);
+// static_assert(!cuda::std::is_trivially_copyable_v<segment>); // TODO(bgruber): why is this important?


@pauleonix I need your input here, please. Why does segment need to be a non-primitive type, and what should this mean here, given that it is trivially copyable?

We wanted to explicitly test the non-primitive path.

The changes in this PR make segment take the primitive path now, because it's trivially copyable. So I guess segment must be made non-trivially copyable by e.g. adding a non-trivial destructor or a copy ctor? Does that make sense?
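For reference, a small sketch of the two options mentioned above. The types are hypothetical stand-ins and not the actual segment from the test, and the member names are assumptions. In device code the special members would additionally need __host__ __device__ annotations, which are omitted here to keep the example plain C++.

```cpp
#include <type_traits>

// Plain aggregate: trivially copyable, so it would now take the single-word path.
struct trivial_segment
{
  int begin;
  int end;
};
static_assert(std::is_trivially_copyable_v<trivial_segment>);

// Option 1: a user-provided (non-trivial) destructor.
struct segment_with_dtor
{
  int begin;
  int end;
  ~segment_with_dtor() {}
};
static_assert(!std::is_trivially_copyable_v<segment_with_dtor>);

// Option 2: a user-provided copy constructor.
struct segment_with_copy_ctor
{
  int begin;
  int end;
  segment_with_copy_ctor(int b, int e)
      : begin(b)
      , end(e)
  {}
  segment_with_copy_ctor(const segment_with_copy_ctor& other)
      : begin(other.begin)
      , end(other.end)
  {}
};
static_assert(!std::is_trivially_copyable_v<segment_with_copy_ctor>);
```

Either change keeps the data layout identical while making the type non-trivially copyable.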

Confirmed offline.

github-actions · 2026-02-12T02:34:55Z

😬 CI Workflow Results

🟥 Finished in 6h 04m: Pass: 92%/99 | Total: 7d 05h | Max: 6h 04m | Hits: 27%/102486

pauleonix · 2026-02-14T00:25:15Z

cub/cub/agent/single_pass_scan_operators.cuh

-template <typename T, bool SINGLE_WORD = detail::is_primitive<T>::value>
+template <typename T,
+          // TODO(bgruber): remove the check for is_primitive<T> in CCCL 4.0
+          bool single_word = detail::is_primitive<T>::value


Suggested change

-          bool single_word = detail::is_primitive<T>::value
+          bool SingleWord = detail::is_primitive<T>::value

pauleonix · 2026-02-14T00:29:10Z

cub/cub/agent/single_pass_scan_operators.cuh

          typename KeyT,
-          bool SINGLE_WORD = detail::is_primitive<ValueT>::value && (sizeof(ValueT) + sizeof(KeyT) < 16)>
+          // TODO(bgruber): remove the check for is_primitive<ValueT> in CCCL 4.0
+          bool single_word = (detail::is_primitive<ValueT>::value || ::cuda::std::is_trivially_copyable_v<ValueT>)


Suggested change

-          bool single_word = (detail::is_primitive<ValueT>::value || ::cuda::std::is_trivially_copyable_v<ValueT>)
+          bool SingleWord = (detail::is_primitive<ValueT>::value || ::cuda::std::is_trivially_copyable_v<ValueT>)

bernhardmgruber requested a review from a team as a code owner February 9, 2026 14:04

bernhardmgruber requested a review from pauleonix February 9, 2026 14:04

github-project-automation bot added this to CCCL Feb 9, 2026

github-project-automation bot moved this to Todo in CCCL Feb 9, 2026

cccl-authenticator-app bot moved this from Todo to In Review in CCCL Feb 9, 2026

fbusato approved these changes Feb 11, 2026

bernhardmgruber added 2 commits February 11, 2026 21:28

Support more types in decoupled lookback fastpath

8f63dd3

The goal is to reduce the use of is_primitive<T> in the long term.

Reviewer feedback

cb10516

bernhardmgruber force-pushed the lookback_types branch from 0c8dc6d to cb10516 Compare February 11, 2026 20:28

bernhardmgruber commented Feb 11, 2026

pauleonix approved these changes Feb 14, 2026

4 participants