
full CUDA Support for Blackwell #60

Open
loscrossos wants to merge 1 commit into Dao-AILab:main from loscrossos:causal_full_cuda_support

Conversation

@loscrossos

This PR enables full CUDA support with the correct compute levels to leverage all kernels provided by the toolkit. CUDA 12.8 adds compute capabilities 100, 101, and 120 (see https://docs.nvidia.com/cuda/archive/12.8.1/cuda-toolkit-release-notes/index.html), and CUDA 12.9 adds 103 and 121 (see https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html).
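The toolkit-version gating described above can be sketched as follows. This is an illustrative helper, not code from the PR diff; the function name and flag construction are assumptions:

```python
# Sketch of version-gated -gencode selection for the Blackwell targets named
# in this PR: CUDA 12.8 adds SM 100/101/120, CUDA 12.9 adds SM 103/121.
# The helper name and structure are illustrative, not copied from the PR diff.

def blackwell_gencode_flags(cuda_version):
    """Return nvcc -gencode flags for the Blackwell SMs the toolkit supports."""
    targets = []
    if cuda_version >= (12, 8):
        targets += ["100", "101", "120"]
    if cuda_version >= (12, 9):
        targets += ["103", "121"]
    return [f"-gencode=arch=compute_{cc},code=sm_{cc}" for cc in targets]

print(blackwell_gencode_flags((12, 9)))
```

A build script would append these flags only when the detected toolkit version is high enough, so older toolkits keep compiling without the new targets.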

Signed-off-by: LosCrossos <165311345+loscrossos@users.noreply.github.com>
@trvachov

@johnnynunez does this PR partially obviate #45 ?

@johnnynunez
Contributor

johnnynunez commented Jun 26, 2025

> @johnnynunez does this PR partially obviate #45 ?

#45 flags CUDA >= 13 and adds 11.0 codegen.

11.0 is Thor.
12.1 is Spark.
10.3, I don't know which device that is.

@loscrossos
Author

loscrossos commented Jun 27, 2025

I think this PR and #45 do not collide much. #45 does a lot of things, including heavy refactoring. This PR is only about adding CUDA capabilities according to the latest documentation from the CUDA Toolkit:

for 12.8:

https://docs.nvidia.com/cuda/archive/12.8.0/cuda-features-archive/index.html

> This release adds compiler support for the following NVIDIA Blackwell GPU architectures: SM_100, SM_101, SM_120

and 12.9:
https://docs.nvidia.com/cuda/cuda-features-archive/index.html

> CUDA Toolkit 12.9 adds compiler target support for SM architecture 10.3 (sm_103, sm_103f, sm_103a) and 12.1 (sm_121)

@johnnynunez Thor is, AFAIK, 101 and not "110" :) 103 is GB200; see:
https://en.wikipedia.org/wiki/CUDA

> 10.1 Jetson AGX Thor, DRIVE AGX Thor

https://docs.nvidia.com/cuda/pdf/CUDA_Toolkit_Release_Notes.pdf

> Added hardware-accelerated JPEG encoding support for NVIDIA Jetson Thor hardware (Blackwell SM 10.1 architecture)

Also, upon inspection of #45:

(screenshot of #45)

It seems you are adding support for CUDA Toolkit 13, but the latest official release is, AFAIK, 12.9.1.

Also, I think sm_110 does not appear in the official CUDA Toolkit documentation:
https://docs.nvidia.com/cuda/pdf/CUDA_Toolkit_Release_Notes.pdf

Also, it seems #45 does not include support for CUDA 12.9?

@johnnynunez
Contributor

> I think this PR and #45 do not collide much. #45 does a lot of things, including heavy refactoring. This PR is only about adding CUDA capabilities according to the latest documentation from the CUDA Toolkit:
>
> For 12.8:
>
> https://docs.nvidia.com/cuda/archive/12.8.0/cuda-features-archive/index.html
>
> > This release adds compiler support for the following NVIDIA Blackwell GPU architectures: SM_100, SM_101, SM_120
>
> And 12.9: https://docs.nvidia.com/cuda/cuda-features-archive/index.html
>
> > CUDA Toolkit 12.9 adds compiler target support for SM architecture 10.3 (sm_103, sm_103f, sm_103a) and 12.1 (sm_121)
>
> @johnnynunez Thor is, AFAIK, 101 and not "110" :) 103 is GB200; see: https://en.wikipedia.org/wiki/CUDA
>
> > 10.1 Jetson AGX Thor, DRIVE AGX Thor
>
> https://docs.nvidia.com/cuda/pdf/CUDA_Toolkit_Release_Notes.pdf
>
> > Added hardware-accelerated JPEG encoding support for NVIDIA Jetson Thor hardware (Blackwell SM 10.1 architecture)
>
> Also, upon inspection of #45:
>
> (screenshot of #45)
>
> It seems you are adding support for CUDA Toolkit 13, but the latest official release is, AFAIK, 12.9.1.
>
> Also, I think sm_110 does not appear in the official CUDA Toolkit documentation: https://docs.nvidia.com/cuda/pdf/CUDA_Toolkit_Release_Notes.pdf
>
> Also, it seems #45 does not include support for CUDA 12.9?

Hello, there is some confusion here.
Thor was 10.1, based on CUDA ARM Tegra, but two weeks ago Thor was updated to 11.0, as I mention here:
pytorch/pytorch#156176

Why?
With the legacy driver (nvgpu) used for CUDA 12.9, Thor was operating with SM 10.1.
This changes to SM 11.0 when the newer driver model (OpenRM), which is intended for CUDA 13.0, is introduced.
Thor 10.1 --> 11.0
Spark 12.1

CUDA 13 comes in mid-July.

Jetson Orin will receive CUDA SBSA support Q1 2026
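The assignments claimed in this thread can be collected as data. These values come from the comments above, not from an authoritative NVIDIA table, and the Thor entry depends on the driver model:

```python
# Compute-capability assignments as claimed in this thread. Thor's SM changes
# with the driver model shipped for the given toolkit (legacy nvgpu vs OpenRM);
# Spark stays at 12.1. These are thread claims, not an official NVIDIA table.
THOR_SM = {
    "CUDA 12.9 (legacy nvgpu driver)": (10, 1),
    "CUDA 13.0 (OpenRM driver)": (11, 0),
}
SPARK_SM = (12, 1)

def thor_sm(toolkit: str) -> tuple[int, int]:
    """Look up Thor's SM for a toolkit/driver combination (thread claims only)."""
    return THOR_SM[toolkit]
```

This is why a build that hardcodes sm_101 for Thor would stop matching the device once the OpenRM driver model lands with CUDA 13.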

@loscrossos
Author

loscrossos commented Jun 27, 2025

I don't think there is much confusion:

My PR is based only on official documentation for the CUDA Toolkit, so I cannot comment on unreleased or undocumented features. Do you have sources for that? It seems you guys have more insight into NVIDIA development; I would be very interested in new developments :)

But the actual point is:

Our PRs "collide" by about 3 lines (where we both actually fully agree): 12.8 supports 100, 101, 120.

Then my PR is about 12.9, which #45 does not touch.

On the other side, #45 is about CUDA Toolkit 13, which my PR does not touch; that PR also does a lot of refactoring and improvements that do not affect my PR at all.

@johnnynunez
Contributor

johnnynunez commented Jun 27, 2025

> I don't think there is much confusion:
>
> My PR is based only on official documentation for the CUDA Toolkit, so I cannot comment on unreleased or undocumented features. Do you have sources for that? It seems you guys have more insight into NVIDIA development; I would be very interested in new developments :)
>
> But the actual point is:
>
> Our PRs "collide" by about 3 lines (where we both actually fully agree): 12.8 supports 100, 101, 120.
>
> Then my PR is about 12.9, which #45 does not touch.
>
> On the other side, #45 is about CUDA Toolkit 13, which my PR does not touch; that PR also does a lot of refactoring and improvements that do not affect my PR at all.

It is internal information, sorry; 10.1 will disappear.

