
full CUDA Support for Blackwell #60

Open
loscrossos wants to merge 1 commit into Dao-AILab:main from loscrossos:causal_full_cuda_support

Conversation

@loscrossos

This PR enables full CUDA support with the correct compute levels to leverage all kernels provided by the toolkit. CUDA 12.8 adds compute capabilities 100, 101, and 120 (see https://docs.nvidia.com/cuda/archive/12.8.1/cuda-toolkit-release-notes/index.html), and CUDA 12.9 adds 103 and 121 (see https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html).
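The toolkit-version gating described above can be sketched as follows. This is an illustrative helper, not code from the PR diff; the function name and flag construction are assumptions:

```python
# Sketch of version-gated -gencode selection for the Blackwell targets named
# in this PR: CUDA 12.8 adds SM 100/101/120, CUDA 12.9 adds SM 103/121.
# The helper name and structure are illustrative, not copied from the PR diff.

def blackwell_gencode_flags(cuda_version):
    """Return nvcc -gencode flags for the Blackwell SMs the toolkit supports."""
    targets = []
    if cuda_version >= (12, 8):
        targets += ["100", "101", "120"]
    if cuda_version >= (12, 9):
        targets += ["103", "121"]
    return [f"-gencode=arch=compute_{cc},code=sm_{cc}" for cc in targets]

print(blackwell_gencode_flags((12, 9)))
```

A build script would append these flags only when the detected toolkit version is high enough, so older toolkits keep compiling without the new targets.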

Signed-off-by: LosCrossos <165311345+loscrossos@users.noreply.github.com>
@trvachov

@johnnynunez does this PR partially obviate #45 ?

@johnnynunez
Contributor

johnnynunez commented Jun 26, 2025

> @johnnynunez does this PR partially obviate #45 ?

#45 flags CUDA >= 13 and adds 11.0 codegen.

11.0 is Thor.
12.1 is Spark.
10.3, I don't know which device that is.

@loscrossos
Author

loscrossos commented Jun 27, 2025

I think this PR and #45 do not collide much. #45 does a lot of things, including heavy refactoring. This PR is only about adding CUDA capabilities according to the latest documentation from the CUDA Toolkit:

for 12.8:

https://docs.nvidia.com/cuda/archive/12.8.0/cuda-features-archive/index.html

> This release adds compiler support for the following NVIDIA Blackwell GPU architectures: SM_100, SM_101, SM_120

and 12.9:
https://docs.nvidia.com/cuda/cuda-features-archive/index.html

> CUDA Toolkit 12.9 adds compiler target support for SM architecture 10.3 (sm_103, sm_103f, sm_103a) and 12.1 (sm_121)

@johnnynunez Thor is, AFAIK, 101 and not "110" :) 103 is GB200; see:
https://en.wikipedia.org/wiki/CUDA

> 10.1 Jetson AGX Thor, DRIVE AGX Thor

https://docs.nvidia.com/cuda/pdf/CUDA_Toolkit_Release_Notes.pdf

> Added hardware-accelerated JPEG encoding support for NVIDIA Jetson Thor hardware (Blackwell SM 10.1 architecture)

Also, upon inspection of #45:

(screenshot of #45)

It seems you are adding support for CUDA Toolkit 13, but the latest official release is, AFAIK, 12.9.1.

Also, I think sm_110 does not appear in the official CUDA Toolkit documentation:
https://docs.nvidia.com/cuda/pdf/CUDA_Toolkit_Release_Notes.pdf

Also, it seems #45 does not include support for CUDA 12.9?

@johnnynunez
Contributor

> I think this PR and #45 do not collide much. #45 does a lot of things, including heavy refactoring. This PR is only about adding CUDA capabilities according to the latest documentation from the CUDA Toolkit:
>
> For 12.8:
>
> https://docs.nvidia.com/cuda/archive/12.8.0/cuda-features-archive/index.html
>
> > This release adds compiler support for the following NVIDIA Blackwell GPU architectures: SM_100, SM_101, SM_120
>
> And 12.9: https://docs.nvidia.com/cuda/cuda-features-archive/index.html
>
> > CUDA Toolkit 12.9 adds compiler target support for SM architecture 10.3 (sm_103, sm_103f, sm_103a) and 12.1 (sm_121)
>
> @johnnynunez Thor is, AFAIK, 101 and not "110" :) 103 is GB200; see: https://en.wikipedia.org/wiki/CUDA
>
> > 10.1 Jetson AGX Thor, DRIVE AGX Thor
>
> https://docs.nvidia.com/cuda/pdf/CUDA_Toolkit_Release_Notes.pdf
>
> > Added hardware-accelerated JPEG encoding support for NVIDIA Jetson Thor hardware (Blackwell SM 10.1 architecture)
>
> Also, upon inspection of #45:
>
> (screenshot of #45)
>
> It seems you are adding support for CUDA Toolkit 13, but the latest official release is, AFAIK, 12.9.1.
>
> Also, I think sm_110 does not appear in the official CUDA Toolkit documentation: https://docs.nvidia.com/cuda/pdf/CUDA_Toolkit_Release_Notes.pdf
>
> Also, it seems #45 does not include support for CUDA 12.9?

Hello, there is some confusion here.
Thor was 10.1, based on CUDA ARM Tegra, but two weeks ago Thor was updated to 11.0, as I mention here:
pytorch/pytorch#156176

Why?
With the legacy driver (nvgpu) used for CUDA 12.9, Thor was operating with SM 10.1.
This changes to SM 11.0 when the newer driver model (OpenRM), which is intended for CUDA 13.0, is introduced.
Thor 10.1 --> 11.0
Spark 12.1

CUDA 13 comes in mid-July.

Jetson Orin will receive CUDA SBSA support Q1 2026
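The assignments claimed in this thread can be collected as data. These values come from the comments above, not from an authoritative NVIDIA table, and the Thor entry depends on the driver model:

```python
# Compute-capability assignments as claimed in this thread. Thor's SM changes
# with the driver model shipped for the given toolkit (legacy nvgpu vs OpenRM);
# Spark stays at 12.1. These are thread claims, not an official NVIDIA table.
THOR_SM = {
    "CUDA 12.9 (legacy nvgpu driver)": (10, 1),
    "CUDA 13.0 (OpenRM driver)": (11, 0),
}
SPARK_SM = (12, 1)

def thor_sm(toolkit: str) -> tuple[int, int]:
    """Look up Thor's SM for a toolkit/driver combination (thread claims only)."""
    return THOR_SM[toolkit]
```

This is why a build that hardcodes sm_101 for Thor would stop matching the device once the OpenRM driver model lands with CUDA 13.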

@loscrossos
Author

loscrossos commented Jun 27, 2025

I don't think there is much confusion:

My PR is based only on official documentation for the CUDA Toolkit, so I cannot comment on unreleased or undocumented features. Do you have sources for that? It seems you guys have more insight into NVIDIA development; I would be very interested in new developments :)

But the actual point is:

Our PRs "collide" by about 3 lines (where we both actually fully agree): 12.8 supports 100, 101, 120.

Then my PR is about 12.9, which #45 does not touch.

On the other side, #45 is about CUDA Toolkit 13, which my PR does not touch; that PR also does a lot of refactoring and improvements that do not affect my PR at all.

@johnnynunez
Contributor

johnnynunez commented Jun 27, 2025

> I don't think there is much confusion:
>
> My PR is based only on official documentation for the CUDA Toolkit, so I cannot comment on unreleased or undocumented features. Do you have sources for that? It seems you guys have more insight into NVIDIA development; I would be very interested in new developments :)
>
> But the actual point is:
>
> Our PRs "collide" by about 3 lines (where we both actually fully agree): 12.8 supports 100, 101, 120.
>
> Then my PR is about 12.9, which #45 does not touch.
>
> On the other side, #45 is about CUDA Toolkit 13, which my PR does not touch; that PR also does a lot of refactoring and improvements that do not affect my PR at all.

It is internal information, sorry; 10.1 will disappear.

