- NVIDIA
- Shanghai, China
- 23:40 (UTC +08:00)
Pinned
- TensorRT-LLM (Public, forked from NVIDIA/TensorRT-LLM)
  TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
  Python
- cutlass (Public, forked from NVIDIA/cutlass)
  CUDA Templates and Python DSLs for High-Performance Linear Algebra
  C++
- cuda-kernels (Public)
  A collection of high-performance CUDA kernels and experiments for learning and optimizing GPU compute primitives.
  Cuda 1
- matmul-cpu (Public)
  High-performance CPU GEMM kernels (C = A·Bᵀ + C) optimized for LLM inference, featuring AVX2/AVX-512 SIMD and multi-threading. Benchmarked against OpenBLAS.
  C++ 1
- mini-sglang (Public, forked from sgl-project/mini-sglang)
  A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
  Python

