Closed

71 commits
3e4e91d
Solution and kernel writer rebased
wen-des Jan 26, 2026
148d196
Contraction and DataType rebase
javier-amd Jan 26, 2026
3b41f20
client modification
javier-amd Jan 27, 2026
1bc4230
Parameters and LocalRead rebased
wen-des Jan 27, 2026
32d700f
rocisa support and other changes
javier-amd Jan 27, 2026
81c40d2
Components rebased
wen-des Jan 27, 2026
28f2602
writer related change and yaml
javier-amd Jan 27, 2026
d89ad91
Conversion rebased
wen-des Jan 27, 2026
2822693
Fixed errors in compiling
wen-des Jan 27, 2026
0df3258
Fixed python space issues
wen-des Jan 28, 2026
089cd5a
Bugfixed in python files and generated kernel successfully
wen-des Jan 28, 2026
74d6373
Disable swap address for mxsa/mxsb
wen-des Jan 28, 2026
e7cb8f6
Committed some missing fixes
wen-des Jan 28, 2026
f56cb78
Fixed mxsa/mxsb address offset
wen-des Jan 29, 2026
0649b1e
Added TODO memo for later consideration
wen-des Jan 29, 2026
b81161d
bpe function fix
javier-amd Jan 29, 2026
ee1714e
Bugfixed for the wrong address offset calculation
wen-des Jan 29, 2026
c89efb4
MX F8 functional tests passed in tensilelite
wen-des Jan 30, 2026
cf1251a
Updated f8 yaml file
wen-des Jan 30, 2026
8478d05
Removed the mx f6 yaml files since mx f6 is not ready yet
wen-des Feb 4, 2026
67b50bd
Updated f4 yaml file for test coverage
wen-des Feb 4, 2026
045f9ec
Standardize kernel names with MX types (#4363)
AlexBrownAMD Feb 6, 2026
9cb5440
Fix some errors breaking non-mx tests on mx branch (#4616)
AlexBrownAMD Feb 18, 2026
de7dee5
Fix for gfx950 mxfp4 DirectToLds (#4644)
nakajee Feb 18, 2026
e0a7991
[hipBLASLt] Enable MX data generation for Tensile host and support ca…
amd-chunxlin Feb 20, 2026
9e0422c
[hipBLASLt] Add block size into predicate for correct solution select…
amd-chunxlin Feb 20, 2026
7afd6fb
[Tensilelite] Add MXFP4 data generator for Tensile (#4597)
archana-ramalingam Feb 21, 2026
e91ecf3
Enable DirectToLds for MXSA/B and re-enable LdsPad for MXFP4 + Direc…
nakajee Feb 21, 2026
e0e6ecc
Fix data initialization (#4827)
bnemanich Feb 23, 2026
dab0b9c
Fix a verification fail with MXFP4 + non DTL (#4715)
nakajee Feb 24, 2026
a3654aa
[hipblaslt] Fixing build issues for gfx_950_mx_rebase (#4465)
NineKa Feb 25, 2026
fd621eb
[TensileLite] Fix MX FP4 scale data overwrite in initializeCPUInputs …
archana-ramalingam Feb 26, 2026
a2ce1ab
Fix stream-k with mx scaling (#4388)
AlexBrownAMD Feb 26, 2026
1c2fe0e
[hipblaslt] Fix fails with dtl.yaml and xfp32.yaml on gfx950_mx_rebas…
nakajee Feb 26, 2026
3b3c84b
Merge commit '4ffdf58b7d36b29ad86806c642e8d7aa930deeaf' into users/ho…
NineKa Feb 27, 2026
613ccdb
add kernel["ProblemType"]["Sparse"] to condition
NineKa Feb 27, 2026
a4a6368
Merge commit '0db944b2e05878e30d441fb1b32421096107ddf5' into users/ho…
nakajee Feb 27, 2026
337dbbe
fix dependency issues for tensilelite clients
NineKa Feb 27, 2026
20e4cb1
Merge commit '70b16b75e53a69200142bf27fa6f90771a0ba0c9' into users/ho…
NineKa Feb 27, 2026
85d98aa
fix computeInputType in tensilelite
NineKa Feb 27, 2026
e1a5bfb
Merge commit '7c3a3e5c044b8abbf77aaf97c2b93f303e763fff' into users/ho…
nakajee Feb 27, 2026
d5b8ff8
Merge branch 'develop' into users/hongjche/gfx950_mx_rebase_sync
NineKa Feb 28, 2026
9c46a42
fix computeInputType issue in ReferenceValidator.cpp
NineKa Feb 28, 2026
84d18f9
[hipblaslt] fix unit tests for gfx950_mx_rebase (#4912)
NineKa Mar 2, 2026
0a645bd
Merge branch 'gfx950_mx_rebase' into users/hongjche/gfx950_mx_rebase_…
NineKa Mar 2, 2026
9a3591d
[hipblaslt] Fix a verification fail with spmm_i8hs.yaml (#5034)
nakajee Mar 3, 2026
0f5c904
initial set of testcase for MXFP4 (#4739)
pdhirajkumarprasad Mar 3, 2026
8724f2f
Merge branch 'develop' into users/hongjche/gfx950_mx_rebase_sync
NineKa Mar 4, 2026
01f29ad
[Tensilelite] Add regression test for MX FP4 scale buffer determinism…
archana-ramalingam Mar 4, 2026
f83ef8e
Merge branch 'gfx950_mx_rebase' into users/hongjche/gfx950_mx_rebase_…
NineKa Mar 5, 2026
24c36e1
Merge branch 'develop' into users/hongjche/gfx950_mx_rebase_sync
NineKa Mar 5, 2026
024ba23
UseF32XEmulation in forceLrvwTile1 for B tensor (#5143)
talumbau Mar 5, 2026
8de6b1a
Merge branch 'users/hongjche/gfx950_mx_rebase_sync' into gfx950_mx_re…
NineKa Mar 5, 2026
ede3a2a
[hipblaslt] Enable StoreSwapAddr for MXFP4, plus add GRVWMXSA/B adust…
nakajee Mar 6, 2026
34bca88
[Tensilelite] Fix UserArgs struct stride mismatch in grouped GEMM (#…
archana-ramalingam Mar 6, 2026
946988a
[hipBLASLt] Disable failed mx f8 problem sizes (#5105)
amd-chunxlin Mar 6, 2026
dcc90b5
[hipblaslt] Scheduling related fixes for MXFP4 (#5169)
nakajee Mar 6, 2026
20b1923
Merge branch 'develop' into users/hongjche/gfx950_mx_rebase_sync
NineKa Mar 6, 2026
e09725d
remove explicit constructor from variable_value class
NineKa Mar 6, 2026
faa7dc7
fix return statement in hipDataType_to_tensile_type and add type chec…
NineKa Mar 6, 2026
ae18131
Merge branch 'users/hongjche/gfx950_mx_rebase_sync' into gfx950_mx_re…
NineKa Mar 6, 2026
2a4b814
[Tensilelite] Shuffle mx scaling data in Tensile (#4864)
archana-ramalingam Mar 9, 2026
b690c88
[hipblaslt] Fix fail with kringshift.yaml (#5228)
nakajee Mar 9, 2026
b967536
Merge branch 'develop' into users/hongjche/gfx950_mx_rebase_sync
NineKa Mar 9, 2026
59349d6
[hipblaslt] Optimize StoreSwapAddr (#5217)
nakajee Mar 9, 2026
945fd17
[hipblaslt] Enable MXFP4 + DtlPlusLdsBuf (#5251)
nakajee Mar 10, 2026
5babee6
Fix gfx12 build error with integer cast
Mar 10, 2026
73b68cf
[hipblaslt] Fix SIA3 issues with MXFP4 (#5245)
nakajee Mar 10, 2026
87181ea
[hipBLASLt] Fix CI failures for gfx942 (#5216)
amd-chunxlin Mar 11, 2026
b6c3d45
Make the usage side’s logic consistent with allocation side (tPackM) …
tomchengchitang Mar 11, 2026
ed39e63
Fix test_SolutionStructsUtilities
talumbau Mar 11, 2026
20 changes: 11 additions & 9 deletions projects/hipblaslt/clients/CMakeLists.txt
@@ -28,19 +28,21 @@ target_link_libraries(hipblaslt-clients-common
hip::device
)

if(HIPBLASLT_ENABLE_ROCROLLER)
if(NOT ROCM_LIBS_SUPERBUILD)
if(HIPBLASLT_ENABLE_THEROCK)
find_package(mxDataGenerator REQUIRED)
else()
add_subdirectory("${CMAKE_CURRENT_SOURCE_DIR}/../../../shared/mxdatagenerator" "${CMAKE_CURRENT_BINARY_DIR}/mxdatagenerator")
endif()
if(NOT ROCM_LIBS_SUPERBUILD)
if(HIPBLASLT_ENABLE_THEROCK)
find_package(mxDataGenerator REQUIRED)
else()
add_subdirectory("${CMAKE_CURRENT_SOURCE_DIR}/../../../shared/mxdatagenerator" "${CMAKE_CURRENT_BINARY_DIR}/mxdatagenerator")
endif()
endif()
target_compile_features(hipblaslt-clients-common PRIVATE cxx_std_20)
target_link_libraries(hipblaslt-clients-common PRIVATE roc::mxDataGenerator)

if(HIPBLASLT_ENABLE_ROCROLLER)
target_compile_definitions(hipblaslt-clients-common PRIVATE HIPBLASLT_USE_ROCROLLER)
target_link_libraries(hipblaslt-clients-common PRIVATE roc::mxDataGenerator)
target_compile_features(hipblaslt-clients-common PRIVATE cxx_std_20)
endif()


if(HIPBLASLT_ENABLE_ASAN)
hipblaslt_target_configure_sanitizers(hipblaslt-clients-common PUBLIC)
endif()
7 changes: 3 additions & 4 deletions projects/hipblaslt/clients/common/include/mxDataGen.hpp
@@ -26,10 +26,10 @@

#pragma once

#include <hipblaslt/hipblaslt.h>
#include <stdint.h>
#include <hip/hip_runtime.h>

#include <vector>

#ifdef HIPBLASLT_USE_ROCROLLER
std::vector<float> generateMXInput(hipDataType dataType,
void* data,
void* scale,
@@ -45,4 +45,3 @@ std::vector<float> generateMXInput(hipDataType dataType,
std::string_view const initMethod = "Bounded",
float min_val = -1.0f,
float max_val = 1.0f);
#endif
@@ -1918,7 +1918,6 @@ void testing_matmul_with_bias(const Arguments& arg,

hipblaslt_seedrand();

#ifdef HIPBLASLT_USE_ROCROLLER
if(isBlockScaling(arg.scaleA))
{
if(arg.initialization != hipblaslt_initialization::hpl
@@ -1963,7 +1962,6 @@ }
}
else
{
#endif
hipblaslt_init_device(ABC_dims::A,
arg.initialization,
alpha_isnan_type(arg, Talpha),
@@ -1975,7 +1973,6 @@
(do_swizzle_a && stride_a[i] != 0) ? A_row[i] * A_col[i]
: stride_a[i],
num_batches[i]);
#ifdef HIPBLASLT_USE_ROCROLLER
}
if(isBlockScaling(arg.scaleB))
{
@@ -2019,7 +2016,6 @@
}
else
{
#endif
hipblaslt_init_device(ABC_dims::B,
arg.initialization,
alpha_isnan_type(arg, Talpha),
@@ -2031,9 +2027,7 @@
(do_swizzle_b && stride_b[i] != 0) ? B_row[i] * B_col[i]
: stride_b[i],
num_batches[i]);
#ifdef HIPBLASLT_USE_ROCROLLER
}
#endif
hipblaslt_init_device(ABC_dims::C,
arg.initialization,
beta_isnan_type(arg, Talpha),
4 changes: 1 addition & 3 deletions projects/hipblaslt/clients/common/src/CMakeLists.txt
@@ -14,9 +14,7 @@ target_sources(hipblaslt-clients-common
"${CMAKE_CURRENT_SOURCE_DIR}/hipblaslt_init_device.cpp"
)

if(HIPBLASLT_ENABLE_ROCROLLER)
target_sources(hipblaslt-clients-common PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}/mxDataGen.cpp")
endif()
target_sources(hipblaslt-clients-common PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}/mxDataGen.cpp")

if(HIPBLASLT_ENABLE_BLIS)
target_sources(hipblaslt-clients-common
10 changes: 3 additions & 7 deletions projects/hipblaslt/clients/common/src/mxDataGen.cpp
@@ -234,7 +234,6 @@ std::vector<float> generateData(T dgen,

std::vector<uint8_t> scaleBytes = dgen.getScaleBytes();

#ifdef HIPBLASLT_USE_ROCROLLER
// Apply pre-swizzle to scale data
size_t scaleRows = sizes[0] / elementsPerMXBlock;
size_t scaleCols = sizes[1];
@@ -244,7 +243,6 @@ std::vector<float> generateData(T dgen,
scaleBytes = DGen::preSwizzleScalesGFX950(scaleBytes, {scaleCols, scaleRows});

}
#endif

std::memcpy(scale, scaleBytes.data(), scaleBytes.size() * sizeof(uint8_t));

@@ -290,7 +288,6 @@ std::vector<float> generateData(T dgen,
}
}

#ifdef HIPBLASLT_USE_ROCROLLER
/**
* @brief Generate random data for OCP (MX) F8/F6/F4 types
*
@@ -388,7 +385,7 @@ std::vector<float> generateMXInput(hipDataType dataType,
preSwizzleTile,
preTile);
}
else if(static_cast<hipDataType>(dataType) == HIP_R_6F_E2M3_EXT)
else if(static_cast<hipDataType>(dataType) == HIP_R_6F_E2M3)
{
DGen::DataGenerator<DGen::ocp_e2m3_mxfp6> dgen;
return generateData<decltype(dgen), DGen::ocp_e2m3_mxfp6>(dgen,
@@ -404,7 +401,7 @@ std::vector<float> generateMXInput(hipDataType dataType,
preSwizzleTile,
preTile);
}
else if(static_cast<hipDataType>(dataType) == HIP_R_6F_E3M2_EXT)
else if(static_cast<hipDataType>(dataType) == HIP_R_6F_E3M2)
{
DGen::DataGenerator<DGen::ocp_e3m2_mxfp6> dgen;
return generateData<decltype(dgen), DGen::ocp_e3m2_mxfp6>(dgen,
@@ -420,7 +417,7 @@ std::vector<float> generateMXInput(hipDataType dataType,
preSwizzleTile,
preTile);
}
else if(static_cast<hipDataType>(dataType) == HIP_R_4F_E2M1_EXT)
else if(static_cast<hipDataType>(dataType) == HIP_R_4F_E2M1)
{
DGen::DataGenerator<DGen::ocp_e2m1_mxfp4> dgen;
return generateData<decltype(dgen), DGen::ocp_e2m1_mxfp4>(dgen,
@@ -441,4 +438,3 @@ std::vector<float> generateMXInput(hipDataType dataType,
throw std::runtime_error("Unsupported data types in MX data generation!");
}
}
#endif
19 changes: 19 additions & 0 deletions projects/hipblaslt/clients/tests/data/matmul_gtest.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2560,4 +2560,23 @@ Tests:
beta: [ 0.0, 1.0 ]
gpu_arch: '90a'

# This is for testing MX FP4 kernel using Tensile
- name: matmul_tensile_fp4
category: quick
function:
matmul:
- { a_type: f4_r, b_type: f4_r, c_type: f32_r, d_type: f32_r, compute_type: c_f32_r, scaleA: 3, scaleB: 3, scale_type: f32_r}
M: [2048]
N: [2048]
K: [4096]
transA: T
transB: N
alpha: 1.0
beta: 0.0
initialization: hpl
unit_check: 0
norm_check: 1
requested_solution_num: 1
gpu_arch: '950'

...