[Hipblastl][tensilelite] Speed up AlmostEqual by Alex-Vasile · Pull Request #4909 · ROCm/rocm-libraries

Alex-Vasile · 2026-02-26T02:45:21Z

Motivation

The current implementation of AlmostEquals stores intermediate results back in the input datatypes (e.g. Half or Float8), which are not natively supported by CPUs.

This results in many conversions back and forth within the hot loop of ReferenceValidator.cpp. For several Tensile yaml files, this comparison function ends up taking more time than the reference gemm calculation.

This is part 1 of 2. A follow-up PR will focus on restructuring the hot loop so that it can be vectorized and unrolled.

Also, not all tests were correctly handling infinity and NaN checks.

Technical Details

The current implementations keep intermediate results in the input format (e.g. Half), which is not natively supported by CPUs. This introduces many extra conversions for each calculation.

All non-native dtypes are now cast to float (which is big enough for their precision and range), and all operations are performed with floats.

All types which can represent inf and NaN now handle these cases correctly using the first part of the check (`a == b || <abs_diff_check>`).
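A minimal sketch of the idea described above, not the code in ReferenceValidator.cpp. `ToyBF16`, `ComputeType`, `almostEqualSketch` and the absolute-tolerance parameter are all hypothetical names chosen for illustration; the toy type uses a bfloat16 layout only because its conversion to float is short to write. Each input is widened to float once, and the comparison keeps the `a == b || <abs_diff_check>` shape.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>
#include <type_traits>

// Hypothetical stand-in for a dtype with no native CPU arithmetic.
// It uses the bfloat16 layout (the top 16 bits of an IEEE-754 float)
// only because that keeps the conversion code short.
struct ToyBF16
{
    std::uint16_t bits;

    // Widen to float by placing the stored bits in the upper half.
    explicit operator float() const
    {
        const std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
        float out;
        std::memcpy(&out, &wide, sizeof(out));
        return out;
    }

    // Narrow from float by truncating the low 16 bits (no rounding).
    static ToyBF16 fromFloat(float value)
    {
        std::uint32_t wide;
        std::memcpy(&wide, &value, sizeof(wide));
        return ToyBF16{static_cast<std::uint16_t>(wide >> 16)};
    }
};

// Native floating-point types are compared as-is; everything else is
// widened to float once, so no intermediate is stored in the input type.
template <typename T>
struct ComputeType
{
    using type = typename std::conditional<std::is_floating_point<T>::value, T, float>::type;
};

// Sketch of the `a == b || <abs_diff_check>` shape. The absolute
// tolerance parameter is an assumption, not the PR's actual threshold.
template <typename T>
bool almostEqualSketch(T a, T b, typename ComputeType<T>::type absTol)
{
    using C = typename ComputeType<T>::type;
    const C fa = static_cast<C>(a);
    const C fb = static_cast<C>(b);
    return fa == fb || std::fabs(fa - fb) < absTol;
}

// Usage example with hand-checked values.
inline void almostEqualSketchDemo()
{
    const ToyBF16 one = ToyBF16::fromFloat(1.0f);
    const ToyBF16 inf = ToyBF16::fromFloat(std::numeric_limits<float>::infinity());
    assert(almostEqualSketch(inf, inf, 0.01f));  // equal infinities pass via a == b
    assert(!almostEqualSketch(inf, one, 0.01f)); // infinite difference fails the tolerance check
}
```

In this sketch, equal infinities pass through `a == b`, while a NaN operand fails both halves of the check because every comparison involving NaN is false. Whether the PR treats NaN the same way is not stated in the description.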

Test Plan

1. Run current tests.
2. Add extensive testing of current implementation.

Test Result

Passing tests

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

codecov-commenter · 2026-02-26T04:04:59Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (76.88%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #4909   +/-   ##
========================================
  Coverage    66.18%   66.18%           
========================================
  Files         1742     1742           
  Lines       269819   269819           
  Branches     37507    37507           
========================================
  Hits        178577   178577           
  Misses       75570    75570           
  Partials     15672    15672

| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| hipBLAS | `90.67% <ø> (ø)` | Carriedforward from 8e3a4bb |
| hipBLASLt | `43.55% <ø> (ø)` | |
| hipCUB | `82.38% <ø> (ø)` | Carriedforward from 8e3a4bb |
| hipDNN | `81.95% <ø> (ø)` | Carriedforward from 8e3a4bb |
| hipFFT | `55.93% <ø> (ø)` | Carriedforward from 8e3a4bb |
| hipRAND | `76.12% <ø> (ø)` | Carriedforward from 8e3a4bb |
| hipSOLVER | `68.81% <ø> (ø)` | Carriedforward from 8e3a4bb |
| hipSPARSE | `84.70% <ø> (ø)` | Carriedforward from 8e3a4bb |
| rocBLAS | `47.97% <ø> (ø)` | Carriedforward from 8e3a4bb |
| rocFFT | `52.93% <ø> (ø)` | Carriedforward from 8e3a4bb |
| rocRAND | `57.06% <ø> (ø)` | Carriedforward from 8e3a4bb |
| rocSOLVER | `76.88% <ø> (ø)` | Carriedforward from 8e3a4bb |
| rocSPARSE | `71.53% <ø> (ø)` | Carriedforward from 8e3a4bb |

*This pull request uses carry forward flags.

talumbau

LGTM 👍

math-ci-webhook · 2026-02-28T00:03:48Z

perfci run on commit `cb6bbf6`

math-ci run

Signed-off-by: Alex Vasile <48962821+Alex-Vasile@users.noreply.github.com>

math-ci-webhook · 2026-03-02T15:36:51Z

perfci run on commit `6601702`

math-ci run

Alex-Vasile requested review from newling and talumbau February 26, 2026 02:45

Alex-Vasile marked this pull request as ready for review February 26, 2026 02:45

Alex-Vasile requested review from a team as code owners February 26, 2026 02:45

github-actions bot added project: hipblaslt project: hipsparselt ci:hipsparselt-fast labels Feb 26, 2026

assistant-librarian bot added the organization: ROCm label Feb 26, 2026

Alex-Vasile requested a review from carsonbrownlee February 27, 2026 18:46

Alex-Vasile force-pushed the users/alvasile/faster_comparison branch from 74b1745 to 8cd9c60 Compare February 27, 2026 19:05

talumbau approved these changes Feb 27, 2026

carsonbrownlee approved these changes Feb 27, 2026

Alex-Vasile enabled auto-merge (squash) February 27, 2026 22:22

Alex-Vasile force-pushed the users/alvasile/faster_comparison branch from 8cd9c60 to 7abee7c Compare February 27, 2026 22:23

Speed up comparison by converting to fp32

26f5b04

Signed-off-by: Alex Vasile <48962821+Alex-Vasile@users.noreply.github.com>

Alex-Vasile force-pushed the users/alvasile/faster_comparison branch from 7abee7c to 26f5b04 Compare March 2, 2026 14:06

bstefanuk approved these changes Mar 2, 2026

Alex-Vasile merged commit 4ca3b9f into develop Mar 2, 2026
34 checks passed

Alex-Vasile deleted the users/alvasile/faster_comparison branch March 2, 2026 20:05
