[formocast][origami] Refine accuracy of math clocks#4907

Open
hcman2 wants to merge 4 commits into ROCm:develop from hcman2:users/hcman2/formocast_grstall

hcman2 commented Feb 26, 2026

Motivation

  1. Refine the accuracy of local write instructions.
  2. Support the FIFO stall behavior of global read instructions. Note that this stall is unrelated to memory latency; it is hardware behavior that queues the instruction in the TCP_TCC stage.

Technical Details

Simulate the FIFO behavior.
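
To illustrate the idea, here is a minimal sketch of a FIFO stall model. It is not the formocast implementation: the function name, the instruction labels, and the `fifo_depth`, `drain_cycles`, and `issue_cycles` values are all hypothetical, and the sketch assumes queued entries drain one at a time at a fixed rate.

```python
from collections import deque


def simulate_issue_cycles(instructions, fifo_depth=4, drain_cycles=8, issue_cycles=1):
    """Return the cycle at which each instruction issues.

    Entries equal to "global_read" occupy one FIFO slot; every other
    instruction issues without touching the FIFO. All parameter values
    are illustrative assumptions, not measured hardware numbers.
    """
    fifo = deque()  # completion cycles of queued global reads, oldest first
    cycle = 0
    issued_at = []
    for inst in instructions:
        if inst == "global_read":
            # Retire entries that have already drained by the current cycle.
            while fifo and fifo[0] <= cycle:
                fifo.popleft()
            # FIFO full: stall the issue until the oldest entry drains.
            if len(fifo) >= fifo_depth:
                cycle = fifo.popleft()
            # Entries drain serially, so this one starts after the previous one.
            start = max(cycle, fifo[-1]) if fifo else cycle
            fifo.append(start + drain_cycles)
        issued_at.append(cycle)
        cycle += issue_cycles
    return issued_at
```

With a deep enough FIFO the global reads issue back to back; once the FIFO fills, each further global read waits for a slot, and that wait depends only on the queue, not on memory latency.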

Test Plan

Run BBSTN 257 kernels with different MTs. Remove the global incremental code so that L2 always hits (so that very little memory latency is included). Compare the buffer_load issue cycles between formocast and rocprofv3.
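
The comparison metric can be sketched as follows. This is an assumption about how the average is computed (signed, simulated minus measured, averaged over matching instructions); the function name and the input lists are hypothetical, and extracting issue cycles from formocast or rocprofv3 output is not shown.

```python
def average_issue_cycle_difference(simulated, measured):
    """Mean signed difference (simulated - measured) per instruction.

    Both inputs are per-instruction issue cycles in the same order,
    for example one entry per buffer_load.
    """
    if len(simulated) != len(measured):
        raise ValueError("both traces must cover the same instructions")
    if not simulated:
        raise ValueError("traces must not be empty")
    total = sum(s - m for s, m in zip(simulated, measured))
    return total / len(simulated)
```

A positive value means the simulated issue cycles are later than the measured ones on average.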

Test Result

Before patch
average difference per instruction: +12 cycles
After patch
average difference per instruction: +3 cycles

Refine the origami tests for local write and global read so that they more closely match real cases.

Submission Checklist

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (76.83%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #4907   +/-   ##
========================================
  Coverage    65.40%   65.40%           
========================================
  Files         1714     1714           
  Lines       266774   266774           
  Branches     37020    37020           
========================================
  Hits        174468   174468           
  Misses       76794    76794           
  Partials     15512    15512           
Flag        Coverage Δ        *Carryforward flag
hipBLAS     90.67% <ø> (ø)    Carriedforward from fceeaa5
hipBLASLt   43.55% <ø> (ø)
hipCUB      81.98% <ø> (ø)    Carriedforward from fceeaa5
hipDNN      81.54% <ø> (ø)    Carriedforward from fceeaa5
hipFFT      55.39% <ø> (ø)    Carriedforward from fceeaa5
hipRAND     76.12% <ø> (ø)    Carriedforward from fceeaa5
hipSOLVER   68.81% <ø> (ø)    Carriedforward from fceeaa5
hipSPARSE   84.70% <ø> (ø)    Carriedforward from fceeaa5
rocBLAS     47.97% <ø> (ø)    Carriedforward from fceeaa5
rocFFT      47.94% <ø> (ø)    Carriedforward from fceeaa5
rocRAND     57.08% <ø> (ø)    Carriedforward from fceeaa5
rocSOLVER   76.83% <ø> (ø)    Carriedforward from fceeaa5
rocSPARSE   71.53% <ø> (ø)    Carriedforward from fceeaa5

*This pull request uses carry forward flags.
