Layernorm optimization #4888

Draft
brentmaas wants to merge 18 commits into develop from users/brentmaas/layernorm-optimisation

@brentmaas
Contributor

Motivation

This PR improves the performance of the layernorm kernels through kernel optimizations and additional tuning options, including an option for vectorized loads and stores.

Technical Details

  • Add tuning options for vectorized loading and storing of data, for scheduling strides in a different dimension, and for incorporating stride into the local size.
  • Update heuristics using new tuning options.
  • Miscellaneous optimizations.
  • Deduplicate cache and tuning keys.
  • Print data throughput in layernorm MIOpenDriver.
  • Report multiple failing buffers in layernorm MIOpenDriver.
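
To illustrate the idea behind the vectorized load/store option, here is a minimal host-side sketch in plain C++. It is not the kernel code from this PR: `Float4`, `load4`, `store4`, `layernorm_row_vec4` and `layernorm_row_scalar` are hypothetical names introduced for illustration, and the width of 4 is an assumption. The sketch processes one row in packs of four elements, handles the remainder with a scalar tail, and can be checked against a plain scalar reference.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical pack of four floats, standing in for a GPU vector type.
struct Float4 { float v[4]; };

// Load / store four contiguous floats as one pack (illustrative helpers).
inline Float4 load4(const float* p) { return Float4{{p[0], p[1], p[2], p[3]}}; }
inline void store4(float* p, const Float4& f) {
    for (int k = 0; k < 4; ++k) p[k] = f.v[k];
}

// Scalar reference: y = (x - mean) / sqrt(var + eps) * gamma + beta.
std::vector<float> layernorm_row_scalar(const std::vector<float>& x,
                                        const std::vector<float>& gamma,
                                        const std::vector<float>& beta,
                                        float eps) {
    assert(!x.empty() && gamma.size() == x.size() && beta.size() == x.size());
    const std::size_t n = x.size();
    float mean = 0.0f;
    for (float e : x) mean += e;
    mean /= static_cast<float>(n);
    float var = 0.0f;
    for (float e : x) var += (e - mean) * (e - mean);
    var /= static_cast<float>(n);
    const float rstd = 1.0f / std::sqrt(var + eps);
    std::vector<float> y(n);
    for (std::size_t i = 0; i < n; ++i) y[i] = (x[i] - mean) * rstd * gamma[i] + beta[i];
    return y;
}

// Same computation, but reading and writing in packs of four with a scalar tail.
std::vector<float> layernorm_row_vec4(const std::vector<float>& x,
                                      const std::vector<float>& gamma,
                                      const std::vector<float>& beta,
                                      float eps) {
    assert(!x.empty() && gamma.size() == x.size() && beta.size() == x.size());
    const std::size_t n = x.size();
    const std::size_t n_vec = n - n % 4;  // largest multiple of 4 not exceeding n

    // Pass 1: accumulate sum and sum of squares.
    float sum = 0.0f, sq = 0.0f;
    for (std::size_t i = 0; i < n_vec; i += 4) {
        const Float4 a = load4(x.data() + i);
        for (int k = 0; k < 4; ++k) { sum += a.v[k]; sq += a.v[k] * a.v[k]; }
    }
    for (std::size_t i = n_vec; i < n; ++i) { sum += x[i]; sq += x[i] * x[i]; }

    const float mean = sum / static_cast<float>(n);
    // Clamp at zero: E[x^2] - mean^2 can go slightly negative through rounding.
    const float var = std::max(sq / static_cast<float>(n) - mean * mean, 0.0f);
    const float rstd = 1.0f / std::sqrt(var + eps);

    // Pass 2: normalize, scale and shift, again in packs of four.
    std::vector<float> y(n);
    for (std::size_t i = 0; i < n_vec; i += 4) {
        const Float4 a = load4(x.data() + i);
        const Float4 g = load4(gamma.data() + i);
        const Float4 b = load4(beta.data() + i);
        Float4 out;
        for (int k = 0; k < 4; ++k) out.v[k] = (a.v[k] - mean) * rstd * g.v[k] + b.v[k];
        store4(y.data() + i, out);
    }
    for (std::size_t i = n_vec; i < n; ++i) y[i] = (x[i] - mean) * rstd * gamma[i] + beta[i];
    return y;
}
```

Calling both functions on a row whose length is not a multiple of four (for example 10 elements) exercises the tail path, and the two results should agree to within floating-point tolerance. On a GPU the benefit would come from issuing fewer, wider memory transactions; this host sketch only shows the access pattern and the tail handling, not any performance effect.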

Test Plan

The changes can be verified by building and running the test target test_layernorm.

Test Result

All tests pass.

Submission Checklist

@brentmaas added the labels "project: miopen" and "organization: streamhpc" (contributors from streamhpc) on Feb 25, 2026
@brentmaas force-pushed the users/brentmaas/layernorm-optimisation branch from 5eb7373 to 82d698c on February 26, 2026 09:16