[Example] Use relaxed mbarrier wait for CLC shared-memory sync #108

Merged: yaoyaoding merged 1 commit into main from b200-gemm-opt-relaxed-mbarrier on Apr 3, 2026

@yaoyaoding
Member

Synchronization on the CLC response involves only shared memory. Because shared memory is not cached in L1, the acquire semantics on the mbarrier wait are unnecessary. This PR switches the wait to relaxed.cluster, matching nvjet's pattern and eliminating ~2.1M CCTL instructions from the hot loop.
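
For readers unfamiliar with the two wait flavors, the sketch below shows roughly what the difference looks like at the PTX level, written as CUDA inline assembly. It is an illustration only, not the code changed in this PR: the helper names (smem_addr, try_wait_acquire, try_wait_relaxed, wait_clc_response) are hypothetical, and it assumes an sm_90-or-newer target and a toolchain whose PTX ISA accepts the .relaxed qualifier on mbarrier.try_wait.

```cuda
#include <cstdint>

// Hypothetical helper: convert a generic pointer to a shared-memory mbarrier
// into the 32-bit shared-space address that the PTX instruction expects.
__device__ inline uint32_t smem_addr(const void* p) {
    return static_cast<uint32_t>(__cvta_generic_to_shared(p));
}

// Wait with acquire semantics at cluster scope.
// Returns true once the phase with the given parity has completed.
__device__ inline bool try_wait_acquire(const uint64_t* mbar, uint32_t parity) {
    uint32_t done;
    asm volatile(
        "{\n"
        ".reg .pred p;\n"
        "mbarrier.try_wait.parity.acquire.cluster.shared::cta.b64 p, [%1], %2;\n"
        "selp.u32 %0, 1, 0, p;\n"
        "}\n"
        : "=r"(done)
        : "r"(smem_addr(mbar)), "r"(parity)
        : "memory");
    return done != 0;
}

// Same wait with relaxed semantics at cluster scope.
// The "memory" clobber is kept so the compiler does not hoist later
// shared-memory loads above the wait; it adds no hardware fence.
__device__ inline bool try_wait_relaxed(const uint64_t* mbar, uint32_t parity) {
    uint32_t done;
    asm volatile(
        "{\n"
        ".reg .pred p;\n"
        "mbarrier.try_wait.parity.relaxed.cluster.shared::cta.b64 p, [%1], %2;\n"
        "selp.u32 %0, 1, 0, p;\n"
        "}\n"
        : "=r"(done)
        : "r"(smem_addr(mbar)), "r"(parity)
        : "memory");
    return done != 0;
}

// Hypothetical hot-loop usage: spin until the response barrier flips phase,
// after which the consumer reads the response from shared memory.
__device__ inline void wait_clc_response(const uint64_t* mbar, uint32_t parity) {
    while (!try_wait_relaxed(mbar, parity)) {
        // try_wait may return false after a bounded wait, so retry.
    }
}
```

The two helpers differ only in the .acquire versus .relaxed qualifier. Per the description above, the acquire form is the one that brings CCTL instructions into the loop, so swapping the qualifier is the whole change being illustrated.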

@copy-pr-bot

copy-pr-bot bot commented Apr 2, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yaoyaoding force-pushed the b200-gemm-opt-relaxed-mbarrier branch 2 times, most recently from 5ff4b62 to 1a8ba8f, on April 2, 2026 21:30

The CLC response synchronization only involves shared memory (written
by UGETNEXTWORKID.BROADCAST, read via LDS). Since shared memory is not
cached in L1, the acquire semantics (which insert CCTL.IVALL for L1
invalidation) are unnecessary. Switch to relaxed.cluster to match
nvjet's pattern and eliminate ~2.1M CCTL instructions from the hot loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Yaoyao Ding <dingyaoyao.cs@gmail.com>

yaoyaoding force-pushed the b200-gemm-opt-relaxed-mbarrier branch from 1a8ba8f to f56e1e4 on April 2, 2026 21:31
yaoyaoding merged commit 456cbc8 into main on Apr 3, 2026 (8 checks passed)
yaoyaoding deleted the b200-gemm-opt-relaxed-mbarrier branch on April 3, 2026 01:46