Pull requests: pytorch/helion

Pull requests list

[autotuner] Preserve arg layout in _clone_args CLA Signed
#2867 opened Jun 26, 2026 by tarinduj Contributor
[autotuner] pointwise seed heuristic for pure elementwise kernels CLA Signed
#2866 opened Jun 26, 2026 by calebmkim Contributor
Pretuned kernels nightly dashboard CLA Signed
#2865 opened Jun 26, 2026 by yushangdi Contributor Draft
[pretuned_kernels] benchmark scaled_mm under CUDA graphs CLA Signed
#2863 opened Jun 25, 2026 by yushangdi Contributor
autotuner: record structured hardware_info in the autotune dataset CLA Signed
#2862 opened Jun 24, 2026 by IshanAryendu Contributor
autotuner: record per-config perf_stats in the autotune dataset CLA Signed
#2861 opened Jun 24, 2026 by IshanAryendu Contributor
[autotuner] allow block sizes to overshoot small tiled dimensions CLA Signed
#2856 opened Jun 24, 2026 by yushangdi Contributor
[cute] Drop dead scalar-fallback K-loop branch for flat single padded-M tile CLA Signed
#2854 opened Jun 23, 2026 by yushangdi Contributor Draft
[cute] Coalesced TMA store for padded-M tcgen05 tiles with m_size>=32 CLA Signed
#2853 opened Jun 23, 2026 by yushangdi Contributor Draft
[cute] Vectorize padded-M aux (scale) load in tcgen05 store epilogue CLA Signed
#2852 opened Jun 23, 2026 by yushangdi Contributor Draft
[autotuner] composed-fact seed for fused matmul + reduction-epilogue CLA Signed
#2846 opened Jun 23, 2026 by calebmkim Contributor
[cute] Enable N-axis cluster (cluster_n=2, cluster_m=1) A-multicast for tiny-M fp8 CLA Signed
#2845 opened Jun 23, 2026 by yushangdi Contributor Draft
[cute] Leaner masked store for single padded-M tcgen05 tile CLA Signed
#2844 opened Jun 23, 2026 by yushangdi Contributor Draft
[cute] Allow block_m > static_m with padded-M tcgen05 for tiny-M fp8/bf16 GEMM CLA Signed
#2843 opened Jun 23, 2026 by yushangdi Contributor Draft
[cute] Fix codegen crash for fp8 matmul when static M < 64 CLA Signed
#2840 opened Jun 22, 2026 by yushangdi Contributor Draft
[autotuner] specialize the M-reduction seeds via per_feature_accumulator (occupancy + byte-cap) CLA Signed
#2830 opened Jun 20, 2026 by calebmkim Contributor
[autotuner] route the 6 backward M-reduction kernels onto the standard/user-tiled tracks CLA Signed
#2829 opened Jun 20, 2026 by calebmkim Contributor
[autotuner] reduction seed: budgeted r_block + liveness-aware persistent/looped decision CLA Signed
#2828 opened Jun 20, 2026 by calebmkim Contributor
[bench] Add --fp8-gemm-native-shapes to benchmarks/run.py CLA Signed
#2814 opened Jun 18, 2026 by yushangdi Contributor Draft
autotuner: record per-config generated source (Triton/Metal/Cute/Pallas) in the autotune dataset CLA Signed
#2809 opened Jun 17, 2026 by IshanAryendu Contributor
[cute] Enable tcgen05 tensor cores for small-M fp8 GEMM (M<64) CLA Signed
#2807 opened Jun 17, 2026 by yushangdi Contributor Draft
autotuner: capture per-run device IR in the autotune dataset CLA Signed
#2796 opened Jun 16, 2026 by IshanAryendu Contributor
autotuner: add device-IR extractor for cost-model dataset CLA Signed
#2794 opened Jun 16, 2026 by IshanAryendu Contributor
[autotuner] mem_op_id: cache-robustness (structural fingerprint + AOT length guard) CLA Signed
#2790 opened Jun 15, 2026 by calebmkim Contributor Draft
[autotuner] mem_op_id: id-keyed per-memory-op tunable slots CLA Signed
#2789 opened Jun 15, 2026 by calebmkim Contributor Draft