Add FuseConcatPass to eliminate redundant concat ops#18827
Add FuseConcatPass to eliminate redundant concat ops#18827ryan-monroe wants to merge 1 commit intopytorch:mainfrom
Conversation
Summary: Concat (torch.cat) in the Gen2 Executorch ARM/Ethos-U stack is lowered to TOSA CONCAT, which Vela then converts to N x MemoryCopy operations — real DMA data movement on the NPU. This pass eliminates concat operations that can be proven unnecessary at the FX graph level, preventing Vela from generating MemoryCopy ops entirely. Inspired by Espresso's concat elimination techniques (bolt/nn/espresso/transforms/remove_nops.py), three patterns are handled: 1. Single-input concat: cat([x]) is a no-op, replaced with x. 2. Concat-then-slice: if every consumer of cat([a, b, ...]) is a slice_copy that extracts exactly one original input, bypass both. 3. Slice-then-concat: if contiguous slices of the same tensor are concatenated back, the result is the original tensor. Differential Revision: D97667069
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18827
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ❌ 1 Awaiting Approval, 2 New Failures, 1 Cancelled Job, 2 Unrelated FailuresAs of commit f91806e with merge base 411ede2 ( NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@ryan-monroe has exported this pull request. If you are a Meta employee, you can view the originating Diff in D97667069. |
This PR needs a
|
Summary:
Concat (torch.cat) in the Gen2 Executorch ARM/Ethos-U stack is lowered to
TOSA CONCAT, which Vela then converts to N x MemoryCopy operations — real
DMA data movement on the NPU. This pass eliminates concat operations that
can be proven unnecessary at the FX graph level, preventing Vela from
generating MemoryCopy ops entirely.
Inspired by Espresso's concat elimination techniques
(bolt/nn/espresso/transforms/remove_nops.py), three patterns are handled:
slice_copy that extracts exactly one original input, bypass both.
concatenated back, the result is the original tensor.
Differential Revision: D97667069