
merge main into amd-staging #1597

Merged
z1-cciauto merged 59 commits into amd-staging from
amd/merge/upstream_merge_20260227164042
Feb 28, 2026

Conversation

@ronlieb (Collaborator) commented Feb 27, 2026

No description provided.

jpienaar and others added 30 commits February 27, 2026 18:44
…lvm#181252)

Replace the manual region dissolution code in
simplifyBranchConditionForVFAndUF with the general
removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates
a (BranchOnCond true) or updates BranchOnTwoConds.

The loop then gets automatically removed by running removeBranchOnConst.

This removes a bunch of special logic to handle header phi replacements
and CFG updates. With the new code, there's no restriction on what kind
of header phi recipes the loop contains.

Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is
technically unrelated, but I could not find an independent test that
would be impacted.

The code to deal with epilogue resume values now needs updating, because
we may simplify a reduction directly to the start value.

PR: llvm#181252
Currently for thin-lto, the imported static global values (functions,
variables, etc) will be promoted/renamed from e.g., foo() to
foo.llvm.<hash>(). Such a renaming caused difficulties in live patching
since the function name changes ([1]).

It is possible that some global value names have to be promoted to avoid
name collisions and linker failures. But in practice, the majority of name
promotions can be avoided.

In [2], the suggestion is that the thin-LTO pre-link phase decides whether
a particular global value needs name promotion. If it does, the name will
be promoted later in thinBackend().

I compiled a particular linux kernel version (latest bpf-next tree)
and found 1216 global values with suffix .llvm.<hash>. With this patch,
the number of promoted functions is 2, a 98% reduction from the
original kernel build.

If some native objects are not participating in LTO, name promotions
have to be done to avoid potential linker issues, so the current
implementation cannot be on by default. But in certain cases, e.g., a Linux
kernel build, people can enable the lld flag --lto-whole-program-visibility
to reduce the number of functions like foo.llvm.<hash>().

For ThinLTOCodeGenerator.cpp, which is used by the llvm-lto tool and a
few other rare cases, reducing the number of renamings due to promotion
is not implemented, as the lld flag '--lto-whole-program-visibility' is not
supported in ThinLTOCodeGenerator.cpp for now. In summary, this pull request
only supports the llvm-lto2 style workflow.

  [1] https://lpc.events/event/19/contributions/2212
  [2] https://discourse.llvm.org/t/rfc-avoid-functions-like-foo-llvm-for-kernel-live-patch/89400
Update the test to more cleanly handle making a 'blocking' call using a
custom command instead of Python `time.sleep`, which we cannot easily
interrupt.

This should improve the overall performance of the tests, locally they
took around 30s and now finish in around 6s.
…cross incremental scans (llvm#183328)

Add a test that verifies symlink aliases to a module map directory
produce the same PCM across incremental scans.
When there's a dependency cycle between modules, the dependency scanner
may encounter a deadlock. This was caused by not respecting the lock
timeout. But even with the timeout implemented, leaving
`unsafeMaybeUnlock()` unimplemented means trying to take a lock after a
timeout would still fail and prevent making progress. This PR implements
this API in a way that avoids UB on `std::mutex` (when it's unlocked by
someone else than the owner). Lastly, this PR makes sure that
`unsafeUnlock()` ends the wait of existing threads, so that they don't
need to hit the full timeout amount.

This PR also implements `-fimplicit-modules-lock-timeout=<seconds>` that
allows tweaking the default 90-second lock timeout, and adds `#pragma
clang __debug sleep` that makes it easier to achieve desired execution
ordering.

rdar://170738600
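The locking behavior described above can be sketched roughly as follows. This is a hedged illustration, not Clang's actual implementation: a lock built on `std::condition_variable` can be released by a non-owning thread without the UB that `std::mutex::unlock()` would incur, and releasing it wakes current waiters immediately rather than letting them run out the full timeout. The names (`SafeLock`, `tryLockFor`, `unsafeUnlock`) are illustrative.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Illustrative sketch only; names do not match Clang's implementation.
class SafeLock {
  std::mutex m;
  std::condition_variable cv;
  bool held = false;

public:
  // Try to acquire, giving up after `timeout` (the analogue of the
  // lock timeout behavior described above).
  bool tryLockFor(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> g(m);
    if (!cv.wait_for(g, timeout, [&] { return !held; }))
      return false; // timed out; caller can still make progress
    held = true;
    return true;
  }

  // Safe to call from any thread (unlike std::mutex::unlock, which is UB
  // when invoked by a non-owner); wakes all current waiters immediately.
  void unsafeUnlock() {
    {
      std::lock_guard<std::mutex> g(m);
      held = false;
    }
    cv.notify_all();
  }
};
```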
…part 22) (llvm#183681)

Tests converted from test/Lower: intentout-deallocate.f90
Tests converted from test/Lower/Intrinsics: abs.f90, achar.f90,
acospi.f90, adjustl.f90
Part 1 of changes needed for USM alloc/dealloc impl.


This is part of the SYCL support upstreaming effort. The relevant RFCs
can be found here:


https://discourse.llvm.org/t/rfc-add-full-support-for-the-sycl-programming-model/74080
https://discourse.llvm.org/t/rfc-sycl-runtime-upstreaming/74479

---------

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>
Enable Flang to match Clang behavior for command-line recording in DWARF
producer strings when using -grecord-command-line.

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
Just like other bitcode libs such as ockl.bc and ocml.bc, link asanrtl.bc
with '-mlink-builtin-bitcode' in the driver when GPU ASan is enabled.
…llvm#183781)

Two related crashes were fixed in vector.mask handling:

1. MaskOp::fold() crashes with a null pointer dereference when the mask
is all-true and the mask body has no maskable operation (only a
vector.yield). getMaskableOp() returns nullptr in this case, and the
fold was calling nullptr->dropAllUses(). Fixed by returning failure()
when there is no maskable op, deferring to the canonicalizer.

2. CanonializeEmptyMaskOp creates an invalid arith.select when the mask
type is a vector (e.g., vector<1xi1>) but the result type is a scalar
(e.g., i32). arith.select with a vector condition requires the value
types to be vectors of the same shape. Fixed by bailing out when any
result type doesn't match the mask shape.

Regression tests are added for both cases.

Fixes llvm#177833
…83199)

Using physical register 0, aka NoRegister, also just looked suspicious.
…vm#178587)" (llvm#183782)

There is a conflict with existing code; see
  llvm#178587
Revert for now; will resolve the conflict and resubmit later.
This allows us to support more lifetimes, and also gets rid of
the quadratic call to isPotentiallyReachable.

Reviewers: pcc, usama54321

Reviewed By: pcc

Pull Request: llvm#182425
Instead of excluding the whole package, push any existing parse_headers
failures to individual targets. In some cases we can avoid suppressing a
target by adding a few missing deps.
… support (llvm#183442)

This is the second of three patches aimed at supporting indirect symbol
handling for the SystemZ backend. It adds an external name to both MC
sections and symbols and makes the relevant printers and writers use
the external name when present. Furthermore, the ALIAS HLASM instruction
is emitted after every XATTR instruction.

Depends on llvm#183441.
…4171)

When hoisting loop invariant instructions, we can preserve profile
metadata because it depends solely on the condition (which is loop
invariant) rather than where we are in the control flow graph.
…x) (llvm#183363)

Add a pre-commit test case for the inefficient asm generated for
std::bit_floor(x) on PowerPC.
)

Summary:
This enables primarily `stop.cpp` and `descriptor.cpp`. Requires a
little bit of wrangling to get it to compile. Unlike the CUDA build,
this build uses an in-tree libc++ configured for the GPU. This is
configured without thread support, environment, or filesystem, and it is
not POSIX at all. So, no mutexes, pthreads, or get/setenv.

I tested stop, but I don't know if it's actually legal to exit from
OpenMP offloading.
…m#182512)

LLVM converts a sqrt libcall to an intrinsic call if the argument is known
to be in range (greater than or equal to 0.0). In this case the compiler is
not able to deduce the non-negativity on its own. Extended ValueTracking
to understand such loops.

Created new APIs for matching intrinsics with three operands (these
previously existed only for two operands):
`matchSimpleTernaryIntrinsicRecurrence` and `matchThreeInputRecurrence`.

Fixes llvm#174813
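A hedged sketch of the loop shape in question (an assumed example, not taken from the PR's test cases): the induction variable starts at zero and only grows, so every sqrt argument is provably non-negative and the libcall can be treated as the intrinsic without a guard.

```cpp
#include <cmath>

// Assumed example loop shape, not from the PR itself: the induction
// variable starts at 0.0 and strictly increases, so std::sqrt never sees
// a negative argument.
double sumOfSquareRoots(int n) {
  double acc = 0.0;
  for (double x = 0.0; x < static_cast<double>(n); x += 1.0)
    acc += std::sqrt(x); // argument is >= 0.0 on every iteration
  return acc;
}
```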
…lvm#181030)

This implements the TOKENIZE intrinsic per the Fortran 2023 Standard.

TOKENIZE is a more complicated addition to the flang intrinsics, as it
is the first subroutine that has multiple unique footprints. Intrinsic
functions have already addressed this challenge, however subroutines and
functions are processed slightly differently and the function code was
not a good 1:1 solution for the subroutines. To solve this, the function
code was used as a model to add error buffering within the intrinsics
processing and to select the most appropriate error message for a
given subroutine footprint.

A simple FIR compile test was added to show the proper compilation of
each case. A thorough negative path test has also been added, ensuring
that all possible errors are reported as expected.

Testing prior to commit:

= check-flang ==========================================
```
Testing Time: 139.51s

Total Discovered Tests: 4153
  Unsupported      :   77 (1.85%)
  Passed           : 4065 (97.88%)
  Expectedly Failed:   11 (0.26%)


FLANG Container Test completed 2 minutes (160 s).

Total Time: 2 minutes (160 s)
Completed : Wed Feb 11 04:05:50 PM CST 2026
```

= check-flang-rt ==========================================
```
Testing Time: 1.55s

Total Discovered Tests: 258
  Passed: 258 (100.00%)


FLANG Container Test completed 0 minutes (55 s).

Total Time: 0 minutes (56 s)
Completed : Wed Feb 11 04:08:32 PM CST 2026
```

= llvm-test-suite ==========================================
```
Testing Time: 1886.64s

Total Discovered Tests: 6926
  Passed: 6926 (100.00%)


CCE SLES Container debug compile completed 31 minutes (1895 s).
CCE SLES Container debug install completed in 0 minutes (0 s).

Total Time: 31 minutes (1895 s)
Completed : Wed Feb 11 05:46:52 PM CST 2026
```

Additionally, (FYI) an executable test has been written and will be
added to the llvm-test-suite under a separate PR.

---------

Co-authored-by: Kevin Wyatt <kwyatt@hpe.com>
…#183176)

Adjusting `VariableReferenceStorage` to only need to track permanent vs
temporary storage by making `VariableStore` the common base class.

Moved the subclasses of `VariableStore` into the Variables.cpp file,
since they're no longer referenced externally.

Expanded the tests by adding an updated core dump with variables in
the argument scope that we can use to validate variable storage.
…183405)

This commit updates the LLVM::decomposeValue and LLVM::composeValue
methods to handle aggregate types - LLVM arrays and structs, and to have
different behaviors on dealing with types like pointers that can't be
bitcast to fixed-size integers. This allows the "any type" on
gpu.subgroup_broadcast to be more comprehensive - you can broadcast a
memref to a subgroup by decomposing it, for example.

(This branched off of getting an LLM to implement
ValueBoundsOpInterface on subgroup_broadcast, having it add handling
for the dimensions of shaped types, and realizing that there's no
fundamental reason you can't broadcast a memref or the like)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…low (llvm#181755)

Rather than mapping out full "reachability" between blocks in a region
to find loops and using `LoopBlocks` to find the bodies of said loops,
use SCCs (strongly-connected components) to provide this information.

This brings in LLVM's generic `SCCIterator` (which uses Tarjan's
algorithm) as the implementation for sorting the basic blocks of the CFG
into their SCCs.

This PR greatly reduces the compile-time footprint of the pass, making
memory use and time taken negligible where it might previously have
caused stalls and OOM (e.g. llvm#47793,
usagi-coffee/tree-sitter-abl#114).

------

Supersedes llvm#179722

Fixes llvm#47793
Fixes llvm#165041 (probably)

Thanks to @jkbz64 for the initial investigations (w/ AI; see llvm#179722)
into why this pass was slow and memory-hungry, and for showing that SCCs
were the key.

Also thanks to the Cheerp compiler project for bringing `SCCIterator` to
light in this context ([blog
post](https://cheerp.io/blog/control-flow#fix-the-irreducible-control-flow),
[implementation](https://github.com/leaningtech/cheerp-compiler/blob/master/llvm/lib/CheerpUtils/FixIrreducibleControlFlow.cpp)).
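The SCC approach above can be sketched as a minimal stand-alone version of Tarjan's algorithm (which is what LLVM's generic `SCCIterator` implements); the names and data structures here are illustrative, not LLVM's. Components with more than one block (or a self-loop) are exactly the loops the pass needs to find.

```cpp
#include <algorithm>
#include <vector>

// Illustrative sketch of Tarjan's SCC algorithm; not LLVM's SCCIterator.
struct TarjanSCC {
  const std::vector<std::vector<int>> &adj; // adjacency list of the CFG
  std::vector<int> index, low, stack;
  std::vector<bool> onStack;
  std::vector<std::vector<int>> sccs; // emitted in reverse topological order
  int counter = 0;

  explicit TarjanSCC(const std::vector<std::vector<int>> &g)
      : adj(g), index(g.size(), -1), low(g.size(), 0),
        onStack(g.size(), false) {
    for (int v = 0; v < static_cast<int>(g.size()); ++v)
      if (index[v] < 0)
        dfs(v);
  }

  void dfs(int v) {
    index[v] = low[v] = counter++;
    stack.push_back(v);
    onStack[v] = true;
    for (int w : adj[v]) {
      if (index[w] < 0) {
        dfs(w);
        low[v] = std::min(low[v], low[w]);
      } else if (onStack[w]) {
        low[v] = std::min(low[v], index[w]);
      }
    }
    if (low[v] == index[v]) { // v is the root of an SCC: pop it off
      std::vector<int> scc;
      int w;
      do {
        w = stack.back();
        stack.pop_back();
        onStack[w] = false;
        scc.push_back(w);
      } while (w != v);
      sccs.push_back(scc);
    }
  }
};
```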
Fix linking of 'ockl.bc' for OpenMP by switching from
`-mlink-bitcode-file` to `-mlink-builtin-bitcode`
…lvm#182640)

This patch makes it so that renumbering indices when inserting
instructions into the SlotIndexes analysis renumbers the entire list if
the list is otherwise densely packed. This fixes a case we saw on
AArch64 with a lot of spills where every single spill instruction
insertion required a renumbering of most of the instructions in a large
function, making the operation approximately quadratic.

This is not NFC as heuristics depend on the SlotIndex numbers, although
this should mostly be a wash as LRs should be extended ~equally.
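A toy model of the fix above (illustrative only; LLVM's SlotIndexes is far more involved): indices are spaced with gaps so a new instruction can take a number between its neighbors, and when the neighborhood is densely packed the whole list is renumbered in one linear pass instead of cascading local renumberings, which is what made repeated insertion roughly quadratic.

```cpp
#include <cstddef>
#include <vector>

// Toy model only; constants and names are illustrative, not LLVM's.
constexpr int kSpacing = 4; // gap left between consecutive indices

// One O(n) pass that restores full gaps everywhere.
void renumberAll(std::vector<int> &idx) {
  for (std::size_t i = 0; i < idx.size(); ++i)
    idx[i] = static_cast<int>(i) * kSpacing;
}

// Insert a new index between positions p and p+1. If no free number
// remains between the neighbors, renumber everything once instead of
// shifting neighbors on every insertion.
void insertBetween(std::vector<int> &idx, std::size_t p) {
  if (idx[p + 1] - idx[p] < 2) // densely packed here
    renumberAll(idx);
  idx.insert(idx.begin() + p + 1, (idx[p] + idx[p + 1]) / 2);
}
```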
This PR adds `JSONFormat` support for reading and writing
`TUSummaryEncoding`. The implementation exploits similarities in the
structures of `TUSummary` and `TUSummaryEncoding` by reusing existing
`JSONFormat` support for `TUSummary`. Duplication of tests has been
avoided by parameterizing the test fixture that runs all relevant
read/write tests against `TUSummary`, for `TUSummaryEncoding`. This
ensures that the two serialization paths remain in lockstep.
Bigcheese and others added 14 commits February 27, 2026 12:29
After header search has found a header, it looks for module maps that
cover that header. This patch uses the parsed representation of module
maps to do this search instead of relying on FileEntryRef lookups after
stating headers in module maps.

This behavior is currently gated behind the
`-fmodules-lazy-load-module-maps` `-cc1` flag.
…perand is a block argument of its successor (llvm#183797)

When `simplifyBrToBlockWithSinglePred` merges a block into its sole
predecessor, it calls `inlineBlockBefore` which replaces each block
argument with the corresponding value passed by the branch. If one of
those values is itself a block argument of the successor block, the call
`replaceAllUsesWith(arg, arg)` is a no-op. Any uses of that argument
outside the block (e.g. in a downstream block) are therefore not
replaced, and when the successor block is erased the argument is
destroyed while those uses are still live, triggering the assertion
`use_empty() && "Cannot destroy a value that still has uses!"` in
`IRObjectWithUseList::~IRObjectWithUseList`.

Guard against this by returning early when any branch operand is a block
argument owned by the destination block.

Fixes llvm#126213
…m#181177)

Checks that isReversibleBranch() returns false
 - when the immediate value is 63 and needs +1 adjustment
 - when the immediate value is 0 and needs -1 adjustment

Checks that reverseBranchCondition() adjusts
 - the opcode
 - the immediate operand if necessary (+/-1)
 - the register operands if necessary (swap)
This variable ends up being unused in builds without assertions. Mark it
[[maybe_unused]] per the coding standards.
…and UF. (llvm#181252)"

This reverts commit 9c53215.

Appears to cause crashes with ordered reductions, revert while I
investigate
…m#183825)

Currently, as pointed out in the reviews for llvm#183405, decomposeValues
and composeValues should be able to emit zexts and truncations for cases
like i48 and vector<3xi16> becoming i32s, but currently that's an assert.
This commit fixes that limitation.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Account for masked VPInstruction when verifying the operands in the
constructor. Fixes a crash when trying to unroll VPlans for predicated
early exits.
The `exact` flag with the following semantics

> If the `exact` attribute is present, it is assumed that the index type
> width is such that the conversion does not lose information. When this
> assumption is violated, the result is poison.
can be added to index_cast and index_castui operations. This unlocks
the following lowerings:

*   index_cast (signed) exact     -> trunc nsw
*   index_castui (unsigned) exact -> trunc nuw
*   index_castui nneg exact       -> trunc nuw nsw

Changes:

* Add ArithExactFlagInterface.
* Update Arith_IntBinaryOpWithExactFlag to use ArithExactFlagInterface.
* Update IndexCastOp and IndexCastUIOp to declare
`ArithExactFlagInterface`.
* Update canonicalization patterns.
* Update roundtrip, lowering, and canonicalization tests.
Updates formatter_bytecode.py to support compilation and disassembly for
synthetic formatters, in other words support for multiple functions
(signatures).

This includes a number of other changes:
* String parsing and encoding have bugs fixed
* CLI args are updated, primarily to support an output file
* Added uleb encoding/decoding support

This work is a prelude to ongoing work on a Python-to-formatter-bytecode
compiler. The Python compiler will emit assembly, and this module
(formatter_bytecode) will compile it into binary bytecode.
Fixing test failures on my local desktop with incremental
building.

@z1-cciauto z1-cciauto merged commit e884a8c into amd-staging Feb 28, 2026
37 checks passed
@z1-cciauto z1-cciauto deleted the amd/merge/upstream_merge_20260227164042 branch February 28, 2026 02:10