8381650: Vector rotate operations on AArch64 with NEON by raneashay · Pull Request #30574 · openjdk/jdk

raneashay · 2026-04-04T00:34:45Z

Before this patch, on AArch64 processors with NEON, RotateLeftV nodes
were decomposed to expressions of the form (x << n) | (x >> (N-n)),
where n is the number of bits to rotate by and N is the bit width of
the element type. RotateRightV nodes were similar, with n substituted
by N-n.
This decomposition happens at the level of Ideal graph rewrites, and
these expressions translate to three instructions on NEON: SHL for
x << n, USHR for x >> (N-n), and ORR for combining the two
values.

However, NEON supports the SLI instruction, which shifts left while
also preserving the destination register's low bits that a pure
left-shift operation would have overwritten with zeroes. This allows us
to lower a rotate operation into USHR + SLI instructions, thus
emitting one fewer instruction than before. Of course, this only works
when the rotate count is a known constant, so this patch does not
modify the lowering of variable-count rotates, letting them decompose
into LeftShift + RightShift + Or nodes as before.
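
To make the instruction counts above concrete, here is a minimal scalar sketch of a single 32-bit lane. It is not HotSpot code: the helper names (shl, ushr, orr, sli, rotl_three, rotl_two) are hypothetical, and sli only models the "shift left and insert" behaviour described above, where the low n bits of the destination are kept.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one 32-bit NEON lane. All names are illustrative only.
constexpr unsigned N = 32;

constexpr uint32_t shl(uint32_t x, unsigned n)  { return x << n; }
constexpr uint32_t ushr(uint32_t x, unsigned n) { return x >> n; }
constexpr uint32_t orr(uint32_t a, uint32_t b)  { return a | b; }

// Model of SLI: shift src left by n and insert it into dst,
// leaving the low n bits of dst unchanged.
constexpr uint32_t sli(uint32_t dst, uint32_t src, unsigned n) {
  return (src << n) | (dst & ~(~uint32_t{0} << n));
}

// Old lowering: SHL + USHR + ORR (three operations).
constexpr uint32_t rotl_three(uint32_t x, unsigned n) {
  return orr(shl(x, n), ushr(x, N - n));
}

// New lowering: USHR into the destination, then SLI (two operations).
constexpr uint32_t rotl_two(uint32_t x, unsigned n) {
  return sli(ushr(x, N - n), x, n);
}

static_assert(rotl_three(0x80000001u, 4) == 0x00000018u, "three-op rotate");
static_assert(rotl_two(0x80000001u, 4) == 0x00000018u, "two-op rotate");

// Runtime check that both forms agree for every count in 1..N-1.
inline void check_all_counts(uint32_t x) {
  for (unsigned n = 1; n < N; ++n) {
    assert(rotl_three(x, n) == rotl_two(x, n));
  }
}
```

The count is restricted to 1..N-1 here so that both shift amounts stay in range for a plain C++ shift; how the patch treats a count of zero is not modelled.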

Perhaps of note, this patch enables the optimized lowering for not just
32- and 64-bit integers, but also for subword types (specifically,
byte and short types). I've included a good deal of tests for
coverage, but I am unsure whether there is anything in the rest of C2's
compilation that might break from allowing subword types to be lowered
the same way as int and long types.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8381650: Vector rotate operations on AArch64 with NEON (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30574/head:pull/30574
$ git checkout pull/30574

Update a local copy of the PR:
$ git checkout pull/30574
$ git pull https://git.openjdk.org/jdk.git pull/30574/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 30574

View PR using the GUI difftool:
$ git pr show -t 30574

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30574.diff

Using Webrev

Link to Webrev Comment


bridgekeeper · 2026-04-04T00:36:32Z

👋 Welcome back raneashay! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2026-04-04T00:37:11Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2026-04-04T00:37:44Z

@raneashay The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2026-04-04T00:41:26Z

Webrevs

00: Full (dec0354b)

theRealAph · 2026-04-05T08:54:53Z

src/hotspot/cpu/aarch64/aarch64_vector.ad

+    int raw_shift = (int)$shift$$constant;
+
+    // Compute left and right shift amounts.
+    int lshift, rshift;
+    if (opc == Op_RotateLeftV) {
+      lshift = raw_shift & (esize - 1);
+      rshift = esize - lshift;
+    } else {
+      assert(opc == Op_RotateRightV, "unexpected opcode");
+      rshift = raw_shift & (esize - 1);
+      lshift = esize - rshift;
+    }


This looks somewhat rococo.

Suggested change

-    int raw_shift = (int)$shift$$constant;
-
-    // Compute left and right shift amounts.
-    int lshift, rshift;
-    if (opc == Op_RotateLeftV) {
-      lshift = raw_shift & (esize - 1);
-      rshift = esize - lshift;
-    } else {
-      assert(opc == Op_RotateRightV, "unexpected opcode");
-      rshift = raw_shift & (esize - 1);
-      lshift = esize - rshift;
-    }
+    int raw_shift = checked_cast<int>(opc == Op_RotateLeftV ?
+                                      $shift$$constant : -$shift$$constant);
+    int lshift = raw_shift & (esize - 1);
+    int rshift = -lshift & (esize - 1);
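
As a rough way to compare the two forms in this suggestion, the sketch below restates both computations as plain C++ functions and checks them against each other. It is not HotSpot code: Shifts, shifts_branching, shifts_negated and same_mod are hypothetical names, the opcode test is replaced by a bool, and esize is assumed to be a power of two. Under those assumptions the two forms produce identical values whenever the count is not a multiple of esize; when it is, the branching form yields esize for one of the amounts where the negated form yields 0. Whether such counts can reach this code path is not stated in the thread.

```cpp
#include <cassert>

// Hypothetical holder for the two shift amounts.
struct Shifts { int lshift; int rshift; };

// Branching form, as in the patch under review.
inline Shifts shifts_branching(bool rotate_left, int raw_shift, int esize) {
  Shifts s{};
  if (rotate_left) {
    s.lshift = raw_shift & (esize - 1);
    s.rshift = esize - s.lshift;
  } else {
    s.rshift = raw_shift & (esize - 1);
    s.lshift = esize - s.rshift;
  }
  return s;
}

// Branch-free form, as in the suggested change: a right rotate by n
// is treated as a left rotate by -n.
inline Shifts shifts_negated(bool rotate_left, int shift, int esize) {
  int raw_shift = rotate_left ? shift : -shift;
  int lshift = raw_shift & (esize - 1);
  int rshift = -lshift & (esize - 1);
  return {lshift, rshift};
}

// True when a and b are congruent modulo esize (esize a power of two).
inline bool same_mod(int a, int b, int esize) {
  return ((a - b) & (esize - 1)) == 0;
}

// Compare both forms over all four element sizes and a range of counts.
inline void check_shift_forms() {
  const int sizes[] = {8, 16, 32, 64};
  for (int esize : sizes) {
    for (int shift = 0; shift < 128; ++shift) {
      for (int dir = 0; dir < 2; ++dir) {
        const bool left = (dir == 0);
        const Shifts a = shifts_branching(left, shift, esize);
        const Shifts b = shifts_negated(left, shift, esize);
        assert(same_mod(a.lshift, b.lshift, esize));
        assert(same_mod(a.rshift, b.rshift, esize));
        if ((shift & (esize - 1)) != 0) {
          // Counts that are not a multiple of esize give identical values.
          assert(a.lshift == b.lshift && a.rshift == b.rshift);
        }
      }
    }
  }
}
```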

theRealAph · 2026-04-05T08:55:40Z

src/hotspot/cpu/aarch64/aarch64_vector.ad

+              $src$$FloatRegister, rshift);
+      __ sli($dst$$FloatRegister, get_arrangement(this),
+             $src$$FloatRegister, lshift);
+    }


Please move all of this logic to class MacroAssembler.

openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Apr 4, 2026

openjdk bot added the rfr Pull request is ready for review label Apr 4, 2026

theRealAph reviewed Apr 5, 2026

raneashay wants to merge 1 commit into openjdk:master from raneashay:JDK-8381650-neon-vector-rotate-ops
