8381650: Vector rotate operations on AArch64 with NEON#30574

Open
raneashay wants to merge 1 commit into openjdk:master from raneashay:JDK-8381650-neon-vector-rotate-ops

@raneashay raneashay commented Apr 4, 2026

Before this patch, on AArch64 processors with NEON, RotateLeftV nodes
were decomposed to expressions of the form (x << n) | (x >> (N-n)),
where n is the number of bits to rotate by and N is the size of the
type. RotateRightV nodes were similar, with n substituted by N-n.
This decomposition happens at the level of Ideal graph rewrites, and
these expressions translate to three instructions on NEON: SHL for
x << n, USHR for x >> (N-n), and ORR for combining the two
values.
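
As an illustration of the decomposition described above, here is a minimal scalar sketch of a single 32-bit lane. It is not HotSpot code: the function names are hypothetical, and the count `n` is assumed to lie in the range 1 to 31.

```cpp
#include <cstdint>

// Hypothetical scalar model of one 32-bit lane; not HotSpot code.
// Rotate left by a constant n, assumed to be in [1, 31].
inline uint32_t rotl32_decomposed(uint32_t x, unsigned n) {
    uint32_t hi = x << n;         // SHL: the low n bits become zero
    uint32_t lo = x >> (32 - n);  // USHR: the n bits shifted out land at the bottom
    return hi | lo;               // ORR: combine the two halves
}

// Rotate right by n is rotate left by N - n (here N = 32).
inline uint32_t rotr32_decomposed(uint32_t x, unsigned n) {
    return rotl32_decomposed(x, 32 - n);
}
```

Each of the three statements in `rotl32_decomposed` corresponds to one of the three NEON instructions named above.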

However, NEON supports the SLI instruction, which shifts left while
also preserving the destination register's low bits that a pure
left-shift operation would have overwritten with zeroes. This allows us
to lower a rotate operation into USHR + SLI instructions, thus
emitting one fewer instruction than before. Of course, this only works
when the rotate count is a known constant, so this patch does not
modify the lowering of variable-count rotates, letting them decompose
into LeftShift + RightShift + Or nodes as before.
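
To make the USHR + SLI idea concrete, the following is a rough scalar model of one 32-bit lane. The helper names (`ushr32`, `sli32`, `rotl32_ushr_sli`) are made up for this sketch and are not part of the patch; the model assumes the usual immediate ranges of 1 to 32 for USHR and 0 to 31 for SLI on 32-bit lanes.

```cpp
#include <cstdint>

// Hypothetical scalar model of a single 32-bit lane; not HotSpot code.

// USHR: unsigned shift right by an immediate, assumed to be in [1, 32].
inline uint32_t ushr32(uint32_t src, unsigned shift) {
    return shift >= 32 ? 0u : src >> shift;
}

// SLI: shift src left by an immediate (assumed to be in [0, 31]) and insert
// the result into dst, keeping the low `shift` bits of dst unchanged.
inline uint32_t sli32(uint32_t dst, uint32_t src, unsigned shift) {
    uint32_t keep = (shift == 0) ? 0u : (~0u >> (32 - shift));
    return (src << shift) | (dst & keep);
}

// Rotate left by a constant n using two steps instead of three.
inline uint32_t rotl32_ushr_sli(uint32_t x, unsigned n) {
    uint32_t dst = ushr32(x, 32 - n);  // dst = x >> (N - n)
    return sli32(dst, x, n);           // dst = (x << n) | (low n bits of dst)
}
```

Because SLI keeps the low bits already present in the destination, the separate ORR is no longer needed once USHR has placed the wrapped-around bits there.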

Perhaps of note, this patch enables the optimized lowering for not just
32- and 64-bit integers, but also for subword types (specifically,
byte and short types). I've included a good deal of tests for
coverage, but I am unsure whether there is anything in the rest of C2's
compilation that might break from allowing subword types to be lowered
the same way as int and long types.
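
For the subword case, a small per-lane model may help when reasoning about expected results. The sketch below is generic over the unsigned lane type; `rotl_lane` is a hypothetical name, and widening to 64 bits is a choice made here only to sidestep C++ integer promotion, not something the patch does.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical per-lane model, generic over the unsigned lane type; not HotSpot code.
template <typename T>
T rotl_lane(T x, int count) {
    static_assert(std::is_unsigned<T>::value, "model uses unsigned lanes");
    const int esize = static_cast<int>(sizeof(T) * 8);
    const int lshift = count & (esize - 1);   // reduce the count modulo the lane width
    if (lshift == 0) return x;                // rotating by a multiple of esize is a no-op
    const int rshift = esize - lshift;
    const uint64_t wide = x;                  // widen so shifts behave the same for every T
    const uint64_t keep = (uint64_t{1} << lshift) - 1;
    const uint64_t dst = wide >> rshift;      // USHR step
    // SLI step, truncated back to the lane width.
    return static_cast<T>((wide << lshift) | (dst & keep));
}
```

For example, `rotl_lane<uint8_t>(0x81, 1)` yields `0x03`, and a count of 11 on an 8-bit lane behaves like a count of 3.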


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8381650: Vector rotate operations on AArch64 with NEON (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30574/head:pull/30574
$ git checkout pull/30574

Update a local copy of the PR:
$ git checkout pull/30574
$ git pull https://git.openjdk.org/jdk.git pull/30574/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 30574

View PR using the GUI difftool:
$ git pr show -t 30574

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30574.diff

Using Webrev

Link to Webrev Comment

bridgekeeper bot commented Apr 4, 2026

👋 Welcome back raneashay! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Apr 4, 2026

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Apr 4, 2026
openjdk bot commented Apr 4, 2026

@raneashay The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 4, 2026
mlbridge bot commented Apr 4, 2026

Webrevs

Comment on lines +3224 to +3235
int raw_shift = (int)$shift$$constant;

// Compute left and right shift amounts.
int lshift, rshift;
if (opc == Op_RotateLeftV) {
  lshift = raw_shift & (esize - 1);
  rshift = esize - lshift;
} else {
  assert(opc == Op_RotateRightV, "unexpected opcode");
  rshift = raw_shift & (esize - 1);
  lshift = esize - rshift;
}

This looks somewhat rococo.

Suggested change
- int raw_shift = (int)$shift$$constant;
- // Compute left and right shift amounts.
- int lshift, rshift;
- if (opc == Op_RotateLeftV) {
-   lshift = raw_shift & (esize - 1);
-   rshift = esize - lshift;
- } else {
-   assert(opc == Op_RotateRightV, "unexpected opcode");
-   rshift = raw_shift & (esize - 1);
-   lshift = esize - rshift;
- }
+ int raw_shift = checked_cast<int>(opc == Op_RotateLeftV ?
+                                   $shift$$constant : -$shift$$constant);
+ int lshift = raw_shift & (esize - 1);
+ int rshift = -lshift & (esize - 1);
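
For anyone comparing the two forms, here is a standalone sketch of both shift computations outside of the ADLC context. The `Shifts` struct and the function names are hypothetical, and `is_rotate_left` stands in for the `opc == Op_RotateLeftV` test. In this model the two forms agree whenever the count is not a multiple of `esize`; for a multiple of `esize` they produce different pairs, and the thread does not say whether such a count can reach this code.

```cpp
#include <cstdint>

// Hypothetical container for the two immediates; not part of the patch.
struct Shifts {
    int lshift;
    int rshift;
};

// Branching form, modeled on the code under review.
inline Shifts shifts_branching(bool is_rotate_left, int count, int esize) {
    Shifts s;
    if (is_rotate_left) {
        s.lshift = count & (esize - 1);
        s.rshift = esize - s.lshift;
    } else {
        s.rshift = count & (esize - 1);
        s.lshift = esize - s.rshift;
    }
    return s;
}

// Branchless form, modeled on the suggested change.
inline Shifts shifts_branchless(bool is_rotate_left, int count, int esize) {
    int raw_shift = is_rotate_left ? count : -count;
    Shifts s;
    s.lshift = raw_shift & (esize - 1);
    s.rshift = -s.lshift & (esize - 1);
    return s;
}
```

With `esize = 8` and a rotate-right count of 3, both forms give `lshift = 5` and `rshift = 3`. With a count of 0, the branching form gives `(0, 8)` for rotate-left and `(8, 0)` for rotate-right, while the branchless form gives `(0, 0)` in both cases.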

           $src$$FloatRegister, rshift);
    __ sli($dst$$FloatRegister, get_arrangement(this),
           $src$$FloatRegister, lshift);
  }

Please move all of this logic to class MacroAssembler.
