Skip to content

8386687: Regression >6% on Crypto-MLDSA.sign on x64#31686

Draft
dwhite-intel wants to merge 1 commit into
openjdk:masterfrom
dwhite-intel:dwhite-simpler-apx-regression
Draft

8386687: Regression >6% on Crypto-MLDSA.sign on x64#31686
dwhite-intel wants to merge 1 commit into
openjdk:masterfrom
dwhite-intel:dwhite-simpler-apx-regression

Conversation

@dwhite-intel

@dwhite-intel dwhite-intel commented Jun 26, 2026

Copy link
Copy Markdown

The simpler APX code in PR Tune APX support in C2 backend disabled the CISC_Spilling optimization that is buried in ALDC. It replaces instructions that match on registers with instructions that operate on the stack (I think) if it knows that the register has been spilled to the stack.

The matching code in formsself.cpp insisted that both the original and replacement "instructs" have identical predicates.

This PR loosens this requirement to also allow replacing an instruct with a predicate by an instruct without a predicate - the replacement instruct is logically more general than the original instruct that only applies if the predicate is true.



Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8386687: Regression >6% on Crypto-MLDSA.sign on x64 (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/31686/head:pull/31686
$ git checkout pull/31686

Update a local copy of the PR:
$ git checkout pull/31686
$ git pull https://git.openjdk.org/jdk.git pull/31686/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 31686

View PR using the GUI difftool:
$ git pr show -t 31686

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/31686.diff

@bridgekeeper

bridgekeeper Bot commented Jun 26, 2026

Copy link
Copy Markdown

👋 Welcome back drwhite! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@dwhite-intel

Copy link
Copy Markdown
Author

/label hotspot-compiler

@openjdk

openjdk Bot commented Jun 26, 2026

Copy link
Copy Markdown

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk Bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Jun 26, 2026
@openjdk

openjdk Bot commented Jun 26, 2026

Copy link
Copy Markdown

@dwhite-intel
The hotspot-compiler label was successfully added.

@openjdk

openjdk Bot commented Jun 26, 2026

Copy link
Copy Markdown

The total number of required reviews for this PR has been set to 2 based on the presence of this label: hotspot-compiler. This can be overridden with the /reviewers command.

@openjdk

openjdk Bot commented Jun 26, 2026

Copy link
Copy Markdown

@dwhite-intel To determine the appropriate audience for reviewing this pull request, one or more labels corresponding to different subsystems will normally be applied automatically. However, no automatic labelling rule matches the changes in this pull request. In order to have an "RFR" email sent to the correct mailing list, you will need to add one or more applicable labels manually using the /label pull request command.

Applicable Labels
  • build
  • client
  • compiler
  • core-libs
  • hotspot
  • hotspot-compiler
  • hotspot-gc
  • hotspot-jfr
  • hotspot-runtime
  • i18n
  • ide-support
  • javadoc
  • jdk
  • net
  • nio
  • security
  • serviceability
  • shenandoah

@dwhite-intel

dwhite-intel commented Jun 26, 2026

Copy link
Copy Markdown
Author

Here's more detail on the regression

PICKY PROBLEMS WITH PREDICATES

I eventually narrowed down the regression to the removal of a predicate(!UseAPX) when running on x86 without APX support (which is approximately 100% of x86 CPUs at the moment :-).

On x86 and S390, there is a "CISC SPILLING" optimization, controlled by UseCISCSpill.

The optimization is that if a register has been spilled to the stack, but you need to do an operation on it, instead of restoring it back to a register, the back-end will replace the canonical op_rReg_rReg instruct to a "matching" op_rReg_mem instruct.

   mov %r9,OFFSET(%rsp)
   add %r8,%r9            ===> add %r8,OFFSET(%rsp)

In this case the instruct addL_rReg(rRegL dst, rRegL src, rFlagsReg cr) was changed to instruct addL_rReg_mem(rRegL dst, memory src, rFlagsReg cr)

This all happens "magically" in generated code like build/.../hotspot/variant-server/gensrc/adfiles/ad_x86.cpp (look for "The following instructions can cisc-spill"). The .hpp file shows Node classes with extra cisc_spill methods, if the Node is eligible for the optimization.

Great, so far so good.

The regression from my simpler AXP PR occurred because of how the matching for the replacement instruction works.

I think it first looks for instruction pairs like these:
(match(Set dst (<OPERATION> dst srcReg))) => (match(Set dst (<OPERATION> dst (LoadL srcMem)))

But ALSO, each instruction must have equal predicates. It actually requires textually equal predicates, but that's not the problem here. Keep in mind that all of this matching happens at build time as it parses the x86.ad file and generates code that implements the C2 back end.

REGRESSION - What did I do?

My simpler-apx PR removed instructions for NDD with memory operands.

-------------- BEFORE PR ----------------

instruct addL_rReg(rRegL dst, rRegL src, rFlagsReg cr)
%{
  predicate(!UseAPX);
  ...
%}

instruct addL_rReg_ndd(rRegL dst, rRegL src1, rRegL src2, rFlagsReg cr)
%{
  predicate(UseAPX);
  ...
%}

instruct addL_rReg_mem(rRegL dst, memory src, rFlagsReg cr)
%{
  predicate(!UseAPX);
  ...
%}

instruct addL_rReg_rReg_mem_ndd(rRegL dst, rRegL src1, memory src2, rFlagsReg cr)
%{
  predicate(UseAPX);
  ...
%}

-------------- AFTER PR ----------------

instruct addL_rReg(rRegL dst, rRegL src, rFlagsReg cr)
%{
  predicate(!UseAPX);
  ...
%}

instruct addL_rReg_ndd(rRegL dst, rRegL src1, rRegL src2, rFlagsReg cr)
%{
  predicate(UseAPX);
  ...
%}

instruct addL_rReg_mem(rRegL dst, memory src, rFlagsReg cr)
%{
  **// predicate(!UseAPX);**
  ...
%}

// Removed instruct addL_rReg_rReg_mem_ndd and the predicate in addL_rReg_mem , so the legacy addL_rReg_mem instruct will get used with or without UseAPX.


So that's fine as far as normal JIT code gen is concerned, but the CISC_SPILL optimization will fail because addL_rReg and addL_rReg_mem no longer have identical predicates. This generates code that uses XMM registers for spilling, and contains a lot of XMM<->gp-register moves which ends up being slower than operating on the stack memory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org

Development

Successfully merging this pull request may close these issues.

1 participant