perf: optimize scalar multiplications and multi-scalar multiplications circuits via lattice reductions#1697

Draft
yelhousni wants to merge 56 commits into master from perf/ec-mul

@yelhousni
Contributor

@yelhousni yelhousni commented Jan 31, 2026

Description

This PR migrates gnark's scalar decomposition hints from the eisenstein package to the new lattice package in Consensys/gnark-crypto#799, following the lattice-based rational reconstruction approach from "Fast elliptic curve scalar multiplications in SN(T)ARK circuits" by Eagen-ElHousni-Masson-Piellard (https://eprint.iacr.org/2025/933.pdf).

The new approach provides proven bounds from LLL lattice reduction theory, replacing heuristic bounds. This allows tighter bit-width bounds for the decomposed scalars, reducing circuit constraints.

The PR also revisits the complete arithmetic path to make it more constraint-optimized.

Changes

Hint Renames

  • halfGCD → rationalReconstruct (2-part decomposition using lattice.RationalReconstruct)
  • halfGCDEisenstein → rationalReconstructExt (4-part decomposition using lattice.RationalReconstructExt)

Tighter Bounds

The number of bits for decomposed scalars has been reduced:

  • Old: r.BitLen()/4 + 9 (heuristic with large safety margin)
  • New: (r.BitLen()+3)/4 + 2 (proven bound from LLL: outputs < 1.25·r^(1/4))

This saves ~7 iterations in the scalar multiplication loop.
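
As a quick sanity check of the arithmetic, the two bounds above can be evaluated for a given scalar-field bit length. This is only a sketch: the function names are made up for illustration and are not part of gnark.

```go
package main

// oldSubScalarBits evaluates the previous heuristic bound, r.BitLen()/4 + 9.
// The name is hypothetical; only the formula comes from the PR description.
func oldSubScalarBits(rBitLen int) int { return rBitLen/4 + 9 }

// newSubScalarBits evaluates the new bound, (r.BitLen()+3)/4 + 2.
func newSubScalarBits(rBitLen int) int { return (rBitLen+3)/4 + 2 }

// savedIterations is the difference between the two bounds, i.e. the number
// of loop iterations saved if the loop length equals the sub-scalar bit width.
func savedIterations(rBitLen int) int {
	return oldSubScalarBits(rBitLen) - newSubScalarBits(rBitLen)
}
```

For a 256-bit group order (as for secp256k1) this gives 73 bits before and 66 bits after, hence 7 fewer iterations. For slightly shorter orders the integer division can make the difference 6 instead.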

Affected Packages

  • std/algebra/emulated/sw_emulated - Emulated short Weierstrass curves G1
  • std/algebra/emulated/sw_bls12381, std/algebra/emulated/sw_bn254 and std/algebra/emulated/sw_bw6761 - emulated G2
  • std/algebra/native/sw_bls12377 - Native BLS12-377 G1 and G2
  • std/algebra/native/twistededwards - Native twisted Edwards curves

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization

How has this been tested?

All existing tests pass:

go test -short ./std/algebra/emulated/sw_emulated/...
go test -short ./std/algebra/native/sw_bls12377/...
go test -short ./std/algebra/native/twistededwards/...

How has this been benchmarked?

Constraint Counts (Plonk/SCS)

G1 scalar multiplication:

| Curve/G1 (emulated) | Old (eisenstein) | New (lattice) | Δ | Improvement |
|---|---|---|---|---|
| secp256k1 | 394,959 | 364,495 | -30,464 | 7.7% |
| BN254 | 390,317 | 364,205 | -26,112 | 6.7% |
| BLS12-381 | 550,299 | 512,235 | -38,064 | 6.9% |
| BW6-761 | 1,376,789 | 1,317,113 | -59,676 | 4.3% |

G2 scalar multiplication:

| Curve/G2 (emulated) | Old (2D GLV) | New (4D-lattice GLV+FakeGLV) | Δ | Improvement |
|---|---|---|---|---|
| BN254 | 599,779 | 411,854 | -187,925 | 31.3% |
| BLS12-381 | 913,513 | 584,794 | -328,719 | 36.0% |
| BW6-761 | 1,090,960 | 728,286 | -362,674 | 33.2% |

G1 MSM of size 2 :

| Setting | Curve | GLV | Method | Old | New | Δ | Improvement |
|---|---|---|---|---|---|---|---|
| emulated (short Weierstrass) | P256 | No | 2 scalar muls + add | 523,062 | 277,544 | -245,518 | 46.9% |
| native (twisted Edwards) | BabyJubjub (BN254) | No | lattice 3-MSM with LogUp | 9,956 | 6,785 | -3,171 | 32% |
| native (twisted Edwards) | Jubjub (BLS12-381) | No | lattice 3-MSM with LogUp | 9,930 | 6,793 | -3,137 | 32% |
| native (twisted Edwards) | Bandersnatch (BLS12-381) | Yes | lattice 6-MSM with LogUp | 10,185 | 6,820 | -3,365 | 33% |

Applications:

| Precompile | Old | New | Δ | Improvement |
|---|---|---|---|---|
| P256Verify | 666,146 | 533,201 | -132,945 | 20% |
| BLSG2MSM (10 pairs) | 9,304,517 | 7,135,929 | -2,168,588 | 23.3% |
| ECMul (BN254) | 210,369 | 195,663 | -14,706 | 7.0% |
| BLSG1MSM (10 pairs) | 4,397,157 | 4,243,497 | -153,660 | 3.5% |
| KZGPointEval | 2,928,188 | 2,897,456 | -30,732 | 1.0% |

| PLONK recursion | Old | New | Δ | Improvement |
|---|---|---|---|---|
| Emulated (BW6-761 in BN254) | 15,042,004 | 14,713,771 | -328,233 | 2.2% |

| EdDSA | GLV | Old | New | Δ | Improvement |
|---|---|---|---|---|---|
| Jubjub (BLS12-381) | No | 13,570 | 10,680 | -2,890 | 21% |
| Bandersnatch (BLS12-381) | Yes | 13,835 | 10,706 | -3,129 | 23% |

Discussion

1. Hint Computation Time

2-part decomposition

| Method | Time | Notes |
|---|---|---|
| Old (HalfGCD) | 9.5 μs | xgcd (PrecomputeLattice) |
| New (uncached) | 5.0 μs | xgcd |
| New (cached) | 3.8 μs | xgcd (Cached reconstructor) |
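
For readers unfamiliar with the 2-part case, the xgcd-based decomposition timed above can be sketched with a truncated extended Euclidean algorithm. This is a textbook half-GCD written against math/big for illustration. It is not the code of lattice.RationalReconstruct, whose exact signature and sign conventions are not shown in this thread, and the helper names are hypothetical.

```go
package main

import "math/big"

// halfGCD runs the extended Euclidean algorithm on (r, s) and stops at the
// first remainder that is at most floor(sqrt(r)). It returns (s1, s2) such
// that s1 ≡ s2·s (mod r), with both values bounded by floor(sqrt(r)) in
// absolute value.
func halfGCD(s, r *big.Int) (s1, s2 *big.Int) {
	bound := new(big.Int).Sqrt(r)
	r0, r1 := new(big.Int).Set(r), new(big.Int).Mod(s, r)
	t0, t1 := big.NewInt(0), big.NewInt(1)
	q, tmp := new(big.Int), new(big.Int)
	// Invariant: r0 ≡ t0·s and r1 ≡ t1·s (mod r).
	for r1.Cmp(bound) > 0 {
		q.Div(r0, r1)
		tmp.Mul(q, r1)
		r0, r1 = r1, new(big.Int).Sub(r0, tmp)
		tmp.Mul(q, t1)
		t0, t1 = t1, new(big.Int).Sub(t0, tmp)
	}
	return r1, t1
}

// decompositionHolds checks the congruence and the size bound for small inputs.
func decompositionHolds(s, r int64) bool {
	S, R := big.NewInt(s), big.NewInt(r)
	s1, s2 := halfGCD(S, R)
	bound := new(big.Int).Sqrt(R)
	diff := new(big.Int).Mul(s2, S)
	diff.Sub(s1, diff).Mod(diff, R)
	abs1, abs2 := new(big.Int).Abs(s1), new(big.Int).Abs(s2)
	return diff.Sign() == 0 && abs1.Cmp(bound) <= 0 && abs2.Cmp(bound) <= 0
}
```

With r = 101 and s = 37 the loop stops at (s1, s2) = (10, 3), and indeed 3·37 = 111 ≡ 10 (mod 101). The 4-part and multi-scalar variants need a genuine lattice reduction, which this sketch does not cover.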

4-part decomposition (RationalReconstructExt) - GLV curves

| Method | Time | Notes |
|---|---|---|
| Old (HalfGCDEisenstein) | 43 μs | Eisenstein HalfGCD |
| New (uncached) | 43 ms | LLL from scratch |
| New (cached) | 43 ms | LLL (Caching doesn't help here) |

The new approach is slower for hint computation (4D) because it runs LLL reduction from scratch rather than using 2-step Eisenstein half-GCD. However, hint computation happens outside the prover and is negligible compared to proof generation time. The constraint reduction provides a net benefit.

3-part decomposition (MultiRationalReconstruct) - 2 scalars

| Method | Time | Notes |
|---|---|---|
| New (uncached) | 1.7 ms | LLL from scratch |
| New (cached) | 0.6 μs | LLL (Huge speedup with caching) |

6-part decomposition (MultiRationalReconstructExt) - 2 scalars

| Method | Time | Notes |
|---|---|---|
| New (uncached) | 563 ms | LLL from scratch |
| New (cached) | 500 ms | Minimal improvement |

2. logup vs Mux

For G1 we can do a 4-MSM. For G2 we can leverage the Frobenius as a second endomorphism: applying it to all terms gives an 8-MSM, and applying it to half of them gives a 6-MSM. With big tables, however, Mux becomes the bottleneck, so we also tried logup.

  • G1 BLS12-381:

    | Method | Mux | Logup | Logup vs Mux |
    |---|---|---|---|
    | G1 4D GLV+FakeGLV | 272,520 | 340,273 | +24.9% (worse) |

  • G2 BLS12-381:

    | Method | Mux | Logup | Logup vs Mux |
    |---|---|---|---|
    | G2 4D GLV+FakeGLV | 584,794 | 741,975 | +26.9% (worse) |
    | G2 6D GLV+GLS+FakeGLV | 1,645,337 | 1,512,349 | -8.1% (better) |
    | G2 8D GLV+GLS+FakeGLV | 6,239,417 | 5,786,450 | -7.3% (better) |

The 4D GLV+FakeGLV method with Mux remains optimal for single scalar multiplication on both G1 and G2. Higher-dimensional methods (6D, 8D) using the ψ endomorphism don't reduce constraints because the Mux/logup overhead outweighs the benefits of fewer loop iterations, even with logup optimization.

3. MSM

According to [EEMP25], we can turn an MSM(2,n) verification (i.e. an MSM of size 2 with n-bit scalars) into an MSM(3,2n/3) or MSM(6,n/3) verification. We implemented this for the native (SW and tEd) and emulated (SW) cases with Mux and logup (for native). In all scenarios the existing algorithms were better, except for:

  • native non-GLV tEd MSM(3,2n/3) with LogUp
  • emulated non-GLV SW MSM(3,2n/3) with Mux
  • native GLV tEd MSM(6,n/3) with Mux (Bandersnatch).
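
To make the trade-off concrete, the shapes MSM(3,2n/3) and MSM(6,n/3) can be computed for a given n. The sketch below rounds the bit lengths up, which is an assumption for illustration, and ignores the small additive constants that the actual bounds carry. The type and function names are hypothetical.

```go
package main

// ceilDiv returns ceil(a/b) for positive integers.
func ceilDiv(a, b int) int { return (a + b - 1) / b }

// msmShape describes an MSM(size, bits) verification: `size` points with
// `bits`-bit scalars.
type msmShape struct{ size, bits int }

// reshapeMSM2 returns the two alternatives to MSM(2,n) mentioned above:
// MSM(3,2n/3) for the non-GLV case and MSM(6,n/3) for the GLV case.
func reshapeMSM2(n int) (nonGLV, glv msmShape) {
	return msmShape{3, ceilDiv(2*n, 3)}, msmShape{6, ceilDiv(n, 3)}
}
```

For n = 256 this gives MSM(3,171) and MSM(6,86): the loop gets shorter, but each iteration selects among combinations of more points, which is where the Mux or logup table cost comes in.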

Checklist:

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • golangci-lint does not output errors locally
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules (gnark-crypto lattice package)

Note

High Risk
High risk because it rewrites core elliptic-curve scalar multiplication verification logic and hint interfaces (including complete-arithmetic edge cases) across multiple curves; any mistake can silently break proof soundness or curve arithmetic correctness.

Overview
Optimizes elliptic-curve scalar multiplication verification in circuits by replacing Eisenstein/half-GCD based scalar decompositions with LLL-backed lattice rational reconstruction (rationalReconstruct, rationalReconstructExt) and tightening sub-scalar bit bounds.

Updates emulated SW code to use the new hints (including new G2 hints) and introduces a new GLV+fakeGLV-based G2.ScalarMul path for bls12-381, bn254, and bw6-761, with precomputed generator constants and expanded edge-case handling under algopts.WithCompleteArithmetic().

Refactors native SW bls12-377 to use lattice reconstruction for GLV+fakeGLV checks, adds a hint-backed complete G1 joint-scalar-mul verifier, and adjusts arithmetic to avoid overflow via emulated scalar-field checks.

Improves MSM/joint-scalar-mul behavior in sw_emulated (prefer ScalarMul+Add for non-GLV, bias-point strategy, stricter edge-case/soundness handling) and updates twisted Edwards DoubleBaseScalarMul to choose between GLV and non-GLV optimized paths; adds extensive new tests/benchmarks and updates internal/stats/latest_stats.csv accordingly.

Written by Cursor Bugbot for commit 986d261.

Contributor

Copilot AI left a comment

Pull request overview

This PR migrates gnark's scalar decomposition hints from the eisenstein package to the new lattice package in gnark-crypto, implementing lattice-based rational reconstruction following the approach from "Fast elliptic curve scalar multiplications in SN(T)ARK circuits" (EEMP 2025). The new approach provides proven bounds from LLL lattice reduction theory instead of heuristic bounds, enabling tighter bit-width bounds for decomposed scalars.

Changes:

  • Renamed hint functions: halfGCD → rationalReconstruct and halfGCDEisenstein → rationalReconstructExt
  • Reduced bit bounds from r.BitLen()/4 + 9 to (r.BitLen()+3)/4 + 2, saving ~7 iterations in scalar multiplication loops
  • Updated imports to use github.com/consensys/gnark-crypto/algebra/lattice instead of the eisenstein package

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| std/algebra/native/twistededwards/hints.go | Reimplemented rationalReconstruct hint using lattice.RationalReconstruct with proper sign handling and overflow computation |
| std/algebra/native/twistededwards/point.go | Updated hint call from halfGCD to rationalReconstruct |
| std/algebra/native/twistededwards/curve_test.go | Added benchmark for constraint counting |
| std/algebra/native/sw_bls12377/hints.go | Reimplemented rationalReconstructExt using lattice.RationalReconstructExt for 4-part decomposition |
| std/algebra/native/sw_bls12377/g1.go | Updated hint call, bounds calculation, and comments to reflect new LLL-proven bounds |
| std/algebra/native/sw_bls12377/g1_test.go | Added benchmark for constraint counting |
| std/algebra/emulated/sw_emulated/hints.go | Reimplemented both rationalReconstruct and rationalReconstructExt for emulated field arithmetic |
| std/algebra/emulated/sw_emulated/point.go | Updated hint calls, bounds calculation, and comments |
| std/algebra/emulated/sw_emulated/point_test.go | Added benchmarks for multiple curve configurations |

@yelhousni yelhousni self-assigned this Feb 2, 2026
@yelhousni yelhousni added the dep: linea (Issues affecting Linea downstream), type: perf, and feat: ECC labels Feb 2, 2026
@yelhousni yelhousni added this to the v0.14.N milestone Feb 2, 2026
@yelhousni yelhousni changed the title perf: use lattice reduction instead of eisenstein gcd for tighter bounds perf: optimize scalar multiplications and multi-scalar multiplications Feb 5, 2026
@yelhousni yelhousni changed the title perf: optimize scalar multiplications and multi-scalar multiplications perf: optimize scalar multiplications and multi-scalar multiplications circuits Feb 5, 2026
@yelhousni yelhousni changed the title perf: optimize scalar multiplications and multi-scalar multiplications circuits perf: optimize scalar multiplications and multi-scalar multiplications circuits via lattice reductions Feb 8, 2026
@ivokub
Collaborator

ivokub commented Mar 16, 2026

Hmm, something strange is happening right now -- testing with -short passes on twistededwards package, but not with -tags prover_checks (using actual solver).

And I think the issue is that we're using DivUnchecked(0,0) in

func (p *Point) phi(api frontend.API, p1 *Point, curve *CurveParams, endo *EndoParams) *Point {

	xy := api.Mul(p1.X, p1.Y)
	yy := api.Mul(p1.Y, p1.Y)
	f := api.Sub(1, yy)
	f = api.Mul(f, endo.Endo[1])
	g := api.Add(yy, endo.Endo[0])
	g = api.Mul(g, endo.Endo[0])
	h := api.Sub(yy, endo.Endo[0])

	p.X = api.DivUnchecked(f, xy) // <---- here
	p.Y = api.DivUnchecked(g, h)

	return p
}

which is unconstrained by the frontend.API:

	// DivUnchecked returns i1 / i2
	// If i1 == i2 == 0, the return value (0) is unconstrained.
	DivUnchecked(i1, i2 Variable) Variable

Here the test engine silently returns 0, as does the R1CS solver (though it could return anything), but the PLONK solver explicitly fails.

So I think the remaining issue is that the GLV path in twistededwards doesn't handle edge cases.

Additionally, imo in another PR we should make the test engine stricter so that it panics explicitly on DivUnchecked(0,0), to avoid ending up with unconstrained circuits during development.

@ivokub
Collaborator

ivokub commented Mar 16, 2026

Made the test engine stricter in #1734. It is merged now, so we could merge master into this branch to help with debugging.

@yelhousni
Copy link
Copy Markdown
Contributor Author

yelhousni commented Mar 16, 2026

Thanks for the stricter test engine in #1734 — merged master and it immediately surfaced the issue.

The problem was in phi: when the input is the identity (0,1), both f and xy are 0, leading to DivUnchecked(0,0). Fixed by selecting xy=1 when p1.X=0 (since f=0 too, 0/1=0 gives the correct X coordinate). Tests pass now. Mathematically speaking phi is defined over the prime subgroup, which the identity (0,1) belongs to -- but we need to explicitly handle in-circuit (0,1)-->(0,1) under phi.

Collaborator

@ivokub ivokub left a comment

I have reviewed the emulated cases and am still reviewing the 2-chains; I'm posting my comments so far. I'm not confident that the changes are correct. In particular, we seem to trust the hinted scalar mul result before we constrain it, and later we may also switch back to the hinted result without constraining it for particular edge cases (scalar=0, for example).

@ivokub
Collaborator

ivokub commented Mar 17, 2026

Also for sw_emulated I was able to create the POC hitting completeness bug:

func TestScalarMulFakeGLVUnsafeS1Fails(t *testing.T) {
	assert := test.NewAssert(t)
	p256 := elliptic.P256()

	s := big.NewInt(1)
	px, py := p256.Params().Gx, p256.Params().Gy

	unsafeCircuit := ScalarMulFakeGLVTest[emulated.P256Fp, emulated.P256Fr]{}
	completeCircuit := ScalarMulFakeGLVEdgeCasesTest[emulated.P256Fp, emulated.P256Fr]{}

	witness := ScalarMulFakeGLVTest[emulated.P256Fp, emulated.P256Fr]{
		S: emulated.ValueOf[emulated.P256Fr](s),
		Q: AffinePoint[emulated.P256Fp]{
			X: emulated.ValueOf[emulated.P256Fp](px),
			Y: emulated.ValueOf[emulated.P256Fp](py),
		},
		R: AffinePoint[emulated.P256Fp]{
			X: emulated.ValueOf[emulated.P256Fp](px),
			Y: emulated.ValueOf[emulated.P256Fp](py),
		},
	}

	err := test.IsSolved(&unsafeCircuit, &witness, testCurve.ScalarField())
	assert.Error(err)

	completeWitness := ScalarMulFakeGLVEdgeCasesTest[emulated.P256Fp, emulated.P256Fr]{
		S: witness.S,
		P: witness.Q,
		R: witness.R,
	}
	err = test.IsSolved(&completeCircuit, &completeWitness, testCurve.ScalarField())
	assert.NoError(err)
}

So this fails to solve for s=1 and Q=G without WithCompleteArithmetic, but the computation should be complete unless s=0 or Q=(0,0). I think we cannot in general use the dummy-point approach to avoid hitting incomplete cases: one could always find a counterexample.

@yelhousni yelhousni requested a review from ivokub March 18, 2026 17:45
Collaborator

@ivokub ivokub left a comment

There is still the completeness issue for sw_emulated.scalarMulFakeGLV

@@ -1229,18 +1247,20 @@ func (c *Curve[B, S]) scalarMulFakeGLV(Q *AffinePoint[B], s *emulated.Element[S]
panic(err)
Collaborator

This still doesn't handle all edge cases when WithCompleteArithmetic is set. See the test:

func TestScalarMulFakeGLVUnsafeS1Fails(t *testing.T) {
	assert := test.NewAssert(t)
	p256 := elliptic.P256()

	s := big.NewInt(1)
	px, py := p256.Params().Gx, p256.Params().Gy

	unsafeCircuit := ScalarMulFakeGLVTest[emulated.P256Fp, emulated.P256Fr]{}
	completeCircuit := ScalarMulFakeGLVEdgeCasesTest[emulated.P256Fp, emulated.P256Fr]{}

	witness := ScalarMulFakeGLVTest[emulated.P256Fp, emulated.P256Fr]{
		S: emulated.ValueOf[emulated.P256Fr](s),
		Q: AffinePoint[emulated.P256Fp]{
			X: emulated.ValueOf[emulated.P256Fp](px),
			Y: emulated.ValueOf[emulated.P256Fp](py),
		},
		R: AffinePoint[emulated.P256Fp]{
			X: emulated.ValueOf[emulated.P256Fp](px),
			Y: emulated.ValueOf[emulated.P256Fp](py),
		},
	}

	err := test.IsSolved(&unsafeCircuit, &witness, testCurve.ScalarField())
	assert.Error(err)

	completeWitness := ScalarMulFakeGLVEdgeCasesTest[emulated.P256Fp, emulated.P256Fr]{
		S: witness.S,
		P: witness.Q,
		R: witness.R,
	}
	err = test.IsSolved(&completeCircuit, &completeWitness, testCurve.ScalarField())
	assert.NoError(err)
}

I think we should revert to using complete arithmetic when the option is set instead of using a dummy point to shift. Since the dummy point is known beforehand, it would otherwise always be possible, imo, to construct an edge case that leads to an incomplete circuit.


@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@yelhousni yelhousni requested a review from ivokub March 25, 2026 17:58
@ivokub ivokub marked this pull request as draft March 26, 2026 08:35