Add NonLazyBind to __ykrt_control_point declaration. by Pavel-Durov · Pull Request #300 · ykjit/ykllvm

Pavel-Durov · 2026-01-23T16:30:14Z

This PR optimises calls to __ykrt_control_point by bypassing PLT (Procedure Linkage Table) indirection, reducing call overhead in the hot path of the interpreter loop.

Changes

1. ControlPoint.cpp — Add NonLazyBind Attribute

```cpp
NF = Function::Create(FType, GlobalVariable::ExternalLinkage,
                      YK_NEW_CONTROL_POINT, M);
// Use NonLazyBind to avoid PLT indirection, reducing call overhead.
NF->addFnAttr(Attribute::NonLazyBind);
```

Marks the __ykrt_control_point function declaration with NonLazyBind, signalling to the backend that calls to it should bypass the PLT.

2. X86MCInstLower.cpp — Handle NonLazyBind in Patchpoint Lowering

When lowering patchpoints with a GlobalAddress target, check for the NonLazyBind attribute:

```cpp
case MachineOperand::MO_GlobalAddress: {
  const GlobalValue *GV = CalleeMO.getGlobal();
  if (const Function *F = dyn_cast<Function>(GV)) {
    UseGOTPCREL = F->hasFnAttribute(Attribute::NonLazyBind);
  }
  // ...
}
```

If NonLazyBind is set, emit a RIP-relative load from the GOT instead of a 64-bit immediate move:

```cpp
if (UseGOTPCREL) {
  // Emit: mov symbol@GOTPCREL(%rip), %ScratchReg
  MCSymbol *Sym = MCIL.GetSymbolFromOperand(CalleeMO);
  const MCExpr *Expr = MCSymbolRefExpr::create(
      Sym, MCSymbolRefExpr::VK_GOTPCREL, Ctx);

  // MOV64rm operands: dest, base, scale, index, displacement, segment.
  EmitAndCountInstruction(MCInstBuilder(X86::MOV64rm)
                              .addReg(ScratchReg)
                              .addReg(X86::RIP)
                              .addImm(1)
                              .addReg(0)
                              .addExpr(Expr)
                              .addReg(0));
  // 7-byte RIP-relative load, plus a 2-byte CALL64r (3 bytes for R8-R15).
  EncodedBytes = 7 + (X86II::isX86_64ExtendedReg(ScratchReg) ? 3 : 2);
}
```

Code Generation Comparison

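| | Without NonLazyBind | With NonLazyBind |
|---|---|---|
| Load callee address | `movabs $symbol, %reg` | `mov symbol@GOTPCREL(%rip), %reg` |
| Call | `call *%reg` | `call *%reg` |
| Resolution | PLT stub → lazy resolution | Direct GOT load → eager binding |
| Encoded size | 12–13 bytes | 9–10 bytes |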

Rationale

* PLT calls go through a stub; with lazy binding, the symbol is resolved on first use.
* GOTPCREL calls load the address directly from the Global Offset Table, which the dynamic linker fills in at load time.
* The control point is called on every iteration of the interpreter loop, so even a small per-call PLT overhead adds up.
* This follows the same pattern LLVM uses in X86Subtarget::classifyGlobalFunctionReference() for functions with NonLazyBind.
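The difference between the two binding strategies can be sketched as a small C++ model in which a "GOT slot" is just a function pointer and the lazy resolver patches it on first use. This is a deliberate simplification (the real dynamic linker's resolver, relocation types and stub layout are not modelled), and every name in it is hypothetical.

```cpp
#include <cassert>

// Simplified model of lazy (PLT) versus eager (GOT) binding.
// A "GOT slot" is modelled as a plain function pointer.
using Target = int (*)(int);

// Hypothetical stand-in for __ykrt_control_point living in another DSO.
int controlPointImpl(int iter) { return iter + 1; }

int resolverCalls = 0;    // how often the (expensive) resolver has run
Target gotSlot = nullptr; // the address the call ultimately goes through

// Lazy resolver: looks the symbol up once, then patches the slot so that
// later calls skip it.
int lazyResolve(int iter) {
  ++resolverCalls;
  gotSlot = controlPointImpl;
  return gotSlot(iter);
}

// Program start-up under each binding strategy.
void loadLazily() { gotSlot = lazyResolve; }
void loadEagerly() { gotSlot = controlPointImpl; }

// PLT-style call: call site -> stub -> indirect jump through the slot.
int pltStub(int iter) { return gotSlot(iter); }
int callViaPlt(int iter) { return pltStub(iter); }

// GOTPCREL-style call: load the slot and call through it, no stub.
int callViaGot(int iter) {
  assert(gotSlot != nullptr && "slot must be filled at load time");
  return gotSlot(iter);
}
```

In this model the PLT path runs the resolver exactly once and still pays for the extra stub hop on every call, while the GOT path never runs the resolver and calls through the slot directly.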

Byte Encoding

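* MOV64rm (RIP-relative): 7 bytes (REX.W + opcode + ModR/M + 32-bit displacement); R8–R15 only set REX.R in the same prefix byte, so the size does not change.
* CALL64r: 2 bytes (opcode + ModR/M) for RAX–RDI, or 3 bytes for R8–R15 (extra REX.B prefix).
* Total: 9–10 bytes, versus 12–13 bytes for the original sequence (a 10-byte MOV64ri/movabs with a 64-bit immediate, plus CALL64r).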
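The byte counts above can be checked by hand-encoding the instructions. The sketch below is not LLVM's encoder; the helper names (movabs, movRipRel, callReg, sequenceSize) are hypothetical, and registers use hardware numbering (RAX=0 through R15=15).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Append the low `n` bytes of `v` in little-endian order.
void appendLE(Bytes &out, uint64_t v, int n) {
  for (int i = 0; i < n; ++i)
    out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// movabs $imm64, %reg (MOV64ri): REX.W[+B], B8+r, imm64 -> always 10 bytes.
Bytes movabs(unsigned reg, uint64_t imm) {
  assert(reg < 16 && "x86-64 has 16 general-purpose registers");
  Bytes out{static_cast<uint8_t>(0x48 | (reg >= 8 ? 0x01 : 0x00)),
            static_cast<uint8_t>(0xB8 + (reg & 7))};
  appendLE(out, imm, 8);
  return out;
}

// mov disp32(%rip), %reg (MOV64rm): REX.W[+R], 8B, ModR/M, disp32 -> 7 bytes.
Bytes movRipRel(unsigned reg, int32_t disp) {
  assert(reg < 16 && "x86-64 has 16 general-purpose registers");
  Bytes out{static_cast<uint8_t>(0x48 | (reg >= 8 ? 0x04 : 0x00)), 0x8B,
            static_cast<uint8_t>(((reg & 7) << 3) | 0x05)}; // mod=00, rm=101
  appendLE(out, static_cast<uint32_t>(disp), 4);
  return out;
}

// call *%reg (CALL64r): [REX.B], FF, ModR/M with /2 -> 2 or 3 bytes.
Bytes callReg(unsigned reg) {
  assert(reg < 16 && "x86-64 has 16 general-purpose registers");
  Bytes out;
  if (reg >= 8)
    out.push_back(0x41); // REX.B selects R8-R15
  out.push_back(0xFF);
  out.push_back(static_cast<uint8_t>(0xD0 | (reg & 7))); // mod=11, reg=/2
  return out;
}

// Size of "materialise callee address; call *%reg".
std::size_t sequenceSize(const Bytes &load, const Bytes &call) {
  return load.size() + call.size();
}
```

For example, with %r11 as the scratch register, movRipRel(11, 0) encodes as 4C 8B 1D 00 00 00 00 and callReg(11) as 41 FF D3, giving 10 bytes, against 13 bytes for movabs(11, imm) plus the same call.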

vext01 · 2026-01-28T10:10:38Z

I ran simple.c under rr using ykllvm main and your branch and can confirm that this change does what it says on the tin.

In main, when we call the control point, we have a call [r11] which jumps to PLT resolution routines.

The first time this happens, this routine does quite a lot of computation (lots of looping and strcmp()), before we eventually land at __ykrt_control_point.

Subsequent calls to the control point are cheaper: the call [r11] still jumps to the PLT resolution routine, but the previous resolution is cached and we quickly jump to __ykrt_control_point. There is still a cmp presumably to check if the target has been cached (and presumably you can invalidate the cache by (e.g.) dlopen()ing a library that contains a symbol with the same name?).

So, your change saves us time by:

* avoiding symbol resolution on first call to the control point (quite expensive, but only done once).
* avoiding `mov, test, jne` for subsequent calls to the control point (less expensive than the initial symbol resolution, but probably worthwhile given the frequency we call the control point).

So, although you've touched parts of LLVM none of us are particularly familiar with:

a) your explanation makes sense to me.
b) it only affects patchpoints.

This means that the impact of the change is (mostly) contained.

One improvement I would suggest is gating the modified behavior though. Perhaps behind the same flag that gates the pass that does control point injection (-yk-patch-control-point)? This way, no llvm tests that use patchpoints are touched. When the flag is not used, the original llvm behavior should be used.
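The suggested gating can be pictured as one extra condition in the lowering decision. The model below is hypothetical: the enum and struct names, and the ykFlagEnabled parameter standing in for the -yk-patch-control-point option, are illustrative and do not reflect how the follow-up commits actually implement it.

```cpp
#include <cassert>

// Hypothetical model of the patchpoint callee-lowering decision.
// The kinds mirror the MachineOperand cases a patchpoint callee can take.
enum class CalleeKind { Immediate, GlobalAddress, ExternalSymbol };
enum class Lowering { MovAbsImm, GotPcRelLoad };

struct Callee {
  CalleeKind kind;
  bool isFunction;     // the GlobalAddress refers to a Function
  bool hasNonLazyBind; // ...that carries the NonLazyBind attribute
};

// `ykFlagEnabled` stands in for whatever option gates the yk behaviour.
Lowering chooseLowering(const Callee &c, bool ykFlagEnabled) {
  if (ykFlagEnabled && c.kind == CalleeKind::GlobalAddress && c.isFunction &&
      c.hasNonLazyBind)
    return Lowering::GotPcRelLoad; // mov sym@GOTPCREL(%rip), %reg
  return Lowering::MovAbsImm;      // original LLVM behaviour
}
```

With the flag off, every callee takes the original path, so existing patchpoint tests would see unchanged output.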

stephenrkell · 2026-01-28T16:29:34Z

This looks good to me.

The only downside I can think of is that some tools like to breakpoint the PLT, and so miss cross-DSO calls that bypass it. ltrace is the only such tool I'm aware of, but there may be others.

Where are the incoming calls to __ykrt_control_point coming from? Are they always cross-DSO or otherwise possibly-spanning-over-2GB? If not, then you could maybe do something even more direct if you don't care about keeping __ykrt_control_point preemptible. But I guess you've thought of that.
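The "2GB" concern refers to the reach of a direct call: `call rel32` encodes a signed 32-bit displacement from the address of the next instruction, so it can only reach targets within roughly ±2 GiB. A minimal sketch of that check, assuming canonical user-space addresses below 2^63 (the function name is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// True if a direct `call rel32` whose following instruction starts at
// `nextInsnAddr` can reach `target`.
bool fitsInRel32(uint64_t nextInsnAddr, uint64_t target) {
  assert(nextInsnAddr < (1ULL << 63) && target < (1ULL << 63) &&
         "expects canonical user-space addresses");
  int64_t disp =
      static_cast<int64_t>(target) - static_cast<int64_t>(nextInsnAddr);
  return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

A callee in a separately loaded shared library only has its address fixed at load time, so the compiler cannot count on this check succeeding; that is one reason to call through a GOT slot instead.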

jryans · 2026-01-28T17:10:22Z

In broad terms, this seems sensible enough to me, though I confess I am also not especially familiar with this slice of the LLVM codebase.

I do wonder if you could potentially reduce duplication by triggering the code paths you've borrowed parts from (instead of duplicating bits of their behaviour), but perhaps that's not especially important in an experimental LLVM fork like this.

ltratt · 2026-01-28T21:39:43Z

Thanks both for the reviews!

@Pavel-Durov Do we think deduplication is possible / reasonable, or is it a little harder than someone like me (who doesn't know the codebase) might think?

Pavel-Durov · 2026-01-29T15:40:58Z

> One improvement I would suggest is gating the modified behavior though. Perhaps behind the same flag that gates the pass that does control point injection (-yk-patch-control-point)? This way, no llvm tests that use patchpoints are touched. When the flag is not used, the original llvm behavior should be used.

Added 8b7954905e9c63cae18b613135d1b8dba4d7a51d, de65382e96916b2d8e680bc331a44fc280bd0005

Pavel-Durov · 2026-01-30T14:17:28Z

@stephenrkell

Where Do Calls to __ykrt_control_point Come From?

Interpreter code calls __ykrt_control_point(mt, &loc) inside the main loop.
Example in yklua
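To illustrate why the control point sits on the hot path, here is a toy bytecode loop that calls a control-point stub once per dispatched instruction. The MT and Location types and control_point_stub are hypothetical placeholders, not yk's real API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical placeholders for the meta-tracer and per-instruction location.
struct MT {};
struct Location {};

std::size_t controlPointCalls = 0;

// Hypothetical stub shaped like __ykrt_control_point(mt, &loc); it only
// counts how often it is called.
void control_point_stub(MT *, Location *) { ++controlPointCalls; }

enum class Op { Inc, JumpIfBelow, Halt };

// Toy interpreter with an accumulator; JumpIfBelow jumps back to
// instruction 0 while acc < limit.
long run(const std::vector<Op> &prog, MT *mt, std::vector<Location> &locs,
         long limit) {
  assert(locs.size() == prog.size() && "one location per instruction");
  long acc = 0;
  std::size_t pc = 0;
  while (true) {
    assert(pc < prog.size() && "program must end in Op::Halt");
    control_point_stub(mt, &locs[pc]); // one call per dispatched instruction
    switch (prog[pc]) {
    case Op::Inc:
      ++acc;
      ++pc;
      break;
    case Op::JumpIfBelow:
      pc = (acc < limit) ? 0 : pc + 1; // backward jump to the loop head
      break;
    case Op::Halt:
      return acc;
    }
  }
}
```

With limit 5, the three-instruction program {Inc, JumpIfBelow, Halt} dispatches 11 instructions, so the stub is called 11 times; in a real interpreter that count grows with every executed bytecode.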

Are They Always Cross-DSO?

Calls to __ykrt_control_point are cross-DSO.

The architecture is:

┌─────────────────────────────────┐     ┌──────────────────────────┐
│  Interpreter Binary             │────►│  libykcapi.so (cdylib)   │
│                                 │     │                          │
│  - Compiled with ykllvm         │     │  - Contains ykrt         │
│  - Links: -lykcapi              │     │  - Exports:              │
│  - Calls __ykrt_control_point   │     │    __ykrt_control_point  │
│    via patchpoint               │     │                          │
└─────────────────────────────────┘     └──────────────────────────┘

Pavel-Durov · 2026-01-30T16:55:16Z

> @Pavel-Durov Do we think deduplication is possible / reasonable, or is it a little harder than someone like me (who doesn't know the codebase) might think?

I think this duplication is minimal and acceptable (though maybe I am missing something here): the patchpoint path is a narrow, self-contained change in X86MCInstLower.cpp.
The main benefit is avoiding changes to shared, target-agnostic code (like SelectionDAGBuilder). This also makes the patch smaller and, I think, easier to maintain when merging in upstream LLVM.

ltratt · 2026-01-30T16:56:00Z

Works for me. @vext01 OK with you? If so, we're probably ready for squashing.

vext01 · 2026-01-30T21:40:37Z

Please squash

Apply the NonLazyBind function attribute to __ykrt_control_point so that, together with the X86MCInstLower change, patchpoint calls to the control point avoid PLT trampolines.

Pavel-Durov · 2026-01-31T10:58:30Z

Done 👉 fb75d84

Pavel-Durov assigned vext01 Jan 23, 2026

Pavel-Durov commented Jan 23, 2026

Comment thread llvm/lib/Target/X86/X86MCInstLower.cpp

Pavel-Durov commented Jan 27, 2026

Comment thread llvm/lib/Target/X86/X86MCInstLower.cpp

Add NonLazyBind to __ykrt_control_point declaration

fb75d84

Pavel-Durov force-pushed the ykllvm-nonlazybind-control-point branch from de65382 to fb75d84 January 31, 2026 10:58

Pavel-Durov mentioned this pull request Jan 31, 2026

Non-PLT control-point calls ykjit/yk#2060

Merged

ltratt added this pull request to the merge queue Feb 1, 2026

Merged via the queue into main with commit 0d65c10 Feb 1, 2026
2 checks passed


ltratt merged 1 commit into main from ykllvm-nonlazybind-control-point

5 participants