Add NonLazyBind to __ykrt_control_point declaration. #300
I ran … In … The first time this happens, this routine does quite a lot of computation (lots of looping and … Subsequent calls to the control point are cheaper: the … So, your change saves us time by: …
So, although you've touched parts of LLVM that none of us are particularly familiar with: …
This means that the impact of the change is (mostly) contained. One improvement I would suggest, though, is gating the modified behavior, perhaps behind the same flag that gates the pass that does control point injection (…
This looks good to me. The only downside I can think of is that some tools like to set breakpoints on the PLT, and so will miss cross-DSO calls that bypass it. Actually only … Where are the incoming calls to …?
In broad terms, this seems sensible enough to me, though I confess I am also not especially familiar with this slice of the LLVM codebase. I do wonder if you could reduce duplication by triggering the code paths you've borrowed parts from (instead of duplicating bits of their behaviour), but perhaps that's not especially important in an experimental LLVM fork like this.
Thanks both for the reviews! @Pavel-Durov do we think deduplication is possible / reasonable, or is it a little harder than someone like me (who doesn't know the codebase) might think?
Added 8b7954905e9c63cae18b613135d1b8dba4d7a51d, de65382e96916b2d8e680bc331a44fc280bd0005 |
Interpreter code calls …
Calls to … The architecture is: …
I think this duplication is minimal and acceptable (maybe I am missing something here): the patchpoint path is a narrow, self-contained change in …
Works for me. @vext01 OK with you? If so, we're probably ready for squashing.
Please squash.
Apply the NonLazyBind function attribute to __ykrt_control_point so that, together with the X86MCInstLower change, patchpoint calls to the control point avoid PLT trampolines.
Updated from de65382 to fb75d84.
Done 👉 fb75d84
This PR optimises calls to `__ykrt_control_point` by bypassing PLT (Procedure Linkage Table) indirection, reducing call overhead in the hot path of the interpreter loop.

Changes
1. ControlPoint.cpp: Add NonLazyBind Attribute

```cpp
NF = Function::Create(FType, GlobalVariable::ExternalLinkage,
                      YK_NEW_CONTROL_POINT, M);
// Use NonLazyBind to avoid PLT indirection, reducing call overhead.
NF->addFnAttr(Attribute::NonLazyBind);
```

Marks the `__ykrt_control_point` function declaration with `NonLazyBind`, signalling to the backend that the PLT should be avoided.

2. X86MCInstLower.cpp: Handle NonLazyBind in Patchpoint Lowering
When lowering patchpoints with a `GlobalAddress` target, check for the `NonLazyBind` attribute. If `NonLazyBind` is set, emit a GOT-relative load instead of an immediate move.

Code Generation Comparison
| Without `NonLazyBind` | With `NonLazyBind` |
|---|---|
| `mov $symbol, %reg` | `mov symbol@GOTPCREL(%rip), %reg` |
| `call *%reg` | `call *%reg` |

Rationale

…`X86Subtarget::classifyGlobalFunctionReference()` for functions with `NonLazyBind`

Byte Encoding
- `MOV64rm` (RIP-relative): 7 bytes (REX.W + opcode + ModR/M + 32-bit displacement)
- `CALL64r`: 2 bytes (legacy registers) or 3 bytes (extended registers R8–R15, which need a REX.B prefix)