-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Add AggressiveInlining to EnsureInitializedCore method to allow delegates stack alloc #122983
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ates stack alloc
Currently stack allocation (and overall de-abstraction) works only when JIT can has a full understanding of the code. And due to lack of cross-prodcedural analysis it means that for the optimization to work the operations should be inlined.
The same is true for `LazyInitializer.EnsureInitialized` that takes a delegate. Currently, because `EnsureInitializedCore` quite big and can't be inlined by default, JIT can't stack allocate the delegate passed to the method.
The following benchmark shows the impact of this change:
```csharp
[MemoryDiagnoser]
[ShortRunJob(RuntimeMoniker.Net90)]
[ShortRunJob(RuntimeMoniker.Net10_0)]
[CategoriesColumn]
[HideColumns(Column.Job, Column.Error, Column.Median, Column.RatioSD, Column.Ratio, Column.AllocRatio, Column.StdDev, Column.Gen0)]
public class BenchmarkingDelegates
{
[Benchmark]
public string EnsureInitialized_Old()
{
string defaultValue = "defaultValue";
return LazyInitializer.EnsureInitialized(ref _cachedValue, () => defaultValue);
}
[Benchmark]
public string EnsureInitialized_New()
{
string defaultValue = "defaultValue";
return MyLazyInitializer.EnsureInitialized(ref _cachedValue, () => defaultValue);
}
```
The output:
```csharp
| Method | Runtime | Mean | Allocated |
|---------------------- |---------- |---------:|----------:|
| EnsureInitialized_Old | .NET 10.0 | 5.374 ns | 88 B |
| EnsureInitialized_New | .NET 10.0 | 1.951 ns | 24 B |
| EnsureInitialized_Old | .NET 9.0 | 6.194 ns | 88 B |
| EnsureInitialized_New | .NET 9.0 | 6.041 ns | 88 B |
```
The `MyLazyInitializer` is the copy of the existing one with `AggressiveInlining` flag set.
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
| private static T EnsureInitializedCore<T>([NotNull] ref T? target, Func<T> valueFactory) where T : class | ||
| { | ||
| T value = valueFactory() ?? throw new InvalidOperationException(SR.Lazy_StaticInit_InvalidOperation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving throw new InvalidOperationException(SR.Lazy_StaticInit_InvalidOperation); to a throw-helper should also work, probably it's a better approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried just moving it to a helper without AggressiveInlining, but it was not enough. The AggressiveInlining was still required in this case. And because of that, I've decided not to move the 'throw' into a helper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm.. then it's fine as is, but the inliner needs some tuning apparently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are more EnsureInitializedCore overloads in this file, any reason why only this one needs AggressiveInlining?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 2 more overloads that takes Func:
public static T EnsureInitialized<T>([NotNull] ref T? target, [NotNullIfNotNull(nameof(syncLock))] ref object? syncLock, Func<T> valueFactory) where T : class;
public static T EnsureInitialized<T>([AllowNull] ref T target, ref bool initialized, [NotNullIfNotNull(nameof(syncLock))] ref object? syncLock, Func<T> valueFactory);I've tried adding AggressiveInlining to both of them (to EnsureInitializedCore methods), and here are the results:
| Method | Runtime | Mean | Allocated |
|------------------------------------------- |---------- |----------:|----------:|
| EnsureInitialized_Old | .NET 10.0 | 5.873 ns | 88 B |
| EnsureInitialized_New | .NET 10.0 | 1.744 ns | 24 B |
| EnsureInitialized_SyncRoot_Old | .NET 10.0 | 5.729 ns | 88 B |
| EnsureInitialized_SyncRoot_New | .NET 10.0 | 5.692 ns | 88 B |
| EnsureInitialized_Initialized_SyncRoot_Old | .NET 10.0 | 18.475 ns | 24 B |
| EnsureInitialized_Initialized_SyncRoot_New | .NET 10.0 | 18.614 ns | 24 B |
| EnsureInitialized_Old | .NET 9.0 | 6.522 ns | 88 B |
| EnsureInitialized_New | .NET 9.0 | 6.561 ns | 88 B |
| EnsureInitialized_SyncRoot_Old | .NET 9.0 | 6.566 ns | 88 B |
| EnsureInitialized_SyncRoot_New | .NET 9.0 | 6.485 ns | 88 B |
| EnsureInitialized_Initialized_SyncRoot_Old | .NET 9.0 | 22.287 ns | 88 B |
| EnsureInitialized_Initialized_SyncRoot_New | .NET 9.0 | 19.889 ns | 88 B |
So in both cases the AggressiveInlining has no effect. For the first case, the JIT can't inline the method even with AggressiveInlining attribute so we still have 88B of allocations (for both the delegate and the closure), and for the second overload the JIT can inline the old version, so in both cases we have only the delegate allocation.
So I can add the attribute for consistency, but it seems that it has no effect currently for other 2 cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share what specifically you tried with regards to outlining the throw? With that this attr really shouldn't be necessary, and if it is, that suggests a broader inlining issue we should fix instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed another commit with the proposed change and trigerred the EgorBot to see the before and after results.
Running this version locally was not enough to make the Core method inlined by the JIT.
|
@EgorBot -arm -amd using System.Threading;
using BenchmarkDotNet.Attributes;
[MemoryDiagnoser]
public class Benchmark
{
static string _cachedValue = null;
[Benchmark]
public string EnsureInitialized_Old()
{
string defaultValue = "defaultValue";
return LazyInitializer.EnsureInitialized(ref _cachedValue, () => defaultValue);
}
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds the AggressiveInlining attribute to the EnsureInitializedCore<T> method in LazyInitializer to enable JIT compiler optimizations, specifically delegate stack allocation. Benchmark results show significant performance improvements in .NET 10.0: execution time reduced from 5.374ns to 1.951ns, and allocations decreased from 88B to 24B.
Key Changes:
- Added
[MethodImpl(MethodImplOptions.AggressiveInlining)]attribute to theEnsureInitializedCore<T>method that takes aFunc<T>parameter - Added the required
using System.Runtime.CompilerServices;directive
|
Tagging @stephentoub for an extra pair of eyes on this change. |
Checking if extracting `throw` helps JIT to inline the method without using `MethodImpl` attribute.
|
@EgorBot -arm -amd |
| T value = valueFactory() ?? Throw(); | ||
| Interlocked.CompareExchange(ref target, value, null!); | ||
| Debug.Assert(target != null); | ||
| return target; | ||
|
|
||
| static T Throw() => throw new InvalidOperationException(SR.Lazy_StaticInit_InvalidOperation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| T value = valueFactory() ?? Throw(); | |
| Interlocked.CompareExchange(ref target, value, null!); | |
| Debug.Assert(target != null); | |
| return target; | |
| static T Throw() => throw new InvalidOperationException(SR.Lazy_StaticInit_InvalidOperation); | |
| T? value = valueFactory(); | |
| if (value is null) | |
| { | |
| Throw(); | |
| } | |
| Interlocked.CompareExchange(ref target, value, null!); | |
| Debug.Assert(target != null); | |
| return target; | |
| [DoesNotReturn] | |
| static void Throw() => throw new InvalidOperationException(SR.Lazy_StaticInit_InvalidOperation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to take this one. I've tried this version as well, and it was not inlined. My idea was that the generic helper was problematic here, so I've changed it to a void, but then changed it to generic in this PR.
Once the EgorBot will finish running my version, I'll accept this change to run it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EgorBo, do you know why this wouldn't be inlined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SergeyTeplyakov @stephentoub this highlights a problem we have in JIT + PGO. We do make inliner more aggressive in hot blocks by making it less aggressive in cold blocks. Here
public static T EnsureInitialized<T>([NotNull] ref T? target, Func<T> valueFactory) where T : class
{
var read = Volatile.Read(ref target!);
if (read != null)
{
return read;
}
return EnsureInitializedCore(ref target, valueFactory);
}EnsureInitializedCore is basically a cold block, so inliner decides to give up on it. In most cases this is the right thing to do, but it does mess with the escape analysis at times. We had many other places similar to this with the same issue, until we fix it (by making inliner more sophisticated or implement inter-procedural analysis, the AggressiveInlining is the only way to go 😞). cc @AndyAyersMS
you can validate my theory by setting target to point to null so the cold path is always executed (I already checked locally).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, wait, this is EnsureInitiskized_Core_. We don't want that to be able inlined; it's very explicitly separated out as we don't want it cluttering up call sites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EnsureInitializedCoreis basically a cold block, so inliner decides to give up on it
Yes, it's a known problem without a good solution right now... similar to #116266.
I thought I had done some work to boost inlining if we saw an inlinee passed a locally allocated object, but maybe it was just for boxes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was clear from the get go that this is a balancing act here, but since it seems that there is a plan to make it inlineable in the future, maybe this trade is worth it?
My line of reasoning is the following. We have 3 cases here:
- A method is used a lot and a delegate can be stack-allocated. Probably a win.
- A method is not used a lot. The delegate won't be stack-allocated. A minor regression. But since the method is not used a lot, its not that important.
- A method is used a lot, but the delegate can't be stack-allocated due to other reasons.
So the trade-off is between cases 1 and 3. I have only an intiutive sense that the benefits will outweight the cost. Especially given the fact that the JIT might get better and will start inlining the method (if the method is too big, JIT won't be able to inline it even with the hint).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, wait, this is EnsureInitiskized_Core_. We don't want that to be able inlined; it's very explicitly separated out as we don't want it cluttering up call sites.
@EgorBo Would it be possible to teach the JIT to move the delegate allocation into the cold if instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A method is used a lot, but the delegate can't be stack-allocated due to other reasons.
A super common case is a delegate that's a cached singleton because it's static. By forcing inlining, we're making the whole thing larger, which then negatively impacts the possibility of its callers being inlined / makes all such consumers larger... standard negative impact of inlining things that shouldn't be inlined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EgorBo Would it be possible to teach the JIT to move the delegate allocation into the cold if instead?
Maybe.
We should be able to see that the delegate is partially dead and (with PGO) likely to be dead. If so we could sink its allocation and construction like you say. We haven't really looked into sinking partially dead code. It might be doable, though proving it is safe and moving all the right pieces might be hard.
Or if we had partial escape analysis... we'd see that the delegate was unlikely to escape and so we'd stack allocate it initially and then move it to the heap if needed.
…Initializer.cs Co-authored-by: Stephen Toub <[email protected]>
|
@EgorBot -arm -amd using System.Threading;
using BenchmarkDotNet.Attributes;
[MemoryDiagnoser]
public class Benchmark
{
static string _cachedValue = null;
[Benchmark]
public string EnsureInitialized_Old()
{
string defaultValue = "defaultValue";
return LazyInitializer.EnsureInitialized(ref _cachedValue, () => defaultValue);
}
} |
| T? value = valueFactory(); | ||
| if (value is null) | ||
| { | ||
| Throw(); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| T? value = valueFactory(); | |
| if (value is null) | |
| { | |
| Throw(); | |
| } | |
| if (valueFactory() is not { } value) | |
| { | |
| Throw(); | |
| } |
😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't help to make the method more inlinable. So this is just a matter of style.
(Btw, my take is that don't really like the new version, maybe my brain didn't re-wire itself to parse it) :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes!
One less line of code.
Doesn't necessarily make it faster to compile either.
😄
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |

Currently stack allocation (and overall de-abstraction) works only when JIT can has a full understanding of the code. And due to lack of cross-prodcedural analysis it means that for the optimization to work the operations should be inlined.
The same is true for
LazyInitializer.EnsureInitializedthat takes a delegate. Currently, becauseEnsureInitializedCorequite big and can't be inlined by default, JIT can't stack allocate the delegate passed to the method.The following benchmark shows the impact of this change:
The output:
The
MyLazyInitializeris the copy of the existing one withAggressiveInliningflag set.