Skip to content

Conversation

@SergeyTeplyakov
Copy link

@SergeyTeplyakov SergeyTeplyakov commented Jan 7, 2026

Currently stack allocation (and overall de-abstraction) works only when JIT can has a full understanding of the code. And due to lack of cross-prodcedural analysis it means that for the optimization to work the operations should be inlined.

The same is true for LazyInitializer.EnsureInitialized that takes a delegate. Currently, because EnsureInitializedCore quite big and can't be inlined by default, JIT can't stack allocate the delegate passed to the method.

The following benchmark shows the impact of this change:

[MemoryDiagnoser]
[ShortRunJob(RuntimeMoniker.Net90)]
[ShortRunJob(RuntimeMoniker.Net10_0)]
[CategoriesColumn]
[HideColumns(Column.Job, Column.Error, Column.Median, Column.RatioSD, Column.Ratio, Column.AllocRatio, Column.StdDev, Column.Gen0)]

public class BenchmarkingDelegates
{
    private string _cachedValue;

    [Benchmark]
    public string EnsureInitialized_Old()
    {
        string defaultValue = "defaultValue";
        return LazyInitializer.EnsureInitialized(ref _cachedValue, () => defaultValue);
    }

   
    [Benchmark]
    public string EnsureInitialized_New()
    {
        string defaultValue = "defaultValue";
        return MyLazyInitializer.EnsureInitialized(ref _cachedValue, () => defaultValue);
    }

The output:

| Method                | Runtime   | Mean     | Allocated |
|---------------------- |---------- |---------:|----------:|
| EnsureInitialized_Old | .NET 10.0 | 5.374 ns |      88 B |
| EnsureInitialized_New | .NET 10.0 | 1.951 ns |      24 B |
| EnsureInitialized_Old | .NET 9.0  | 6.194 ns |      88 B |
| EnsureInitialized_New | .NET 9.0  | 6.041 ns |      88 B |

The MyLazyInitializer is the copy of the existing one with AggressiveInlining flag set.

…ates stack alloc

Currently stack allocation (and overall de-abstraction) works only when JIT can has a full understanding of the code. And due to lack of cross-prodcedural analysis it means that for the optimization to work the operations should be inlined.

The same is true for `LazyInitializer.EnsureInitialized` that takes a delegate. Currently, because `EnsureInitializedCore` quite big and can't be inlined by default, JIT can't stack allocate the delegate passed to the method.

The following benchmark shows the impact of this change:

```csharp
[MemoryDiagnoser]
[ShortRunJob(RuntimeMoniker.Net90)]
[ShortRunJob(RuntimeMoniker.Net10_0)]
[CategoriesColumn]
[HideColumns(Column.Job, Column.Error, Column.Median, Column.RatioSD, Column.Ratio, Column.AllocRatio, Column.StdDev, Column.Gen0)]

public class BenchmarkingDelegates
{
    [Benchmark]
    public string EnsureInitialized_Old()
    {
        string defaultValue = "defaultValue";
        return LazyInitializer.EnsureInitialized(ref _cachedValue, () => defaultValue);
    }

   
    [Benchmark]
    public string EnsureInitialized_New()
    {
        string defaultValue = "defaultValue";
        return MyLazyInitializer.EnsureInitialized(ref _cachedValue, () => defaultValue);
    }
```

The output:

```csharp
| Method                | Runtime   | Mean     | Allocated |
|---------------------- |---------- |---------:|----------:|
| EnsureInitialized_Old | .NET 10.0 | 5.374 ns |      88 B |
| EnsureInitialized_New | .NET 10.0 | 1.951 ns |      24 B |
| EnsureInitialized_Old | .NET 9.0  | 6.194 ns |      88 B |
| EnsureInitialized_New | .NET 9.0  | 6.041 ns |      88 B |
```

The `MyLazyInitializer` is the copy of the existing one with `AggressiveInlining` flag set.
Copilot AI review requested due to automatic review settings January 7, 2026 18:48
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jan 7, 2026
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jan 7, 2026
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static T EnsureInitializedCore<T>([NotNull] ref T? target, Func<T> valueFactory) where T : class
{
T value = valueFactory() ?? throw new InvalidOperationException(SR.Lazy_StaticInit_InvalidOperation);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving throw new InvalidOperationException(SR.Lazy_StaticInit_InvalidOperation); to a throw-helper should also work, probably it's a better approach

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried just moving it to a helper without AggressiveInlining, but it was not enough. The AggressiveInlining was still required in this case. And because of that, I've decided not to move the 'throw' into a helper.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm.. then it's fine as is, but the inliner needs some tuning apparently

Copy link
Member

@EgorBo EgorBo Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are more EnsureInitializedCore overloads in this file, any reason why only this one needs AggressiveInlining?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 more overloads that takes Func:

public static T EnsureInitialized<T>([NotNull] ref T? target, [NotNullIfNotNull(nameof(syncLock))] ref object? syncLock, Func<T> valueFactory) where T : class;

public static T EnsureInitialized<T>([AllowNull] ref T target, ref bool initialized, [NotNullIfNotNull(nameof(syncLock))] ref object? syncLock, Func<T> valueFactory);

I've tried adding AggressiveInlining to both of them (to EnsureInitializedCore methods), and here are the results:

| Method                                     | Runtime   | Mean      | Allocated |
|------------------------------------------- |---------- |----------:|----------:|
| EnsureInitialized_Old                      | .NET 10.0 |  5.873 ns |      88 B |
| EnsureInitialized_New                      | .NET 10.0 |  1.744 ns |      24 B |
| EnsureInitialized_SyncRoot_Old             | .NET 10.0 |  5.729 ns |      88 B |
| EnsureInitialized_SyncRoot_New             | .NET 10.0 |  5.692 ns |      88 B |
| EnsureInitialized_Initialized_SyncRoot_Old | .NET 10.0 | 18.475 ns |      24 B |
| EnsureInitialized_Initialized_SyncRoot_New | .NET 10.0 | 18.614 ns |      24 B |
| EnsureInitialized_Old                      | .NET 9.0  |  6.522 ns |      88 B |
| EnsureInitialized_New                      | .NET 9.0  |  6.561 ns |      88 B |
| EnsureInitialized_SyncRoot_Old             | .NET 9.0  |  6.566 ns |      88 B |
| EnsureInitialized_SyncRoot_New             | .NET 9.0  |  6.485 ns |      88 B |
| EnsureInitialized_Initialized_SyncRoot_Old | .NET 9.0  | 22.287 ns |      88 B |
| EnsureInitialized_Initialized_SyncRoot_New | .NET 9.0  | 19.889 ns |      88 B |

So in both cases the AggressiveInlining has no effect. For the first case, the JIT can't inline the method even with AggressiveInlining attribute so we still have 88B of allocations (for both the delegate and the closure), and for the second overload the JIT can inline the old version, so in both cases we have only the delegate allocation.

So I can add the attribute for consistency, but it seems that it has no effect currently for other 2 cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you share what specifically you tried with regards to outlining the throw? With that this attr really shouldn't be necessary, and if it is, that suggests a broader inlining issue we should fix instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed another commit with the proposed change and trigerred the EgorBot to see the before and after results.
Running this version locally was not enough to make the Core method inlined by the JIT.

@EgorBo
Copy link
Member

EgorBo commented Jan 7, 2026

@EgorBot -arm -amd

using System.Threading;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class Benchmark
{
    static string _cachedValue = null;

    [Benchmark]
    public string EnsureInitialized_Old()
    {
        string defaultValue = "defaultValue";
        return LazyInitializer.EnsureInitialized(ref _cachedValue, () => defaultValue);
    }
}

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds the AggressiveInlining attribute to the EnsureInitializedCore<T> method in LazyInitializer to enable JIT compiler optimizations, specifically delegate stack allocation. Benchmark results show significant performance improvements in .NET 10.0: execution time reduced from 5.374ns to 1.951ns, and allocations decreased from 88B to 24B.

Key Changes:

  • Added [MethodImpl(MethodImplOptions.AggressiveInlining)] attribute to the EnsureInitializedCore<T> method that takes a Func<T> parameter
  • Added the required using System.Runtime.CompilerServices; directive

@SergeyTeplyakov
Copy link
Author

Tagging @stephentoub for an extra pair of eyes on this change.

Checking if extracting `throw` helps JIT to inline the method without using `MethodImpl` attribute.
@SergeyTeplyakov
Copy link
Author

@EgorBot -arm -amd

using System.Threading;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class Benchmark
{
    static string _cachedValue = null;

    [Benchmark]
    public string EnsureInitialized_Old()
    {
        string defaultValue = "defaultValue";
        return LazyInitializer.EnsureInitialized(ref _cachedValue, () => defaultValue);
    }
}

Comment on lines 115 to 120
T value = valueFactory() ?? Throw();
Interlocked.CompareExchange(ref target, value, null!);
Debug.Assert(target != null);
return target;

static T Throw() => throw new InvalidOperationException(SR.Lazy_StaticInit_InvalidOperation);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
T value = valueFactory() ?? Throw();
Interlocked.CompareExchange(ref target, value, null!);
Debug.Assert(target != null);
return target;
static T Throw() => throw new InvalidOperationException(SR.Lazy_StaticInit_InvalidOperation);
T? value = valueFactory();
if (value is null)
{
Throw();
}
Interlocked.CompareExchange(ref target, value, null!);
Debug.Assert(target != null);
return target;
[DoesNotReturn]
static void Throw() => throw new InvalidOperationException(SR.Lazy_StaticInit_InvalidOperation);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to take this one. I've tried this version as well, and it was not inlined. My idea was that the generic helper was problematic here, so I've changed it to a void, but then changed it to generic in this PR.

Once the EgorBot will finish running my version, I'll accept this change to run it again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EgorBo, do you know why this wouldn't be inlined?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SergeyTeplyakov @stephentoub this highlights a problem we have in JIT + PGO. We do make inliner more aggressive in hot blocks by making it less aggressive in cold blocks. Here

public static T EnsureInitialized<T>([NotNull] ref T? target, Func<T> valueFactory) where T : class
{
    var read = Volatile.Read(ref target!);
    if (read != null)
    {
        return read;
    }
    return EnsureInitializedCore(ref target, valueFactory);
}

EnsureInitializedCore is basically a cold block, so inliner decides to give up on it. In most cases this is the right thing to do, but it does mess with the escape analysis at times. We had many other places similar to this with the same issue, until we fix it (by making inliner more sophisticated or implement inter-procedural analysis, the AggressiveInlining is the only way to go 😞). cc @AndyAyersMS

you can validate my theory by setting target to point to null so the cold path is always executed (I already checked locally).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, wait, this is EnsureInitiskized_Core_. We don't want that to be able inlined; it's very explicitly separated out as we don't want it cluttering up call sites.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnsureInitializedCore is basically a cold block, so inliner decides to give up on it

Yes, it's a known problem without a good solution right now... similar to #116266.

I thought I had done some work to boost inlining if we saw an inlinee passed a locally allocated object, but maybe it was just for boxes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was clear from the get go that this is a balancing act here, but since it seems that there is a plan to make it inlineable in the future, maybe this trade is worth it?

My line of reasoning is the following. We have 3 cases here:

  • A method is used a lot and a delegate can be stack-allocated. Probably a win.
  • A method is not used a lot. The delegate won't be stack-allocated. A minor regression. But since the method is not used a lot, its not that important.
  • A method is used a lot, but the delegate can't be stack-allocated due to other reasons.

So the trade-off is between cases 1 and 3. I have only an intiutive sense that the benefits will outweight the cost. Especially given the fact that the JIT might get better and will start inlining the method (if the method is too big, JIT won't be able to inline it even with the hint).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, wait, this is EnsureInitiskized_Core_. We don't want that to be able inlined; it's very explicitly separated out as we don't want it cluttering up call sites.

@EgorBo Would it be possible to teach the JIT to move the delegate allocation into the cold if instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A method is used a lot, but the delegate can't be stack-allocated due to other reasons.

A super common case is a delegate that's a cached singleton because it's static. By forcing inlining, we're making the whole thing larger, which then negatively impacts the possibility of its callers being inlined / makes all such consumers larger... standard negative impact of inlining things that shouldn't be inlined.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EgorBo Would it be possible to teach the JIT to move the delegate allocation into the cold if instead?

Maybe.

We should be able to see that the delegate is partially dead and (with PGO) likely to be dead. If so we could sink its allocation and construction like you say. We haven't really looked into sinking partially dead code. It might be doable, though proving it is safe and moving all the right pieces might be hard.

Or if we had partial escape analysis... we'd see that the delegate was unlikely to escape and so we'd stack allocate it initially and then move it to the heap if needed.

@SergeyTeplyakov
Copy link
Author

The version that I had with T Throw didn't inline:
image

@SergeyTeplyakov
Copy link
Author

@EgorBot -arm -amd

using System.Threading;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class Benchmark
{
    static string _cachedValue = null;

    [Benchmark]
    public string EnsureInitialized_Old()
    {
        string defaultValue = "defaultValue";
        return LazyInitializer.EnsureInitialized(ref _cachedValue, () => defaultValue);
    }
}

Comment on lines +115 to +120
T? value = valueFactory();
if (value is null)
{
Throw();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
T? value = valueFactory();
if (value is null)
{
Throw();
}
if (valueFactory() is not { } value)
{
Throw();
}

😄

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't help to make the method more inlinable. So this is just a matter of style.
(Btw, my take is that don't really like the new version, maybe my brain didn't re-wire itself to parse it) :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

One less line of code.

Doesn't necessarily make it faster to compile either.

😄

@jkotas jkotas added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jan 9, 2026
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member tenet-performance Performance related issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants