FOH (Frozen Object Heap) work items #76151

Closed
14 tasks done
EgorBo opened this issue Sep 25, 2022 · 36 comments
@EgorBo
Member

EgorBo commented Sep 25, 2022

A tracking issue for FOH-related tasks (Frozen Object Heap). It's a special heap for immortal objects such as string literals, Type objects, etc. (see the list below). It is conceptually similar to POH, but it has no public API, so it never contains short-lived objects, and the GC can make some relaxations for it. It provides several advantages:

  1. Makes GC's life easier by moving immortal objects out of normal heap
  2. For some objects VM no longer needs to allocate pinned handles
  3. JIT can "bake" direct references to FOH objects in codegen, see example.
  4. Unlocks some JIT optimizations such as folding field accesses for immutable objects, folding object comparisons, etc.
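
As a small illustration of why such objects are good candidates for a frozen heap: string literals and Type objects already behave as process-wide singletons, so a reference to one never needs to change. The snippet below only demonstrates that identity property from C#; it says nothing about where the runtime actually places the objects, and the variable names are made up for this sketch.

```csharp
using System;

// Two occurrences of the same literal refer to one interned string object.
string first = "frozen";
string second = "frozen";
bool sameLiteral = object.ReferenceEquals(first, second);

// typeof(T) hands back the same Type object on every evaluation.
bool sameType = object.ReferenceEquals(typeof(int), typeof(int));

Console.WriteLine($"literals share one object: {sameLiteral}");
Console.WriteLine($"Type objects share one object: {sameType}");
```

Because the identity of these objects is fixed for the life of the process, comparisons against them are the kind of thing a compiler could in principle fold once their addresses are stable.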

.NET 8.0

Suggestions are very welcome!

category:planning
theme:memory-usage
skill-level:expert
cost:medium
impact:medium

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 25, 2022
@EgorBo EgorBo added this to the 8.0.0 milestone Sep 25, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Sep 25, 2022
@EgorBo EgorBo added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 25, 2022
@EgorBo EgorBo self-assigned this Sep 25, 2022
@ghost

ghost commented Sep 25, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

.NET 8.0

Potential good/bad ideas

  • Always allocate new T[0] on FOH. Technically, it violates ECMA but we already do the same for:
bool same = ReferenceEquals(new string(new char[0]), new string(new char[0])); // true

However, it's unlikely that it's worth the potential risks.

  • Optimize str == "" as just cmp reg, 0xFOHREFERENCE - it turns out that it's still possible to produce new empty strings

  • Cache boxed objects and allocate them on FOH (e.g. boxed True, boxed False like these)
    So, for example:

object o = myInt;

will be optimized in codegen to:

if ((uint)myInt < 100)
    o = 0xBAADF00D /* boxed zero on FOH */ + myInt * 24
else
    o = box(myInt)

Same for booleans, etc. Can't find the related issue but AFAIR it raised some ECMA-related concerns (around box being required to always produce a new object).
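
To make the boxed-value idea and its identity concern concrete, here is a minimal sketch in plain C#. It assumes a hypothetical cache of 100 pre-boxed integers (`smallBoxes` and `BoxCached` are names invented for this example); in the proposal the boxes would live on the FOH and the JIT would compute the address directly, whereas here they are ordinary heap objects looked up through an array.

```csharp
using System;

// Hypothetical cache of pre-boxed small integers (ordinary heap objects here).
const int CacheSize = 100;
object[] smallBoxes = new object[CacheSize];
for (int i = 0; i < CacheSize; i++)
{
    smallBoxes[i] = i; // box each value once, up front
}

// Mirrors the pseudocode above: one unsigned compare covers both bounds.
object BoxCached(int value) =>
    (uint)value < CacheSize ? smallBoxes[value] : value;

int n = 5;
object plainA = n, plainB = n; // ordinary boxing allocates two objects

Console.WriteLine(object.ReferenceEquals(plainA, plainB));                 // False
Console.WriteLine(object.ReferenceEquals(BoxCached(n), BoxCached(n)));     // True
Console.WriteLine(object.ReferenceEquals(BoxCached(500), BoxCached(500))); // False
```

The first and second lines of output differ, which is exactly the observable behavior change behind the ECMA-related concern: code that relies on each box being a distinct object would see different results.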

Author: EgorBo
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: 8.0.0

@jkotas
Member

jkotas commented Sep 25, 2022

Potential good/bad ideas

The same ideas have been discussed multiple times, e.g. #7079 (comment). They have always been punted so far.

(These ideas are not FOH specific. They can be implemented without FOH by paying for one extra indirection that is a minuscule cost.)

@stephentoub
Member

Always allocate new T[0] on FOH

We ourselves have places we don't use Array.Empty because the use is relying on reference equality/inequality.
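
A short sketch of the kind of code this refers to. The `sentinel` array and `IsSentinel` helper are hypothetical names used only for illustration; the point is that a private zero-length array can serve as a unique marker precisely because `new T[0]` produces a fresh object each time.

```csharp
using System;

// Each "new int[0]" is a distinct object today.
int[] a = new int[0];
int[] b = new int[0];
Console.WriteLine(object.ReferenceEquals(a, b)); // False

// Array.Empty<T>() returns one shared instance per element type.
int[] c = Array.Empty<int>();
int[] d = Array.Empty<int>();
Console.WriteLine(object.ReferenceEquals(c, d)); // True

// A private empty array used as a marker value, distinguished by identity.
int[] sentinel = new int[0];
bool IsSentinel(int[] candidate) => object.ReferenceEquals(candidate, sentinel);

Console.WriteLine(IsSentinel(sentinel));           // True
Console.WriteLine(IsSentinel(Array.Empty<int>())); // False
```

If every `new T[0]` were redirected to a single frozen instance, the last check would start returning True and the marker would no longer be unique.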

@EgorBo
Member Author

EgorBo commented Sep 25, 2022

We ourselves have places we don't use Array.Empty because the use is relying on reference equality/inequality.

Thanks, good to know, updated the item.

The same ideas have been discussed multiple times, e.g. #7079 (comment). They have always been punted so far.

Right, it's just that since we now have FOH we don't need to fragment POH/SOH with that pinned cache. I agree that the idea is dangerous, although I still think it could be an opt-in feature for some large projects (e.g. for those who have to do it by hand today); the implementation should be relatively simple.

@jkotas
Member

jkotas commented Sep 25, 2022

I still think it could be an opt-in feature for some large projects

Doubt it. Dangerous opt-ins do not work for large projects. It is impossible (prohibitively expensive) to audit their code to validate that it is safe to opt-in.

@EgorBo
Member Author

EgorBo commented Sep 25, 2022

I still think it could be an opt-in feature for some large projects

Doubt it. Dangerous opt-ins do not work for large projects. It is impossible (prohibitively expensive) to audit their code to validate that it is safe to opt-in.

fair enough!

@GSPP

GSPP commented Sep 28, 2022

Optimize str == "" as just cmp reg, 0xFOHREFERENCE - it turns out that it's still possible to produce new empty strings so

I'm curious how that is possible. If memory serves correctly, @jkotas stated on this issue tracker a long time ago that the policy is to have the empty string be a singleton.

@EgorBo
Member Author

EgorBo commented Sep 28, 2022

Optimize str == "" as just cmp reg, 0xFOHREFERENCE - it turns out that it's still possible to produce new empty strings so

I'm curious how that is possible. If memory serves correctly, @jkotas stated on this issue tracker a long time ago that the policy is to have the empty string be a singleton.

That is exactly what is written in that item, isn't it? I just decided to list some ideas from the past to perhaps inspire new ones.

@jkotas
Member

jkotas commented Sep 28, 2022

I'm curious how that is possible.

For example:

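var s = string.Copy("");                    // allocates a fresh, distinct empty string

Console.WriteLine(s == "");                 // True: value equality
Console.WriteLine(ReferenceEquals(s, ""));  // False: s is not the interned "" instance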

@EgorBo
Member Author

EgorBo commented Sep 28, 2022

Ah, the question was "how is it possible to create a unique empty string" and I read it as "how the optimization you propose is possible" 🙂

@EgorBo
Member Author

EgorBo commented Oct 4, 2022

Idea: try to reserve memory for the first segment (4 MB) next to coreclr in memory, the way we currently reserve 4 GB for the loader heap. I guess it won't hurt if we first try to reserve the 4 MB and then those 4 GB.

@EgorBo
Member Author

EgorBo commented Oct 6, 2022

@jkotas am I correct that we can handle static readonly object SyncObj = new object(); (a common idiom for synchronization objects) in the JIT without too much effort?

If we're in a static constructor and see a pattern like

newobj instance void [System.Runtime]System.Object::.ctor()
stsfld object myReadonlyField

we can replace newobj with a helper that allocates on FOH, right?


Ah, I assume this only holds if the constructor doesn't touch that field again (i.e. doesn't set it more than once).

@EgorBo
Member Author

EgorBo commented Oct 6, 2022

Same for empty (or any size) arrays:

ldc.i4.0
newarr [System.Runtime]System.Int32
stsfld int32[] C::x

so we won't have to special-case Array.Empty.

So the strategy will be:

  1. Import a cctor normally in the JIT, and make a list of static readonly fields that are set only once (and aren't address-exposed and don't escape)
  2. Check that cctor doesn't have backward branches (potential loops)
  3. Replace normal allocator helpers with frozen ones.

Or actually, as an initial impl, it can be just:

stsfld
ret
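
The three-step strategy above can be sketched over a toy IL model. Everything here is an assumption made for illustration: the tuple layout, the `FreezableFields` name, and the simplification that only `br`-family opcodes are treated as branches. Address exposure and escape analysis are not modelled at all, and this is not how the JIT represents IL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy IL model for this sketch only: (offset, opcode, operand).
// For branches the operand is the target offset; for stsfld it is the field name.
static List<string> FreezableFields((int Offset, string Op, string Operand)[] il)
{
    // Step 2: a branch to an earlier (or the same) offset may form a loop.
    bool hasBackwardBranch = il.Any(i =>
        i.Op.StartsWith("br", StringComparison.Ordinal) && int.Parse(i.Operand) <= i.Offset);
    if (hasBackwardBranch) return new List<string>();

    // Step 1: find fields stored directly after an allocation, and count all stores.
    var storeCounts = new Dictionary<string, int>();
    var candidates = new List<string>();
    for (int k = 0; k < il.Length; k++)
    {
        if (il[k].Op != "stsfld") continue;
        string field = il[k].Operand;
        storeCounts.TryGetValue(field, out int seen);
        storeCounts[field] = seen + 1;
        bool afterAlloc = k > 0 && (il[k - 1].Op == "newobj" || il[k - 1].Op == "newarr");
        if (afterAlloc) candidates.Add(field);
    }

    // Step 3 would swap the allocation helper for the fields returned here.
    return candidates.Where(f => storeCounts[f] == 1).Distinct().ToList();
}

var cctor = new[]
{
    (Offset: 0, Op: "newobj", Operand: "System.Object::.ctor"),
    (Offset: 5, Op: "stsfld", Operand: "C::SyncObj"),
    (Offset: 10, Op: "ldc.i4.0", Operand: ""),
    (Offset: 11, Op: "newarr", Operand: "System.Int32"),
    (Offset: 16, Op: "stsfld", Operand: "C::x"),
    (Offset: 21, Op: "ret", Operand: ""),
};
Console.WriteLine(string.Join(", ", FreezableFields(cctor))); // C::SyncObj, C::x
```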

@jkotas
Member

jkotas commented Oct 6, 2022

You can do this for non-collectible code only. Also, you can do the same thing for arrays of length 0. It would cover the Array.Empty case without special casing.

Or actually, as an initial impl, it can be just:

I am not sure what you mean by this.

@EgorBo
Member Author

EgorBo commented Oct 6, 2022

I am not sure what you mean by this.

I meant that if a static constructor is complicated then

newarr/newobj
stsfld

the pattern might sit inside a loop, so we'd end up allocating an immortal frozen object on each iteration, etc. So I mean that if we see

newarr/newobj
stsfld
ret

then it's definitely just a simple cctor that only sets one field 🙂 - I meant it as an initial quick/safe implementation. Array.Empty falls into that category
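
A minimal sketch of that quick and safe check, assuming the static constructor's body is available as a flat list of opcode names. `IsTrivialCctor` is a hypothetical helper for this example; it also accepts a single constant load in front of `newarr`, since the array length has to come from somewhere.

```csharp
using System;
using System.Linq;

// Accepts only "newobj, stsfld, ret" or "ldc.i4*, newarr, stsfld, ret".
static bool IsTrivialCctor(string[] ops)
{
    if (ops.SequenceEqual(new[] { "newobj", "stsfld", "ret" })) return true;

    return ops.Length == 4
        && ops[0].StartsWith("ldc.i4", StringComparison.Ordinal)
        && ops.Skip(1).SequenceEqual(new[] { "newarr", "stsfld", "ret" });
}

Console.WriteLine(IsTrivialCctor(new[] { "newobj", "stsfld", "ret" }));             // True
Console.WriteLine(IsTrivialCctor(new[] { "ldc.i4.0", "newarr", "stsfld", "ret" })); // True
Console.WriteLine(IsTrivialCctor(new[] { "newobj", "stsfld", "newobj", "stsfld", "ret" })); // False
```

Matching the whole body rather than a fragment is what rules out loops and repeated stores without any further analysis.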

@EgorBo
Member Author

EgorBo commented Oct 8, 2022

Idea: make pinning no-op for frozen objects since we plan to move static readonly arrays there.

@MichalPetryka
Contributor

Idea: provide higher alignment for arrays on FOH in order to achieve higher perf with SIMD operations.

@EgorBo
Member Author

EgorBo commented Oct 8, 2022

Idea: provide higher alignment for arrays on FOH in order to achieve higher perf with SIMD operations.

That would definitely require changes on the GC side, where the GC assumes a fixed alignment for all objects when it iterates over them, e.g.:

void gc_heap::seg_set_mark_bits (heap_segment* seg)
{
    uint8_t* o = heap_segment_mem (seg);
    while (o < heap_segment_allocated (seg))
    {
        set_marked (o);
        o = o + Align (size(o));
    }
}
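
To see why a per-object alignment breaks this kind of walk, here is a small simulation in C#. The sizes and the `ObjectOffsets` helper are made up for the example; it only models the address arithmetic of the loop above, not the GC itself.

```csharp
using System;
using System.Collections.Generic;

// Round size up to a multiple of alignment (alignment must be a power of two).
static int AlignUp(int size, int alignment) => (size + alignment - 1) & ~(alignment - 1);

// Each object starts where the previous one ends, rounded up to one
// heap-wide alignment, just like "o = o + Align (size(o))".
static List<int> ObjectOffsets(int[] sizes, int alignment)
{
    var offsets = new List<int>();
    int o = 0;
    foreach (int size in sizes)
    {
        offsets.Add(o);
        o += AlignUp(size, alignment);
    }
    return offsets;
}

int[] sizes = { 24, 30, 56 };
Console.WriteLine(string.Join(", ", ObjectOffsets(sizes, 8)));  // 0, 24, 56
Console.WriteLine(string.Join(", ", ObjectOffsets(sizes, 32))); // 0, 32, 64
```

A walker that assumes 8-byte alignment expects the second object at offset 24. If the allocator had placed it at 32 to satisfy a larger alignment, the walker would land on padding instead of an object header.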

@jkotas
Member

jkotas commented Oct 8, 2022

I would also say that this is very niche. NativeMemory.AlignedAlloc and GC.AllocateArray variant that allocates on POH with specific alignment should sufficiently cover this space.

Note that the embedding of the object references from readonly fields can be also extended to pinned objects allocated on POH.

readonly static int[] a = GC.AllocateArray(..., pinned: true); // This can be treated the same way as if the array was allocated on FOH 
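
For context, a hedged sketch of how aligned data can already be obtained from a POH array without GC changes. It uses the existing `GC.AllocateArray<T>(length, pinned: true)` overload and over-allocates by one alignment unit; it does not assume the alignment-aware variant mentioned above exists. The constants and variable names are illustrative.

```csharp
using System;
using System.Runtime.InteropServices;

const int Alignment = 32;              // bytes, e.g. one 256-bit vector
const int Needed = 64;                 // floats the caller actually wants
int slack = Alignment / sizeof(float); // worst-case padding, in elements

// POH arrays never move, so an address computed once stays valid.
float[] backing = GC.AllocateArray<float>(Needed + slack, pinned: true);
long baseAddr = Marshal.UnsafeAddrOfPinnedArrayElement(backing, 0).ToInt64();

// Skip elements until the address is a multiple of Alignment.
int skipBytes = (int)((Alignment - baseAddr % Alignment) % Alignment);
int firstAligned = skipBytes / sizeof(float);
Span<float> aligned = backing.AsSpan(firstAligned, Needed);
aligned.Fill(1f);

long alignedAddr = Marshal.UnsafeAddrOfPinnedArrayElement(backing, firstAligned).ToInt64();
Console.WriteLine(alignedAddr % Alignment == 0); // True
```

The cost is a few wasted elements per array, which is the trade-off against teaching the GC about per-object alignment.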

@EgorBo
Member Author

EgorBo commented Oct 8, 2022

You can do this for non-collectible code only

@jkotas do I understand it correctly that in order to detect that, I need to add an additional argument to my custom alloc helper - StackCrawlMarkHandle to then get Assembly object inside the alloc via stack walking and find out whether it's collectible or not?

Basically, call Assembly.GetExecutingAssembly().IsCollectible.

However, it's probably easier to just pass cctor's type handle to that helper

UPD: we don't support tiered compilation for collectible assemblies, so in theory we can emit custom allocs only for Tier1 (promoted from Tier0). But that might not be future-proof.
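
For readers unfamiliar with the term, this is what "collectible" looks like from managed code. The sketch only shows the public query surface (`AssemblyLoadContext.IsCollectible` and `Assembly.IsCollectible`); the context name "plugins" is arbitrary, and nothing here reflects how the runtime or JIT would make the decision internally.

```csharp
using System;
using System.Reflection;
using System.Runtime.Loader;

// The default load context is never collectible.
bool defaultCollectible = AssemblyLoadContext.Default.IsCollectible;
Console.WriteLine(defaultCollectible); // False

// A context created with isCollectible: true can be unloaded later, so
// objects tied to code loaded into it must not be made immortal.
var plugins = new AssemblyLoadContext("plugins", isCollectible: true);
Console.WriteLine(plugins.IsCollectible); // True

// The per-assembly query mentioned above.
bool executing = Assembly.GetExecutingAssembly().IsCollectible;
Console.WriteLine($"executing assembly collectible: {executing}");
```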

@jkotas
Member

jkotas commented Oct 8, 2022

Stack crawling to find caller is an anti-pattern. You should do it all at JIT time, something like:

  • Add CORJIT_FLAG_FROZEN_ALLOCATIONS_ALLOWED that is set for cctors in non-collectible code
  • Add CORINFO_HELP_NEW_FROZEN and CORINFO_HELP_NEWARRAY_1_FROZEN JIT helpers that the JIT replaces allocations eligible for freezing with.
  • Add READYTORUN_HELPER_NewFrozenObject and READYTORUN_HELPER_NewFrozenArray helpers for R2R that map to the above JIT helpers, so that R2R code does this optimization too and allows the eventual tiered-compilation to benefit.

@MichalPetryka
Contributor

Stack crawling to find caller is an anti-pattern. You should do it all at JIT time, something like:

  • Add CORJIT_FLAG_FROZEN_ALLOCATIONS_ALLOWED that is set for cctors in non-collectible code
  • Add CORINFO_HELP_NEW_FROZEN and CORINFO_HELP_NEWARRAY_1_FROZEN JIT helpers that the JIT replaces allocations eligible for freezing with.
  • Add READYTORUN_HELPER_NewFrozenObject and READYTORUN_HELPER_NewFrozenArray helpers for R2R that map to the above JIT helpers, so that R2R code does this optimization too and allows the eventual tiered-compilation to benefit.

Can't R2R assemblies be loaded into collectible ALCs?

@jkotas
Member

jkotas commented Oct 8, 2022

Can't R2R assemblies be loaded into collectible ALCs?

They cannot today:

DoLog("Ready to Run disabled - collectible module");
And even if it were allowed, this scheme would still be compatible with it: we would substitute the frozen allocation helper with a regular allocation helper when loading the R2R image into a collectible context.

@EgorBo
Member Author

EgorBo commented Oct 10, 2022

@jkotas @Maoni0 you asked for numbers - I downloaded BingSNR, updated it to .NET 8 and ran its benchmarks locally with bombardier (which simulates load):
338,000 objects were allocated on FOH, total size is 17 MB (= 5 frozen segments, since the segment size is hard-coded to 4 MB).

And here are the general performance counters:

[System.Runtime]
    % Time in GC since last GC (%)                                 1
    Allocation Rate (B / 1 sec)                                8,160
    CPU Usage (%)                                                  0
    Exception Count (Count / 1 sec)                                0
    GC Committed Bytes (MB)                                    2,587.238
    GC Fragmentation (%)                                           5.389
    GC Heap Size (MB)                                          2,033.518
    Gen 0 GC Count (Count / 1 sec)                                 0
    Gen 0 Size (B)                                         8,245,392
    Gen 1 GC Count (Count / 1 sec)                                 0
    Gen 1 Size (B)                                        17,739,352
    Gen 2 GC Count (Count / 1 sec)                                 0
    Gen 2 Size (B)                                            1.4631e+09
    IL Bytes Jitted (B)                                   21,699,370
    LOH Size (B)                                              6.1376e+08
    Monitor Lock Contention Count (Count / 1 sec)                  0
    Number of Active Timers                                       14
    Number of Assemblies Loaded                                5,167
    Number of Methods Jitted                                 237,451
    POH (Pinned Object Heap) Size (B)                      2,897,848
    ThreadPool Completed Work Item Count (Count / 1 sec)           6
    ThreadPool Queue Length                                        0
    ThreadPool Thread Count                                        3
    Time spent in JIT (ms / 1 sec)                                 0
    Working Set (MB)                                           7,083.553

My env vars in setup.bat:

REM fix for .NET 7 to use segments instead of regions due to overcommit in net7rc1
SET xxCOMPlus_GCName=clrgc.dll
SET DOTNET_ReadyToRun=0
SET DOTNET_TieredCompilation=1

NOTE: I commented out GCName to enable GC regions (just to test .NET 8).

@jkotas
Member

jkotas commented Oct 10, 2022

total size is 17 MB (= 5 frozen segments, since the segment size is hard-coded to 4 MB).

Should we double the frozen segment reservations (i.e. 4 MB, 8 MB, 16 MB, ...) to have fewer of these?
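
The arithmetic behind this suggestion, as a small sketch. It assumes objects pack perfectly into segments, which a real allocator will not quite achieve, and the function names are invented for the example.

```csharp
using System;

// Segments needed when every segment has the same fixed size.
static int FixedSegments(int totalMb, int segmentMb) =>
    (totalMb + segmentMb - 1) / segmentMb;

// Segments needed when each new reservation doubles the previous one.
static int DoublingSegments(int totalMb, int firstMb)
{
    int count = 0, capacity = 0, next = firstMb;
    while (capacity < totalMb)
    {
        capacity += next; // 4, then 4+8, then 4+8+16, ...
        next *= 2;
        count++;
    }
    return count;
}

Console.WriteLine(FixedSegments(17, 4));    // 5
Console.WriteLine(DoublingSegments(17, 4)); // 3
```

For the 17 MB measured above, fixed 4 MB segments give five segments, while a 4 MB, 8 MB, 16 MB progression covers the same data in three.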

@EgorBo
Member Author

EgorBo commented Oct 10, 2022

total size is 17 MB (= 5 frozen segments, since the segment size is hard-coded to 4 MB).

Should we double the frozen segment reservations (i.e. 4 MB, 8 MB, 16 MB, ...) to have fewer of these?

Makes sense!

Btw, it's impressive how much GC Regions save for the service: the GC heap is 2 GB with regions and 3.5-4.5 GB without them.

@jkotas
Member

jkotas commented Oct 10, 2022

SET DOTNET_ReadyToRun=0

I do not think that Bing is disabling R2R. Was there a particular reason you have disabled it?

@EgorBo
Member Author

EgorBo commented Oct 10, 2022

SET DOTNET_ReadyToRun=0

I do not think that Bing is disabling R2R. Was there a particular reason you have disabled it?

It was an unrelated experiment - I just wanted to see how many methods it JITs in total; the biggest number I saw was 238k methods (21.8 MB of IL jitted). Now I plan to check the actual size of native code in memory.

@Maoni0
Member

Maoni0 commented Oct 10, 2022

thanks for the data!

Btw, it's impressive how much GC Regions save for the service: the GC heap is 2 GB with regions and 3.5-4.5 GB without them.

I would also say that this is very niche. NativeMemory.AlignedAlloc and GC.AllocateArray variant that allocates on POH with specific alignment should sufficiently cover this space.

exactly, you don't need the GC support for this.

fewer FOH segs would be good.

@EgorBo
Member Author

EgorBo commented Nov 2, 2022

More insights from the Bing service: of those 17 MB of objects in the FOH, 9 MB are RuntimeType objects and 8 MB are string literals; the longest string literal is 24 KB.

@stephentoub
Member

stephentoub commented Nov 2, 2022

the longest string literal is 24 KB

So... curious...

Some sort of lookup table?

@EgorBo
Member Author

EgorBo commented Nov 2, 2022

the longest string literal is 24 KB

So... curious...

Some sort of lookup table?

The content looks like a huge set of coordinates (comma-separated floating points) for that one 🙂

@mjsabby
Contributor

mjsabby commented Nov 5, 2022

I'm familiar with this service, and the FOH should be significantly larger than 17MB; it should be on the order of ~800MB or so. It contains all the RESX strings and other significant object graphs. Perhaps we can chat offline to see why you're not seeing that size.

One question I did have is, will this collection of work regress performance of file-backed segments in the FOH?

@EgorBo
Member Author

EgorBo commented Nov 5, 2022

@mjsabby

I'm familiar with this service, and the FOH should be significantly larger than 17MB

I was only counting objects in FOH segments added by the runtime. Potentially, for your service we can surface a configuration knob to set the initial size, e.g. 20 MB, so that you have just one extra frozen segment (currently it's 3).

One question I did have is, will this collection of work regress performance of file-backed segments in the FOH?

I'd not expect any regressions. Perhaps, only if you're not on GC Regions as we had to enable scanning for in-range FOH segments there (#76251)

@EgorBo
Member Author

EgorBo commented Jun 9, 2023

All items except dotnet/diagnostics#4156 are closed, so I'm closing this for .NET 8.0 and leaving dotnet/diagnostics#4156 open (working on it now).

@EgorBo EgorBo closed this as completed Jun 9, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jul 9, 2023