TcSymbolUseData[] accounting for 19.1 MB on the LOH #6084

cartermp · 2019-01-12T01:13:33Z

This was after opening VS (dev16.0, with built-in VSIX) and slowly implementing a function in FSharp.Editor that uses FCS and Roslyn types for about three minutes. I say slowly because I code slowly 🙂

This is the TcSymbolUses ctor getting called 49 times, allocating ~390 KB on the LOH each time (49 × ~390 KB ≈ 19.1 MB). It is called by TypeCheckTask in the incremental builder:

https://github.com/Microsoft/visualfsharp/blob/631401a1cc7c487a6b27486242ac7df3e77bd495/src/fsharp/service/IncrementalBuild.fs#L1378

Since in my scenario the large majority of these symbols are the same (in fact they would probably be mostly the same in nearly every scenario), I wonder if there's an opportunity to cache them.

dsyme · 2019-01-12T18:02:49Z

The way things are set up, we have to save some information after checking a file - and the information we save is a dump of all the symbol resolutions. For a medium-sized file I'd imagine this can be 400K no problem. The information is not particularly long lived - the next check of the file will make it irrelevant and the information will be collected. I think this architecture is OK - if we were trying to reduce the amount of re-computed information we would do that by making all of checking more incremental from parsing through to type-checking.

So it's not a priori wrong to allocate this information, even in one linear array (note that pre-indexing the information or attempting to compress it is a waste of time since it is very often discarded). It does however feel odd if the CLR is somehow assuming it is long-lived.

Is the CLR applying the (in this case false) heuristic that all LOH objects are long-lived? If so I suppose we could artificially chunk the array though it feels, well, artificial. What's the minimum size on the LOH?
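One way to probe this question is to ask the GC which generation it reports for a large array. The following is a minimal sketch, assuming a runtime with the default 85,000-byte LOH threshold; the array sizes are arbitrary and only chosen to fall clearly on either side of that threshold.

```fsharp
open System

// Arbitrary sizes on either side of the default 85,000-byte LOH threshold.
let small : byte[] = Array.zeroCreate 1000
let large : byte[] = Array.zeroCreate 100000

// Small objects normally start life in generation 0. Objects on the LOH are
// reported as belonging to the oldest generation, so they are only reclaimed
// by a full collection.
printfn "small array generation: %d" (GC.GetGeneration(small))
printfn "large array generation: %d" (GC.GetGeneration(large))
printfn "oldest generation:      %d" GC.MaxGeneration
```

If the large array reports the oldest generation, a short-lived LOH array still has to wait for a full collection before its memory is reclaimed, which is the cost being discussed here.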

[ Aside: TBH much of this memory work would feel easier if the CLR just had an intrinsic allocation method that said "this object is big but transient", or magically worked that out. From a GC perspective there should be no real problem with allocating short-lived big objects apart from the data-copying costs involved, as the data should just evaporate immediately on next collection. ]

dsyme · 2019-01-12T18:07:20Z

BTW this is the definition of TcSymbolUseData:

[<Struct>]
type TcSymbolUseData = 
   { Item: Item
     ItemOccurence: ItemOccurence
     DisplayEnv: DisplayEnv
     Range: range }

Looking at this, someone might want to turn ItemOccurence into a struct.

For the others:

DisplayEnv is generally a pointer to a shared object, not much to do there without too much work (well, ok, you could make it a uint16 indexing into an array of DisplayEnv)
Item is a pointer to more information about the item that is a big discriminated union that's going to be hard to reduce
Range is a struct already, and has recently got a tad bigger, not much we can do about that
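The uint16-index idea for DisplayEnv could look roughly like the sketch below. Everything here is hypothetical: `Env` is a stand-in for DisplayEnv, and `EnvTable` is an illustrative name, not something in the compiler. It assumes environments are shared by reference, so the table interns them by reference identity.

```fsharp
open System
open System.Collections.Generic

// Hypothetical stand-in for DisplayEnv; the real type is a much larger record.
type Env = { Name: string }

// Interns environments by reference identity and hands out compact uint16 ids.
type EnvTable() =
    let comparer : IEqualityComparer<Env> = HashIdentity.Reference
    let ids = Dictionary<Env, uint16>(comparer)
    let envs = ResizeArray<Env>()

    member this.Intern(env: Env) : uint16 =
        match ids.TryGetValue env with
        | true, idx -> idx
        | false, _ ->
            // A uint16 can only address 65,536 distinct environments.
            if envs.Count > int UInt16.MaxValue then
                failwith "more distinct environments than a uint16 can index"
            let idx = uint16 envs.Count
            ids.[env] <- idx
            envs.Add env
            idx

    member this.Lookup(idx: uint16) : Env = envs.[int idx]

let table = EnvTable()
let envA = { Name = "A" }
let envB = { Name = "B" }
let a1 = table.Intern envA
let b1 = table.Intern envB
let a2 = table.Intern envA   // same reference as envA, so the same id comes back
```

The saving per element is a reference replaced by two bytes (before padding), at the price of an extra lookup on every read and a table that must live as long as the array does.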

dsyme · 2019-01-12T18:09:37Z

To answer my question:

'Big' objects go here – as the size at which an object may end up on this heap is 85,000 bytes, this usually means arrays with more than about 20,000 entries. https://www.red-gate.com/simple-talk/dotnet/net-framework/the-dangers-of-the-large-object-heap/

So we should chunk this thing I guess. Ugh how artificial. Fighting the memory manager is a PITA.
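The "about 20,000 entries" figure in the quote fits 4-byte elements; for an array of structs the count depends on the struct size. Here is a rough sketch of that arithmetic. `FakeRange` and `FakeSymbolUse` are hypothetical stand-ins (three references plus a 16-byte range-like struct), not the real compiler types, and the estimate ignores the small fixed array header.

```fsharp
// Hypothetical stand-ins for the real types: a range-like struct and a
// symbol-use-like struct holding three references plus that range.
[<Struct>]
type FakeRange = { StartPos: int64; EndPos: int64 }

[<Struct>]
type FakeSymbolUse =
    { Item: obj
      Occurrence: obj
      Env: obj
      Range: FakeRange }

// Default LOH threshold on the CLR, in bytes.
let lohThresholdBytes = 85000

// Approximate number of elements an array can hold before reaching the
// threshold. Ignores the array header, so treat the result as an estimate.
let maxElementsBelowLoh (elementSizeBytes: int) =
    lohThresholdBytes / elementSizeBytes

let symbolUseSize = sizeof<FakeSymbolUse>

printfn "4-byte elements: about %d" (maxElementsBelowLoh 4)
printfn "%d-byte elements: about %d" symbolUseSize (maxElementsBelowLoh symbolUseSize)
```

The larger the struct, the fewer elements fit below the threshold, which is why an array of a small struct may stay off the LOH where an array of a larger one with the same element count does not.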

dsyme · 2019-01-12T18:12:38Z

If someone wants to look at this, it should be fairly easy to chunk the array allocated here based on sizeof<TcSymbolUseData> and a heuristic about LOH objects; it's pretty well encapsulated. The member AllUsesOfSymbols is only ever iterated, so you simply need to change consumers to iterate an array-of-arrays.

type TcSymbolUses(g, capturedNameResolutions : ResizeArray<CapturedNameResolution>, formatSpecifierLocations: (range * int)[]) = 
    
    // Make sure we only capture the information we really need to report symbol uses
    let allUsesOfSymbols = [| for cnr in capturedNameResolutions -> { Item=cnr.Item; ItemOccurence=cnr.ItemOccurence; DisplayEnv=cnr.DisplayEnv; Range=cnr.Range } |]
    let capturedNameResolutions = () 
    do ignore capturedNameResolutions // don't capture this!

    member this.GetUsesOfSymbol(item) = 
        [| for symbolUse in allUsesOfSymbols do
               if protectAssemblyExploration false (fun () -> ItemsAreEffectivelyEqual g item symbolUse.Item) then
                  yield symbolUse |]

    member this.AllUsesOfSymbols = allUsesOfSymbols

    member this.GetFormatSpecifierLocationsAndArity() = formatSpecifierLocations
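The chunking suggested above can be sketched in isolation as follows. This is an illustration under stated assumptions, not the change that was eventually made: `chunkBelowLoh` and `chunkBudgetBytes` are hypothetical names, the 80,000-byte budget is an arbitrary margin below the 85,000-byte threshold, and `int64` stands in for the real element type.

```fsharp
// Byte budget per chunk: deliberately below the 85,000-byte LOH threshold to
// leave room for the array header. The exact margin is an assumption.
let chunkBudgetBytes = 80000

// Split a sequence into arrays that each stay within the byte budget.
// elementSizeBytes is supplied by the caller, e.g. sizeof<'T> for a struct.
let chunkBelowLoh (elementSizeBytes: int) (items: seq<'T>) : 'T[][] =
    let chunkSize = max 1 (chunkBudgetBytes / elementSizeBytes)
    items |> Seq.chunkBySize chunkSize |> Seq.toArray

let elementSize = sizeof<int64>
let data = seq { for i in 1L .. 25000L -> i }
let chunks = chunkBelowLoh elementSize data

// Consumers iterate the outer array, then each inner chunk.
let mutable total = 0L
for chunk in chunks do
    for value in chunk do
        total <- total + value

printfn "%d chunks, total = %d" chunks.Length total
```

Consumers should iterate the chunks directly; flattening them again with something like Array.concat would recreate the single large array the chunking was meant to avoid.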

dsyme · 2019-01-12T18:18:35Z

(I must add: it is great to see this LOH-analysis work beginning to home in on the actual data we need to save, rather than data that should never have been allocated/copied in the first place :) )

baronfel · 2019-01-12T18:22:45Z

I might be off-base here, but the TcSymbolUseData[] is created from the capturedNameResolutions parameter, which is itself backed by an array at least capturedNameResolutions.Count long, right? If so, the backing array for that list should be on the LOH as well, so we may need to go a step further back and also chunk the way those name resolutions are generated.

dsyme · 2019-01-12T18:25:06Z

@baronfel Yes, that's correct (except that CapturedNameResolution is not a large struct so there will be less on the LOH).

baronfel · 2019-01-12T18:27:24Z

Ah, I see the caveat there now. Because CapturedNameResolution contains less data we can have more array elements before we hit the 85,000 byte limit, so we may not hit the LOH for that structure at all. Thanks for clarifying.

cartermp · 2019-01-12T18:32:49Z

FYI there is excellent information in our docs about this: https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/large-object-heap

cartermp · 2019-01-13T04:04:15Z

> So it's not a priori wrong to allocate this information, even in one linear array (note that pre-indexing the information or attempting to compress it is a waste of time since it is very often discarded).

This feels like a bit of a design smell to me. If we need to allocate ~390k of data very often (in this case, every 3-4 seconds), then it implies something needs to be cached. I was only working with two files and typing in one, and only adding a single function. I don't see why that requires

cartermp · 2019-01-13T04:04:34Z

But relatively speaking, this is far less pressing than the other issues filed.

dsyme · 2019-01-14T10:55:07Z

> This feels like a bit of a design smell to me. If we need to allocate ~390k of data very often (in this case, every 3-4 seconds), then it implies something needs to be cached. I was only working with two files and typing in one, and only adding a single function. I don't see why that requires

Yes. The smell is really just a whiff of the much bigger stench of re-checking entire files very frequently even on small changes.

We could in theory pool the arrays, to avoid the reallocation. However that must be done with real care as the arrays contain references. And, putting aside the LOH, I'm certain that the costs of actually doing the re-checking are much higher than the cost of allocating and filling this specific data structure.
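For illustration, pooling could look something like the sketch below, which uses System.Buffers.ArrayPool. It assumes a runtime where System.Buffers is available (for example .NET Core); `Entry`, `fillFromPool` and `release` are hypothetical names, with `Entry` standing in for an element type that holds a reference. The care needed around references shows up in the `clearArray` argument.

```fsharp
open System.Buffers

// Hypothetical element type holding a reference, standing in for the real one.
[<Struct>]
type Entry = { Payload: obj; Index: int }

let pool = ArrayPool<Entry>.Shared

// Rent may hand back an array longer than requested, so callers must track
// the number of elements actually in use rather than rely on buffer.Length.
let fillFromPool (count: int) : Entry[] =
    let buffer = pool.Rent(count)
    for i in 0 .. count - 1 do
        buffer.[i] <- { Payload = box (string i); Index = i }
    buffer

let release (buffer: Entry[]) =
    // clearArray = true wipes the array on return, so the pool does not keep
    // the referenced objects alive after the caller is done with them.
    pool.Return(buffer, clearArray = true)

let used = 1000
let buffer = fillFromPool used
printfn "requested %d, rented array length %d" used buffer.Length
release buffer
```

Returning without clearing would leave stale references in the pooled array, which is exactly the kind of leak that makes pooling arrays of references risky.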

dsyme · 2019-01-14T10:55:32Z

> FYI there is excellent information in our docs about this: https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/large-object-heap

Thanks, yes, that's great up-to-date info

* chunkify TcSymbolUseData * move LOH size out to a constant * do chunking and mapping together to reduce allocations * clarify comment around GC impacts * add comment informing others of the potential for LOH allocations

cartermp added Tenet-Performance Area-LangService-API labels Jan 12, 2019

cartermp added this to the 16.0 milestone Jan 12, 2019

dsyme changed the title ~~TcSymbolUse[] accounting for 19.1 MB on the LOH~~ TcSymbolUseData[] accounting for 19.1 MB on the LOH Jan 12, 2019

baronfel mentioned this issue Jan 13, 2019

TcSymbolUseData cleanup per #6084 #6089

Merged

2 tasks

cartermp mentioned this issue Jan 14, 2019

VS 2019 GA tooling performance #6096

Closed

11 tasks

cartermp closed this as completed Jan 18, 2019

baronfel mentioned this issue Jan 18, 2019

Chunk large ResizeArrays in TcResultsSinkImpl so that they aren't promoted to the Large Object Heap #6127

Closed

auduchinok added a commit to auduchinok/fsharp that referenced this issue May 14, 2019

Revert "TcSymbolUseData cleanup per dotnet#6084 (dotnet#6089)"

8eef0dd

This reverts commit 7584974
