
Ideas for improving granularity of garbage collection (gc) #749

Open
ehildenb opened this issue Apr 27, 2023 · 5 comments

Comments

@ehildenb (Member)

We can currently only do garbage collection at the granularity of rewrite steps. It would be nice to be able to collect more frequently, to handle large function evaluations that hold very little live memory.

@theo25 (Collaborator) commented May 4, 2023

Some work has been done towards this goal in the new-gc-master-merge branch. Specifically, PR #426 contains the bulk of the work.

Here I will summarize the issues we have encountered so far.

  • In order to run a garbage collection cycle during the evaluation of a K function, we need to be able to unwind the call stack (e.g. using libunwind) so that we can collect the live pointers of each function in the call stack as garbage collection roots. There was a bug in LLVM that prevented the libunwind APIs from correctly unwinding the call stack in the presence of tail-call-optimized functions. We have submitted an upstream fix and are currently waiting for it to become part of an LLVM release.
  • The current implementation infers the K sort category (Map, Set, List, Symbol, etc.) of the object pointed to by a live pointer on the stack from the LLVM pointer type. This proved unreliable, and moreover pointer types have been removed from the IR as of LLVM 16 (opaque pointers). We therefore need a better way to link a live pointer on the stack with the sort category of the pointed-to object. We decided to use a different address space for each sort category, and to have the LLVM backend code generator attach these address spaces to the relevant instructions in the generated LLVM IR. However, there is an issue with that approach. Assume a pointer points to an object of sort category Symbol, and that object contains a child object of sort category Map. A pointer to the map child is then computed by a getelementptr instruction on the symbol pointer and passed as an argument to a function call. If that function is on the call stack when a GC run starts, the second pointer becomes a live pointer, and its address space should correspond to the sort category Symbol, because the root of the allocation was a symbol. Assume also that the same function is called elsewhere with a map pointer whose allocation root really is an object of sort category Map, so that pointer's address space should correspond to the sort category Map. The function could thus be passed pointers of two different address spaces for the same argument, which would require two different function signatures to avoid type-checking errors.
  • As part of the above solution, we use an LLVM pass, RewriteStatepointsForGC, that adds GC safepoints to the IR at all locations where it is safe for the runtime to perform a garbage collection run. These safepoints record all live pointers that may be relocated by the garbage collector. The pass is parameterized by a GCStrategy object, which helps it determine which pointers are valid targets for relocation by the GC; see the LLVM garbage collection documentation for more details. In our scenario, we would need a custom GCStrategy subclass indicating that all pointers in address spaces other than 0 are valid targets for GC relocation. LLVM allows dynamic loading of such a custom GCStrategy.

@ehildenb (Member, Author)

Can we make a benchmark definition to test this? One example would be a function that counts to 1000000 and then just returns true. The intermediate values would not currently be GC'ed, but they could be. We could then post a graph of the peak memory usage here: ideally it would remain basically flat, but if we're only collecting at rewrite steps, it will climb steadily and then drop.

@theo25 (Collaborator) commented May 18, 2023

Using the following benchmark (here is the corresponding KORE for reference: count.kore.txt)

module COUNT-SYNTAX
  imports BOOL
  imports INT
  syntax Bool ::= count(Int) [function]
endmodule

module COUNT
  imports COUNT-SYNTAX

  rule count(I) => count(I -Int 1) requires I =/=Int 0
  rule count(I) => true requires I ==Int 0
endmodule

If we call count on a large number,

count(1000000000)

we notice that memory usage increases over time until the process finishes (in the graph, the x-axis is seconds; the left y-axis is bytes and the right y-axis is the percentage of total memory used):

[Graph: memory usage of the count(1000000000) run over time]

This is because the current GC cannot run collections in the middle of a function evaluation, so the dead objects holding integer values from previous iterations linger in memory until the function terminates.

@Baltoli (Contributor) commented Dec 7, 2023

@theo25 to refresh his memory on what the status of this is.

@Baltoli (Contributor) commented Dec 19, 2023

Theo is working on rebasing the testing branch against the current version of the backend; it has drifted some way out of sync. Outstanding issues left to fix:

  • Inference of object type from pointers: in the current branch the static type is used (as in the pre-opaque-pointer world). The assumptions were incorrect even in that world. The prototype solution is to associate sort categories with address spaces; the issues with that approach are listed in full above.
  • Worth noting that the issues only appear on larger examples; the address-space version doesn't break on the integration tests.

Could we redesign the object representation to carry sort categories around?
