
Ideas for improving granularity of garbage collection (gc) #749

Open
ehildenb opened this issue Apr 27, 2023 · 5 comments

Comments

@ehildenb (Member)

We can currently only do garbage collection at the granularity of rewrite steps. It would be nice to be able to collect more frequently, to handle large function evaluations that hold very little live memory.

@theo25 (Collaborator) commented May 4, 2023

Some work has been done towards this goal in the new-gc-master-merge branch. Specifically, PR #426 contains the bulk of the work.

Here I will summarize the issues we have encountered so far.

  • In order to run a garbage collection cycle during the evaluation of a K function, we need to be able to unwind the call stack (e.g. using libunwind) so that we can collect the live pointers of each function in the call stack as garbage collection roots. There was a bug in LLVM that prevented the libunwind APIs from correctly unwinding the call stack in the presence of tail-call-optimized functions. We have submitted an upstream fix and are currently waiting for it to become part of an LLVM release.
  • The current implementation infers the K sort category (Map, Set, List, Symbol, etc.) of the object pointed to by a live pointer on the stack from the LLVM pointer type. This proved unreliable, and moreover pointer types have been removed from the IR as of LLVM 16 (opaque pointers). We therefore need a better way to link a live pointer on the stack with the sort category of the pointed-to object. We decided to use a different address space for each sort category, and to have the LLVM backend code generator attach these address spaces to the relevant instructions in the generated LLVM IR. However, there is an issue with that approach. Assume a pointer points to an object of sort category Symbol, and that object contains a child object of sort category Map. A pointer to the map child is then computed by a getelementptr instruction on the symbol pointer and passed as an argument to a function call. If that function is on the call stack when a GC run starts, the second pointer becomes a live pointer, and its address space should correspond to the sort category Symbol, because the root of the allocation was a symbol. Assume also that the same function is called elsewhere with a map pointer whose allocation root really is an object of sort category Map, so that pointer's address space should correspond to the sort category Map. The function could thus be passed pointers of two different address spaces for the same argument, which would require two different function signatures to avoid type-checking errors.
  • As part of the above solution, we use an LLVM pass, RewriteStatepointsForGC, that adds GC safepoints to the IR at all locations where it is safe for the runtime to perform a garbage collection run. These safepoints record all live pointers that may be relocated by the garbage collector. The pass is parameterized by a GCStrategy object, which helps it determine which pointers are valid targets for relocation by the GC; see the LLVM garbage collection documentation for more details. In our scenario, we would need a custom GCStrategy subclass indicating that all pointers in address spaces other than 0 are valid targets for GC relocation. LLVM allows dynamic loading of such a custom GCStrategy.

@ehildenb (Member, Author)

Can we make a benchmark definition to test this? One example would be a function that counts to 1000000 and then just returns true. The intermediate values would not currently be GC'ed, but they could be. We could then post a graph of the peak memory usage here: ideally it would remain basically flat, but if we're only collecting at rewrite steps, it will climb steadily and then drop.

@theo25 (Collaborator) commented May 18, 2023

Using the following benchmark (here is the corresponding KORE for reference: count.kore.txt)

module COUNT-SYNTAX
  imports BOOL
  imports INT
  syntax Bool ::= count(Int) [function]
endmodule

module COUNT
  imports COUNT-SYNTAX

  rule count(I) => count(I -Int 1) requires I =/=Int 0
  rule count(I) => true requires I ==Int 0
endmodule

If we call count on a large number,

count(1000000000)

we notice that memory usage increases over time until the process finishes (in the graph, the x-axis is seconds; the left y-axis is bytes and the right y-axis is the percentage of total memory used):

[Graph: memory usage of the count(1000000000) run over time]

This is because the current GC cannot run collections in the middle of a function evaluation, so the dead objects holding integer values from previous iterations linger in memory until the function terminates.

@Baltoli (Contributor) commented Dec 7, 2023

@theo25 to refresh his memory on what the status of this is.

@Baltoli (Contributor) commented Dec 19, 2023

Theo is working on rebasing the testing branch against the current version of the backend; it has drifted some way out of sync. Outstanding issues left to fix:

  • Inference of object type from pointers: in the current branch the static type is used (as in the pre-opaque-pointer world). The assumptions were incorrect even in that world. The prototype solution is to associate sort categories with address spaces; the issues with that approach are listed in full above.
  • Worth noting that the issues only appear on larger examples; the address-space version doesn't break on the integration tests.

Could we redesign the object representation to carry sort categories around?
