Garbage Collection in Spidermonkey

Thomas Barber edited this page Mar 18, 2024 · 3 revisions

Introduction

This is a collection of notes on Spidermonkey garbage collection that I've gathered during Foxhound development. Most of it is gleaned from blog posts and from staring at source code for hours, so it may not be accurate. Comments and corrections are welcome!

Garbage Collection Background

Spidermonkey uses generational garbage collection to manage the memory of JavaScript objects.

Nursery vs. Tenured

JavaScript objects can exist in two places: the nursery (for short-lived objects) and the tenured arena (for long-lived ones). Objects are typically created in the nursery (Nursery.cpp) and are moved into the tenured arena if they survive long enough. The idea is to have a smaller area which is garbage collected frequently and cheaply, in order to optimize performance.

During nursery garbage collection, JavaScript objects are traced (Tracing.cpp), starting from rooted objects which are accessible from the current context. Any accessible objects are then tenured, i.e. moved to the long-lived arena (Tenuring.cpp). The nursery is then swept, after which all previously allocated nursery memory can be overwritten. Note that the swept nursery can also be poisoned so that any access to stale memory causes a crash rather than silently reading reused memory, but this is only activated in debug mode.

Note: this means that Nursery objects do not have destructors or finalize methods called during this process. The assumption is that all memory is allocated in the Nursery itself, so any related objects will be freed as part of the GC.
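To make the trace, tenure and sweep steps a bit more concrete, here is a minimal toy model in C++. It is only a sketch: Cell, ToyHeap and all of their methods are made-up names for illustration and do not correspond to anything in the Spidermonkey sources. It also ignores references from tenured cells back into the nursery, which a real collector has to track.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Toy model only: Cell and ToyHeap are hypothetical, not Spidermonkey types.
struct Cell {
  int value = 0;
  bool inNursery = true;
  std::vector<Cell*> edges;  // references to other cells
};

class ToyHeap {
 public:
  // New cells always start out in the nursery.
  Cell* allocate(int value) {
    nursery_.push_back(std::make_unique<Cell>());
    nursery_.back()->value = value;
    return nursery_.back().get();
  }

  // A root is a pointer location the collector starts tracing from.
  void addRoot(Cell** root) {
    assert(root != nullptr);
    roots_.push_back(root);
  }

  // Minor GC: copy everything reachable from the roots into tenured
  // storage, update the pointers, then throw the whole nursery away.
  void minorGC() {
    std::unordered_map<Cell*, Cell*> forwarded;  // old address -> new address
    for (Cell** root : roots_) {
      *root = tenure(*root, forwarded);
    }
    // Sweep: every cell that was not reached is dropped wholesale.
    // (A real nursery just resets its memory and runs no finalizers.)
    nursery_.clear();
  }

  std::size_t nurserySize() const { return nursery_.size(); }
  std::size_t tenuredSize() const { return tenured_.size(); }

 private:
  Cell* tenure(Cell* cell, std::unordered_map<Cell*, Cell*>& forwarded) {
    if (!cell || !cell->inNursery) return cell;  // nothing to move
    auto found = forwarded.find(cell);
    if (found != forwarded.end()) return found->second;  // already moved

    tenured_.push_back(std::make_unique<Cell>(*cell));
    Cell* copy = tenured_.back().get();
    copy->inNursery = false;
    forwarded.emplace(cell, copy);  // record before recursing (handles cycles)
    for (Cell*& edge : copy->edges) {
      edge = tenure(edge, forwarded);  // trace outgoing references
    }
    return copy;
  }

  std::vector<std::unique_ptr<Cell>> nursery_;
  std::vector<std::unique_ptr<Cell>> tenured_;
  std::vector<Cell**> roots_;
};
```

The important point is that only pointers the collector knows about (the roots and anything reachable from them) are updated. Any other raw pointer into the nursery is stale after minorGC().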

String GC

Strings have some special rules applied to them. As Strings can be dependent (i.e. a substring pointing to another string) or ropes (i.e. pointers to left and right strings), some additional work is needed to trace them. In addition, care needs to be taken when moving a string whose dependents still exist in the nursery.

During the tenuring process, Strings are also sometimes atomized (i.e. replaced with a pointer to a static string) or de-duplicated (i.e. replaced with a pointer to an existing string).

In Foxhound, this has been explicitly disabled for tainted strings to avoid losing the taint information.
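As a rough illustration of why sharing is a problem for tainted strings, the following toy sketch de-duplicates untainted strings while tenuring but always gives a tainted string its own copy. ToyString and ToyTenuredHeap are hypothetical names, and the real conditions Spidermonkey uses to decide on de-duplication are more involved than a simple character comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Toy model only: these types are hypothetical, not Spidermonkey types.
struct ToyString {
  std::string chars;
  bool tainted = false;  // stands in for "has a non-empty StringTaint"
};

class ToyTenuredHeap {
 public:
  // Returns the tenured string that replaces `nurseryStr`.
  const ToyString* tenure(const ToyString& nurseryStr) {
    if (!nurseryStr.tainted) {
      // Untainted: an existing string with the same characters is an
      // equivalent replacement, so reuse it.
      auto it = untaintedByChars_.find(nurseryStr.chars);
      if (it != untaintedByChars_.end()) return it->second;
    }
    // Tainted strings always get their own copy. Two strings with equal
    // characters may carry different taint, so sharing would lose it.
    storage_.push_back(std::make_unique<ToyString>(nurseryStr));
    const ToyString* copy = storage_.back().get();
    if (!copy->tainted) untaintedByChars_.emplace(copy->chars, copy);
    return copy;
  }

  std::size_t size() const { return storage_.size(); }

 private:
  std::vector<std::unique_ptr<ToyString>> storage_;
  std::unordered_map<std::string, const ToyString*> untaintedByChars_;
};
```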

External Links

Some links I found helpful:

StringTaint Collection

StringTaint objects (Taint.h) contain a pointer to a vector of TaintRanges. This pointer is created by a plain old new and destroyed with a delete via the clear() method. The clear() method calls have been added to the finalize() StringType methods, ensuring that tenured Strings will also free the TaintRange pointers.

However, the finalize methods are not called during nursery sweeping, so the TaintRange vectors of dead nursery Strings would leak. To get around this, Foxhound keeps a separate list of all Strings created in the nursery. During sweeping, the list is iterated, and the clear() method is called for every String that was not moved to the tenured area. Getting this right is important, as any dangling pointers left over can cause hard-to-debug crashes.
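The bookkeeping can be sketched with a small toy model. Everything here (ToyTaint, ToyNurseryString, ToyNursery, the liveRangeVectors counter) is invented for illustration and is not the actual Foxhound implementation. The counter only exists so that leaks become visible.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

int liveRangeVectors = 0;  // heap-allocated range vectors still alive

// Toy stand-in for StringTaint: owns a raw pointer, freed only by clear().
struct ToyTaint {
  std::vector<int>* ranges = nullptr;

  void mark(int begin, int end) {
    if (!ranges) {
      ranges = new std::vector<int>();
      ++liveRangeVectors;
    }
    ranges->push_back(begin);
    ranges->push_back(end);
  }

  void clear() {
    if (ranges) {
      delete ranges;
      ranges = nullptr;
      --liveRangeVectors;
    }
  }
};

// Deliberately has no destructor, mirroring the missing finalizer.
struct ToyNurseryString {
  ToyTaint taint;
  bool tenured = false;
};

class ToyNursery {
 public:
  // Every string allocated in the nursery is also recorded in a list.
  ToyNurseryString* allocate() {
    tracked_.push_back(std::make_unique<ToyNurseryString>());
    return tracked_.back().get();
  }

  // Stand-in for tenuring: the taint pointer is handed over to the
  // tenured copy, which frees it later in its own finalize step.
  ToyTaint tenure(ToyNurseryString* str) {
    str->tenured = true;
    return str->taint;  // shallow copy, ownership moves with it
  }

  // Sweep: free the taint of every string that did not survive.
  void sweep() {
    for (auto& str : tracked_) {
      if (!str->tenured) str->taint.clear();
    }
    tracked_.clear();
  }

 private:
  std::vector<std::unique_ptr<ToyNurseryString>> tracked_;
};
```

Without the loop in sweep(), the range vector of every dead nursery string would stay allocated forever.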

Gotchas

There are a few things to look out for when dealing with JSString object creation.

For example, consider the creation of a new String object with a buffer containing the string characters. On creation of the string, Spidermonkey will check whether there is enough memory available in the nursery. If the nursery is full and there is not sufficient space, a GC can be triggered, moving live objects to the tenured arena and sweeping all others. If GC has been disabled, or insufficient space is freed, the string is created directly in the tenured arena.

This means that any non-rooted Strings (including Strings created earlier in the same function) are in danger of being freed, and are therefore no longer valid, after a new String has been created. This happened recently in https://github.com/SAP/project-foxhound/blob/b4da37289b6daae0713f419eb796320ae1b077f3/js/src/vm/JSONParser.cpp#L737

In more detail, here is the (broken) code snippet with some comments to explain what is happening:

template <typename CharT>
template <JSONStringType ST, typename ParserT>
inline bool JSONFullParseHandler<CharT>::setStringValue(
    StringBuilder& builder, const ParserT* parser) {
  JSString* str;
  if constexpr (ST == JSONStringType::PropertyName) {
    str = builder.buffer.finishAtom();
  } else {
    // Here we create a new String, which can be in the nursery (depending on gcHeap).
    str = builder.buffer.finishString(gcHeap);
  }
  if (!str) {
    return false;
  }

  // TaintFox: Add taint operation.
  if (str->taint().hasTaint()) {
    // CurrentJsonPath also creates a string from the JSON parsing context.
    // This can also be in the nursery, and if the nursery is full, can lead to a GC
    // In the case of a GC, str will be swept as it is not rooted!
    JSString* jsonPath = parser ? parser->CurrentJsonPath() : nullptr;
    TaintOperation op = jsonPath ?
      TaintOperationFromContextJSString(cx, "JSON.parse", true, jsonPath) :
      TaintOperationFromContext(cx, "JSON.parse", true);
    // Now str will no longer be valid if a GC occurred.
    str->taint().extend(op);
  }

  v = JS::StringValue(str);
  return true;
}

In order to counter this:

  • Create rooted strings (using the RootedString object or Rooted<JSString*>). Rooting will prevent the GC sweeping these objects.
  • Avoid creating new strings while an unrooted JSString is still in use (as was the case above).
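To show why rooting helps, here is a toy model of a rooted handle. It is a sketch under heavy simplification: ToyStr, ToyContext and ToyRooted are hypothetical names that only mimic the idea behind Rooted<JSString*>, and the toy has no forwarding pointers, so two roots to the same string would be copied twice.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model only: these are NOT Spidermonkey types.
struct ToyStr {
  std::string chars;
  bool tenured = false;
};

class ToyContext {
 public:
  explicit ToyContext(std::size_t nurseryCapacity)
      : capacity_(nurseryCapacity) {}

  // Like creating a JSString: a full nursery triggers a minor GC first.
  ToyStr* newString(const std::string& chars) {
    if (nursery_.size() >= capacity_) minorGC();
    auto str = std::make_unique<ToyStr>();
    str->chars = chars;
    nursery_.push_back(std::move(str));
    return nursery_.back().get();
  }

  std::size_t gcCount() const { return gcCount_; }

 private:
  friend class ToyRooted;

  void minorGC() {
    for (ToyStr** root : roots_) {
      if (!*root || (*root)->tenured) continue;
      auto copy = std::make_unique<ToyStr>(**root);  // move the survivor
      copy->tenured = true;
      *root = copy.get();  // the rooted handle now sees the new address
      tenured_.push_back(std::move(copy));
    }
    nursery_.clear();  // every unrooted nursery string is gone
    ++gcCount_;
  }

  std::size_t capacity_;
  std::size_t gcCount_ = 0;
  std::vector<std::unique_ptr<ToyStr>> nursery_;
  std::vector<std::unique_ptr<ToyStr>> tenured_;
  std::vector<ToyStr**> roots_;
};

// RAII handle: registers its pointer as a root for its whole lifetime.
class ToyRooted {
 public:
  ToyRooted(ToyContext& cx, ToyStr* initial) : cx_(cx), ptr_(initial) {
    cx_.roots_.push_back(&ptr_);
  }
  ~ToyRooted() { cx_.roots_.pop_back(); }  // roots are released in LIFO order
  ToyRooted(const ToyRooted&) = delete;
  ToyRooted& operator=(const ToyRooted&) = delete;

  ToyStr* get() const { return ptr_; }

 private:
  ToyContext& cx_;
  ToyStr* ptr_;
};
```

With a nursery capacity of one, creating a second string forces a minor GC. A string held in a ToyRooted is moved and stays readable through get(), whereas a raw ToyStr* taken before the second allocation would dangle. In the JSON parser snippet above, the equivalent change would be to hold str in a Rooted<JSString*> before calling CurrentJsonPath().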