Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iseq recycled while still in use #49

Closed
wks opened this issue Jan 24, 2024 · 4 comments · Fixed by #50
Closed

iseq recycled while still in use #49

wks opened this issue Jan 24, 2024 · 4 comments · Fixed by #50

Comments

@wks
Copy link
Collaborator

wks commented Jan 24, 2024

The Ruby binding tests failed in mmtk/mmtk-core#1073. The error doesn't seem to be related to that PR.

It is reproducible locally. The segmentation fault happens in ImmixSpace::mark_line because it attempted to access unmapped address. It is because the object size encoded in the hidden field added by the mmtk-ruby binding is overwritten to a very large number. This means it is a dangling reference bug.

This happens in the very early stage of execution. It can be reproduced by executing an empty script (or a "hello world" program of course). One way to reproduce it is:

MMTK_STRESS_FACTOR=1048576 MMTK_THREADS=1 ./miniruby --mmtk -e ""

It is easier to reproduce when there are fewer GC threads. 2 is the best. 1 is still able to reproduce it.

Assertion fails because a traced object is already dead. The current evidence shows it is an iseq created during builtin_iseq_load, but is dead (but still referenced) in rb_iseq_eval.

@wks
Copy link
Collaborator Author

wks commented Jan 25, 2024

It is related to how Ruby compiles AST nodes.

rb_iseq_compile_node(rb_iseq_t *iseq, const NODE *node)
{
    DECL_ANCHOR(ret); // Linked list, not heap object. `ret` is a LINK_ANCHOR
    INIT_ANCHOR(ret);

    // ...
        case ISEQ_TYPE_PLAIN:
            CHECK(COMPILE(ret, "ensure", node)); // `Calls iseq_compile_each`
    // ...

    // If GC is triggered before returning, things in `ret` will be collected.
    
    return iseq_setup(iseq, ret); // Move things from `ret` into `iseq`.
}

static int
iseq_compile_each(rb_iseq_t *iseq, LINK_ANCHOR *ret, const NODE *node, int popped)
{
    // ...
    return iseq_compile_each0(iseq, ret, node, popped);
}

static int
iseq_compile_each0(rb_iseq_t *iseq, LINK_ANCHOR *const ret, const NODE *const node, int popped)
{
    // ... many lines omitted
      case NODE_DEFS:{
        // ...

        // Create an iseq
        const rb_iseq_t * singleton_method_iseq = NEW_ISEQ(RNODE_DEFS(node)->nd_defn,
                                                           rb_id2str(mid),
                                                           ISEQ_TYPE_METHOD, line);

        // ...
        // Add it into the linked list `ret`.
        ADD_INSN2(ret, node, definesmethod, ID2SYM(mid), singleton_method_iseq);
        // The following write barrier behaves as if `singleton_method_iseq` were a field of `iseq`,
        // but the `ADD_INSN2` above actually writes `singleton_method_iseq` into a linked list.
        // `singleton_method_iseq` is not yet reachable from `iseq`.
        RB_OBJ_WRITTEN(iseq, Qundef, (VALUE)singleton_method_iseq);

        // ...
        break;
      }
    // ... many lines omitted
}

The function rb_iseq_compile_node creates a linked list with the head node on the stack in the variable ret. It then visits the node and compile it into instructions. During this time, instructions are added to the linked list ret. Some instruction instances have other iseq as children.

However, iseq is a heap object allocated in the MMTk heap. During compilation, other heap objects, such as arrays, will be created, too, and will trigger GC. If GC is triggered after an iseq is added to the local linked list, but before the call of iseq_setup(iseq, ret); in the end of rb_iseq_compile_node, the linked list will not be considered part of the root due to the way how conservative stack scanning works. Then the child iseq will be recycled while it will still be used later.

@wks
Copy link
Collaborator Author

wks commented Jan 25, 2024

This problem does not exist in vanilla CRuby. While rb_iseq_compile_node is working on an iseq, the iseq maintains a pointer to an "arena" where its INSN members are allocated.

rb_iseq_t *
rb_iseq_new_with_opt(...)
{
    // ...
    prepare_iseq_build(iseq, name, path, realpath, first_lineno, node ? &node->nd_loc : NULL, node ? nd_node_id(node) : -1,
                       parent, isolated_depth, type, script_lines, option);

    rb_iseq_compile_node(iseq, node);
    finish_iseq_build(iseq);
    // ...
}

static VALUE
prepare_iseq_build(rb_iseq_t *iseq, ...)
{
    // ...
    ISEQ_COMPILE_DATA(iseq)->insn.storage_head = ISEQ_COMPILE_DATA(iseq)->insn.storage_current = new_arena();
    // ...
}

The "arena" maintains contiguous regions allocated using xmalloc, allowing the user to do cheap bump-pointer allocation into it. When any INSN instance is allocated in rb_iseq_compile_node, it is allocated from ISEQ_COMPILE_DATA(iseq)->insn.storage_head.

If GC is triggered, iseq will be scanned. When scanning an iseq, it will mark and pin the operands of any INSN in the ISEQ_COMPILE_DATA(iseq)->insn.storage_head.

void
rb_iseq_mark_and_move(rb_iseq_t *iseq, bool reference_updating)
{
    // ...
    else if (FL_TEST_RAW((VALUE)iseq, ISEQ_USE_COMPILE_DATA)) {
        const struct iseq_compile_data *const compile_data = ISEQ_COMPILE_DATA(iseq);

        if (!reference_updating) {
            /* The operands in each instruction needs to be pinned because
             * if auto-compaction runs in iseq_set_sequence, then the objects
             * could exist on the generated_iseq buffer, which would not be
             * reference updated which can lead to T_MOVED (and subsequently
             * T_NONE) objects on the iseq. */
            rb_iseq_mark_and_pin_insn_storage(compile_data->insn.storage_head); // THIS!
        }

        rb_gc_mark_and_move((VALUE *)&compile_data->err_info);
        rb_gc_mark_and_move((VALUE *)&compile_data->catch_table_ary);
    }
    // ...
}

It turned out that all INSN instances have the same size (sizeof(INSN)). So it simply does linear scanning on the associated arena compile_data->insn.storage_head. By doing so, all operands, including child iseqs like the singleton_method_iseq in the previous post, are marked and pinned.

But this also means iseq is now a potential pinning parent (PPP) when it uses "compile data" (ISEQ_USE_COMPILE_DATA).

Unfortunately, mmtk-ruby doesn't handle this properly. Currently iseq is not recognized as a PPP. Actually, iseq wasn't a PPP, until two months ago when an upstream change (ruby/ruby@d169161) made it PPP again. When running mmtk-ruby, there is a chance that GC is triggered during rb_iseq_compile_node. Using Immix, if an object is reached from a mutable slot first, the object may be moved. This includes the operands of any INSN in the arena. If the same object is reached again from rb_iseq_mark_and_pin_insn_storage, it will not be able to pin the object. As a consequence, iseq now hold dangling pointers to moved objects, just like the comment after the line if (!reference_updating) { says.

The obvious solution to this problem is making iseq PPP again, but I want to minimize the number of PPPs. One optimization is removing the iseq from the set of PPPs once its ISEQ_USE_COMPILE_DATA flag is cleared.

An alternative solution is taking advantage of the last-in-first-out nature of the construction of iseq. We can maintain a per-thread list of live arenas, and they will be marked as non-moving roots. This should be cheaper than maintaining the set of PPPs which is backed by a HashSet.

But a more interesting question should be why does CRuby allocate INSN instances in the arena instead of the MMTk heap. The Immix allocator is very efficient.

@wks
Copy link
Collaborator Author

wks commented Jan 26, 2024

I tried to maintain a stack of iseq. I push an iseq in ISEQ_COMPILE_DATA_ALLOC, and pop an iseq in ISEQ_COMPILE_DATA_CLEAR. However, they don't always pair perfectly. The problem is, after one iseq is pushed, exceptions may be thrown. For example, a sub-expression may have syntax error and will raise the SyntaxError exception, like in the following test case in test_eval.rb:

     $stderr = STDOUT
     5000.times do
       begin
         eval "0 rescue break"
       rescue SyntaxError
       end
     end

If exception is thrown, dead iseqs will remain on the stack. I'll try to remove elements on the top when popping if the top elements don't match the expected iseq.

wks added a commit to wks/mmtk-ruby that referenced this issue Jan 29, 2024
A commit in the `ruby` repo now treats iseq as PPP again, and fixes a
random failure in CI.

Fixes: mmtk#49
@wks
Copy link
Collaborator Author

wks commented Jan 29, 2024

mmtk/ruby@68a6bbf is a one-line workaround to treat iseq as PPP again. But we need to look into the impact of treating iseq as PPP. iseq tends to be long-living. We may remove iseq instances from the set of PPPs once they lose their ISEQ_USE_COMPILE_DATA flags.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant