Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semi-circular references #1963

Closed
grgustaf opened this issue Aug 12, 2017 · 20 comments
Closed

Semi-circular references #1963

grgustaf opened this issue Aug 12, 2017 · 20 comments

Comments

@grgustaf
Copy link
Contributor

I ran into a problem today where an object wasn't getting freed. It turned out this was because its native pointer struct held acquired references to "listener" functions which in turn were bound to a scope object that held the original object as a property. I had to dig deeply into the GC today to learn about these things; before I had assumed that once refcounts went to 0, it would be freed. But there are essentially these "weak references" of 10 different types I see in gc_mark().

Anyway, I had been expecting when the reference count went to zero I would get my type_info's free_cb() called and that was where I was releasing these listener functions. It seems to me now that for this kind of case, there should perhaps be release_cb() in the type_info as well. I'm thinking when my refcount drops to zero I could probably release my own references like this. But maybe I'm wrong, it's a bit confusing.

I could use advice if there's another way to solve this. Previously we had implemented a "hidden property" to store an array of listeners; so these were held by "property reference" and disappeared. But this hidden property implementation was also something of a hack of JerryScript and a clever JS user could maybe have accessed it. So another good way to solve this might be for JrS to provide a way to define properties on objects that are hidden to JS and visible only from C APIs. I wonder if others have desired these same things?

@zherczeg
Copy link
Member

zherczeg commented Aug 12, 2017

What about storing an object in the structure pointed by the private property? I am not sure I understand the circular part.

@martijnthe
Copy link
Contributor

martijnthe commented Aug 14, 2017

Hmm, I think I follow, but just to summarize:

  1. You've a JS object "O" that is "backed" by a native struct.
  2. The object has a setter which is implemented with a native handler.
  3. The setter is used to set a function object (callback).
  4. The native handler acquires the value (function) that is being set (the refcount is +1'd) and stores the jerry_value_t in the native struct.
  5. The function that is being set (indirectly) references the JS object "O".
  6. At some point, there are no references any more from the JS code to "O" nor to the function. Yet, the native side does not become aware that JS no longer references them and the function is still "acquired", therefore the GC does not free the JS object, function and native pointer.

I've ran into this issue before too. I don't have a great answer unfortunately.
I think what's missing is an API to express that one JS object holds/releases a reference to another JS object and where C would hold a "weakly referenced" jerry_value_t to the second JS object. This weak reference would have to be cleared in the native pointer "free callback" of the first JS object, to avoid dangling references.

jerry_acquire_value () and jerry_release_value () express that "something in C land" acquires/releases a JS object (strong reference).

A "work-around" that we are using is to move the ownership to JS entirely. It's possible to make the internal variables inaccessible by using closures, for example:

function MyConstructor() {
  function AddMyListener() {
    var myListenerCb;
    this.setMyListener = function(listener) {
      myListenerCb = listener;
    }
    // native binding will call dispatchEventToMyListener(e) whenever it needs to,
    // but it does not call jerry_acquire_value () on anything.
    this.dispatchEventToMyListener = function(e) {
      myListenerCb(e);
    }
  }
  nativeConstructor.call(this);
  AddMyListener.call(this);
}

@jiangzidong
Copy link
Contributor

@martijnthe about your "weak reference", I think it is quite similar with @grgustaf 's "hidden property", right? Only native code can access the "properties", and GC will mark those property as "visited".

@martijnthe
Copy link
Contributor

Sure, I just tried to distill the problem to the essence.

Having "hidden properties" sounds like you would have to do property look-ups, which would be much more expensive compared to having a direct but weak jerry_value_t in the C struct that backs the JS object.

@zherczeg
Copy link
Member

Ok, so the problem is: there is a circular reference between objects, and this circular reference is caused by native handlers. Since native handlers can only create strong references, the engine keeps these objects alive.

Perhaps weak references could be created by not referencing these objects in the native side. As long as they are used, they will be kept alive. When destroyed, their native free callback can drop these weak references. When the object is temporarily used for something (e.g. setting its property), just acquire / release a strong reference to it.

The problem with hidden properties is that names can collide. E.g. if your O object has a private and public .x member, the engine will not be able to tell the difference.

@grgustaf
Copy link
Contributor Author

Yes, you've all understood the issue exactly.

@zherczeg if I'm understanding your last comment, I don't think it works. The object in question represents a network connection. So JS signs up for various callbacks, and then forgets about the object. The callbacks will still be called - the case I'm trying to handle is when the network connection is closed; I have no way to free things up. I'll experiment with manually releasing all the references at the time the connection is closed.

Perhaps weak references could be created by not referencing these objects in the native side.

I can't just not acquire those callback functions, because I'm the only one who holds a reference to them - they would simply be freed if I didn't acquire them.

@zherczeg
Copy link
Member

This does not sound difficult. The C struct of the N network connection has a reference counter, and each native callback increases the reference count. It also has a status member which tells whether the connection is closed. Callbacks do different things when the connection is closed. And you only free things up when the reference counter reaches zero. Basically you don't create a reference to a JS object, instead the private C struct has a reference counter.

@grgustaf
Copy link
Contributor Author

They're not native callbacks, they're JS callbacks - an anonymous function only passed to my setter. So there are no other references to it but one that I make in C. If I don't acquire it, it will be immediately freed the next time the GC runs.

It does appear I can solve my dilemma by releasing the callback(s) when the connection is closed.

But it seems like it would be nice for me if I were able to make a weak reference to the JS callback instead, so that once my connection object drops to zero, they would both get freed together. I guess a weak reference requires directionality "obj A is referred to by obj B", while strong references are global "obj A is referred to by someone". So it would probably be something like jerry_weak_acquire_value(obj_a, obj_b). And then in ecma_gc_mark for obj_a, obj_b would get marked.

Basically, this would give the API user the same kind of flexibility you have internally. But I guess it's just nice to have; I'm not sure if some situation truly requires it.

@zherczeg
Copy link
Member

I can imagine a jerryx extension for this purpose. These references could be stored in a shadow JS object, which is not accessible from JS side (only C side). A new private property type could refer to such shadow object. These objects are also marked by GC.

@jiangzidong
Copy link
Contributor

@zherczeg I dont think jerryx can handle this issue, because we need to change the

  1. ecma-object struct by adding a new field called weak_ref or hidden_property, which is a list of pointers to other ecma-object
  2. add api like jerry_weak_acquire/release(obj_a, name_b, obj_b) or jerry_set/get/remove_hidden_property(obj_a, name_b, obj_b) to add the target b into the above list. This operation will NOT cause any change of the ref count.
  3. in GC mark, when obj_a is marked, we should mark all objs in its weak_ref list.
    All above 3 things need to change the jerry-core

@grgustaf is this what you want?

@martijnthe
Copy link
Contributor

I'm not sure if name_b is really necessary.

Often you already have a C struct "backing" the JS object which has a jerry_value_t obj_b, for example in @grgustaf 's example I think it's the jerry_value_t connect_listener here: https://github.com/01org/zephyr.js/blob/master/src/zjs_net.c#L109

I guess you could remove it and only use get/set by string name (i.e. jerry_set/get/remove_hidden_property(obj_a, name_b, obj_b)) but it feels a bit wasteful to me (property lookups in general are of course much slower than accessing a field in a C struct and you need to create Jerry strings all the time / keep them around).

in GC mark, when obj_a is marked, we should mark all objs in its weak_ref list.
All above 3 things need to change the jerry-core

How about the other way around? If obj_b is marked, will obj_a also get marked?

jerry_weak_acquire/release: probably need a better name :) couple/decouple? associate/deassociate?

@jiangzidong
Copy link
Contributor

I guess you could remove it and only use get/set by string name

Yes, that is exactly what I thought. In this way, we provide a choice that users dont need to track the obj_b in native code. For the name string, we could allow it be undefined, if users decide to track obj_b in native. But that is just my preference.

How about the other way around? If obj_b is marked, will obj_a also get marked?

No. to GC, obj_b is obj_a's property.

@zherczeg
Copy link
Member

A GC change increase the binary size even for those, who don't want to use this feature. As usual, I would prefer a very simple API in core, and something user friendly in jerryx.

This would be my proposal:

core:
jerry_set_referenced_object(object_a, object_b); /* if object_b is undefined, the internal property is deleted */
jerry_get_referenced_object(object_a); /* undefined or object */

When object_a is marked, object_b (and its properties) are also marked. From implementation perspective this would be another private magic string.

jerryx:
You can create a nice interface with names, etc.

@jiangzidong
Copy link
Contributor

When object_a is marked, object_b (and its properties) are also marked.

How can it be implemented without changing the gc?

@zherczeg
Copy link
Member

zherczeg commented Aug 16, 2017

The gc will be changed. But the change will be small (the binary growth should be very minimal).

@grgustaf
Copy link
Contributor Author

Naming the objects holder and held or some similar word pair might make things clearer. (Other brainstorms I liked even less: user/used knower/known referrer/referent.)

As an alternative, I do still wonder if it might make sense to notify when refcount drops to zero, with a second callback in the type_info. If I was notified when refcount went to zero, I could free up any hard references I hold and break these circular dependencies that way. Which of these models is easier for the programmer to grasp? The callback could be called "release_references_cb" perhaps... the time for me to stop referring to any other JS objects on behalf of this one. Is this more or less overhead to implement?

@jiangzidong
Copy link
Contributor

@grgustaf I doubt the refcount zero callback can resolve the solution. For example, see #1963 (comment) case, the object "O"'s refconunt is always 0 after its being created.

@akosthekiss
Copy link
Member

Pinging this issue. The discussion seems to have died away. Is that because a workaround was found and used, or a PR has landed that solved the issue (which I'm not aware of)? So, is this still a problem?

@martijnthe
Copy link
Contributor

Hi @akosthekiss ! The workaround I used involved using the context manager API. In the deinit_cb we would release the references (for example, like in @grgustaf's use case, to listener callbacks). The deinit_cb gets called before jmem_finalize, thus solving the problem.

@akosthekiss
Copy link
Member

Hi @martijnthe ! Thanks for the feedback! I'm going to close this issue then as this workaround seems to be the solution. Let's have another issue opened if and when this is not enough, and another/better approach is needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants