
Fix yet another memory leak caused by caching of coercion data #12313

Closed
simon-king-jena opened this issue Jan 15, 2012 · 386 comments

Comments

@simon-king-jena
Member

The following could not be fixed in #715:

sage: K = GF(1<<55,'t')
sage: for i in range(50):
....:     a = K.random_element()
....:     E = EllipticCurve(j=a)
....:     b = K.has_coerce_map_from(E)
....:     
sage: import gc
sage: gc.collect()
0

The problem is that every parent has a dictionary that stores all coerce maps ending at this parent (and a separate dictionary for conversion maps). The keys of these dictionaries are the domains of the maps. So, in the example above, the field K has an attribute that is a dictionary whose keys are the different elliptic curves.
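To illustrate the mechanism (this is not Sage's actual code; the class and attribute names below are made up), a per-parent cache whose keys are the domains keeps every domain alive:

import gc

class Domain(object):
    pass

class CachingParent(object):
    def __init__(self):
        self._coerce_cache = {}        # made-up stand-in for the real per-parent cache
    def has_coerce_map_from(self, D):
        self._coerce_cache[D] = None   # the key D is now strongly referenced
        return False

K = CachingParent()
for _ in range(50):
    D = Domain()
    K.has_coerce_map_from(D)
del D
gc.collect()
# All 50 domains are still reachable through K._coerce_cache:
print(len([x for x in gc.get_objects() if isinstance(x, Domain)]))   # prints 50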

In coercion, it is usually best to compare parents not by equality but by identity. Therefore, I suggest implementing a container called MonoDict that works similarly to the new weak TripleDict (see #715), but takes a single item as a key.

First tests show that one can actually gain a lot of speed: MonoDict can access its items much faster than a usual dictionary.
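A minimal pure-Python model of such a container (the real MonoDict of the patch is written in Cython and optimised for speed; all names here are illustrative, and keys are assumed to support weak references):

import weakref

class MonoDictSketch(object):
    # Mapping with a single key that is compared by identity and only weakly referenced.
    def __init__(self):
        self._data = {}                # id(key) -> (weak reference to key, value)
    def __setitem__(self, key, value):
        # The callback removes the entry once the key is garbage collected,
        # so the dictionary itself never keeps a key alive.
        ref = weakref.ref(key, lambda r, k=id(key): self._data.pop(k, None))
        self._data[id(key)] = (ref, value)
    def __getitem__(self, key):
        ref, value = self._data[id(key)]
        if ref() is not key:           # the id() has been reused by a new object
            raise KeyError(key)
        return value
    def __contains__(self, key):
        try:
            self[key]
            return True
        except KeyError:
            return False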

Apply

Depends on #715
Depends on #11521
Depends on #12215
Depends on #13746
Depends on #13378
Depends on #13387

Dependencies: #715, #11521, #12215, #13746, #13378; merge with #13387

CC: @jpflori @vbraun @robertwb @alexanderdreyer @sagetrac-PolyBoRi @nthiery

Component: memleak

Keywords: coercion weak dictionary

Author: Simon King, Jean-Pierre Flori

Reviewer: Simon King, Jean-Pierre Flori, John Perry, Nils Bruin

Merged: sage-5.8.beta2

Issue created by migration from https://trac.sagemath.org/ticket/12313

@simon-king-jena
Member Author

comment:1

Memory leaks are quite tricky!!

The post here is mainly a memo for myself, documenting what went wrong in the last couple of days.

I tried to directly modify my code from #715. The approach was: TripleDict or MonoDict should store the data in lists, with some list items being the addresses of the keys; and, in a separate dictionary, a weakref.KeyedRef with a callback function is kept.

And what happened when I replaced Parent._coerce_from_hash by a MonoDict? A memory leak was created by using weak references! Namely,

  • There is a strong reference to the weakref.KeyedRef.
  • The weakref.KeyedRef has a strong reference to the callback function.
  • The callback function is an instance of something that I'll call MonoDictEraser (similar to the TripleDictEraser from #715).
  • The eraser has a strong reference to the MonoDict.
  • The MonoDict has a strong reference to the coerce map.
  • The coerce map has a strong reference to its domain and codomain.
  • Hence, domain and codomain can't be collected.

I think the best place to break the reference chain is from the Eraser to the MonoDict. Perhaps one could do the same thing at #715, rather than having a weak reference to the underlying set of an action?
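In pure Python, the proposed fix can be sketched as follows (the actual code is in Cython, and the class and attribute names here are illustrative): the eraser keeps only a weak reference to the dictionary, so a dying key still triggers removal of its entry, but the callback no longer keeps the dictionary -- and hence the cached map with its domain and codomain -- alive.

import weakref

class MonoDictEraser(object):
    # Callback object holding only a *weak* reference to the dictionary,
    # which breaks the chain KeyedRef -> eraser -> dictionary -> map -> domain.
    def __init__(self, D):
        self._D = weakref.ref(D)
    def __call__(self, r):
        # r is the dead weakref.KeyedRef; its .key was chosen to be id(key)
        D = self._D()
        if D is not None:
            D._data.pop(r.key, None)

class WeakKeyedMonoDict(object):
    def __init__(self):
        self._data = {}
        self._eraser = MonoDictEraser(self)
    def __setitem__(self, key, value):
        self._data[id(key)] = (weakref.KeyedRef(key, self._eraser, id(key)), value)
    def __getitem__(self, key):
        ref, value = self._data[id(key)]
        if ref() is not key:
            raise KeyError(key)
        return value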

@simon-king-jena
Member Author

comment:2

Replying to @simon-king-jena:

Perhaps one could do the same thing at #715, rather than having a weak reference to the underlying set of an action?

Nope! There, the situation was different, since there is a strong reference to the TripleDict anyway (namely in the coercion model), so that a weak reference from the eraser would not suffice.


@simon-king-jena
Member Author

comment:4

With the attached patch (to be applied after #715), a new kind of weak dictionary is introduced, and used to cache coercions and conversions. All tests pass for me.

Concerning raw speed: Without the patch, we have

sage: RDF.coerce_map_from(ZZ)
Native morphism:
  From: Integer Ring
  To:   Real Double Field
sage: timeit("RDF.coerce_map_from(ZZ)",number=10^5)
100000 loops, best of 3: 518 ns per loop

With the patch, it is 466 ns; hence one gains about 10% in a very frequent operation.

The total speed seems to be fine as well: make ptest has not been slower than before.

Concerning memory leaks:

sage: import gc
sage: _ = gc.collect()
sage: K = GF(1<<55,'t')
sage: for i in range(50):
...     a = K.random_element()
...     E = EllipticCurve(j=a)
...     b = K.has_coerce_map_from(E)
sage: _ = gc.collect()
sage: len([x for x in gc.get_objects() if isinstance(x,type(E))])
1

That is a new doctest; it tests against a memory leak that was not fixed by #715.

@simon-king-jena
Member Author

Author: Simon King

@simon-king-jena
Member Author

comment:5

I have extended my patch by another coercion-related application of MonoDict.

I found that sage.categories.groupoid.Groupoid often results in a memory leak. For example, by #715, polynomial rings are only weakly cached. But if the groupoid of a polynomial ring is created, then a strong reference to the polynomial ring is created, preventing it from being garbage collected. And a groupoid is created as soon as there is an action of that polynomial ring on another ring. Hence, a memory leak.

The new patch version overrides the __classcall__ method of sage.categories.groupoid.Groupoid (which would normally be inherited from UniqueRepresentation) with a method that uses a MonoDict for caching. In addition, there is now only a weak reference to the underlying set of a groupoid. Consequently, if there is no other strong reference, Groupoid(P) can be garbage collected, while a strong reference to P keeps Groupoid(P) alive.
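Roughly, the caching described above amounts to the following sketch (simplified; the MonoDict import path and constructor argument are assumptions, and the real patch differs in details):

import weakref
from sage.categories.category import Category
from sage.structure.coerce_dict import MonoDict   # introduced by this ticket

_groupoid_cache = MonoDict(23)   # weakly keyed by the underlying set; 23 is just a bucket count

class Groupoid(Category):
    @staticmethod
    def __classcall__(cls, S):
        try:
            return _groupoid_cache[S]
        except KeyError:
            instance = type.__call__(cls, S)
            _groupoid_cache[S] = instance
            return instance

    def __init__(self, S):
        # Only a weak reference to the underlying set, so that the groupoid
        # does not prevent S from being garbage collected.  (The real code
        # would also call Category.__init__.)
        self._set_ref = weakref.ref(S)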

Still needs review.

@jpflori

jpflori commented Jan 24, 2012

comment:6

Could you add the second problem fixed to the ticket description?

Maybe this should be split into two tickets to be more readable, or at least into two patches.

The title is a little vague as well :)

Sorry for the delay on this and #715 and #11521, but this is now my top priority (for Sage).

@jpflori

jpflori commented Jan 24, 2012

comment:7

And what if a user directly uses a Groupoid object?

Will the underlying set be garbage collected, as in #715 for Actions?

Another reason to split this ticket into two (once again :) )

@jpflori

jpflori commented Jan 24, 2012

comment:8

To finish my thoughts for now: we could just as well leave the tickets as they are (except for some more documentation and easy fixes) and potentially implement the other behavior (where a choice between strong and weak references is possible) in a later ticket.

The current behavior of raising an error (provided it comes with a description of how to avoid the garbage collection, in addition to reporting that the object was garbage collected) seems viable enough to me (unless many people actually use Groupoids and Actions directly).

@simon-king-jena
Member Author

comment:9

Perhaps you are right, and we could re-organise stuff. Namely, for now (that's to say: for #715 and for #12313) we could keep strong references from an action to the underlying set, and from a groupoid to the underlying set. The unfortunate consequence would be that not all memory leaks would be fixed. But one could create a new ticket where the idea of having weak references to the underlying set is tested.

And here is yet another idea, at least for the groupoid case. Let S be any object and G its groupoid. Ideally, S should keep G alive and G should keep S alive. Hence, it would be fine to have a strong reference from S to G and from G to S -- Python's garbage collection can deal with circular references.

However, there should be no strong references to S or G unless the user explicitly provides them. In particular, when creating Groupoid(S), no external cache should be used.

Potential solution: the cache for Groupoid.__classcall__ should live on S itself! Perhaps like this:

class Groupoid(Category):
    @staticmethod
    def __classcall__(cls, S):
        try:
            return getattr(S, '_cached_'+cls.__name__)
        except AttributeError:
            pass
        instance = type.__call__(cls, S)
        assert isinstance(instance, cls)
        if instance.__class__.__reduce__ == UniqueRepresentation.__reduce__:
            instance._reduction = (cls, (S,), {})
        try:
            setattr(S, '_cached_'+cls.__name__, instance)
            return instance
        except AttributeError:
            pass
        try:
            # Parents have a __cached_methods attribute, by #11115
            S.__cached_methods['_cached_'+cls.__name__] = instance
            return instance
        except AttributeError:
            pass
        raise TypeError("%s instances must either allow attribute assignment or be instances of %s" % (cls, Parent))

Note that the problem can hardly be solved in the same way for actions. They need to be stored externally, namely in the coercion model.

@simon-king-jena
Member Author

Work Issues: Split ticket

@simon-king-jena
Member Author

comment:10

I think it is best to follow your suggestion and split stuff.

Soon (perhaps tomorrow) I'll return to the old patch, i.e. without touching the groupoid stuff. In a to-be-created ticket, I'll try the idea for caching groupoids that avoids memory leaks but uses strong references, as explained in my previous post.

However, I think the weak references to the underlying sets of actions are an integral part of #715 - without them, the memory leak would not be fixed at all; so I'd not split #715.

@simon-king-jena
Member Author

comment:11

Python's name mangling sucks.

Name mangling occurs when an attribute is requested whose name starts with two underscores and ends with at most one underscore. Thus, it happens when S.__cached_methods is requested in the code above.

However, for a reason that I don't understand, Python does not use the name of the class of S for mangling, but the name of cls -- for example, if S==ZZ then Python is looking for S._Groupoid__cached_methods, not S._<name of class of S>__cached_methods. In addition, since __cached_methods is a cpdef attribute, it should not be subject to name mangling at all.
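In fact, name mangling is purely lexical: at compile time, any identifier of the form __name inside a class body is rewritten using the name of the enclosing class (here Groupoid, which happens to be cls), regardless of the type of the object whose attribute is accessed. A minimal pure-Python demonstration (class names are made up):

class SomeParent(object):
    def __init__(self):
        self.__cached_methods = {}     # mangled to _SomeParent__cached_methods

class Groupoid(object):
    @staticmethod
    def probe(S):
        # Written inside the body of Groupoid, this access is compiled to
        # S._Groupoid__cached_methods, not S._SomeParent__cached_methods.
        return S.__cached_methods

S = SomeParent()
try:
    Groupoid.probe(S)
except AttributeError as e:
    print(e)   # ... no attribute '_Groupoid__cached_methods'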

I'll see if doing things in Cython helps.

@simon-king-jena
Member Author

Weak coerce map dictionary with weakly referenced unique keys and optimised access speed

@simon-king-jena
Member Author

Changed work issues from Split ticket to none

@simon-king-jena
Member Author

comment:12

Attachment: trac12313_mono_dict.patch.gz

I have updated my patch, so that groupoids are not touched any more. Needs review, again.

Groupoids are now dealt with at #12357 (needs review), and I think the solution provided there is much better than the improper use of weak references.

@simon-king-jena
Member Author

Work Issues: Find out why some attribute of a parent is deleted before the parent's elements are deleted

@simon-king-jena
Member Author

comment:13

I am not totally sure whether the following problem is caused by this patch, by #715, or by #11521. Anyway, with all three patches applied, I obtain:

sage: K.<z> = GF(4)
sage: P.<x> = K[]
sage: del P
sage: del x
sage: import gc
sage: gc.collect()
Exception AttributeError: 'PolynomialRing_field_with_category' object has no attribute '_modulus' in  ignored
148

As you can observe, an attribute error is ignored, but worse: Cython is unable to tell where it was ignored.

The cython-users list helped me track the problem down using gdb.

The attribute error occurs in the following cdef method defined in sage/rings/polynomial/polynomial_zz_pex.pyx:

cdef cparent get_cparent(parent) except? NULL:
    if parent is None:
        return NULL
    cdef ntl_ZZ_pEContext_class c
    c = parent._modulus
    return &(c.x)

Because get_cparent is declared with except? NULL, the error is propagated. However, get_cparent is in fact called in a deallocation method, namely in sage/rings/polynomial/polynomial_template.pxi:

    def __dealloc__(self):
        """
        EXAMPLE::

            sage: P.<x> = GF(2)[]
            sage: del x
        """
        celement_destruct(&self.x, get_cparent((<Polynomial_template>self)._parent))

That is interesting for two reasons: (1) Cython cannot tell the location of the error, which I think is a bug in Cython. (2) The attribute _modulus of self._parent is destroyed before self is destroyed.

I find the latter rather strange. As long as self is alive, self._parent cannot be deallocated; but if self._parent cannot be deallocated, why is self._parent._modulus deleted?

@simon-king-jena
Member Author

comment:327

OK, the patchbot has the problem that it thinks #13387 should be applied first. But in fact, one should first apply the single patch from here, and then #13387.

@simon-king-jena
Member Author

comment:328

Could it be that #12313 is not in sage-5.8.beta0? The version that I downloaded before the official announcement does contain it.

Anyway, let's try:

Apply trac_12313-mono_dict-combined-random-sk.patch trac_12313-revert_callback_from_11521.patch to sage-5.8.beta0

@nbruin
Contributor

nbruin commented Feb 24, 2013

comment:329

Why add that patch here when it belongs on #14159? This ticket already has positive review. It's bad form to start changing the code unless absolutely necessary, and that new patch has nothing to do with this ticket. I've uploaded your patch there (and changed the commit message to reflect the appropriate ticket number). I hope that's OK with you. If so, you can just remove the ticket here from the list and move the ticket back to "Positive". We really want to get this merged, especially #13387 (which depends on this ticket), because it fixes some issues with TripleDict that could already bite its uses in 5.6 and 5.7.


@simon-king-jena
Member Author

comment:330

Replying to @nbruin:

Why add that patch here when it belongs on #14159? This ticket already has positive review. It's bad form to start changing the code unless absolutely necessary.

I thought it was absolutely necessary, and since it wasn't merged yet, I thought it was OK to do it here.

I've uploaded your patch there (and changed the commit message to reflect the appropriate ticket number). I hope that's OK with you.

If so, you can just remove the ticket here from the list

I can't remove the attachment, but I can remove it from the list of to-be-applied-patches.

and move the ticket back to "Positive".

OK.

Apply trac_12313-mono_dict-combined-random-sk.patch

@jdemeyer

comment:331

This ticket is not merged (at some point it was in some preliminary version of sage-5.8.beta0).

You can check this by

  1. Noting that this ticket has positive_review (not closed) with an empty "merged in" field.
  2. Checking the release notes (either txt or html).
  3. Checking the announcement on sage-release.


@jdemeyer

Merged: sage-5.8.beta2
