
Fix yet another memory leak caused by caching of coercion data #12313

Closed
simon-king-jena opened this issue Jan 15, 2012 · 386 comments

Comments

@simon-king-jena
Member

The following could not be fixed in #715:

sage: K = GF(1<<55,'t')
sage: for i in range(50):
....:     a = K.random_element()
....:     E = EllipticCurve(j=a)
....:     b = K.has_coerce_map_from(E)
....:     
sage: import gc
sage: gc.collect()
0

The problem is that every parent has a dictionary that stores all coerce maps ending at this parent (and a separate dictionary for conversion maps). The keys of these dictionaries are the domains of the maps. So, in the example above, the field K has an attribute that is a dictionary whose keys are the different elliptic curves.
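To illustrate the mechanism (this is not Sage's actual code; the class and attribute names below are made up), a per-parent cache whose keys are the domains keeps every domain alive:

import gc

class Domain(object):
    pass

class CachingParent(object):
    def __init__(self):
        self._coerce_cache = {}        # made-up stand-in for the real per-parent cache
    def has_coerce_map_from(self, D):
        self._coerce_cache[D] = None   # the key D is now strongly referenced
        return False

K = CachingParent()
for _ in range(50):
    D = Domain()
    K.has_coerce_map_from(D)
del D
gc.collect()
# All 50 domains are still reachable through K._coerce_cache:
print(len([x for x in gc.get_objects() if isinstance(x, Domain)]))   # prints 50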

In coercion, it is usually best to compare parents not by equality but by identity. Therefore, I suggest implementing a container called MonoDict that works similarly to the new weak TripleDict (see #715), but takes a single item as a key.

First tests show that one can actually gain a lot of speed: MonoDict can access its items much faster than a usual dictionary.
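A minimal pure-Python model of such a container (the real MonoDict of the patch is written in Cython and optimised for speed; all names here are illustrative, and keys are assumed to support weak references):

import weakref

class MonoDictSketch(object):
    # Mapping with a single key that is compared by identity and only weakly referenced.
    def __init__(self):
        self._data = {}                # id(key) -> (weak reference to key, value)
    def __setitem__(self, key, value):
        # The callback removes the entry once the key is garbage collected,
        # so the dictionary itself never keeps a key alive.
        ref = weakref.ref(key, lambda r, k=id(key): self._data.pop(k, None))
        self._data[id(key)] = (ref, value)
    def __getitem__(self, key):
        ref, value = self._data[id(key)]
        if ref() is not key:           # the id() has been reused by a new object
            raise KeyError(key)
        return value
    def __contains__(self, key):
        try:
            self[key]
            return True
        except KeyError:
            return False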

Apply

Depends on #715
Depends on #11521
Depends on #12215
Depends on #13746
Depends on #13378
Depends on #13387

Dependencies: #715, #11521, #12215, #13746, #13378; merge with #13387

CC: @jpflori @vbraun @robertwb @alexanderdreyer @sagetrac-PolyBoRi @nthiery

Component: memleak

Keywords: coercion weak dictionary

Author: Simon King, Jean-Pierre Flori

Reviewer: Simon King, Jean-Pierre Flori, John Perry, Nils Bruin

Merged: sage-5.8.beta2

Issue created by migration from https://trac.sagemath.org/ticket/12313

@simon-king-jena
Member Author

comment:1

Memory leaks are quite tricky!!

The post here is mainly a memo for myself, documenting what went wrong in the last couple of days.

I tried to directly modify my code from #715. The approach was: TripleDict or MonoDict should store the data in lists, with some list items being the addresses of the keys; and, in a separate dictionary, a weakref.KeyedRef with a callback function is kept.

And what happened when I replaced Parent._coerce_from_hash by a MonoDict? A memory leak was created by using weak references! Namely,

  • There is a strong reference to the weakref.KeyedRef.
  • The weakref.KeyedRef has a strong reference to the callback function.
  • The callback function is an instance of something that I'll call MonoDictEraser (similar to the TripleDictEraser from #715).
  • The eraser has a strong reference to the MonoDict.
  • The MonoDict has a strong reference to the coerce map.
  • The coerce map has a strong reference to its domain and codomain.
  • Hence, domain and codomain can't be collected.

I think the best place to break the reference chain is from the Eraser to the MonoDict. Perhaps one could do the same thing at #715, rather than having a weak reference to the underlying set of an action?
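In pure Python, the proposed fix can be sketched as follows (the actual code is in Cython, and the class and attribute names here are illustrative): the eraser keeps only a weak reference to the dictionary, so a dying key still triggers removal of its entry, but the callback no longer keeps the dictionary -- and hence the cached map with its domain and codomain -- alive.

import weakref

class MonoDictEraser(object):
    # Callback object holding only a *weak* reference to the dictionary,
    # which breaks the chain KeyedRef -> eraser -> dictionary -> map -> domain.
    def __init__(self, D):
        self._D = weakref.ref(D)
    def __call__(self, r):
        # r is the dead weakref.KeyedRef; its .key was chosen to be id(key)
        D = self._D()
        if D is not None:
            D._data.pop(r.key, None)

class WeakKeyedMonoDict(object):
    def __init__(self):
        self._data = {}
        self._eraser = MonoDictEraser(self)
    def __setitem__(self, key, value):
        self._data[id(key)] = (weakref.KeyedRef(key, self._eraser, id(key)), value)
    def __getitem__(self, key):
        ref, value = self._data[id(key)]
        if ref() is not key:
            raise KeyError(key)
        return value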

@simon-king-jena
Member Author

comment:2

Replying to @simon-king-jena:

Perhaps one could do the same thing at #715, rather than having a weak reference to the underlying set of an action?

Nope! There, the situation was different, since there is a strong reference to the TripleDict anyway (namely in the coercion model), so that a weak reference from the eraser would not suffice.


@simon-king-jena
Member Author

comment:4

With the attached patch (to be applied after #715), a new kind of weak dictionary is introduced, and used to cache coercions and conversions. All tests pass for me.

Concerning raw speed: Without the patch, we have

sage: RDF.coerce_map_from(ZZ)
Native morphism:
  From: Integer Ring
  To:   Real Double Field
sage: timeit("RDF.coerce_map_from(ZZ)",number=10^5)
100000 loops, best of 3: 518 ns per loop

With the patch, it is 466 ns; hence one gains about 10% in a very frequent operation.

The total speed seems to be fine as well: make ptest has not been slower than before.

Concerning memory leaks:

sage: import gc
sage: _ = gc.collect()
sage: K = GF(1<<55,'t')
sage: for i in range(50):
...     a = K.random_element()
...     E = EllipticCurve(j=a)
...     b = K.has_coerce_map_from(E)
sage: _ = gc.collect()
sage: len([x for x in gc.get_objects() if isinstance(x,type(E))])
1

That is a new doctest; it tests against a memory leak that was not fixed by #715.

@simon-king-jena
Member Author

Author: Simon King

@simon-king-jena
Member Author

comment:5

I have extended my patch by another coercion-related application of MonoDict.

I found that sage.categories.groupoid.Groupoid often results in a memory leak. For example, by #715, polynomial rings are only weakly cached. But if the groupoid of a polynomial ring is created, then a strong reference to the polynomial ring is created, preventing it from being garbage collected. And a groupoid is created as soon as there is an action of that polynomial ring on another ring. Hence, a memory leak.

The new patch version overrides the __classcall__ method of sage.categories.groupoid.Groupoid (which would normally be inherited from UniqueRepresentation) with a method that uses a MonoDict for caching. In addition, there is now only a weak reference to the underlying set of a groupoid. Consequently, if there is no other strong reference, Groupoid(P) can be garbage collected, while a strong reference to P keeps Groupoid(P) alive.
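Roughly, the caching described above amounts to the following sketch (simplified; the MonoDict import path and constructor argument are assumptions, and the real patch differs in details):

import weakref
from sage.categories.category import Category
from sage.structure.coerce_dict import MonoDict   # introduced by this ticket

_groupoid_cache = MonoDict(23)   # weakly keyed by the underlying set; 23 is just a bucket count

class Groupoid(Category):
    @staticmethod
    def __classcall__(cls, S):
        try:
            return _groupoid_cache[S]
        except KeyError:
            instance = type.__call__(cls, S)
            _groupoid_cache[S] = instance
            return instance

    def __init__(self, S):
        # Only a weak reference to the underlying set, so that the groupoid
        # does not prevent S from being garbage collected.  (The real code
        # would also call Category.__init__.)
        self._set_ref = weakref.ref(S)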

Still needs review.

@jpflori

jpflori commented Jan 24, 2012

comment:6

Could you add the second problem fixed to the ticket description?

Maybe this should be split into two tickets to be more readable, or at least into two patches.

The title is a little vague as well :)

Sorry for the delay on this and #715 and #11521, but this is now my top priority (for Sage).

@jpflori

jpflori commented Jan 24, 2012

comment:7

And what if a user directly uses a Groupoid object?

Will the underlying set be garbage collected, as in #715 for Actions?

Another reason to split this ticket into two (once again :) )

@jpflori

jpflori commented Jan 24, 2012

comment:8

To finish my thoughts for now: we could just as well leave the tickets as they are (except for some more documentation and easy fixes) and potentially implement the other behavior (where a choice between strong and weak references is possible) in a later ticket.

The current behavior of raising an error (provided it comes with a description of how to avoid the garbage collection, in addition to reporting that the object was garbage collected) seems viable enough to me (unless many people actually use Groupoids and Actions directly).

@simon-king-jena
Member Author

comment:9

Perhaps you are right, and we could re-organise stuff. Namely, for now (that's to say: for #715 and for #12313) we could keep strong references from an action to the underlying set, and from a groupoid to the underlying set. The unfortunate consequence would be that not all memory leaks would be fixed. But one could create a new ticket where the idea of having weak references to the underlying set is tested.

And here is yet another idea, at least for the groupoid case. Let S be any object and G its groupoid. Ideally, S should keep G alive and G should keep S alive. Hence, it would be fine to have a strong reference from S to G and from G to S -- Python's garbage collection can deal with circular references.

However, there should be no strong references to S or G unless the user explicitly provides them. In particular, when creating Groupoid(S), no external cache should be used.

Potential solution: the cache for Groupoid.__classcall__ should live on S itself! Perhaps like this:

class Groupoid(Category):
    @staticmethod
    def __classcall__(cls, S):
        try:
            return getattr(S, '_cached_'+cls.__name__)
        except AttributeError:
            pass
        instance = type.__call__(cls, S)
        assert isinstance(instance, cls)
        if instance.__class__.__reduce__ == UniqueRepresentation.__reduce__:
            instance._reduction = (cls, (S,), {})
        try:
            setattr(S, '_cached_'+cls.__name__, instance)
            return instance
        except AttributeError:
            pass
        try:
            # Parents have a __cached_methods attribute, by #11115
            S.__cached_methods['_cached_'+cls.__name__] = instance
            return instance
        except AttributeError:
            pass
        raise TypeError("%s instances must either allow attribute assignment or be instances of %s" % (cls, Parent))

Note that the problem can hardly be solved in the same way for actions. They need to be stored externally, namely in the coercion model.

@simon-king-jena
Member Author

Work Issues: Split ticket

@simon-king-jena
Member Author

comment:10

I think it is best to follow your suggestion and split stuff.

Soon (perhaps tomorrow) I'll return to the old patch, i.e. without touching the groupoid stuff. In a to-be-created ticket, I'll try the idea for caching groupoids that avoids memory leaks but uses strong references, as explained in my previous post.

However, I think the weak references to the underlying sets of actions are an integral part of #715 - without them, the memory leak would not be fixed at all; so I'd not split #715.

@simon-king-jena
Member Author

comment:11

Python's name mangling sucks.

Name mangling occurs when an attribute is requested whose name starts with two underscores and ends with at most one underscore. Thus, it happens when S.__cached_methods is requested in the code above.

However, for a reason that I don't understand, Python does not use the name of the class of S for mangling, but the name of cls -- for example, if S==ZZ then Python is looking for S._Groupoid__cached_methods, not S._<name of class of S>__cached_methods. In addition, since __cached_methods is a cpdef attribute, it should not be subject to name mangling at all.
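In fact, name mangling is purely lexical: at compile time, any identifier of the form __name inside a class body is rewritten using the name of the enclosing class (here Groupoid, which happens to be cls), regardless of the type of the object whose attribute is accessed. A minimal pure-Python demonstration (class names are made up):

class SomeParent(object):
    def __init__(self):
        self.__cached_methods = {}     # mangled to _SomeParent__cached_methods

class Groupoid(object):
    @staticmethod
    def probe(S):
        # Written inside the body of Groupoid, this access is compiled to
        # S._Groupoid__cached_methods, not S._SomeParent__cached_methods.
        return S.__cached_methods

S = SomeParent()
try:
    Groupoid.probe(S)
except AttributeError as e:
    print(e)   # ... no attribute '_Groupoid__cached_methods'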

I'll see if doing things in Cython helps.

@simon-king-jena
Member Author

Weak coerce map dictionary with weakly referenced unique keys and optimised access speed

@simon-king-jena
Member Author

Changed work issues from Split ticket to none

@simon-king-jena
Member Author

comment:12

Attachment: trac12313_mono_dict.patch.gz

I have updated my patch, so that groupoids are not touched any more. Needs review, again.

Groupoids are now dealt with at #12357 (needs review), and I think the solution provided there is much better than the improper use of weak references.

@simon-king-jena
Member Author

Work Issues: Find out why some attribute of a parent is deleted before the parent's elements are deleted

@simon-king-jena
Member Author

comment:13

I am not totally sure whether the following problem is caused by this patch, by #715, or by #11521. Anyway, with all three patches applied, I obtain:

sage: K.<z> = GF(4)
sage: P.<x> = K[]
sage: del P
sage: del x
sage: import gc
sage: gc.collect()
Exception AttributeError: 'PolynomialRing_field_with_category' object has no attribute '_modulus' in  ignored
148

As you can observe, an attribute error is ignored, but worse: Cython is unable to tell where it was ignored.

The cython-users list helped me track the problem down using gdb.

The attribute error occurs in the following cdef method defined in sage/rings/polynomial/polynomial_zz_pex.pyx:

cdef cparent get_cparent(parent) except? NULL:
    if parent is None:
        return NULL
    cdef ntl_ZZ_pEContext_class c
    c = parent._modulus
    return &(c.x)

Because get_cparent is declared with except? NULL, the error is propagated. However, get_cparent is in fact called in a deallocation method, namely in sage/rings/polynomial/polynomial_template.pxi:

    def __dealloc__(self):
        """
        EXAMPLE::

            sage: P.<x> = GF(2)[]
            sage: del x
        """
        celement_destruct(&self.x, get_cparent((<Polynomial_template>self)._parent))

That is interesting for two reasons: (1) Cython cannot tell the location of the error, which I think is a bug in Cython. (2) The attribute _modulus of self._parent is destroyed before self is destroyed.

I find the latter rather strange. As long as self is alive, self._parent cannot be deallocated; but if self._parent cannot be deallocated, why is self._parent._modulus deleted?

@simon-king-jena
Member Author

comment:327

OK, the patchbot has the problem that it thinks #13387 should be applied first. But in fact, one should first apply the single patch from here, and then #13387.

@simon-king-jena
Member Author

comment:328

Could it be that #12313 is not in sage-5.8.beta0? The version that I downloaded before the official announcement does contain it.

Anyway, let's try:

Apply trac_12313-mono_dict-combined-random-sk.patch trac_12313-revert_callback_from_11521.patch to sage-5.8.beta0

@nbruin
Contributor

nbruin commented Feb 24, 2013

comment:329

Why add that patch here when it belongs on #14159? This ticket already has positive review. It's bad form to start changing the code unless absolutely necessary, and that new patch has nothing to do with this ticket. I've uploaded your patch there (and changed the commit message to reflect the appropriate ticket number). I hope that's OK with you. If so, you can just remove the ticket here from the list and move the ticket back to "Positive". We really want to get this merged, especially #13387 (which depends on this ticket), because it fixes some issues with TripleDict that could already bite its uses in 5.6 and 5.7.


@simon-king-jena
Member Author

comment:330

Replying to @nbruin:

Why add that patch here when it belongs on #14159? This ticket already has positive review. It's bad form to start changing the code unless absolutely necessary.

I thought it was absolutely necessary, and since it wasn't merged yet, I thought it was OK to do it here.

I've uploaded your patch there (and changed the commit message to reflect the appropriate ticket number). I hope that's OK with you.

If so, you can just remove the ticket here from the list

I can't remove the attachment, but I can remove it from the list of to-be-applied-patches.

and move the ticket back to "Positive".

OK.

Apply trac_12313-mono_dict-combined-random-sk.patch

@jdemeyer

comment:331

This ticket is not merged (at some point it was in some preliminary version of sage-5.8.beta0).

You can check this by

  1. Noting that this ticket has positive_review (not closed) with an empty "merged in" field.
  2. Checking the release notes (either txt or html).
  3. Checking the announcement on sage-release.


@jdemeyer

Merged: sage-5.8.beta2
