Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pickling of immutable graphs #15619

Closed
nathanncohen mannequin opened this issue Jan 2, 2014 · 75 comments
Closed

Pickling of immutable graphs #15619

nathanncohen mannequin opened this issue Jan 2, 2014 · 75 comments

Comments

@nathanncohen
Copy link
Mannequin

nathanncohen mannequin commented Jan 2, 2014

Before

sage: g=Graph(graphs.PetersenGraph(),immutable=True)
sage: g == loads(dumps(g))
...
TypeError: __cinit__() takes exactly 1 positional argument (0 given)

After

sage: g=Graph(graphs.PetersenGraph(),immutable=True)
sage: g == loads(dumps(g))
True

Depends on #15603

CC: @simon-king-jena

Component: graph theory

Author: Nathann Cohen

Branch/Commit: public/15619 @ e761346

Reviewer: Simon King

Issue created by migration from https://trac.sagemath.org/ticket/15619

@nathanncohen nathanncohen mannequin added this to the sage-6.1 milestone Jan 2, 2014
@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jan 2, 2014

Branch: u/ncohen/15619

@nathanncohen nathanncohen mannequin added the s: needs review label Jan 2, 2014
@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jan 2, 2014

Branch pushed to git repo; I updated commit sha1. New commits:

0229348trac #15619: Pickling of immutable graphs
7860f39trac #15603: "immutable=True" for Graph/Digraph `__init__` and copy()
2525d22Trac 15278: Fix syntax error in doc test
fcf9e30Trac 15278: Only graphs that use the static backend are identical with their copy
8fc9c94Merge branch 'develop' into ticket/15278
51d6328Trac 15278: Error messages explain how to create (im)mutable graph copy
2fc8a77Make digraphs using the static backend immutable and hashable
07bad46Merge branch 'u/ncohen/15491' of git://trac.sagemath.org/sage into ticket/15278
126b036Merge branch 'master' into ticket/15278
c774057Prepare hash and copy for immutable graphs. Let .weighted() respect mutability.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jan 2, 2014

Commit: 0229348

@simon-king-jena
Copy link
Member

comment:3

I am not convinced that creating a graph in order to pickle a backend is the best possible solution. So, I'll do some further experiments with a reduce method for graphs---no need for you to worry about it at this point...

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jan 2, 2014

comment:4

Of course not, it is indeed ugly code, and my point is that I write ugly code because I am forced by a machine to write code for which I have no personal use :-P

Yet it passes those stupid tests, and for this I am grateful. Right now I am fighting with "Poset.promotion", and I have noooooooo idea on earth of where the broken doctests are coming from O_o

Nathann

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jan 2, 2014

comment:5

It does minimize the amount of code necessary for such a function, though. And clear code is good by itself :-P

Nathann

@simon-king-jena
Copy link
Member

comment:6

Replying to @nathanncohen:

Of course not, it is indeed ugly code, and my point is that I write ugly code because I am forced by a machine to write code for which I have no personal use :-P

Sage is a community project after all.

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jan 2, 2014

comment:7

I don't see the relationship O_o

@simon-king-jena
Copy link
Member

comment:8

What do you think about the following way to pickle generic graphs?

    def __reduce__(self):
        # copy the dict
        D = dict(self.__dict__.iteritems())
        # remove something that may not be picklable
        try:
            del D['_backend']
        except KeyError:
            pass
        return type(self), (self.edges(),), D

Is this elegant enough for you? Additional bait: We may even remove the existing __reduce__ methods of the other backends ;-)

@simon-king-jena
Copy link
Member

comment:9

PS: With the above __reduce__ method put into sage.graphs.generic_graph.py, one has

sage: G = graphs.PetersenGraph()
sage: g = G.copy(data_structure='static_sparse')
sage: loads(dumps(g))
Graph on 10 vertices
sage: g == _
True

@simon-king-jena
Copy link
Member

comment:10

Some little timing:

Your way of pickling:

sage: G = graphs.PetersenGraph()
sage: g = G.copy(data_structure='static_sparse')
sage: %timeit h = loads(dumps(g))
1000 loops, best of 3: 1.11 ms per loop

My way of pickling:

sage: G = graphs.PetersenGraph()
sage: g = G.copy(data_structure='static_sparse')
sage: %timeit h = loads(dumps(g))
1000 loops, best of 3: 493 us per loop

:-P

@simon-king-jena
Copy link
Member

comment:11

On the other hand, I think I did something severely wrong:

sage:  h = loads(dumps(g))
sage: type(h._backend)
<class 'sage.graphs.base.sparse_graph.SparseGraphBackend'>

Bad. So, it doesn't work in the way I thought...

@simon-king-jena
Copy link
Member

comment:12

Would it be possible to let the static graph backend __init__ accept a list of edges rather than a graph? Then, your __reduce__ method could be simplified (and actually it seems odd that the creation of a graph backend needs a graph!).

@simon-king-jena
Copy link
Member

comment:13

Hm. I tried to understand what happens when one creates a static sparse backend.

I always thought that one needs a backend to implement a graph. But in fact __init__ and __cinit__ and init_short_digraph (which are all involved in the creation of a static sparse backend) rely on the input being a graph. :-/

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jan 2, 2014

comment:14

Well, that can be changed too at some point if it is really needed (I would personally hate to change all this code just because the guy who implemented TestSuite decide for everybody that all the graphs classes HE tested should be picklable), but building a graph is not such a trivial matter. Because you have labels of not, because you have multiple edges or not, because the labels can be integers, have labels, because you can have loops, because they can be oriented or not.

I mean, that's actually a lot of stuff, and it's not funny to implement it so many times in the code. It's already VERY messy in the Graph/DiGraph constructors.

Honestly, I don't think pickling really needs any amount of efficient code. So as we are just implementing it to make the testsuite happy, and unless you want to mess with all this code just because of that, let's use it. It's short and it works. It's bad code of course, because pickling is bad work. It's a generic way to store data, that should work regardless of what the data is. It's like XML. Saving data is not something that should be done without knowing what the data is. If you want to write generic code for this, it will end up being bad code.

Anyway. The only problem I would like to see solved is to have Posets use a static data structure. That would make them both faster and lighter. And I think something should be done about the time complexity of the Graph/DiGraph constructor too.

Let's move on. Can you help me with this promotion bug ? I have no idea of how it should be solved, and it seems to be the last step before having the Posets use a static backend.

Nathann

@simon-king-jena
Copy link
Member

comment:15

Replying to @nathanncohen:

but building a graph is not such a trivial matter. Because you have labels of not, because you have multiple edges or not, because the labels can be integers, have labels, because you can have loops, because they can be oriented or not.

Right. This is in fact a strong argument for implementing the pickling in the way you did. All these optional arguments of (Di)Graph.__init__ are a mess.

Honestly, I don't think pickling really needs any amount of efficient code.

That's a valid point, too. Efficient pickling mainly plays a role if copying is implemented by pickling, since this may happen interactively and frequently. But we have a separate and hopefully efficient implementation of __copy__, and thus __reduce__ is really only involved when people want to store a poset (and hence, an immutable graph, and thus, a static sparse backend) into a file. So, spending some milli seconds seems acceptable.

So as we are just implementing it to make the testsuite happy, and unless you want to mess with all this code just because of that, let's use it. It's short and it works. It's bad code of course, because pickling is bad work. It's a generic way to store data, that should work regardless of what the data is. It's like XML. Saving data is not something that should be done without knowing what the data is.

Careful! You are just arguing that the person who knows the data structure best (i.e., you) is responsible for making pickling work :-)

If you want to write generic code for this, it will end up being bad code.

Correct. Sometimes generic code just works, but in some cases __reduce__ needs to be provided. Wonderful that Python is object oriented and lets us do such things...

Anyway. The only problem I would like to see solved is to have Posets use a static data structure. That would make them both faster and lighter. And I think something should be done about the time complexity of the Graph/DiGraph constructor too.

Here or on a different ticket?

Let's move on. Can you help me with this promotion bug ?

OK. So, I'll review your pickling approach, and in the meantime you point me to your branch for "making posets use immutable graphs", so that I can test the promotion bug.

@simon-king-jena
Copy link
Member

comment:16

Just for completeness: It would also be possible to say "graphs do not want being pickled", which would force all structures that use graphs to implement their own pickling by storing the defining data of the graphs they use, rather than storing these graphs directly.

Hence, one could provide Poset with a custom __reduce__ method that just stores the list of edges of the graph, rather than the graph itself. However, I think in the long run it would be better if graphs knew how to save themselves.

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jan 2, 2014

comment:17

Well. So what do we do with this commit ? Is it good for you ? And more importantly, what do we do with this bug in linear_extensions.py in the u/ncohen/tmp branch ? :-/

Nathann

@simon-king-jena
Copy link
Member

comment:18

This one seems to be the underlying reason for the problems with posets:

sage: uc = [[2,3], [], [1], [1], [1], [3,4]]
sage: D = DiGraph(dict([[i,uc[i]] for i in range(len(uc))]), data_structure="static_sparse")
sage: loads(dumps(D)) == D
False

@simon-king-jena
Copy link
Member

comment:19

The pickling result is in fact seriously wrong:

sage: d = loads(dumps(D))
sage: d.edges()
[(0, 2, None), (0, 3, None)]
sage: D.edges()
[(0, 2, None),
 (0, 3, None),
 (2, 1, None),
 (3, 1, None),
 (4, 1, None),
 (5, 3, None),
 (5, 4, None)]

@simon-king-jena
Copy link
Member

comment:20

Indeed the error occurs when constructing the graph used to pickle the graph backend:

sage: constructor = DiGraph
sage: self = D._backend
sage: G.add_vertices(self.iterator_verts(None))
sage: G
Digraph on 6 vertices
sage: G.vertices()
[0, 1, 2, 3, 4, 5]
sage: D.vertices()
[0, 1, 2, 3, 4, 5]
sage: G.add_edges(list(self.iterator_edges(self.iterator_verts(None),1)))
sage: G.edges()
[(0, 2, None), (0, 3, None)]

Bug in iterator_edges? Then you should thank the guy who invented the TestSuite, since it helped you uncover a bug in a method you probably care about (namely iteration over edges).

@simon-king-jena
Copy link
Member

comment:21

In pure form, the failing example is

sage: uc = [[2,3], [], [1], [1], [1], [3,4]]
sage: D = DiGraph(dict([[i,uc[i]] for i in range(len(uc))]), data_structure="static_sparse")
sage: list(D._backend.iterator_edges(D.vertices(), True))
[(0, 2, None), (0, 3, None)]

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jan 2, 2014

comment:22

Ahahah. You are oversimplying things. It's good that we found a bug. But that does not justify all at once the principle of testsuites and its use.

I'm trying to fix that. It's weird, if you use iterator_out_edges instead there is no problem anymore O_o

Nathann

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jan 2, 2014

comment:23

(by the way, D.show fails :-P)

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jan 2, 2014

comment:24

Oh. Well, it is not really a bug. It's just that iterator_edges is not meant for digraphs O_o

Nathann

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jan 2, 2014

comment:55

Annnnnnnnnd with this all tests pass.. WUUUUUHUUUUUUUUUUUUUUUUUUUUUUUU !! O_O_O

Nathann

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jan 2, 2014

comment:56

#15623 makes Posets use immutable graphs. Cool, isn't it ? :-PPP

Nathann

@simon-king-jena
Copy link
Member

comment:57

Replying to @nathanncohen:

Hmmmm... Looks like the StaticSparseBackend has a working relabel function :-PPPPPPPP

Ouch. But you take care of this in #15622, right?

Apart from this: All doctests pass, and the code looks fine to me. In a review commit, I'll add some tests for issues we discussed here.

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jan 3, 2014

comment:58

Indeed, indeed !

Nathann

@simon-king-jena
Copy link
Member

comment:59

Right now I am testing whether everything keeps working after merging my review patch from #15603.

@simon-king-jena
Copy link
Member

Changed branch from u/ncohen/15619 to u/SimonKing/ticket/15619

@simon-king-jena
Copy link
Member

comment:61

I am not sure if pushing my changes properly worked (apparently I needed to manually set the commit I added). Anyway, if you like my review commit, then it is a positive review.

@simon-king-jena
Copy link
Member

Changed commit from c441178 to cabc969

@simon-king-jena
Copy link
Member

Reviewer: Simon King

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jan 3, 2014

comment:62

Argggggggg sorry. I just noticed that there is a problem with this code. It does not handle multiple edges and loops correctly.

I HAAAAAAAAAAAAAAAAAAAAATE THESE THINGS.

sage: g = Graph(multiedges = True)
sage: g.add_edges(graphs.PetersenGraph().edges())
sage: g.add_edges(graphs.PetersenGraph().edges())
sage: gi = g.copy(immutable=True)
sage: loads(dumps(gi))
Multi-graph on 10 vertices
sage: loads(dumps(gi)) == gi
False
sage: gi.size()             
30
sage: loads(dumps(gi)).size()
15

I will add a commit to this one within 10 minutes. Need to write it first >_<

Nathann

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jan 3, 2014

Changed commit from cabc969 to none

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jan 3, 2014

Changed branch from u/SimonKing/ticket/15619 to public/15619

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jan 3, 2014

Commit: e761346

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jan 3, 2014

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

e761346trac #15619: Pickling multigraphs with loops and labels
cabc969Trac 15619: Review commit
c441178trac #15619: bug in the former definition; exception to avoid it in the future
0229348trac #15619: Pickling of immutable graphs
7860f39trac #15603: "immutable=True" for Graph/Digraph `__init__` and copy()
2525d22Trac 15278: Fix syntax error in doc test
fcf9e30Trac 15278: Only graphs that use the static backend are identical with their copy
8fc9c94Merge branch 'develop' into ticket/15278
51d6328Trac 15278: Error messages explain how to create (im)mutable graph copy
2fc8a77Make digraphs using the static backend immutable and hashable

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jan 3, 2014

comment:66

Done. tested on the most ugly graphs I could build.

I hate multigraphs. I hate loops. I hate labels.

Nathann

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jan 4, 2014

comment:67

Helloooo Simon ! Is this one good for you ? It depends on a patch with positive_review, and a patch with positive_review depends on it, but it still unchecked :-P

Nathann

@simon-king-jena
Copy link
Member

comment:68

The solution seems logical to me (to the extent that multigraphs with labels and loops can be "logical"...), and the example is carefully chosen to cover all cases (including isolated vertices). Hence, if the doctests pass (I'll know in about one hour) then I'll give it a positive review.

@simon-king-jena
Copy link
Member

comment:69
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 3981.0 seconds
    cpu time: 6195.0 seconds
    cumulative wall time: 7673.1 seconds

OK, a bit more than an hour.

According to the previous comment, this is a positive review.

@vbraun vbraun closed this as completed in 0418a3e Jan 5, 2014
tscrim pushed a commit to tscrim/sage that referenced this issue Jun 1, 2023
* develop: (59 commits)
  Updated Sage version to 6.1.beta4
  trac sagemath#15572: DiGraph.is_directed_acyclic handles loops pretty well
  Fixed bug in uniform matroid.
  trac sagemath#15619: Pickling multigraphs with loops and labels
  Trac 15619: Review commit
  Trac 15603: More doctests, nicer error message
  No need to specify caller_name in verbose()
  Add comments, small cosmetic changes
  Make q monic before computing cubic resolvent
  Implement splitting fields for number fields
  trac sagemath#15619: bug in the former definition; exception to avoid it in the future
  trac 8723: fix one line in isogeny_small_degree
  Use "in" instead of PyDict_Contains()
  test for membership with `x in self`; minor doc formatting change
  trac sagemath#15619: Pickling of immutable graphs
  Rebased on sage-6.1.beta2
  # User Thomas Feulner <thomas.feulner@uni-bayreuth.de>
  Implemented IntegerVectors_nk.rank(). Fixes sagemath#15609
  Trac sagemath#5153: small change in documentation.
  Trac 7695: Variable name for all subfields where the name ends with a digit
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants