-
-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clean up the code for Kruskal's algorithm #10433
Comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Author: Minh Van Nguyen |
This comment has been minimized.
This comment has been minimized.
comment:7
Hello Minh !!!! I just looked at your code... Here are several questions/remarks about it :
Nathann |
comment:8
Replying to @nathanncohen:
I've added a new parameter "check" to the method
You're right. That's two computation of connectedness. This is now fixed to only do one computation of connectedness and afterward to check for tree.
Here is an explanation of the code block on retaining only the minimum weighted edge among all weighted multiple edges with the same start and edge vertices. Let G be an undirected, weighted multigraph. If there are multiple edges from u to v, retain only the start and end vertices of such edges; at this point we don't care about the edge weights. Let a and b be the start and end vertices, respectively, of a weighted edge
Specifically, I'm using the idiom:
It is more or less equivalent to using a lambda, but there are clear documentation on the above sorting idiom. See this page in the Python library. And this other page from the Python wiki, especially the section "Operator Module Functions". Also note that what you're suggesting is using a closure, which is not supported in the version of Cython currently distributed with Sage 4.6.1.alpha3.
I work with the assumption that we can only compute minimum spanning trees for undirected simple graphs. Let g be a copy of the input graph and suppose g is an unweighted multigraph. By the working assumption, we merely convert g to be a simple graph and then run the resulting simple graph through Kruskal's algorithm. I think working directly with the input graph, and not a copy of this graph, would result in more complicated code. If you want speed, make sure your input graph is a simple unweighted graph that is connected and not a tree. The keyword
Because I want an iterator over the list returned by "e[0:2]". If I do
then I would first produce a list as requested by the slice operation "e[0:2]" and I would need to wait for this production process to end. After I have finished producing the list, I would then begin iterating over items in the list. However, if I do
then I don't need to first wait for the list "e[0:2]" to be produced before I can start iterating over its elements. The function
I've added the keyword |
comment:9
Patch updated with more documentation: more comments within code, a new doctest, more explanation about what's expected of the input graph to |
comment:10
Update patch to use the set data structure during the process of removing multiple edges. |
comment:11
Hello Minh !!! I'm sorry to come back with another thing to fix after having made so many remarks, but I really hadn't thought this problem would occur.. I just did some profiling, and even though your code is faster when no checkings are required, there is a large slowdown by default :
for i in range(10):
g = graphs.RandomGNP(50,.3)
timeit("g.min_spanning_tree()")
25 loops, best of 3: 13.1 ms per loop
25 loops, best of 3: 12.7 ms per loop
25 loops, best of 3: 12.5 ms per loop
25 loops, best of 3: 12.5 ms per loop
25 loops, best of 3: 13.1 ms per loop
25 loops, best of 3: 12.6 ms per loop
25 loops, best of 3: 12 ms per loop
25 loops, best of 3: 11.9 ms per loop
25 loops, best of 3: 12.8 ms per loop
25 loops, best of 3: 13.2 ms per loop
for i in range(10):
g = graphs.RandomGNP(50,.3)
timeit("g.min_spanning_tree(check=False)")
125 loops, best of 3: 2.36 ms per loop
125 loops, best of 3: 2.1 ms per loop
125 loops, best of 3: 2.34 ms per loop
125 loops, best of 3: 2.49 ms per loop
125 loops, best of 3: 2.42 ms per loop
125 loops, best of 3: 2.36 ms per loop
125 loops, best of 3: 2.37 ms per loop
125 loops, best of 3: 2.23 ms per loop
125 loops, best of 3: 2.34 ms per loop
125 loops, best of 3: 2.2 ms per loop
for i in range(10):
g = graphs.RandomGNP(50,.3)
timeit("g.min_spanning_tree()")
125 loops, best of 3: 3.33 ms per loop
125 loops, best of 3: 2.96 ms per loop
125 loops, best of 3: 3.21 ms per loop
125 loops, best of 3: 3.32 ms per loop
125 loops, best of 3: 2.8 ms per loop
125 loops, best of 3: 3.06 ms per loop
125 loops, best of 3: 3.03 ms per loop
125 loops, best of 3: 3.13 ms per loop
125 loops, best of 3: 3.4 ms per loop
125 loops, best of 3: 3.5 ms per loop Calling for i in range(10):
g = graphs.RandomGNP(50,.3)
timeit("g.to_undirected()")
125 loops, best of 3: 4.43 ms per loop
125 loops, best of 3: 4.56 ms per loop
125 loops, best of 3: 4.88 ms per loop
125 loops, best of 3: 4.59 ms per loop
125 loops, best of 3: 4.92 ms per loop
125 loops, best of 3: 4.54 ms per loop
125 loops, best of 3: 4.57 ms per loop
125 loops, best of 3: 4.25 ms per loop
125 loops, best of 3: 4.69 ms per loop
125 loops, best of 3: 4.63 ms per loop Calling for i in range(10):
g = graphs.RandomGNP(50,.3)
timeit("g.is_connected()")
625 loops, best of 3: 153 µs per loop
625 loops, best of 3: 154 µs per loop
625 loops, best of 3: 157 µs per loop
625 loops, best of 3: 152 µs per loop
625 loops, best of 3: 150 µs per loop
625 loops, best of 3: 149 µs per loop
625 loops, best of 3: 149 µs per loop
625 loops, best of 3: 153 µs per loop
625 loops, best of 3: 153 µs per loop
625 loops, best of 3: 155 µs per loop So in general the default behaviour leads to a much slower performance, even though disabling the checkings makes it better than the current implementation. I think most of it is because of the time required to copy the graph (checking the connectivity is comparatively much faster). I tried to find some cheap trick to avoid the checking, at least when the graph is simple and unweighted. Actually, I rewrote a computation of spanning trees based on depth-first search, but I wasn't able to do better than what your code does when the checkings are disabled, so it's not useful by itself. I think the best way is to find a way around graph copies, but no cheap tricks so far Nathann |
comment:12
Attachment: trac-10433_clean-up-kruskal.patch.gz Replying to @nathanncohen:
What can be done is to turn off the sanity checks by default. I have done this in the updated patch. But note that the function now assumes that by default the input graph is simple, connected, not a tree, and has at least one vertex. If you can't be certain that the input graph meet all of the latter conditions, then toggle sage: g = graphs.RandomGNP(2000, 0.3)
sage: save(g, "graphs") I get some timing for the case where we don't use the patch: sage: g = load("graphs")
sage: %time g.min_spanning_tree();
CPU times: user 15.57 s, sys: 0.00 s, total: 15.57 s
Wall time: 15.57 s
sage: %time g.min_spanning_tree();
CPU times: user 16.29 s, sys: 0.00 s, total: 16.29 s
Wall time: 16.29 s
sage: %time g.min_spanning_tree();
CPU times: user 16.02 s, sys: 0.00 s, total: 16.02 s
Wall time: 16.02 s
sage: %time g.min_spanning_tree();
CPU times: user 16.50 s, sys: 0.00 s, total: 16.50 s
Wall time: 16.50 s Now with the patch, but use sage: g = load("graphs")
sage: %time g.min_spanning_tree();
CPU times: user 15.22 s, sys: 0.00 s, total: 15.22 s
Wall time: 15.22 s
sage: %time g.min_spanning_tree();
CPU times: user 14.86 s, sys: 0.00 s, total: 14.86 s
Wall time: 14.86 s
sage: %time g.min_spanning_tree();
CPU times: user 14.72 s, sys: 0.00 s, total: 14.72 s
Wall time: 14.71 s
sage: %time g.min_spanning_tree();
CPU times: user 15.10 s, sys: 0.00 s, total: 15.10 s
Wall time: 15.09 s Finally, with the patch and directly import and use sage: g = load("graphs")
sage: from sage.graphs.spanning_tree import kruskal
sage: %time kruskal(g);
CPU times: user 15.11 s, sys: 0.00 s, total: 15.11 s
Wall time: 15.11 s
sage: %time kruskal(g);
CPU times: user 14.79 s, sys: 0.00 s, total: 14.79 s
Wall time: 14.79 s
sage: %time kruskal(g);
CPU times: user 14.66 s, sys: 0.00 s, total: 14.66 s
Wall time: 14.66 s
sage: %time kruskal(g);
CPU times: user 14.99 s, sys: 0.00 s, total: 14.99 s
Wall time: 14.99 s
The call to
Here's one way to improve performance: get rid of the keyword "as_edges" and return only the edges of a minimum spanning tree. You might want What could also increase performance slightly is to use a priority queue to iterate over edges of the input graph, instead of using a sorting function. But a Cython implementation of priority queue is still not yet available. |
Reviewer: Nathann Cohen |
comment:13
Positive review to this patch ! It passes "testall -long", and is a very nice piece of work (plus the beginning of a new module) Thank you very much for your patience, Minh ! Nathann |
Work Issues: rebase |
comment:14
Needs to be rebased to sage-4.6.2.alpha0, I get the following patch output::
|
comment:15
The fixing about the results of a search for "tree" in the doc has been fixed elsewhere ! Here is a rebased version of the patch ! Nathann |
comment:16
Attachment: trac-10433-rebased-4.6.2.alpha0.patch.gz |
Merged: sage-4.6.2.alpha1 |
Changed work issues from rebase to none |
From sage-support. See also this sage-devel thread for design discussion.
CC: @nathanncohen
Component: graph theory
Author: Minh Van Nguyen
Reviewer: Nathann Cohen
Merged: sage-4.6.2.alpha1
Issue created by migration from https://trac.sagemath.org/ticket/10433
The text was updated successfully, but these errors were encountered: