-
Notifications
You must be signed in to change notification settings - Fork 183
Added constructors to create graphs from a vector or an iterator of edges #880
Added constructors to create graphs from a vector or an iterator of edges #880
Conversation
…ector or an iterator of edges
Codecov Report
@@ Coverage Diff @@
## master #880 +/- ##
==========================================
+ Coverage 99.67% 99.69% +0.01%
==========================================
Files 60 60
Lines 2160 2281 +121
==========================================
+ Hits 2153 2274 +121
Misses 7 7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More comments later. Thanks.
neg = 0 | ||
for v in 1:length(fadjlist) | ||
if !issorted(fadjlist[v]) | ||
sort!(fadjlist[v]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this check can be expensive. Might as well just sort and be done with it, or pass in a boolean argument to denote whether we can assume the vectors are sorted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think calling issorted
is that expensive; especially compared to sort!
. When I wrote that function I used the profiler to check where the bottlenecks are and issort
was definitely not the problem. unique!
actually also calls this function, which is unnecessary in our case as we know that the vector must be sorted.
I could implement a flag to indicate that the input is sorted and otherwise correct but then the problem is that a user could actually produce an invalid graph. I think, if someone really need to optimize for the last bits of performance, then they are better of by creating fadjlist and badjlist themself. I tested that by writing a function that builds a complete graph and for 10^4 vertices I am a factor of 2.5 times faster than by creating a vector of edges and calling SimpleGraph with sorting disabled.
Then I wrote some code to see what the performance is, if we don't sort/don't check for sorted input:
Here are the results for 100 and 10^4 vertices:
julia> run_speed_tests(100; ntimes=50)
Sorted Input:
SimpleGraph : mean == 0.000229s std == 1.61e-5s
SimpleGraph (no checks) : mean == 0.00258s std == 0.0174s
SimpleGraph (always sort) : mean == 0.000119s std == 2.1e-5s
Shuffled Input:
SimpleGraph : mean == 0.000307s std == 2.2e-5s
SimpleGraph (always sort) : mean == 0.000325s std == 2.03e-5s
julia> run_speed_tests(10000; ntimes=50)
Sorted Input:
SimpleGraph : mean == 1.44s std == 0.0452s
SimpleGraph (no checks) : mean == 1.16s std == 0.0314s
SimpleGraph (always sort) : mean == 1.97s std == 0.0348s
Shuffled Input:
SimpleGraph : mean == 7.16s std == 0.079s
SimpleGraph (always sort) : mean == 7.26s std == 0.068s
-
Sorted input constructs a complete graph from a sorted list of edges
-
Shuffled Input constructs the same graph when the list is randomly shuffled
-
Simple graph is the constructor form the pr
-
Simple graph (no checks) assumes the input is sorted, has no duplicates and the vertices of each edge have values >= 1.
-
Simple graph (always sort) does not check if the vertices are sorted and just sorts them
As you can see, if the input is random, then issort
does not make the code much slower, at least not for large graphs. If the input is already sorted, then using issort
is faster than just sorting. In this case, using a flag that indicates sorted input gives some performance improvements but as I said, I'm not sure if it is worth the potential errors that improper usage could create.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The numbers don't lie. Thank you for checking. issorted
will be at worst an O(n) operation, right?
I agree that the performance difference is not significant enough to worry about at this point. Carry on!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, O(n) where n is the number of input edges.
src/SimpleGraphs/simpledigraph.jl
Outdated
g = SimpleDiGraph{T}() | ||
g.fadjlist = fadjlist | ||
g.badjlist = badjlist | ||
g.ne = cleanupedges!(fadjlist, badjlist) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're calling cleanupedges!
here, which sorts and dedups fadjlist and badjlist, but you've already assigned them to g
. The issue is that the adjacency lists of a SimpleGraph are presumed to be sorted and deduplicated. So if they're unsorted/redundant at the time you assign them to g
, then that's wrong. If they're sorted/unique, then you don't need to re-sort. Since this is likely modifying the backing array (and changing g.fadjlist
which is a reference to it), it should probably come first just for clarity.
…ector or an iterator of edges
…ly/LightGraphs.jl into edgelist_constructor
cc @mschauer, @somil55 and @jpfairbanks for reviews / comments, please? |
Nice. I'll have a look. Could you do performance comparisons with my implementation? |
Your solution is of course more general, so I had to write a wrapper around it:
Then I added some benchmarks for ordered and unordered lists of edges of a complete directed graph
Here are the median running times for n = 70 and n = 7000.
So as long as the data is not ordered and an arbitrary iterator, your solution has the same running time as mine. |
function SimpleDiGraph(edge_list::Vector{SimpleDiGraphEdge{T}}) where T<:Integer | ||
nvg = zero(T) | ||
@inbounds( | ||
for e in edge_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterates twice over edge_list
, this perhaps the right thing to do as it is known to be a vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to do that in order to preallocate the adjacency list as this is much faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
* removed flow algorithms (sbromberger#815) * fixes sbromberger#820, plus tests (sbromberger#821) * change show() for empty graphs (ref JuliaGraphs/MetaGraphs.jl#20 (comment)) (sbromberger#822) * Pull request clique_percolation.jl (sbromberger#826) clique percolation is a method of uncovering the overlapping community structure of complex networks in nature and society * add src/community/clique_percolation.jl * tests in file test/community/clique_percolation.jl * cites the original clique percolation paper * for undirected graphs only using traitfn * in_ / out_ -> in / out (sbromberger#830) * (in,out)_neighbors -> (in,out)neighbors * all_neighbors -> allneighbors * Pull request clique_percolation.jl (sbromberger#826) clique percolation is a method of uncovering the overlapping community structure of complex networks in nature and society * add src/community/clique_percolation.jl * tests in file test/community/clique_percolation.jl * cites the original clique percolation paper * for undirected graphs only using traitfn * revert allneighbors * expected_degree_graph (Chung-Lu model) (sbromberger#832) * Expected degree random graph generator implemented, including tests * algorithm corrected * Missing seed corrected in expected_degree_graph * expected_degree_graph! implemented * Added return in function, comment with references removed, references in docs added (expected_degree_graph) * Update randgraphs.jl minor doc update * Update randgraphs.jl * Fixing problems with MST Kruskal's on SimpleWeightedGraphs (sbromberger#835) * Update kruskal.jl * Update prim.jl * Update kruskal.jl * Update kruskal.jl * Update prim.jl * Update kruskal.jl * Update prim.jl * Update kruskal.jl * Update prim.jl * reverting changes * Revert "reverting changes" This reverts commit ac1760b. * Revert "Update prim.jl" This reverts commit 677f6fa. * Revert "Update kruskal.jl" This reverts commit a0e9c47. * Revert "Update prim.jl" This reverts commit 793bac4. * Revert "Update kruskal.jl" This reverts commit 6114e16. * Revert "Update prim.jl" This reverts commit 551f1e6. * Revert "Update kruskal.jl" This reverts commit 941005e. * Revert "Update kruskal.jl" This reverts commit a404514. * Revert "Update prim.jl" This reverts commit 2d43a60. * Revert "Update kruskal.jl" This reverts commit 4577920. * fix problems with SimpleWeightedGraphs * fix problems with SimpleWeightedGraphs * fix problems with SimpleWeightedGraphs * bipartite_map on 2-order graphs fixed. Added proper tests. Fixed test connected to bipartite_map (sbromberger#836) * Correct pre-allocation of memory in Prim's MST (sbromberger#839) * Improve Kruskal's MST by optimizing Union-Find (sbromberger#843) * add missing backtick (sbromberger#846) * Add greedy_color for Graph Coloring (sbromberger#844) * Add greedy_color for Graph Coloring * Improve Kruskal's MST by optimizing Union-Find (sbromberger#843) * Update README.md * Update README.md * first cut at 0.7 compatibility (sbromberger#848) * using LightGraphs does not error * Switch to LinearAlgebra and SparseArrays std libs * Fix most of linalg tests * Add SharedArrays for distance tests to compile * Add Random and Markdown to stdlibs used * Fix connectivity tests * IntSet -> BitSet * Add DelimitedFiles stdlib for readcsv * Fix failing test * first cut * Use mauro/SimpleTraits.jl/m3/where-fns in tests * Fix SimpleTraits checkout (sbromberger#851) * Move up SimpleTraits checkout (sbromberger#852) * Update runtests.jl * Update REQUIRE * Update REQUIRE * femtocleaner with additional depwarn fixes (sbromberger#856) fix deprecation warnings based on local femtocleaner run * use equalto in degeneracy.jl (sbromberger#858) * fix depwarns in linalg submodule (sbromberger#860) * update linalg/spectral to fix deprecations * fix depwarns in graphmatrices * fixes doc deprecation warnings (sbromberger#861) * fixes doc deprecation warnings * adding Base64 to runtests * Update README.md * remove add/remove vertex/edge from core, minor bug fix (sbromberger#862) * remove add/remove vertex/edge from core, minor bug fix * fix tests * export add/rem vertex * remove long-term deprecation warnings (sbromberger#863) * uninitialized -> undef, blkdiag -> blockdiag, and removed import of d… (sbromberger#865) * uninitialized -> undef, blkdiag -> blockdiag, and removed import of deprecated functions from LG to LinAlg * test coverage for digraph eltype * removes equalto (sbromberger#867) * optional sorting algorithm for gdistances (sbromberger#868) add the ability to pass RadixSort to gdistances! * update url and mention directed graphs explicilty (sbromberger#837) * update url and mention directed graphs explicilty * Update graphtypes.md * Update graphtypes.md fixed references. * Speed improvements for function transitiveclosure! (sbromberger#870) * Speed improvements for function transitiveclosure! Instead of checking for all paths i -> k and k -> j for a given vertex k we only iterate over the in- and outneighbours of k. * Merged some conditionals into a single statement * Cache efficient Floyd Warshall (sbromberger#873) * Update floyd-warshall.jl * Cache efficient Floyd Warshall * fixed an error where smallgraph(:frucht) had 20 vertices instead of 12 (sbromberger#878) * Delete .Rhistory * Added function transitivereduction (sbromberger#877) * added function transitivereduction * Update transitivity.jl docstring formatting * Fixed some tests && added testdigraphs for all tests * Johnson Shortest Path for Sparse Graphs (sbromberger#884) * Johnson Shortest Path for Sparse Graphs Johnson Shortest Path for Sparse Graphs * Improved memory efficiency if distmx is mutable * Improved memory efficiency for parallel implementation * Update index.md * Added constructors to create graphs from a vector or an iterator of edges (sbromberger#880) * Added constructors to create SimpleGraphs and SimpleDiGraphs from a vector or an iterator of edges * Added constructors to create SimpleGraphs and SimpleDiGraphs from a vector or an iterator of edges * Slyles1001/892 (sbromberger#894) * comments are your friend * Most of LightGraphs warnings are fixed * Delete HITS.jl * Slyles1001/872 (sbromberger#891) * DataStructures fixed * missed heappop!, now it tests clean * spaces * Update LightGraphs.jl * Update runtests.jl * fixes most depwarns as of 20180529 (sbromberger#895) * fixes most depwarns as of 20180529 * graphmatrices problems * remove tabs * tabs, again * Update CONTRIBUTING.md (sbromberger#897) * Improve Kruskal and use in-built disjoint set data structure (sbromberger#896) * Improve Kruskal and use in-built disjoint set data structure * Update kruskal.jl Changes requested by @somil55
…dges (#880) * Added constructors to create SimpleGraphs and SimpleDiGraphs from a vector or an iterator of edges * Added constructors to create SimpleGraphs and SimpleDiGraphs from a vector or an iterator of edges
* ignore coverage file * merge master (#2) * removed flow algorithms (#815) * fixes #820, plus tests (#821) * change show() for empty graphs (ref JuliaGraphs/MetaGraphs.jl#20 (comment)) (#822) * Pull request clique_percolation.jl (#826) clique percolation is a method of uncovering the overlapping community structure of complex networks in nature and society * add src/community/clique_percolation.jl * tests in file test/community/clique_percolation.jl * cites the original clique percolation paper * for undirected graphs only using traitfn * in_ / out_ -> in / out (#830) * (in,out)_neighbors -> (in,out)neighbors * all_neighbors -> allneighbors * Pull request clique_percolation.jl (#826) clique percolation is a method of uncovering the overlapping community structure of complex networks in nature and society * add src/community/clique_percolation.jl * tests in file test/community/clique_percolation.jl * cites the original clique percolation paper * for undirected graphs only using traitfn * revert allneighbors * expected_degree_graph (Chung-Lu model) (#832) * Expected degree random graph generator implemented, including tests * algorithm corrected * Missing seed corrected in expected_degree_graph * expected_degree_graph! implemented * Added return in function, comment with references removed, references in docs added (expected_degree_graph) * Update randgraphs.jl minor doc update * Update randgraphs.jl * Fixing problems with MST Kruskal's on SimpleWeightedGraphs (#835) * Update kruskal.jl * Update prim.jl * Update kruskal.jl * Update kruskal.jl * Update prim.jl * Update kruskal.jl * Update prim.jl * Update kruskal.jl * Update prim.jl * reverting changes * Revert "reverting changes" This reverts commit ac1760b. * Revert "Update prim.jl" This reverts commit 677f6fa. * Revert "Update kruskal.jl" This reverts commit a0e9c47. * Revert "Update prim.jl" This reverts commit 793bac4. * Revert "Update kruskal.jl" This reverts commit 6114e16. * Revert "Update prim.jl" This reverts commit 551f1e6. * Revert "Update kruskal.jl" This reverts commit 941005e. * Revert "Update kruskal.jl" This reverts commit a404514. * Revert "Update prim.jl" This reverts commit 2d43a60. * Revert "Update kruskal.jl" This reverts commit 4577920. * fix problems with SimpleWeightedGraphs * fix problems with SimpleWeightedGraphs * fix problems with SimpleWeightedGraphs * bipartite_map on 2-order graphs fixed. Added proper tests. Fixed test connected to bipartite_map (#836) * Correct pre-allocation of memory in Prim's MST (#839) * Improve Kruskal's MST by optimizing Union-Find (#843) * add missing backtick (#846) * Add greedy_color for Graph Coloring (#844) * Add greedy_color for Graph Coloring * Improve Kruskal's MST by optimizing Union-Find (#843) * Update README.md * Update README.md * first cut at 0.7 compatibility (#848) * using LightGraphs does not error * Switch to LinearAlgebra and SparseArrays std libs * Fix most of linalg tests * Add SharedArrays for distance tests to compile * Add Random and Markdown to stdlibs used * Fix connectivity tests * IntSet -> BitSet * Add DelimitedFiles stdlib for readcsv * Fix failing test * first cut * Use mauro/SimpleTraits.jl/m3/where-fns in tests * Fix SimpleTraits checkout (#851) * Move up SimpleTraits checkout (#852) * Update runtests.jl * Update REQUIRE * Update REQUIRE * femtocleaner with additional depwarn fixes (#856) fix deprecation warnings based on local femtocleaner run * use equalto in degeneracy.jl (#858) * fix depwarns in linalg submodule (#860) * update linalg/spectral to fix deprecations * fix depwarns in graphmatrices * fixes doc deprecation warnings (#861) * fixes doc deprecation warnings * adding Base64 to runtests * Update README.md * remove add/remove vertex/edge from core, minor bug fix (#862) * remove add/remove vertex/edge from core, minor bug fix * fix tests * export add/rem vertex * remove long-term deprecation warnings (#863) * uninitialized -> undef, blkdiag -> blockdiag, and removed import of d… (#865) * uninitialized -> undef, blkdiag -> blockdiag, and removed import of deprecated functions from LG to LinAlg * test coverage for digraph eltype * removes equalto (#867) * optional sorting algorithm for gdistances (#868) add the ability to pass RadixSort to gdistances! * update url and mention directed graphs explicilty (#837) * update url and mention directed graphs explicilty * Update graphtypes.md * Update graphtypes.md fixed references. * Speed improvements for function transitiveclosure! (#870) * Speed improvements for function transitiveclosure! Instead of checking for all paths i -> k and k -> j for a given vertex k we only iterate over the in- and outneighbours of k. * Merged some conditionals into a single statement * Cache efficient Floyd Warshall (#873) * Update floyd-warshall.jl * Cache efficient Floyd Warshall * fixed an error where smallgraph(:frucht) had 20 vertices instead of 12 (#878) * Delete .Rhistory * Added function transitivereduction (#877) * added function transitivereduction * Update transitivity.jl docstring formatting * Fixed some tests && added testdigraphs for all tests * Johnson Shortest Path for Sparse Graphs (#884) * Johnson Shortest Path for Sparse Graphs Johnson Shortest Path for Sparse Graphs * Improved memory efficiency if distmx is mutable * Improved memory efficiency for parallel implementation * Update index.md * Added constructors to create graphs from a vector or an iterator of edges (#880) * Added constructors to create SimpleGraphs and SimpleDiGraphs from a vector or an iterator of edges * Added constructors to create SimpleGraphs and SimpleDiGraphs from a vector or an iterator of edges * Slyles1001/892 (#894) * comments are your friend * Most of LightGraphs warnings are fixed * Delete HITS.jl * Slyles1001/872 (#891) * DataStructures fixed * missed heappop!, now it tests clean * spaces * Update LightGraphs.jl * Update runtests.jl * fixes most depwarns as of 20180529 (#895) * fixes most depwarns as of 20180529 * graphmatrices problems * remove tabs * tabs, again * Update CONTRIBUTING.md (#897) * Improve Kruskal and use in-built disjoint set data structure (#896) * Improve Kruskal and use in-built disjoint set data structure * Update kruskal.jl Changes requested by @somil55 * updated syntax for iterator protocol * fixed iterator, broken inference
As mentioned in #875 , I have created the following constructors:
Some shortcomings of my approach:
SimpleGraph(edge_list)
andSimpleDiGraph(edge_list)
first evaluate how much memory to allocate. For this, they have to temporarily allocate some memory in the order of O(|V|).SimpleGraphFromIterator(iter)
andSimpleDiGraphFromIterator(iter)
can not be inferred. The problem is, that one cannot easily infer the element type of aGenerator.
If one looks at how the constructor forSet
is implemented, they can see that is somehow possible, but this is not trivial and goes beyond my knowledge of Julia's internals.