Skip to content
This repository has been archived by the owner on Oct 8, 2021. It is now read-only.

Added constructors to create graphs from a vector or an iterator of edges #880

Merged
merged 5 commits into from
May 23, 2018

Conversation

simonschoelly
Copy link
Contributor

As mentioned in #875 , I have created the following constructors:

SimpleGraph(edge_list::Vector{SimpleGraphEdge{T}}) where T<:Integer
SimpleGraphFromIterator(iter)
SimpleDiGraph(edge_list::Vector{SimpleDiGraphEdge{T}}) where T<:Integer
SimpleDiGraphFromIterator(iter)

Some shortcomings of my approach:

  1. SimpleGraph(edge_list) and SimpleDiGraph(edge_list) first evaluate how much memory to allocate. For this, they have to temporarily allocate some memory in the order of O(|V|).
  2. The return type of SimpleGraphFromIterator(iter) and SimpleDiGraphFromIterator(iter) can not be inferred. The problem is, that one cannot easily infer the element type of a Generator. If one looks at how the constructor for Set is implemented, they can see that is somehow possible, but this is not trivial and goes beyond my knowledge of Julia's internals.
  3. The constructors are conservative and assume that the input is neither sorted nor free of duplicates. As the constructors probably spend most of their time allocating memory, this should not be that big of a problem though.

@codecov
Copy link

codecov bot commented May 11, 2018

Codecov Report

Merging #880 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #880      +/-   ##
==========================================
+ Coverage   99.67%   99.69%   +0.01%     
==========================================
  Files          60       60              
  Lines        2160     2281     +121     
==========================================
+ Hits         2153     2274     +121     
  Misses          7        7

Copy link
Owner

@sbromberger sbromberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More comments later. Thanks.

neg = 0
for v in 1:length(fadjlist)
if !issorted(fadjlist[v])
sort!(fadjlist[v])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check can be expensive. Might as well just sort and be done with it, or pass in a boolean argument to denote whether we can assume the vectors are sorted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think calling issorted is that expensive; especially compared to sort! . When I wrote that function I used the profiler to check where the bottlenecks are and issort was definitely not the problem. unique! actually also calls this function, which is unnecessary in our case as we know that the vector must be sorted.

I could implement a flag to indicate that the input is sorted and otherwise correct but then the problem is that a user could actually produce an invalid graph. I think, if someone really need to optimize for the last bits of performance, then they are better of by creating fadjlist and badjlist themself. I tested that by writing a function that builds a complete graph and for 10^4 vertices I am a factor of 2.5 times faster than by creating a vector of edges and calling SimpleGraph with sorting disabled.

Then I wrote some code to see what the performance is, if we don't sort/don't check for sorted input:
Here are the results for 100 and 10^4 vertices:

julia> run_speed_tests(100; ntimes=50)
Sorted Input:
  SimpleGraph                  : mean == 0.000229s  std == 1.61e-5s
  SimpleGraph (no checks)       : mean == 0.00258s  std == 0.0174s
  SimpleGraph (always sort)    : mean == 0.000119s  std == 2.1e-5s
Shuffled Input:
  SimpleGraph                  : mean == 0.000307s  std == 2.2e-5s
  SimpleGraph (always sort)    : mean == 0.000325s  std == 2.03e-5s
julia> run_speed_tests(10000; ntimes=50)
Sorted Input:
  SimpleGraph                  : mean == 1.44s  std == 0.0452s
  SimpleGraph (no checks)       : mean == 1.16s  std == 0.0314s
  SimpleGraph (always sort)    : mean == 1.97s  std == 0.0348s
Shuffled Input:
  SimpleGraph                  : mean == 7.16s  std == 0.079s
  SimpleGraph (always sort)    : mean == 7.26s  std == 0.068s
  • Sorted input constructs a complete graph from a sorted list of edges

  • Shuffled Input constructs the same graph when the list is randomly shuffled

  • Simple graph is the constructor form the pr

  • Simple graph (no checks) assumes the input is sorted, has no duplicates and the vertices of each edge have values >= 1.

  • Simple graph (always sort) does not check if the vertices are sorted and just sorts them

As you can see, if the input is random, then issort does not make the code much slower, at least not for large graphs. If the input is already sorted, then using issort is faster than just sorting. In this case, using a flag that indicates sorted input gives some performance improvements but as I said, I'm not sure if it is worth the potential errors that improper usage could create.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The numbers don't lie. Thank you for checking. issorted will be at worst an O(n) operation, right?

I agree that the performance difference is not significant enough to worry about at this point. Carry on!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, O(n) where n is the number of input edges.

g = SimpleDiGraph{T}()
g.fadjlist = fadjlist
g.badjlist = badjlist
g.ne = cleanupedges!(fadjlist, badjlist)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're calling cleanupedges! here, which sorts and dedups fadjlist and badjlist, but you've already assigned them to g. The issue is that the adjacency lists of a SimpleGraph are presumed to be sorted and deduplicated. So if they're unsorted/redundant at the time you assign them to g, then that's wrong. If they're sorted/unique, then you don't need to re-sort. Since this is likely modifying the backing array (and changing g.fadjlist which is a reference to it), it should probably come first just for clarity.

@sbromberger
Copy link
Owner

cc @mschauer, @somil55 and @jpfairbanks for reviews / comments, please?

@mschauer
Copy link
Contributor

Nice. I'll have a look. Could you do performance comparisons with my implementation?

@simonschoelly
Copy link
Contributor Author

Your solution is of course more general, so I had to write a wrapper around it:

using LightGraphs
using Random
using BenchmarkTools

function SimpleDiGraphAddEdges(edge_list::Vector{Tuple{T, T}}) where T<:Integer
    g = SimpleDiGraph{T}()
    add_edges!(g, edge_list)
    return g
end

Then I added some benchmarks for ordered and unordered lists of edges of a complete directed graph

el = [e for e in edges(CompleteDiGraph(n))];
el_tuple = Tuple.(el);
el_shuffle = shuffle(el);
el_shuffle_tuple = Tuple.(el_shuffle);

@benchmark SimpleDiGraph(el)
@benchmark SimpleDiGraphFromIterator(el)
@benchmark SimpleDiGraphAddeEdges(el_tuple)
@benchmark SimpleDiGraph(el_shuffle)
@benchmark SimpleDiGraphFromIterator(el_shuffle)
@benchmark SimpleDiGraph(el_shuffle_tuple)

Here are the median running times for n = 70 and n = 7000.

n = 70

SimpleDiGraph(el)                      :  58.939 μs
SimpleDiGraphFromIterator(el)          : 129.210 μs
SimpleDiGraphAddEdges(el_tuple)        : 204.505 μs
SimpleDiGraph(el_shuffle)                : 279.022 μs
SimpleDiGraphFromIterator(el_shuffle)    : 360.221 μs
SimpleDiGraphAddEdges(el_shuffle_tuple)  : 384.589 μs

n = 7000

SimpleDiGraph(el)                      : 1.568 s
SimpleDiGraphFromIterator(el)          : 2.531 s
SimpleDiGraphAddEdges(el_tuple)        : 3.382 s
SimpleDiGraph(el_shuffle)                : 7.542 s
SimpleDiGraphFromIterator(el_shuffle)    : 9.068 s
SimpleDiGraphAddEdges(el_shuffle_tuple)  : 9.444 s

So as long as the data is not ordered and an arbitrary iterator, your solution has the same running time as mine.
Instead of append!(g.fadjlist[s], d) you could have used push!(g.fadjlist[s], d) but I don't know if that would make any difference.

function SimpleDiGraph(edge_list::Vector{SimpleDiGraphEdge{T}}) where T<:Integer
nvg = zero(T)
@inbounds(
for e in edge_list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterates twice over edge_list, this perhaps the right thing to do as it is known to be a vector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to do that in order to preallocate the adjacency list as this is much faster.

Copy link
Contributor

@mschauer mschauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

@sbromberger sbromberger merged commit 3e5053a into sbromberger:master May 23, 2018
matbesancon added a commit to matbesancon/LightGraphs.jl that referenced this pull request Jun 5, 2018
* removed flow algorithms (sbromberger#815)

* fixes sbromberger#820, plus tests (sbromberger#821)

* change show() for empty graphs (ref JuliaGraphs/MetaGraphs.jl#20 (comment)) (sbromberger#822)

* Pull request clique_percolation.jl (sbromberger#826)

clique percolation is a method of uncovering the overlapping community structure of complex networks in nature and society

* add src/community/clique_percolation.jl
* tests in file test/community/clique_percolation.jl
* cites the original clique percolation paper
* for undirected graphs only using traitfn

* in_ / out_ -> in / out (sbromberger#830)

* (in,out)_neighbors -> (in,out)neighbors

* all_neighbors -> allneighbors

* Pull request clique_percolation.jl (sbromberger#826)

clique percolation is a method of uncovering the overlapping community structure of complex networks in nature and society

* add src/community/clique_percolation.jl
* tests in file test/community/clique_percolation.jl
* cites the original clique percolation paper
* for undirected graphs only using traitfn

* revert allneighbors

* expected_degree_graph (Chung-Lu model) (sbromberger#832)

* Expected degree random graph generator implemented, including tests

* algorithm corrected

* Missing seed corrected in expected_degree_graph

* expected_degree_graph! implemented

* Added return in function, comment with references removed, references in docs added (expected_degree_graph)

* Update randgraphs.jl

minor doc update

* Update randgraphs.jl

* Fixing problems with MST Kruskal's on SimpleWeightedGraphs (sbromberger#835)

* Update kruskal.jl

* Update prim.jl

* Update kruskal.jl

* Update kruskal.jl

* Update prim.jl

* Update kruskal.jl

* Update prim.jl

* Update kruskal.jl

* Update prim.jl

* reverting changes

* Revert "reverting changes"

This reverts commit ac1760b.

* Revert "Update prim.jl"

This reverts commit 677f6fa.

* Revert "Update kruskal.jl"

This reverts commit a0e9c47.

* Revert "Update prim.jl"

This reverts commit 793bac4.

* Revert "Update kruskal.jl"

This reverts commit 6114e16.

* Revert "Update prim.jl"

This reverts commit 551f1e6.

* Revert "Update kruskal.jl"

This reverts commit 941005e.

* Revert "Update kruskal.jl"

This reverts commit a404514.

* Revert "Update prim.jl"

This reverts commit 2d43a60.

* Revert "Update kruskal.jl"

This reverts commit 4577920.

* fix problems with SimpleWeightedGraphs

* fix problems with SimpleWeightedGraphs

* fix problems with SimpleWeightedGraphs

* bipartite_map on 2-order graphs fixed. Added proper tests. Fixed test connected to bipartite_map (sbromberger#836)

* Correct pre-allocation of memory in Prim's MST (sbromberger#839)

* Improve Kruskal's MST by optimizing Union-Find (sbromberger#843)

* add missing backtick (sbromberger#846)

* Add greedy_color for Graph Coloring (sbromberger#844)

* Add greedy_color for Graph Coloring

* Improve Kruskal's MST by optimizing Union-Find (sbromberger#843)

* Update README.md

* Update README.md

* first cut at 0.7 compatibility (sbromberger#848)

* using LightGraphs does not error

* Switch to LinearAlgebra and SparseArrays std libs

* Fix most of linalg tests

* Add SharedArrays for distance tests to compile

* Add Random and Markdown to stdlibs used

* Fix connectivity tests

* IntSet -> BitSet

* Add DelimitedFiles stdlib for readcsv

* Fix failing test

* first cut

* Use mauro/SimpleTraits.jl/m3/where-fns in tests

* Fix SimpleTraits checkout (sbromberger#851)

* Move up SimpleTraits checkout (sbromberger#852)

* Update runtests.jl

* Update REQUIRE

* Update REQUIRE

* femtocleaner with additional depwarn fixes (sbromberger#856)

fix deprecation warnings based on local femtocleaner run

* use equalto in degeneracy.jl (sbromberger#858)

* fix depwarns in linalg submodule (sbromberger#860)

* update linalg/spectral to fix deprecations

* fix depwarns in graphmatrices

* fixes doc deprecation warnings (sbromberger#861)

* fixes doc deprecation warnings

* adding Base64 to runtests

* Update README.md

* remove add/remove vertex/edge from core, minor bug fix (sbromberger#862)

* remove add/remove vertex/edge from core, minor bug fix

* fix tests

* export add/rem vertex

* remove long-term deprecation warnings (sbromberger#863)

* uninitialized -> undef, blkdiag -> blockdiag, and removed import of d… (sbromberger#865)

* uninitialized -> undef, blkdiag -> blockdiag, and removed import of deprecated functions from LG to LinAlg

* test coverage for digraph eltype

* removes equalto (sbromberger#867)

* optional sorting algorithm for gdistances (sbromberger#868)

add the ability to pass RadixSort to gdistances!

* update url and mention directed graphs explicilty (sbromberger#837)

* update url and mention directed graphs explicilty

* Update graphtypes.md

* Update graphtypes.md

fixed references.

* Speed improvements for function transitiveclosure! (sbromberger#870)

* Speed improvements for function transitiveclosure!

Instead of checking for all paths i -> k and k -> j for a given vertex k
we only iterate over the in- and outneighbours of k.

* Merged some conditionals into a single statement

* Cache efficient Floyd Warshall (sbromberger#873)

* Update floyd-warshall.jl

* Cache efficient Floyd Warshall

* fixed an error where smallgraph(:frucht) had 20 vertices instead of 12 (sbromberger#878)

* Delete .Rhistory

* Added function transitivereduction (sbromberger#877)

* added function transitivereduction

* Update transitivity.jl

docstring formatting

* Fixed some tests && added testdigraphs for all tests

* Johnson Shortest Path for Sparse Graphs (sbromberger#884)

* Johnson Shortest Path for Sparse Graphs

Johnson Shortest Path for Sparse Graphs

* Improved memory efficiency if distmx is mutable

* Improved memory efficiency for parallel implementation

* Update index.md

* Added constructors to create graphs from a vector or an iterator of edges (sbromberger#880)

* Added constructors to create SimpleGraphs and SimpleDiGraphs from a vector or an iterator of edges

* Added constructors to create SimpleGraphs and SimpleDiGraphs from a vector or an iterator of edges

* Slyles1001/892 (sbromberger#894)

* comments are your friend

* Most of LightGraphs warnings are fixed

* Delete HITS.jl

* Slyles1001/872 (sbromberger#891)

* DataStructures fixed

* missed heappop!, now it tests clean

* spaces

* Update LightGraphs.jl

* Update runtests.jl

* fixes most depwarns as of 20180529 (sbromberger#895)

* fixes most depwarns as of 20180529

* graphmatrices problems

* remove tabs

* tabs, again

* Update CONTRIBUTING.md (sbromberger#897)

* Improve Kruskal and use in-built disjoint set data structure (sbromberger#896)

* Improve Kruskal and use in-built disjoint set data structure

* Update kruskal.jl

Changes requested by @somil55
sbromberger pushed a commit that referenced this pull request Jun 5, 2018
…dges (#880)

* Added constructors to create SimpleGraphs and SimpleDiGraphs from a vector or an iterator of edges

* Added constructors to create SimpleGraphs and SimpleDiGraphs from a vector or an iterator of edges
sbromberger pushed a commit that referenced this pull request Jun 14, 2018
* ignore coverage file

* merge master (#2)

* removed flow algorithms (#815)

* fixes #820, plus tests (#821)

* change show() for empty graphs (ref JuliaGraphs/MetaGraphs.jl#20 (comment)) (#822)

* Pull request clique_percolation.jl (#826)

clique percolation is a method of uncovering the overlapping community structure of complex networks in nature and society

* add src/community/clique_percolation.jl
* tests in file test/community/clique_percolation.jl
* cites the original clique percolation paper
* for undirected graphs only using traitfn

* in_ / out_ -> in / out (#830)

* (in,out)_neighbors -> (in,out)neighbors

* all_neighbors -> allneighbors

* Pull request clique_percolation.jl (#826)

clique percolation is a method of uncovering the overlapping community structure of complex networks in nature and society

* add src/community/clique_percolation.jl
* tests in file test/community/clique_percolation.jl
* cites the original clique percolation paper
* for undirected graphs only using traitfn

* revert allneighbors

* expected_degree_graph (Chung-Lu model) (#832)

* Expected degree random graph generator implemented, including tests

* algorithm corrected

* Missing seed corrected in expected_degree_graph

* expected_degree_graph! implemented

* Added return in function, comment with references removed, references in docs added (expected_degree_graph)

* Update randgraphs.jl

minor doc update

* Update randgraphs.jl

* Fixing problems with MST Kruskal's on SimpleWeightedGraphs (#835)

* Update kruskal.jl

* Update prim.jl

* Update kruskal.jl

* Update kruskal.jl

* Update prim.jl

* Update kruskal.jl

* Update prim.jl

* Update kruskal.jl

* Update prim.jl

* reverting changes

* Revert "reverting changes"

This reverts commit ac1760b.

* Revert "Update prim.jl"

This reverts commit 677f6fa.

* Revert "Update kruskal.jl"

This reverts commit a0e9c47.

* Revert "Update prim.jl"

This reverts commit 793bac4.

* Revert "Update kruskal.jl"

This reverts commit 6114e16.

* Revert "Update prim.jl"

This reverts commit 551f1e6.

* Revert "Update kruskal.jl"

This reverts commit 941005e.

* Revert "Update kruskal.jl"

This reverts commit a404514.

* Revert "Update prim.jl"

This reverts commit 2d43a60.

* Revert "Update kruskal.jl"

This reverts commit 4577920.

* fix problems with SimpleWeightedGraphs

* fix problems with SimpleWeightedGraphs

* fix problems with SimpleWeightedGraphs

* bipartite_map on 2-order graphs fixed. Added proper tests. Fixed test connected to bipartite_map (#836)

* Correct pre-allocation of memory in Prim's MST (#839)

* Improve Kruskal's MST by optimizing Union-Find (#843)

* add missing backtick (#846)

* Add greedy_color for Graph Coloring (#844)

* Add greedy_color for Graph Coloring

* Improve Kruskal's MST by optimizing Union-Find (#843)

* Update README.md

* Update README.md

* first cut at 0.7 compatibility (#848)

* using LightGraphs does not error

* Switch to LinearAlgebra and SparseArrays std libs

* Fix most of linalg tests

* Add SharedArrays for distance tests to compile

* Add Random and Markdown to stdlibs used

* Fix connectivity tests

* IntSet -> BitSet

* Add DelimitedFiles stdlib for readcsv

* Fix failing test

* first cut

* Use mauro/SimpleTraits.jl/m3/where-fns in tests

* Fix SimpleTraits checkout (#851)

* Move up SimpleTraits checkout (#852)

* Update runtests.jl

* Update REQUIRE

* Update REQUIRE

* femtocleaner with additional depwarn fixes (#856)

fix deprecation warnings based on local femtocleaner run

* use equalto in degeneracy.jl (#858)

* fix depwarns in linalg submodule (#860)

* update linalg/spectral to fix deprecations

* fix depwarns in graphmatrices

* fixes doc deprecation warnings (#861)

* fixes doc deprecation warnings

* adding Base64 to runtests

* Update README.md

* remove add/remove vertex/edge from core, minor bug fix (#862)

* remove add/remove vertex/edge from core, minor bug fix

* fix tests

* export add/rem vertex

* remove long-term deprecation warnings (#863)

* uninitialized -> undef, blkdiag -> blockdiag, and removed import of d… (#865)

* uninitialized -> undef, blkdiag -> blockdiag, and removed import of deprecated functions from LG to LinAlg

* test coverage for digraph eltype

* removes equalto (#867)

* optional sorting algorithm for gdistances (#868)

add the ability to pass RadixSort to gdistances!

* update url and mention directed graphs explicilty (#837)

* update url and mention directed graphs explicilty

* Update graphtypes.md

* Update graphtypes.md

fixed references.

* Speed improvements for function transitiveclosure! (#870)

* Speed improvements for function transitiveclosure!

Instead of checking for all paths i -> k and k -> j for a given vertex k
we only iterate over the in- and outneighbours of k.

* Merged some conditionals into a single statement

* Cache efficient Floyd Warshall (#873)

* Update floyd-warshall.jl

* Cache efficient Floyd Warshall

* fixed an error where smallgraph(:frucht) had 20 vertices instead of 12 (#878)

* Delete .Rhistory

* Added function transitivereduction (#877)

* added function transitivereduction

* Update transitivity.jl

docstring formatting

* Fixed some tests && added testdigraphs for all tests

* Johnson Shortest Path for Sparse Graphs (#884)

* Johnson Shortest Path for Sparse Graphs

Johnson Shortest Path for Sparse Graphs

* Improved memory efficiency if distmx is mutable

* Improved memory efficiency for parallel implementation

* Update index.md

* Added constructors to create graphs from a vector or an iterator of edges (#880)

* Added constructors to create SimpleGraphs and SimpleDiGraphs from a vector or an iterator of edges

* Added constructors to create SimpleGraphs and SimpleDiGraphs from a vector or an iterator of edges

* Slyles1001/892 (#894)

* comments are your friend

* Most of LightGraphs warnings are fixed

* Delete HITS.jl

* Slyles1001/872 (#891)

* DataStructures fixed

* missed heappop!, now it tests clean

* spaces

* Update LightGraphs.jl

* Update runtests.jl

* fixes most depwarns as of 20180529 (#895)

* fixes most depwarns as of 20180529

* graphmatrices problems

* remove tabs

* tabs, again

* Update CONTRIBUTING.md (#897)

* Improve Kruskal and use in-built disjoint set data structure (#896)

* Improve Kruskal and use in-built disjoint set data structure

* Update kruskal.jl

Changes requested by @somil55

* updated syntax for iterator protocol

* fixed iterator, broken inference
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants