Implement complement for undirected graphs #237

gabrielelana · 2019-09-21T14:21:03Z

Regarding the last tests we don't know if it's ok for Arbitrary to generate graphs with loops

Fixes #21

Proudly made by @sphaso and @gabrielelana at "Open Source Saturday Milano"

snowleopard · 2019-09-22T21:20:47Z

Could you estimate the complexity of your implementation in terms of time and memory, as well as the size of the resulting expression? I suspect it can only handle very small graphs because of the repeated use of expensive removeEdge, but I may be wrong.

Of course, having an expensive algorithm is much better than having none, so I'm still happy to merge your PR, but I'd like to document the complexity.

we don't know if it's ok for Arbitrary to generate graphs with loops

I think the answer is Yes because we don't rule out self-loops.

It is not entirely clear how complement should handle self-loops. What about just not touching them at all? This allows you to keep the nice property complement . complement = id.

sphaso · 2019-09-23T07:07:36Z

Hi @snowleopard , thanks for getting back :)

I suspect it can only handle very small graphs because of the repeated use of expensive removeEdge, but I may be wrong.

If you deem removeEdge too expensive there are alternative strategies we considered: simply using \\ on the edgeList was one, going through the symmetricRelation sets was another. Our first idea was to XOR the adjancency matrix with a 1 mask, but we went for what we believed was the most "idiomatic" solution

we don't know if it's ok for Arbitrary to generate graphs with loops

I think the answer is Yes because we don't rule out self-loops.

We were indeed a bit puzzled by loops, because then one might start thinking about multigraphs, and the property complement . complement = id won't pass for those either. The only way to handle loops I can think of, would be to re-apply them on the clique, but that can't work for other types of multigraphs.

As you said, an expensive algorithm is better than none, so we're happy to analyze the complexity of complement and keep these as food for thought depending on how you feel about it.

snowleopard · 2019-09-24T22:43:36Z

If you deem removeEdge too expensive

It's linear w.r.t. to the size of the graph expression, which is quite expensive:

http://hackage.haskell.org/package/algebraic-graphs-0.4/docs/Algebra-Graph.html#v:removeEdge

simply using \\ on the edgeList

I think filtering existing edges may indeed be faster, although it would be nice to do some benchmarking. Could you compare your approach with \\?

We were indeed a bit puzzled by loops, because then one might start thinking about multigraphs

The basic Graph data type corresponds to "classic" graphs represented by a pair (V, E), where self-loops are allowed, so we should be able to handle self-loops.

The only way to handle loops I can think of, would be to re-apply them on the clique, but that can't work for other types of multigraphs.

Why don't you simply leave self-loops as is, without removing them? As far as I can see, this is the simplest option.

As you said, an expensive algorithm is better than none, so we're happy to analyze the complexity of complement

Great, please do! And some benchmarking would be nice too: I'm curious how your current implementation based on removeEdge compares to the one based on \\.

sphaso · 2019-09-28T15:33:37Z

Why don't you simply leave self-loops as is, without removing them? As far as I can see, this is the simplest option.

I feel there might be a critical piece of information regarding this library I'm missing. The problem I see is that:

Arbitrary Graph produces graphs with self-loops
clique does not (e.g. clique [x,y,z] == 'edges' [(x,y), (x,z), (y,z)])

So the situation is: I want complement . complement = id to be a valid property even for graphs with self-loops. However, if I use clique as the reference graph for the complement, I don't have the self-loops in the first place.
That's what I meant by "re-apply". If the pseudo code for the complement function is clique edges \\ current edges, I would have to take the self-loops from current edges and re-apply them after the difference operation.

We'll work on the benchmark ASAP.

sphaso · 2019-10-05T10:54:18Z

Hi @snowleopard ,
we finally ran the benchmark and you were right! Data.List.\\ is at least one order of magnitude faster than removeEdge.

Here you can find the benchmark code.
Here you can find the results.

For easier reference, these are the two differing implementations. We took some time to calculate the complexity of the faster one as well as handling loops:

complement :: Ord a => Graph a -> Graph a
complement g@(UG _) = overlay (edges loops) $ foldr (uncurry removeEdge) (cliqueG g) previousEdges
 where cliqueG = clique . vertexList
       previousEdges = edgeList g
       loops = filter (uncurry (==)) previousEdges

-- Complexity: /O(E^2+V)/ time, /O(E+V)/ memory where
-- E is the number of edges and V is the number of vertices
-- The quadratic bound is due to `edges`.
complement' :: Ord a => Graph a -> Graph a
complement' g@(UG _) = overlay (vertices allVertices) (edges complementEdges)
 where cliqueG = clique . vertexList
       allVertices = vertexList g
       previousEdges = edgeList g
       loops = filter (uncurry (==)) previousEdges
       complementEdges = loops ++ (edgeList (cliqueG g) \\ previousEdges)

We were curious about the time complexity of edges and we ran a second benchmark on it. The benchmark results seem to show that edges is actually linear with regard to time.
Here the benchmark code.
Here the results.

What do you think?

If you like our implementation we'll move it to the current PR.
Would you like to keep the benchmarks as well?

snowleopard · 2019-10-06T21:50:15Z

we finally ran the benchmark and you were right! Data.List.\ is at least one order of magnitude faster than removeEdge.

@sphaso Great, thank you for doing this!

We were curious about the time complexity of edges and we ran a second benchmark on it. The benchmark results seem to show that edges is actually linear with regard to time.

Yes, edges is indeed linear with respect to the length of the given list, as stated in the documentation:

http://hackage.haskell.org/package/algebraic-graphs-0.4/docs/Algebra-Graph.html#v:edges

Did you expect its complexity to be non-linear?

If you like our implementation we'll move it to the current PR.

Yes, let's use the faster implementation based on \\.

Responding to your earlier comment: yes, I think what you call "re-applying" will work. Conceptually, it's not very complex: you complement all non-loop edges and make sure to keep the original self-loops in the final graph.

snowleopard · 2019-10-06T21:51:10Z

Would you like to keep the benchmarks as well?

Regarding benchmarks: there is a separate repository for benchmarks, which you can find here:

https://github.com/haskell-perf/graphs

sphaso · 2019-10-07T09:07:22Z

Yes, edges is indeed linear with respect to the length of the given list [...] Did you expect its complexity to be non-linear?

When estimating the complexity of complement we looked at the docs of edges inside the Undirected module where it's stated:

Complexity: /O(L^2)/ time, /O(L)/ memory and size

We'll adjust our estimates of complement accordingly. Would you like us to amend this comment as well?

snowleopard · 2019-10-08T07:57:05Z

@sphaso Oh, that's just wrong. If you correct it to "Complexity: O(L) time, memory and size, where L is the length of the given list." right in this PR that would be great!

gabrielelana · 2019-10-26T08:41:43Z

@snowleopard we ported the faster implementation and amended the complexity of edges, can you check if everything is ok?

src/Algebra/Graph/Undirected.hs

snowleopard · 2019-10-28T22:57:14Z

@gabrielelana Many thanks! The implementation looks good, but see my comments regarding the docs; they should be easy to address.

src/Algebra/Graph/Undirected.hs

sphaso · 2019-10-31T11:05:29Z

@snowleopard I should have addressed all the points you mentioned in the latest comments. Good catch on the quadratic complexity!

src/Algebra/Graph/Undirected.hs

test/Algebra/Graph/Test/Undirected.hs

snowleopard · 2019-11-01T00:18:26Z

@sphaso Thanks for the fixes! The result doesn't seem completely right though, please see my comments.

sphaso · 2019-11-05T21:18:16Z

@snowleopard I pushed a new commit that takes care of at least one concern, see my answer above for the other (time \ memory estimates, for which I already updated it as m^2). Regarding tests, the fact that we're now preserving self-loops rendered two properties false. Good thing we swapped those fixed values :)
Namely:
complement (edge x y) == (vertices [x, y] :: UGI)
is no longer true if x == y.

snowleopard · 2019-11-05T22:58:45Z

Namely:
complement (edge x y) == (vertices [x, y] :: UGI)
is no longer true if x == y.

@sphaso Indeed :-) Thanks!

src/Algebra/Graph/Undirected.hs

sphaso · 2019-11-06T09:09:22Z

@snowleopard hopefully I got everything right this time :) thanks!

snowleopard · 2019-11-06T22:49:43Z

@sphaso Thank you, merged! :)

By the way, feel free to add yourself and @gabrielelana to the list of contributors:

https://github.com/snowleopard/alga/blob/master/AUTHORS.md

Implement complement for undirected graphs

0d8a682

Fixes snowleopard#21

gabrielelana added 2 commits October 26, 2019 10:36

Faster implementation of complement with loops support

159b9eb

Wrong complexity of edges

b5e827a

snowleopard reviewed Oct 28, 2019

View reviewed changes