Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DFS and DFS iterator #315

Merged
merged 4 commits into from
Aug 3, 2020
Merged

DFS and DFS iterator #315

merged 4 commits into from
Aug 3, 2020

Conversation

iosonofabio
Copy link
Member

This PR aims at addressing #295.

In that issue a trivial Python implementation using recursion is proposed. That would be simplest, of course, but potentially runs into a few issues:

  • max depth of the Python call stack
  • perhaps speed

The way BFS is coded is via a two-tier system:

  • a function that returns a whole list of the traversal, which relies on the C core igraph_i_bfs
  • an iterator that reimplements the basic algorithm (a queue)

Both live at the C API interface with Python and are not recursive, therefore should be more performant than an equivalent, pure Python recursive call.

This PR aims at either of the following three:

  1. implementing DFS and its iterator the same way that BFS is implemented
  2. take down the whole infrastructure for BFS and substitute it with a simple pure-python recursive call and do the same for DFS
  3. leave BFS the way it is and use the pure Python recursion for DFS, along the lines of "better slow than nothing"

Any preferences out there among these?

@iosonofabio
Copy link
Member Author

Alright, I implemented two things:

  1. Graph.DFSIter at the C/Python interface similar to BFSIter (basically using a stack instead of a queue)
  2. Graph.dfs in pure Python but without recursion (also using a stack). This is a temporary implementation until we have a working igraph_i_dfs fuction in C, for which I started igraph_i_dfs igraph#1451. Notice this is not a lazy generator but rather an immediate function: same as Graph.bfs. I suppose the XFSIter are supposed to be the lazy versions.

I also added tests to for these guys and also one test for Graph.bfs which was missing.

There is one thing I am not quite sure about in terms of API. Graph.bfs has these layers, but it seems like the code was written somewhat confusingly. The C/Python interface file seems to think this vector should have length n (number of nodes) and initialized that way, but then the vector gets blissfully cleared inside of igraph_i_bfs and has a variable length - the meaning of which escapes me.

Since I didn't understand it I did not add any layers to DFS. I don't think it's too important as we can use DFSIter to return the distance and parents anyway, but if you have an illuminating explanation I'd love to hear it.

Otherwise ready to merge.

@iosonofabio iosonofabio marked this pull request as ready for review July 29, 2020 23:21
@iosonofabio iosonofabio requested a review from ntamas July 29, 2020 23:21
@ntamas
Copy link
Member

ntamas commented Jul 31, 2020

Thanks a lot for this PR! Yes, the problem with BFS and DFS in general is that the "pythonic" way of doing these is via iterators or generator functions, and neither of them can be connected with the C interface because the C core is fundamentally different (it calculates the whole BFS / DFS in advance). That's why we have .bfs() and BFSIter simultaneously, so the approach you have chosen is correct.

I'll start reviewing this soon.

src/_igraph/dfsiter.c Outdated Show resolved Hide resolved
src/_igraph/graphobject.c Outdated Show resolved Hide resolved
while stack:
vid = stack[-1]
if mode == IN:
edges = self.vs[vid].in_edges()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we simply use self.neighbors(vid, mode=mode) here? This would avoid creating a Vertex object (the output of self.vs[vid]) and would also simplify the code that comes later when we have to figure out which is the "other" endpoint of the edge.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another trick that comes into my mind: we could have another stack where we push the retrieved neighbors of vid when it is visited for the first time. Then, whenever we need to find the next neighbor of vid to visit, we simply pop off the last element of the list. If the list is empty, there are no more neighbors of vid to visit so we can backtrack.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright I use neighbors now, good catch.

I tried playing a little with a different stack with all new neighbors, but it didn't make much of a change in terms of speed. I guess we could use a heap to keep the things sorted by distance and parent, but I'm not sure it improves a lot considering it's pure Python anyway.

Copy link
Member

@ntamas ntamas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good - the only suggestion that should seriously be considered is the usage of self.neighbors(vid) instead of self.vs[vid].out_edges() (or .in_edges()) to retrieve the neighbors. This is a low-hanging fruit that would also simplify some of the code that follows after it.

@iosonofabio
Copy link
Member Author

Good, I made the neighbors and all other changes, ready to merge. If you want to elaborate a bit on the "second stack" idea I'm happy to hear, else let's merge.

@ntamas ntamas merged commit 5267860 into igraph:master Aug 3, 2020
@ntamas
Copy link
Member

ntamas commented Aug 3, 2020

Okay, let's merge this now and I might experiment a bit with the idea that I outlined above.

@ntamas
Copy link
Member

ntamas commented Aug 4, 2020

FYI, this is what I was thinking about: 5e4a5e2

@iosonofabio
Copy link
Member Author

iosonofabio commented Aug 4, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants