Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node names in vertex dataframe type not consistent with ones generated from edge in Graph.DataFrame method #347

Closed
alexwwang opened this issue Dec 3, 2020 · 5 comments
Assignees

Comments

@alexwwang
Copy link

Describe the bug
In Graph.DataFrame method, on line 3125

3125              if len(np.setdiff1d(names_edges, names_vertices)):

this is used to check if number of nodes passed in from vertices dataframe equals to ones counted from edges.

If we don't set the type of source and target columns in edge dataframe to str, but leave them to int,
this will raise the error below:

3126                  raise ValueError( 
3127                      'Some vertices in the edge DataFrame are missing from vertices DataFrame') 

However no document mentioned this before.

To reproduce

import pandas as pd
import igraph
igraph.Graph.DataFrame(
    pd.DataFrame({'source':[1,2,3], 'target':[4,5,6]}),
    directed=True,
    vertices=pd.DataFrame({'node':[1,2,3,4,5,6], 'label':['1', '2', '3', '4', '5', '6']})
)

Version information
pip install python-igraph
version: 0.8.3

@ntamas
Copy link
Member

ntamas commented Dec 3, 2020

Summon @iosonofabio -- can you please check when you have some time?

@iosonofabio
Copy link
Member

iosonofabio commented Dec 3, 2020 via email

@iosonofabio
Copy link
Member

iosonofabio commented Dec 3, 2020

@alexwwang almost correct. That line checks if the specified vertices contain the ones inferred from the edges, there could be more vertices that have no edges (singletons).

Independent of that there is a bug. It is a little unclear what the expected behaviour is though. The first two columns of the edges argument can refer to two different things:

  1. vertex ids, i.e. 0,1,2,3...
  2. vertex names as specified in the vertices argument, in this case 1,2,3,4...

ATM we force vertex names to be strings: I wouldn't want to change that. In that sense, the user here specified a wrong input for vertices since they were not strings to start with (@alexwwang not your fault, it isn't documented properly).

My suggestion is that if the user specified vertices, then the first two columns of edges should have the vertex names, otherwise the vertex ids. Another convention would be: if the edges columns have dtype string take them as the names, else as ids.

Any opinion on the table? Once we decide the fix is 5 minutes.

@iosonofabio
Copy link
Member

@ntamas @vtraag @szhorvat if any of you wanna quickly jump on Teams I'd be happy to decide and fix

@ntamas
Copy link
Member

ntamas commented Dec 3, 2020

ATM we force vertex names to be strings: I wouldn't want to change that.

That's okay, this is a convention that is used throughout the entire library. In the long term I would be happier if we could hide the existence of numeric vertex IDs completely so the user would be able to use arbitrary Python objects as vertices (just like in NetworkX), but that would require a complete overhaul of the API of python-igraph, so it is not going to happen any time soon.

Any opinion on the table? Once we decide the fix is 5 minutes.

For the record, we agreed this morning on Teams that we will have an extra argument that lets the user decide whether the vertices are referred to with numeric IDs or not. This will be released with the next version of python-igraph. PR #348 addresses the issue; I'll merge it soon.

@ntamas ntamas closed this as completed Dec 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants