Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

COO constructor behavior #151

Closed
ahwillia opened this issue May 5, 2018 · 3 comments
Closed

COO constructor behavior #151

ahwillia opened this issue May 5, 2018 · 3 comments

Comments

@ahwillia
Copy link
Contributor

ahwillia commented May 5, 2018

The COO constructor currently accepts many inputs. While this flexibility is nice, it isn't fully documented yet and we might want to modify it further. Here are a couple of points to start the discussion


  1. Should COO accept scalar input for data (new functionality proposed in add full function to simplify COO creation #150)?
COO(coords, data=5)   # creates sparse array with 5s at all nonzero entries

  1. Should COO convert dense numpy arrays or should we force COO.from_numpy(...) to be called separately?
x = np.random.randint(10, size=(10,1000))
x[x < 5] = 0
s = COO(x)   # is x supposed to be interpreted as coords or as data?
s = COO.from_numpy(x)  # seems a bit more explicit to me

If we added a dtype option to the constructor I think this would be even more ambiguous.

# cast `x` to bool array or create sparse array of True at indices specified by x?
s = COO(x, dtype=bool)

I suppose my confusion here stems from the assumption that calling s = COO(coords, dtype=...) without specifying data would initialize self.data with np.empty(dtype=dtype)


  1. Other input forms {(i, j, k): x, (i, j, k): y, ...} or [((i, j, k), value), (i, j, k), value), ...] are also accepted but don't seem to be documented? Should these be accepted to the constructor or should we provide class method for it (e.g. COO.from_dict(...))
@hameerabbasi
Copy link
Collaborator

I'm +1 on all counts.

What I propose (the constructor doesn't support this unless otherwise stated):

  • COO.from_numpy for Numpy Arrays
  • COO.from_iter for dicts/iterables
  • Constructor for making something from coords/data
  • COO.from_sparse_array for converting from DOK and other future formats.
  • COO.from_scipy_sparse for converting from scipy.sparse.spmatrix
  • sparse.as_coo for converting any given form to COO
  • sparse.full for functionality proposed in add full function to simplify COO creation #150, but it should be supported in the constructor as well.

cc @mrocklin if you're +1 on this, I can put together a PR.

@mrocklin
Copy link
Collaborator

mrocklin commented May 7, 2018

Happy either way. Explicit methods are probably nicer on maintainers, all-in-one constructors are probably nicer for users. Numpy and Pandas mostly went with the all-in-one constructors approach. It has pros and cons.

@hameerabbasi
Copy link
Collaborator

Numpy has a clear definition of what array_like is for the ndarray constructor, unlike us. Pandas does something similar to what we're doing now, where the first item can have multiple meanings.

One thing we can do, however, is better document the currently supported arguments, separate them out into the functions I laid out above so that the constructor can just dispatch to sparse.as_coo. That way, we can support all cases while keeping the maintenance burden down.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants