Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor and clean format conversions. #152

Merged
merged 7 commits into from
May 15, 2018

Conversation

hameerabbasi
Copy link
Collaborator

Closes #151

@hameerabbasi hameerabbasi requested a review from mrocklin May 9, 2018 11:14
@hameerabbasi
Copy link
Collaborator Author

cc @ahwillia I encourage you to take a look, if you can.

@codecov-io
Copy link

codecov-io commented May 9, 2018

Codecov Report

Merging #152 into master will increase coverage by 0.47%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #152      +/-   ##
==========================================
+ Coverage   96.42%   96.89%   +0.47%     
==========================================
  Files          10       10              
  Lines        1174     1191      +17     
==========================================
+ Hits         1132     1154      +22     
+ Misses         42       37       -5
Impacted Files Coverage Δ
sparse/coo/__init__.py 100% <100%> (ø) ⬆️
sparse/coo/core.py 96.36% <100%> (+1.72%) ⬆️
sparse/dok.py 95.57% <100%> (+0.24%) ⬆️
sparse/utils.py 97.05% <100%> (-0.09%) ⬇️
sparse/sparse_array.py 92% <100%> (+0.33%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3afd0e0...5e0f36b. Read the comment docs.

@hameerabbasi
Copy link
Collaborator Author

cc @mrocklin Is it possible to have a quicker review on this? It's blocking my progress on fill-values.

@mrocklin
Copy link
Collaborator

mrocklin commented May 12, 2018 via email

@hameerabbasi
Copy link
Collaborator Author

Ah, take your time, in that case.

@mrocklin
Copy link
Collaborator

mrocklin commented May 12, 2018 via email

@hameerabbasi
Copy link
Collaborator Author

It's always awesome to have more people involved! Developers, users, reviewers, or any other kind of role. 😃

@ahwillia
Copy link
Contributor

These changes generally look good to me. Now that broadcasting works in the COO constructor it seems quite intuitive to me. Does this supercede #150 ? It doesn't seem like there are any conflicts, but maybe sparse.full(...) is now unnecessary / redundant? I'm happy to close that PR if so.

If we were to keep sparse.full(...) would it make sense to add a format keyword? Right now the function is only able to produce a COO array.

@hameerabbasi
Copy link
Collaborator Author

hameerabbasi commented May 13, 2018

I hadn't added data broadcasting yet when you last commented, but I just did (along with a docs update). I'll leave the decision of whether or not to add sparse.full up to you. IMHO, it's unnecessary, and it doesn't match up exactly with np.full. In this library, we try to make things match exactly where possible.

In that light, sparse.full should be an array with all-same values. This will be possible (or rather feasible) after #143 is implemented.

@ahwillia
Copy link
Contributor

In that light, sparse.full should be an array with all-same values.

I agree with this 👍

Copy link
Collaborator

@mrocklin mrocklin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems good to me. A few small comments. My apologies for the late review.

x = list(x.items())

if len(x) != 2 and not all(len(item) == 2 for item in x):
raise ValueError('Invalid iterable to convert to COO.')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that they user will give an Iterator rather than an Iterable? If so then this check will consume our data. We might want to defensively convert to a tuple or list.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too familiar with the differences between the two. Will this check break or is it just a performance consideration?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterators are ephemeral and get consumed on first use. They are common, for example, when reading a sequence of data from a file. Here is an example that creates an iterator (seq) from an iterable (L) does a check, and then discovers that some values have been removed:

In [1]: L = [1, 2, 3]

In [2]: seq = iter(L)

In [3]: if all(x > 100 for x in seq):
   ...:     print("hello")
   ...:     

In [4]: list(seq)
Out[4]: [2, 3]

In [5]: from collections import Iterator, Iterable

In [7]: isinstance(seq, Iterator)
Out[7]: True

In [8]: isinstance(L, Iterable)
Out[8]: True

In [9]: L
Out[9]: [1, 2, 3]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I understand now. I don't think making this a supported input is worth the extra effort.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that users will use iterators here whether or not we support it. Currently we silently fail. We should either fail loudly

if isinstance(x, Iterator):
    raise TypeError("...")

Or we should coerce to a list

if isinstance(x, Iterator):
    x = list(x)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if we have to add code, I guess supporting it is better, so I did that.

if isinstance(x, Iterable):
return COO.from_iter(x, shape=shape)

raise NotImplementedError('Format not supported for conversion.')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be helpful to list here what was provided from the user and a set of valid values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -20,7 +20,7 @@ class SparseArray(object):

def __init__(self, shape):
if not isinstance(shape, Iterable):
shape = (int(shape),)
shape = (shape,)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Collaborator Author

@hameerabbasi hameerabbasi May 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we provided a simple unsupported class that wasn't Iterable as a shape, this would break but give an unhelpful error message about not being able to cast to int. This way, the error message is more informative.

I tend to bunch in small bugfixes along with big PRs in order to get them in quickly, I don't really know how good of a habit this is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is common for numpy integers to sneak in here. Sometimes these cause issues later on. It would be good to ensure that all values of shape are eventually coerced to be normal Python integers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I anticipated this already. This is already done, a few lines below this one. :-)

It was being done here an extra time before the check. I just changed that so it happens after the check, and gives a meaningful error message.

def __init__(self, shape):
if not isinstance(shape, Iterable):
shape = (int(shape),)
if not all(isinstance(l, Integral) and int(l) >= 0 for l in shape):
raise ValueError('shape must be an non-negative integer or a tuple '
'of non-negative integers.')
self.shape = tuple(int(l) for l in shape)

@hameerabbasi
Copy link
Collaborator Author

Any further changes required here?

@mrocklin
Copy link
Collaborator

No objection from me

@hameerabbasi hameerabbasi merged commit 975a202 into pydata:master May 15, 2018
@hameerabbasi hameerabbasi deleted the refactor-coo-creation branch May 15, 2018 15:21
@hameerabbasi
Copy link
Collaborator Author

This is in. Thanks for the review, @mrocklin; and thanks for the idea, @ahwillia!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants