Refactor and clean format conversions. #152
cc @ahwillia I encourage you to take a look, if you can.
Codecov Report
@@ Coverage Diff @@
## master #152 +/- ##
==========================================
+ Coverage 96.42% 96.89% +0.47%
==========================================
Files 10 10
Lines 1174 1191 +17
==========================================
+ Hits 1132 1154 +22
+ Misses 42 37 -5
cc @mrocklin Is it possible to have a quicker review on this? It's blocking my progress on fill-values.
My apologies for the delay. I'm at a conference this week and may not have
a ton of time. I'll put this high-ish on my todo list, but it's a slow
moving list at the moment. My apologies for being a bottleneck.
Ah, take your time, in that case.
@jcrist *might* be interested in starting reviewing some code in this
project.
It's always awesome to have more people involved! Developers, users, reviewers, or any other kind of role. 😃
These changes generally look good to me. Now that broadcasting works in the If we were to keep |
I hadn't added data broadcasting yet when you last commented, but I just did (along with a docs update). I'll leave the decision of whether or not to add In that light, |
I agree with this 👍
This seems good to me. A few small comments. My apologies for the late review.
sparse/coo/core.py
Outdated
x = list(x.items())

if len(x) != 2 and not all(len(item) == 2 for item in x):
    raise ValueError('Invalid iterable to convert to COO.')
Is it possible that the user will give an Iterator rather than an Iterable? If so, this check will consume our data. We might want to defensively convert to a tuple or list.
I'm not too familiar with the differences between the two. Will this check break or is it just a performance consideration?
Iterators are ephemeral and get consumed on first use. They are common, for example, when reading a sequence of data from a file. Here is an example that creates an iterator (seq) from an iterable (L), does a check, and then discovers that some values have been removed:
In [1]: L = [1, 2, 3]
In [2]: seq = iter(L)
In [3]: if all(x > 100 for x in seq):
   ...:     print("hello")
   ...:
In [4]: list(seq)
Out[4]: [2, 3]
In [5]: from collections.abc import Iterator, Iterable
In [7]: isinstance(seq, Iterator)
Out[7]: True
In [8]: isinstance(L, Iterable)
Out[8]: True
In [9]: L
Out[9]: [1, 2, 3]
Ah, I understand now. I don't think making this a supported input is worth the extra effort.
I suspect that users will use iterators here whether or not we support it. Currently we silently fail. We should either fail loudly:

    if isinstance(x, Iterator):
        raise TypeError("...")

Or we should coerce to a list:

    if isinstance(x, Iterator):
        x = list(x)
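A minimal sketch of the second option, to show why coercion keeps the later length check safe. The helper name `as_reusable` and the sample data are made up for illustration; they are not part of the library.

```python
from collections.abc import Iterator


def as_reusable(x):
    """Return x unchanged unless it is a one-shot iterator (hypothetical helper)."""
    if isinstance(x, Iterator):
        # Materialize once so later validation passes do not consume the data.
        return list(x)
    return x


# Illustrative (coords, value) pairs supplied as an iterator.
pairs = as_reusable(iter([((0, 0), 1.0), ((1, 2), 3.0)]))

# The validation pass no longer exhausts the input.
valid = all(len(item) == 2 for item in pairs)
remaining = len(pairs)
```

Inputs that are already lists, tuples, or other re-iterable containers pass through untouched, so the copy is only paid for genuine iterators.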
Well, if we have to add code, I guess supporting it is better, so I did that.
sparse/coo/core.py
Outdated
if isinstance(x, Iterable):
    return COO.from_iter(x, shape=shape)

raise NotImplementedError('Format not supported for conversion.')
It might be helpful to list here what the user provided, along with the set of valid values.
Done!
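For illustration, one way such a message could be built. This is only a sketch: the helper `unsupported_format_error` is hypothetical, and the tuple of accepted types is a placeholder rather than the list the library actually supports.

```python
def unsupported_format_error(x, supported):
    """Build an error naming the offending type and the accepted ones (hypothetical helper)."""
    names = ", ".join(sorted(t.__name__ for t in supported))
    return NotImplementedError(
        "Format not supported for conversion. Got {!r}; expected one of: {}.".format(
            type(x).__name__, names
        )
    )


# Placeholder types for illustration only; not the library's real list.
err = unsupported_format_error(3.5, (dict, list, tuple))
message = str(err)
```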
@@ -20,7 +20,7 @@ class SparseArray(object):

     def __init__(self, shape):
         if not isinstance(shape, Iterable):
-            shape = (int(shape),)
+            shape = (shape,)
Why this change?
If we provided a simple unsupported class that wasn't Iterable as a shape, this would break but give an unhelpful error message about not being able to cast to int. This way, the error message is more informative.
I tend to bundle small bugfixes into big PRs in order to get them in quickly. I don't really know how good a habit this is.
It is common for numpy integers to sneak in here. Sometimes these cause issues later on. It would be good to ensure that all values of shape are eventually coerced to be normal Python integers.
I anticipated this already. This is already done, a few lines below this one. :-)
It was being done here an extra time before the check. I just changed that so it happens after the check, and gives a meaningful error message.
Lines 21 to 29 in b9fc91c
    def __init__(self, shape):
        if not isinstance(shape, Iterable):
            shape = (int(shape),)
        if not all(isinstance(l, Integral) and int(l) >= 0 for l in shape):
            raise ValueError('shape must be an non-negative integer or a tuple '
                             'of non-negative integers.')
        self.shape = tuple(int(l) for l in shape)
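The reordering described above can be sketched as a standalone function. The name `normalize_shape` is hypothetical and only mirrors the quoted lines with the early `int(...)` cast removed; it is not the class's actual method.

```python
from collections.abc import Iterable
from numbers import Integral


def normalize_shape(shape):
    """Standalone, hypothetical version of the shape handling quoted above."""
    if not isinstance(shape, Iterable):
        # Wrap without casting, so unsupported inputs reach the check below.
        shape = (shape,)
    if not all(isinstance(l, Integral) and int(l) >= 0 for l in shape):
        raise ValueError('shape must be a non-negative integer or a tuple '
                         'of non-negative integers.')
    # Final coercion: every accepted entry ends up as a plain Python int.
    return tuple(int(l) for l in shape)


ok = normalize_shape(5)

try:
    normalize_shape(2.5)  # a float is neither Iterable nor Integral
except ValueError as exc:
    failure = str(exc)
```

With the cast deferred, a non-integer scalar fails the `Integral` check and produces the descriptive `ValueError`, while valid entries are still converted to plain `int` at the end.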
Any further changes required here?
No objection from me |
Closes #151