
Allows for merging of SparseDataFrames, and fixes __array__ interface #19488

Closed
wants to merge 25 commits

Conversation

@hexgnu (Contributor) commented Feb 1, 2018

This is still a work in progress, since this is pretty complicated code that I'm trying to unravel.

@pep8speaks commented Feb 1, 2018

Hello @hexgnu! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on February 12, 2018 at 05:07 UTC

dense_merge = dense_evens.merge(dense_threes, how=how, on='A')

# If you merge two dense frames together it tends to default to float64 not the original dtype
dense_merge['B_x'] = dense_merge['B_x'].astype(np.int64, errors='ignore')
Contributor Author

This to me seems kind of bizarre, and I couldn't find an issue for it, but basically:

If you merge the two dense frames I defined above, the dtype goes from int64 to float64. I think I know where the code is that's doing that, so I could fix it, but I didn't want to get too sidetracked in this work.

Contributor

That's since np.nan is the missing value indicator, which is a float. Doing the merge induces missing values for how=left/right/outer, so the ints are cast to floats.
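To make the cause concrete, here is a minimal reproduction of that upcast with small hypothetical frames (these frames are illustrative, not the ones from the PR's tests):

```python
import numpy as np
import pandas as pd

# Hypothetical frames: 'A' values 3 and 9 have no match in right
left = pd.DataFrame({'A': [0, 3, 6, 9], 'B': [1, 2, 3, 4]})
right = pd.DataFrame({'A': [0, 6], 'C': [10, 30]})

out = left.merge(right, how='left', on='A')

# The unmatched rows get np.nan in 'C', so int64 is upcast to float64
print(out['C'].dtype)  # float64
```

Since np.nan is a float, any column that picks up missing values during the join cannot stay int64.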

@@ -73,6 +75,9 @@ def __init__(self, data=None, index=None, columns=None, default_kind=None,
if columns is None:
raise Exception("cannot pass a series w/o a name or columns")
data = {columns[0]: data}
elif isinstance(data, BlockManager):
if default_fill_value is None:
default_fill_value, _ = Counter([b.fill_value for b in data.blocks]).most_common(1)[0]
Contributor Author

I don't know if this is kosher or not...

Basically I kept running into this issue:

If you create a SparseDataFrame from a bunch of SparseSeries / SparseArrays that have fill_values like 1 or 0, it doesn't persist that to default_fill_value. That seems like an enhancement, and I could take it out of this PR, but it helped me test.
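The inference in the diff above picks the most common block fill_value. The mechanics of that one line can be sketched with just the stdlib (the fill_values here are hypothetical, not from the PR):

```python
from collections import Counter

# Hypothetical fill_values collected from a SparseDataFrame's blocks
block_fill_values = [0, 0, 1, 0]

# most_common(1) returns [(value, count)], so unpacking [0] yields the
# winning fill_value and how many blocks use it
default_fill_value, count = Counter(block_fill_values).most_common(1)[0]
print(default_fill_value, count)  # 0 3
```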

Contributor

This seems like a recipe for surprise. I'd hate for df.reindex() to do something different, based on the types of blocks that SparseDataFrame happened to be initialized with. If a user explicitly sets default_fill_value that's one thing, but inferring it from the data seems problematic.

Contributor Author

I can take it out; it didn't add much, tbh. It was just making my testing easier ;)

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Sparse Sparse Data Type labels Feb 1, 2018
@@ -538,7 +538,7 @@ Reshaping
- Bug in :func:`DataFrame.merge` in which merging using ``Index`` objects as vectors raised an Exception (:issue:`19038`)
- Bug in :func:`DataFrame.stack`, :func:`DataFrame.unstack`, :func:`Series.unstack` which were not returning subclasses (:issue:`15563`)
- Bug in timezone comparisons, manifesting as a conversion of the index to UTC in ``.concat()`` (:issue:`18523`)
-
- Bug in :func:`SparseDataFrame.merge` which raises error (:issue:`13665`)
Contributor

This can go under enhancements, saying "Merging sparse DataFrames" is now supported.

Contributor Author

Oh, it's an enhancement. Alrighty, thanks for the guidance.

if klass is None:
dtype = dtype or values.dtype
klass = get_block_type(values, dtype)

elif klass is DatetimeTZBlock and not is_datetimetz(values):
return klass(values, ndim=ndim,
placement=placement, dtype=dtype)

Contributor

You might want to install a flake8 plugin for your editor ;)

Contributor Author

OK, OK, installing now. I fully admit that I'm awful when it comes to linting failures and should just tool the fix. Might as well install Ctrl-P while I'm at it.

blocks.append(b)

return BlockManager(blocks, axes)

def is_sparse_join_units(join_units):
Contributor

Docstring would be nice, at least noting that this is true if any blocks are sparse.

@@ -5120,14 +5121,28 @@ def concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy):
elif is_uniform_join_units(join_units):
b = join_units[0].block.concat_same_type(
[ju.block for ju in join_units], placement=placement)
elif is_sparse_join_units(join_units):
values = concatenate_join_units(join_units, concat_axis, copy=copy)
values = values[0]
Contributor

I don't understand this line... A comment maybe? or is it incorrect?


@codecov bot commented Feb 5, 2018

Codecov Report

Merging #19488 into master will decrease coverage by 0.19%.
The diff coverage is 96.87%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #19488     +/-   ##
=========================================
- Coverage   91.82%   91.62%   -0.2%     
=========================================
  Files         152      150      -2     
  Lines       49248    48833    -415     
=========================================
- Hits        45222    44744    -478     
- Misses       4026     4089     +63
Flag Coverage Δ
#multiple 90% <96.87%> (-0.22%) ⬇️
#single 41.72% <28.12%> (-0.18%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.44% <100%> (-0.25%) ⬇️
pandas/core/reshape/merge.py 94.28% <100%> (+0.02%) ⬆️
pandas/core/algorithms.py 94.16% <100%> (-0.2%) ⬇️
pandas/core/sparse/series.py 95.31% <100%> (+0.1%) ⬆️
pandas/core/sparse/frame.py 95.28% <100%> (+0.46%) ⬆️
pandas/core/sparse/array.py 91.62% <100%> (+0.3%) ⬆️
pandas/core/internals.py 95.04% <92.85%> (-0.5%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/core/dtypes/base.py 47.61% <0%> (-44.28%) ⬇️
pandas/plotting/_compat.py 62% <0%> (-28.91%) ⬇️
... and 69 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cdfce2b...353171b. Read the comment docs.

@@ -1315,6 +1315,11 @@ def take_nd(arr, indexer, axis=0, out=None, fill_value=np.nan, mask_info=None,
undefined if allow_fill == False and -1 is present in indexer.
"""

if is_sparse(arr):
Contributor Author

Something that is a bit of a hotspot, IMHO, is that inside of algos it iterates through the memview, which could cause some weird security / segfault issues. This fills that hole, but we really should fix it at a lower level.
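The densify-then-take fallback being discussed boils down to something like the following simplified sketch in plain numpy (a hypothetical helper for illustration, not the pandas internals):

```python
import numpy as np

def take_with_fill(arr, indexer, fill_value=np.nan):
    """Simplified sketch of a fill-aware take: -1 marks missing positions."""
    arr = np.asarray(arr)
    indexer = np.asarray(indexer)
    # Upcast so the fill value fits (the same int64 -> float64 upcast
    # discussed elsewhere in this thread)
    out = arr.astype(np.result_type(arr.dtype, type(fill_value)))[indexer]
    out[indexer == -1] = fill_value
    return out

print(take_with_fill(np.array([10, 20, 30]), [0, -1, 2]))
```

take_nd does this with masks and preallocated output buffers; the sketch only shows the fill semantics.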

Contributor

Hmm, this seems sub-optimal. We shouldn't have to densify just to slice, right?

Can you take a look at SparseArray.take to see if any of that logic can be refactored / reused?

And since you're touching performance-related code, could you run the sparse-related ASVs?

Contributor Author

Thank you for pointing this out; exactly what I was looking for. Will run the ASVs.

if len(values.shape) == 2:
values = values[0]
else:
assert len(values.shape) == 1
Contributor Author

This never happens in my testing... so it's probably safe to pull out but it also would be odd if we got to this place.

Contributor Author

Sorry, what I meant to say is... len(values.shape) is always either 1 or 2 in my testing. I'm not really sure what to do if the ndim were 3 or something and it was a true ndarray that was sparse.

@@ -5313,6 +5330,10 @@ def concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy):
return BlockManager(blocks, axes)


def is_sparse_join_units(join_units):
return all(type(ju.block) is SparseBlock for ju in join_units)
Contributor Author

This is very similar to (and maybe could be DRYed up with) pd.concat([sparse, sparse]).


# You would think that you want self.block.fill_value here
# But in reality that will fill with a bunch of wrong values
fill_value = np.nan
Contributor Author

This was surprising to me. If you don't pass in np.nan, what ends up happening is that merging two sparse frames concatenates with a bunch of fill_values, which is most definitely not what I would expect as a user.

Contributor

huh?

Contributor Author

Yea this confused me a bunch.

evens = pd.DataFrame({'A': range(0, 20, 2), 'B': range(10)})
threes = pd.DataFrame({'A': range(0, 30, 3), 'B': range(10)})
threes.merge(evens, how="left", on="A")

Yields

    A  B_x  B_y
0   0    0  0.0
1   3    1  NaN
2   6    2  3.0
3   9    3  NaN
4  12    4  6.0
5  15    5  NaN
6  18    6  9.0
7  21    7  NaN
8  24    8  NaN
9  27    9  NaN

As you'd expect for a dense DataFrame.

But if you sparsify it with a fill_value other than np.nan / None, you get weird results:

evens = pd.DataFrame({'A': range(0, 20, 2), 'B': range(10)}).to_sparse(fill_value=8675309)
threes = pd.DataFrame({'A': range(0, 30, 3), 'B': range(10)}).to_sparse(fill_value=90210)
threes.merge(evens, how="left", on="A")

Yields

    A  B_x      B_y
0   0    0        0
1   3    1  8675309
2   6    2        3
3   9    3  8675309
4  12    4        6
5  15    5  8675309
6  18    6        9
7  21    7  8675309
8  24    8  8675309
9  27    9  8675309

Is that expected behavior?

key_col = Index(lvals).where(~mask, rvals)
# Might need to be IntIndex not Index
if isinstance(lvals, SparseArray):
key_col = Index(lvals.get_values()).where(~mask, rvals)
Contributor Author

I'm not sure if this has memory or performance issues. But this is the best solution I could come to with this. The other solution would be to look at using lvals.sp_index and implement a where on it that works.

One thing I have noticed is that IntIndex doesn't act quite like Index which makes for doing these masks tricky in sparse land.
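For reference, this is the Index.where mechanic the sparse path has to reproduce, shown here on a plain dense Index (the values are hypothetical):

```python
import numpy as np
import pandas as pd

lvals = pd.Index([0, 3, 6])
rvals = pd.Index([10, 20, 30])
mask = np.array([False, True, False])  # True where the left key is missing

# Keep lvals where ~mask holds, take rvals elsewhere
key_col = lvals.where(~mask, rvals)
print(list(key_col))  # [0, 20, 6]
```

The difficulty in the sparse case is that SparseArray's IntIndex tracks positions of non-fill values, not key values, so there is no direct analogue of this element-wise where.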

Contributor

It'd be nice to avoid get_values if possible.

@@ -292,9 +292,9 @@ def __setstate__(self, state):
self._fill_value = fill_value

def __len__(self):
-        try:
+        if hasattr(self, 'sp_index'):
Contributor Author

This felt really icky to me to try and catch something... Feel free to push back on that change.
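One plausible reason for the original try/except (and the hasattr replacement) is that during unpickling, __init__ can be skipped, so attributes referenced by __len__ may not exist yet. A toy illustration of the pattern, not pandas code:

```python
class Bag:
    def __init__(self, items):
        self.items = items

    def __len__(self):
        # During unpickling (or __new__ without __init__), items may not
        # be set yet, so guard with hasattr instead of a bare try/except
        if hasattr(self, 'items'):
            return len(self.items)
        return 0

partial = Bag.__new__(Bag)   # skips __init__, like pickle does
print(len(partial))          # 0
print(len(Bag([1, 2, 3])))   # 3
```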

Contributor

How was this failing before?

Contributor Author

I have no clue! 😄 I couldn't figure out why the try / except was there in the first place tbh.

Contributor

huh?

Contributor Author

This isn't essential to this PR. I just don't understand why this code was here in the first place.

Contributor Author

Meaning I can take it out to bring down the code changes

@@ -73,6 +73,10 @@ def __init__(self, data=None, index=None, columns=None, default_kind=None,
if columns is None:
raise Exception("cannot pass a series w/o a name or columns")
data = {columns[0]: data}
elif isinstance(data, BlockManager):
fill_value_size = len(set(b.fill_value for b in data.blocks))
Contributor Author

@TomAugspurger @jreback

Ok, I need feedback on this change. Basically, what this does is: if every SparseSeries's fill_value is the same, it sets default_fill_value to that. This to me seems really intuitive, but I trust your judgment. It also makes testing a whole hell of a lot easier, which is why I did it.

I would argue pretty heavily that as a user if I added sparse series that all had fill_value=0 then I would expect the sparse data frame to have default_fill_value = 0 as well.
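The uniformity check in this revision of the diff amounts to the following (a hypothetical stdlib-only helper, for illustration):

```python
def infer_default_fill_value(fill_values, fallback=None):
    """Infer a default fill_value only when every block agrees on one."""
    unique = set(fill_values)
    if len(unique) == 1:
        return unique.pop()
    return fallback

print(infer_default_fill_value([0, 0, 0]))        # 0
print(infer_default_fill_value([0, 1, 0], None))  # None
```

Unlike the earlier Counter-based version, this only infers when the fill_values are unanimous, which addresses part of the "recipe for surprise" objection above.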

Contributor

Yes, this is the only reason to have a standalone SparseDataFrame, meaning that they have the same default fill values. But it's not worth the cost generally.

@@ -175,7 +175,7 @@ def values(self):

def __array__(self, result=None):
""" the array interface, return my values """
-        return self.block.values
+        return self.block.values.values
Contributor Author

This will return the dense version of SparseSeries when calling np.array(sp)
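In current pandas the same behavior can be observed on pd.arrays.SparseArray (SparseSeries has since been removed): the array interface densifies, fill values included.

```python
import numpy as np
import pandas as pd

sp = pd.arrays.SparseArray([1, 0, 0, 2], fill_value=0)

# __array__ returns the dense representation, fills included
dense = np.asarray(sp)
print(dense)  # [1 0 0 2]
```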

Contributor

we need some more asv's for sparse I think

# Since we are returning a dense representation of
# SparseSeries sparse_index might not align when calling
# ufunc on the array. There doesn't seem to be a better way
# to do this unfortunately.
Contributor Author

This wasn't intuitive to me, but the __array_*__ functions defined in numpy don't work quite as expected in certain circumstances. np.abs(s) is the one that was throwing me for a loop.

s = SparseArray([1, -2, 2, 3], fill_value=-2)
np.abs(s)

The dense version of the array would end up with 3 npoints when really it should only have 2. Zeroing out the sparse_index fixes the problem, because it relies on the dense index instead, going around the problem.

Contributor

These should simply operate on the sp_values, not on the densified values (or maybe it's possible that some classes of ufuncs could operate on the dense, but not sure).

Contributor Author

so the expected behavior is np.abs(SparseArray([1,-2,2,3])) == array([1,2,3]) ?
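For comparison, modern pandas resolved this question by applying ufuncs element-wise to the sparse values rather than dropping fill positions; a quick check with today's pd.arrays.SparseArray API:

```python
import numpy as np
import pandas as pd

s = pd.arrays.SparseArray([1, -2, 2, 3], fill_value=-2)
result = np.abs(s)

# Densifying shows abs was applied to every element, none dropped
print(np.asarray(result))  # [1 2 2 3]
```

So the answer to the question above turned out to be no: np.abs keeps all four elements.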

@hexgnu (Contributor Author) commented Feb 5, 2018

Also wanted to note that I would have liked to keep PRs separate (one for each bug), but honestly this code was all tied together.

@hexgnu hexgnu changed the title First pass at fixing issues with SparseDataFrame merging Allows for merging of SparseDataFrames, and fixes __array__ interface Feb 6, 2018
@hexgnu (Contributor Author) commented Feb 6, 2018

This is ready for a review now that the tests pass. @TomAugspurger thanks :)

Do you want me to squash commits on here?

@TomAugspurger (Contributor)
No need to squash commits.

Looks like there's a merge conflict in the whatsnew though, if you want to fix that and push again. Taking a look now.

@TomAugspurger (Contributor) left a comment

Just partway through. Will revisit later.

@@ -238,6 +238,7 @@ Other Enhancements

- ``IntervalIndex.astype`` now supports conversions between subtypes when passed an ``IntervalDtype`` (:issue:`19197`)
- :class:`IntervalIndex` and its associated constructor methods (``from_arrays``, ``from_breaks``, ``from_tuples``) have gained a ``dtype`` parameter (:issue:`19262`)
- :func:`pandas.merge` now supports merging of :class:`SparseDataFrame` (:issue:`13665`)
Contributor

Clarify: merging sparse to sparse? Sparse and dense?

@@ -555,7 +556,7 @@ Sparse

- Bug in which creating a ``SparseDataFrame`` from a dense ``Series`` or an unsupported type raised an uncontrolled exception (:issue:`19374`)
- Bug in :class:`SparseDataFrame.to_csv` causing exception (:issue:`19384`)
-
- Bug in :class:`SparseSeries.__array__` returning only non-faills (:issue:`13665`)
Contributor

Typo in "faills"?


@@ -5304,6 +5305,22 @@ def concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy):
elif is_uniform_join_units(join_units):
b = join_units[0].block.concat_same_type(
[ju.block for ju in join_units], placement=placement)
elif is_sparse_join_units(join_units):
Contributor

Do you ever go down the initial if with sparse arrays?



@jreback (Contributor) left a comment

I would wait on this for some more general machinery to exist, mainly the _ndarray_values in @TomAugspurger's PR. This should be simpler over time. Further, it's OK to add some things, e.g. fixing __array__ and/or .take, which can be split from this.

fill_value=fill_value)

# return take_nd(arr.get_values(), indexer, axis=axis, out=out,
# fill_value=fill_value, mask_info=mask_info,
Contributor

remove this, make this an elif

@@ -618,6 +618,9 @@ def where(self, cond, other=None):
if other is None:
other = self._na_value

if isinstance(other, ABCSparseArray):
Contributor

we shouldn't have Sparse specific things like this, use ._values

Contributor Author

There isn't ._values on SparseArray though? Unless I'm missing something out of @TomAugspurger's pull request...

@@ -5304,6 +5305,22 @@ def concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy):
elif is_uniform_join_units(join_units):
b = join_units[0].block.concat_same_type(
[ju.block for ju in join_units], placement=placement)
elif is_sparse_join_units(join_units):
values = concatenate_join_units(join_units, concat_axis, copy=copy)
Contributor

this is a mess

Contributor Author

Yes, yes it is. I'll work on cleaning this up tomorrow, since there are too many branches in this code.



@@ -731,7 +732,11 @@ def _maybe_add_join_keys(self, result, left_indexer, right_indexer):
if mask.all():
key_col = rvals
else:
key_col = Index(lvals).where(~mask, rvals)
# Might need to be IntIndex not Index
Contributor

don't do this, use _values





@@ -296,13 +287,13 @@ def test_concat_axis1(self):
res = pd.concat([sparse, sparse3], axis=1)
exp = pd.concat([self.dense1, self.dense3],
axis=1).to_sparse(fill_value=0)
-        exp._default_fill_value = np.nan
+        # exp._default_fill_value = np.nan
Contributor Author

this needs to be taken out fully instead of just commented out

@TomAugspurger (Contributor)

@hexgnu SparseArray.__array__ has been fixed to return a dense ndarray with all the values.

Is there anything remaining in this PR that would be useful?

@hexgnu (Contributor Author) commented Oct 17, 2018

@TomAugspurger I don't think there's anything else we should merge in. Let's just close this PR :) Thanks for all your work!

@hexgnu hexgnu closed this Oct 17, 2018
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode Sparse Sparse Data Type

Successfully merging this pull request may close these issues.

merge of sparse dataframe fails
4 participants