ENH: implement clip for sparse structures #9265

Closed
wholmgren opened this issue Jan 16, 2015 · 4 comments
Labels
Enhancement Sparse Sparse Data Type

Comments

@wholmgren
Contributor

Recognizing that this is probably not very high on the priority list, I'll document that a user tried to use clip on sparse structures and wished that it worked.

Some ideas for what to do, ordered from best to worst:

  1. Make it work in a memory and cpu efficient way.
  2. Raise a NotImplementedError (possibly suggesting converting to a dense structure).
  3. Make clip first convert to a dense structure, apply the limits, and return a sparse structure.

I am willing to submit PRs for 2 and 3, although I think that 3 is a bad idea. I need to understand the source better before I can consider volunteering to help with 1.
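To make the trade-off between options 2 and 3 concrete, here is a minimal plain-Python sketch. It assumes a hypothetical sparse layout, a `(length, fill_value, indices, values)` tuple in which every position absent from `indices` holds `fill_value`. This is not how pandas stores sparse data, and all function names are made up for illustration.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical sparse layout (illustration only, not a pandas API):
# (length, fill_value, indices, values); positions absent from `indices`
# implicitly hold `fill_value`.
Sparse = Tuple[int, float, List[int], List[float]]


def _same(a: float, b: float) -> bool:
    # Treat NaN as equal to NaN so a NaN fill value round-trips.
    return a == b or (math.isnan(a) and math.isnan(b))


def _clip_one(x: float, lower: Optional[float], upper: Optional[float]) -> float:
    # Missing (NaN) values pass through unchanged.
    if math.isnan(x):
        return x
    if lower is not None:
        x = max(x, lower)
    if upper is not None:
        x = min(x, upper)
    return float(x)


def clip_not_implemented(sp: Sparse, lower=None, upper=None) -> Sparse:
    # Option 2: fail loudly with a clear message instead of an unrelated error.
    raise NotImplementedError(
        "clip is not implemented for sparse data; convert to dense first"
    )


def clip_via_dense(sp: Sparse, lower: Optional[float] = None,
                   upper: Optional[float] = None) -> Sparse:
    # Option 3: materialize all `length` positions (O(length) memory),
    # clip them, then re-sparsify.
    length, fill, indices, values = sp
    dense = [fill] * length
    for i, v in zip(indices, values):
        dense[i] = v
    clipped = [_clip_one(x, lower, upper) for x in dense]
    # The fill value may itself change (e.g. 0.0 -> 1.0 when lower=1).
    new_fill = _clip_one(fill, lower, upper)
    new_indices = [i for i, x in enumerate(clipped) if not _same(x, new_fill)]
    new_values = [clipped[i] for i in new_indices]
    return (length, new_fill, new_indices, new_values)
```

`clip_via_dense` allocates a list of `length` entries even when only a handful of values are stored, which is why option 3 scales poorly for very sparse data.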

SparseSeries, SparseDataFrame, and SparsePanel all raise different exceptions.

pd.SparseSeries([0,1,np.nan,3,4]).clip(1,2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-159-4c91e2de7ae6> in <module>()
----> 1 pd.SparseSeries([0,1,np.nan,3,4]).clip(1,2)

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/generic.py in clip(self, lower, upper, out)
   2805         result = self
   2806         if lower is not None:
-> 2807             result = result.clip_lower(lower)
   2808         if upper is not None:
   2809             result = result.clip_upper(upper)

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/generic.py in clip_lower(self, threshold)
   2843             raise ValueError("Cannot use an NA value as a clip threshold")
   2844 
-> 2845         return self.where((self >= threshold) | isnull(self), threshold)
   2846 
   2847     def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/ops.py in wrapper(self, other)
    661             return self._constructor(na_op(self.values, other.values),
    662                                      index=self.index,
--> 663                                      name=name).fillna(False).astype(bool)
    664         elif isinstance(other, pd.DataFrame):
    665             return NotImplemented

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/ops.py in na_op(x, y)
    627     def na_op(x, y):
    628         try:
--> 629             result = op(x, y)
    630         except TypeError:
    631             if isinstance(y, list):

ValueError: operands could not be broadcast together with shapes (5,) (0,)

pd.SparseDataFrame([[0,1,np.nan,3,4],[0,1,np.nan,3,4]]).clip(1,2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-160-b399e57148ac> in <module>()
----> 1 pd.SparseDataFrame([[0,1,np.nan,3,4],[0,1,np.nan,3,4]]).clip(1,2)

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/generic.py in clip(self, lower, upper, out)
   2805         result = self
   2806         if lower is not None:
-> 2807             result = result.clip_lower(lower)
   2808         if upper is not None:
   2809             result = result.clip_upper(upper)

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/generic.py in clip_lower(self, threshold)
   2843             raise ValueError("Cannot use an NA value as a clip threshold")
   2844 
-> 2845         return self.where((self >= threshold) | isnull(self), threshold)
   2846 
   2847     def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/ops.py in f(self, other)
    917             # straight boolean comparisions we want to allow all columns
    918             # (regardless of dtype to pass thru) See #4537 for discussion.
--> 919             res = self._combine_const(other, func, raise_on_error=False)
    920             return res.fillna(True).astype(bool)
    921 

TypeError: _combine_const() got an unexpected keyword argument 'raise_on_error'

pd.Panel(np.zeros((3,3,3))).to_sparse().clip(1,2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-171-ec53f9e1dd24> in <module>()
----> 1 pd.Panel(np.zeros((3,3,3))).to_sparse().clip(1,2)

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/generic.py in clip(self, lower, upper, out)
   2805         result = self
   2806         if lower is not None:
-> 2807             result = result.clip_lower(lower)
   2808         if upper is not None:
   2809             result = result.clip_upper(upper)

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/generic.py in clip_lower(self, threshold)
   2843             raise ValueError("Cannot use an NA value as a clip threshold")
   2844 
-> 2845         return self.where((self >= threshold) | isnull(self), threshold)
   2846 
   2847     def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/ops.py in f(self, other)
    960             raise ValueError('Simple arithmetic with %s can only be '
    961                              'done with scalar values' %
--> 962                              self._constructor.__name__)
    963 
    964         return self._combine(other, op)

ValueError: Simple arithmetic with SparsePanel can only be done with scalar values

@jreback jreback added Enhancement Sparse Sparse Data Type labels Jan 16, 2015
@jreback
Contributor

jreback commented Jan 16, 2015

You can implement clip directly, I believe, i.e. apply the clipper to the actual non-missing values, then return a new structure. This will be efficient. Currently the sparse structures simply inherit the existing behavior (which is not correct).

So I would love for someone to:

  • implement this solution
  • mark methods that don't 'work' as NotImplemented (e.g. test for these systematically). If you'd like to attempt this, create a separate issue for it (and put up a list there).
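The approach suggested above (clip only the stored values, then return a new structure) can be sketched in plain Python. This assumes a hypothetical `(length, fill_value, indices, values)` layout, not pandas' internal representation, and the function names are illustrative. Because every implicit position holds the fill value, clipping the fill value once covers all of them.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical sparse layout (illustration only, not a pandas API):
# positions absent from `indices` implicitly hold `fill_value`.
Sparse = Tuple[int, float, List[int], List[float]]


def _clip_one(x: float, lower: Optional[float], upper: Optional[float]) -> float:
    # Missing (NaN) values are left untouched, as clip does for dense data.
    if math.isnan(x):
        return x
    if lower is not None:
        x = max(x, lower)
    if upper is not None:
        x = min(x, upper)
    return float(x)


def sparse_clip(sp: Sparse, lower: Optional[float] = None,
                upper: Optional[float] = None) -> Sparse:
    length, fill, indices, values = sp
    # Clip only the stored values: work is O(number stored), nothing densified.
    new_values = [_clip_one(v, lower, upper) for v in values]
    # Clipping the fill value once clips every implicit position.
    new_fill = _clip_one(fill, lower, upper)
    # Reuse the existing index; a stored value may now equal the new fill,
    # which is harmless (just not maximally compact).
    return (length, new_fill, list(indices), new_values)


def to_dense(sp: Sparse) -> List[float]:
    # Helper for inspection only.
    length, fill, indices, values = sp
    dense = [fill] * length
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


if __name__ == "__main__":
    sp = (5, float("nan"), [0, 1, 3, 4], [0.0, 1.0, 3.0, 4.0])
    print(to_dense(sparse_clip(sp, 1, 2)))  # [1.0, 1.0, nan, 2.0, 2.0]
```

The sketch handles scalar bounds only; array-like bounds and the `axis` argument would need more care.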

@TomAugspurger
Contributor

This is probably still an issue with DataFrame/Series[Sparse]. I don't think we dispatch to extension arrays (EAs) for .clip.

@shaunvxc

shaunvxc commented Nov 2, 2021

Yep, calling clip on any DataFrame with SparseArray-backed columns will throw:

TypeError: SparseArray does not support item assignment via setitem

Seems related to #21818?

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@topper-123
Contributor

This works as intended now:

>>> import numpy as np
>>> import pandas as pd
>>> arr = pd.arrays.SparseArray([0,1,np.nan,3,4])
>>> pd.Series(arr).clip(1,2)
0    1.0
1    1.0
2    NaN
3    2.0
4    2.0
dtype: Sparse[float64, nan]

Closing.

Development

No branches or pull requests

6 participants