PERF: Very slow clip performance #15400

Closed
wesm opened this issue Feb 14, 2017 · 2 comments · Fixed by #16364
Labels
Performance Memory or execution speed performance
Comments

@wesm
Member

wesm commented Feb 14, 2017

Code Sample, a copy-pastable example if possible

In [38]: s = pd.Series(np.random.randn(30))

In [39]: timeit s.clip(0, 1)
100 loops, best of 3: 2.02 ms per loop

Problem description

There is more than a 1000x performance difference between Series.clip and numpy.clip (arr below is presumably the underlying NumPy array, e.g. s.values):

In [43]: timeit np.clip(arr, 0, 1)
1000000 loops, best of 3: 1.06 µs per loop

Output of pd.show_versions()

pandas 0.19.2

@wesm added the Bug label Feb 14, 2017
@wesm added this to the 0.20.0 milestone Feb 14, 2017
@jorisvandenbossche
Member

I wondered where this huge difference came from. I am not saying that the difference is not a problem, but it seems to be a consequence of the implementation: several slower functions are used under the hood.
The clip is done in two separate steps, clip_upper and clip_lower. Each of those does a comparison to create a mask and then calls where; inside where an align is done, and so on:

In [89]: %timeit s.clip(0, 1)
100 loops, best of 3: 1.91 ms per loop

In [91]: %timeit s.clip_lower(0)
1000 loops, best of 3: 958 µs per loop

In [92]: %timeit s < 0
10000 loops, best of 3: 118 µs per loop

In [93]: mask = s < 0

In [94]: %timeit s.where(mask, 0)
1000 loops, best of 3: 395 µs per loop

In [100]: %timeit s.align(mask)
10000 loops, best of 3: 98.6 µs per loop

So several individual steps in the current implementation (creating the mask, the alignment, ...) each already take far longer than the actual clip in numpy. Each of those steps can probably be optimized, but I don't think that alone will give a big speed-up. To get a big speed-up in pandas' clip, we would probably need a more low-level implementation.
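To make the two code paths concrete, here is a minimal sketch of the mask-and-where approach described above next to a single np.clip call on the underlying array. Both function names (clip_two_step, clip_on_values) are hypothetical and are not pandas internals. The two-step version only approximates what clip_upper and clip_lower do, and the fast path is one possible shape of a lower-level implementation, not the fix that was eventually merged.

```python
import numpy as np
import pandas as pd


def clip_two_step(s, lower, upper):
    """Clip with two mask-and-where passes (an approximation of the steps above)."""
    # Upper bound: keep values that are <= upper (or missing), else use `upper`.
    keep = (s <= upper) | s.isna()
    result = s.where(keep, upper)
    # Lower bound: same idea, applied to the intermediate result.
    keep = (result >= lower) | result.isna()
    return result.where(keep, lower)


def clip_on_values(s, lower, upper):
    """Hypothetical fast path: clip the ndarray once, then rewrap it."""
    clipped = np.clip(s.values, lower, upper)
    return pd.Series(clipped, index=s.index, name=s.name)


s = pd.Series([-1.0, 0.5, 2.0, np.nan])
slow = clip_two_step(s, 0, 1)
fast = clip_on_values(s, 0, 1)
print(slow.equals(fast))  # both paths should agree, including on the NaN
```

The two-step version builds two boolean Series and two intermediate results, each going through alignment; the array version does one pass and constructs a single Series at the end.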

When you look at a larger series, the difference is not that huge anymore:

In [32]: s = pd.Series(np.random.randn(100000))

In [33]: %timeit s.clip(0,1)
The slowest run took 8.48 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 4.27 ms per loop

In [34]: arr = s.values

In [35]: %timeit np.clip(arr,0,1)
The slowest run took 4.41 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 558 µs per loop
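The size dependence can be checked outside IPython with a small script like the one below. It is only a sketch: the helper name per_call and the chosen sizes are my own, and the absolute numbers will depend heavily on the machine and the pandas version. The point is that a fixed per-call overhead dominates for tiny inputs and matters less as the Series grows.

```python
import timeit

import numpy as np
import pandas as pd


def per_call(func, number=20, repeat=3):
    """Best observed time per call, in seconds."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


rows = []
for n in (30, 1_000, 100_000):
    s = pd.Series(np.random.randn(n))
    arr = s.values  # underlying ndarray, as in the session above
    t_pd = per_call(lambda: s.clip(0, 1))
    t_np = per_call(lambda: np.clip(arr, 0, 1))
    rows.append((n, t_pd, t_np))
    print(f"n={n:>7}: Series.clip {t_pd * 1e6:9.1f} µs, "
          f"np.clip {t_np * 1e6:9.1f} µs, ratio {t_pd / t_np:7.1f}x")
```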

@jreback added the Difficulty Intermediate and Performance labels and removed the Bug label Feb 14, 2017
@wesm
Member Author

wesm commented Feb 20, 2017

Profile results of 100 runs

         301103 function calls (300903 primitive calls) in 0.220 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.220    0.220 {built-in method builtins.exec}
        1    0.000    0.000    0.220    0.220 <string>:1(<module>)
      100    0.001    0.000    0.220    0.002 generic.py:3825(clip)
      100    0.001    0.000    0.109    0.001 generic.py:3913(clip_lower)
      100    0.001    0.000    0.109    0.001 generic.py:3889(clip_upper)
      200    0.001    0.000    0.092    0.000 generic.py:4806(where)
      200    0.002    0.000    0.092    0.000 generic.py:4547(_where)
2000/1800    0.013    0.000    0.074    0.000 internals.py:2978(apply)
     2800    0.012    0.000    0.065    0.000 series.py:135(__init__)
      200    0.001    0.000    0.062    0.000 ops.py:903(wrapper)
      400    0.001    0.000    0.050    0.000 ops.py:907(<lambda>)
      200    0.001    0.000    0.047    0.000 ops.py:1039(flex_wrapper)
      600    0.002    0.000    0.040    0.000 series.py:2364(fillna)
      600    0.005    0.000    0.039    0.000 generic.py:3200(fillna)
      200    0.002    0.000    0.036    0.000 ops.py:803(wrapper)
      200    0.001    0.000    0.034    0.000 internals.py:3158(where)
      600    0.002    0.000    0.032    0.000 generic.py:3007(astype)
      600    0.002    0.000    0.031    0.000 generic.py:3057(copy)
      200    0.000    0.000    0.027    0.000 series.py:2342(align)
      200    0.001    0.000    0.027    0.000 generic.py:4379(align)
      200    0.001    0.000    0.026    0.000 generic.py:4470(_align_series)
      400    0.001    0.000    0.022    0.000 series.py:2360(reindex)
      400    0.002    0.000    0.022    0.000 generic.py:2224(reindex)
      400    0.000    0.000    0.021    0.000 series.py:2326(_reindex_inde

A single call to clip calls the Series constructor 28 times (2800 calls to series.py __init__ over 100 runs). Not good. I will try to look more deeply into fixing this if no one beats me to it.
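A profile like the one above can be reproduced with cProfile and pstats from the standard library. The sketch below assumes the same 30-element Series and 100 runs. The path filter for counting Series.__init__ calls is my own assumption about where series.py lives in the installed package, and the count will differ between pandas versions (it may well be zero on releases that bypass __init__ internally).

```python
import cProfile
import pstats

import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(30))

# Profile 100 calls to Series.clip.
profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    s.clip(0, 1)
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(25)

# Count calls to Series.__init__ across the 100 runs.
# stats.stats maps (filename, lineno, funcname) -> (cc, nc, tt, ct, callers).
series_init_calls = sum(
    nc
    for (filename, _lineno, funcname), (_cc, nc, _tt, _ct, _callers)
    in stats.stats.items()
    if funcname == "__init__"
    and filename.replace("\\", "/").endswith("pandas/core/series.py")
)
print(f"Series.__init__ calls per clip: {series_init_calls / 100:.1f}")
```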

@jreback modified the milestones: 0.20.0, 0.21.0, Next Minor Release Mar 23, 2017
@jreback modified the milestones: 0.20.2, Interesting Issues May 16, 2017
jreback added a commit to jreback/pandas that referenced this issue May 16, 2017
jreback added a commit that referenced this issue May 16, 2017
pcluo pushed a commit to pcluo/pandas that referenced this issue May 22, 2017
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this issue May 29, 2017
TomAugspurger pushed a commit that referenced this issue May 30, 2017
closes #15400
(cherry picked from commit 42e2a87)
stangirala pushed a commit to stangirala/pandas that referenced this issue Jun 11, 2017