PERF: Very slow clip performance #15400

wesm · 2017-02-14T19:00:26Z

Code Sample, a copy-pastable example if possible

In [38]: s = pd.Series(np.random.randn(30))

In [39]: timeit s.clip(0, 1)
100 loops, best of 3: 2.02 ms per loop

Problem description

There is more than a 1000x performance difference between Series.clip and numpy.clip (arr is not defined in the sample above; presumably it is s.values):

In [43]: timeit np.clip(arr, 0, 1)
1000000 loops, best of 3: 1.06 µs per loop

Output of `pd.show_versions()`

pandas 0.19.2

jorisvandenbossche · 2017-02-14T20:06:45Z

I wondered where this huge difference came from. I don't mean to say that the difference is not a problem, but it seems to be a consequence of the implementation and of several slower functions that are used under the hood.
The clip is done in two separate steps, clip_upper and clip_lower. Each of those does a comparison to create a mask and then calls where; inside where an align is done, and so on:

In [89]: %timeit s.clip(0, 1)
100 loops, best of 3: 1.91 ms per loop

In [91]: %timeit s.clip_lower(0)
1000 loops, best of 3: 958 µs per loop

In [92]: %timeit s < 0
10000 loops, best of 3: 118 µs per loop

In [93]: mask = s < 0

In [94]: %timeit s.where(mask, 0)
1000 loops, best of 3: 395 µs per loop

In [100]: %timeit s.align(mask)
10000 loops, best of 3: 98.6 µs per loop

So it seems that several individual steps in the current implementation (creating the mask, the alignment, ...) each already take far longer than the actual clip in numpy. Each of those steps can probably be optimized, but I don't think that alone will give a big speed-up. To get a big speed-up in pandas' clip, we would probably need a more low-level implementation.
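As a rough illustration of the two approaches discussed above (a sketch only, not the actual pandas source), the helper names `clip_via_where` and `clip_via_numpy` below are hypothetical. The first mimics building a mask and calling `where` once per bound; the second passes the raw values to `np.clip` once and rewraps the result, which is roughly what a lower-level path could look like for scalar bounds:

```python
import numpy as np
import pandas as pd


def clip_via_where(s, lower, upper):
    """Hypothetical helper: clip in two passes, each building a mask and calling where."""
    # Keep values >= lower (and NaN); replace everything else with the lower bound.
    result = s.where((s >= lower) | s.isna(), lower)
    # Keep values <= upper (and NaN); replace everything else with the upper bound.
    result = result.where((result <= upper) | result.isna(), upper)
    return result


def clip_via_numpy(s, lower, upper):
    """Hypothetical helper: clip the underlying array once and rewrap it."""
    clipped = np.clip(s.to_numpy(), lower, upper)
    return pd.Series(clipped, index=s.index, name=s.name)


s = pd.Series(np.random.randn(30))
a = clip_via_where(s, 0, 1)
b = clip_via_numpy(s, 0, 1)
```

Both helpers should agree with `s.clip(0, 1)` for scalar bounds; the second skips the intermediate masks and alignment entirely, at the cost of not handling Series-valued bounds.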

When you look at a larger series, the difference is much smaller (roughly 8x in the run below, rather than 1000x):

In [32]: s = pd.Series(np.random.randn(100000))

In [33]: %timeit s.clip(0,1)
The slowest run took 8.48 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 4.27 ms per loop

In [34]: arr = s.values

In [35]: %timeit np.clip(arr,0,1)
The slowest run took 4.41 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 558 µs per loop
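To repeat this comparison outside IPython, a small script along the following lines could be used. It is only a sketch: `time_clip` is a hypothetical helper, the `repeat` and `number` defaults are arbitrary, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def time_clip(n, repeat=3, number=20):
    """Return best per-call times in seconds for Series.clip and np.clip on n values."""
    s = pd.Series(np.random.randn(n))
    arr = s.to_numpy()
    pandas_best = min(timeit.repeat(lambda: s.clip(0, 1), repeat=repeat, number=number)) / number
    numpy_best = min(timeit.repeat(lambda: np.clip(arr, 0, 1), repeat=repeat, number=number)) / number
    return pandas_best, numpy_best


for n in (30, 100_000):
    pandas_t, numpy_t = time_clip(n)
    print(
        f"n={n}: Series.clip {pandas_t * 1e6:.1f} µs, "
        f"np.clip {numpy_t * 1e6:.1f} µs, ratio {pandas_t / numpy_t:.0f}x"
    )
```

Comparing the ratio at both sizes separates the fixed per-call overhead (which dominates at n=30) from the per-element cost (which dominates at n=100000).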

wesm · 2017-02-20T03:32:11Z

Profile results of 100 runs

         301103 function calls (300903 primitive calls) in 0.220 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.220    0.220 {built-in method builtins.exec}
        1    0.000    0.000    0.220    0.220 <string>:1(<module>)
      100    0.001    0.000    0.220    0.002 generic.py:3825(clip)
      100    0.001    0.000    0.109    0.001 generic.py:3913(clip_lower)
      100    0.001    0.000    0.109    0.001 generic.py:3889(clip_upper)
      200    0.001    0.000    0.092    0.000 generic.py:4806(where)
      200    0.002    0.000    0.092    0.000 generic.py:4547(_where)
2000/1800    0.013    0.000    0.074    0.000 internals.py:2978(apply)
     2800    0.012    0.000    0.065    0.000 series.py:135(__init__)
      200    0.001    0.000    0.062    0.000 ops.py:903(wrapper)
      400    0.001    0.000    0.050    0.000 ops.py:907(<lambda>)
      200    0.001    0.000    0.047    0.000 ops.py:1039(flex_wrapper)
      600    0.002    0.000    0.040    0.000 series.py:2364(fillna)
      600    0.005    0.000    0.039    0.000 generic.py:3200(fillna)
      200    0.002    0.000    0.036    0.000 ops.py:803(wrapper)
      200    0.001    0.000    0.034    0.000 internals.py:3158(where)
      600    0.002    0.000    0.032    0.000 generic.py:3007(astype)
      600    0.002    0.000    0.031    0.000 generic.py:3057(copy)
      200    0.000    0.000    0.027    0.000 series.py:2342(align)
      200    0.001    0.000    0.027    0.000 generic.py:4379(align)
      200    0.001    0.000    0.026    0.000 generic.py:4470(_align_series)
      400    0.001    0.000    0.022    0.000 series.py:2360(reindex)
      400    0.002    0.000    0.022    0.000 generic.py:2224(reindex)
      400    0.000    0.000    0.021    0.000 series.py:2326(_reindex_inde

A single call to clip calls the Series constructor 28 times (2800 calls to series.py:135(__init__) over 100 runs). Not good. I will try to look more deeply into fixing this if no one beats me to it.
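A count like this can be reproduced with the standard-library profiler. The following is a minimal sketch that assumes the constructor shows up as `__init__` in a file named `series.py`; the number it prints depends on the pandas version and will not necessarily match the 28 reported here.

```python
import cProfile
import pstats

import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(30))

# Profile 100 calls to clip, as in the profile output above.
profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    s.clip(0, 1)
profiler.disable()

stats = pstats.Stats(profiler)

# stats.stats maps (filename, lineno, funcname) to
# (primitive calls, total calls, tottime, cumtime, callers).
init_calls = sum(
    total_calls
    for (filename, _lineno, funcname), (_prim, total_calls, _tt, _ct, _callers) in stats.stats.items()
    if funcname == "__init__" and filename.endswith("series.py")
)
print(f"Series.__init__ calls per clip: {init_calls / 100:.1f}")
```

Calling `stats.sort_stats("cumulative").print_stats(25)` afterwards prints a table in the same layout as the one above.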

wesm added the Bug label Feb 14, 2017

wesm added this to the 0.20.0 milestone Feb 14, 2017

jreback added the Difficulty Intermediate and Performance (Memory or execution speed performance) labels and removed the Bug label Feb 14, 2017

cpcloud added the HackIllinois 2017 label Feb 25, 2017

jreback modified the milestones: 0.20.0, 0.21.0, Next Minor Release Mar 23, 2017

jreback modified the milestones: 0.20.2, Interesting Issues May 16, 2017

jreback added a commit to jreback/pandas that referenced this issue May 16, 2017

PERF: improved clip performance

daff5ea

closes pandas-dev#15400

jreback added a commit to jreback/pandas that referenced this issue May 16, 2017

PERF: improved clip performance

6efa1c8

closes pandas-dev#15400

jreback mentioned this issue May 16, 2017

PERF: improved clip performance #16364

Merged

jreback added a commit to jreback/pandas that referenced this issue May 16, 2017

PERF: improved clip performance

62843f8

closes pandas-dev#15400

jreback closed this as completed in #16364 May 16, 2017

jreback added a commit that referenced this issue May 16, 2017

PERF: improved clip performance (#16364)

42e2a87

closes #15400

pcluo pushed a commit to pcluo/pandas that referenced this issue May 22, 2017

PERF: improved clip performance (pandas-dev#16364)

a4730d5

closes pandas-dev#15400

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this issue May 29, 2017

PERF: improved clip performance (pandas-dev#16364)

41d90dc

closes pandas-dev#15400 (cherry picked from commit 42e2a87)

TomAugspurger pushed a commit that referenced this issue May 30, 2017

PERF: improved clip performance (#16364)

f16141f

closes #15400 (cherry picked from commit 42e2a87)

stangirala pushed a commit to stangirala/pandas that referenced this issue Jun 11, 2017

PERF: improved clip performance (pandas-dev#16364)

4c6b1c9

closes pandas-dev#15400
