CLN: ASV Algorithms benchmark #18423

Merged: 4 commits into pandas-dev:master on Nov 25, 2017

Conversation

mroeschke (Member)

High-level changes:

  • The Algorithms class benchmark was getting large, so I broke it into several smaller classes; this avoids creating data structures in each setup call that are only used by one benchmark (a minimal sketch of the pattern follows this list).

  • Added np.random.seed(1234) in the setup of classes where random data is created (xref BENCH: put in np.random.seed on vbenches #8144).

  • Utilized params and setup_cache where applicable.

  • Added additional hashing benchmarks.
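
To make the split concrete, here is a minimal sketch of the pattern described above (illustrative only; the actual classes, sizes, and benchmark names are those in the PR's asv_bench/benchmarks/algorithms.py). Each class builds only the data its own benchmarks need, seeds the random generator, and, for the hashing benchmarks, goes through pd.util.hash_pandas_object:

import numpy as np
import pandas as pd


class Hashing(object):
    # Illustrative sketch only: each class builds just the data its own
    # benchmarks use, with a fixed seed so the random data is reproducible.

    def setup(self):
        np.random.seed(1234)
        N = 10**5
        self.series_float = pd.Series(np.random.randn(N))

    def time_series_float(self):
        # pd.util.hash_pandas_object is the public hashing entry point
        pd.util.hash_pandas_object(self.series_float)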

The benchmarks should be equivalent to what existed before. If the diff is too large, I can break it up into smaller PRs. You can find 3 asv runs below.

$ asv run -b ^algorithms -q

[  5.26%] ··· Running algorithms.Hashing.time_frame                                                     58.8ms
[ 10.53%] ··· Running algorithms.Hashing.time_series_categorical                                        15.6ms
[ 15.79%] ··· Running algorithms.Hashing.time_series_dates                                              10.3ms
[ 21.05%] ··· Running algorithms.Hashing.time_series_float                                              13.7ms
[ 26.32%] ··· Running algorithms.Hashing.time_series_int                                                11.0ms
[ 31.58%] ··· Running algorithms.Hashing.time_series_string                                             29.7ms
[ 36.84%] ··· Running algorithms.Hashing.time_series_timedeltas                                         13.4ms
[ 42.11%] ··· Running algorithms.AddOverflowArray.time_add_overflow_arr_mask_nan                        45.8ms
[ 47.37%] ··· Running algorithms.AddOverflowArray.time_add_overflow_arr_rev                             32.5ms
[ 52.63%] ··· Running algorithms.AddOverflowArray.time_add_overflow_b_mask_nan                          43.2ms
[ 57.89%] ··· Running algorithms.AddOverflowArray.time_add_overflow_both_arg_nan                        40.2ms
[ 63.16%] ··· Running algorithms.AddOverflowScalar.time_add_overflow_scalar                         25.8ms;...
[ 68.42%] ··· Running algorithms.Duplicated.time_duplicated_float                                       40.1ms
[ 73.68%] ··· Running algorithms.Duplicated.time_duplicated_int                                         26.3ms
[ 78.95%] ··· Running algorithms.DuplicatedUniqueIndex.time_duplicated_unique_int                        374μs
[ 84.21%] ··· Running algorithms.Factorize.time_factorize_float                                         27.8ms
[ 89.47%] ··· Running algorithms.Factorize.time_factorize_int                                           18.3ms
[ 94.74%] ··· Running algorithms.Factorize.time_factorize_string                                        52.3ms
[100.00%] ··· Running algorithms.Match.time_match_string                                                 902μs

$ asv run -b ^algorithms -q
[  5.26%] ··· Running algorithms.Hashing.time_frame                                                     58.7ms
[ 10.53%] ··· Running algorithms.Hashing.time_series_categorical                                        12.4ms
[ 15.79%] ··· Running algorithms.Hashing.time_series_dates                                              9.24ms
[ 21.05%] ··· Running algorithms.Hashing.time_series_float                                              9.53ms
[ 26.32%] ··· Running algorithms.Hashing.time_series_int                                                9.53ms
[ 31.58%] ··· Running algorithms.Hashing.time_series_string                                             29.1ms
[ 36.84%] ··· Running algorithms.Hashing.time_series_timedeltas                                         9.57ms
[ 42.11%] ··· Running algorithms.AddOverflowArray.time_add_overflow_arr_mask_nan                        47.3ms
[ 47.37%] ··· Running algorithms.AddOverflowArray.time_add_overflow_arr_rev                             26.5ms
[ 52.63%] ··· Running algorithms.AddOverflowArray.time_add_overflow_b_mask_nan                          38.9ms
[ 57.89%] ··· Running algorithms.AddOverflowArray.time_add_overflow_both_arg_nan                        39.6ms
[ 63.16%] ··· Running algorithms.AddOverflowScalar.time_add_overflow_scalar                         24.2ms;...
[ 68.42%] ··· Running algorithms.Duplicated.time_duplicated_float                                       40.1ms
[ 73.68%] ··· Running algorithms.Duplicated.time_duplicated_int                                         28.5ms
[ 78.95%] ··· Running algorithms.DuplicatedUniqueIndex.time_duplicated_unique_int                        450μs
[ 84.21%] ··· Running algorithms.Factorize.time_factorize_float                                         31.6ms
[ 89.47%] ··· Running algorithms.Factorize.time_factorize_int                                           18.6ms
[ 94.74%] ··· Running algorithms.Factorize.time_factorize_string                                        56.0ms
[100.00%] ··· Running algorithms.Match.time_match_string                                                 788μs

$ asv run -b ^algorithms -q
[  5.26%] ··· Running algorithms.Hashing.time_frame                                                     57.4ms
[ 10.53%] ··· Running algorithms.Hashing.time_series_categorical                                        13.3ms
[ 15.79%] ··· Running algorithms.Hashing.time_series_dates                                              12.5ms
[ 21.05%] ··· Running algorithms.Hashing.time_series_float                                              9.70ms
[ 26.32%] ··· Running algorithms.Hashing.time_series_int                                                9.27ms
[ 31.58%] ··· Running algorithms.Hashing.time_series_string                                             30.0ms
[ 36.84%] ··· Running algorithms.Hashing.time_series_timedeltas                                         15.8ms
[ 42.11%] ··· Running algorithms.AddOverflowArray.time_add_overflow_arr_mask_nan                        40.5ms
[ 47.37%] ··· Running algorithms.AddOverflowArray.time_add_overflow_arr_rev                             25.5ms
[ 52.63%] ··· Running algorithms.AddOverflowArray.time_add_overflow_b_mask_nan                          38.7ms
[ 57.89%] ··· Running algorithms.AddOverflowArray.time_add_overflow_both_arg_nan                        44.5ms
[ 63.16%] ··· Running algorithms.AddOverflowScalar.time_add_overflow_scalar                         20.3ms;...
[ 68.42%] ··· Running algorithms.Duplicated.time_duplicated_float                                       45.3ms
[ 73.68%] ··· Running algorithms.Duplicated.time_duplicated_int                                         33.3ms
[ 78.95%] ··· Running algorithms.DuplicatedUniqueIndex.time_duplicated_unique_int                        332μs
[ 84.21%] ··· Running algorithms.Factorize.time_factorize_float                                         26.2ms
[ 89.47%] ··· Running algorithms.Factorize.time_factorize_int                                           18.5ms
[ 94.74%] ··· Running algorithms.Factorize.time_factorize_string                                        55.5ms
[100.00%] ··· Running algorithms.Match.time_match_string                                                 800μs

@jreback added the Performance (Memory or execution speed performance) label on Nov 22, 2017
codecov bot commented on Nov 22, 2017

Codecov Report

Merging #18423 into master will decrease coverage by 0.04%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18423      +/-   ##
==========================================
- Coverage   91.35%   91.31%   -0.05%     
==========================================
  Files         163      163              
  Lines       49714    49714              
==========================================
- Hits        45415    45394      -21     
- Misses       4299     4320      +21
Flag       Coverage Δ
#multiple  89.1%  <ø> (-0.03%) ⬇️
#single    39.63% <ø> (-0.07%) ⬇️

Impacted Files                 Coverage Δ
pandas/io/gbq.py               25%    <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py  63.44% <0%> (-1.82%)  ⬇️
pandas/core/frame.py           97.8%  <0%> (-0.1%)   ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 103ea6f...a7858e3

codecov bot commented on Nov 22, 2017

Codecov Report

Merging #18423 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18423      +/-   ##
==========================================
- Coverage   91.33%   91.32%   -0.02%     
==========================================
  Files         163      163              
  Lines       49752    49717      -35     
==========================================
- Hits        45443    45404      -39     
- Misses       4309     4313       +4
Flag       Coverage Δ
#multiple  89.12% <ø> (ø)      ⬆️
#single    40.51% <ø> (-0.27%) ⬇️

Impacted Files                       Coverage Δ
pandas/io/gbq.py                     25%    <0%> (-58.34%) ⬇️
pandas/core/frame.py                 97.8%  <0%> (-0.1%)   ⬇️
pandas/core/indexes/interval.py      92.64% <0%> (-0.06%)  ⬇️
pandas/core/indexes/timedeltas.py    91.17% <0%> (-0.04%)  ⬇️
pandas/core/indexes/datetimes.py     95.49% <0%> (-0.04%)  ⬇️
pandas/core/base.py                  96.56% <0%> (+0.02%)  ⬆️
pandas/core/indexes/base.py          96.43% <0%> (+0.03%)  ⬆️
pandas/core/dtypes/cast.py           88.59% <0%> (+0.07%)  ⬆️
pandas/core/series.py                94.84% <0%> (+0.07%)  ⬆️
pandas/core/indexes/datetimelike.py  97.11% <0%> (+0.19%)  ⬆️
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update be66ef8...35c7af4

jreback (Contributor) left a comment:

minor comment. ping on green.


    def time_duplicated_float(self):
        self.float.duplicated()

    params = [1, -1, 0]
jreback (Contributor) commented on the diff:

can you add param_names
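
For reference, ASV pairs a params list with param_names so the parameter gets a readable label in the results, and the chosen value is passed to setup and to each timing method. A purely illustrative sketch (these particular values are not necessarily the ones in the PR):

import numpy as np
import pandas as pd


class Duplicated(object):
    # Illustrative only: param_names labels the 'keep' axis in the ASV output.
    params = ['first', 'last', False]
    param_names = ['keep']

    def setup(self, keep):
        np.random.seed(1234)
        self.ints = pd.Series(np.random.randint(0, 1000, size=10**5))

    def time_duplicated_int(self, keep):
        self.ints.duplicated(keep=keep)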

@jreback added the Benchmark (Performance (ASV) benchmarks) label and removed the Performance (Memory or execution speed performance) label on Nov 23, 2017
@jreback added this to the 0.22.0 milestone on Nov 23, 2017
@mroeschke force-pushed the asv_clean_algorithms branch from a7858e3 to 0b9a3e2 on November 24, 2017 06:50
        np.random.seed(1234)
        self.int_idx = pd.Int64Index(np.arange(N).repeat(5))
        self.float_idx = pd.Float64Index(np.random.randn(N).repeat(5))
jorisvandenbossche (Member) commented on the diff:

Breaking this into two classes is duplicating the setup, and the 'Factorize' and 'Duplicated' names are also already in the time_... method names, so I am not sure it is necessarily a net improvement.

(for this specific case I actually also think we could add a time_duplicated_string)
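
A string variant along those lines might look like the following (hypothetical sketch, not code from this PR; the string values are arbitrary):

import numpy as np
import pandas as pd


class Duplicated(object):
    # Hypothetical time_duplicated_string in the spirit of the suggestion above.

    def setup(self):
        np.random.seed(1234)
        N = 10**5
        values = np.array(['apple', 'banana', 'carrot', 'date', 'egg'], dtype=object)
        self.strings = pd.Series(np.random.choice(values, N))

    def time_duplicated_string(self):
        self.strings.duplicated()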

@mroeschke force-pushed the asv_clean_algorithms branch from 0b9a3e2 to c8124ff on November 25, 2017 01:06
mroeschke (Member, Author) commented:

@jorisvandenbossche for the Factorize and Duplicated classes I added params over their sort and keep kwargs respectively, for a little more motivation to keep them as separate classes. I think there may be cases in the future where specific Index data might want to be run on one method and not the other.
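
Roughly, the parametrization described above looks like this (a sketch under the same assumptions; the PR's Factorize setup also covers float and string data):

import numpy as np
import pandas as pd


class Factorize(object):
    # Sketch: parametrize over the `sort` keyword instead of hard-coding it.
    params = [True, False]
    param_names = ['sort']

    def setup(self, sort):
        np.random.seed(1234)
        N = 10**5
        self.int_idx = pd.Index(np.arange(N).repeat(5))

    def time_factorize_int(self, sort):
        self.int_idx.factorize(sort=sort)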


    def time_add_overflow_zero_scalar(self):
        self.checked_add(self.arr, 0)


class AddOverflowArray(object):
jreback (Contributor) commented on the diff:

I would move this to binary_ops.py (AddOverflow)

Try setup_cache
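
For context, ASV's setup_cache runs once per benchmark environment rather than before every repeat, and its return value is passed as the first argument to setup and to each timing method. A minimal, purely illustrative sketch (class name and data are hypothetical):

import numpy as np
import pandas as pd


class SetupCacheExample(object):
    # Hypothetical: the Series is built once by setup_cache and handed to the
    # timing method, instead of being rebuilt for every repeat.

    def setup_cache(self):
        np.random.seed(1234)
        return pd.Series(np.random.randint(0, 1000, size=10**6))

    def time_value_counts(self, ser):
        ser.value_counts()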

Add base ASV class and test Hashing

rework algorithms benchmarks

improve algorithms benchmark

Benchmarks working!

cleanup
@mroeschke force-pushed the asv_clean_algorithms branch from c8124ff to 35c7af4 on November 25, 2017 18:56
@jreback merged commit 1fab808 into pandas-dev:master on Nov 25, 2017
jreback (Contributor) commented on Nov 25, 2017:

thanks @mroeschke

@mroeschke deleted the asv_clean_algorithms branch on November 25, 2017 23:23
Labels: Benchmark (Performance (ASV) benchmarks)
Projects: None yet
3 participants