PERF: For GH23814, return early in Categorical.__init__ #23888
Conversation
Hello @eoveson! Thanks for submitting the PR.
Codecov Report
@@            Coverage Diff            @@
##           master   #23888     +/-   ##
=========================================
+ Coverage   92.29%    92.3%    +0.01%
=========================================
  Files         161      161
  Lines       51498    51556       +58
=========================================
+ Hits        47530    47590       +60
+ Misses       3968     3966        -2
Continue to review full report at Codecov.
@eoveson: Thanks for the PR! Can you run …
pandas/core/arrays/categorical.py (Outdated)
@@ -314,6 +314,16 @@ class Categorical(ExtensionArray, PandasObject):
    def __init__(self, values, categories=None, ordered=None, dtype=None,
                 fastpath=False):

        # GH23814, for perf, if no optional params used and values already an
I think we can just move this down to where the fastpath check is now; you can add this on, I think. This constructor is already amazingly complicated.
I think at that point the dtype arg, and maybe categories, will already be set. I wanted to use this early return only if none of the optional args were specified (I believe @TomAugsperger was suggesting this in the issue thread).
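(For context, here is a minimal standalone sketch of the early-return idea being discussed, written as a wrapper function rather than as a patch to Categorical.__init__; the function name and the exact condition are illustrative and are not the code that was merged.)

```python
import pandas as pd

def make_categorical(values, categories=None, ordered=None, dtype=None):
    """Illustrative wrapper (not pandas internals): skip the full constructor
    work when `values` is already a Categorical and no optional arguments
    were passed, which is the case the early return in this PR targets."""
    if (isinstance(values, pd.Categorical)
            and categories is None
            and ordered is None
            and dtype is None):
        # Nothing to re-infer or validate; reuse the existing codes/categories.
        return values.copy()
    # Otherwise fall back to the normal (slower) constructor path.
    return pd.Categorical(values, categories=categories,
                          ordered=ordered, dtype=dtype)
```

For example, `make_categorical(pd.Categorical(list('abcd')))` returns immediately instead of re-validating the categories, which is the perf win GH23814 is after.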
@eoveson I would still like to investigate consolidating some of this code. This is a very complicated constructor and more code is not great here. See if you can add it lower down, even if it's slightly lower perf.
@jreback, OK, let me look into this and see if I can consolidate some of the code.
Updated, please check it out when you get a chance
@gfyoung -- yes, I ran a subset of the asv suite (tried to target categorical); I can run the entire suite also. It reported no significant difference (maybe because there was no existing test for this scenario, which is why I added the new perf test).
The test output should list all performance tests that were run. If it's not there, create a new branch off …
@gfyoung, yes, I saw the test I added show up in the output when I ran the command I mentioned. Should I run all of the asv tests (I tried running all of them, but the run failed about a third of the way through with a file-access error for a temporary file), or should I target the categorical tests? BTW, this is the error I saw when trying to run all the asv tests:

[ 32.67%] ··· Running (index_object.Indexing.time_get_loc--).
Traceback (most recent call last):
File "c:\users\erikov\appdata\local\continuum\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\erikov\appdata\local\continuum\anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\erikov\AppData\Local\Continuum\anaconda3\scripts\asv.exe\__main__.py", line 9, in <module>
File "c:\users\erikov\appdata\local\continuum\anaconda3\lib\site-packages\asv\main.py", line 38, in main
result = args.func(args)
File "c:\users\erikov\appdata\local\continuum\anaconda3\lib\site-packages\asv\commands\__init__.py", line 49, in run_from_args
return cls.run_from_conf_args(conf, args)
File "c:\users\erikov\appdata\local\continuum\anaconda3\lib\site-packages\asv\commands\continuous.py", line 72,in run_from_conf_args
launch_method=args.launch_method, **kwargs
File "c:\users\erikov\appdata\local\continuum\anaconda3\lib\site-packages\asv\commands\continuous.py", line 106, in run
_returns=run_objs, _machine_file=_machine_file)
File "c:\users\erikov\appdata\local\continuum\anaconda3\lib\site-packages\asv\commands\run.py", line 406, in run
launch_method=launch_method)
File "c:\users\erikov\appdata\local\continuum\anaconda3\lib\site-packages\asv\runner.py", line 349, in run_benchmarks
cwd=cache_dir)
File "c:\users\erikov\appdata\local\continuum\anaconda3\lib\site-packages\asv\runner.py", line 515, in run_benchmark
cwd=cwd)
File "c:\users\erikov\appdata\local\continuum\anaconda3\lib\site-packages\asv\runner.py", line 647, in _run_benchmark_single_param
os.remove(result_file.name)
PermissionError: [WinError 5] Access is denied: 'C:\\Users\\erikov\\AppData\\Local\\Temp\\tmpfq5htpg5'
Running the Categorical tests is fine. I'm concerned, though: you didn't see any noticeable improvement in performance, even with your newly added test?
Well, this change only helps in the case where an existing Categorical instance is passed to Categorical.__init__ with no optional params. I'm not sure how common that is in the tests; I didn't see that case in the file I added the test to. But I'm also not exactly sure how this asv suite works. How does it get a baseline to compare against (since machine specs differ)? Am I supposed to create a baseline on my machine without my changes, and then run with my changes? If so, I didn't do that; I simply ran the asv command I mentioned, so I'm not sure whether I'm doing things correctly.
Right, but didn't you say you saw no substantial changes in performance?
Can you copy/paste the output of your asv run?
I see now that you specifically mentioned the test I added, so I should have seen a difference for that test. I guess I need to run with the new test, but without my real changes to __init__, to create the baseline first?
Exactly. That's why I said earlier:
Ah, makes sense, thanks. I'll compare the two branches and get back to you (I'll first work on the code consolidation requested by jreback since that may impact things). |
tiny doc comment. ping when pushed.
doc/source/whatsnew/v0.24.0.rst (Outdated)
@@ -1150,7 +1150,7 @@ Performance Improvements
- Improved performance of :func:`pd.concat` for `Series` objects (:issue:`23404`)
- Improved performance of :meth:`DatetimeIndex.normalize` and :meth:`Timestamp.normalize` for timezone naive or UTC datetimes (:issue:`23634`)
- Improved performance of :meth:`DatetimeIndex.tz_localize` and various ``DatetimeIndex`` attributes with dateutil UTC timezone (:issue:`23772`)
- Improved performance of :meth:`Categorical.__init__` (:issue:`23814`)
say constructor rather than referring to __init__
@jreback, updated the doc wording, and also added an asv test that exercises the code (the first one didn't, but I left it since it's still useful). You can see my comment about the asv results to @gfyoung.
@gfyoung, it turns out that the asv test I added previously was not exercising the code (I should have been passing a Series rather than a Categorical to the constructor). I added a new asv test for this (but left the other one since it could still be useful). I re-ran asv and did see a significant difference reported for that newly added test and for one other test. I didn't expect that other test to change, so I re-ran the same command, and looking at the numbers that test doesn't change much. (However, in the second run the report no longer flags my newly added test as significantly different, even though I see the same difference as in the first run.) So I think things are OK now, but I'm pasting the output here so you can take a look. Here is the first execution of the command (the second one is below):

$ asv continuous -f 1.1 upstream/master category-perf -b categorical
[ 52.27%] ··· ...oricals.CategoricalSlicing.time_getitem_list       ok
              monotonic_incr  679±0us
[ 53.03%] ··· ...ls.CategoricalSlicing.time_getitem_list_like       ok
              monotonic_incr  12.5
[ 53.79%] ··· ...icals.CategoricalSlicing.time_getitem_scalar       ok
              monotonic_incr  5.05
[ 54.55%] ··· ...ricals.CategoricalSlicing.time_getitem_slice       ok
              monotonic_incr  3.93
[ 55.30%] ··· categoricals.Concat.time_concat                       7.81±0ms
[ 68.94%] ··· categoricals.Rank.time_rank_int                       11.7±4ms
[ 75.00%] · For pandas commit 3e01c38 <master^2> (round 2/2):
[ 77.27%] ··· ...oricals.CategoricalSlicing.time_getitem_list       ok
              monotonic_incr  625
[ 78.03%] ··· ...ls.CategoricalSlicing.time_getitem_list_like       ok
              monotonic_incr  12.5
[ 78.79%] ··· ...icals.CategoricalSlicing.time_getitem_scalar       ok
              monotonic_decr  6.27
[ 79.55%] ··· ...ricals.CategoricalSlicing.time_getitem_slice       ok
              monotonic_incr  7.82
[ 80.30%] ··· categoricals.Concat.time_concat                       7.81±0ms
[ 93.94%] ··· categoricals.Rank.time_rank_int                       7.81±4ms

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

Second execution of the same command:

$ asv continuous -f 1.1 upstream/master category-perf -b categorical
[ 52.27%] ··· ...oricals.CategoricalSlicing.time_getitem_list       ok
              monotonic_incr  601
[ 53.03%] ··· ...ls.CategoricalSlicing.time_getitem_list_like       ok
              monotonic_incr  6.41
[ 53.79%] ··· ...icals.CategoricalSlicing.time_getitem_scalar       ok
              monotonic_incr  4.51
[ 54.55%] ··· ...ricals.CategoricalSlicing.time_getitem_slice       ok
              monotonic_incr  7.16
[ 55.30%] ··· categoricals.Concat.time_concat                       7.81±0ms
[ 68.94%] ··· categoricals.Rank.time_rank_int                       7.81±3ms
[ 75.00%] · For pandas commit 3e01c38 <master^2> (round 2/2):
[ 77.27%] ··· ...oricals.CategoricalSlicing.time_getitem_list       ok
              monotonic_incr  601
[ 78.03%] ··· ...ls.CategoricalSlicing.time_getitem_list_like       ok
              monotonic_incr  11.4
[ 78.79%] ··· ...icals.CategoricalSlicing.time_getitem_scalar       ok
              monotonic_incr  4.37
[ 79.55%] ··· ...ricals.CategoricalSlicing.time_getitem_slice       ok
              monotonic_incr  7.23
[ 80.30%] ··· categoricals.Concat.time_concat                       7.81±0ms
[ 93.94%] ··· categoricals.Rank.time_rank_int                       7.81±3ms

BENCHMARKS NOT SIGNIFICANTLY CHANGED.
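(For reference, a hedged sketch of what an asv-style benchmark exercising this constructor path might look like; the class and method names below are made up for illustration and are not the benchmarks actually added in this PR.)

```python
import pandas as pd

class CategoricalConstructorFromExisting:
    # Illustrative asv-style benchmark: time re-wrapping data that is already
    # categorical, which is the path the early return is meant to speed up.
    def setup(self):
        self.categorical = pd.Categorical(list('abcd') * 250000)
        self.series = pd.Series(list('abcd') * 250000).astype('category')

    def time_from_categorical(self):
        # Constructor fed an existing Categorical with no optional args.
        pd.Categorical(self.categorical)

    def time_from_categorical_series(self):
        # Constructor fed a Series that already has category dtype.
        pd.Categorical(self.series)
```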
@eoveson can you show a before/after using timeit in ipython?
For sure. Before my change:

In [2]: s = pd.Series(list('abcd') * 1000000).astype('category')
In [3]: %timeit s == 'a'
25.7 ms ± 409 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [4]: %timeit s.cat.codes == s.cat.categories.get_loc('a')
3.29 ms ± 70.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

After change:

In [6]: s = pd.Series(list('abcd') * 1000000).astype('category')
In [7]: %timeit s == 'a'
5.24 ms ± 97.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [8]: %timeit s.cat.codes == s.cat.categories.get_loc('a')
3.28 ms ± 70 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
thanks @eoveson
git diff upstream/master -u -- "*.py" | flake8 --diff