diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 21df1a3aacd59..faff68b636109 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -8,16 +8,16 @@ Our main contributing guide can be found [in this repo](https://github.com/panda If you are looking to contribute to the *pandas* codebase, the best place to start is the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues). This is also a great place for filing bug reports and making suggestions for ways in which we can improve the code and documentation. -If you have additional questions, feel free to ask them on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas). Further information can also be found in the "[Where to start?](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#where-to-start)" section. +If you have additional questions, feel free to ask them on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas). Further information can also be found in the "[Where to start?](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#where-to-start)" section. ## Filing Issues -If you notice a bug in the code or documentation, or have suggestions for how we can improve either, feel free to create an issue on the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) using [GitHub's "issue" form](https://github.com/pandas-dev/pandas/issues/new). The form contains some questions that will help us best address your issue. For more information regarding how to file issues against *pandas*, please refer to the "[Bug reports and enhancement requests](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#bug-reports-and-enhancement-requests)" section. +If you notice a bug in the code or documentation, or have suggestions for how we can improve either, feel free to create an issue on the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) using [GitHub's "issue" form](https://github.com/pandas-dev/pandas/issues/new). The form contains some questions that will help us best address your issue. For more information regarding how to file issues against *pandas*, please refer to the "[Bug reports and enhancement requests](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#bug-reports-and-enhancement-requests)" section. ## Contributing to the Codebase -The code is hosted on [GitHub](https://www.github.com/pandas-dev/pandas), so you will need to use [Git](http://git-scm.com/) to clone the project and make changes to the codebase. Once you have obtained a copy of the code, you should create a development environment that is separate from your existing Python environment so that you can make and test changes without compromising your own work environment. For more information, please refer to the "[Working with the code](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#working-with-the-code)" section. +The code is hosted on [GitHub](https://www.github.com/pandas-dev/pandas), so you will need to use [Git](http://git-scm.com/) to clone the project and make changes to the codebase. 
Once you have obtained a copy of the code, you should create a development environment that is separate from your existing Python environment so that you can make and test changes without compromising your own work environment. For more information, please refer to the "[Working with the code](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#working-with-the-code)" section.
-Before submitting your changes for review, make sure to check that your changes do not break any tests. You can find more information about our test suites in the "[Test-driven development/code writing](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#test-driven-development-code-writing)" section. We also have guidelines regarding coding style that will be enforced during testing, which can be found in the "[Code standards](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#code-standards)" section.
+Before submitting your changes for review, make sure to check that your changes do not break any tests. You can find more information about our test suites in the "[Test-driven development/code writing](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#test-driven-development-code-writing)" section. We also have guidelines regarding coding style that will be enforced during testing, which can be found in the "[Code standards](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#code-standards)" section.
-Once your changes are ready to be submitted, make sure to push your changes to GitHub before creating a pull request. Details about how to do that can be found in the "[Contributing your changes to pandas](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#contributing-your-changes-to-pandas)" section. We will review your changes, and you will most likely be asked to make additional changes before it is finally ready to merge. However, once it's ready, we will merge it, and you will have successfully contributed to the codebase!
+Once your changes are ready to be submitted, make sure to push your changes to GitHub before creating a pull request. Details about how to do that can be found in the "[Contributing your changes to pandas](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#contributing-your-changes-to-pandas)" section. We will review your changes, and you will most likely be asked to make additional changes before it is finally ready to merge. However, once it's ready, we will merge it, and you will have successfully contributed to the codebase!
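As a quick sanity check before pushing, the test suite mentioned above can also be launched from Python itself; pandas ships a small wrapper around pytest for this. A minimal sketch, assuming pandas has been built and is importable in the active development environment:

    # Minimal sketch: run the bundled pandas test suite from a Python session.
    # pandas.test() is a thin wrapper that invokes pytest on the installed
    # pandas tests; it assumes a built pandas is importable here.
    import pandas as pd
    pd.test()

This is not a substitute for the full CI run, but it catches obvious breakage early.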
diff --git a/.gitignore b/.gitignore index 4598714db6c6a..816aff376fc83 100644 --- a/.gitignore +++ b/.gitignore @@ -101,7 +101,8 @@ asv_bench/pandas/ # Documentation generated files # ################################# doc/source/generated -doc/source/api/generated +doc/source/user_guide/styled.xlsx +doc/source/reference/api doc/source/_static doc/source/vbench doc/source/vbench.rst @@ -109,6 +110,5 @@ doc/source/index.rst doc/build/html/index.html # Windows specific leftover: doc/tmp.sv -doc/source/styled.xlsx env/ doc/source/savefig/ diff --git a/Makefile b/Makefile index d2bd067950fd0..956ff52338839 100644 --- a/Makefile +++ b/Makefile @@ -23,4 +23,3 @@ doc: cd doc; \ python make.py clean; \ python make.py html - python make.py spellcheck diff --git a/asv_bench/benchmarks/__init__.py b/asv_bench/benchmarks/__init__.py index e69de29bb2d1d..eada147852fe1 100644 --- a/asv_bench/benchmarks/__init__.py +++ b/asv_bench/benchmarks/__init__.py @@ -0,0 +1 @@ +"""Pandas benchmarks.""" diff --git a/asv_bench/benchmarks/algorithms.py b/asv_bench/benchmarks/algorithms.py index 34fb161e5afcb..74849d330f2bc 100644 --- a/asv_bench/benchmarks/algorithms.py +++ b/asv_bench/benchmarks/algorithms.py @@ -5,7 +5,6 @@ import pandas as pd from pandas.util import testing as tm - for imp in ['pandas.util', 'pandas.tools.hashing']: try: hashing = import_module(imp) @@ -142,4 +141,4 @@ def time_quantile(self, quantile, interpolation, dtype): self.idx.quantile(quantile, interpolation=interpolation) -from .pandas_vb_common import setup # noqa: F401 +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/categoricals.py b/asv_bench/benchmarks/categoricals.py index e5dab0cb066aa..4b5b2848f7e0f 100644 --- a/asv_bench/benchmarks/categoricals.py +++ b/asv_bench/benchmarks/categoricals.py @@ -223,12 +223,19 @@ class CategoricalSlicing(object): def setup(self, index): N = 10**6 - values = list('a' * N + 'b' * N + 'c' * N) - indices = { - 'monotonic_incr': pd.Categorical(values), - 'monotonic_decr': pd.Categorical(reversed(values)), - 'non_monotonic': pd.Categorical(list('abc' * N))} - self.data = indices[index] + categories = ['a', 'b', 'c'] + values = [0] * N + [1] * N + [2] * N + if index == 'monotonic_incr': + self.data = pd.Categorical.from_codes(values, + categories=categories) + elif index == 'monotonic_decr': + self.data = pd.Categorical.from_codes(list(reversed(values)), + categories=categories) + elif index == 'non_monotonic': + self.data = pd.Categorical.from_codes([0, 1, 2] * N, + categories=categories) + else: + raise ValueError('Invalid index param: {}'.format(index)) self.scalar = 10000 self.list = list(range(10000)) diff --git a/asv_bench/benchmarks/ctors.py b/asv_bench/benchmarks/ctors.py index 9082b4186bfa4..5715c4fb2d0d4 100644 --- a/asv_bench/benchmarks/ctors.py +++ b/asv_bench/benchmarks/ctors.py @@ -72,7 +72,7 @@ class SeriesDtypesConstructors(object): def setup(self): N = 10**4 - self.arr = np.random.randn(N, N) + self.arr = np.random.randn(N) self.arr_str = np.array(['foo', 'bar', 'baz'], dtype=object) self.s = Series([Timestamp('20110101'), Timestamp('20120101'), Timestamp('20130101')] * N * 10) diff --git a/asv_bench/benchmarks/index_object.py b/asv_bench/benchmarks/index_object.py index f76040921393f..bbe164d4858ab 100644 --- a/asv_bench/benchmarks/index_object.py +++ b/asv_bench/benchmarks/index_object.py @@ -138,7 +138,8 @@ def setup(self, dtype): self.sorted = self.idx.sort_values() half = N // 2 self.non_unique = self.idx[:half].append(self.idx[:half]) - 
self.non_unique_sorted = self.sorted[:half].append(self.sorted[:half]) + self.non_unique_sorted = (self.sorted[:half].append(self.sorted[:half]) + .sort_values()) self.key = self.sorted[N // 4] def time_boolean_array(self, dtype): diff --git a/asv_bench/benchmarks/strings.py b/asv_bench/benchmarks/strings.py index e9f2727f64e15..b5b2c955f0133 100644 --- a/asv_bench/benchmarks/strings.py +++ b/asv_bench/benchmarks/strings.py @@ -102,10 +102,10 @@ def setup(self, repeats): N = 10**5 self.s = Series(tm.makeStringIndex(N)) repeat = {'int': 1, 'array': np.random.randint(1, 3, N)} - self.repeat = repeat[repeats] + self.values = repeat[repeats] def time_repeat(self, repeats): - self.s.str.repeat(self.repeat) + self.s.str.repeat(self.values) class Cat(object): diff --git a/azure-pipelines.yml b/azure-pipelines.yml index f0567d76659b6..c86d5c50705a8 100644 --- a/azure-pipelines.yml +++ b/azure-pipelines.yml @@ -104,7 +104,7 @@ jobs: if git diff upstream/master --name-only | grep -q "^asv_bench/"; then cd asv_bench asv machine --yes - ASV_OUTPUT="$(asv dev)" + ASV_OUTPUT="$(asv run --quick --show-stderr --python=same --launch-method=spawn)" if [[ $(echo "$ASV_OUTPUT" | grep "failed") ]]; then echo "##vso[task.logissue type=error]Benchmarks run with errors" echo "$ASV_OUTPUT" diff --git a/ci/code_checks.sh b/ci/code_checks.sh index c8bfc564e7573..5c9d20e483ce4 100755 --- a/ci/code_checks.sh +++ b/ci/code_checks.sh @@ -93,7 +93,7 @@ if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then # this particular codebase (e.g. src/headers, src/klib, src/msgpack). However, # we can lint all header files since they aren't "generated" like C files are. MSG='Linting .c and .h' ; echo $MSG - cpplint --quiet --extensions=c,h --headers=h --recursive --filter=-readability/casting,-runtime/int,-build/include_subdir pandas/_libs/src/*.h pandas/_libs/src/parser pandas/_libs/ujson pandas/_libs/tslibs/src/datetime + cpplint --quiet --extensions=c,h --headers=h --recursive --filter=-readability/casting,-runtime/int,-build/include_subdir pandas/_libs/src/*.h pandas/_libs/src/parser pandas/_libs/ujson pandas/_libs/tslibs/src/datetime pandas/io/msgpack pandas/_libs/*.cpp pandas/util RET=$(($RET + $?)) ; echo $MSG "DONE" echo "isort --version-number" @@ -174,9 +174,10 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then MSG='Check that no file in the repo contains tailing whitespaces' ; echo $MSG set -o pipefail if [[ "$AZURE" == "true" ]]; then - ! grep -n --exclude="*.svg" -RI "\s$" * | awk -F ":" '{print "##vso[task.logissue type=error;sourcepath=" $1 ";linenumber=" $2 ";] Tailing whitespaces found: " $3}' + # we exclude all c/cpp files as the c/cpp files of pandas code base are tested when Linting .c and .h files + ! grep -n '--exclude=*.'{svg,c,cpp,html} -RI "\s$" * | awk -F ":" '{print "##vso[task.logissue type=error;sourcepath=" $1 ";linenumber=" $2 ";] Tailing whitespaces found: " $3}' else - ! grep -n --exclude="*.svg" -RI "\s$" * | awk -F ":" '{print $1 ":" $2 ":Tailing whitespaces found: " $3}' + ! 
grep -n '--exclude=*.'{svg,c,cpp,html} -RI "\s$" * | awk -F ":" '{print $1 ":" $2 ":Tailing whitespaces found: " $3}' fi RET=$(($RET + $?)) ; echo $MSG "DONE" fi @@ -206,7 +207,7 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then MSG='Doctests frame.py' ; echo $MSG pytest -q --doctest-modules pandas/core/frame.py \ - -k"-axes -combine -itertuples -join -pivot_table -query -reindex -reindex_axis -round" + -k" -itertuples -join -reindex -reindex_axis -round" RET=$(($RET + $?)) ; echo $MSG "DONE" MSG='Doctests series.py' ; echo $MSG @@ -240,8 +241,8 @@ fi ### DOCSTRINGS ### if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then - MSG='Validate docstrings (GL06, GL07, GL09, SS04, PR03, PR05, EX04)' ; echo $MSG - $BASE_DIR/scripts/validate_docstrings.py --format=azure --errors=GL06,GL07,GL09,SS04,PR03,PR05,EX04 + MSG='Validate docstrings (GL06, GL07, GL09, SS04, PR03, PR05, PR10, EX04, RT04, SS05, SA05)' ; echo $MSG + $BASE_DIR/scripts/validate_docstrings.py --format=azure --errors=GL06,GL07,GL09,SS04,PR03,PR05,EX04,RT04,SS05,SA05 RET=$(($RET + $?)) ; echo $MSG "DONE" fi diff --git a/doc/cheatsheet/Pandas_Cheat_Sheet.pdf b/doc/cheatsheet/Pandas_Cheat_Sheet.pdf index 696ed288cf7a6..d50896dc5ccc5 100644 Binary files a/doc/cheatsheet/Pandas_Cheat_Sheet.pdf and b/doc/cheatsheet/Pandas_Cheat_Sheet.pdf differ diff --git a/doc/cheatsheet/Pandas_Cheat_Sheet.pptx b/doc/cheatsheet/Pandas_Cheat_Sheet.pptx index f8b98a6f1f8e4..95f2771017db5 100644 Binary files a/doc/cheatsheet/Pandas_Cheat_Sheet.pptx and b/doc/cheatsheet/Pandas_Cheat_Sheet.pptx differ diff --git a/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pdf b/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pdf index daa65a944e68a..05e4b87f6a210 100644 Binary files a/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pdf and b/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pdf differ diff --git a/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pptx b/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pptx index 6270a71e20ee8..cb0f058db5448 100644 Binary files a/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pptx and b/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pptx differ diff --git a/doc/make.py b/doc/make.py index 0b14a9dcd4c34..438c4a04a3f08 100755 --- a/doc/make.py +++ b/doc/make.py @@ -15,15 +15,18 @@ import sys import os import shutil +import csv import subprocess import argparse import webbrowser +import docutils +import docutils.parsers.rst DOC_PATH = os.path.dirname(os.path.abspath(__file__)) SOURCE_PATH = os.path.join(DOC_PATH, 'source') BUILD_PATH = os.path.join(DOC_PATH, 'build') -BUILD_DIRS = ['doctrees', 'html', 'latex', 'plots', '_static', '_templates'] +REDIRECTS_FILE = os.path.join(DOC_PATH, 'redirects.csv') class DocBuilder: @@ -50,7 +53,7 @@ def __init__(self, num_jobs=0, include_api=True, single_doc=None, if single_doc and single_doc.endswith('.rst'): self.single_doc_html = os.path.splitext(single_doc)[0] + '.html' elif single_doc: - self.single_doc_html = 'api/generated/pandas.{}.html'.format( + self.single_doc_html = 'reference/api/pandas.{}.html'.format( single_doc) def _process_single_doc(self, single_doc): @@ -60,7 +63,7 @@ def _process_single_doc(self, single_doc): For example, categorial.rst or pandas.DataFrame.head. For the latter, return the corresponding file path - (e.g. generated/pandas.DataFrame.head.rst). + (e.g. reference/api/pandas.DataFrame.head.rst). 
""" base_name, extension = os.path.splitext(single_doc) if extension in ('.rst', '.ipynb'): @@ -118,8 +121,6 @@ def _sphinx_build(self, kind): raise ValueError('kind must be html or latex, ' 'not {}'.format(kind)) - self.clean() - cmd = ['sphinx-build', '-b', kind] if self.num_jobs: cmd += ['-j', str(self.num_jobs)] @@ -139,6 +140,77 @@ def _open_browser(self, single_doc_html): single_doc_html) webbrowser.open(url, new=2) + def _get_page_title(self, page): + """ + Open the rst file `page` and extract its title. + """ + fname = os.path.join(SOURCE_PATH, '{}.rst'.format(page)) + option_parser = docutils.frontend.OptionParser( + components=(docutils.parsers.rst.Parser,)) + doc = docutils.utils.new_document( + '', + option_parser.get_default_values()) + with open(fname) as f: + data = f.read() + + parser = docutils.parsers.rst.Parser() + # do not generate any warning when parsing the rst + with open(os.devnull, 'a') as f: + doc.reporter.stream = f + parser.parse(data, doc) + + section = next(node for node in doc.children + if isinstance(node, docutils.nodes.section)) + title = next(node for node in section.children + if isinstance(node, docutils.nodes.title)) + + return title.astext() + + def _add_redirects(self): + """ + Create in the build directory an html file with a redirect, + for every row in REDIRECTS_FILE. + """ + html = ''' + + + + + +

+        <html>
+            <head>
+                <meta http-equiv="refresh" content="0;URL={url}"/>
+            </head>
+            <body>
+                <p>
+                    The page has been moved to <a href="{url}">{title}</a>
+                </p>
+            </body>
+        </html>
+ + + ''' + with open(REDIRECTS_FILE) as mapping_fd: + reader = csv.reader(mapping_fd) + for row in reader: + if not row or row[0].strip().startswith('#'): + continue + + path = os.path.join(BUILD_PATH, + 'html', + *row[0].split('/')) + '.html' + + try: + title = self._get_page_title(row[1]) + except Exception: + # the file can be an ipynb and not an rst, or docutils + # may not be able to read the rst because it has some + # sphinx specific stuff + title = 'this page' + + if os.path.exists(path): + raise RuntimeError(( + 'Redirection would overwrite an existing file: ' + '{}').format(path)) + + with open(path, 'w') as moved_page_fd: + moved_page_fd.write( + html.format(url='{}.html'.format(row[1]), + title=title)) + def html(self): """ Build HTML documentation. @@ -150,6 +222,8 @@ def html(self): if self.single_doc_html is not None: self._open_browser(self.single_doc_html) + else: + self._add_redirects() return ret_code def latex(self, force=False): @@ -184,7 +258,7 @@ def clean(): Clean documentation generated files. """ shutil.rmtree(BUILD_PATH, ignore_errors=True) - shutil.rmtree(os.path.join(SOURCE_PATH, 'api', 'generated'), + shutil.rmtree(os.path.join(SOURCE_PATH, 'reference', 'api'), ignore_errors=True) def zip_html(self): diff --git a/doc/redirects.csv b/doc/redirects.csv new file mode 100644 index 0000000000000..a7886779c97d5 --- /dev/null +++ b/doc/redirects.csv @@ -0,0 +1,1581 @@ +# This file should contain all the redirects in the documentation +# in the format `,` + +# whatsnew +whatsnew,whatsnew/index +release,whatsnew/index + +# getting started +10min,getting_started/10min +basics,getting_started/basics +comparison_with_r,getting_started/comparison/comparison_with_r +comparison_with_sql,getting_started/comparison/comparison_with_sql +comparison_with_sas,getting_started/comparison/comparison_with_sas +comparison_with_stata,getting_started/comparison/comparison_with_stata +dsintro,getting_started/dsintro +overview,getting_started/overview +tutorials,getting_started/tutorials + +# user guide +advanced,user_guide/advanced +categorical,user_guide/categorical +computation,user_guide/computation +cookbook,user_guide/cookbook +enhancingperf,user_guide/enhancingperf +gotchas,user_guide/gotchas +groupby,user_guide/groupby +indexing,user_guide/indexing +integer_na,user_guide/integer_na +io,user_guide/io +merging,user_guide/merging +missing_data,user_guide/missing_data +options,user_guide/options +reshaping,user_guide/reshaping +sparse,user_guide/sparse +style,user_guide/style +text,user_guide/text +timedeltas,user_guide/timedeltas +timeseries,user_guide/timeseries +visualization,user_guide/visualization + +# development +contributing,development/contributing +contributing_docstring,development/contributing_docstring +developer,development/developer +extending,development/extending +internals,development/internals + +# api +api,reference/index +generated/pandas.api.extensions.ExtensionArray.argsort,../reference/api/pandas.api.extensions.ExtensionArray.argsort +generated/pandas.api.extensions.ExtensionArray.astype,../reference/api/pandas.api.extensions.ExtensionArray.astype +generated/pandas.api.extensions.ExtensionArray.copy,../reference/api/pandas.api.extensions.ExtensionArray.copy +generated/pandas.api.extensions.ExtensionArray.dropna,../reference/api/pandas.api.extensions.ExtensionArray.dropna +generated/pandas.api.extensions.ExtensionArray.dtype,../reference/api/pandas.api.extensions.ExtensionArray.dtype 
+generated/pandas.api.extensions.ExtensionArray.factorize,../reference/api/pandas.api.extensions.ExtensionArray.factorize +generated/pandas.api.extensions.ExtensionArray.fillna,../reference/api/pandas.api.extensions.ExtensionArray.fillna +generated/pandas.api.extensions.ExtensionArray,../reference/api/pandas.api.extensions.ExtensionArray +generated/pandas.api.extensions.ExtensionArray.isna,../reference/api/pandas.api.extensions.ExtensionArray.isna +generated/pandas.api.extensions.ExtensionArray.nbytes,../reference/api/pandas.api.extensions.ExtensionArray.nbytes +generated/pandas.api.extensions.ExtensionArray.ndim,../reference/api/pandas.api.extensions.ExtensionArray.ndim +generated/pandas.api.extensions.ExtensionArray.shape,../reference/api/pandas.api.extensions.ExtensionArray.shape +generated/pandas.api.extensions.ExtensionArray.take,../reference/api/pandas.api.extensions.ExtensionArray.take +generated/pandas.api.extensions.ExtensionArray.unique,../reference/api/pandas.api.extensions.ExtensionArray.unique +generated/pandas.api.extensions.ExtensionDtype.construct_array_type,../reference/api/pandas.api.extensions.ExtensionDtype.construct_array_type +generated/pandas.api.extensions.ExtensionDtype.construct_from_string,../reference/api/pandas.api.extensions.ExtensionDtype.construct_from_string +generated/pandas.api.extensions.ExtensionDtype,../reference/api/pandas.api.extensions.ExtensionDtype +generated/pandas.api.extensions.ExtensionDtype.is_dtype,../reference/api/pandas.api.extensions.ExtensionDtype.is_dtype +generated/pandas.api.extensions.ExtensionDtype.kind,../reference/api/pandas.api.extensions.ExtensionDtype.kind +generated/pandas.api.extensions.ExtensionDtype.name,../reference/api/pandas.api.extensions.ExtensionDtype.name +generated/pandas.api.extensions.ExtensionDtype.names,../reference/api/pandas.api.extensions.ExtensionDtype.names +generated/pandas.api.extensions.ExtensionDtype.na_value,../reference/api/pandas.api.extensions.ExtensionDtype.na_value +generated/pandas.api.extensions.ExtensionDtype.type,../reference/api/pandas.api.extensions.ExtensionDtype.type +generated/pandas.api.extensions.register_dataframe_accessor,../reference/api/pandas.api.extensions.register_dataframe_accessor +generated/pandas.api.extensions.register_extension_dtype,../reference/api/pandas.api.extensions.register_extension_dtype +generated/pandas.api.extensions.register_index_accessor,../reference/api/pandas.api.extensions.register_index_accessor +generated/pandas.api.extensions.register_series_accessor,../reference/api/pandas.api.extensions.register_series_accessor +generated/pandas.api.types.infer_dtype,../reference/api/pandas.api.types.infer_dtype +generated/pandas.api.types.is_bool_dtype,../reference/api/pandas.api.types.is_bool_dtype +generated/pandas.api.types.is_bool,../reference/api/pandas.api.types.is_bool +generated/pandas.api.types.is_categorical_dtype,../reference/api/pandas.api.types.is_categorical_dtype +generated/pandas.api.types.is_categorical,../reference/api/pandas.api.types.is_categorical +generated/pandas.api.types.is_complex_dtype,../reference/api/pandas.api.types.is_complex_dtype +generated/pandas.api.types.is_complex,../reference/api/pandas.api.types.is_complex +generated/pandas.api.types.is_datetime64_any_dtype,../reference/api/pandas.api.types.is_datetime64_any_dtype +generated/pandas.api.types.is_datetime64_dtype,../reference/api/pandas.api.types.is_datetime64_dtype +generated/pandas.api.types.is_datetime64_ns_dtype,../reference/api/pandas.api.types.is_datetime64_ns_dtype 
+generated/pandas.api.types.is_datetime64tz_dtype,../reference/api/pandas.api.types.is_datetime64tz_dtype +generated/pandas.api.types.is_datetimetz,../reference/api/pandas.api.types.is_datetimetz +generated/pandas.api.types.is_dict_like,../reference/api/pandas.api.types.is_dict_like +generated/pandas.api.types.is_extension_array_dtype,../reference/api/pandas.api.types.is_extension_array_dtype +generated/pandas.api.types.is_extension_type,../reference/api/pandas.api.types.is_extension_type +generated/pandas.api.types.is_file_like,../reference/api/pandas.api.types.is_file_like +generated/pandas.api.types.is_float_dtype,../reference/api/pandas.api.types.is_float_dtype +generated/pandas.api.types.is_float,../reference/api/pandas.api.types.is_float +generated/pandas.api.types.is_hashable,../reference/api/pandas.api.types.is_hashable +generated/pandas.api.types.is_int64_dtype,../reference/api/pandas.api.types.is_int64_dtype +generated/pandas.api.types.is_integer_dtype,../reference/api/pandas.api.types.is_integer_dtype +generated/pandas.api.types.is_integer,../reference/api/pandas.api.types.is_integer +generated/pandas.api.types.is_interval_dtype,../reference/api/pandas.api.types.is_interval_dtype +generated/pandas.api.types.is_interval,../reference/api/pandas.api.types.is_interval +generated/pandas.api.types.is_iterator,../reference/api/pandas.api.types.is_iterator +generated/pandas.api.types.is_list_like,../reference/api/pandas.api.types.is_list_like +generated/pandas.api.types.is_named_tuple,../reference/api/pandas.api.types.is_named_tuple +generated/pandas.api.types.is_number,../reference/api/pandas.api.types.is_number +generated/pandas.api.types.is_numeric_dtype,../reference/api/pandas.api.types.is_numeric_dtype +generated/pandas.api.types.is_object_dtype,../reference/api/pandas.api.types.is_object_dtype +generated/pandas.api.types.is_period_dtype,../reference/api/pandas.api.types.is_period_dtype +generated/pandas.api.types.is_period,../reference/api/pandas.api.types.is_period +generated/pandas.api.types.is_re_compilable,../reference/api/pandas.api.types.is_re_compilable +generated/pandas.api.types.is_re,../reference/api/pandas.api.types.is_re +generated/pandas.api.types.is_scalar,../reference/api/pandas.api.types.is_scalar +generated/pandas.api.types.is_signed_integer_dtype,../reference/api/pandas.api.types.is_signed_integer_dtype +generated/pandas.api.types.is_sparse,../reference/api/pandas.api.types.is_sparse +generated/pandas.api.types.is_string_dtype,../reference/api/pandas.api.types.is_string_dtype +generated/pandas.api.types.is_timedelta64_dtype,../reference/api/pandas.api.types.is_timedelta64_dtype +generated/pandas.api.types.is_timedelta64_ns_dtype,../reference/api/pandas.api.types.is_timedelta64_ns_dtype +generated/pandas.api.types.is_unsigned_integer_dtype,../reference/api/pandas.api.types.is_unsigned_integer_dtype +generated/pandas.api.types.pandas_dtype,../reference/api/pandas.api.types.pandas_dtype +generated/pandas.api.types.union_categoricals,../reference/api/pandas.api.types.union_categoricals +generated/pandas.bdate_range,../reference/api/pandas.bdate_range +generated/pandas.Categorical.__array__,../reference/api/pandas.Categorical.__array__ +generated/pandas.Categorical.categories,../reference/api/pandas.Categorical.categories +generated/pandas.Categorical.codes,../reference/api/pandas.Categorical.codes +generated/pandas.CategoricalDtype.categories,../reference/api/pandas.CategoricalDtype.categories 
+generated/pandas.Categorical.dtype,../reference/api/pandas.Categorical.dtype +generated/pandas.CategoricalDtype,../reference/api/pandas.CategoricalDtype +generated/pandas.CategoricalDtype.ordered,../reference/api/pandas.CategoricalDtype.ordered +generated/pandas.Categorical.from_codes,../reference/api/pandas.Categorical.from_codes +generated/pandas.Categorical,../reference/api/pandas.Categorical +generated/pandas.CategoricalIndex.add_categories,../reference/api/pandas.CategoricalIndex.add_categories +generated/pandas.CategoricalIndex.as_ordered,../reference/api/pandas.CategoricalIndex.as_ordered +generated/pandas.CategoricalIndex.as_unordered,../reference/api/pandas.CategoricalIndex.as_unordered +generated/pandas.CategoricalIndex.categories,../reference/api/pandas.CategoricalIndex.categories +generated/pandas.CategoricalIndex.codes,../reference/api/pandas.CategoricalIndex.codes +generated/pandas.CategoricalIndex.equals,../reference/api/pandas.CategoricalIndex.equals +generated/pandas.CategoricalIndex,../reference/api/pandas.CategoricalIndex +generated/pandas.CategoricalIndex.map,../reference/api/pandas.CategoricalIndex.map +generated/pandas.CategoricalIndex.ordered,../reference/api/pandas.CategoricalIndex.ordered +generated/pandas.CategoricalIndex.remove_categories,../reference/api/pandas.CategoricalIndex.remove_categories +generated/pandas.CategoricalIndex.remove_unused_categories,../reference/api/pandas.CategoricalIndex.remove_unused_categories +generated/pandas.CategoricalIndex.rename_categories,../reference/api/pandas.CategoricalIndex.rename_categories +generated/pandas.CategoricalIndex.reorder_categories,../reference/api/pandas.CategoricalIndex.reorder_categories +generated/pandas.CategoricalIndex.set_categories,../reference/api/pandas.CategoricalIndex.set_categories +generated/pandas.Categorical.ordered,../reference/api/pandas.Categorical.ordered +generated/pandas.concat,../reference/api/pandas.concat +generated/pandas.core.groupby.DataFrameGroupBy.all,../reference/api/pandas.core.groupby.DataFrameGroupBy.all +generated/pandas.core.groupby.DataFrameGroupBy.any,../reference/api/pandas.core.groupby.DataFrameGroupBy.any +generated/pandas.core.groupby.DataFrameGroupBy.bfill,../reference/api/pandas.core.groupby.DataFrameGroupBy.bfill +generated/pandas.core.groupby.DataFrameGroupBy.boxplot,../reference/api/pandas.core.groupby.DataFrameGroupBy.boxplot +generated/pandas.core.groupby.DataFrameGroupBy.corr,../reference/api/pandas.core.groupby.DataFrameGroupBy.corr +generated/pandas.core.groupby.DataFrameGroupBy.corrwith,../reference/api/pandas.core.groupby.DataFrameGroupBy.corrwith +generated/pandas.core.groupby.DataFrameGroupBy.count,../reference/api/pandas.core.groupby.DataFrameGroupBy.count +generated/pandas.core.groupby.DataFrameGroupBy.cov,../reference/api/pandas.core.groupby.DataFrameGroupBy.cov +generated/pandas.core.groupby.DataFrameGroupBy.cummax,../reference/api/pandas.core.groupby.DataFrameGroupBy.cummax +generated/pandas.core.groupby.DataFrameGroupBy.cummin,../reference/api/pandas.core.groupby.DataFrameGroupBy.cummin +generated/pandas.core.groupby.DataFrameGroupBy.cumprod,../reference/api/pandas.core.groupby.DataFrameGroupBy.cumprod +generated/pandas.core.groupby.DataFrameGroupBy.cumsum,../reference/api/pandas.core.groupby.DataFrameGroupBy.cumsum +generated/pandas.core.groupby.DataFrameGroupBy.describe,../reference/api/pandas.core.groupby.DataFrameGroupBy.describe +generated/pandas.core.groupby.DataFrameGroupBy.diff,../reference/api/pandas.core.groupby.DataFrameGroupBy.diff 
+generated/pandas.core.groupby.DataFrameGroupBy.ffill,../reference/api/pandas.core.groupby.DataFrameGroupBy.ffill +generated/pandas.core.groupby.DataFrameGroupBy.fillna,../reference/api/pandas.core.groupby.DataFrameGroupBy.fillna +generated/pandas.core.groupby.DataFrameGroupBy.filter,../reference/api/pandas.core.groupby.DataFrameGroupBy.filter +generated/pandas.core.groupby.DataFrameGroupBy.hist,../reference/api/pandas.core.groupby.DataFrameGroupBy.hist +generated/pandas.core.groupby.DataFrameGroupBy.idxmax,../reference/api/pandas.core.groupby.DataFrameGroupBy.idxmax +generated/pandas.core.groupby.DataFrameGroupBy.idxmin,../reference/api/pandas.core.groupby.DataFrameGroupBy.idxmin +generated/pandas.core.groupby.DataFrameGroupBy.mad,../reference/api/pandas.core.groupby.DataFrameGroupBy.mad +generated/pandas.core.groupby.DataFrameGroupBy.pct_change,../reference/api/pandas.core.groupby.DataFrameGroupBy.pct_change +generated/pandas.core.groupby.DataFrameGroupBy.plot,../reference/api/pandas.core.groupby.DataFrameGroupBy.plot +generated/pandas.core.groupby.DataFrameGroupBy.quantile,../reference/api/pandas.core.groupby.DataFrameGroupBy.quantile +generated/pandas.core.groupby.DataFrameGroupBy.rank,../reference/api/pandas.core.groupby.DataFrameGroupBy.rank +generated/pandas.core.groupby.DataFrameGroupBy.resample,../reference/api/pandas.core.groupby.DataFrameGroupBy.resample +generated/pandas.core.groupby.DataFrameGroupBy.shift,../reference/api/pandas.core.groupby.DataFrameGroupBy.shift +generated/pandas.core.groupby.DataFrameGroupBy.size,../reference/api/pandas.core.groupby.DataFrameGroupBy.size +generated/pandas.core.groupby.DataFrameGroupBy.skew,../reference/api/pandas.core.groupby.DataFrameGroupBy.skew +generated/pandas.core.groupby.DataFrameGroupBy.take,../reference/api/pandas.core.groupby.DataFrameGroupBy.take +generated/pandas.core.groupby.DataFrameGroupBy.tshift,../reference/api/pandas.core.groupby.DataFrameGroupBy.tshift +generated/pandas.core.groupby.GroupBy.agg,../reference/api/pandas.core.groupby.GroupBy.agg +generated/pandas.core.groupby.GroupBy.aggregate,../reference/api/pandas.core.groupby.GroupBy.aggregate +generated/pandas.core.groupby.GroupBy.all,../reference/api/pandas.core.groupby.GroupBy.all +generated/pandas.core.groupby.GroupBy.any,../reference/api/pandas.core.groupby.GroupBy.any +generated/pandas.core.groupby.GroupBy.apply,../reference/api/pandas.core.groupby.GroupBy.apply +generated/pandas.core.groupby.GroupBy.bfill,../reference/api/pandas.core.groupby.GroupBy.bfill +generated/pandas.core.groupby.GroupBy.count,../reference/api/pandas.core.groupby.GroupBy.count +generated/pandas.core.groupby.GroupBy.cumcount,../reference/api/pandas.core.groupby.GroupBy.cumcount +generated/pandas.core.groupby.GroupBy.ffill,../reference/api/pandas.core.groupby.GroupBy.ffill +generated/pandas.core.groupby.GroupBy.first,../reference/api/pandas.core.groupby.GroupBy.first +generated/pandas.core.groupby.GroupBy.get_group,../reference/api/pandas.core.groupby.GroupBy.get_group +generated/pandas.core.groupby.GroupBy.groups,../reference/api/pandas.core.groupby.GroupBy.groups +generated/pandas.core.groupby.GroupBy.head,../reference/api/pandas.core.groupby.GroupBy.head +generated/pandas.core.groupby.GroupBy.indices,../reference/api/pandas.core.groupby.GroupBy.indices +generated/pandas.core.groupby.GroupBy.__iter__,../reference/api/pandas.core.groupby.GroupBy.__iter__ +generated/pandas.core.groupby.GroupBy.last,../reference/api/pandas.core.groupby.GroupBy.last 
+generated/pandas.core.groupby.GroupBy.max,../reference/api/pandas.core.groupby.GroupBy.max +generated/pandas.core.groupby.GroupBy.mean,../reference/api/pandas.core.groupby.GroupBy.mean +generated/pandas.core.groupby.GroupBy.median,../reference/api/pandas.core.groupby.GroupBy.median +generated/pandas.core.groupby.GroupBy.min,../reference/api/pandas.core.groupby.GroupBy.min +generated/pandas.core.groupby.GroupBy.ngroup,../reference/api/pandas.core.groupby.GroupBy.ngroup +generated/pandas.core.groupby.GroupBy.nth,../reference/api/pandas.core.groupby.GroupBy.nth +generated/pandas.core.groupby.GroupBy.ohlc,../reference/api/pandas.core.groupby.GroupBy.ohlc +generated/pandas.core.groupby.GroupBy.pct_change,../reference/api/pandas.core.groupby.GroupBy.pct_change +generated/pandas.core.groupby.GroupBy.pipe,../reference/api/pandas.core.groupby.GroupBy.pipe +generated/pandas.core.groupby.GroupBy.prod,../reference/api/pandas.core.groupby.GroupBy.prod +generated/pandas.core.groupby.GroupBy.rank,../reference/api/pandas.core.groupby.GroupBy.rank +generated/pandas.core.groupby.GroupBy.sem,../reference/api/pandas.core.groupby.GroupBy.sem +generated/pandas.core.groupby.GroupBy.size,../reference/api/pandas.core.groupby.GroupBy.size +generated/pandas.core.groupby.GroupBy.std,../reference/api/pandas.core.groupby.GroupBy.std +generated/pandas.core.groupby.GroupBy.sum,../reference/api/pandas.core.groupby.GroupBy.sum +generated/pandas.core.groupby.GroupBy.tail,../reference/api/pandas.core.groupby.GroupBy.tail +generated/pandas.core.groupby.GroupBy.transform,../reference/api/pandas.core.groupby.GroupBy.transform +generated/pandas.core.groupby.GroupBy.var,../reference/api/pandas.core.groupby.GroupBy.var +generated/pandas.core.groupby.SeriesGroupBy.is_monotonic_decreasing,../reference/api/pandas.core.groupby.SeriesGroupBy.is_monotonic_decreasing +generated/pandas.core.groupby.SeriesGroupBy.is_monotonic_increasing,../reference/api/pandas.core.groupby.SeriesGroupBy.is_monotonic_increasing +generated/pandas.core.groupby.SeriesGroupBy.nlargest,../reference/api/pandas.core.groupby.SeriesGroupBy.nlargest +generated/pandas.core.groupby.SeriesGroupBy.nsmallest,../reference/api/pandas.core.groupby.SeriesGroupBy.nsmallest +generated/pandas.core.groupby.SeriesGroupBy.nunique,../reference/api/pandas.core.groupby.SeriesGroupBy.nunique +generated/pandas.core.groupby.SeriesGroupBy.unique,../reference/api/pandas.core.groupby.SeriesGroupBy.unique +generated/pandas.core.groupby.SeriesGroupBy.value_counts,../reference/api/pandas.core.groupby.SeriesGroupBy.value_counts +generated/pandas.core.resample.Resampler.aggregate,../reference/api/pandas.core.resample.Resampler.aggregate +generated/pandas.core.resample.Resampler.apply,../reference/api/pandas.core.resample.Resampler.apply +generated/pandas.core.resample.Resampler.asfreq,../reference/api/pandas.core.resample.Resampler.asfreq +generated/pandas.core.resample.Resampler.backfill,../reference/api/pandas.core.resample.Resampler.backfill +generated/pandas.core.resample.Resampler.bfill,../reference/api/pandas.core.resample.Resampler.bfill +generated/pandas.core.resample.Resampler.count,../reference/api/pandas.core.resample.Resampler.count +generated/pandas.core.resample.Resampler.ffill,../reference/api/pandas.core.resample.Resampler.ffill +generated/pandas.core.resample.Resampler.fillna,../reference/api/pandas.core.resample.Resampler.fillna +generated/pandas.core.resample.Resampler.first,../reference/api/pandas.core.resample.Resampler.first 
+generated/pandas.core.resample.Resampler.get_group,../reference/api/pandas.core.resample.Resampler.get_group +generated/pandas.core.resample.Resampler.groups,../reference/api/pandas.core.resample.Resampler.groups +generated/pandas.core.resample.Resampler.indices,../reference/api/pandas.core.resample.Resampler.indices +generated/pandas.core.resample.Resampler.interpolate,../reference/api/pandas.core.resample.Resampler.interpolate +generated/pandas.core.resample.Resampler.__iter__,../reference/api/pandas.core.resample.Resampler.__iter__ +generated/pandas.core.resample.Resampler.last,../reference/api/pandas.core.resample.Resampler.last +generated/pandas.core.resample.Resampler.max,../reference/api/pandas.core.resample.Resampler.max +generated/pandas.core.resample.Resampler.mean,../reference/api/pandas.core.resample.Resampler.mean +generated/pandas.core.resample.Resampler.median,../reference/api/pandas.core.resample.Resampler.median +generated/pandas.core.resample.Resampler.min,../reference/api/pandas.core.resample.Resampler.min +generated/pandas.core.resample.Resampler.nearest,../reference/api/pandas.core.resample.Resampler.nearest +generated/pandas.core.resample.Resampler.nunique,../reference/api/pandas.core.resample.Resampler.nunique +generated/pandas.core.resample.Resampler.ohlc,../reference/api/pandas.core.resample.Resampler.ohlc +generated/pandas.core.resample.Resampler.pad,../reference/api/pandas.core.resample.Resampler.pad +generated/pandas.core.resample.Resampler.pipe,../reference/api/pandas.core.resample.Resampler.pipe +generated/pandas.core.resample.Resampler.prod,../reference/api/pandas.core.resample.Resampler.prod +generated/pandas.core.resample.Resampler.quantile,../reference/api/pandas.core.resample.Resampler.quantile +generated/pandas.core.resample.Resampler.sem,../reference/api/pandas.core.resample.Resampler.sem +generated/pandas.core.resample.Resampler.size,../reference/api/pandas.core.resample.Resampler.size +generated/pandas.core.resample.Resampler.std,../reference/api/pandas.core.resample.Resampler.std +generated/pandas.core.resample.Resampler.sum,../reference/api/pandas.core.resample.Resampler.sum +generated/pandas.core.resample.Resampler.transform,../reference/api/pandas.core.resample.Resampler.transform +generated/pandas.core.resample.Resampler.var,../reference/api/pandas.core.resample.Resampler.var +generated/pandas.core.window.EWM.corr,../reference/api/pandas.core.window.EWM.corr +generated/pandas.core.window.EWM.cov,../reference/api/pandas.core.window.EWM.cov +generated/pandas.core.window.EWM.mean,../reference/api/pandas.core.window.EWM.mean +generated/pandas.core.window.EWM.std,../reference/api/pandas.core.window.EWM.std +generated/pandas.core.window.EWM.var,../reference/api/pandas.core.window.EWM.var +generated/pandas.core.window.Expanding.aggregate,../reference/api/pandas.core.window.Expanding.aggregate +generated/pandas.core.window.Expanding.apply,../reference/api/pandas.core.window.Expanding.apply +generated/pandas.core.window.Expanding.corr,../reference/api/pandas.core.window.Expanding.corr +generated/pandas.core.window.Expanding.count,../reference/api/pandas.core.window.Expanding.count +generated/pandas.core.window.Expanding.cov,../reference/api/pandas.core.window.Expanding.cov +generated/pandas.core.window.Expanding.kurt,../reference/api/pandas.core.window.Expanding.kurt +generated/pandas.core.window.Expanding.max,../reference/api/pandas.core.window.Expanding.max +generated/pandas.core.window.Expanding.mean,../reference/api/pandas.core.window.Expanding.mean 
+generated/pandas.core.window.Expanding.median,../reference/api/pandas.core.window.Expanding.median +generated/pandas.core.window.Expanding.min,../reference/api/pandas.core.window.Expanding.min +generated/pandas.core.window.Expanding.quantile,../reference/api/pandas.core.window.Expanding.quantile +generated/pandas.core.window.Expanding.skew,../reference/api/pandas.core.window.Expanding.skew +generated/pandas.core.window.Expanding.std,../reference/api/pandas.core.window.Expanding.std +generated/pandas.core.window.Expanding.sum,../reference/api/pandas.core.window.Expanding.sum +generated/pandas.core.window.Expanding.var,../reference/api/pandas.core.window.Expanding.var +generated/pandas.core.window.Rolling.aggregate,../reference/api/pandas.core.window.Rolling.aggregate +generated/pandas.core.window.Rolling.apply,../reference/api/pandas.core.window.Rolling.apply +generated/pandas.core.window.Rolling.corr,../reference/api/pandas.core.window.Rolling.corr +generated/pandas.core.window.Rolling.count,../reference/api/pandas.core.window.Rolling.count +generated/pandas.core.window.Rolling.cov,../reference/api/pandas.core.window.Rolling.cov +generated/pandas.core.window.Rolling.kurt,../reference/api/pandas.core.window.Rolling.kurt +generated/pandas.core.window.Rolling.max,../reference/api/pandas.core.window.Rolling.max +generated/pandas.core.window.Rolling.mean,../reference/api/pandas.core.window.Rolling.mean +generated/pandas.core.window.Rolling.median,../reference/api/pandas.core.window.Rolling.median +generated/pandas.core.window.Rolling.min,../reference/api/pandas.core.window.Rolling.min +generated/pandas.core.window.Rolling.quantile,../reference/api/pandas.core.window.Rolling.quantile +generated/pandas.core.window.Rolling.skew,../reference/api/pandas.core.window.Rolling.skew +generated/pandas.core.window.Rolling.std,../reference/api/pandas.core.window.Rolling.std +generated/pandas.core.window.Rolling.sum,../reference/api/pandas.core.window.Rolling.sum +generated/pandas.core.window.Rolling.var,../reference/api/pandas.core.window.Rolling.var +generated/pandas.core.window.Window.mean,../reference/api/pandas.core.window.Window.mean +generated/pandas.core.window.Window.sum,../reference/api/pandas.core.window.Window.sum +generated/pandas.crosstab,../reference/api/pandas.crosstab +generated/pandas.cut,../reference/api/pandas.cut +generated/pandas.DataFrame.abs,../reference/api/pandas.DataFrame.abs +generated/pandas.DataFrame.add,../reference/api/pandas.DataFrame.add +generated/pandas.DataFrame.add_prefix,../reference/api/pandas.DataFrame.add_prefix +generated/pandas.DataFrame.add_suffix,../reference/api/pandas.DataFrame.add_suffix +generated/pandas.DataFrame.agg,../reference/api/pandas.DataFrame.agg +generated/pandas.DataFrame.aggregate,../reference/api/pandas.DataFrame.aggregate +generated/pandas.DataFrame.align,../reference/api/pandas.DataFrame.align +generated/pandas.DataFrame.all,../reference/api/pandas.DataFrame.all +generated/pandas.DataFrame.any,../reference/api/pandas.DataFrame.any +generated/pandas.DataFrame.append,../reference/api/pandas.DataFrame.append +generated/pandas.DataFrame.apply,../reference/api/pandas.DataFrame.apply +generated/pandas.DataFrame.applymap,../reference/api/pandas.DataFrame.applymap +generated/pandas.DataFrame.as_blocks,../reference/api/pandas.DataFrame.as_blocks +generated/pandas.DataFrame.asfreq,../reference/api/pandas.DataFrame.asfreq +generated/pandas.DataFrame.as_matrix,../reference/api/pandas.DataFrame.as_matrix 
+generated/pandas.DataFrame.asof,../reference/api/pandas.DataFrame.asof +generated/pandas.DataFrame.assign,../reference/api/pandas.DataFrame.assign +generated/pandas.DataFrame.astype,../reference/api/pandas.DataFrame.astype +generated/pandas.DataFrame.at,../reference/api/pandas.DataFrame.at +generated/pandas.DataFrame.at_time,../reference/api/pandas.DataFrame.at_time +generated/pandas.DataFrame.axes,../reference/api/pandas.DataFrame.axes +generated/pandas.DataFrame.between_time,../reference/api/pandas.DataFrame.between_time +generated/pandas.DataFrame.bfill,../reference/api/pandas.DataFrame.bfill +generated/pandas.DataFrame.blocks,../reference/api/pandas.DataFrame.blocks +generated/pandas.DataFrame.bool,../reference/api/pandas.DataFrame.bool +generated/pandas.DataFrame.boxplot,../reference/api/pandas.DataFrame.boxplot +generated/pandas.DataFrame.clip,../reference/api/pandas.DataFrame.clip +generated/pandas.DataFrame.clip_lower,../reference/api/pandas.DataFrame.clip_lower +generated/pandas.DataFrame.clip_upper,../reference/api/pandas.DataFrame.clip_upper +generated/pandas.DataFrame.columns,../reference/api/pandas.DataFrame.columns +generated/pandas.DataFrame.combine_first,../reference/api/pandas.DataFrame.combine_first +generated/pandas.DataFrame.combine,../reference/api/pandas.DataFrame.combine +generated/pandas.DataFrame.compound,../reference/api/pandas.DataFrame.compound +generated/pandas.DataFrame.convert_objects,../reference/api/pandas.DataFrame.convert_objects +generated/pandas.DataFrame.copy,../reference/api/pandas.DataFrame.copy +generated/pandas.DataFrame.corr,../reference/api/pandas.DataFrame.corr +generated/pandas.DataFrame.corrwith,../reference/api/pandas.DataFrame.corrwith +generated/pandas.DataFrame.count,../reference/api/pandas.DataFrame.count +generated/pandas.DataFrame.cov,../reference/api/pandas.DataFrame.cov +generated/pandas.DataFrame.cummax,../reference/api/pandas.DataFrame.cummax +generated/pandas.DataFrame.cummin,../reference/api/pandas.DataFrame.cummin +generated/pandas.DataFrame.cumprod,../reference/api/pandas.DataFrame.cumprod +generated/pandas.DataFrame.cumsum,../reference/api/pandas.DataFrame.cumsum +generated/pandas.DataFrame.describe,../reference/api/pandas.DataFrame.describe +generated/pandas.DataFrame.diff,../reference/api/pandas.DataFrame.diff +generated/pandas.DataFrame.div,../reference/api/pandas.DataFrame.div +generated/pandas.DataFrame.divide,../reference/api/pandas.DataFrame.divide +generated/pandas.DataFrame.dot,../reference/api/pandas.DataFrame.dot +generated/pandas.DataFrame.drop_duplicates,../reference/api/pandas.DataFrame.drop_duplicates +generated/pandas.DataFrame.drop,../reference/api/pandas.DataFrame.drop +generated/pandas.DataFrame.droplevel,../reference/api/pandas.DataFrame.droplevel +generated/pandas.DataFrame.dropna,../reference/api/pandas.DataFrame.dropna +generated/pandas.DataFrame.dtypes,../reference/api/pandas.DataFrame.dtypes +generated/pandas.DataFrame.duplicated,../reference/api/pandas.DataFrame.duplicated +generated/pandas.DataFrame.empty,../reference/api/pandas.DataFrame.empty +generated/pandas.DataFrame.eq,../reference/api/pandas.DataFrame.eq +generated/pandas.DataFrame.equals,../reference/api/pandas.DataFrame.equals +generated/pandas.DataFrame.eval,../reference/api/pandas.DataFrame.eval +generated/pandas.DataFrame.ewm,../reference/api/pandas.DataFrame.ewm +generated/pandas.DataFrame.expanding,../reference/api/pandas.DataFrame.expanding +generated/pandas.DataFrame.ffill,../reference/api/pandas.DataFrame.ffill 
+generated/pandas.DataFrame.fillna,../reference/api/pandas.DataFrame.fillna +generated/pandas.DataFrame.filter,../reference/api/pandas.DataFrame.filter +generated/pandas.DataFrame.first,../reference/api/pandas.DataFrame.first +generated/pandas.DataFrame.first_valid_index,../reference/api/pandas.DataFrame.first_valid_index +generated/pandas.DataFrame.floordiv,../reference/api/pandas.DataFrame.floordiv +generated/pandas.DataFrame.from_csv,../reference/api/pandas.DataFrame.from_csv +generated/pandas.DataFrame.from_dict,../reference/api/pandas.DataFrame.from_dict +generated/pandas.DataFrame.from_items,../reference/api/pandas.DataFrame.from_items +generated/pandas.DataFrame.from_records,../reference/api/pandas.DataFrame.from_records +generated/pandas.DataFrame.ftypes,../reference/api/pandas.DataFrame.ftypes +generated/pandas.DataFrame.ge,../reference/api/pandas.DataFrame.ge +generated/pandas.DataFrame.get_dtype_counts,../reference/api/pandas.DataFrame.get_dtype_counts +generated/pandas.DataFrame.get_ftype_counts,../reference/api/pandas.DataFrame.get_ftype_counts +generated/pandas.DataFrame.get,../reference/api/pandas.DataFrame.get +generated/pandas.DataFrame.get_value,../reference/api/pandas.DataFrame.get_value +generated/pandas.DataFrame.get_values,../reference/api/pandas.DataFrame.get_values +generated/pandas.DataFrame.groupby,../reference/api/pandas.DataFrame.groupby +generated/pandas.DataFrame.gt,../reference/api/pandas.DataFrame.gt +generated/pandas.DataFrame.head,../reference/api/pandas.DataFrame.head +generated/pandas.DataFrame.hist,../reference/api/pandas.DataFrame.hist +generated/pandas.DataFrame,../reference/api/pandas.DataFrame +generated/pandas.DataFrame.iat,../reference/api/pandas.DataFrame.iat +generated/pandas.DataFrame.idxmax,../reference/api/pandas.DataFrame.idxmax +generated/pandas.DataFrame.idxmin,../reference/api/pandas.DataFrame.idxmin +generated/pandas.DataFrame.iloc,../reference/api/pandas.DataFrame.iloc +generated/pandas.DataFrame.index,../reference/api/pandas.DataFrame.index +generated/pandas.DataFrame.infer_objects,../reference/api/pandas.DataFrame.infer_objects +generated/pandas.DataFrame.info,../reference/api/pandas.DataFrame.info +generated/pandas.DataFrame.insert,../reference/api/pandas.DataFrame.insert +generated/pandas.DataFrame.interpolate,../reference/api/pandas.DataFrame.interpolate +generated/pandas.DataFrame.is_copy,../reference/api/pandas.DataFrame.is_copy +generated/pandas.DataFrame.isin,../reference/api/pandas.DataFrame.isin +generated/pandas.DataFrame.isna,../reference/api/pandas.DataFrame.isna +generated/pandas.DataFrame.isnull,../reference/api/pandas.DataFrame.isnull +generated/pandas.DataFrame.items,../reference/api/pandas.DataFrame.items +generated/pandas.DataFrame.__iter__,../reference/api/pandas.DataFrame.__iter__ +generated/pandas.DataFrame.iteritems,../reference/api/pandas.DataFrame.iteritems +generated/pandas.DataFrame.iterrows,../reference/api/pandas.DataFrame.iterrows +generated/pandas.DataFrame.itertuples,../reference/api/pandas.DataFrame.itertuples +generated/pandas.DataFrame.ix,../reference/api/pandas.DataFrame.ix +generated/pandas.DataFrame.join,../reference/api/pandas.DataFrame.join +generated/pandas.DataFrame.keys,../reference/api/pandas.DataFrame.keys +generated/pandas.DataFrame.kurt,../reference/api/pandas.DataFrame.kurt +generated/pandas.DataFrame.kurtosis,../reference/api/pandas.DataFrame.kurtosis +generated/pandas.DataFrame.last,../reference/api/pandas.DataFrame.last 
+generated/pandas.DataFrame.last_valid_index,../reference/api/pandas.DataFrame.last_valid_index
+generated/pandas.DataFrame.le,../reference/api/pandas.DataFrame.le
+generated/pandas.DataFrame.loc,../reference/api/pandas.DataFrame.loc
+generated/pandas.DataFrame.lookup,../reference/api/pandas.DataFrame.lookup
+generated/pandas.DataFrame.lt,../reference/api/pandas.DataFrame.lt
+generated/pandas.DataFrame.mad,../reference/api/pandas.DataFrame.mad
+generated/pandas.DataFrame.mask,../reference/api/pandas.DataFrame.mask
+generated/pandas.DataFrame.max,../reference/api/pandas.DataFrame.max
+generated/pandas.DataFrame.mean,../reference/api/pandas.DataFrame.mean
+generated/pandas.DataFrame.median,../reference/api/pandas.DataFrame.median
+generated/pandas.DataFrame.melt,../reference/api/pandas.DataFrame.melt
+generated/pandas.DataFrame.memory_usage,../reference/api/pandas.DataFrame.memory_usage
+generated/pandas.DataFrame.merge,../reference/api/pandas.DataFrame.merge
+generated/pandas.DataFrame.min,../reference/api/pandas.DataFrame.min
+generated/pandas.DataFrame.mode,../reference/api/pandas.DataFrame.mode
+generated/pandas.DataFrame.mod,../reference/api/pandas.DataFrame.mod
+generated/pandas.DataFrame.mul,../reference/api/pandas.DataFrame.mul
+generated/pandas.DataFrame.multiply,../reference/api/pandas.DataFrame.multiply
+generated/pandas.DataFrame.ndim,../reference/api/pandas.DataFrame.ndim
+generated/pandas.DataFrame.ne,../reference/api/pandas.DataFrame.ne
+generated/pandas.DataFrame.nlargest,../reference/api/pandas.DataFrame.nlargest
+generated/pandas.DataFrame.notna,../reference/api/pandas.DataFrame.notna
+generated/pandas.DataFrame.notnull,../reference/api/pandas.DataFrame.notnull
+generated/pandas.DataFrame.nsmallest,../reference/api/pandas.DataFrame.nsmallest
+generated/pandas.DataFrame.nunique,../reference/api/pandas.DataFrame.nunique
+generated/pandas.DataFrame.pct_change,../reference/api/pandas.DataFrame.pct_change
+generated/pandas.DataFrame.pipe,../reference/api/pandas.DataFrame.pipe
+generated/pandas.DataFrame.pivot,../reference/api/pandas.DataFrame.pivot
+generated/pandas.DataFrame.pivot_table,../reference/api/pandas.DataFrame.pivot_table
+generated/pandas.DataFrame.plot.barh,../reference/api/pandas.DataFrame.plot.barh
+generated/pandas.DataFrame.plot.bar,../reference/api/pandas.DataFrame.plot.bar
+generated/pandas.DataFrame.plot.box,../reference/api/pandas.DataFrame.plot.box
+generated/pandas.DataFrame.plot.density,../reference/api/pandas.DataFrame.plot.density
+generated/pandas.DataFrame.plot.hexbin,../reference/api/pandas.DataFrame.plot.hexbin
+generated/pandas.DataFrame.plot.hist,../reference/api/pandas.DataFrame.plot.hist
+generated/pandas.DataFrame.plot,../reference/api/pandas.DataFrame.plot
+generated/pandas.DataFrame.plot.kde,../reference/api/pandas.DataFrame.plot.kde
+generated/pandas.DataFrame.plot.line,../reference/api/pandas.DataFrame.plot.line
+generated/pandas.DataFrame.plot.pie,../reference/api/pandas.DataFrame.plot.pie
+generated/pandas.DataFrame.plot.scatter,../reference/api/pandas.DataFrame.plot.scatter
+generated/pandas.DataFrame.pop,../reference/api/pandas.DataFrame.pop
+generated/pandas.DataFrame.pow,../reference/api/pandas.DataFrame.pow
+generated/pandas.DataFrame.prod,../reference/api/pandas.DataFrame.prod
+generated/pandas.DataFrame.product,../reference/api/pandas.DataFrame.product
+generated/pandas.DataFrame.quantile,../reference/api/pandas.DataFrame.quantile
+generated/pandas.DataFrame.query,../reference/api/pandas.DataFrame.query
+generated/pandas.DataFrame.radd,../reference/api/pandas.DataFrame.radd
+generated/pandas.DataFrame.rank,../reference/api/pandas.DataFrame.rank
+generated/pandas.DataFrame.rdiv,../reference/api/pandas.DataFrame.rdiv
+generated/pandas.DataFrame.reindex_axis,../reference/api/pandas.DataFrame.reindex_axis
+generated/pandas.DataFrame.reindex,../reference/api/pandas.DataFrame.reindex
+generated/pandas.DataFrame.reindex_like,../reference/api/pandas.DataFrame.reindex_like
+generated/pandas.DataFrame.rename_axis,../reference/api/pandas.DataFrame.rename_axis
+generated/pandas.DataFrame.rename,../reference/api/pandas.DataFrame.rename
+generated/pandas.DataFrame.reorder_levels,../reference/api/pandas.DataFrame.reorder_levels
+generated/pandas.DataFrame.replace,../reference/api/pandas.DataFrame.replace
+generated/pandas.DataFrame.resample,../reference/api/pandas.DataFrame.resample
+generated/pandas.DataFrame.reset_index,../reference/api/pandas.DataFrame.reset_index
+generated/pandas.DataFrame.rfloordiv,../reference/api/pandas.DataFrame.rfloordiv
+generated/pandas.DataFrame.rmod,../reference/api/pandas.DataFrame.rmod
+generated/pandas.DataFrame.rmul,../reference/api/pandas.DataFrame.rmul
+generated/pandas.DataFrame.rolling,../reference/api/pandas.DataFrame.rolling
+generated/pandas.DataFrame.round,../reference/api/pandas.DataFrame.round
+generated/pandas.DataFrame.rpow,../reference/api/pandas.DataFrame.rpow
+generated/pandas.DataFrame.rsub,../reference/api/pandas.DataFrame.rsub
+generated/pandas.DataFrame.rtruediv,../reference/api/pandas.DataFrame.rtruediv
+generated/pandas.DataFrame.sample,../reference/api/pandas.DataFrame.sample
+generated/pandas.DataFrame.select_dtypes,../reference/api/pandas.DataFrame.select_dtypes
+generated/pandas.DataFrame.select,../reference/api/pandas.DataFrame.select
+generated/pandas.DataFrame.sem,../reference/api/pandas.DataFrame.sem
+generated/pandas.DataFrame.set_axis,../reference/api/pandas.DataFrame.set_axis
+generated/pandas.DataFrame.set_index,../reference/api/pandas.DataFrame.set_index
+generated/pandas.DataFrame.set_value,../reference/api/pandas.DataFrame.set_value
+generated/pandas.DataFrame.shape,../reference/api/pandas.DataFrame.shape
+generated/pandas.DataFrame.shift,../reference/api/pandas.DataFrame.shift
+generated/pandas.DataFrame.size,../reference/api/pandas.DataFrame.size
+generated/pandas.DataFrame.skew,../reference/api/pandas.DataFrame.skew
+generated/pandas.DataFrame.slice_shift,../reference/api/pandas.DataFrame.slice_shift
+generated/pandas.DataFrame.sort_index,../reference/api/pandas.DataFrame.sort_index
+generated/pandas.DataFrame.sort_values,../reference/api/pandas.DataFrame.sort_values
+generated/pandas.DataFrame.squeeze,../reference/api/pandas.DataFrame.squeeze
+generated/pandas.DataFrame.stack,../reference/api/pandas.DataFrame.stack
+generated/pandas.DataFrame.std,../reference/api/pandas.DataFrame.std
+generated/pandas.DataFrame.style,../reference/api/pandas.DataFrame.style
+generated/pandas.DataFrame.sub,../reference/api/pandas.DataFrame.sub
+generated/pandas.DataFrame.subtract,../reference/api/pandas.DataFrame.subtract
+generated/pandas.DataFrame.sum,../reference/api/pandas.DataFrame.sum
+generated/pandas.DataFrame.swapaxes,../reference/api/pandas.DataFrame.swapaxes
+generated/pandas.DataFrame.swaplevel,../reference/api/pandas.DataFrame.swaplevel
+generated/pandas.DataFrame.tail,../reference/api/pandas.DataFrame.tail
+generated/pandas.DataFrame.take,../reference/api/pandas.DataFrame.take
+generated/pandas.DataFrame.T,../reference/api/pandas.DataFrame.T
+generated/pandas.DataFrame.timetuple,../reference/api/pandas.DataFrame.timetuple
+generated/pandas.DataFrame.to_clipboard,../reference/api/pandas.DataFrame.to_clipboard
+generated/pandas.DataFrame.to_csv,../reference/api/pandas.DataFrame.to_csv
+generated/pandas.DataFrame.to_dense,../reference/api/pandas.DataFrame.to_dense
+generated/pandas.DataFrame.to_dict,../reference/api/pandas.DataFrame.to_dict
+generated/pandas.DataFrame.to_excel,../reference/api/pandas.DataFrame.to_excel
+generated/pandas.DataFrame.to_feather,../reference/api/pandas.DataFrame.to_feather
+generated/pandas.DataFrame.to_gbq,../reference/api/pandas.DataFrame.to_gbq
+generated/pandas.DataFrame.to_hdf,../reference/api/pandas.DataFrame.to_hdf
+generated/pandas.DataFrame.to,../reference/api/pandas.DataFrame.to
+generated/pandas.DataFrame.to_json,../reference/api/pandas.DataFrame.to_json
+generated/pandas.DataFrame.to_latex,../reference/api/pandas.DataFrame.to_latex
+generated/pandas.DataFrame.to_msgpack,../reference/api/pandas.DataFrame.to_msgpack
+generated/pandas.DataFrame.to_numpy,../reference/api/pandas.DataFrame.to_numpy
+generated/pandas.DataFrame.to_panel,../reference/api/pandas.DataFrame.to_panel
+generated/pandas.DataFrame.to_parquet,../reference/api/pandas.DataFrame.to_parquet
+generated/pandas.DataFrame.to_period,../reference/api/pandas.DataFrame.to_period
+generated/pandas.DataFrame.to_pickle,../reference/api/pandas.DataFrame.to_pickle
+generated/pandas.DataFrame.to_records,../reference/api/pandas.DataFrame.to_records
+generated/pandas.DataFrame.to_sparse,../reference/api/pandas.DataFrame.to_sparse
+generated/pandas.DataFrame.to_sql,../reference/api/pandas.DataFrame.to_sql
+generated/pandas.DataFrame.to_stata,../reference/api/pandas.DataFrame.to_stata
+generated/pandas.DataFrame.to_string,../reference/api/pandas.DataFrame.to_string
+generated/pandas.DataFrame.to_timestamp,../reference/api/pandas.DataFrame.to_timestamp
+generated/pandas.DataFrame.to_xarray,../reference/api/pandas.DataFrame.to_xarray
+generated/pandas.DataFrame.transform,../reference/api/pandas.DataFrame.transform
+generated/pandas.DataFrame.transpose,../reference/api/pandas.DataFrame.transpose
+generated/pandas.DataFrame.truediv,../reference/api/pandas.DataFrame.truediv
+generated/pandas.DataFrame.truncate,../reference/api/pandas.DataFrame.truncate
+generated/pandas.DataFrame.tshift,../reference/api/pandas.DataFrame.tshift
+generated/pandas.DataFrame.tz_convert,../reference/api/pandas.DataFrame.tz_convert
+generated/pandas.DataFrame.tz_localize,../reference/api/pandas.DataFrame.tz_localize
+generated/pandas.DataFrame.unstack,../reference/api/pandas.DataFrame.unstack
+generated/pandas.DataFrame.update,../reference/api/pandas.DataFrame.update
+generated/pandas.DataFrame.values,../reference/api/pandas.DataFrame.values
+generated/pandas.DataFrame.var,../reference/api/pandas.DataFrame.var
+generated/pandas.DataFrame.where,../reference/api/pandas.DataFrame.where
+generated/pandas.DataFrame.xs,../reference/api/pandas.DataFrame.xs
+generated/pandas.date_range,../reference/api/pandas.date_range
+generated/pandas.DatetimeIndex.ceil,../reference/api/pandas.DatetimeIndex.ceil
+generated/pandas.DatetimeIndex.date,../reference/api/pandas.DatetimeIndex.date
+generated/pandas.DatetimeIndex.day,../reference/api/pandas.DatetimeIndex.day
+generated/pandas.DatetimeIndex.day_name,../reference/api/pandas.DatetimeIndex.day_name
+generated/pandas.DatetimeIndex.dayofweek,../reference/api/pandas.DatetimeIndex.dayofweek
+generated/pandas.DatetimeIndex.dayofyear,../reference/api/pandas.DatetimeIndex.dayofyear
+generated/pandas.DatetimeIndex.floor,../reference/api/pandas.DatetimeIndex.floor
+generated/pandas.DatetimeIndex.freq,../reference/api/pandas.DatetimeIndex.freq
+generated/pandas.DatetimeIndex.freqstr,../reference/api/pandas.DatetimeIndex.freqstr
+generated/pandas.DatetimeIndex.hour,../reference/api/pandas.DatetimeIndex.hour
+generated/pandas.DatetimeIndex,../reference/api/pandas.DatetimeIndex
+generated/pandas.DatetimeIndex.indexer_at_time,../reference/api/pandas.DatetimeIndex.indexer_at_time
+generated/pandas.DatetimeIndex.indexer_between_time,../reference/api/pandas.DatetimeIndex.indexer_between_time
+generated/pandas.DatetimeIndex.inferred_freq,../reference/api/pandas.DatetimeIndex.inferred_freq
+generated/pandas.DatetimeIndex.is_leap_year,../reference/api/pandas.DatetimeIndex.is_leap_year
+generated/pandas.DatetimeIndex.is_month_end,../reference/api/pandas.DatetimeIndex.is_month_end
+generated/pandas.DatetimeIndex.is_month_start,../reference/api/pandas.DatetimeIndex.is_month_start
+generated/pandas.DatetimeIndex.is_quarter_end,../reference/api/pandas.DatetimeIndex.is_quarter_end
+generated/pandas.DatetimeIndex.is_quarter_start,../reference/api/pandas.DatetimeIndex.is_quarter_start
+generated/pandas.DatetimeIndex.is_year_end,../reference/api/pandas.DatetimeIndex.is_year_end
+generated/pandas.DatetimeIndex.is_year_start,../reference/api/pandas.DatetimeIndex.is_year_start
+generated/pandas.DatetimeIndex.microsecond,../reference/api/pandas.DatetimeIndex.microsecond
+generated/pandas.DatetimeIndex.minute,../reference/api/pandas.DatetimeIndex.minute
+generated/pandas.DatetimeIndex.month,../reference/api/pandas.DatetimeIndex.month
+generated/pandas.DatetimeIndex.month_name,../reference/api/pandas.DatetimeIndex.month_name
+generated/pandas.DatetimeIndex.nanosecond,../reference/api/pandas.DatetimeIndex.nanosecond
+generated/pandas.DatetimeIndex.normalize,../reference/api/pandas.DatetimeIndex.normalize
+generated/pandas.DatetimeIndex.quarter,../reference/api/pandas.DatetimeIndex.quarter
+generated/pandas.DatetimeIndex.round,../reference/api/pandas.DatetimeIndex.round
+generated/pandas.DatetimeIndex.second,../reference/api/pandas.DatetimeIndex.second
+generated/pandas.DatetimeIndex.snap,../reference/api/pandas.DatetimeIndex.snap
+generated/pandas.DatetimeIndex.strftime,../reference/api/pandas.DatetimeIndex.strftime
+generated/pandas.DatetimeIndex.time,../reference/api/pandas.DatetimeIndex.time
+generated/pandas.DatetimeIndex.timetz,../reference/api/pandas.DatetimeIndex.timetz
+generated/pandas.DatetimeIndex.to_frame,../reference/api/pandas.DatetimeIndex.to_frame
+generated/pandas.DatetimeIndex.to_perioddelta,../reference/api/pandas.DatetimeIndex.to_perioddelta
+generated/pandas.DatetimeIndex.to_period,../reference/api/pandas.DatetimeIndex.to_period
+generated/pandas.DatetimeIndex.to_pydatetime,../reference/api/pandas.DatetimeIndex.to_pydatetime
+generated/pandas.DatetimeIndex.to_series,../reference/api/pandas.DatetimeIndex.to_series
+generated/pandas.DatetimeIndex.tz_convert,../reference/api/pandas.DatetimeIndex.tz_convert
+generated/pandas.DatetimeIndex.tz,../reference/api/pandas.DatetimeIndex.tz
+generated/pandas.DatetimeIndex.tz_localize,../reference/api/pandas.DatetimeIndex.tz_localize
+generated/pandas.DatetimeIndex.weekday,../reference/api/pandas.DatetimeIndex.weekday
+generated/pandas.DatetimeIndex.week,../reference/api/pandas.DatetimeIndex.week
+generated/pandas.DatetimeIndex.weekofyear,../reference/api/pandas.DatetimeIndex.weekofyear
+generated/pandas.DatetimeIndex.year,../reference/api/pandas.DatetimeIndex.year
+generated/pandas.DatetimeTZDtype.base,../reference/api/pandas.DatetimeTZDtype.base
+generated/pandas.DatetimeTZDtype.construct_array_type,../reference/api/pandas.DatetimeTZDtype.construct_array_type
+generated/pandas.DatetimeTZDtype.construct_from_string,../reference/api/pandas.DatetimeTZDtype.construct_from_string
+generated/pandas.DatetimeTZDtype,../reference/api/pandas.DatetimeTZDtype
+generated/pandas.DatetimeTZDtype.isbuiltin,../reference/api/pandas.DatetimeTZDtype.isbuiltin
+generated/pandas.DatetimeTZDtype.is_dtype,../reference/api/pandas.DatetimeTZDtype.is_dtype
+generated/pandas.DatetimeTZDtype.isnative,../reference/api/pandas.DatetimeTZDtype.isnative
+generated/pandas.DatetimeTZDtype.itemsize,../reference/api/pandas.DatetimeTZDtype.itemsize
+generated/pandas.DatetimeTZDtype.kind,../reference/api/pandas.DatetimeTZDtype.kind
+generated/pandas.DatetimeTZDtype.name,../reference/api/pandas.DatetimeTZDtype.name
+generated/pandas.DatetimeTZDtype.names,../reference/api/pandas.DatetimeTZDtype.names
+generated/pandas.DatetimeTZDtype.na_value,../reference/api/pandas.DatetimeTZDtype.na_value
+generated/pandas.DatetimeTZDtype.num,../reference/api/pandas.DatetimeTZDtype.num
+generated/pandas.DatetimeTZDtype.reset_cache,../reference/api/pandas.DatetimeTZDtype.reset_cache
+generated/pandas.DatetimeTZDtype.shape,../reference/api/pandas.DatetimeTZDtype.shape
+generated/pandas.DatetimeTZDtype.str,../reference/api/pandas.DatetimeTZDtype.str
+generated/pandas.DatetimeTZDtype.subdtype,../reference/api/pandas.DatetimeTZDtype.subdtype
+generated/pandas.DatetimeTZDtype.tz,../reference/api/pandas.DatetimeTZDtype.tz
+generated/pandas.DatetimeTZDtype.unit,../reference/api/pandas.DatetimeTZDtype.unit
+generated/pandas.describe_option,../reference/api/pandas.describe_option
+generated/pandas.errors.DtypeWarning,../reference/api/pandas.errors.DtypeWarning
+generated/pandas.errors.EmptyDataError,../reference/api/pandas.errors.EmptyDataError
+generated/pandas.errors.OutOfBoundsDatetime,../reference/api/pandas.errors.OutOfBoundsDatetime
+generated/pandas.errors.ParserError,../reference/api/pandas.errors.ParserError
+generated/pandas.errors.ParserWarning,../reference/api/pandas.errors.ParserWarning
+generated/pandas.errors.PerformanceWarning,../reference/api/pandas.errors.PerformanceWarning
+generated/pandas.errors.UnsortedIndexError,../reference/api/pandas.errors.UnsortedIndexError
+generated/pandas.errors.UnsupportedFunctionCall,../reference/api/pandas.errors.UnsupportedFunctionCall
+generated/pandas.eval,../reference/api/pandas.eval
+generated/pandas.ExcelFile.parse,../reference/api/pandas.ExcelFile.parse
+generated/pandas.ExcelWriter,../reference/api/pandas.ExcelWriter
+generated/pandas.factorize,../reference/api/pandas.factorize
+generated/pandas.Float64Index,../reference/api/pandas.Float64Index
+generated/pandas.get_dummies,../reference/api/pandas.get_dummies
+generated/pandas.get_option,../reference/api/pandas.get_option
+generated/pandas.Grouper,../reference/api/pandas.Grouper
+generated/pandas.HDFStore.append,../reference/api/pandas.HDFStore.append
+generated/pandas.HDFStore.get,../reference/api/pandas.HDFStore.get
+generated/pandas.HDFStore.groups,../reference/api/pandas.HDFStore.groups
+generated/pandas.HDFStore.info,../reference/api/pandas.HDFStore.info
+generated/pandas.HDFStore.keys,../reference/api/pandas.HDFStore.keys
+generated/pandas.HDFStore.put,../reference/api/pandas.HDFStore.put
+generated/pandas.HDFStore.select,../reference/api/pandas.HDFStore.select
+generated/pandas.HDFStore.walk,../reference/api/pandas.HDFStore.walk
+generated/pandas.Index.all,../reference/api/pandas.Index.all
+generated/pandas.Index.any,../reference/api/pandas.Index.any
+generated/pandas.Index.append,../reference/api/pandas.Index.append
+generated/pandas.Index.argmax,../reference/api/pandas.Index.argmax
+generated/pandas.Index.argmin,../reference/api/pandas.Index.argmin
+generated/pandas.Index.argsort,../reference/api/pandas.Index.argsort
+generated/pandas.Index.array,../reference/api/pandas.Index.array
+generated/pandas.Index.asi8,../reference/api/pandas.Index.asi8
+generated/pandas.Index.asof,../reference/api/pandas.Index.asof
+generated/pandas.Index.asof_locs,../reference/api/pandas.Index.asof_locs
+generated/pandas.Index.astype,../reference/api/pandas.Index.astype
+generated/pandas.Index.base,../reference/api/pandas.Index.base
+generated/pandas.Index.contains,../reference/api/pandas.Index.contains
+generated/pandas.Index.copy,../reference/api/pandas.Index.copy
+generated/pandas.Index.data,../reference/api/pandas.Index.data
+generated/pandas.Index.delete,../reference/api/pandas.Index.delete
+generated/pandas.Index.difference,../reference/api/pandas.Index.difference
+generated/pandas.Index.drop_duplicates,../reference/api/pandas.Index.drop_duplicates
+generated/pandas.Index.drop,../reference/api/pandas.Index.drop
+generated/pandas.Index.droplevel,../reference/api/pandas.Index.droplevel
+generated/pandas.Index.dropna,../reference/api/pandas.Index.dropna
+generated/pandas.Index.dtype,../reference/api/pandas.Index.dtype
+generated/pandas.Index.dtype_str,../reference/api/pandas.Index.dtype_str
+generated/pandas.Index.duplicated,../reference/api/pandas.Index.duplicated
+generated/pandas.Index.empty,../reference/api/pandas.Index.empty
+generated/pandas.Index.equals,../reference/api/pandas.Index.equals
+generated/pandas.Index.factorize,../reference/api/pandas.Index.factorize
+generated/pandas.Index.fillna,../reference/api/pandas.Index.fillna
+generated/pandas.Index.flags,../reference/api/pandas.Index.flags
+generated/pandas.Index.format,../reference/api/pandas.Index.format
+generated/pandas.Index.get_duplicates,../reference/api/pandas.Index.get_duplicates
+generated/pandas.Index.get_indexer_for,../reference/api/pandas.Index.get_indexer_for
+generated/pandas.Index.get_indexer,../reference/api/pandas.Index.get_indexer
+generated/pandas.Index.get_indexer_non_unique,../reference/api/pandas.Index.get_indexer_non_unique
+generated/pandas.Index.get_level_values,../reference/api/pandas.Index.get_level_values
+generated/pandas.Index.get_loc,../reference/api/pandas.Index.get_loc
+generated/pandas.Index.get_slice_bound,../reference/api/pandas.Index.get_slice_bound
+generated/pandas.Index.get_value,../reference/api/pandas.Index.get_value
+generated/pandas.Index.get_values,../reference/api/pandas.Index.get_values
+generated/pandas.Index.groupby,../reference/api/pandas.Index.groupby
+generated/pandas.Index.has_duplicates,../reference/api/pandas.Index.has_duplicates
+generated/pandas.Index.hasnans,../reference/api/pandas.Index.hasnans
+generated/pandas.Index.holds_integer,../reference/api/pandas.Index.holds_integer
+generated/pandas.Index,../reference/api/pandas.Index
+generated/pandas.Index.identical,../reference/api/pandas.Index.identical
+generated/pandas.Index.inferred_type,../reference/api/pandas.Index.inferred_type
+generated/pandas.Index.insert,../reference/api/pandas.Index.insert
+generated/pandas.Index.intersection,../reference/api/pandas.Index.intersection
+generated/pandas.Index.is_all_dates,../reference/api/pandas.Index.is_all_dates
+generated/pandas.Index.is_boolean,../reference/api/pandas.Index.is_boolean
+generated/pandas.Index.is_categorical,../reference/api/pandas.Index.is_categorical
+generated/pandas.Index.is_floating,../reference/api/pandas.Index.is_floating
+generated/pandas.Index.is_,../reference/api/pandas.Index.is_
+generated/pandas.Index.isin,../reference/api/pandas.Index.isin
+generated/pandas.Index.is_integer,../reference/api/pandas.Index.is_integer
+generated/pandas.Index.is_interval,../reference/api/pandas.Index.is_interval
+generated/pandas.Index.is_lexsorted_for_tuple,../reference/api/pandas.Index.is_lexsorted_for_tuple
+generated/pandas.Index.is_mixed,../reference/api/pandas.Index.is_mixed
+generated/pandas.Index.is_monotonic_decreasing,../reference/api/pandas.Index.is_monotonic_decreasing
+generated/pandas.Index.is_monotonic,../reference/api/pandas.Index.is_monotonic
+generated/pandas.Index.is_monotonic_increasing,../reference/api/pandas.Index.is_monotonic_increasing
+generated/pandas.Index.isna,../reference/api/pandas.Index.isna
+generated/pandas.Index.isnull,../reference/api/pandas.Index.isnull
+generated/pandas.Index.is_numeric,../reference/api/pandas.Index.is_numeric
+generated/pandas.Index.is_object,../reference/api/pandas.Index.is_object
+generated/pandas.Index.is_type_compatible,../reference/api/pandas.Index.is_type_compatible
+generated/pandas.Index.is_unique,../reference/api/pandas.Index.is_unique
+generated/pandas.Index.item,../reference/api/pandas.Index.item
+generated/pandas.Index.itemsize,../reference/api/pandas.Index.itemsize
+generated/pandas.Index.join,../reference/api/pandas.Index.join
+generated/pandas.Index.map,../reference/api/pandas.Index.map
+generated/pandas.Index.max,../reference/api/pandas.Index.max
+generated/pandas.Index.memory_usage,../reference/api/pandas.Index.memory_usage
+generated/pandas.Index.min,../reference/api/pandas.Index.min
+generated/pandas.Index.name,../reference/api/pandas.Index.name
+generated/pandas.Index.names,../reference/api/pandas.Index.names
+generated/pandas.Index.nbytes,../reference/api/pandas.Index.nbytes
+generated/pandas.Index.ndim,../reference/api/pandas.Index.ndim
+generated/pandas.Index.nlevels,../reference/api/pandas.Index.nlevels
+generated/pandas.Index.notna,../reference/api/pandas.Index.notna
+generated/pandas.Index.notnull,../reference/api/pandas.Index.notnull
+generated/pandas.Index.nunique,../reference/api/pandas.Index.nunique
+generated/pandas.Index.putmask,../reference/api/pandas.Index.putmask
+generated/pandas.Index.ravel,../reference/api/pandas.Index.ravel
+generated/pandas.Index.reindex,../reference/api/pandas.Index.reindex
+generated/pandas.Index.rename,../reference/api/pandas.Index.rename
+generated/pandas.Index.repeat,../reference/api/pandas.Index.repeat
+generated/pandas.Index.searchsorted,../reference/api/pandas.Index.searchsorted
+generated/pandas.Index.set_names,../reference/api/pandas.Index.set_names
+generated/pandas.Index.set_value,../reference/api/pandas.Index.set_value
+generated/pandas.Index.shape,../reference/api/pandas.Index.shape
+generated/pandas.Index.shift,../reference/api/pandas.Index.shift
+generated/pandas.Index.size,../reference/api/pandas.Index.size
+generated/pandas.IndexSlice,../reference/api/pandas.IndexSlice
+generated/pandas.Index.slice_indexer,../reference/api/pandas.Index.slice_indexer
+generated/pandas.Index.slice_locs,../reference/api/pandas.Index.slice_locs
+generated/pandas.Index.sort,../reference/api/pandas.Index.sort
+generated/pandas.Index.sortlevel,../reference/api/pandas.Index.sortlevel
+generated/pandas.Index.sort_values,../reference/api/pandas.Index.sort_values
+generated/pandas.Index.str,../reference/api/pandas.Index.str
+generated/pandas.Index.strides,../reference/api/pandas.Index.strides
+generated/pandas.Index.summary,../reference/api/pandas.Index.summary
+generated/pandas.Index.symmetric_difference,../reference/api/pandas.Index.symmetric_difference
+generated/pandas.Index.take,../reference/api/pandas.Index.take
+generated/pandas.Index.T,../reference/api/pandas.Index.T
+generated/pandas.Index.to_flat_index,../reference/api/pandas.Index.to_flat_index
+generated/pandas.Index.to_frame,../reference/api/pandas.Index.to_frame
+generated/pandas.Index.to_list,../reference/api/pandas.Index.to_list
+generated/pandas.Index.tolist,../reference/api/pandas.Index.tolist
+generated/pandas.Index.to_native_types,../reference/api/pandas.Index.to_native_types
+generated/pandas.Index.to_numpy,../reference/api/pandas.Index.to_numpy
+generated/pandas.Index.to_series,../reference/api/pandas.Index.to_series
+generated/pandas.Index.transpose,../reference/api/pandas.Index.transpose
+generated/pandas.Index.union,../reference/api/pandas.Index.union
+generated/pandas.Index.unique,../reference/api/pandas.Index.unique
+generated/pandas.Index.value_counts,../reference/api/pandas.Index.value_counts
+generated/pandas.Index.values,../reference/api/pandas.Index.values
+generated/pandas.Index.view,../reference/api/pandas.Index.view
+generated/pandas.Index.where,../reference/api/pandas.Index.where
+generated/pandas.infer_freq,../reference/api/pandas.infer_freq
+generated/pandas.Interval.closed,../reference/api/pandas.Interval.closed
+generated/pandas.Interval.closed_left,../reference/api/pandas.Interval.closed_left
+generated/pandas.Interval.closed_right,../reference/api/pandas.Interval.closed_right
+generated/pandas.Interval,../reference/api/pandas.Interval
+generated/pandas.IntervalIndex.closed,../reference/api/pandas.IntervalIndex.closed
+generated/pandas.IntervalIndex.contains,../reference/api/pandas.IntervalIndex.contains
+generated/pandas.IntervalIndex.from_arrays,../reference/api/pandas.IntervalIndex.from_arrays
+generated/pandas.IntervalIndex.from_breaks,../reference/api/pandas.IntervalIndex.from_breaks
+generated/pandas.IntervalIndex.from_tuples,../reference/api/pandas.IntervalIndex.from_tuples
+generated/pandas.IntervalIndex.get_indexer,../reference/api/pandas.IntervalIndex.get_indexer
+generated/pandas.IntervalIndex.get_loc,../reference/api/pandas.IntervalIndex.get_loc
+generated/pandas.IntervalIndex,../reference/api/pandas.IntervalIndex
+generated/pandas.IntervalIndex.is_non_overlapping_monotonic,../reference/api/pandas.IntervalIndex.is_non_overlapping_monotonic
+generated/pandas.IntervalIndex.is_overlapping,../reference/api/pandas.IntervalIndex.is_overlapping
+generated/pandas.IntervalIndex.left,../reference/api/pandas.IntervalIndex.left
+generated/pandas.IntervalIndex.length,../reference/api/pandas.IntervalIndex.length
+generated/pandas.IntervalIndex.mid,../reference/api/pandas.IntervalIndex.mid
+generated/pandas.IntervalIndex.overlaps,../reference/api/pandas.IntervalIndex.overlaps
+generated/pandas.IntervalIndex.right,../reference/api/pandas.IntervalIndex.right
+generated/pandas.IntervalIndex.set_closed,../reference/api/pandas.IntervalIndex.set_closed
+generated/pandas.IntervalIndex.to_tuples,../reference/api/pandas.IntervalIndex.to_tuples
+generated/pandas.IntervalIndex.values,../reference/api/pandas.IntervalIndex.values
+generated/pandas.Interval.left,../reference/api/pandas.Interval.left
+generated/pandas.Interval.length,../reference/api/pandas.Interval.length
+generated/pandas.Interval.mid,../reference/api/pandas.Interval.mid
+generated/pandas.Interval.open_left,../reference/api/pandas.Interval.open_left
+generated/pandas.Interval.open_right,../reference/api/pandas.Interval.open_right
+generated/pandas.Interval.overlaps,../reference/api/pandas.Interval.overlaps
+generated/pandas.interval_range,../reference/api/pandas.interval_range
+generated/pandas.Interval.right,../reference/api/pandas.Interval.right
+generated/pandas.io.formats.style.Styler.apply,../reference/api/pandas.io.formats.style.Styler.apply
+generated/pandas.io.formats.style.Styler.applymap,../reference/api/pandas.io.formats.style.Styler.applymap
+generated/pandas.io.formats.style.Styler.background_gradient,../reference/api/pandas.io.formats.style.Styler.background_gradient
+generated/pandas.io.formats.style.Styler.bar,../reference/api/pandas.io.formats.style.Styler.bar
+generated/pandas.io.formats.style.Styler.clear,../reference/api/pandas.io.formats.style.Styler.clear
+generated/pandas.io.formats.style.Styler.env,../reference/api/pandas.io.formats.style.Styler.env
+generated/pandas.io.formats.style.Styler.export,../reference/api/pandas.io.formats.style.Styler.export
+generated/pandas.io.formats.style.Styler.format,../reference/api/pandas.io.formats.style.Styler.format
+generated/pandas.io.formats.style.Styler.from_custom_template,../reference/api/pandas.io.formats.style.Styler.from_custom_template
+generated/pandas.io.formats.style.Styler.hide_columns,../reference/api/pandas.io.formats.style.Styler.hide_columns
+generated/pandas.io.formats.style.Styler.hide_index,../reference/api/pandas.io.formats.style.Styler.hide_index
+generated/pandas.io.formats.style.Styler.highlight_max,../reference/api/pandas.io.formats.style.Styler.highlight_max
+generated/pandas.io.formats.style.Styler.highlight_min,../reference/api/pandas.io.formats.style.Styler.highlight_min
+generated/pandas.io.formats.style.Styler.highlight_null,../reference/api/pandas.io.formats.style.Styler.highlight_null
+generated/pandas.io.formats.style.Styler,../reference/api/pandas.io.formats.style.Styler
+generated/pandas.io.formats.style.Styler.loader,../reference/api/pandas.io.formats.style.Styler.loader
+generated/pandas.io.formats.style.Styler.pipe,../reference/api/pandas.io.formats.style.Styler.pipe
+generated/pandas.io.formats.style.Styler.render,../reference/api/pandas.io.formats.style.Styler.render
+generated/pandas.io.formats.style.Styler.set_caption,../reference/api/pandas.io.formats.style.Styler.set_caption
+generated/pandas.io.formats.style.Styler.set_precision,../reference/api/pandas.io.formats.style.Styler.set_precision
+generated/pandas.io.formats.style.Styler.set_properties,../reference/api/pandas.io.formats.style.Styler.set_properties
+generated/pandas.io.formats.style.Styler.set_table_attributes,../reference/api/pandas.io.formats.style.Styler.set_table_attributes
+generated/pandas.io.formats.style.Styler.set_table_styles,../reference/api/pandas.io.formats.style.Styler.set_table_styles
+generated/pandas.io.formats.style.Styler.set_uuid,../reference/api/pandas.io.formats.style.Styler.set_uuid
+generated/pandas.io.formats.style.Styler.template,../reference/api/pandas.io.formats.style.Styler.template
+generated/pandas.io.formats.style.Styler.to_excel,../reference/api/pandas.io.formats.style.Styler.to_excel
+generated/pandas.io.formats.style.Styler.use,../reference/api/pandas.io.formats.style.Styler.use
+generated/pandas.io.formats.style.Styler.where,../reference/api/pandas.io.formats.style.Styler.where
+generated/pandas.io.json.build_table_schema,../reference/api/pandas.io.json.build_table_schema
+generated/pandas.io.json.json_normalize,../reference/api/pandas.io.json.json_normalize
+generated/pandas.io.stata.StataReader.data,../reference/api/pandas.io.stata.StataReader.data
+generated/pandas.io.stata.StataReader.data_label,../reference/api/pandas.io.stata.StataReader.data_label
+generated/pandas.io.stata.StataReader.value_labels,../reference/api/pandas.io.stata.StataReader.value_labels
+generated/pandas.io.stata.StataReader.variable_labels,../reference/api/pandas.io.stata.StataReader.variable_labels
+generated/pandas.io.stata.StataWriter.write_file,../reference/api/pandas.io.stata.StataWriter.write_file
+generated/pandas.isna,../reference/api/pandas.isna
+generated/pandas.isnull,../reference/api/pandas.isnull
+generated/pandas.melt,../reference/api/pandas.melt
+generated/pandas.merge_asof,../reference/api/pandas.merge_asof
+generated/pandas.merge,../reference/api/pandas.merge
+generated/pandas.merge_ordered,../reference/api/pandas.merge_ordered
+generated/pandas.MultiIndex.codes,../reference/api/pandas.MultiIndex.codes
+generated/pandas.MultiIndex.droplevel,../reference/api/pandas.MultiIndex.droplevel
+generated/pandas.MultiIndex.from_arrays,../reference/api/pandas.MultiIndex.from_arrays
+generated/pandas.MultiIndex.from_frame,../reference/api/pandas.MultiIndex.from_frame
+generated/pandas.MultiIndex.from_product,../reference/api/pandas.MultiIndex.from_product
+generated/pandas.MultiIndex.from_tuples,../reference/api/pandas.MultiIndex.from_tuples
+generated/pandas.MultiIndex.get_indexer,../reference/api/pandas.MultiIndex.get_indexer
+generated/pandas.MultiIndex.get_level_values,../reference/api/pandas.MultiIndex.get_level_values
+generated/pandas.MultiIndex.get_loc,../reference/api/pandas.MultiIndex.get_loc
+generated/pandas.MultiIndex.get_loc_level,../reference/api/pandas.MultiIndex.get_loc_level
+generated/pandas.MultiIndex,../reference/api/pandas.MultiIndex
+generated/pandas.MultiIndex.is_lexsorted,../reference/api/pandas.MultiIndex.is_lexsorted
+generated/pandas.MultiIndex.levels,../reference/api/pandas.MultiIndex.levels
+generated/pandas.MultiIndex.levshape,../reference/api/pandas.MultiIndex.levshape
+generated/pandas.MultiIndex.names,../reference/api/pandas.MultiIndex.names
+generated/pandas.MultiIndex.nlevels,../reference/api/pandas.MultiIndex.nlevels
+generated/pandas.MultiIndex.remove_unused_levels,../reference/api/pandas.MultiIndex.remove_unused_levels
+generated/pandas.MultiIndex.reorder_levels,../reference/api/pandas.MultiIndex.reorder_levels
+generated/pandas.MultiIndex.set_codes,../reference/api/pandas.MultiIndex.set_codes
+generated/pandas.MultiIndex.set_levels,../reference/api/pandas.MultiIndex.set_levels
+generated/pandas.MultiIndex.sortlevel,../reference/api/pandas.MultiIndex.sortlevel
+generated/pandas.MultiIndex.swaplevel,../reference/api/pandas.MultiIndex.swaplevel
+generated/pandas.MultiIndex.to_flat_index,../reference/api/pandas.MultiIndex.to_flat_index
+generated/pandas.MultiIndex.to_frame,../reference/api/pandas.MultiIndex.to_frame
+generated/pandas.MultiIndex.to_hierarchical,../reference/api/pandas.MultiIndex.to_hierarchical
+generated/pandas.notna,../reference/api/pandas.notna
+generated/pandas.notnull,../reference/api/pandas.notnull
+generated/pandas.option_context,../reference/api/pandas.option_context
+generated/pandas.Panel.abs,../reference/api/pandas.Panel.abs
+generated/pandas.Panel.add,../reference/api/pandas.Panel.add
+generated/pandas.Panel.add_prefix,../reference/api/pandas.Panel.add_prefix
+generated/pandas.Panel.add_suffix,../reference/api/pandas.Panel.add_suffix
+generated/pandas.Panel.agg,../reference/api/pandas.Panel.agg
+generated/pandas.Panel.aggregate,../reference/api/pandas.Panel.aggregate
+generated/pandas.Panel.align,../reference/api/pandas.Panel.align
+generated/pandas.Panel.all,../reference/api/pandas.Panel.all
+generated/pandas.Panel.any,../reference/api/pandas.Panel.any
+generated/pandas.Panel.apply,../reference/api/pandas.Panel.apply
+generated/pandas.Panel.as_blocks,../reference/api/pandas.Panel.as_blocks
+generated/pandas.Panel.asfreq,../reference/api/pandas.Panel.asfreq
+generated/pandas.Panel.as_matrix,../reference/api/pandas.Panel.as_matrix
+generated/pandas.Panel.asof,../reference/api/pandas.Panel.asof
+generated/pandas.Panel.astype,../reference/api/pandas.Panel.astype
+generated/pandas.Panel.at,../reference/api/pandas.Panel.at
+generated/pandas.Panel.at_time,../reference/api/pandas.Panel.at_time
+generated/pandas.Panel.axes,../reference/api/pandas.Panel.axes
+generated/pandas.Panel.between_time,../reference/api/pandas.Panel.between_time
+generated/pandas.Panel.bfill,../reference/api/pandas.Panel.bfill
+generated/pandas.Panel.blocks,../reference/api/pandas.Panel.blocks
+generated/pandas.Panel.bool,../reference/api/pandas.Panel.bool
+generated/pandas.Panel.clip,../reference/api/pandas.Panel.clip
+generated/pandas.Panel.clip_lower,../reference/api/pandas.Panel.clip_lower
+generated/pandas.Panel.clip_upper,../reference/api/pandas.Panel.clip_upper
+generated/pandas.Panel.compound,../reference/api/pandas.Panel.compound
+generated/pandas.Panel.conform,../reference/api/pandas.Panel.conform
+generated/pandas.Panel.convert_objects,../reference/api/pandas.Panel.convert_objects
+generated/pandas.Panel.copy,../reference/api/pandas.Panel.copy
+generated/pandas.Panel.count,../reference/api/pandas.Panel.count
+generated/pandas.Panel.cummax,../reference/api/pandas.Panel.cummax
+generated/pandas.Panel.cummin,../reference/api/pandas.Panel.cummin
+generated/pandas.Panel.cumprod,../reference/api/pandas.Panel.cumprod
+generated/pandas.Panel.cumsum,../reference/api/pandas.Panel.cumsum
+generated/pandas.Panel.describe,../reference/api/pandas.Panel.describe
+generated/pandas.Panel.div,../reference/api/pandas.Panel.div
+generated/pandas.Panel.divide,../reference/api/pandas.Panel.divide
+generated/pandas.Panel.drop,../reference/api/pandas.Panel.drop
+generated/pandas.Panel.droplevel,../reference/api/pandas.Panel.droplevel
+generated/pandas.Panel.dropna,../reference/api/pandas.Panel.dropna
+generated/pandas.Panel.dtypes,../reference/api/pandas.Panel.dtypes
+generated/pandas.Panel.empty,../reference/api/pandas.Panel.empty
+generated/pandas.Panel.eq,../reference/api/pandas.Panel.eq
+generated/pandas.Panel.equals,../reference/api/pandas.Panel.equals
+generated/pandas.Panel.ffill,../reference/api/pandas.Panel.ffill
+generated/pandas.Panel.fillna,../reference/api/pandas.Panel.fillna
+generated/pandas.Panel.filter,../reference/api/pandas.Panel.filter
+generated/pandas.Panel.first,../reference/api/pandas.Panel.first
+generated/pandas.Panel.first_valid_index,../reference/api/pandas.Panel.first_valid_index
+generated/pandas.Panel.floordiv,../reference/api/pandas.Panel.floordiv
+generated/pandas.Panel.from_dict,../reference/api/pandas.Panel.from_dict
+generated/pandas.Panel.fromDict,../reference/api/pandas.Panel.fromDict
+generated/pandas.Panel.ftypes,../reference/api/pandas.Panel.ftypes
+generated/pandas.Panel.ge,../reference/api/pandas.Panel.ge
+generated/pandas.Panel.get_dtype_counts,../reference/api/pandas.Panel.get_dtype_counts
+generated/pandas.Panel.get_ftype_counts,../reference/api/pandas.Panel.get_ftype_counts
+generated/pandas.Panel.get,../reference/api/pandas.Panel.get
+generated/pandas.Panel.get_value,../reference/api/pandas.Panel.get_value
+generated/pandas.Panel.get_values,../reference/api/pandas.Panel.get_values
+generated/pandas.Panel.groupby,../reference/api/pandas.Panel.groupby
+generated/pandas.Panel.gt,../reference/api/pandas.Panel.gt
+generated/pandas.Panel.head,../reference/api/pandas.Panel.head
+generated/pandas.Panel,../reference/api/pandas.Panel
+generated/pandas.Panel.iat,../reference/api/pandas.Panel.iat
+generated/pandas.Panel.iloc,../reference/api/pandas.Panel.iloc
+generated/pandas.Panel.infer_objects,../reference/api/pandas.Panel.infer_objects
+generated/pandas.Panel.interpolate,../reference/api/pandas.Panel.interpolate
+generated/pandas.Panel.is_copy,../reference/api/pandas.Panel.is_copy
+generated/pandas.Panel.isna,../reference/api/pandas.Panel.isna
+generated/pandas.Panel.isnull,../reference/api/pandas.Panel.isnull
+generated/pandas.Panel.items,../reference/api/pandas.Panel.items
+generated/pandas.Panel.__iter__,../reference/api/pandas.Panel.__iter__
+generated/pandas.Panel.iteritems,../reference/api/pandas.Panel.iteritems
+generated/pandas.Panel.ix,../reference/api/pandas.Panel.ix
+generated/pandas.Panel.join,../reference/api/pandas.Panel.join
+generated/pandas.Panel.keys,../reference/api/pandas.Panel.keys
+generated/pandas.Panel.kurt,../reference/api/pandas.Panel.kurt
+generated/pandas.Panel.kurtosis,../reference/api/pandas.Panel.kurtosis
+generated/pandas.Panel.last,../reference/api/pandas.Panel.last
+generated/pandas.Panel.last_valid_index,../reference/api/pandas.Panel.last_valid_index
+generated/pandas.Panel.le,../reference/api/pandas.Panel.le
+generated/pandas.Panel.loc,../reference/api/pandas.Panel.loc
+generated/pandas.Panel.lt,../reference/api/pandas.Panel.lt
+generated/pandas.Panel.mad,../reference/api/pandas.Panel.mad
+generated/pandas.Panel.major_axis,../reference/api/pandas.Panel.major_axis
+generated/pandas.Panel.major_xs,../reference/api/pandas.Panel.major_xs
+generated/pandas.Panel.mask,../reference/api/pandas.Panel.mask
+generated/pandas.Panel.max,../reference/api/pandas.Panel.max
+generated/pandas.Panel.mean,../reference/api/pandas.Panel.mean
+generated/pandas.Panel.median,../reference/api/pandas.Panel.median
+generated/pandas.Panel.min,../reference/api/pandas.Panel.min
+generated/pandas.Panel.minor_axis,../reference/api/pandas.Panel.minor_axis
+generated/pandas.Panel.minor_xs,../reference/api/pandas.Panel.minor_xs
+generated/pandas.Panel.mod,../reference/api/pandas.Panel.mod
+generated/pandas.Panel.mul,../reference/api/pandas.Panel.mul
+generated/pandas.Panel.multiply,../reference/api/pandas.Panel.multiply
+generated/pandas.Panel.ndim,../reference/api/pandas.Panel.ndim
+generated/pandas.Panel.ne,../reference/api/pandas.Panel.ne
+generated/pandas.Panel.notna,../reference/api/pandas.Panel.notna
+generated/pandas.Panel.notnull,../reference/api/pandas.Panel.notnull
+generated/pandas.Panel.pct_change,../reference/api/pandas.Panel.pct_change
+generated/pandas.Panel.pipe,../reference/api/pandas.Panel.pipe
+generated/pandas.Panel.pop,../reference/api/pandas.Panel.pop
+generated/pandas.Panel.pow,../reference/api/pandas.Panel.pow
+generated/pandas.Panel.prod,../reference/api/pandas.Panel.prod
+generated/pandas.Panel.product,../reference/api/pandas.Panel.product
+generated/pandas.Panel.radd,../reference/api/pandas.Panel.radd
+generated/pandas.Panel.rank,../reference/api/pandas.Panel.rank
+generated/pandas.Panel.rdiv,../reference/api/pandas.Panel.rdiv
+generated/pandas.Panel.reindex_axis,../reference/api/pandas.Panel.reindex_axis
+generated/pandas.Panel.reindex,../reference/api/pandas.Panel.reindex
+generated/pandas.Panel.reindex_like,../reference/api/pandas.Panel.reindex_like
+generated/pandas.Panel.rename_axis,../reference/api/pandas.Panel.rename_axis
+generated/pandas.Panel.rename,../reference/api/pandas.Panel.rename
+generated/pandas.Panel.replace,../reference/api/pandas.Panel.replace
+generated/pandas.Panel.resample,../reference/api/pandas.Panel.resample
+generated/pandas.Panel.rfloordiv,../reference/api/pandas.Panel.rfloordiv
+generated/pandas.Panel.rmod,../reference/api/pandas.Panel.rmod
+generated/pandas.Panel.rmul,../reference/api/pandas.Panel.rmul
+generated/pandas.Panel.round,../reference/api/pandas.Panel.round
+generated/pandas.Panel.rpow,../reference/api/pandas.Panel.rpow
+generated/pandas.Panel.rsub,../reference/api/pandas.Panel.rsub
+generated/pandas.Panel.rtruediv,../reference/api/pandas.Panel.rtruediv
+generated/pandas.Panel.sample,../reference/api/pandas.Panel.sample
+generated/pandas.Panel.select,../reference/api/pandas.Panel.select
+generated/pandas.Panel.sem,../reference/api/pandas.Panel.sem
+generated/pandas.Panel.set_axis,../reference/api/pandas.Panel.set_axis
+generated/pandas.Panel.set_value,../reference/api/pandas.Panel.set_value
+generated/pandas.Panel.shape,../reference/api/pandas.Panel.shape
+generated/pandas.Panel.shift,../reference/api/pandas.Panel.shift
+generated/pandas.Panel.size,../reference/api/pandas.Panel.size
+generated/pandas.Panel.skew,../reference/api/pandas.Panel.skew
+generated/pandas.Panel.slice_shift,../reference/api/pandas.Panel.slice_shift
+generated/pandas.Panel.sort_index,../reference/api/pandas.Panel.sort_index
+generated/pandas.Panel.sort_values,../reference/api/pandas.Panel.sort_values
+generated/pandas.Panel.squeeze,../reference/api/pandas.Panel.squeeze
+generated/pandas.Panel.std,../reference/api/pandas.Panel.std
+generated/pandas.Panel.sub,../reference/api/pandas.Panel.sub
+generated/pandas.Panel.subtract,../reference/api/pandas.Panel.subtract
+generated/pandas.Panel.sum,../reference/api/pandas.Panel.sum
+generated/pandas.Panel.swapaxes,../reference/api/pandas.Panel.swapaxes
+generated/pandas.Panel.swaplevel,../reference/api/pandas.Panel.swaplevel
+generated/pandas.Panel.tail,../reference/api/pandas.Panel.tail
+generated/pandas.Panel.take,../reference/api/pandas.Panel.take
+generated/pandas.Panel.timetuple,../reference/api/pandas.Panel.timetuple
+generated/pandas.Panel.to_clipboard,../reference/api/pandas.Panel.to_clipboard
+generated/pandas.Panel.to_csv,../reference/api/pandas.Panel.to_csv
+generated/pandas.Panel.to_dense,../reference/api/pandas.Panel.to_dense
+generated/pandas.Panel.to_excel,../reference/api/pandas.Panel.to_excel
+generated/pandas.Panel.to_frame,../reference/api/pandas.Panel.to_frame
+generated/pandas.Panel.to_hdf,../reference/api/pandas.Panel.to_hdf
+generated/pandas.Panel.to_json,../reference/api/pandas.Panel.to_json
+generated/pandas.Panel.to_latex,../reference/api/pandas.Panel.to_latex
+generated/pandas.Panel.to_msgpack,../reference/api/pandas.Panel.to_msgpack
+generated/pandas.Panel.to_pickle,../reference/api/pandas.Panel.to_pickle
+generated/pandas.Panel.to_sparse,../reference/api/pandas.Panel.to_sparse
+generated/pandas.Panel.to_sql,../reference/api/pandas.Panel.to_sql
+generated/pandas.Panel.to_xarray,../reference/api/pandas.Panel.to_xarray
+generated/pandas.Panel.transform,../reference/api/pandas.Panel.transform
+generated/pandas.Panel.transpose,../reference/api/pandas.Panel.transpose
+generated/pandas.Panel.truediv,../reference/api/pandas.Panel.truediv
+generated/pandas.Panel.truncate,../reference/api/pandas.Panel.truncate
+generated/pandas.Panel.tshift,../reference/api/pandas.Panel.tshift
+generated/pandas.Panel.tz_convert,../reference/api/pandas.Panel.tz_convert
+generated/pandas.Panel.tz_localize,../reference/api/pandas.Panel.tz_localize
+generated/pandas.Panel.update,../reference/api/pandas.Panel.update
+generated/pandas.Panel.values,../reference/api/pandas.Panel.values
+generated/pandas.Panel.var,../reference/api/pandas.Panel.var
+generated/pandas.Panel.where,../reference/api/pandas.Panel.where
+generated/pandas.Panel.xs,../reference/api/pandas.Panel.xs
+generated/pandas.Period.asfreq,../reference/api/pandas.Period.asfreq
+generated/pandas.Period.day,../reference/api/pandas.Period.day
+generated/pandas.Period.dayofweek,../reference/api/pandas.Period.dayofweek
+generated/pandas.Period.dayofyear,../reference/api/pandas.Period.dayofyear
+generated/pandas.Period.days_in_month,../reference/api/pandas.Period.days_in_month
+generated/pandas.Period.daysinmonth,../reference/api/pandas.Period.daysinmonth
+generated/pandas.Period.end_time,../reference/api/pandas.Period.end_time
+generated/pandas.Period.freq,../reference/api/pandas.Period.freq
+generated/pandas.Period.freqstr,../reference/api/pandas.Period.freqstr
+generated/pandas.Period.hour,../reference/api/pandas.Period.hour
+generated/pandas.Period,../reference/api/pandas.Period
+generated/pandas.PeriodIndex.asfreq,../reference/api/pandas.PeriodIndex.asfreq
+generated/pandas.PeriodIndex.day,../reference/api/pandas.PeriodIndex.day
+generated/pandas.PeriodIndex.dayofweek,../reference/api/pandas.PeriodIndex.dayofweek
+generated/pandas.PeriodIndex.dayofyear,../reference/api/pandas.PeriodIndex.dayofyear
+generated/pandas.PeriodIndex.days_in_month,../reference/api/pandas.PeriodIndex.days_in_month
+generated/pandas.PeriodIndex.daysinmonth,../reference/api/pandas.PeriodIndex.daysinmonth
+generated/pandas.PeriodIndex.end_time,../reference/api/pandas.PeriodIndex.end_time
+generated/pandas.PeriodIndex.freq,../reference/api/pandas.PeriodIndex.freq
+generated/pandas.PeriodIndex.freqstr,../reference/api/pandas.PeriodIndex.freqstr
+generated/pandas.PeriodIndex.hour,../reference/api/pandas.PeriodIndex.hour
+generated/pandas.PeriodIndex,../reference/api/pandas.PeriodIndex
+generated/pandas.PeriodIndex.is_leap_year,../reference/api/pandas.PeriodIndex.is_leap_year
+generated/pandas.PeriodIndex.minute,../reference/api/pandas.PeriodIndex.minute
+generated/pandas.PeriodIndex.month,../reference/api/pandas.PeriodIndex.month
+generated/pandas.PeriodIndex.quarter,../reference/api/pandas.PeriodIndex.quarter
+generated/pandas.PeriodIndex.qyear,../reference/api/pandas.PeriodIndex.qyear
+generated/pandas.PeriodIndex.second,../reference/api/pandas.PeriodIndex.second
+generated/pandas.PeriodIndex.start_time,../reference/api/pandas.PeriodIndex.start_time
+generated/pandas.PeriodIndex.strftime,../reference/api/pandas.PeriodIndex.strftime
+generated/pandas.PeriodIndex.to_timestamp,../reference/api/pandas.PeriodIndex.to_timestamp
+generated/pandas.PeriodIndex.weekday,../reference/api/pandas.PeriodIndex.weekday
+generated/pandas.PeriodIndex.week,../reference/api/pandas.PeriodIndex.week
+generated/pandas.PeriodIndex.weekofyear,../reference/api/pandas.PeriodIndex.weekofyear
+generated/pandas.PeriodIndex.year,../reference/api/pandas.PeriodIndex.year
+generated/pandas.Period.is_leap_year,../reference/api/pandas.Period.is_leap_year
+generated/pandas.Period.minute,../reference/api/pandas.Period.minute
+generated/pandas.Period.month,../reference/api/pandas.Period.month
+generated/pandas.Period.now,../reference/api/pandas.Period.now
+generated/pandas.Period.ordinal,../reference/api/pandas.Period.ordinal
+generated/pandas.Period.quarter,../reference/api/pandas.Period.quarter
+generated/pandas.Period.qyear,../reference/api/pandas.Period.qyear
+generated/pandas.period_range,../reference/api/pandas.period_range
+generated/pandas.Period.second,../reference/api/pandas.Period.second
+generated/pandas.Period.start_time,../reference/api/pandas.Period.start_time
+generated/pandas.Period.strftime,../reference/api/pandas.Period.strftime
+generated/pandas.Period.to_timestamp,../reference/api/pandas.Period.to_timestamp
+generated/pandas.Period.weekday,../reference/api/pandas.Period.weekday
+generated/pandas.Period.week,../reference/api/pandas.Period.week
+generated/pandas.Period.weekofyear,../reference/api/pandas.Period.weekofyear
+generated/pandas.Period.year,../reference/api/pandas.Period.year
+generated/pandas.pivot,../reference/api/pandas.pivot
+generated/pandas.pivot_table,../reference/api/pandas.pivot_table
+generated/pandas.plotting.andrews_curves,../reference/api/pandas.plotting.andrews_curves
+generated/pandas.plotting.bootstrap_plot,../reference/api/pandas.plotting.bootstrap_plot
+generated/pandas.plotting.deregister_matplotlib_converters,../reference/api/pandas.plotting.deregister_matplotlib_converters
+generated/pandas.plotting.lag_plot,../reference/api/pandas.plotting.lag_plot
+generated/pandas.plotting.parallel_coordinates,../reference/api/pandas.plotting.parallel_coordinates
+generated/pandas.plotting.radviz,../reference/api/pandas.plotting.radviz
+generated/pandas.plotting.register_matplotlib_converters,../reference/api/pandas.plotting.register_matplotlib_converters
+generated/pandas.plotting.scatter_matrix,../reference/api/pandas.plotting.scatter_matrix
+generated/pandas.qcut,../reference/api/pandas.qcut
+generated/pandas.RangeIndex.from_range,../reference/api/pandas.RangeIndex.from_range
+generated/pandas.RangeIndex,../reference/api/pandas.RangeIndex
+generated/pandas.read_clipboard,../reference/api/pandas.read_clipboard
+generated/pandas.read_csv,../reference/api/pandas.read_csv
+generated/pandas.read_excel,../reference/api/pandas.read_excel
+generated/pandas.read_feather,../reference/api/pandas.read_feather
+generated/pandas.read_fwf,../reference/api/pandas.read_fwf
+generated/pandas.read_gbq,../reference/api/pandas.read_gbq
+generated/pandas.read_hdf,../reference/api/pandas.read_hdf
+generated/pandas.read,../reference/api/pandas.read
+generated/pandas.read_json,../reference/api/pandas.read_json
+generated/pandas.read_msgpack,../reference/api/pandas.read_msgpack
+generated/pandas.read_parquet,../reference/api/pandas.read_parquet
+generated/pandas.read_pickle,../reference/api/pandas.read_pickle
+generated/pandas.read_sas,../reference/api/pandas.read_sas
+generated/pandas.read_sql,../reference/api/pandas.read_sql
+generated/pandas.read_sql_query,../reference/api/pandas.read_sql_query
+generated/pandas.read_sql_table,../reference/api/pandas.read_sql_table
+generated/pandas.read_stata,../reference/api/pandas.read_stata
+generated/pandas.read_table,../reference/api/pandas.read_table
+generated/pandas.reset_option,../reference/api/pandas.reset_option
+generated/pandas.Series.abs,../reference/api/pandas.Series.abs
+generated/pandas.Series.add,../reference/api/pandas.Series.add
+generated/pandas.Series.add_prefix,../reference/api/pandas.Series.add_prefix
+generated/pandas.Series.add_suffix,../reference/api/pandas.Series.add_suffix
+generated/pandas.Series.agg,../reference/api/pandas.Series.agg
+generated/pandas.Series.aggregate,../reference/api/pandas.Series.aggregate
+generated/pandas.Series.align,../reference/api/pandas.Series.align
+generated/pandas.Series.all,../reference/api/pandas.Series.all
+generated/pandas.Series.any,../reference/api/pandas.Series.any
+generated/pandas.Series.append,../reference/api/pandas.Series.append
+generated/pandas.Series.apply,../reference/api/pandas.Series.apply
+generated/pandas.Series.argmax,../reference/api/pandas.Series.argmax
+generated/pandas.Series.argmin,../reference/api/pandas.Series.argmin
+generated/pandas.Series.argsort,../reference/api/pandas.Series.argsort
+generated/pandas.Series.__array__,../reference/api/pandas.Series.__array__
+generated/pandas.Series.array,../reference/api/pandas.Series.array
+generated/pandas.Series.as_blocks,../reference/api/pandas.Series.as_blocks
+generated/pandas.Series.asfreq,../reference/api/pandas.Series.asfreq
+generated/pandas.Series.as_matrix,../reference/api/pandas.Series.as_matrix
+generated/pandas.Series.asobject,../reference/api/pandas.Series.asobject
+generated/pandas.Series.asof,../reference/api/pandas.Series.asof
+generated/pandas.Series.astype,../reference/api/pandas.Series.astype
+generated/pandas.Series.at,../reference/api/pandas.Series.at
+generated/pandas.Series.at_time,../reference/api/pandas.Series.at_time
+generated/pandas.Series.autocorr,../reference/api/pandas.Series.autocorr
+generated/pandas.Series.axes,../reference/api/pandas.Series.axes
+generated/pandas.Series.base,../reference/api/pandas.Series.base
+generated/pandas.Series.between,../reference/api/pandas.Series.between
+generated/pandas.Series.between_time,../reference/api/pandas.Series.between_time
+generated/pandas.Series.bfill,../reference/api/pandas.Series.bfill
+generated/pandas.Series.blocks,../reference/api/pandas.Series.blocks
+generated/pandas.Series.bool,../reference/api/pandas.Series.bool
+generated/pandas.Series.cat.add_categories,../reference/api/pandas.Series.cat.add_categories
+generated/pandas.Series.cat.as_ordered,../reference/api/pandas.Series.cat.as_ordered
+generated/pandas.Series.cat.as_unordered,../reference/api/pandas.Series.cat.as_unordered
+generated/pandas.Series.cat.categories,../reference/api/pandas.Series.cat.categories
+generated/pandas.Series.cat.codes,../reference/api/pandas.Series.cat.codes
+generated/pandas.Series.cat,../reference/api/pandas.Series.cat
+generated/pandas.Series.cat.ordered,../reference/api/pandas.Series.cat.ordered
+generated/pandas.Series.cat.remove_categories,../reference/api/pandas.Series.cat.remove_categories
+generated/pandas.Series.cat.remove_unused_categories,../reference/api/pandas.Series.cat.remove_unused_categories
+generated/pandas.Series.cat.rename_categories,../reference/api/pandas.Series.cat.rename_categories
+generated/pandas.Series.cat.reorder_categories,../reference/api/pandas.Series.cat.reorder_categories
+generated/pandas.Series.cat.set_categories,../reference/api/pandas.Series.cat.set_categories
+generated/pandas.Series.clip,../reference/api/pandas.Series.clip
+generated/pandas.Series.clip_lower,../reference/api/pandas.Series.clip_lower
+generated/pandas.Series.clip_upper,../reference/api/pandas.Series.clip_upper
+generated/pandas.Series.combine_first,../reference/api/pandas.Series.combine_first
+generated/pandas.Series.combine,../reference/api/pandas.Series.combine
+generated/pandas.Series.compound,../reference/api/pandas.Series.compound
+generated/pandas.Series.compress,../reference/api/pandas.Series.compress
+generated/pandas.Series.convert_objects,../reference/api/pandas.Series.convert_objects
+generated/pandas.Series.copy,../reference/api/pandas.Series.copy
+generated/pandas.Series.corr,../reference/api/pandas.Series.corr
+generated/pandas.Series.count,../reference/api/pandas.Series.count
+generated/pandas.Series.cov,../reference/api/pandas.Series.cov
+generated/pandas.Series.cummax,../reference/api/pandas.Series.cummax
+generated/pandas.Series.cummin,../reference/api/pandas.Series.cummin
+generated/pandas.Series.cumprod,../reference/api/pandas.Series.cumprod
+generated/pandas.Series.cumsum,../reference/api/pandas.Series.cumsum
+generated/pandas.Series.data,../reference/api/pandas.Series.data
+generated/pandas.Series.describe,../reference/api/pandas.Series.describe
+generated/pandas.Series.diff,../reference/api/pandas.Series.diff
+generated/pandas.Series.div,../reference/api/pandas.Series.div
+generated/pandas.Series.divide,../reference/api/pandas.Series.divide
+generated/pandas.Series.divmod,../reference/api/pandas.Series.divmod
+generated/pandas.Series.dot,../reference/api/pandas.Series.dot
+generated/pandas.Series.drop_duplicates,../reference/api/pandas.Series.drop_duplicates
+generated/pandas.Series.drop,../reference/api/pandas.Series.drop
+generated/pandas.Series.droplevel,../reference/api/pandas.Series.droplevel
+generated/pandas.Series.dropna,../reference/api/pandas.Series.dropna
+generated/pandas.Series.dt.ceil,../reference/api/pandas.Series.dt.ceil
+generated/pandas.Series.dt.components,../reference/api/pandas.Series.dt.components
+generated/pandas.Series.dt.date,../reference/api/pandas.Series.dt.date
+generated/pandas.Series.dt.day,../reference/api/pandas.Series.dt.day
+generated/pandas.Series.dt.day_name,../reference/api/pandas.Series.dt.day_name
+generated/pandas.Series.dt.dayofweek,../reference/api/pandas.Series.dt.dayofweek
+generated/pandas.Series.dt.dayofyear,../reference/api/pandas.Series.dt.dayofyear
+generated/pandas.Series.dt.days,../reference/api/pandas.Series.dt.days
+generated/pandas.Series.dt.days_in_month,../reference/api/pandas.Series.dt.days_in_month
+generated/pandas.Series.dt.daysinmonth,../reference/api/pandas.Series.dt.daysinmonth
+generated/pandas.Series.dt.end_time,../reference/api/pandas.Series.dt.end_time
+generated/pandas.Series.dt.floor,../reference/api/pandas.Series.dt.floor
+generated/pandas.Series.dt.freq,../reference/api/pandas.Series.dt.freq
+generated/pandas.Series.dt.hour,../reference/api/pandas.Series.dt.hour
+generated/pandas.Series.dt,../reference/api/pandas.Series.dt
+generated/pandas.Series.dt.is_leap_year,../reference/api/pandas.Series.dt.is_leap_year
+generated/pandas.Series.dt.is_month_end,../reference/api/pandas.Series.dt.is_month_end
+generated/pandas.Series.dt.is_month_start,../reference/api/pandas.Series.dt.is_month_start
+generated/pandas.Series.dt.is_quarter_end,../reference/api/pandas.Series.dt.is_quarter_end
+generated/pandas.Series.dt.is_quarter_start,../reference/api/pandas.Series.dt.is_quarter_start
+generated/pandas.Series.dt.is_year_end,../reference/api/pandas.Series.dt.is_year_end
+generated/pandas.Series.dt.is_year_start,../reference/api/pandas.Series.dt.is_year_start
+generated/pandas.Series.dt.microsecond,../reference/api/pandas.Series.dt.microsecond
+generated/pandas.Series.dt.microseconds,../reference/api/pandas.Series.dt.microseconds
+generated/pandas.Series.dt.minute,../reference/api/pandas.Series.dt.minute
+generated/pandas.Series.dt.month,../reference/api/pandas.Series.dt.month
+generated/pandas.Series.dt.month_name,../reference/api/pandas.Series.dt.month_name
+generated/pandas.Series.dt.nanosecond,../reference/api/pandas.Series.dt.nanosecond
+generated/pandas.Series.dt.nanoseconds,../reference/api/pandas.Series.dt.nanoseconds
+generated/pandas.Series.dt.normalize,../reference/api/pandas.Series.dt.normalize
+generated/pandas.Series.dt.quarter,../reference/api/pandas.Series.dt.quarter
+generated/pandas.Series.dt.qyear,../reference/api/pandas.Series.dt.qyear
+generated/pandas.Series.dt.round,../reference/api/pandas.Series.dt.round
+generated/pandas.Series.dt.second,../reference/api/pandas.Series.dt.second
+generated/pandas.Series.dt.seconds,../reference/api/pandas.Series.dt.seconds
+generated/pandas.Series.dt.start_time,../reference/api/pandas.Series.dt.start_time
+generated/pandas.Series.dt.strftime,../reference/api/pandas.Series.dt.strftime
+generated/pandas.Series.dt.time,../reference/api/pandas.Series.dt.time
+generated/pandas.Series.dt.timetz,../reference/api/pandas.Series.dt.timetz
+generated/pandas.Series.dt.to_period,../reference/api/pandas.Series.dt.to_period
+generated/pandas.Series.dt.to_pydatetime,../reference/api/pandas.Series.dt.to_pydatetime
+generated/pandas.Series.dt.to_pytimedelta,../reference/api/pandas.Series.dt.to_pytimedelta
+generated/pandas.Series.dt.total_seconds,../reference/api/pandas.Series.dt.total_seconds
+generated/pandas.Series.dt.tz_convert,../reference/api/pandas.Series.dt.tz_convert
+generated/pandas.Series.dt.tz,../reference/api/pandas.Series.dt.tz
+generated/pandas.Series.dt.tz_localize,../reference/api/pandas.Series.dt.tz_localize
+generated/pandas.Series.dt.weekday,../reference/api/pandas.Series.dt.weekday
+generated/pandas.Series.dt.week,../reference/api/pandas.Series.dt.week
+generated/pandas.Series.dt.weekofyear,../reference/api/pandas.Series.dt.weekofyear
+generated/pandas.Series.dt.year,../reference/api/pandas.Series.dt.year
+generated/pandas.Series.dtype,../reference/api/pandas.Series.dtype
+generated/pandas.Series.dtypes,../reference/api/pandas.Series.dtypes
+generated/pandas.Series.duplicated,../reference/api/pandas.Series.duplicated
+generated/pandas.Series.empty,../reference/api/pandas.Series.empty
+generated/pandas.Series.eq,../reference/api/pandas.Series.eq
+generated/pandas.Series.equals,../reference/api/pandas.Series.equals
+generated/pandas.Series.ewm,../reference/api/pandas.Series.ewm
+generated/pandas.Series.expanding,../reference/api/pandas.Series.expanding
+generated/pandas.Series.factorize,../reference/api/pandas.Series.factorize
+generated/pandas.Series.ffill,../reference/api/pandas.Series.ffill
+generated/pandas.Series.fillna,../reference/api/pandas.Series.fillna
+generated/pandas.Series.filter,../reference/api/pandas.Series.filter
+generated/pandas.Series.first,../reference/api/pandas.Series.first
+generated/pandas.Series.first_valid_index,../reference/api/pandas.Series.first_valid_index
+generated/pandas.Series.flags,../reference/api/pandas.Series.flags
+generated/pandas.Series.floordiv,../reference/api/pandas.Series.floordiv
+generated/pandas.Series.from_array,../reference/api/pandas.Series.from_array
+generated/pandas.Series.from_csv,../reference/api/pandas.Series.from_csv
+generated/pandas.Series.ftype,../reference/api/pandas.Series.ftype
+generated/pandas.Series.ftypes,../reference/api/pandas.Series.ftypes
+generated/pandas.Series.ge,../reference/api/pandas.Series.ge
+generated/pandas.Series.get_dtype_counts,../reference/api/pandas.Series.get_dtype_counts
+generated/pandas.Series.get_ftype_counts,../reference/api/pandas.Series.get_ftype_counts
+generated/pandas.Series.get,../reference/api/pandas.Series.get
+generated/pandas.Series.get_value,../reference/api/pandas.Series.get_value
+generated/pandas.Series.get_values,../reference/api/pandas.Series.get_values
+generated/pandas.Series.groupby,../reference/api/pandas.Series.groupby
+generated/pandas.Series.gt,../reference/api/pandas.Series.gt
+generated/pandas.Series.hasnans,../reference/api/pandas.Series.hasnans
+generated/pandas.Series.head,../reference/api/pandas.Series.head
+generated/pandas.Series.hist,../reference/api/pandas.Series.hist
+generated/pandas.Series,../reference/api/pandas.Series
+generated/pandas.Series.iat,../reference/api/pandas.Series.iat
+generated/pandas.Series.idxmax,../reference/api/pandas.Series.idxmax
+generated/pandas.Series.idxmin,../reference/api/pandas.Series.idxmin
+generated/pandas.Series.iloc,../reference/api/pandas.Series.iloc
+generated/pandas.Series.imag,../reference/api/pandas.Series.imag
+generated/pandas.Series.index,../reference/api/pandas.Series.index
+generated/pandas.Series.infer_objects,../reference/api/pandas.Series.infer_objects
+generated/pandas.Series.interpolate,../reference/api/pandas.Series.interpolate
+generated/pandas.Series.is_copy,../reference/api/pandas.Series.is_copy
+generated/pandas.Series.isin,../reference/api/pandas.Series.isin
+generated/pandas.Series.is_monotonic_decreasing,../reference/api/pandas.Series.is_monotonic_decreasing
+generated/pandas.Series.is_monotonic,../reference/api/pandas.Series.is_monotonic
+generated/pandas.Series.is_monotonic_increasing,../reference/api/pandas.Series.is_monotonic_increasing
+generated/pandas.Series.isna,../reference/api/pandas.Series.isna
+generated/pandas.Series.isnull,../reference/api/pandas.Series.isnull
+generated/pandas.Series.is_unique,../reference/api/pandas.Series.is_unique
+generated/pandas.Series.item,../reference/api/pandas.Series.item
+generated/pandas.Series.items,../reference/api/pandas.Series.items
+generated/pandas.Series.itemsize,../reference/api/pandas.Series.itemsize
+generated/pandas.Series.__iter__,../reference/api/pandas.Series.__iter__
+generated/pandas.Series.iteritems,../reference/api/pandas.Series.iteritems
+generated/pandas.Series.ix,../reference/api/pandas.Series.ix
+generated/pandas.Series.keys,../reference/api/pandas.Series.keys
+generated/pandas.Series.kurt,../reference/api/pandas.Series.kurt
+generated/pandas.Series.kurtosis,../reference/api/pandas.Series.kurtosis
+generated/pandas.Series.last,../reference/api/pandas.Series.last
+generated/pandas.Series.last_valid_index,../reference/api/pandas.Series.last_valid_index
+generated/pandas.Series.le,../reference/api/pandas.Series.le
+generated/pandas.Series.loc,../reference/api/pandas.Series.loc
+generated/pandas.Series.lt,../reference/api/pandas.Series.lt
+generated/pandas.Series.mad,../reference/api/pandas.Series.mad
+generated/pandas.Series.map,../reference/api/pandas.Series.map
+generated/pandas.Series.mask,../reference/api/pandas.Series.mask
+generated/pandas.Series.max,../reference/api/pandas.Series.max
+generated/pandas.Series.mean,../reference/api/pandas.Series.mean
+generated/pandas.Series.median,../reference/api/pandas.Series.median
+generated/pandas.Series.memory_usage,../reference/api/pandas.Series.memory_usage
+generated/pandas.Series.min,../reference/api/pandas.Series.min
+generated/pandas.Series.mode,../reference/api/pandas.Series.mode
+generated/pandas.Series.mod,../reference/api/pandas.Series.mod
+generated/pandas.Series.mul,../reference/api/pandas.Series.mul
+generated/pandas.Series.multiply,../reference/api/pandas.Series.multiply
+generated/pandas.Series.name,../reference/api/pandas.Series.name
+generated/pandas.Series.nbytes,../reference/api/pandas.Series.nbytes
+generated/pandas.Series.ndim,../reference/api/pandas.Series.ndim
+generated/pandas.Series.ne,../reference/api/pandas.Series.ne
+generated/pandas.Series.nlargest,../reference/api/pandas.Series.nlargest
+generated/pandas.Series.nonzero,../reference/api/pandas.Series.nonzero
+generated/pandas.Series.notna,../reference/api/pandas.Series.notna
+generated/pandas.Series.notnull,../reference/api/pandas.Series.notnull
+generated/pandas.Series.nsmallest,../reference/api/pandas.Series.nsmallest
+generated/pandas.Series.nunique,../reference/api/pandas.Series.nunique
+generated/pandas.Series.pct_change,../reference/api/pandas.Series.pct_change
+generated/pandas.Series.pipe,../reference/api/pandas.Series.pipe
+generated/pandas.Series.plot.area,../reference/api/pandas.Series.plot.area
+generated/pandas.Series.plot.barh,../reference/api/pandas.Series.plot.barh
+generated/pandas.Series.plot.bar,../reference/api/pandas.Series.plot.bar
+generated/pandas.Series.plot.box,../reference/api/pandas.Series.plot.box
+generated/pandas.Series.plot.density,../reference/api/pandas.Series.plot.density
+generated/pandas.Series.plot.hist,../reference/api/pandas.Series.plot.hist
+generated/pandas.Series.plot,../reference/api/pandas.Series.plot
+generated/pandas.Series.plot.kde,../reference/api/pandas.Series.plot.kde
+generated/pandas.Series.plot.line,../reference/api/pandas.Series.plot.line
+generated/pandas.Series.plot.pie,../reference/api/pandas.Series.plot.pie
+generated/pandas.Series.pop,../reference/api/pandas.Series.pop
+generated/pandas.Series.pow,../reference/api/pandas.Series.pow
+generated/pandas.Series.prod,../reference/api/pandas.Series.prod
+generated/pandas.Series.product,../reference/api/pandas.Series.product
+generated/pandas.Series.ptp,../reference/api/pandas.Series.ptp
+generated/pandas.Series.put,../reference/api/pandas.Series.put
+generated/pandas.Series.quantile,../reference/api/pandas.Series.quantile
+generated/pandas.Series.radd,../reference/api/pandas.Series.radd
+generated/pandas.Series.rank,../reference/api/pandas.Series.rank
+generated/pandas.Series.ravel,../reference/api/pandas.Series.ravel
+generated/pandas.Series.rdiv,../reference/api/pandas.Series.rdiv
+generated/pandas.Series.rdivmod,../reference/api/pandas.Series.rdivmod
+generated/pandas.Series.real,../reference/api/pandas.Series.real
+generated/pandas.Series.reindex_axis,../reference/api/pandas.Series.reindex_axis
+generated/pandas.Series.reindex,../reference/api/pandas.Series.reindex
+generated/pandas.Series.reindex_like,../reference/api/pandas.Series.reindex_like
+generated/pandas.Series.rename_axis,../reference/api/pandas.Series.rename_axis
+generated/pandas.Series.rename,../reference/api/pandas.Series.rename
+generated/pandas.Series.reorder_levels,../reference/api/pandas.Series.reorder_levels
+generated/pandas.Series.repeat,../reference/api/pandas.Series.repeat
+generated/pandas.Series.replace,../reference/api/pandas.Series.replace
+generated/pandas.Series.resample,../reference/api/pandas.Series.resample
+generated/pandas.Series.reset_index,../reference/api/pandas.Series.reset_index
+generated/pandas.Series.rfloordiv,../reference/api/pandas.Series.rfloordiv
+generated/pandas.Series.rmod,../reference/api/pandas.Series.rmod
+generated/pandas.Series.rmul,../reference/api/pandas.Series.rmul
+generated/pandas.Series.rolling,../reference/api/pandas.Series.rolling
+generated/pandas.Series.round,../reference/api/pandas.Series.round
+generated/pandas.Series.rpow,../reference/api/pandas.Series.rpow
+generated/pandas.Series.rsub,../reference/api/pandas.Series.rsub
+generated/pandas.Series.rtruediv,../reference/api/pandas.Series.rtruediv
+generated/pandas.Series.sample,../reference/api/pandas.Series.sample
+generated/pandas.Series.searchsorted,../reference/api/pandas.Series.searchsorted
+generated/pandas.Series.select,../reference/api/pandas.Series.select
+generated/pandas.Series.sem,../reference/api/pandas.Series.sem
+generated/pandas.Series.set_axis,../reference/api/pandas.Series.set_axis
+generated/pandas.Series.set_value,../reference/api/pandas.Series.set_value
+generated/pandas.Series.shape,../reference/api/pandas.Series.shape
+generated/pandas.Series.shift,../reference/api/pandas.Series.shift
+generated/pandas.Series.size,../reference/api/pandas.Series.size
+generated/pandas.Series.skew,../reference/api/pandas.Series.skew
+generated/pandas.Series.slice_shift,../reference/api/pandas.Series.slice_shift
+generated/pandas.Series.sort_index,../reference/api/pandas.Series.sort_index
+generated/pandas.Series.sort_values,../reference/api/pandas.Series.sort_values
+generated/pandas.Series.sparse.density,../reference/api/pandas.Series.sparse.density
+generated/pandas.Series.sparse.fill_value,../reference/api/pandas.Series.sparse.fill_value
+generated/pandas.Series.sparse.from_coo,../reference/api/pandas.Series.sparse.from_coo
+generated/pandas.Series.sparse.npoints,../reference/api/pandas.Series.sparse.npoints
+generated/pandas.Series.sparse.sp_values,../reference/api/pandas.Series.sparse.sp_values
+generated/pandas.Series.sparse.to_coo,../reference/api/pandas.Series.sparse.to_coo
+generated/pandas.Series.squeeze,../reference/api/pandas.Series.squeeze
+generated/pandas.Series.std,../reference/api/pandas.Series.std
+generated/pandas.Series.str.capitalize,../reference/api/pandas.Series.str.capitalize
+generated/pandas.Series.str.cat,../reference/api/pandas.Series.str.cat
+generated/pandas.Series.str.center,../reference/api/pandas.Series.str.center
+generated/pandas.Series.str.contains,../reference/api/pandas.Series.str.contains
+generated/pandas.Series.str.count,../reference/api/pandas.Series.str.count +generated/pandas.Series.str.decode,../reference/api/pandas.Series.str.decode +generated/pandas.Series.str.encode,../reference/api/pandas.Series.str.encode +generated/pandas.Series.str.endswith,../reference/api/pandas.Series.str.endswith +generated/pandas.Series.str.extractall,../reference/api/pandas.Series.str.extractall +generated/pandas.Series.str.extract,../reference/api/pandas.Series.str.extract +generated/pandas.Series.str.findall,../reference/api/pandas.Series.str.findall +generated/pandas.Series.str.find,../reference/api/pandas.Series.str.find +generated/pandas.Series.str.get_dummies,../reference/api/pandas.Series.str.get_dummies +generated/pandas.Series.str.get,../reference/api/pandas.Series.str.get +generated/pandas.Series.str,../reference/api/pandas.Series.str +generated/pandas.Series.strides,../reference/api/pandas.Series.strides +generated/pandas.Series.str.index,../reference/api/pandas.Series.str.index +generated/pandas.Series.str.isalnum,../reference/api/pandas.Series.str.isalnum +generated/pandas.Series.str.isalpha,../reference/api/pandas.Series.str.isalpha +generated/pandas.Series.str.isdecimal,../reference/api/pandas.Series.str.isdecimal +generated/pandas.Series.str.isdigit,../reference/api/pandas.Series.str.isdigit +generated/pandas.Series.str.islower,../reference/api/pandas.Series.str.islower +generated/pandas.Series.str.isnumeric,../reference/api/pandas.Series.str.isnumeric +generated/pandas.Series.str.isspace,../reference/api/pandas.Series.str.isspace +generated/pandas.Series.str.istitle,../reference/api/pandas.Series.str.istitle +generated/pandas.Series.str.isupper,../reference/api/pandas.Series.str.isupper +generated/pandas.Series.str.join,../reference/api/pandas.Series.str.join +generated/pandas.Series.str.len,../reference/api/pandas.Series.str.len +generated/pandas.Series.str.ljust,../reference/api/pandas.Series.str.ljust +generated/pandas.Series.str.lower,../reference/api/pandas.Series.str.lower +generated/pandas.Series.str.lstrip,../reference/api/pandas.Series.str.lstrip +generated/pandas.Series.str.match,../reference/api/pandas.Series.str.match +generated/pandas.Series.str.normalize,../reference/api/pandas.Series.str.normalize +generated/pandas.Series.str.pad,../reference/api/pandas.Series.str.pad +generated/pandas.Series.str.partition,../reference/api/pandas.Series.str.partition +generated/pandas.Series.str.repeat,../reference/api/pandas.Series.str.repeat +generated/pandas.Series.str.replace,../reference/api/pandas.Series.str.replace +generated/pandas.Series.str.rfind,../reference/api/pandas.Series.str.rfind +generated/pandas.Series.str.rindex,../reference/api/pandas.Series.str.rindex +generated/pandas.Series.str.rjust,../reference/api/pandas.Series.str.rjust +generated/pandas.Series.str.rpartition,../reference/api/pandas.Series.str.rpartition +generated/pandas.Series.str.rsplit,../reference/api/pandas.Series.str.rsplit +generated/pandas.Series.str.rstrip,../reference/api/pandas.Series.str.rstrip +generated/pandas.Series.str.slice,../reference/api/pandas.Series.str.slice +generated/pandas.Series.str.slice_replace,../reference/api/pandas.Series.str.slice_replace +generated/pandas.Series.str.split,../reference/api/pandas.Series.str.split +generated/pandas.Series.str.startswith,../reference/api/pandas.Series.str.startswith +generated/pandas.Series.str.strip,../reference/api/pandas.Series.str.strip +generated/pandas.Series.str.swapcase,../reference/api/pandas.Series.str.swapcase 
+generated/pandas.Series.str.title,../reference/api/pandas.Series.str.title +generated/pandas.Series.str.translate,../reference/api/pandas.Series.str.translate +generated/pandas.Series.str.upper,../reference/api/pandas.Series.str.upper +generated/pandas.Series.str.wrap,../reference/api/pandas.Series.str.wrap +generated/pandas.Series.str.zfill,../reference/api/pandas.Series.str.zfill +generated/pandas.Series.sub,../reference/api/pandas.Series.sub +generated/pandas.Series.subtract,../reference/api/pandas.Series.subtract +generated/pandas.Series.sum,../reference/api/pandas.Series.sum +generated/pandas.Series.swapaxes,../reference/api/pandas.Series.swapaxes +generated/pandas.Series.swaplevel,../reference/api/pandas.Series.swaplevel +generated/pandas.Series.tail,../reference/api/pandas.Series.tail +generated/pandas.Series.take,../reference/api/pandas.Series.take +generated/pandas.Series.T,../reference/api/pandas.Series.T +generated/pandas.Series.timetuple,../reference/api/pandas.Series.timetuple +generated/pandas.Series.to_clipboard,../reference/api/pandas.Series.to_clipboard +generated/pandas.Series.to_csv,../reference/api/pandas.Series.to_csv +generated/pandas.Series.to_dense,../reference/api/pandas.Series.to_dense +generated/pandas.Series.to_dict,../reference/api/pandas.Series.to_dict +generated/pandas.Series.to_excel,../reference/api/pandas.Series.to_excel +generated/pandas.Series.to_frame,../reference/api/pandas.Series.to_frame +generated/pandas.Series.to_hdf,../reference/api/pandas.Series.to_hdf +generated/pandas.Series.to_json,../reference/api/pandas.Series.to_json +generated/pandas.Series.to_latex,../reference/api/pandas.Series.to_latex +generated/pandas.Series.to_list,../reference/api/pandas.Series.to_list +generated/pandas.Series.tolist,../reference/api/pandas.Series.tolist +generated/pandas.Series.to_msgpack,../reference/api/pandas.Series.to_msgpack +generated/pandas.Series.to_numpy,../reference/api/pandas.Series.to_numpy +generated/pandas.Series.to_period,../reference/api/pandas.Series.to_period +generated/pandas.Series.to_pickle,../reference/api/pandas.Series.to_pickle +generated/pandas.Series.to_sparse,../reference/api/pandas.Series.to_sparse +generated/pandas.Series.to_sql,../reference/api/pandas.Series.to_sql +generated/pandas.Series.to_string,../reference/api/pandas.Series.to_string +generated/pandas.Series.to_timestamp,../reference/api/pandas.Series.to_timestamp +generated/pandas.Series.to_xarray,../reference/api/pandas.Series.to_xarray +generated/pandas.Series.transform,../reference/api/pandas.Series.transform +generated/pandas.Series.transpose,../reference/api/pandas.Series.transpose +generated/pandas.Series.truediv,../reference/api/pandas.Series.truediv +generated/pandas.Series.truncate,../reference/api/pandas.Series.truncate +generated/pandas.Series.tshift,../reference/api/pandas.Series.tshift +generated/pandas.Series.tz_convert,../reference/api/pandas.Series.tz_convert +generated/pandas.Series.tz_localize,../reference/api/pandas.Series.tz_localize +generated/pandas.Series.unique,../reference/api/pandas.Series.unique +generated/pandas.Series.unstack,../reference/api/pandas.Series.unstack +generated/pandas.Series.update,../reference/api/pandas.Series.update +generated/pandas.Series.valid,../reference/api/pandas.Series.valid +generated/pandas.Series.value_counts,../reference/api/pandas.Series.value_counts +generated/pandas.Series.values,../reference/api/pandas.Series.values +generated/pandas.Series.var,../reference/api/pandas.Series.var 
+generated/pandas.Series.view,../reference/api/pandas.Series.view +generated/pandas.Series.where,../reference/api/pandas.Series.where +generated/pandas.Series.xs,../reference/api/pandas.Series.xs +generated/pandas.set_option,../reference/api/pandas.set_option +generated/pandas.SparseDataFrame.to_coo,../reference/api/pandas.SparseDataFrame.to_coo +generated/pandas.SparseSeries.from_coo,../reference/api/pandas.SparseSeries.from_coo +generated/pandas.SparseSeries.to_coo,../reference/api/pandas.SparseSeries.to_coo +generated/pandas.test,../reference/api/pandas.test +generated/pandas.testing.assert_frame_equal,../reference/api/pandas.testing.assert_frame_equal +generated/pandas.testing.assert_index_equal,../reference/api/pandas.testing.assert_index_equal +generated/pandas.testing.assert_series_equal,../reference/api/pandas.testing.assert_series_equal +generated/pandas.Timedelta.asm8,../reference/api/pandas.Timedelta.asm8 +generated/pandas.Timedelta.ceil,../reference/api/pandas.Timedelta.ceil +generated/pandas.Timedelta.components,../reference/api/pandas.Timedelta.components +generated/pandas.Timedelta.days,../reference/api/pandas.Timedelta.days +generated/pandas.Timedelta.delta,../reference/api/pandas.Timedelta.delta +generated/pandas.Timedelta.floor,../reference/api/pandas.Timedelta.floor +generated/pandas.Timedelta.freq,../reference/api/pandas.Timedelta.freq +generated/pandas.Timedelta,../reference/api/pandas.Timedelta +generated/pandas.TimedeltaIndex.ceil,../reference/api/pandas.TimedeltaIndex.ceil +generated/pandas.TimedeltaIndex.components,../reference/api/pandas.TimedeltaIndex.components +generated/pandas.TimedeltaIndex.days,../reference/api/pandas.TimedeltaIndex.days +generated/pandas.TimedeltaIndex.floor,../reference/api/pandas.TimedeltaIndex.floor +generated/pandas.TimedeltaIndex,../reference/api/pandas.TimedeltaIndex +generated/pandas.TimedeltaIndex.inferred_freq,../reference/api/pandas.TimedeltaIndex.inferred_freq +generated/pandas.TimedeltaIndex.microseconds,../reference/api/pandas.TimedeltaIndex.microseconds +generated/pandas.TimedeltaIndex.nanoseconds,../reference/api/pandas.TimedeltaIndex.nanoseconds +generated/pandas.TimedeltaIndex.round,../reference/api/pandas.TimedeltaIndex.round +generated/pandas.TimedeltaIndex.seconds,../reference/api/pandas.TimedeltaIndex.seconds +generated/pandas.TimedeltaIndex.to_frame,../reference/api/pandas.TimedeltaIndex.to_frame +generated/pandas.TimedeltaIndex.to_pytimedelta,../reference/api/pandas.TimedeltaIndex.to_pytimedelta +generated/pandas.TimedeltaIndex.to_series,../reference/api/pandas.TimedeltaIndex.to_series +generated/pandas.Timedelta.isoformat,../reference/api/pandas.Timedelta.isoformat +generated/pandas.Timedelta.is_populated,../reference/api/pandas.Timedelta.is_populated +generated/pandas.Timedelta.max,../reference/api/pandas.Timedelta.max +generated/pandas.Timedelta.microseconds,../reference/api/pandas.Timedelta.microseconds +generated/pandas.Timedelta.min,../reference/api/pandas.Timedelta.min +generated/pandas.Timedelta.nanoseconds,../reference/api/pandas.Timedelta.nanoseconds +generated/pandas.timedelta_range,../reference/api/pandas.timedelta_range +generated/pandas.Timedelta.resolution,../reference/api/pandas.Timedelta.resolution +generated/pandas.Timedelta.round,../reference/api/pandas.Timedelta.round +generated/pandas.Timedelta.seconds,../reference/api/pandas.Timedelta.seconds +generated/pandas.Timedelta.to_pytimedelta,../reference/api/pandas.Timedelta.to_pytimedelta 
+generated/pandas.Timedelta.total_seconds,../reference/api/pandas.Timedelta.total_seconds +generated/pandas.Timedelta.to_timedelta64,../reference/api/pandas.Timedelta.to_timedelta64 +generated/pandas.Timedelta.value,../reference/api/pandas.Timedelta.value +generated/pandas.Timedelta.view,../reference/api/pandas.Timedelta.view +generated/pandas.Timestamp.asm8,../reference/api/pandas.Timestamp.asm8 +generated/pandas.Timestamp.astimezone,../reference/api/pandas.Timestamp.astimezone +generated/pandas.Timestamp.ceil,../reference/api/pandas.Timestamp.ceil +generated/pandas.Timestamp.combine,../reference/api/pandas.Timestamp.combine +generated/pandas.Timestamp.ctime,../reference/api/pandas.Timestamp.ctime +generated/pandas.Timestamp.date,../reference/api/pandas.Timestamp.date +generated/pandas.Timestamp.day,../reference/api/pandas.Timestamp.day +generated/pandas.Timestamp.day_name,../reference/api/pandas.Timestamp.day_name +generated/pandas.Timestamp.dayofweek,../reference/api/pandas.Timestamp.dayofweek +generated/pandas.Timestamp.dayofyear,../reference/api/pandas.Timestamp.dayofyear +generated/pandas.Timestamp.days_in_month,../reference/api/pandas.Timestamp.days_in_month +generated/pandas.Timestamp.daysinmonth,../reference/api/pandas.Timestamp.daysinmonth +generated/pandas.Timestamp.dst,../reference/api/pandas.Timestamp.dst +generated/pandas.Timestamp.floor,../reference/api/pandas.Timestamp.floor +generated/pandas.Timestamp.fold,../reference/api/pandas.Timestamp.fold +generated/pandas.Timestamp.freq,../reference/api/pandas.Timestamp.freq +generated/pandas.Timestamp.freqstr,../reference/api/pandas.Timestamp.freqstr +generated/pandas.Timestamp.fromisoformat,../reference/api/pandas.Timestamp.fromisoformat +generated/pandas.Timestamp.fromordinal,../reference/api/pandas.Timestamp.fromordinal +generated/pandas.Timestamp.fromtimestamp,../reference/api/pandas.Timestamp.fromtimestamp +generated/pandas.Timestamp.hour,../reference/api/pandas.Timestamp.hour +generated/pandas.Timestamp,../reference/api/pandas.Timestamp +generated/pandas.Timestamp.is_leap_year,../reference/api/pandas.Timestamp.is_leap_year +generated/pandas.Timestamp.is_month_end,../reference/api/pandas.Timestamp.is_month_end +generated/pandas.Timestamp.is_month_start,../reference/api/pandas.Timestamp.is_month_start +generated/pandas.Timestamp.isocalendar,../reference/api/pandas.Timestamp.isocalendar +generated/pandas.Timestamp.isoformat,../reference/api/pandas.Timestamp.isoformat +generated/pandas.Timestamp.isoweekday,../reference/api/pandas.Timestamp.isoweekday +generated/pandas.Timestamp.is_quarter_end,../reference/api/pandas.Timestamp.is_quarter_end +generated/pandas.Timestamp.is_quarter_start,../reference/api/pandas.Timestamp.is_quarter_start +generated/pandas.Timestamp.is_year_end,../reference/api/pandas.Timestamp.is_year_end +generated/pandas.Timestamp.is_year_start,../reference/api/pandas.Timestamp.is_year_start +generated/pandas.Timestamp.max,../reference/api/pandas.Timestamp.max +generated/pandas.Timestamp.microsecond,../reference/api/pandas.Timestamp.microsecond +generated/pandas.Timestamp.min,../reference/api/pandas.Timestamp.min +generated/pandas.Timestamp.minute,../reference/api/pandas.Timestamp.minute +generated/pandas.Timestamp.month,../reference/api/pandas.Timestamp.month +generated/pandas.Timestamp.month_name,../reference/api/pandas.Timestamp.month_name +generated/pandas.Timestamp.nanosecond,../reference/api/pandas.Timestamp.nanosecond +generated/pandas.Timestamp.normalize,../reference/api/pandas.Timestamp.normalize 
+generated/pandas.Timestamp.now,../reference/api/pandas.Timestamp.now
+generated/pandas.Timestamp.quarter,../reference/api/pandas.Timestamp.quarter
+generated/pandas.Timestamp.replace,../reference/api/pandas.Timestamp.replace
+generated/pandas.Timestamp.resolution,../reference/api/pandas.Timestamp.resolution
+generated/pandas.Timestamp.round,../reference/api/pandas.Timestamp.round
+generated/pandas.Timestamp.second,../reference/api/pandas.Timestamp.second
+generated/pandas.Timestamp.strftime,../reference/api/pandas.Timestamp.strftime
+generated/pandas.Timestamp.strptime,../reference/api/pandas.Timestamp.strptime
+generated/pandas.Timestamp.time,../reference/api/pandas.Timestamp.time
+generated/pandas.Timestamp.timestamp,../reference/api/pandas.Timestamp.timestamp
+generated/pandas.Timestamp.timetuple,../reference/api/pandas.Timestamp.timetuple
+generated/pandas.Timestamp.timetz,../reference/api/pandas.Timestamp.timetz
+generated/pandas.Timestamp.to_datetime64,../reference/api/pandas.Timestamp.to_datetime64
+generated/pandas.Timestamp.today,../reference/api/pandas.Timestamp.today
+generated/pandas.Timestamp.to_julian_date,../reference/api/pandas.Timestamp.to_julian_date
+generated/pandas.Timestamp.toordinal,../reference/api/pandas.Timestamp.toordinal
+generated/pandas.Timestamp.to_period,../reference/api/pandas.Timestamp.to_period
+generated/pandas.Timestamp.to_pydatetime,../reference/api/pandas.Timestamp.to_pydatetime
+generated/pandas.Timestamp.tz_convert,../reference/api/pandas.Timestamp.tz_convert
+generated/pandas.Timestamp.tz,../reference/api/pandas.Timestamp.tz
+generated/pandas.Timestamp.tzinfo,../reference/api/pandas.Timestamp.tzinfo
+generated/pandas.Timestamp.tz_localize,../reference/api/pandas.Timestamp.tz_localize
+generated/pandas.Timestamp.tzname,../reference/api/pandas.Timestamp.tzname
+generated/pandas.Timestamp.utcfromtimestamp,../reference/api/pandas.Timestamp.utcfromtimestamp
+generated/pandas.Timestamp.utcnow,../reference/api/pandas.Timestamp.utcnow
+generated/pandas.Timestamp.utcoffset,../reference/api/pandas.Timestamp.utcoffset
+generated/pandas.Timestamp.utctimetuple,../reference/api/pandas.Timestamp.utctimetuple
+generated/pandas.Timestamp.value,../reference/api/pandas.Timestamp.value
+generated/pandas.Timestamp.weekday,../reference/api/pandas.Timestamp.weekday
+generated/pandas.Timestamp.weekday_name,../reference/api/pandas.Timestamp.weekday_name
+generated/pandas.Timestamp.week,../reference/api/pandas.Timestamp.week
+generated/pandas.Timestamp.weekofyear,../reference/api/pandas.Timestamp.weekofyear
+generated/pandas.Timestamp.year,../reference/api/pandas.Timestamp.year
+generated/pandas.to_datetime,../reference/api/pandas.to_datetime
+generated/pandas.to_numeric,../reference/api/pandas.to_numeric
+generated/pandas.to_timedelta,../reference/api/pandas.to_timedelta
+generated/pandas.tseries.frequencies.to_offset,../reference/api/pandas.tseries.frequencies.to_offset
+generated/pandas.unique,../reference/api/pandas.unique
+generated/pandas.util.hash_array,../reference/api/pandas.util.hash_array
+generated/pandas.util.hash_pandas_object,../reference/api/pandas.util.hash_pandas_object
+generated/pandas.wide_to_long,../reference/api/pandas.wide_to_long
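The table above is a plain two-column CSV: old page path, new page path. The commit itself does not show how the table is consumed, but one plausible way a docs build could use such a mapping is to emit a static HTML stub at each old path that forwards to the new page. The `write_redirects` helper below is purely hypothetical and not part of pandas:

```python
# Hypothetical sketch (not part of this commit): expand a redirects CSV
# like the one above into HTML stubs, so old ``generated/...`` URLs
# forward to the new ``reference/api/...`` pages.
import csv
from pathlib import Path

STUB = ('<html><head><meta http-equiv="refresh" '
        'content="0; url={target}.html"></head></html>')

def write_redirects(csv_path: str, build_dir: str) -> None:
    with open(csv_path, newline="") as f:
        for old, new in csv.reader(f):
            out = Path(build_dir) / (old + ".html")
            out.parent.mkdir(parents=True, exist_ok=True)
            # The second CSV column is relative to the stub's own
            # directory, so it can be used as the refresh target as-is.
            out.write_text(STUB.format(target=new))
```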
diff --git a/doc/source/contributing.rst b/doc/source/development/contributing.rst
similarity index 99%
rename from doc/source/contributing.rst
rename to doc/source/development/contributing.rst
index a68e5c70087e9..c9d6845107dfc 100644
--- a/doc/source/contributing.rst
+++ b/doc/source/development/contributing.rst
@@ -698,7 +698,7 @@ A pull-request will be considered for merging when you have an all 'green'
 build then you will get a red 'X', where you can click through to see the
 individual failed tests. This is an example of a green build.
 
-.. image:: _static/ci.png
+.. image:: ../_static/ci.png
 
 .. note::
diff --git a/doc/source/contributing_docstring.rst b/doc/source/development/contributing_docstring.rst
similarity index 100%
rename from doc/source/contributing_docstring.rst
rename to doc/source/development/contributing_docstring.rst
diff --git a/doc/source/developer.rst b/doc/source/development/developer.rst
similarity index 100%
rename from doc/source/developer.rst
rename to doc/source/development/developer.rst
diff --git a/doc/source/extending.rst b/doc/source/development/extending.rst
similarity index 100%
rename from doc/source/extending.rst
rename to doc/source/development/extending.rst
diff --git a/doc/source/development/index.rst b/doc/source/development/index.rst
new file mode 100644
index 0000000000000..d67a6c3a2ca04
--- /dev/null
+++ b/doc/source/development/index.rst
@@ -0,0 +1,15 @@
+{{ header }}
+
+.. _development:
+
+===========
+Development
+===========
+
+.. toctree::
+   :maxdepth: 2
+
+   contributing
+   internals
+   extending
+   developer
diff --git a/doc/source/internals.rst b/doc/source/development/internals.rst
similarity index 100%
rename from doc/source/internals.rst
rename to doc/source/development/internals.rst
diff --git a/doc/source/10min.rst b/doc/source/getting_started/10min.rst
similarity index 100%
rename from doc/source/10min.rst
rename to doc/source/getting_started/10min.rst
diff --git a/doc/source/basics.rst b/doc/source/getting_started/basics.rst
similarity index 100%
rename from doc/source/basics.rst
rename to doc/source/getting_started/basics.rst
diff --git a/doc/source/comparison_with_r.rst b/doc/source/getting_started/comparison/comparison_with_r.rst
similarity index 100%
rename from doc/source/comparison_with_r.rst
rename to doc/source/getting_started/comparison/comparison_with_r.rst
diff --git a/doc/source/comparison_with_sas.rst b/doc/source/getting_started/comparison/comparison_with_sas.rst
similarity index 100%
rename from doc/source/comparison_with_sas.rst
rename to doc/source/getting_started/comparison/comparison_with_sas.rst
diff --git a/doc/source/comparison_with_sql.rst b/doc/source/getting_started/comparison/comparison_with_sql.rst
similarity index 100%
rename from doc/source/comparison_with_sql.rst
rename to doc/source/getting_started/comparison/comparison_with_sql.rst
diff --git a/doc/source/comparison_with_stata.rst b/doc/source/getting_started/comparison/comparison_with_stata.rst
similarity index 100%
rename from doc/source/comparison_with_stata.rst
rename to doc/source/getting_started/comparison/comparison_with_stata.rst
diff --git a/doc/source/getting_started/comparison/index.rst b/doc/source/getting_started/comparison/index.rst
new file mode 100644
index 0000000000000..998706ce0c639
--- /dev/null
+++ b/doc/source/getting_started/comparison/index.rst
@@ -0,0 +1,15 @@
+{{ header }}
+
+.. _comparison:
+
+===========================
+Comparison with other tools
+===========================
+
+.. toctree::
+   :maxdepth: 2
+
+   comparison_with_r
+   comparison_with_sql
+   comparison_with_sas
+   comparison_with_stata
diff --git a/doc/source/dsintro.rst b/doc/source/getting_started/dsintro.rst
similarity index 100%
rename from doc/source/dsintro.rst
rename to doc/source/getting_started/dsintro.rst
diff --git a/doc/source/getting_started/index.rst b/doc/source/getting_started/index.rst
new file mode 100644
index 0000000000000..4c5d26461a667
--- /dev/null
+++ b/doc/source/getting_started/index.rst
@@ -0,0 +1,17 @@
+{{ header }}
+
+.. _getting_started:
+
+===============
+Getting started
+===============
+
+.. toctree::
+   :maxdepth: 2
+
+   overview
+   10min
+   basics
+   dsintro
+   comparison/index
+   tutorials
diff --git a/doc/source/overview.rst b/doc/source/getting_started/overview.rst
similarity index 50%
rename from doc/source/overview.rst
rename to doc/source/getting_started/overview.rst
index b98e2d4b9963c..b531f686951fc 100644
--- a/doc/source/overview.rst
+++ b/doc/source/getting_started/overview.rst
@@ -6,25 +6,80 @@ Package overview
 ****************
 
-:mod:`pandas` is an open source, BSD-licensed library providing high-performance,
-easy-to-use data structures and data analysis tools for the `Python <https://www.python.org>`__
-programming language.
-
-:mod:`pandas` consists of the following elements:
-
-* A set of labeled array data structures, the primary of which are
-  Series and DataFrame.
-* Index objects enabling both simple axis indexing and multi-level /
-  hierarchical axis indexing.
-* An integrated group by engine for aggregating and transforming data sets.
-* Date range generation (date_range) and custom date offsets enabling the
-  implementation of customized frequencies.
-* Input/Output tools: loading tabular data from flat files (CSV, delimited,
-  Excel 2003), and saving and loading pandas objects from the fast and
-  efficient PyTables/HDF5 format.
-* Memory-efficient "sparse" versions of the standard data structures for storing
-  data that is mostly missing or mostly constant (some fixed value).
-* Moving window statistics (rolling mean, rolling standard deviation, etc.).
+**pandas** is a `Python <https://www.python.org>`__ package providing fast,
+flexible, and expressive data structures designed to make working with
+"relational" or "labeled" data both easy and intuitive. It aims to be the
+fundamental high-level building block for doing practical, **real world** data
+analysis in Python. Additionally, it has the broader goal of becoming **the
+most powerful and flexible open source data analysis / manipulation tool
+available in any language**. It is already well on its way toward this goal.
+
+pandas is well suited for many different kinds of data:
+
+ - Tabular data with heterogeneously-typed columns, as in an SQL table or
+   Excel spreadsheet
+ - Ordered and unordered (not necessarily fixed-frequency) time series data.
+ - Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
+   column labels
+ - Any other form of observational / statistical data sets. The data actually
+   need not be labeled at all to be placed into a pandas data structure
+
+The two primary data structures of pandas, :class:`Series` (1-dimensional)
+and :class:`DataFrame` (2-dimensional), handle the vast majority of typical use
+cases in finance, statistics, social science, and many areas of
+engineering. For R users, :class:`DataFrame` provides everything that R's
+``data.frame`` provides and much more. pandas is built on top of `NumPy
+<https://www.numpy.org>`__ and is intended to integrate well within a scientific
+computing environment with many other 3rd party libraries.
+
+Here are just a few of the things that pandas does well:
+
+ - Easy handling of **missing data** (represented as NaN) in floating point as
+   well as non-floating point data
+ - Size mutability: columns can be **inserted and deleted** from DataFrame and
+   higher dimensional objects
+ - Automatic and explicit **data alignment**: objects can be explicitly
+   aligned to a set of labels, or the user can simply ignore the labels and
+   let `Series`, `DataFrame`, etc. automatically align the data for you in
+   computations
+ - Powerful, flexible **group by** functionality to perform
+   split-apply-combine operations on data sets, for both aggregating and
+   transforming data
+ - Easy conversion of ragged, differently-indexed data in other
+   Python and NumPy data structures into DataFrame objects
+ - Intelligent label-based **slicing**, **fancy indexing**, and **subsetting**
+   of large data sets
+ - Intuitive **merging** and **joining** data sets
+ - Flexible **reshaping** and pivoting of data sets
+ - **Hierarchical** labeling of axes (possible to have multiple labels per
+   tick)
+ - Robust IO tools for loading data from **flat files** (CSV and delimited),
+   Excel files, databases, and saving / loading data from the ultrafast **HDF5
+   format**
+ - **Time series**-specific functionality: date range generation and frequency
+   conversion, moving window statistics, moving window linear regressions,
+   date shifting and lagging, etc.
+
+Many of these principles are here to address the shortcomings frequently
+experienced using other languages / scientific research environments. For data
+scientists, working with data is typically divided into multiple stages:
+munging and cleaning data, analyzing / modeling it, then organizing the results
+of the analysis into a form suitable for plotting or tabular display. pandas
+is the ideal tool for all of these tasks.
+
+Some other notes:
+
+ - pandas is **fast**. Many of the low-level algorithmic bits have been
+   extensively tweaked in `Cython <https://cython.org>`__ code. However, as with
+   anything else, generalization usually sacrifices performance. So if you focus
+   on one feature for your application, you may be able to create a faster
+   specialized tool.
+
+ - pandas is a dependency of `statsmodels
+   <https://www.statsmodels.org>`__, making it an important part of the
+   statistical computing ecosystem in Python.
+
+ - pandas has been used extensively in production in financial applications.
 
 Data Structures
 ---------------
@@ -119,5 +174,5 @@ The information about current institutional partners can be found on `pandas web
 License
 -------
 
-.. literalinclude:: ../../LICENSE
+.. literalinclude:: ../../../LICENSE
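To ground the overview prose being moved here, a minimal sketch of the two primary structures plus two of the accessor methods whose pages are redirected earlier in this diff. The values and column names are invented for illustration; this snippet is not part of the commit:

```python
import pandas as pd

# Series: a 1-dimensional labeled array (missing data shown as NaN).
s = pd.Series([1.5, 2.0, None], index=["a", "b", "c"])

# DataFrame: a 2-dimensional, size-mutable table with labeled axes.
df = pd.DataFrame({
    "when": pd.to_datetime(["2019-01-31", "2019-02-28"]),
    "name": ["alpha", "beta"],
})

# The .dt and .str accessors from the redirect table above:
print(df["when"].dt.is_month_end)  # True, True
print(df["name"].str.upper())      # ALPHA, BETA
```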
diff --git a/doc/source/tutorials.rst b/doc/source/getting_started/tutorials.rst
similarity index 100%
rename from doc/source/tutorials.rst
rename to doc/source/getting_started/tutorials.rst
diff --git a/doc/source/index.rst.template b/doc/source/index.rst.template
index b85150c3444b7..d04e9194e71dc 100644
--- a/doc/source/index.rst.template
+++ b/doc/source/index.rst.template
@@ -1,168 +1,54 @@
 .. pandas documentation master file, created by
 
+.. module:: pandas
+
 *********************************************
 pandas: powerful Python data analysis toolkit
 *********************************************
 
-`PDF Version `__
-
-`Zipped HTML `__
-
-.. module:: pandas
-
 **Date**: |today| **Version**: |version|
 
-**Binary Installers:** https://pypi.org/project/pandas
-
-**Source Repository:** https://github.com/pandas-dev/pandas
-
-**Issues & Ideas:** https://github.com/pandas-dev/pandas/issues
-
-**Q&A Support:** https://stackoverflow.com/questions/tagged/pandas
-
-**Developer Mailing List:** https://groups.google.com/forum/#!forum/pydata
-
-**pandas** is a `Python <https://www.python.org>`__ package providing fast,
-flexible, and expressive data structures designed to make working with
-"relational" or "labeled" data both easy and intuitive. It aims to be the
-fundamental high-level building block for doing practical, **real world** data
-analysis in Python. Additionally, it has the broader goal of becoming **the
-most powerful and flexible open source data analysis / manipulation tool
-available in any language**. It is already well on its way toward this goal.
-
-pandas is well suited for many different kinds of data:
-
- - Tabular data with heterogeneously-typed columns, as in an SQL table or
-   Excel spreadsheet
- - Ordered and unordered (not necessarily fixed-frequency) time series data.
- - Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
-   column labels
- - Any other form of observational / statistical data sets. The data actually
-   need not be labeled at all to be placed into a pandas data structure
-
-The two primary data structures of pandas, :class:`Series` (1-dimensional)
-and :class:`DataFrame` (2-dimensional), handle the vast majority of typical use
-cases in finance, statistics, social science, and many areas of
-engineering. For R users, :class:`DataFrame` provides everything that R's
-``data.frame`` provides and much more. pandas is built on top of `NumPy
-<https://www.numpy.org>`__ and is intended to integrate well within a scientific
-computing environment with many other 3rd party libraries.
-
-Here are just a few of the things that pandas does well:
-
- - Easy handling of **missing data** (represented as NaN) in floating point as
-   well as non-floating point data
- - Size mutability: columns can be **inserted and deleted** from DataFrame and
-   higher dimensional objects
- - Automatic and explicit **data alignment**: objects can be explicitly
-   aligned to a set of labels, or the user can simply ignore the labels and
-   let `Series`, `DataFrame`, etc. automatically align the data for you in
-   computations
- - Powerful, flexible **group by** functionality to perform
-   split-apply-combine operations on data sets, for both aggregating and
-   transforming data
- - Make it **easy to convert** ragged, differently-indexed data in other
-   Python and NumPy data structures into DataFrame objects
- - Intelligent label-based **slicing**, **fancy indexing**, and **subsetting**
-   of large data sets
- - Intuitive **merging** and **joining** data sets
- - Flexible **reshaping** and pivoting of data sets
- - **Hierarchical** labeling of axes (possible to have multiple labels per
-   tick)
- - Robust IO tools for loading data from **flat files** (CSV and delimited),
-   Excel files, databases, and saving / loading data from the ultrafast **HDF5
-   format**
- - **Time series**-specific functionality: date range generation and frequency
-   conversion, moving window statistics, moving window linear regressions,
-   date shifting and lagging, etc.
-
-Many of these principles are here to address the shortcomings frequently
-experienced using other languages / scientific research environments. For data
-scientists, working with data is typically divided into multiple stages:
-munging and cleaning data, analyzing / modeling it, then organizing the results
-of the analysis into a form suitable for plotting or tabular display. pandas
-is the ideal tool for all of these tasks.
-
-Some other notes
-
- - pandas is **fast**. Many of the low-level algorithmic bits have been
-   extensively tweaked in `Cython <https://cython.org>`__ code. However, as with
-   anything else generalization usually sacrifices performance. So if you focus
-   on one feature for your application you may be able to create a faster
-   specialized tool.
-
- - pandas is a dependency of `statsmodels
-   <https://www.statsmodels.org>`__, making it an important part of the
-   statistical computing ecosystem in Python.
-
- - pandas has been used extensively in production in financial applications.
-
-.. note::
+**Download documentation**: `PDF Version `__ | `Zipped HTML `__
 
-   This documentation assumes general familiarity with NumPy. If you haven't
-   used NumPy much or at all, do invest some time in `learning about NumPy
-   `__ first.
+**Useful links**:
+`Binary Installers <https://pypi.org/project/pandas>`__ |
+`Source Repository <https://github.com/pandas-dev/pandas>`__ |
+`Issues & Ideas <https://github.com/pandas-dev/pandas/issues>`__ |
+`Q&A Support <https://stackoverflow.com/questions/tagged/pandas>`__ |
+`Mailing List <https://groups.google.com/forum/#!forum/pydata>`__
 
-See the package overview for more detail about what's in the library.
+:mod:`pandas` is an open source, BSD-licensed library providing high-performance,
+easy-to-use data structures and data analysis tools for the `Python <https://www.python.org>`__
+programming language.
+See the :ref:`overview` for more detail about what's in the library.
 
 {% if single_doc and single_doc.endswith('.rst') -%}
 .. toctree::
-    :maxdepth: 4
+    :maxdepth: 2
 
     {{ single_doc[:-4] }}
 {% elif single_doc %}
 .. autosummary::
-    :toctree: api/generated/
+    :toctree: reference/api/
 
     {{ single_doc }}
 {% else -%}
 .. toctree::
-    :maxdepth: 4
+    :maxdepth: 2
 {% endif %}
 
 {% if not single_doc -%}
-    What's New
+    What's New in 0.25.0
     install
-    contributing
-    overview
-    10min
-    tutorials
-    cookbook
-    dsintro
-    basics
-    text
-    options
-    indexing
-    advanced
-    computation
-    missing_data
-    groupby
-    merging
-    reshaping
-    timeseries
-    timedeltas
-    categorical
-    integer_na
-    visualization
-    style
-    io
-    enhancingperf
-    sparse
-    gotchas
-    r_interface
+    getting_started/index
+    user_guide/index
     ecosystem
-    comparison_with_r
-    comparison_with_sql
-    comparison_with_sas
-    comparison_with_stata
 {% endif -%}
 
 {% if include_api -%}
-    api/index
+    reference/index
 {% endif -%}
 {% if not single_doc -%}
-    developer
-    internals
-    extending
+    development/index
     whatsnew/index
 {% endif -%}
diff --git a/doc/source/r_interface.rst b/doc/source/r_interface.rst
deleted file mode 100644
index 9839bba4884d4..0000000000000
--- a/doc/source/r_interface.rst
+++ /dev/null
@@ -1,94 +0,0 @@
-.. _rpy:
-
-{{ header }}
-
-******************
-rpy2 / R interface
-******************
-
-.. warning::
-
-   Up to pandas 0.19, a ``pandas.rpy`` module existed with functionality to
-   convert between pandas and ``rpy2`` objects. This functionality now lives in
-   the `rpy2 `__ project itself.
-   See the `updating section `__
-   of the previous documentation for a guide to port your code from the
-   removed ``pandas.rpy`` to ``rpy2`` functions.
-
-
-`rpy2 `__ is an interface to R running embedded in a Python process, and also includes functionality to deal with pandas DataFrames.
-Converting data frames back and forth between rpy2 and pandas should be largely
-automated (no need to convert explicitly, it will be done on the fly in most
-rpy2 functions).
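The page being deleted here points users at rpy2's own conversion layer. As a hedged sketch of what that replacement workflow looks like on the rpy2 side (this assumes an R installation plus rpy2 >= 3.0; for the rpy2 2.x series the ``pandas2ri.activate()`` examples shown in the deleted page below still apply):

```python
# Hedged sketch of pandas <-> R conversion via rpy2's converter
# (rpy2 >= 3.0 API; requires R to be installed). Not pandas API.
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

with localconverter(ro.default_converter + pandas2ri.converter):
    r_df = ro.conversion.py2rpy(df)         # pandas DataFrame -> R data.frame
    roundtrip = ro.conversion.rpy2py(r_df)  # R data.frame -> pandas DataFrame
```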
-To convert explicitly, the functions are ``pandas2ri.py2ri()`` and -``pandas2ri.ri2py()``. - - -See also the documentation of the `rpy2 `__ project: https://rpy2.readthedocs.io. - -In the remainder of this page, a few examples of explicit conversion is given. The pandas conversion of rpy2 needs first to be activated: - -.. ipython:: - :verbatim: - - In [1]: from rpy2.robjects import pandas2ri - ...: pandas2ri.activate() - -Transferring R data sets into Python ------------------------------------- - -Once the pandas conversion is activated (``pandas2ri.activate()``), many conversions -of R to pandas objects will be done automatically. For example, to obtain the 'iris' dataset as a pandas DataFrame: - -.. ipython:: - :verbatim: - - In [2]: from rpy2.robjects import r - - In [3]: r.data('iris') - - In [4]: r['iris'].head() - Out[4]: - Sepal.Length Sepal.Width Petal.Length Petal.Width Species - 0 5.1 3.5 1.4 0.2 setosa - 1 4.9 3.0 1.4 0.2 setosa - 2 4.7 3.2 1.3 0.2 setosa - 3 4.6 3.1 1.5 0.2 setosa - 4 5.0 3.6 1.4 0.2 setosa - -If the pandas conversion was not activated, the above could also be accomplished -by explicitly converting it with the ``pandas2ri.ri2py`` function -(``pandas2ri.ri2py(r['iris'])``). - -Converting DataFrames into R objects ------------------------------------- - -The ``pandas2ri.py2ri`` function support the reverse operation to convert -DataFrames into the equivalent R object (that is, **data.frame**): - -.. ipython:: - :verbatim: - - In [5]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}, - ...: index=["one", "two", "three"]) - - In [6]: r_dataframe = pandas2ri.py2ri(df) - - In [7]: print(type(r_dataframe)) - Out[7]: - - In [8]: print(r_dataframe) - Out[8]: - A B C - one 1 4 7 - two 2 5 8 - three 3 6 9 - - -The DataFrame's index is stored as the ``rownames`` attribute of the -data.frame instance. - - -.. - Calling R functions with pandas objects - High-level interface to R estimators diff --git a/doc/source/api/arrays.rst b/doc/source/reference/arrays.rst similarity index 92% rename from doc/source/api/arrays.rst rename to doc/source/reference/arrays.rst index 5ecc5181af22c..1dc74ad83b7e6 100644 --- a/doc/source/api/arrays.rst +++ b/doc/source/reference/arrays.rst @@ -31,7 +31,7 @@ The top-level :meth:`array` method can be used to create a new array, which may stored in a :class:`Series`, :class:`Index`, or as a column in a :class:`DataFrame`. .. autosummary:: - :toctree: generated/ + :toctree: api/ array @@ -48,14 +48,14 @@ or timezone-aware values. scalar type for timezone-naive or timezone-aware datetime data. .. autosummary:: - :toctree: generated/ + :toctree: api/ Timestamp Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Timestamp.asm8 Timestamp.day @@ -91,7 +91,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Timestamp.astimezone Timestamp.ceil @@ -142,7 +142,7 @@ is used. If the data are tz-aware, then every value in the array must have the same timezone. .. autosummary:: - :toctree: generated/ + :toctree: api/ arrays.DatetimeArray DatetimeTZDtype @@ -156,14 +156,14 @@ NumPy can natively represent timedeltas. Pandas provides :class:`Timedelta` for symmetry with :class:`Timestamp`. .. autosummary:: - :toctree: generated/ + :toctree: api/ Timedelta Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Timedelta.asm8 Timedelta.components @@ -183,7 +183,7 @@ Properties Methods ~~~~~~~ .. 
autosummary:: - :toctree: generated/ + :toctree: api/ Timedelta.ceil Timedelta.floor @@ -196,7 +196,7 @@ Methods A collection of timedeltas may be stored in a :class:`TimedeltaArray`. .. autosummary:: - :toctree: generated/ + :toctree: api/ arrays.TimedeltaArray @@ -210,14 +210,14 @@ Pandas represents spans of times as :class:`Period` objects. Period ------ .. autosummary:: - :toctree: generated/ + :toctree: api/ Period Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Period.day Period.dayofweek @@ -244,7 +244,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Period.asfreq Period.now @@ -255,7 +255,7 @@ A collection of timedeltas may be stored in a :class:`arrays.PeriodArray`. Every period in a ``PeriodArray`` must have the same ``freq``. .. autosummary:: - :toctree: generated/ + :toctree: api/ arrays.DatetimeArray PeriodDtype @@ -268,14 +268,14 @@ Interval Data Arbitrary intervals can be represented as :class:`Interval` objects. .. autosummary:: - :toctree: generated/ + :toctree: api/ Interval Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Interval.closed Interval.closed_left @@ -288,12 +288,12 @@ Properties Interval.overlaps Interval.right -A collection of intervals may be stored in an :class:`IntervalArray`. +A collection of intervals may be stored in an :class:`arrays.IntervalArray`. .. autosummary:: - :toctree: generated/ + :toctree: api/ - IntervalArray + arrays.IntervalArray IntervalDtype .. _api.arrays.integer_na: @@ -305,7 +305,7 @@ Nullable Integer Pandas provides this through :class:`arrays.IntegerArray`. .. autosummary:: - :toctree: generated/ + :toctree: api/ arrays.IntegerArray Int8Dtype @@ -327,13 +327,13 @@ limited, fixed set of values. The dtype of a ``Categorical`` can be described by a :class:`pandas.api.types.CategoricalDtype`. .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/class_without_autosummary.rst CategoricalDtype .. autosummary:: - :toctree: generated/ + :toctree: api/ CategoricalDtype.categories CategoricalDtype.ordered @@ -341,7 +341,7 @@ a :class:`pandas.api.types.CategoricalDtype`. Categorical data can be stored in a :class:`pandas.Categorical` .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/class_without_autosummary.rst Categorical @@ -350,14 +350,14 @@ The alternative :meth:`Categorical.from_codes` constructor can be used when you have the categories and integer codes already: .. autosummary:: - :toctree: generated/ + :toctree: api/ Categorical.from_codes The dtype information is available on the ``Categorical`` .. autosummary:: - :toctree: generated/ + :toctree: api/ Categorical.dtype Categorical.categories @@ -368,7 +368,7 @@ The dtype information is available on the ``Categorical`` the Categorical back to a NumPy array, so categories and order information is not preserved! .. autosummary:: - :toctree: generated/ + :toctree: api/ Categorical.__array__ @@ -391,7 +391,7 @@ Data where a single value is repeated many times (e.g. ``0`` or ``NaN``) may be stored efficiently as a :class:`SparseArray`. .. 
autosummary:: - :toctree: generated/ + :toctree: api/ SparseArray SparseDtype diff --git a/doc/source/api/extensions.rst b/doc/source/reference/extensions.rst similarity index 95% rename from doc/source/api/extensions.rst rename to doc/source/reference/extensions.rst index 3972354ff9651..6146e34fab274 100644 --- a/doc/source/api/extensions.rst +++ b/doc/source/reference/extensions.rst @@ -11,7 +11,7 @@ These are primarily intended for library authors looking to extend pandas objects. .. autosummary:: - :toctree: generated/ + :toctree: api/ api.extensions.register_extension_dtype api.extensions.register_dataframe_accessor diff --git a/doc/source/api/frame.rst b/doc/source/reference/frame.rst similarity index 93% rename from doc/source/api/frame.rst rename to doc/source/reference/frame.rst index de16d59fe7c40..568acd5207bd1 100644 --- a/doc/source/api/frame.rst +++ b/doc/source/reference/frame.rst @@ -10,7 +10,7 @@ DataFrame Constructor ~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ DataFrame @@ -19,13 +19,13 @@ Attributes and underlying data **Axes** .. autosummary:: - :toctree: generated/ + :toctree: api/ DataFrame.index DataFrame.columns .. autosummary:: - :toctree: generated/ + :toctree: api/ DataFrame.dtypes DataFrame.ftypes @@ -45,7 +45,7 @@ Attributes and underlying data Conversion ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ DataFrame.astype DataFrame.convert_objects @@ -58,7 +58,7 @@ Conversion Indexing, iteration ~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ DataFrame.head DataFrame.at @@ -88,7 +88,7 @@ For more information on ``.at``, ``.iat``, ``.loc``, and Binary operator functions ~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ DataFrame.add DataFrame.sub @@ -119,7 +119,7 @@ Binary operator functions Function application, GroupBy & Window ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ DataFrame.apply DataFrame.applymap @@ -137,7 +137,7 @@ Function application, GroupBy & Window Computations / Descriptive Stats ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ DataFrame.abs DataFrame.all @@ -181,7 +181,7 @@ Computations / Descriptive Stats Reindexing / Selection / Label manipulation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ DataFrame.add_prefix DataFrame.add_suffix @@ -217,7 +217,7 @@ Reindexing / Selection / Label manipulation Missing data handling ~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ DataFrame.dropna DataFrame.fillna @@ -227,7 +227,7 @@ Missing data handling Reshaping, sorting, transposing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ DataFrame.droplevel DataFrame.pivot @@ -251,7 +251,7 @@ Reshaping, sorting, transposing Combining / joining / merging ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ DataFrame.append DataFrame.assign @@ -262,7 +262,7 @@ Combining / joining / merging Time series-related ~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ DataFrame.asfreq DataFrame.asof @@ -285,13 +285,13 @@ Plotting specific plotting methods of the form ``DataFrame.plot.``. .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/accessor_callable.rst DataFrame.plot .. 
autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/accessor_method.rst DataFrame.plot.area @@ -307,7 +307,7 @@ specific plotting methods of the form ``DataFrame.plot.``. DataFrame.plot.scatter .. autosummary:: - :toctree: generated/ + :toctree: api/ DataFrame.boxplot DataFrame.hist @@ -315,7 +315,7 @@ specific plotting methods of the form ``DataFrame.plot.``. Serialization / IO / Conversion ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ DataFrame.from_csv DataFrame.from_dict @@ -346,6 +346,6 @@ Serialization / IO / Conversion Sparse ~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ SparseDataFrame.to_coo diff --git a/doc/source/api/general_functions.rst b/doc/source/reference/general_functions.rst similarity index 84% rename from doc/source/api/general_functions.rst rename to doc/source/reference/general_functions.rst index cef5d8cac6abc..b5832cb8aa591 100644 --- a/doc/source/api/general_functions.rst +++ b/doc/source/reference/general_functions.rst @@ -10,7 +10,7 @@ General functions Data manipulations ~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ melt pivot @@ -30,7 +30,7 @@ Data manipulations Top-level missing data ~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ isna isnull @@ -40,14 +40,14 @@ Top-level missing data Top-level conversions ~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ to_numeric Top-level dealing with datetimelike ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ to_datetime to_timedelta @@ -60,21 +60,21 @@ Top-level dealing with datetimelike Top-level dealing with intervals ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ interval_range Top-level evaluation ~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ eval Hashing ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ util.hash_array util.hash_pandas_object @@ -82,6 +82,6 @@ Hashing Testing ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ test diff --git a/doc/source/api/general_utility_functions.rst b/doc/source/reference/general_utility_functions.rst similarity index 93% rename from doc/source/api/general_utility_functions.rst rename to doc/source/reference/general_utility_functions.rst index e151f8f57ed5e..9c69770c0f1b7 100644 --- a/doc/source/api/general_utility_functions.rst +++ b/doc/source/reference/general_utility_functions.rst @@ -10,7 +10,7 @@ General utility functions Working with options -------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ describe_option reset_option @@ -21,7 +21,7 @@ Working with options Testing functions ----------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ testing.assert_frame_equal testing.assert_series_equal @@ -30,7 +30,7 @@ Testing functions Exceptions and warnings ----------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ errors.DtypeWarning errors.EmptyDataError @@ -44,7 +44,7 @@ Exceptions and warnings Data types related functionality -------------------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ api.types.union_categoricals api.types.infer_dtype @@ -53,7 +53,7 @@ Data types related functionality Dtype introspection ~~~~~~~~~~~~~~~~~~~ .. 
autosummary:: - :toctree: generated/ + :toctree: api/ api.types.is_bool_dtype api.types.is_categorical_dtype @@ -81,7 +81,7 @@ Dtype introspection Iterable introspection ~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ api.types.is_dict_like api.types.is_file_like @@ -92,7 +92,7 @@ Iterable introspection Scalar introspection ~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ api.types.is_bool api.types.is_categorical diff --git a/doc/source/api/groupby.rst b/doc/source/reference/groupby.rst similarity index 94% rename from doc/source/api/groupby.rst rename to doc/source/reference/groupby.rst index d67c7e0889522..6ed85ff2fac43 100644 --- a/doc/source/api/groupby.rst +++ b/doc/source/reference/groupby.rst @@ -12,7 +12,7 @@ GroupBy objects are returned by groupby calls: :func:`pandas.DataFrame.groupby`, Indexing, iteration ------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ GroupBy.__iter__ GroupBy.groups @@ -22,7 +22,7 @@ Indexing, iteration .. currentmodule:: pandas .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/class_without_autosummary.rst Grouper @@ -32,7 +32,7 @@ Indexing, iteration Function application -------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ GroupBy.apply GroupBy.agg @@ -43,7 +43,7 @@ Function application Computations / Descriptive Stats -------------------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ GroupBy.all GroupBy.any @@ -78,7 +78,7 @@ axis argument, and often an argument indicating whether to restrict application to columns of a specific data type. .. autosummary:: - :toctree: generated/ + :toctree: api/ DataFrameGroupBy.all DataFrameGroupBy.any @@ -113,7 +113,7 @@ application to columns of a specific data type. The following methods are available only for ``SeriesGroupBy`` objects. .. autosummary:: - :toctree: generated/ + :toctree: api/ SeriesGroupBy.nlargest SeriesGroupBy.nsmallest @@ -126,7 +126,7 @@ The following methods are available only for ``SeriesGroupBy`` objects. The following methods are available only for ``DataFrameGroupBy`` objects. .. autosummary:: - :toctree: generated/ + :toctree: api/ DataFrameGroupBy.corrwith DataFrameGroupBy.boxplot diff --git a/doc/source/api/index.rst b/doc/source/reference/index.rst similarity index 56% rename from doc/source/api/index.rst rename to doc/source/reference/index.rst index e4d118e278128..ef4676054473a 100644 --- a/doc/source/api/index.rst +++ b/doc/source/reference/index.rst @@ -44,31 +44,31 @@ public functions related to data types in pandas. .. 
toctree:: :hidden: - generated/pandas.DataFrame.blocks - generated/pandas.DataFrame.as_matrix - generated/pandas.DataFrame.ix - generated/pandas.Index.asi8 - generated/pandas.Index.data - generated/pandas.Index.flags - generated/pandas.Index.holds_integer - generated/pandas.Index.is_type_compatible - generated/pandas.Index.nlevels - generated/pandas.Index.sort - generated/pandas.Panel.agg - generated/pandas.Panel.aggregate - generated/pandas.Panel.blocks - generated/pandas.Panel.empty - generated/pandas.Panel.is_copy - generated/pandas.Panel.items - generated/pandas.Panel.ix - generated/pandas.Panel.major_axis - generated/pandas.Panel.minor_axis - generated/pandas.Series.asobject - generated/pandas.Series.blocks - generated/pandas.Series.from_array - generated/pandas.Series.ix - generated/pandas.Series.imag - generated/pandas.Series.real + api/pandas.DataFrame.blocks + api/pandas.DataFrame.as_matrix + api/pandas.DataFrame.ix + api/pandas.Index.asi8 + api/pandas.Index.data + api/pandas.Index.flags + api/pandas.Index.holds_integer + api/pandas.Index.is_type_compatible + api/pandas.Index.nlevels + api/pandas.Index.sort + api/pandas.Panel.agg + api/pandas.Panel.aggregate + api/pandas.Panel.blocks + api/pandas.Panel.empty + api/pandas.Panel.is_copy + api/pandas.Panel.items + api/pandas.Panel.ix + api/pandas.Panel.major_axis + api/pandas.Panel.minor_axis + api/pandas.Series.asobject + api/pandas.Series.blocks + api/pandas.Series.from_array + api/pandas.Series.ix + api/pandas.Series.imag + api/pandas.Series.real .. Can't convince sphinx to generate toctree for this class attribute. @@ -77,4 +77,4 @@ public functions related to data types in pandas. .. toctree:: :hidden: - generated/pandas.api.extensions.ExtensionDtype.na_value + api/pandas.api.extensions.ExtensionDtype.na_value diff --git a/doc/source/api/indexing.rst b/doc/source/reference/indexing.rst similarity index 91% rename from doc/source/api/indexing.rst rename to doc/source/reference/indexing.rst index d27b05322c1f2..680cb7e3dac91 100644 --- a/doc/source/api/indexing.rst +++ b/doc/source/reference/indexing.rst @@ -15,14 +15,14 @@ that contain an index (Series/DataFrame) and those should most likely be used before calling these methods directly.** .. autosummary:: - :toctree: generated/ + :toctree: api/ Index Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Index.values Index.is_monotonic @@ -51,7 +51,7 @@ Properties Modifying and Computations ~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Index.all Index.any @@ -90,7 +90,7 @@ Modifying and Computations Compatibility with MultiIndex ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Index.set_names Index.is_lexsorted_for_tuple @@ -99,7 +99,7 @@ Compatibility with MultiIndex Missing Values ~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Index.fillna Index.dropna @@ -109,7 +109,7 @@ Missing Values Conversion ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Index.astype Index.item @@ -124,7 +124,7 @@ Conversion Sorting ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Index.argsort Index.searchsorted @@ -133,14 +133,14 @@ Sorting Time-specific operations ~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Index.shift Combining / joining / set operations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. 
autosummary:: - :toctree: generated/ + :toctree: api/ Index.append Index.join @@ -152,7 +152,7 @@ Combining / joining / set operations Selecting ~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Index.asof Index.asof_locs @@ -176,7 +176,7 @@ Selecting Numeric Index ------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/class_without_autosummary.rst RangeIndex @@ -188,7 +188,7 @@ Numeric Index .. Separate block, since they aren't classes. .. autosummary:: - :toctree: generated/ + :toctree: api/ RangeIndex.from_range @@ -197,7 +197,7 @@ Numeric Index CategoricalIndex ---------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/class_without_autosummary.rst CategoricalIndex @@ -205,7 +205,7 @@ CategoricalIndex Categorical Components ~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ CategoricalIndex.codes CategoricalIndex.categories @@ -222,7 +222,7 @@ Categorical Components Modifying and Computations ~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ CategoricalIndex.map CategoricalIndex.equals @@ -232,7 +232,7 @@ Modifying and Computations IntervalIndex ------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/class_without_autosummary.rst IntervalIndex @@ -240,7 +240,7 @@ IntervalIndex IntervalIndex Components ~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ IntervalIndex.from_arrays IntervalIndex.from_tuples @@ -265,20 +265,20 @@ IntervalIndex Components MultiIndex ---------- .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/class_without_autosummary.rst MultiIndex .. autosummary:: - :toctree: generated/ + :toctree: api/ IndexSlice MultiIndex Constructors ~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ MultiIndex.from_arrays MultiIndex.from_tuples @@ -288,7 +288,7 @@ MultiIndex Constructors MultiIndex Properties ~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ MultiIndex.names MultiIndex.levels @@ -299,7 +299,7 @@ MultiIndex Properties MultiIndex Components ~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ MultiIndex.set_levels MultiIndex.set_codes @@ -316,7 +316,7 @@ MultiIndex Components MultiIndex Selecting ~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ MultiIndex.get_loc MultiIndex.get_loc_level @@ -328,7 +328,7 @@ MultiIndex Selecting DatetimeIndex ------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/class_without_autosummary.rst DatetimeIndex @@ -336,7 +336,7 @@ DatetimeIndex Time/Date Components ~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ DatetimeIndex.year DatetimeIndex.month @@ -370,7 +370,7 @@ Time/Date Components Selecting ~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ DatetimeIndex.indexer_at_time DatetimeIndex.indexer_between_time @@ -379,7 +379,7 @@ Selecting Time-specific operations ~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ DatetimeIndex.normalize DatetimeIndex.strftime @@ -395,7 +395,7 @@ Time-specific operations Conversion ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ DatetimeIndex.to_period DatetimeIndex.to_perioddelta @@ -406,7 +406,7 @@ Conversion TimedeltaIndex -------------- .. 
autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/class_without_autosummary.rst TimedeltaIndex @@ -414,7 +414,7 @@ TimedeltaIndex Components ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ TimedeltaIndex.days TimedeltaIndex.seconds @@ -426,7 +426,7 @@ Components Conversion ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ TimedeltaIndex.to_pytimedelta TimedeltaIndex.to_series @@ -440,7 +440,7 @@ Conversion PeriodIndex ----------- .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/class_without_autosummary.rst PeriodIndex @@ -448,7 +448,7 @@ PeriodIndex Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ PeriodIndex.day PeriodIndex.dayofweek @@ -474,7 +474,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ PeriodIndex.asfreq PeriodIndex.strftime diff --git a/doc/source/api/io.rst b/doc/source/reference/io.rst similarity index 78% rename from doc/source/api/io.rst rename to doc/source/reference/io.rst index f2060b7c05413..9c776e3ff8a82 100644 --- a/doc/source/api/io.rst +++ b/doc/source/reference/io.rst @@ -10,14 +10,14 @@ Input/Output Pickling ~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ read_pickle Flat File ~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ read_table read_csv @@ -27,20 +27,20 @@ Flat File Clipboard ~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ read_clipboard Excel ~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ read_excel ExcelFile.parse .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/class_without_autosummary.rst ExcelWriter @@ -48,14 +48,14 @@ Excel JSON ~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ read_json .. currentmodule:: pandas.io.json .. autosummary:: - :toctree: generated/ + :toctree: api/ json_normalize build_table_schema @@ -65,14 +65,14 @@ JSON HTML ~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ read_html HDFStore: PyTables (HDF5) ~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ read_hdf HDFStore.put @@ -87,28 +87,28 @@ HDFStore: PyTables (HDF5) Feather ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ read_feather Parquet ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ read_parquet SAS ~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ read_sas SQL ~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ read_sql_table read_sql_query @@ -117,21 +117,21 @@ SQL Google BigQuery ~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ read_gbq STATA ~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ read_stata .. currentmodule:: pandas.io.stata .. autosummary:: - :toctree: generated/ + :toctree: api/ StataReader.data StataReader.data_label diff --git a/doc/source/api/offset_frequency.rst b/doc/source/reference/offset_frequency.rst similarity index 84% rename from doc/source/api/offset_frequency.rst rename to doc/source/reference/offset_frequency.rst index 42894fe8d7f2f..ccc1c7e171d22 100644 --- a/doc/source/api/offset_frequency.rst +++ b/doc/source/reference/offset_frequency.rst @@ -10,14 +10,14 @@ Date Offsets DateOffset ---------- .. autosummary:: - :toctree: generated/ + :toctree: api/ DateOffset Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ DateOffset.freqstr DateOffset.kwds @@ -29,7 +29,7 @@ Properties Methods ~~~~~~~ .. 
autosummary:: - :toctree: generated/ + :toctree: api/ DateOffset.apply DateOffset.copy @@ -39,14 +39,14 @@ Methods BusinessDay ----------- .. autosummary:: - :toctree: generated/ + :toctree: api/ BusinessDay Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BusinessDay.freqstr BusinessDay.kwds @@ -58,7 +58,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BusinessDay.apply BusinessDay.apply_index @@ -69,14 +69,14 @@ Methods BusinessHour ------------ .. autosummary:: - :toctree: generated/ + :toctree: api/ BusinessHour Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BusinessHour.freqstr BusinessHour.kwds @@ -88,7 +88,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BusinessHour.apply BusinessHour.copy @@ -98,14 +98,14 @@ Methods CustomBusinessDay ----------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ CustomBusinessDay Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ CustomBusinessDay.freqstr CustomBusinessDay.kwds @@ -117,7 +117,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ CustomBusinessDay.apply CustomBusinessDay.copy @@ -127,14 +127,14 @@ Methods CustomBusinessHour ------------------ .. autosummary:: - :toctree: generated/ + :toctree: api/ CustomBusinessHour Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ CustomBusinessHour.freqstr CustomBusinessHour.kwds @@ -146,7 +146,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ CustomBusinessHour.apply CustomBusinessHour.copy @@ -156,14 +156,14 @@ Methods MonthOffset ----------- .. autosummary:: - :toctree: generated/ + :toctree: api/ MonthOffset Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ MonthOffset.freqstr MonthOffset.kwds @@ -175,7 +175,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ MonthOffset.apply MonthOffset.apply_index @@ -186,14 +186,14 @@ Methods MonthEnd -------- .. autosummary:: - :toctree: generated/ + :toctree: api/ MonthEnd Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ MonthEnd.freqstr MonthEnd.kwds @@ -205,7 +205,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ MonthEnd.apply MonthEnd.apply_index @@ -216,14 +216,14 @@ Methods MonthBegin ---------- .. autosummary:: - :toctree: generated/ + :toctree: api/ MonthBegin Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ MonthBegin.freqstr MonthBegin.kwds @@ -235,7 +235,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ MonthBegin.apply MonthBegin.apply_index @@ -246,14 +246,14 @@ Methods BusinessMonthEnd ---------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ BusinessMonthEnd Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BusinessMonthEnd.freqstr BusinessMonthEnd.kwds @@ -265,7 +265,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BusinessMonthEnd.apply BusinessMonthEnd.apply_index @@ -276,14 +276,14 @@ Methods BusinessMonthBegin ------------------ .. autosummary:: - :toctree: generated/ + :toctree: api/ BusinessMonthBegin Properties ~~~~~~~~~~ .. 
autosummary:: - :toctree: generated/ + :toctree: api/ BusinessMonthBegin.freqstr BusinessMonthBegin.kwds @@ -295,7 +295,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BusinessMonthBegin.apply BusinessMonthBegin.apply_index @@ -306,14 +306,14 @@ Methods CustomBusinessMonthEnd ---------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ CustomBusinessMonthEnd Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ CustomBusinessMonthEnd.freqstr CustomBusinessMonthEnd.kwds @@ -326,7 +326,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ CustomBusinessMonthEnd.apply CustomBusinessMonthEnd.copy @@ -336,14 +336,14 @@ Methods CustomBusinessMonthBegin ------------------------ .. autosummary:: - :toctree: generated/ + :toctree: api/ CustomBusinessMonthBegin Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ CustomBusinessMonthBegin.freqstr CustomBusinessMonthBegin.kwds @@ -356,7 +356,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ CustomBusinessMonthBegin.apply CustomBusinessMonthBegin.copy @@ -366,14 +366,14 @@ Methods SemiMonthOffset --------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ SemiMonthOffset Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ SemiMonthOffset.freqstr SemiMonthOffset.kwds @@ -385,7 +385,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ SemiMonthOffset.apply SemiMonthOffset.apply_index @@ -396,14 +396,14 @@ Methods SemiMonthEnd ------------ .. autosummary:: - :toctree: generated/ + :toctree: api/ SemiMonthEnd Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ SemiMonthEnd.freqstr SemiMonthEnd.kwds @@ -415,7 +415,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ SemiMonthEnd.apply SemiMonthEnd.apply_index @@ -426,14 +426,14 @@ Methods SemiMonthBegin -------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ SemiMonthBegin Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ SemiMonthBegin.freqstr SemiMonthBegin.kwds @@ -445,7 +445,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ SemiMonthBegin.apply SemiMonthBegin.apply_index @@ -456,14 +456,14 @@ Methods Week ---- .. autosummary:: - :toctree: generated/ + :toctree: api/ Week Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Week.freqstr Week.kwds @@ -475,7 +475,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Week.apply Week.apply_index @@ -486,14 +486,14 @@ Methods WeekOfMonth ----------- .. autosummary:: - :toctree: generated/ + :toctree: api/ WeekOfMonth Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ WeekOfMonth.freqstr WeekOfMonth.kwds @@ -505,7 +505,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ WeekOfMonth.apply WeekOfMonth.copy @@ -515,14 +515,14 @@ Methods LastWeekOfMonth --------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ LastWeekOfMonth Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ LastWeekOfMonth.freqstr LastWeekOfMonth.kwds @@ -534,7 +534,7 @@ Properties Methods ~~~~~~~ .. 
autosummary:: - :toctree: generated/ + :toctree: api/ LastWeekOfMonth.apply LastWeekOfMonth.copy @@ -544,14 +544,14 @@ Methods QuarterOffset ------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ QuarterOffset Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ QuarterOffset.freqstr QuarterOffset.kwds @@ -563,7 +563,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ QuarterOffset.apply QuarterOffset.apply_index @@ -574,14 +574,14 @@ Methods BQuarterEnd ----------- .. autosummary:: - :toctree: generated/ + :toctree: api/ BQuarterEnd Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BQuarterEnd.freqstr BQuarterEnd.kwds @@ -593,7 +593,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BQuarterEnd.apply BQuarterEnd.apply_index @@ -604,14 +604,14 @@ Methods BQuarterBegin ------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ BQuarterBegin Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BQuarterBegin.freqstr BQuarterBegin.kwds @@ -623,7 +623,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BQuarterBegin.apply BQuarterBegin.apply_index @@ -634,14 +634,14 @@ Methods QuarterEnd ---------- .. autosummary:: - :toctree: generated/ + :toctree: api/ QuarterEnd Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ QuarterEnd.freqstr QuarterEnd.kwds @@ -653,7 +653,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ QuarterEnd.apply QuarterEnd.apply_index @@ -664,14 +664,14 @@ Methods QuarterBegin ------------ .. autosummary:: - :toctree: generated/ + :toctree: api/ QuarterBegin Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ QuarterBegin.freqstr QuarterBegin.kwds @@ -683,7 +683,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ QuarterBegin.apply QuarterBegin.apply_index @@ -694,14 +694,14 @@ Methods YearOffset ---------- .. autosummary:: - :toctree: generated/ + :toctree: api/ YearOffset Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ YearOffset.freqstr YearOffset.kwds @@ -713,7 +713,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ YearOffset.apply YearOffset.apply_index @@ -724,14 +724,14 @@ Methods BYearEnd -------- .. autosummary:: - :toctree: generated/ + :toctree: api/ BYearEnd Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BYearEnd.freqstr BYearEnd.kwds @@ -743,7 +743,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BYearEnd.apply BYearEnd.apply_index @@ -754,14 +754,14 @@ Methods BYearBegin ---------- .. autosummary:: - :toctree: generated/ + :toctree: api/ BYearBegin Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BYearBegin.freqstr BYearBegin.kwds @@ -773,7 +773,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BYearBegin.apply BYearBegin.apply_index @@ -784,14 +784,14 @@ Methods YearEnd ------- .. autosummary:: - :toctree: generated/ + :toctree: api/ YearEnd Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ YearEnd.freqstr YearEnd.kwds @@ -803,7 +803,7 @@ Properties Methods ~~~~~~~ .. 
autosummary:: - :toctree: generated/ + :toctree: api/ YearEnd.apply YearEnd.apply_index @@ -814,14 +814,14 @@ Methods YearBegin --------- .. autosummary:: - :toctree: generated/ + :toctree: api/ YearBegin Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ YearBegin.freqstr YearBegin.kwds @@ -833,7 +833,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ YearBegin.apply YearBegin.apply_index @@ -844,14 +844,14 @@ Methods FY5253 ------ .. autosummary:: - :toctree: generated/ + :toctree: api/ FY5253 Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ FY5253.freqstr FY5253.kwds @@ -863,7 +863,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ FY5253.apply FY5253.copy @@ -875,14 +875,14 @@ Methods FY5253Quarter ------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ FY5253Quarter Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ FY5253Quarter.freqstr FY5253Quarter.kwds @@ -894,7 +894,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ FY5253Quarter.apply FY5253Quarter.copy @@ -906,14 +906,14 @@ Methods Easter ------ .. autosummary:: - :toctree: generated/ + :toctree: api/ Easter Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Easter.freqstr Easter.kwds @@ -925,7 +925,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Easter.apply Easter.copy @@ -935,14 +935,14 @@ Methods Tick ---- .. autosummary:: - :toctree: generated/ + :toctree: api/ Tick Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Tick.delta Tick.freqstr @@ -955,7 +955,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Tick.copy Tick.isAnchored @@ -964,14 +964,14 @@ Methods Day --- .. autosummary:: - :toctree: generated/ + :toctree: api/ Day Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Day.delta Day.freqstr @@ -984,7 +984,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Day.copy Day.isAnchored @@ -993,14 +993,14 @@ Methods Hour ---- .. autosummary:: - :toctree: generated/ + :toctree: api/ Hour Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Hour.delta Hour.freqstr @@ -1013,7 +1013,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Hour.copy Hour.isAnchored @@ -1022,14 +1022,14 @@ Methods Minute ------ .. autosummary:: - :toctree: generated/ + :toctree: api/ Minute Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Minute.delta Minute.freqstr @@ -1042,7 +1042,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Minute.copy Minute.isAnchored @@ -1051,14 +1051,14 @@ Methods Second ------ .. autosummary:: - :toctree: generated/ + :toctree: api/ Second Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Second.delta Second.freqstr @@ -1071,7 +1071,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Second.copy Second.isAnchored @@ -1080,14 +1080,14 @@ Methods Milli ----- .. autosummary:: - :toctree: generated/ + :toctree: api/ Milli Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Milli.delta Milli.freqstr @@ -1100,7 +1100,7 @@ Properties Methods ~~~~~~~ .. 
autosummary:: - :toctree: generated/ + :toctree: api/ Milli.copy Milli.isAnchored @@ -1109,14 +1109,14 @@ Methods Micro ----- .. autosummary:: - :toctree: generated/ + :toctree: api/ Micro Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Micro.delta Micro.freqstr @@ -1129,7 +1129,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Micro.copy Micro.isAnchored @@ -1138,14 +1138,14 @@ Methods Nano ---- .. autosummary:: - :toctree: generated/ + :toctree: api/ Nano Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Nano.delta Nano.freqstr @@ -1158,7 +1158,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Nano.copy Nano.isAnchored @@ -1167,14 +1167,14 @@ Methods BDay ---- .. autosummary:: - :toctree: generated/ + :toctree: api/ BDay Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BDay.base BDay.freqstr @@ -1188,7 +1188,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BDay.apply BDay.apply_index @@ -1201,14 +1201,14 @@ Methods BMonthEnd --------- .. autosummary:: - :toctree: generated/ + :toctree: api/ BMonthEnd Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BMonthEnd.base BMonthEnd.freqstr @@ -1221,7 +1221,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BMonthEnd.apply BMonthEnd.apply_index @@ -1234,14 +1234,14 @@ Methods BMonthBegin ----------- .. autosummary:: - :toctree: generated/ + :toctree: api/ BMonthBegin Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BMonthBegin.base BMonthBegin.freqstr @@ -1254,7 +1254,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ BMonthBegin.apply BMonthBegin.apply_index @@ -1267,14 +1267,14 @@ Methods CBMonthEnd ---------- .. autosummary:: - :toctree: generated/ + :toctree: api/ CBMonthEnd Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ CBMonthEnd.base CBMonthEnd.cbday_roll @@ -1291,7 +1291,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ CBMonthEnd.apply CBMonthEnd.apply_index @@ -1304,14 +1304,14 @@ Methods CBMonthBegin ------------ .. autosummary:: - :toctree: generated/ + :toctree: api/ CBMonthBegin Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ CBMonthBegin.base CBMonthBegin.cbday_roll @@ -1328,7 +1328,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ CBMonthBegin.apply CBMonthBegin.apply_index @@ -1341,14 +1341,14 @@ Methods CDay ---- .. autosummary:: - :toctree: generated/ + :toctree: api/ CDay Properties ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ CDay.base CDay.freqstr @@ -1362,7 +1362,7 @@ Properties Methods ~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ CDay.apply CDay.apply_index @@ -1382,6 +1382,6 @@ Frequencies .. _api.offsets: .. autosummary:: - :toctree: generated/ + :toctree: api/ to_offset diff --git a/doc/source/api/panel.rst b/doc/source/reference/panel.rst similarity index 90% rename from doc/source/api/panel.rst rename to doc/source/reference/panel.rst index 4edcd22d2685d..39c8ba0828859 100644 --- a/doc/source/api/panel.rst +++ b/doc/source/reference/panel.rst @@ -10,7 +10,7 @@ Panel Constructor ~~~~~~~~~~~ .. 
autosummary:: - :toctree: generated/ + :toctree: api/ Panel @@ -23,7 +23,7 @@ Properties and underlying data * **minor_axis**: axis 2; the columns of each of the DataFrames .. autosummary:: - :toctree: generated/ + :toctree: api/ Panel.values Panel.axes @@ -38,7 +38,7 @@ Properties and underlying data Conversion ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Panel.astype Panel.copy @@ -48,7 +48,7 @@ Conversion Getting and setting ~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Panel.get_value Panel.set_value @@ -56,7 +56,7 @@ Getting and setting Indexing, iteration, slicing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Panel.at Panel.iat @@ -75,7 +75,7 @@ For more information on ``.at``, ``.iat``, ``.loc``, and Binary operator functions ~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Panel.add Panel.sub @@ -103,7 +103,7 @@ Binary operator functions Function application, GroupBy ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Panel.apply Panel.groupby @@ -113,7 +113,7 @@ Function application, GroupBy Computations / Descriptive Stats ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Panel.abs Panel.clip @@ -139,7 +139,7 @@ Computations / Descriptive Stats Reindexing / Selection / Label manipulation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Panel.add_prefix Panel.add_suffix @@ -160,14 +160,14 @@ Reindexing / Selection / Label manipulation Missing data handling ~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Panel.dropna Reshaping, sorting, transposing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Panel.sort_index Panel.swaplevel @@ -178,7 +178,7 @@ Reshaping, sorting, transposing Combining / joining / merging ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Panel.join Panel.update @@ -186,7 +186,7 @@ Combining / joining / merging Time series-related ~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Panel.asfreq Panel.shift @@ -197,7 +197,7 @@ Time series-related Serialization / IO / Conversion ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Panel.from_dict Panel.to_pickle diff --git a/doc/source/api/plotting.rst b/doc/source/reference/plotting.rst similarity index 93% rename from doc/source/api/plotting.rst rename to doc/source/reference/plotting.rst index c4e6333ebda37..7615e1d20f5e2 100644 --- a/doc/source/api/plotting.rst +++ b/doc/source/reference/plotting.rst @@ -10,7 +10,7 @@ Plotting The following functions are contained in the `pandas.plotting` module. .. autosummary:: - :toctree: generated/ + :toctree: api/ andrews_curves bootstrap_plot diff --git a/doc/source/api/resampling.rst b/doc/source/reference/resampling.rst similarity index 91% rename from doc/source/api/resampling.rst rename to doc/source/reference/resampling.rst index f5c6ccce3cdd7..2a52defa3c68f 100644 --- a/doc/source/api/resampling.rst +++ b/doc/source/reference/resampling.rst @@ -12,7 +12,7 @@ Resampler objects are returned by resample calls: :func:`pandas.DataFrame.resamp Indexing, iteration ~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Resampler.__iter__ Resampler.groups @@ -22,7 +22,7 @@ Indexing, iteration Function application ~~~~~~~~~~~~~~~~~~~~ .. 
autosummary:: - :toctree: generated/ + :toctree: api/ Resampler.apply Resampler.aggregate @@ -32,7 +32,7 @@ Function application Upsampling ~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Resampler.ffill Resampler.backfill @@ -46,7 +46,7 @@ Upsampling Computations / Descriptive Stats ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autosummary:: - :toctree: generated/ + :toctree: api/ Resampler.count Resampler.nunique diff --git a/doc/source/api/series.rst b/doc/source/reference/series.rst similarity index 93% rename from doc/source/api/series.rst rename to doc/source/reference/series.rst index aa43c8b643d44..a6ac40b5203bf 100644 --- a/doc/source/api/series.rst +++ b/doc/source/reference/series.rst @@ -10,7 +10,7 @@ Series Constructor ----------- .. autosummary:: - :toctree: generated/ + :toctree: api/ Series @@ -19,12 +19,12 @@ Attributes **Axes** .. autosummary:: - :toctree: generated/ + :toctree: api/ Series.index .. autosummary:: - :toctree: generated/ + :toctree: api/ Series.array Series.values @@ -52,7 +52,7 @@ Attributes Conversion ---------- .. autosummary:: - :toctree: generated/ + :toctree: api/ Series.astype Series.infer_objects @@ -69,7 +69,7 @@ Conversion Indexing, iteration ------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ Series.get Series.at @@ -90,7 +90,7 @@ For more information on ``.at``, ``.iat``, ``.loc``, and Binary operator functions ------------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ Series.add Series.sub @@ -123,7 +123,7 @@ Binary operator functions Function application, GroupBy & Window -------------------------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ Series.apply Series.agg @@ -141,7 +141,7 @@ Function application, GroupBy & Window Computations / Descriptive Stats -------------------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ Series.abs Series.all @@ -192,7 +192,7 @@ Computations / Descriptive Stats Reindexing / Selection / Label manipulation ------------------------------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ Series.align Series.drop @@ -226,7 +226,7 @@ Reindexing / Selection / Label manipulation Missing data handling --------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ Series.isna Series.notna @@ -237,7 +237,7 @@ Missing data handling Reshaping, sorting ------------------ .. autosummary:: - :toctree: generated/ + :toctree: api/ Series.argsort Series.argmin @@ -256,7 +256,7 @@ Reshaping, sorting Combining / joining / merging ----------------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ Series.append Series.replace @@ -265,7 +265,7 @@ Combining / joining / merging Time series-related ------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ Series.asfreq Series.asof @@ -309,7 +309,7 @@ Datetime Properties ^^^^^^^^^^^^^^^^^^^ .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/accessor_attribute.rst Series.dt.date @@ -345,7 +345,7 @@ Datetime Methods ^^^^^^^^^^^^^^^^ .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/accessor_method.rst Series.dt.to_period @@ -364,7 +364,7 @@ Period Properties ^^^^^^^^^^^^^^^^^ .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/accessor_attribute.rst Series.dt.qyear @@ -375,7 +375,7 @@ Timedelta Properties ^^^^^^^^^^^^^^^^^^^^ .. 
autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/accessor_attribute.rst Series.dt.days @@ -388,7 +388,7 @@ Timedelta Methods ^^^^^^^^^^^^^^^^^ .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/accessor_method.rst Series.dt.to_pytimedelta @@ -405,7 +405,7 @@ strings and apply several methods to it. These can be accessed like ``Series.str.``. .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/accessor_method.rst Series.str.capitalize @@ -467,7 +467,7 @@ strings and apply several methods to it. These can be accessed like .. .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/accessor.rst Series.str @@ -484,7 +484,7 @@ Categorical-dtype specific methods and attributes are available under the ``Series.cat`` accessor. .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/accessor_attribute.rst Series.cat.categories @@ -492,7 +492,7 @@ the ``Series.cat`` accessor. Series.cat.codes .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/accessor_method.rst Series.cat.rename_categories @@ -514,7 +514,7 @@ Sparse-dtype specific methods and attributes are provided under the ``Series.sparse`` accessor. .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/accessor_attribute.rst Series.sparse.npoints @@ -523,7 +523,7 @@ Sparse-dtype specific methods and attributes are provided under the Series.sparse.sp_values .. autosummary:: - :toctree: generated/ + :toctree: api/ Series.sparse.from_coo Series.sparse.to_coo @@ -535,13 +535,13 @@ Plotting specific plotting methods of the form ``Series.plot.``. .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/accessor_callable.rst Series.plot .. autosummary:: - :toctree: generated/ + :toctree: api/ :template: autosummary/accessor_method.rst Series.plot.area @@ -555,14 +555,14 @@ specific plotting methods of the form ``Series.plot.``. Series.plot.pie .. autosummary:: - :toctree: generated/ + :toctree: api/ Series.hist Serialization / IO / Conversion ------------------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ Series.to_pickle Series.to_csv @@ -585,7 +585,7 @@ Sparse ------ .. autosummary:: - :toctree: generated/ + :toctree: api/ SparseSeries.to_coo SparseSeries.from_coo diff --git a/doc/source/api/style.rst b/doc/source/reference/style.rst similarity index 88% rename from doc/source/api/style.rst rename to doc/source/reference/style.rst index 70913bbec410d..bd9635b41e343 100644 --- a/doc/source/api/style.rst +++ b/doc/source/reference/style.rst @@ -12,7 +12,7 @@ Style Styler Constructor ------------------ .. autosummary:: - :toctree: generated/ + :toctree: api/ Styler Styler.from_custom_template @@ -20,7 +20,7 @@ Styler Constructor Styler Properties ----------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ Styler.env Styler.template @@ -29,7 +29,7 @@ Styler Properties Style Application ----------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ Styler.apply Styler.applymap @@ -47,7 +47,7 @@ Style Application Builtin Styles -------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ Styler.highlight_max Styler.highlight_min @@ -58,7 +58,7 @@ Builtin Styles Style Export and Import ----------------------- .. 
autosummary:: - :toctree: generated/ + :toctree: api/ Styler.render Styler.export diff --git a/doc/source/api/window.rst b/doc/source/reference/window.rst similarity index 95% rename from doc/source/api/window.rst rename to doc/source/reference/window.rst index 3245f5f831688..9e1374a3bd8e4 100644 --- a/doc/source/api/window.rst +++ b/doc/source/reference/window.rst @@ -14,7 +14,7 @@ EWM objects are returned by ``.ewm`` calls: :func:`pandas.DataFrame.ewm`, :func: Standard moving window functions -------------------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ Rolling.count Rolling.sum @@ -39,7 +39,7 @@ Standard moving window functions Standard expanding window functions ----------------------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ Expanding.count Expanding.sum @@ -60,7 +60,7 @@ Standard expanding window functions Exponentially-weighted moving window functions ---------------------------------------------- .. autosummary:: - :toctree: generated/ + :toctree: api/ EWM.mean EWM.std diff --git a/doc/source/advanced.rst b/doc/source/user_guide/advanced.rst similarity index 100% rename from doc/source/advanced.rst rename to doc/source/user_guide/advanced.rst diff --git a/doc/source/categorical.rst b/doc/source/user_guide/categorical.rst similarity index 100% rename from doc/source/categorical.rst rename to doc/source/user_guide/categorical.rst diff --git a/doc/source/computation.rst b/doc/source/user_guide/computation.rst similarity index 100% rename from doc/source/computation.rst rename to doc/source/user_guide/computation.rst diff --git a/doc/source/cookbook.rst b/doc/source/user_guide/cookbook.rst similarity index 100% rename from doc/source/cookbook.rst rename to doc/source/user_guide/cookbook.rst diff --git a/doc/source/enhancingperf.rst b/doc/source/user_guide/enhancingperf.rst similarity index 99% rename from doc/source/enhancingperf.rst rename to doc/source/user_guide/enhancingperf.rst index 0e3d389aa4f6e..9941ffcc9de4d 100644 --- a/doc/source/enhancingperf.rst +++ b/doc/source/user_guide/enhancingperf.rst @@ -783,7 +783,7 @@ significant performance benefit. Here is a plot showing the running time of computation. The two lines are two different engines. -.. image:: _static/eval-perf.png +.. image:: ../_static/eval-perf.png .. note:: @@ -791,7 +791,7 @@ computation. The two lines are two different engines. Operations with smallish objects (around 15k-20k rows) are faster using plain Python: - .. image:: _static/eval-perf-small.png + .. image:: ../_static/eval-perf-small.png This plot was created using a ``DataFrame`` with 3 columns each containing diff --git a/doc/source/gotchas.rst b/doc/source/user_guide/gotchas.rst similarity index 100% rename from doc/source/gotchas.rst rename to doc/source/user_guide/gotchas.rst diff --git a/doc/source/groupby.rst b/doc/source/user_guide/groupby.rst similarity index 99% rename from doc/source/groupby.rst rename to doc/source/user_guide/groupby.rst index 953f40d1afebe..2c2e5c5425216 100644 --- a/doc/source/groupby.rst +++ b/doc/source/user_guide/groupby.rst @@ -15,7 +15,7 @@ steps: Out of these, the split step is the most straightforward. In fact, in many situations we may wish to split the data set into groups and do something with -those groups. In the apply step, we might wish to one of the +those groups. 
In the apply step, we might wish to one of the +those groups. In the apply step, we might wish to do one of the following: * **Aggregation**: compute a summary statistic (or statistics) for each diff --git a/doc/source/user_guide/index.rst b/doc/source/user_guide/index.rst new file mode 100644 index 0000000000000..d39cf7103ab63 --- /dev/null +++ b/doc/source/user_guide/index.rst @@ -0,0 +1,40 @@ +{{ header }} + +.. _user_guide: + +========== +User Guide +========== + +The User Guide covers all of pandas by topic area. Each of the subsections +introduces a topic (such as "working with missing data"), and discusses how +pandas approaches the problem, with many examples throughout. + +Users brand-new to pandas should start with :ref:`10min`. + +Further information on any specific method can be obtained in the +:ref:`api`. + +.. toctree:: + :maxdepth: 2 + + io + indexing + advanced + merging + reshaping + text + missing_data + categorical + integer_na + visualization + computation + groupby + timeseries + timedeltas + style + options + enhancingperf + sparse + gotchas + cookbook diff --git a/doc/source/indexing.rst b/doc/source/user_guide/indexing.rst similarity index 99% rename from doc/source/indexing.rst rename to doc/source/user_guide/indexing.rst index 3fe416c48f670..be1745e2664a1 100644 --- a/doc/source/indexing.rst +++ b/doc/source/user_guide/indexing.rst @@ -1392,7 +1392,7 @@ Performance of :meth:`~pandas.DataFrame.query` ``DataFrame.query()`` using ``numexpr`` is slightly faster than Python for large frames. -.. image:: _static/query-perf.png +.. image:: ../_static/query-perf.png .. note:: @@ -1400,7 +1400,7 @@ large frames. with ``DataFrame.query()`` if your frame has more than approximately 200,000 rows. - .. image:: _static/query-perf-small.png + .. image:: ../_static/query-perf-small.png This plot was created using a ``DataFrame`` with 3 columns each containing floating point values generated using ``numpy.random.randn()``. diff --git a/doc/source/integer_na.rst b/doc/source/user_guide/integer_na.rst similarity index 95% rename from doc/source/integer_na.rst rename to doc/source/user_guide/integer_na.rst index eb0c5e3d05863..c5667e9319ca6 100644 --- a/doc/source/integer_na.rst +++ b/doc/source/user_guide/integer_na.rst @@ -10,6 +10,12 @@ Nullable Integer Data Type .. versionadded:: 0.24.0 +.. note:: + + IntegerArray is currently experimental. Its API or implementation may + change without warning. + + In :ref:`missing_data`, we saw that pandas primarily uses ``NaN`` to represent missing data. Because ``NaN`` is a float, this forces an array of integers with any missing values to become floating point. In some cases, this may not matter diff --git a/doc/source/io.rst b/doc/source/user_guide/io.rst similarity index 99% rename from doc/source/io.rst rename to doc/source/user_guide/io.rst index dd1cde0bdff73..b23a0f10e9e2b 100644 --- a/doc/source/io.rst +++ b/doc/source/user_guide/io.rst @@ -989,6 +989,36 @@ a single date rather than the entire array. os.remove('tmp.csv') + +.. _io.csv.mixed_timezones: + +Parsing a CSV with mixed timezones +++++++++++++++++++++++++++++++++++ + +pandas cannot natively represent a column or index with mixed timezones. If your CSV +file contains columns with a mixture of timezones, the default result will be +an object-dtype column with strings, even with ``parse_dates``. + + ..
ipython:: python + + content = """\ + a + 2000-01-01T00:00:00+05:00 + 2000-01-01T00:00:00+06:00""" + df = pd.read_csv(StringIO(content), parse_dates=['a']) + df['a'] + +To parse the mixed-timezone values as a datetime column, pass a partially-applied +:func:`to_datetime` with ``utc=True`` as the ``date_parser``. + +.. ipython:: python + + df = pd.read_csv(StringIO(content), parse_dates=['a'], + date_parser=lambda col: pd.to_datetime(col, utc=True)) + df['a'] + + .. _io.dayfirst: @@ -2549,7 +2579,7 @@ in the method ``to_string`` described above. HTML: .. raw:: html - :file: _static/basic.html + :file: ../_static/basic.html The ``columns`` argument will limit the columns shown: @@ -2565,7 +2595,7 @@ The ``columns`` argument will limit the columns shown: HTML: .. raw:: html - :file: _static/columns.html + :file: ../_static/columns.html ``float_format`` takes a Python callable to control the precision of floating point values: @@ -2582,7 +2612,7 @@ point values: HTML: .. raw:: html - :file: _static/float_format.html + :file: ../_static/float_format.html ``bold_rows`` will make the row labels bold by default, but you can turn that off: @@ -2597,7 +2627,7 @@ off: write_html(df, 'nobold', bold_rows=False) .. raw:: html - :file: _static/nobold.html + :file: ../_static/nobold.html The ``classes`` argument provides the ability to give the resulting HTML table CSS classes. Note that these classes are *appended* to the existing @@ -2627,7 +2657,7 @@ that contain URLs. HTML: .. raw:: html - :file: _static/render_links.html + :file: ../_static/render_links.html Finally, the ``escape`` argument allows you to control whether the "<", ">" and "&" characters escaped in the resulting HTML (by default it is @@ -2651,7 +2681,7 @@ Escaped: print(df.to_html()) .. raw:: html - :file: _static/escape.html + :file: ../_static/escape.html Not escaped: @@ -2660,7 +2690,7 @@ Not escaped: print(df.to_html(escape=False)) .. raw:: html - :file: _static/noescape.html + :file: ../_static/noescape.html .. note:: @@ -4850,7 +4880,7 @@ See also some :ref:`cookbook examples ` for some advanced strategi The key functions are: .. autosummary:: - :toctree: generated/ + :toctree: ../reference/api/ read_sql_table read_sql_query diff --git a/doc/source/merging.rst b/doc/source/user_guide/merging.rst similarity index 100% rename from doc/source/merging.rst rename to doc/source/user_guide/merging.rst diff --git a/doc/source/missing_data.rst b/doc/source/user_guide/missing_data.rst similarity index 100% rename from doc/source/missing_data.rst rename to doc/source/user_guide/missing_data.rst diff --git a/doc/source/options.rst b/doc/source/user_guide/options.rst similarity index 99% rename from doc/source/options.rst rename to doc/source/user_guide/options.rst index e91be3e6ae730..d640d8b1153c5 100644 --- a/doc/source/options.rst +++ b/doc/source/user_guide/options.rst @@ -487,7 +487,7 @@ If a DataFrame or Series contains these characters, the default output mode may df = pd.DataFrame({u'国籍': ['UK', u'日本'], u'名前': ['Alice', u'しのぶ']}) df -.. image:: _static/option_unicode01.png +.. image:: ../_static/option_unicode01.png Enabling ``display.unicode.east_asian_width`` allows pandas to check each character's "East Asian Width" property. These characters can be aligned properly by setting this option to ``True``. However, this will result in longer render @@ -498,7 +498,7 @@ times than the standard ``len`` function. pd.set_option('display.unicode.east_asian_width', True) df -.. image:: _static/option_unicode02.png +.. 
image:: ../_static/option_unicode02.png In addition, Unicode characters whose width is "Ambiguous" can either be 1 or 2 characters wide depending on the terminal setting or encoding. The option ``display.unicode.ambiguous_as_wide`` can be used to handle the ambiguity. @@ -510,7 +510,7 @@ By default, an "Ambiguous" character's width, such as "¡" (inverted exclamation df = pd.DataFrame({'a': ['xxx', u'¡¡'], 'b': ['yyy', u'¡¡']}) df -.. image:: _static/option_unicode03.png +.. image:: ../_static/option_unicode03.png Enabling ``display.unicode.ambiguous_as_wide`` makes pandas interpret these characters' widths to be 2. (Note that this option will only be effective when ``display.unicode.east_asian_width`` is enabled.) @@ -522,7 +522,7 @@ However, setting this option incorrectly for your terminal will cause these char pd.set_option('display.unicode.ambiguous_as_wide', True) df -.. image:: _static/option_unicode04.png +.. image:: ../_static/option_unicode04.png .. ipython:: python :suppress: diff --git a/doc/source/reshaping.rst b/doc/source/user_guide/reshaping.rst similarity index 98% rename from doc/source/reshaping.rst rename to doc/source/user_guide/reshaping.rst index 9891e22e9d552..5c11be34e6ed4 100644 --- a/doc/source/reshaping.rst +++ b/doc/source/user_guide/reshaping.rst @@ -9,7 +9,7 @@ Reshaping and Pivot Tables Reshaping by pivoting DataFrame objects --------------------------------------- -.. image:: _static/reshaping_pivot.png +.. image:: ../_static/reshaping_pivot.png .. ipython:: python :suppress: @@ -101,7 +101,7 @@ are homogeneously-typed. Reshaping by stacking and unstacking ------------------------------------ -.. image:: _static/reshaping_stack.png +.. image:: ../_static/reshaping_stack.png Closely related to the :meth:`~DataFrame.pivot` method are the related :meth:`~DataFrame.stack` and :meth:`~DataFrame.unstack` methods available on @@ -116,7 +116,7 @@ Closely related to the :meth:`~DataFrame.pivot` method are the related (possibly hierarchical) row index to the column axis, producing a reshaped ``DataFrame`` with a new inner-most level of column labels. -.. image:: _static/reshaping_unstack.png +.. image:: ../_static/reshaping_unstack.png The clearest way to explain is by example. Let's take a prior example data set from the hierarchical indexing section: @@ -158,7 +158,7 @@ unstacks the **last level**: .. _reshaping.unstack_by_name: -.. image:: _static/reshaping_unstack_1.png +.. image:: ../_static/reshaping_unstack_1.png If the indexes have names, you can use the level names instead of specifying the level numbers: @@ -168,7 +168,7 @@ the level numbers: stacked.unstack('second') -.. image:: _static/reshaping_unstack_0.png +.. image:: ../_static/reshaping_unstack_0.png Notice that the ``stack`` and ``unstack`` methods implicitly sort the index levels involved. Hence a call to ``stack`` and then ``unstack``, or vice versa, @@ -279,7 +279,7 @@ the right thing: Reshaping by Melt ----------------- -.. image:: _static/reshaping_melt.png +.. 
image:: ../_static/reshaping_melt.png The top-level :func:`~pandas.melt` function and the corresponding :meth:`DataFrame.melt` are useful to massage a ``DataFrame`` into a format where one or more columns diff --git a/doc/source/sparse.rst b/doc/source/user_guide/sparse.rst similarity index 100% rename from doc/source/sparse.rst rename to doc/source/user_guide/sparse.rst diff --git a/doc/source/style.ipynb b/doc/source/user_guide/style.ipynb similarity index 99% rename from doc/source/style.ipynb rename to doc/source/user_guide/style.ipynb index 792fe5120f6e8..79a9848704eec 100644 --- a/doc/source/style.ipynb +++ b/doc/source/user_guide/style.ipynb @@ -992,7 +992,7 @@ "source": [ "A screenshot of the output:\n", "\n", - "![Excel spreadsheet with styled DataFrame](_static/style-excel.png)\n" + "![Excel spreadsheet with styled DataFrame](../_static/style-excel.png)\n" ] }, { @@ -1133,7 +1133,7 @@ "metadata": {}, "outputs": [], "source": [ - "with open(\"template_structure.html\") as f:\n", + "with open(\"templates/template_structure.html\") as f:\n", " structure = f.read()\n", " \n", "HTML(structure)" diff --git a/doc/source/templates/myhtml.tpl b/doc/source/user_guide/templates/myhtml.tpl similarity index 100% rename from doc/source/templates/myhtml.tpl rename to doc/source/user_guide/templates/myhtml.tpl diff --git a/doc/source/template_structure.html b/doc/source/user_guide/templates/template_structure.html similarity index 100% rename from doc/source/template_structure.html rename to doc/source/user_guide/templates/template_structure.html diff --git a/doc/source/text.rst b/doc/source/user_guide/text.rst similarity index 100% rename from doc/source/text.rst rename to doc/source/user_guide/text.rst diff --git a/doc/source/timedeltas.rst b/doc/source/user_guide/timedeltas.rst similarity index 100% rename from doc/source/timedeltas.rst rename to doc/source/user_guide/timedeltas.rst diff --git a/doc/source/timeseries.rst b/doc/source/user_guide/timeseries.rst similarity index 92% rename from doc/source/timeseries.rst rename to doc/source/user_guide/timeseries.rst index f56ad710973dd..5841125817d03 100644 --- a/doc/source/timeseries.rst +++ b/doc/source/user_guide/timeseries.rst @@ -2129,11 +2129,13 @@ These can easily be converted to a ``PeriodIndex``: Time Zone Handling ------------------ -Pandas provides rich support for working with timestamps in different time -zones using ``pytz`` and ``dateutil`` libraries. ``dateutil`` currently is only -supported for fixed offset and tzfile zones. The default library is ``pytz``. -Support for ``dateutil`` is provided for compatibility with other -applications e.g. if you use ``dateutil`` in other Python packages. +pandas provides rich support for working with timestamps in different time +zones using the ``pytz`` and ``dateutil`` libraries. + +.. note:: + + pandas does not yet support ``datetime.timezone`` objects from the standard + library. Working with Time Zones ~~~~~~~~~~~~~~~~~~~~~~~ @@ -2145,13 +2147,16 @@ By default, pandas objects are time zone unaware: rng = pd.date_range('3/6/2012 00:00', periods=15, freq='D') rng.tz is None -To supply the time zone, you can use the ``tz`` keyword to ``date_range`` and -other functions. Dateutil time zone strings are distinguished from ``pytz`` -time zones by starting with ``dateutil/``. 
+To localize these dates to a time zone (assign a particular time zone to a naive date), +you can use the ``tz_localize`` method or the ``tz`` keyword argument in +:func:`date_range`, :class:`Timestamp`, or :class:`DatetimeIndex`. +You can either pass ``pytz`` or ``dateutil`` time zone objects or Olson time zone database strings. +Olson time zone strings will return ``pytz`` time zone objects by default. +To return ``dateutil`` time zone objects, prepend ``dateutil/`` to the string. * In ``pytz`` you can find a list of common (and less common) time zones using ``from pytz import common_timezones, all_timezones``. -* ``dateutil`` uses the OS timezones so there isn't a fixed list available. For +* ``dateutil`` uses the OS time zones so there isn't a fixed list available. For common zones, the names are the same as ``pytz``. .. ipython:: python import dateutil # pytz - rng_pytz = pd.date_range('3/6/2012 00:00', periods=10, freq='D', + rng_pytz = pd.date_range('3/6/2012 00:00', periods=3, freq='D', tz='Europe/London') rng_pytz.tz # dateutil - rng_dateutil = pd.date_range('3/6/2012 00:00', periods=10, freq='D', - tz='dateutil/Europe/London') + rng_dateutil = pd.date_range('3/6/2012 00:00', periods=3, freq='D') + rng_dateutil = rng_dateutil.tz_localize('dateutil/Europe/London') rng_dateutil.tz # dateutil - utc special case - rng_utc = pd.date_range('3/6/2012 00:00', periods=10, freq='D', + rng_utc = pd.date_range('3/6/2012 00:00', periods=3, freq='D', tz=dateutil.tz.tzutc()) rng_utc.tz -Note that the ``UTC`` timezone is a special case in ``dateutil`` and should be constructed explicitly -as an instance of ``dateutil.tz.tzutc``. You can also construct other timezones explicitly first, -which gives you more control over which time zone is used: +Note that the ``UTC`` time zone is a special case in ``dateutil`` and should be constructed explicitly +as an instance of ``dateutil.tz.tzutc``. You can also construct other time +zone objects explicitly first. .. ipython:: python import pytz # pytz tz_pytz = pytz.timezone('Europe/London') - rng_pytz = pd.date_range('3/6/2012 00:00', periods=10, freq='D', - tz=tz_pytz) + rng_pytz = pd.date_range('3/6/2012 00:00', periods=3, freq='D') + rng_pytz = rng_pytz.tz_localize(tz_pytz) rng_pytz.tz == tz_pytz # dateutil tz_dateutil = dateutil.tz.gettz('Europe/London') - rng_dateutil = pd.date_range('3/6/2012 00:00', periods=10, freq='D', + rng_dateutil = pd.date_range('3/6/2012 00:00', periods=3, freq='D', tz=tz_dateutil) rng_dateutil.tz == tz_dateutil -Timestamps, like Python's ``datetime.datetime`` object can be either time zone -naive or time zone aware. Naive time series and ``DatetimeIndex`` objects can be -*localized* using ``tz_localize``: - -.. ipython:: python - - ts = pd.Series(np.random.randn(len(rng)), rng) - - ts_utc = ts.tz_localize('UTC') - ts_utc - -Again, you can explicitly construct the timezone object first. -You can use the ``tz_convert`` method to convert pandas objects to convert -tz-aware data to another time zone: +To convert a time zone aware pandas object from one time zone to another, +you can use the ``tz_convert`` method. .. ipython:: python - ts_utc.tz_convert('US/Eastern') + rng_pytz.tz_convert('US/Eastern') .. warning:: - Be wary of conversions between libraries. For some zones ``pytz`` and ``dateutil`` have different - definitions of the zone.
This is more of a problem for unusual timezones than for + Be wary of conversions between libraries. For some time zones, ``pytz`` and ``dateutil`` have different + definitions of the zone. This is more of a problem for unusual time zones than for 'standard' zones like ``US/Eastern``. .. warning:: - Be aware that a timezone definition across versions of timezone libraries may not - be considered equal. This may cause problems when working with stored data that - is localized using one version and operated on with a different version. - See :ref:`here` for how to handle such a situation. + Be aware that a time zone definition across versions of time zone libraries may not + be considered equal. This may cause problems when working with stored data that + is localized using one version and operated on with a different version. + See :ref:`here` for how to handle such a situation. .. warning:: - It is incorrect to pass a timezone directly into the ``datetime.datetime`` constructor (e.g., - ``datetime.datetime(2011, 1, 1, tz=timezone('US/Eastern'))``. Instead, the datetime - needs to be localized using the localize method on the timezone. + For ``pytz`` time zones, it is incorrect to pass a time zone object directly into + the ``datetime.datetime`` constructor + (e.g., ``datetime.datetime(2011, 1, 1, tz=pytz.timezone('US/Eastern'))``. + Instead, the datetime needs to be localized using the ``localize`` method + on the ``pytz`` time zone object. -Under the hood, all timestamps are stored in UTC. Scalar values from a -``DatetimeIndex`` with a time zone will have their fields (day, hour, minute) +Under the hood, all timestamps are stored in UTC. Values from a time zone aware +:class:`DatetimeIndex` or :class:`Timestamp` will have their fields (day, hour, minute, etc.) localized to the time zone. However, timestamps with the same UTC value are still considered to be equal even if they are in different time zones: @@ -2241,51 +2236,35 @@ still considered to be equal even if they are in different time zones: rng_eastern = rng_utc.tz_convert('US/Eastern') rng_berlin = rng_utc.tz_convert('Europe/Berlin') - rng_eastern[5] - rng_berlin[5] - rng_eastern[5] == rng_berlin[5] - -Like ``Series``, ``DataFrame``, and ``DatetimeIndex``; ``Timestamp`` objects -can be converted to other time zones using ``tz_convert``: - -.. ipython:: python - - rng_eastern[5] - rng_berlin[5] - rng_eastern[5].tz_convert('Europe/Berlin') - -Localization of ``Timestamp`` functions just like ``DatetimeIndex`` and ``Series``: - -.. ipython:: python - - rng[5] - rng[5].tz_localize('Asia/Shanghai') - + rng_eastern[2] + rng_berlin[2] + rng_eastern[2] == rng_berlin[2] -Operations between ``Series`` in different time zones will yield UTC -``Series``, aligning the data on the UTC timestamps: +Operations between :class:`Series` in different time zones will yield UTC +:class:`Series`, aligning the data on the UTC timestamps: .. ipython:: python + ts_utc = pd.Series(range(3), pd.date_range('20130101', periods=3, tz='UTC')) eastern = ts_utc.tz_convert('US/Eastern') berlin = ts_utc.tz_convert('Europe/Berlin') result = eastern + berlin result result.index -To remove timezone from tz-aware ``DatetimeIndex``, use ``tz_localize(None)`` or ``tz_convert(None)``. -``tz_localize(None)`` will remove timezone holding local time representations. -``tz_convert(None)`` will remove timezone after converting to UTC time. +To remove time zone information, use ``tz_localize(None)`` or ``tz_convert(None)``. 
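[Editor's note, not part of the patch: the two ways of stripping time zone information that the rewritten paragraph distinguishes are easy to confuse, so here is the hunk's own example spelled out as a runnable sketch:]

```python
import pandas as pd

didx = pd.date_range(start='2014-08-01 09:00', freq='H',
                     periods=3, tz='US/Eastern')

didx.tz_localize(None)   # drop the zone, keep the wall-clock (local) times
didx.tz_convert(None)    # convert to UTC first, then drop the zone
# tz_convert(None) is the same as tz_convert('UTC').tz_localize(None)
```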
+``tz_localize(None)`` will remove the time zone yielding the local time representation. +``tz_convert(None)`` will remove the time zone after converting to UTC time. .. ipython:: python didx = pd.date_range(start='2014-08-01 09:00', freq='H', - periods=10, tz='US/Eastern') + periods=3, tz='US/Eastern') didx didx.tz_localize(None) didx.tz_convert(None) - # tz_convert(None) is identical with tz_convert('UTC').tz_localize(None) + # tz_convert(None) is identical to tz_convert('UTC').tz_localize(None) didx.tz_convert('UTC').tz_localize(None) .. _timeseries.timezone_ambiguous: @@ -2293,54 +2272,34 @@ To remove timezone from tz-aware ``DatetimeIndex``, use ``tz_localize(None)`` or Ambiguous Times when Localizing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -In some cases, localize cannot determine the DST and non-DST hours when there are -duplicates. This often happens when reading files or database records that simply -duplicate the hours. Passing ``ambiguous='infer'`` into ``tz_localize`` will -attempt to determine the right offset. Below the top example will fail as it -contains ambiguous times and the bottom will infer the right offset. +``tz_localize`` may not be able to determine the UTC offset of a timestamp +because daylight savings time (DST) in a local time zone causes some times to occur +twice within one day ("clocks fall back"). The following options are available: + +* ``'raise'``: Raises a ``pytz.AmbiguousTimeError`` (the default behavior) +* ``'infer'``: Attempt to determine the correct offset base on the monotonicity of the timestamps +* ``'NaT'``: Replaces ambiguous times with ``NaT`` +* ``bool``: ``True`` represents a DST time, ``False`` represents non-DST time. An array-like of ``bool`` values is supported for a sequence of times. .. ipython:: python rng_hourly = pd.DatetimeIndex(['11/06/2011 00:00', '11/06/2011 01:00', - '11/06/2011 01:00', '11/06/2011 02:00', - '11/06/2011 03:00']) + '11/06/2011 01:00', '11/06/2011 02:00']) -This will fail as there are ambiguous times +This will fail as there are ambiguous times (``'11/06/2011 01:00'``) .. code-block:: ipython In [2]: rng_hourly.tz_localize('US/Eastern') AmbiguousTimeError: Cannot infer dst time from Timestamp('2011-11-06 01:00:00'), try using the 'ambiguous' argument -Infer the ambiguous times - -.. ipython:: python - - rng_hourly_eastern = rng_hourly.tz_localize('US/Eastern', ambiguous='infer') - rng_hourly_eastern.to_list() - -In addition to 'infer', there are several other arguments supported. Passing -an array-like of bools or 0s/1s where True represents a DST hour and False a -non-DST hour, allows for distinguishing more than one DST -transition (e.g., if you have multiple records in a database each with their -own DST transition). Or passing 'NaT' will fill in transition times -with not-a-time values. These methods are available in the ``DatetimeIndex`` -constructor as well as ``tz_localize``. +Handle these ambiguous times by specifying the following. .. 
ipython:: python - rng_hourly_dst = np.array([1, 1, 0, 0, 0]) - rng_hourly.tz_localize('US/Eastern', ambiguous=rng_hourly_dst).to_list() - rng_hourly.tz_localize('US/Eastern', ambiguous='NaT').to_list() - - didx = pd.date_range(start='2014-08-01 09:00', freq='H', - periods=10, tz='US/Eastern') - didx - didx.tz_localize(None) - didx.tz_convert(None) - - # tz_convert(None) is identical with tz_convert('UTC').tz_localize(None) - didx.tz_convert('UCT').tz_localize(None) + rng_hourly.tz_localize('US/Eastern', ambiguous='infer') + rng_hourly.tz_localize('US/Eastern', ambiguous='NaT') + rng_hourly.tz_localize('US/Eastern', ambiguous=[True, True, False, False]) .. _timeseries.timezone_nonexistent: @@ -2348,7 +2307,7 @@ Nonexistent Times when Localizing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A DST transition may also shift the local time ahead by 1 hour creating nonexistent -local times. The behavior of localizing a timeseries with nonexistent times +local times ("clocks spring forward"). The behavior of localizing a timeseries with nonexistent times can be controlled by the ``nonexistent`` argument. The following options are available: * ``'raise'``: Raises a ``pytz.NonExistentTimeError`` (the default behavior) @@ -2382,58 +2341,61 @@ Transform nonexistent times to ``NaT`` or shift the times. .. _timeseries.timezone_series: -TZ Aware Dtypes -~~~~~~~~~~~~~~~ +Time Zone Series Operations +~~~~~~~~~~~~~~~~~~~~~~~~~~~ -``Series/DatetimeIndex`` with a timezone **naive** value are represented with a dtype of ``datetime64[ns]``. +A :class:`Series` with time zone **naive** values is +represented with a dtype of ``datetime64[ns]``. .. ipython:: python s_naive = pd.Series(pd.date_range('20130101', periods=3)) s_naive -``Series/DatetimeIndex`` with a timezone **aware** value are represented with a dtype of ``datetime64[ns, tz]``. +A :class:`Series` with a time zone **aware** values is +represented with a dtype of ``datetime64[ns, tz]`` where ``tz`` is the time zone .. ipython:: python s_aware = pd.Series(pd.date_range('20130101', periods=3, tz='US/Eastern')) s_aware -Both of these ``Series`` can be manipulated via the ``.dt`` accessor, see :ref:`here `. +Both of these :class:`Series` time zone information +can be manipulated via the ``.dt`` accessor, see :ref:`the dt accessor section `. -For example, to localize and convert a naive stamp to timezone aware. +For example, to localize and convert a naive stamp to time zone aware. .. ipython:: python s_naive.dt.tz_localize('UTC').dt.tz_convert('US/Eastern') - -Further more you can ``.astype(...)`` timezone aware (and naive). This operation is effectively a localize AND convert on a naive stamp, and -a convert on an aware stamp. +Time zone information can also be manipulated using the ``astype`` method. +This method can localize and convert time zone naive timestamps or +convert time zone aware timestamps. .. ipython:: python - # localize and convert a naive timezone + # localize and convert a naive time zone s_naive.astype('datetime64[ns, US/Eastern]') # make an aware tz naive s_aware.astype('datetime64[ns]') - # convert to a new timezone + # convert to a new time zone s_aware.astype('datetime64[ns, CET]') .. note:: Using :meth:`Series.to_numpy` on a ``Series``, returns a NumPy array of the data. 
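[Editor's note, not part of the patch: the rewritten passage above presents ``.dt`` accessor methods and ``astype`` as two routes to the same localize-and-convert operation. A small sketch under that reading of the hunk; the equality check relies on aware timestamps comparing by their UTC value, as documented earlier in this file:]

```python
import pandas as pd

s_naive = pd.Series(pd.date_range('20130101', periods=3))

# The accessor route: treat the naive stamps as UTC, then convert
via_dt = s_naive.dt.tz_localize('UTC').dt.tz_convert('US/Eastern')

# The astype route does the localize-and-convert in one step
via_astype = s_naive.astype('datetime64[ns, US/Eastern]')

assert (via_dt == via_astype).all()
```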
- NumPy does not currently support timezones (even though it is *printing* in the local timezone!), - therefore an object array of Timestamps is returned for timezone aware data: + NumPy does not currently support time zones (even though it is *printing* in the local time zone!), + therefore an object array of Timestamps is returned for time zone aware data: .. ipython:: python s_naive.to_numpy() s_aware.to_numpy() - By converting to an object array of Timestamps, it preserves the timezone + By converting to an object array of Timestamps, it preserves the time zone information. For example, when converting back to a Series: .. ipython:: python diff --git a/doc/source/visualization.rst b/doc/source/user_guide/visualization.rst similarity index 100% rename from doc/source/visualization.rst rename to doc/source/user_guide/visualization.rst diff --git a/doc/source/whatsnew/v0.24.0.rst b/doc/source/whatsnew/v0.24.0.rst index 69b59793f7c0d..a49ea2cf493a6 100644 --- a/doc/source/whatsnew/v0.24.0.rst +++ b/doc/source/whatsnew/v0.24.0.rst @@ -1,33 +1,49 @@ .. _whatsnew_0240: -What's New in 0.24.0 (January XX, 2019) +What's New in 0.24.0 (January 25, 2019) --------------------------------------- .. warning:: The 0.24.x series of releases will be the last to support Python 2. Future feature - releases will support Python 3 only. See :ref:`install.dropping-27` for more. + releases will support Python 3 only. See :ref:`install.dropping-27` for more + details. {{ header }} -These are the changes in pandas 0.24.0. See :ref:`release` for a full changelog -including other versions of pandas. +This is a major release from 0.23.4 and includes a number of API changes, new +features, enhancements, and performance improvements along with a large number +of bug fixes. -Highlights include +Highlights include: -* :ref:`Optional Nullable Integer Support ` +* :ref:`Optional Integer NA Support ` * :ref:`New APIs for accessing the array backing a Series or Index ` * :ref:`A new top-level method for creating arrays ` * :ref:`Store Interval and Period data in a Series or DataFrame ` * :ref:`Support for joining on two MultiIndexes ` + +Check the :ref:`API Changes ` and :ref:`deprecations ` before updating. + +These are the changes in pandas 0.24.0. See :ref:`release` for a full changelog +including other versions of pandas. + + +Enhancements +~~~~~~~~~~~~ + .. _whatsnew_0240.enhancements.intna: Optional Integer NA Support ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Pandas has gained the ability to hold integer dtypes with missing values. This long requested feature is enabled through the use of :ref:`extension types `. -Here is an example of the usage. + +.. note:: + + IntegerArray is currently experimental. Its API or implementation may + change without warning. We can construct a ``Series`` with the specified dtype. The dtype string ``Int64`` is a pandas ``ExtensionDtype``. Specifying a list or array using the traditional missing value marker of ``np.nan`` will infer to integer dtype. The display of the ``Series`` will also use the ``NaN`` to indicate missing values in string outputs. (:issue:`20700`, :issue:`20747`, :issue:`22441`, :issue:`21789`, :issue:`22346`) @@ -57,7 +73,7 @@ Operations on these dtypes will propagate ``NaN`` as other pandas operations. # coerce when needed s + 0.01 -These dtypes can operate as part of of ``DataFrame``. +These dtypes can operate as part of a ``DataFrame``. .. ipython:: python @@ -66,7 +82,7 @@ These dtypes can operate as part of of ``DataFrame``. 
df.dtypes -These dtypes can be merged & reshaped & casted. +These dtypes can be merged, reshaped, and casted. .. ipython:: python @@ -109,6 +125,7 @@ a new ndarray of period objects each time. .. ipython:: python + idx.values id(idx.values) id(idx.values) @@ -121,7 +138,7 @@ If you need an actual NumPy array, use :meth:`Series.to_numpy` or :meth:`Index.t For Series and Indexes backed by normal NumPy arrays, :attr:`Series.array` will return a new :class:`arrays.PandasArray`, which is a thin (no-copy) wrapper around a -:class:`numpy.ndarray`. :class:`arrays.PandasArray` isn't especially useful on its own, +:class:`numpy.ndarray`. :class:`~arrays.PandasArray` isn't especially useful on its own, but it does provide the same interface as any extension array defined in pandas or by a third-party library. @@ -139,14 +156,13 @@ See :ref:`Dtypes ` and :ref:`Attributes and Underlying Data `, including -extension arrays registered by :ref:`3rd party libraries `. See - -See :ref:`Dtypes ` for more on extension arrays. +extension arrays registered by :ref:`3rd party libraries `. +See the :ref:`dtypes docs ` for more on extension arrays. .. ipython:: python @@ -155,15 +171,15 @@ See :ref:`Dtypes ` for more on extension arrays. Passing data for which there isn't dedicated extension type (e.g. float, integer, etc.) will return a new :class:`arrays.PandasArray`, which is just a thin (no-copy) -wrapper around a :class:`numpy.ndarray` that satisfies the extension array interface. +wrapper around a :class:`numpy.ndarray` that satisfies the pandas extension array interface. .. ipython:: python pd.array([1, 2, 3]) -On their own, a :class:`arrays.PandasArray` isn't a very useful object. +On their own, a :class:`~arrays.PandasArray` isn't a very useful object. But if you need write low-level code that works generically for any -:class:`~pandas.api.extensions.ExtensionArray`, :class:`arrays.PandasArray` +:class:`~pandas.api.extensions.ExtensionArray`, :class:`~arrays.PandasArray` satisfies that need. Notice that by default, if no ``dtype`` is specified, the dtype of the returned @@ -194,7 +210,7 @@ For periods: .. ipython:: python - pser = pd.Series(pd.date_range("2000", freq="D", periods=5)) + pser = pd.Series(pd.period_range("2000", freq="D", periods=5)) pser pser.dtype @@ -210,6 +226,9 @@ from the ``Series``: ser.array pser.array +These return an instance of :class:`arrays.IntervalArray` or :class:`arrays.PeriodArray`, +the new extension arrays that back interval and period data. + .. warning:: For backwards compatibility, :attr:`Series.values` continues to return @@ -226,7 +245,7 @@ from the ``Series``: Joining with two multi-indexes ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -:func:`DataFrame.merge` and :func:`DataFrame.join` can now be used to join multi-indexed ``Dataframe`` instances on the overlaping index levels (:issue:`6360`) +:func:`DataFrame.merge` and :func:`DataFrame.join` can now be used to join multi-indexed ``Dataframe`` instances on the overlapping index levels (:issue:`6360`) See the :ref:`Merge, join, and concatenate ` documentation section. @@ -256,23 +275,6 @@ For earlier versions this can be done using the following. pd.merge(left.reset_index(), right.reset_index(), on=['key'], how='inner').set_index(['key', 'X', 'Y']) - -.. _whatsnew_0240.enhancements.extension_array_operators: - -``ExtensionArray`` operator support -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -A ``Series`` based on an ``ExtensionArray`` now supports arithmetic and comparison -operators (:issue:`19577`). 
There are two approaches for providing operator support for an ``ExtensionArray``: - -1. Define each of the operators on your ``ExtensionArray`` subclass. -2. Use an operator implementation from pandas that depends on operators that are already defined - on the underlying elements (scalars) of the ``ExtensionArray``. - -See the :ref:`ExtensionArray Operator Support -` documentation section for details on both -ways of adding operator support. - .. _whatsnew_0240.enhancements.read_html: ``read_html`` Enhancements @@ -332,7 +334,7 @@ convenient way to apply users' predefined styling functions, and can help reduce df.style.pipe(format_and_align).set_caption('Summary of results.') Similar methods already exist for other classes in pandas, including :meth:`DataFrame.pipe`, -:meth:`pandas.core.groupby.GroupBy.pipe`, and :meth:`pandas.core.resample.Resampler.pipe`. +:meth:`GroupBy.pipe() `, and :meth:`Resampler.pipe() `. .. _whatsnew_0240.enhancements.rename_axis: @@ -340,7 +342,7 @@ Renaming names in a MultiIndex ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ :func:`DataFrame.rename_axis` now supports ``index`` and ``columns`` arguments -and :func:`Series.rename_axis` supports ``index`` argument (:issue:`19978`) +and :func:`Series.rename_axis` supports ``index`` argument (:issue:`19978`). This change allows a dictionary to be passed so that some of the names of a ``MultiIndex`` can be changed. @@ -368,13 +370,13 @@ Other Enhancements - :func:`DataFrame.to_parquet` now accepts ``index`` as an argument, allowing the user to override the engine's default behavior to include or omit the dataframe's indexes from the resulting Parquet file. (:issue:`20768`) +- :func:`read_feather` now accepts ``columns`` as an argument, allowing the user to specify which columns should be read. (:issue:`24025`) - :meth:`DataFrame.corr` and :meth:`Series.corr` now accept a callable for generic calculation methods of correlation, e.g. histogram intersection (:issue:`22684`) - :func:`DataFrame.to_string` now accepts ``decimal`` as an argument, allowing the user to specify which decimal separator should be used in the output. (:issue:`23614`) -- :func:`read_feather` now accepts ``columns`` as an argument, allowing the user to specify which columns should be read. (:issue:`24025`) - :func:`DataFrame.to_html` now accepts ``render_links`` as an argument, allowing the user to generate HTML with links to any URLs that appear in the DataFrame. See the :ref:`section on writing HTML ` in the IO docs for example usage. (:issue:`2679`) - :func:`pandas.read_csv` now supports pandas extension types as an argument to ``dtype``, allowing the user to use pandas extension types when reading CSVs. (:issue:`23228`) -- :meth:`DataFrame.shift` :meth:`Series.shift`, :meth:`ExtensionArray.shift`, :meth:`SparseArray.shift`, :meth:`Period.shift`, :meth:`GroupBy.shift`, :meth:`Categorical.shift`, :meth:`NDFrame.shift` and :meth:`Block.shift` now accept `fill_value` as an argument, allowing the user to specify a value which will be used instead of NA/NaT in the empty periods. (:issue:`15486`) +- The :meth:`~DataFrame.shift` method now accepts `fill_value` as an argument, allowing the user to specify a value which will be used instead of NA/NaT in the empty periods. 
(:issue:`15486`)
 - :func:`to_datetime` now supports the ``%Z`` and ``%z`` directive when passed into ``format`` (:issue:`13486`)
 - :func:`Series.mode` and :func:`DataFrame.mode` now support the ``dropna`` parameter which can be used to specify whether ``NaN``/``NaT`` values should be considered (:issue:`17534`)
 - :func:`DataFrame.to_csv` and :func:`Series.to_csv` now support the ``compression`` keyword when a file handle is passed. (:issue:`21227`)
@@ -396,20 +398,21 @@ Other Enhancements
   The default compression for ``to_csv``, ``to_json``, and ``to_pickle`` methods has been updated to ``'infer'`` (:issue:`22004`).
 - :meth:`DataFrame.to_sql` now supports writing ``TIMESTAMP WITH TIME ZONE`` types for supported databases. For databases that don't support timezones, datetime data will be stored as timezone unaware local timestamps. See the :ref:`io.sql_datetime_data` for implications (:issue:`9086`).
 - :func:`to_timedelta` now supports ISO-formatted timedelta strings (:issue:`21877`)
-- :class:`Series` and :class:`DataFrame` now support :class:`Iterable` in constructor (:issue:`2193`)
+- :class:`Series` and :class:`DataFrame` now support :class:`Iterable` objects in the constructor (:issue:`2193`)
 - :class:`DatetimeIndex` has gained the :attr:`DatetimeIndex.timetz` attribute. This returns the local time with timezone information. (:issue:`21358`)
-- :meth:`Timestamp.round`, :meth:`Timestamp.ceil`, and :meth:`Timestamp.floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support an ``ambiguous`` argument for handling datetimes that are rounded to ambiguous times (:issue:`18946`)
-- :meth:`Timestamp.round`, :meth:`Timestamp.ceil`, and :meth:`Timestamp.floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support a ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`22647`)
-- :class:`pandas.core.resample.Resampler` now is iterable like :class:`pandas.core.groupby.GroupBy` (:issue:`15314`).
+- :meth:`~Timestamp.round`, :meth:`~Timestamp.ceil`, and :meth:`~Timestamp.floor` for :class:`DatetimeIndex` and :class:`Timestamp`
+  now support an ``ambiguous`` argument for handling datetimes that are rounded to ambiguous times (:issue:`18946`)
+  and a ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`22647`)
+- The result of :meth:`~DataFrame.resample` is now iterable similar to ``groupby()`` (:issue:`15314`).
 - :meth:`Series.resample` and :meth:`DataFrame.resample` have gained the :meth:`pandas.core.resample.Resampler.quantile` (:issue:`15023`).
 - :meth:`DataFrame.resample` and :meth:`Series.resample` with a :class:`PeriodIndex` will now respect the ``base`` argument in the same fashion as with a :class:`DatetimeIndex`. (:issue:`23882`)
 - :meth:`pandas.api.types.is_list_like` has gained a keyword ``allow_sets`` which is ``True`` by default; if ``False``, all instances of ``set`` will not be considered "list-like" anymore (:issue:`23061`)
 - :meth:`Index.to_frame` now supports overriding column name(s) (:issue:`22580`).
 - :meth:`Categorical.from_codes` now can take a ``dtype`` parameter as an alternative to passing ``categories`` and ``ordered`` (:issue:`24398`).
-- New attribute :attr:`__git_version__` will return git commit sha of current build (:issue:`21295`).
+- New attribute ``__git_version__`` will return git commit sha of current build (:issue:`21295`).
 - Compatibility with Matplotlib 3.0 (:issue:`22790`).
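[Editor's note, not part of the patch: two of the smaller 0.24.0 enhancements listed here are easy to demo. A sketch against the 0.24 API; the sample values are mine:]

```python
import pandas as pd

s = pd.Series([1, 2, 3])
s.shift(1, fill_value=0)   # 0, 1, 2 rather than NaN, 1, 2

# %z is now honored when an explicit format is given
pd.to_datetime('2019-01-25 10:00 +0100', format='%Y-%m-%d %H:%M %z')
```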
-- Added :meth:`Interval.overlaps`, :meth:`IntervalArray.overlaps`, and :meth:`IntervalIndex.overlaps` for determining overlaps between interval-like objects (:issue:`21998`) +- Added :meth:`Interval.overlaps`, :meth:`arrays.IntervalArray.overlaps`, and :meth:`IntervalIndex.overlaps` for determining overlaps between interval-like objects (:issue:`21998`) - :func:`read_fwf` now accepts keyword ``infer_nrows`` (:issue:`15138`). - :func:`~DataFrame.to_parquet` now supports writing a ``DataFrame`` as a directory of parquet files partitioned by a subset of the columns when ``engine = 'pyarrow'`` (:issue:`23283`) - :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have gained the ``nonexistent`` argument for alternative handling of nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`8917`, :issue:`24466`) @@ -418,12 +421,11 @@ Other Enhancements - :meth:`MultiIndex.to_flat_index` has been added to flatten multiple levels into a single-level :class:`Index` object. - :meth:`DataFrame.to_stata` and :class:`pandas.io.stata.StataWriter117` can write mixed sting columns to Stata strl format (:issue:`23633`) - :meth:`DataFrame.between_time` and :meth:`DataFrame.at_time` have gained the ``axis`` parameter (:issue:`8839`) -- The ``scatter_matrix``, ``andrews_curves``, ``parallel_coordinates``, ``lag_plot``, ``autocorrelation_plot``, ``bootstrap_plot``, and ``radviz`` plots from the ``pandas.plotting`` module are now accessible from calling :meth:`DataFrame.plot` (:issue:`11978`) - :meth:`DataFrame.to_records` now accepts ``index_dtypes`` and ``column_dtypes`` parameters to allow different data types in stored column and index records (:issue:`18146`) - :class:`IntervalIndex` has gained the :attr:`~IntervalIndex.is_overlapping` attribute to indicate if the ``IntervalIndex`` contains any overlapping intervals (:issue:`23309`) - :func:`pandas.DataFrame.to_sql` has gained the ``method`` argument to control SQL insertion clause. See the :ref:`insertion method ` section in the documentation. (:issue:`8953`) - :meth:`DataFrame.corrwith` now supports Spearman's rank correlation, Kendall's tau as well as callable correlation methods. (:issue:`21925`) -- :meth:`DataFrame.to_json`, :meth:`DataFrame.to_csv`, :meth:`DataFrame.to_pickle`, and :meth:`DataFrame.to_XXX` etc. now support tilde(~) in path argument. (:issue:`23473`) +- :meth:`DataFrame.to_json`, :meth:`DataFrame.to_csv`, :meth:`DataFrame.to_pickle`, and other export methods now support tilde(~) in path argument. (:issue:`23473`) .. _whatsnew_0240.api_breaking: @@ -435,8 +437,8 @@ Pandas 0.24.0 includes a number of API breaking changes. .. _whatsnew_0240.api_breaking.deps: -Dependencies have increased minimum versions -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Increased minimum versions for dependencies +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We have updated our minimum supported versions of dependencies (:issue:`21242`, :issue:`18742`, :issue:`23774`, :issue:`24767`). If installed, we now require: @@ -646,6 +648,52 @@ that the dates have been converted to UTC pd.to_datetime(["2015-11-18 15:30:00+05:30", "2015-11-18 16:30:00+06:30"], utc=True) + +.. _whatsnew_0240.api_breaking.read_csv_mixed_tz: + +Parsing mixed-timezones with :func:`read_csv` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +:func:`read_csv` no longer silently converts mixed-timezone columns to UTC (:issue:`24987`). + +*Previous Behavior* + +.. code-block:: python + + >>> import io + >>> content = """\ + ... a + ... 
2000-01-01T00:00:00+05:00 + ... 2000-01-01T00:00:00+06:00""" + >>> df = pd.read_csv(io.StringIO(content), parse_dates=['a']) + >>> df.a + 0 1999-12-31 19:00:00 + 1 1999-12-31 18:00:00 + Name: a, dtype: datetime64[ns] + +*New Behavior* + +.. ipython:: python + + import io + content = """\ + a + 2000-01-01T00:00:00+05:00 + 2000-01-01T00:00:00+06:00""" + df = pd.read_csv(io.StringIO(content), parse_dates=['a']) + df.a + +As can be seen, the ``dtype`` is object; each value in the column is a string. +To convert the strings to an array of datetimes, the ``date_parser`` argument + +.. ipython:: python + + df = pd.read_csv(io.StringIO(content), parse_dates=['a'], + date_parser=lambda col: pd.to_datetime(col, utc=True)) + df.a + +See :ref:`whatsnew_0240.api.timezone_offset_parsing` for more. + .. _whatsnew_0240.api_breaking.period_end_time: Time values in ``dt.end_time`` and ``to_timestamp(how='end')`` @@ -1164,17 +1212,19 @@ Other API Changes .. _whatsnew_0240.api.extension: -ExtensionType Changes -^^^^^^^^^^^^^^^^^^^^^ +Extension Type Changes +~~~~~~~~~~~~~~~~~~~~~~ **Equality and Hashability** -Pandas now requires that extension dtypes be hashable. The base class implements +Pandas now requires that extension dtypes be hashable (i.e. the respective +``ExtensionDtype`` objects; hashability is not a requirement for the values +of the corresponding ``ExtensionArray``). The base class implements a default ``__eq__`` and ``__hash__``. If you have a parametrized dtype, you should update the ``ExtensionDtype._metadata`` tuple to match the signature of your ``__init__`` method. See :class:`pandas.api.extensions.ExtensionDtype` for more (:issue:`22476`). -**Reshaping changes** +**New and changed methods** - :meth:`~pandas.api.types.ExtensionArray.dropna` has been added (:issue:`21185`) - :meth:`~pandas.api.types.ExtensionArray.repeat` has been added (:issue:`24349`) @@ -1192,9 +1242,25 @@ update the ``ExtensionDtype._metadata`` tuple to match the signature of your - Added :meth:`pandas.api.types.register_extension_dtype` to register an extension type with pandas (:issue:`22664`) - Updated the ``.type`` attribute for ``PeriodDtype``, ``DatetimeTZDtype``, and ``IntervalDtype`` to be instances of the dtype (``Period``, ``Timestamp``, and ``Interval`` respectively) (:issue:`22938`) +.. _whatsnew_0240.enhancements.extension_array_operators: + +**Operator support** + +A ``Series`` based on an ``ExtensionArray`` now supports arithmetic and comparison +operators (:issue:`19577`). There are two approaches for providing operator support for an ``ExtensionArray``: + +1. Define each of the operators on your ``ExtensionArray`` subclass. +2. Use an operator implementation from pandas that depends on operators that are already defined + on the underlying elements (scalars) of the ``ExtensionArray``. + +See the :ref:`ExtensionArray Operator Support +` documentation section for details on both +ways of adding operator support. + **Other changes** - A default repr for :class:`pandas.api.extensions.ExtensionArray` is now provided (:issue:`23601`). +- :meth:`ExtensionArray._formatting_values` is deprecated. Use :attr:`ExtensionArray._formatter` instead. (:issue:`23601`) - An ``ExtensionArray`` with a boolean dtype now works correctly as a boolean indexer. 
:meth:`pandas.api.types.is_bool_dtype` now properly considers them boolean (:issue:`22326`) **Bug Fixes** @@ -1243,7 +1309,6 @@ Deprecations - The methods :meth:`DataFrame.update` and :meth:`Panel.update` have deprecated the ``raise_conflict=False|True`` keyword in favor of ``errors='ignore'|'raise'`` (:issue:`23585`) - The methods :meth:`Series.str.partition` and :meth:`Series.str.rpartition` have deprecated the ``pat`` keyword in favor of ``sep`` (:issue:`22676`) - Deprecated the ``nthreads`` keyword of :func:`pandas.read_feather` in favor of ``use_threads`` to reflect the changes in ``pyarrow>=0.11.0``. (:issue:`23053`) -- :meth:`ExtensionArray._formatting_values` is deprecated. Use :attr:`ExtensionArray._formatter` instead. (:issue:`23601`) - :func:`pandas.read_excel` has deprecated accepting ``usecols`` as an integer. Please pass in a list of ints from 0 to ``usecols`` inclusive instead (:issue:`23527`) - Constructing a :class:`TimedeltaIndex` from data with ``datetime64``-dtyped data is deprecated, will raise ``TypeError`` in a future version (:issue:`23539`) - Constructing a :class:`DatetimeIndex` from data with ``timedelta64``-dtyped data is deprecated, will raise ``TypeError`` in a future version (:issue:`23675`) @@ -1692,8 +1757,8 @@ Missing - Bug in :func:`Series.hasnans` that could be incorrectly cached and return incorrect answers if null elements are introduced after an initial call (:issue:`19700`) - :func:`Series.isin` now treats all NaN-floats as equal also for ``np.object``-dtype. This behavior is consistent with the behavior for float64 (:issue:`22119`) - :func:`unique` no longer mangles NaN-floats and the ``NaT``-object for ``np.object``-dtype, i.e. ``NaT`` is no longer coerced to a NaN-value and is treated as a different entity. (:issue:`22295`) -- :func:`DataFrame` and :func:`Series` now properly handle numpy masked arrays with hardened masks. Previously, constructing a DataFrame or Series from a masked array with a hard mask would create a pandas object containing the underlying value, rather than the expected NaN. (:issue:`24574`) - +- :class:`DataFrame` and :class:`Series` now properly handle numpy masked arrays with hardened masks. Previously, constructing a DataFrame or Series from a masked array with a hard mask would create a pandas object containing the underlying value, rather than the expected NaN. (:issue:`24574`) +- Bug in :class:`DataFrame` constructor where ``dtype`` argument was not honored when handling numpy masked record arrays. (:issue:`24874`) MultiIndex ^^^^^^^^^^ @@ -1751,6 +1816,8 @@ I/O - Bug in :meth:`DataFrame.to_stata`, :class:`pandas.io.stata.StataWriter` and :class:`pandas.io.stata.StataWriter117` where a exception would leave a partially written and invalid dta file (:issue:`23573`) - Bug in :meth:`DataFrame.to_stata` and :class:`pandas.io.stata.StataWriter117` that produced invalid files when using strLs with non-ASCII characters (:issue:`23573`) - Bug in :class:`HDFStore` that caused it to raise ``ValueError`` when reading a Dataframe in Python 3 from fixed format written in Python 2 (:issue:`24510`) +- Bug in :func:`DataFrame.to_string()` and more generally in the floating ``repr`` formatter. Zeros were not trimmed if ``inf`` was present in a columns while it was the case with NA values. Zeros are now trimmed as in the presence of NA (:issue:`24861`). +- Bug in the ``repr`` when truncating the number of columns and having a wide last column (:issue:`24849`). 
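[Editor's note, not part of the patch: the ``read_csv`` hunk earlier in this file breaks off mid-sentence ("...the ``date_parser`` argument"); the intended workaround, taken from the hunk's own example, is to route the column through ``to_datetime(..., utc=True)``:]

```python
import io
import pandas as pd

content = """\
a
2000-01-01T00:00:00+05:00
2000-01-01T00:00:00+06:00"""

# Mixed offsets now parse to object-dtype strings by default;
# passing date_parser opts back in to UTC conversion
df = pd.read_csv(io.StringIO(content), parse_dates=['a'],
                 date_parser=lambda col: pd.to_datetime(col, utc=True))
df.a.dtype   # datetime64[ns, UTC]
```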
Plotting ^^^^^^^^ @@ -1786,6 +1853,7 @@ Groupby/Resample/Rolling - Bug in :meth:`DataFrame.groupby` did not respect the ``observed`` argument when selecting a column and instead always used ``observed=False`` (:issue:`23970`) - Bug in :func:`pandas.core.groupby.SeriesGroupBy.pct_change` or :func:`pandas.core.groupby.DataFrameGroupBy.pct_change` would previously work across groups when calculating the percent change, where it now correctly works per group (:issue:`21200`, :issue:`21235`). - Bug preventing hash table creation with very large number (2^32) of rows (:issue:`22805`) +- Bug in groupby when grouping on categorical causes ``ValueError`` and incorrect grouping if ``observed=True`` and ``nan`` is present in categorical column (:issue:`24740`, :issue:`21151`). Reshaping ^^^^^^^^^ @@ -1821,7 +1889,6 @@ Reshaping - Bug in :func:`DataFrame.unstack` where a ``ValueError`` was raised when unstacking timezone aware values (:issue:`18338`) - Bug in :func:`DataFrame.stack` where timezone aware values were converted to timezone naive values (:issue:`19420`) - Bug in :func:`merge_asof` where a ``TypeError`` was raised when ``by_col`` were timezone aware values (:issue:`21184`) -- Bug in :func:`merge` when merging by index name would sometimes result in an incorrectly numbered index (:issue:`24212`) - Bug showing an incorrect shape when throwing error during ``DataFrame`` construction. (:issue:`20742`) .. _whatsnew_0240.bug_fixes.sparse: diff --git a/doc/source/whatsnew/v0.24.1.rst b/doc/source/whatsnew/v0.24.1.rst index ee4b7ab62b31a..be0a2eb682e87 100644 --- a/doc/source/whatsnew/v0.24.1.rst +++ b/doc/source/whatsnew/v0.24.1.rst @@ -2,8 +2,8 @@ .. _whatsnew_0241: -Whats New in 0.24.1 (February XX, 2019) ---------------------------------------- +Whats New in 0.24.1 (February 3, 2019) +-------------------------------------- .. warning:: @@ -13,61 +13,69 @@ Whats New in 0.24.1 (February XX, 2019) {{ header }} These are the changes in pandas 0.24.1. See :ref:`release` for a full changelog -including other versions of pandas. +including other versions of pandas. See :ref:`whatsnew_0240` for the 0.24.0 changelog. +.. _whatsnew_0241.api: -.. _whatsnew_0241.enhancements: +API Changes +~~~~~~~~~~~ -Enhancements -^^^^^^^^^^^^ +Changing the ``sort`` parameter for :class:`Index` set operations +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +The default ``sort`` value for :meth:`Index.union` has changed from ``True`` to ``None`` (:issue:`24959`). +The default *behavior*, however, remains the same: the result is sorted, unless -.. _whatsnew_0241.bug_fixes: - -Bug Fixes -~~~~~~~~~ +1. ``self`` and ``other`` are identical +2. ``self`` or ``other`` is empty +3. ``self`` or ``other`` contain values that can not be compared (a ``RuntimeWarning`` is raised). -**Conversion** +This change will allow ``sort=True`` to mean "always sort" in a future release. -- -- -- +The same change applies to :meth:`Index.difference` and :meth:`Index.symmetric_difference`, which +would not sort the result when the values could not be compared. -**Indexing** +The `sort` option for :meth:`Index.intersection` has changed in three ways. -- -- -- +1. The default has changed from ``True`` to ``False``, to restore the + pandas 0.23.4 and earlier behavior of not sorting by default. +2. The behavior of ``sort=True`` can now be obtained with ``sort=None``. + This will sort the result only if the values in ``self`` and ``other`` + are not identical. +3. The value ``sort=True`` is no longer allowed. 
A future version of pandas + will properly support ``sort=True`` meaning "always sort". -**I/O** +.. _whatsnew_0241.regressions: -- -- -- +Fixed Regressions +~~~~~~~~~~~~~~~~~ -**Categorical** +- Fixed regression in :meth:`DataFrame.to_dict` with ``records`` orient raising an + ``AttributeError`` when the ``DataFrame`` contained more than 255 columns, or + wrongly converting column names that were not valid python identifiers (:issue:`24939`, :issue:`24940`). +- Fixed regression in :func:`read_sql` when passing certain queries with MySQL/pymysql (:issue:`24988`). +- Fixed regression in :class:`Index.intersection` incorrectly sorting the values by default (:issue:`24959`). +- Fixed regression in :func:`merge` when merging an empty ``DataFrame`` with multiple timezone-aware columns on one of the timezone-aware columns (:issue:`25014`). +- Fixed regression in :meth:`Series.rename_axis` and :meth:`DataFrame.rename_axis` where passing ``None`` failed to remove the axis name (:issue:`25034`) +- Fixed regression in :func:`to_timedelta` with `box=False` incorrectly returning a ``datetime64`` object instead of a ``timedelta64`` object (:issue:`24961`) +- Fixed regression where custom hashable types could not be used as column keys in :meth:`DataFrame.set_index` (:issue:`24969`) -- -- -- +.. _whatsnew_0241.bug_fixes: -**Timezones** +Bug Fixes +~~~~~~~~~ -- -- -- +**Reshaping** -**Timedelta** +- Bug in :meth:`DataFrame.groupby` with :class:`Grouper` when there is a time change (DST) and grouping frequency is ``'1d'`` (:issue:`24972`) -- -- -- +**Visualization** +- Fixed the warning for implicitly registered matplotlib converters not showing. See :ref:`whatsnew_0211.converters` for more (:issue:`24963`). **Other** -- -- +- Fixed AttributeError when printing a DataFrame's HTML repr after accessing the IPython config object (:issue:`25036`) .. _whatsnew_0.241.contributors: diff --git a/doc/source/whatsnew/v0.24.2.rst b/doc/source/whatsnew/v0.24.2.rst new file mode 100644 index 0000000000000..b0f287cf0b9f6 --- /dev/null +++ b/doc/source/whatsnew/v0.24.2.rst @@ -0,0 +1,100 @@ +:orphan: + +.. _whatsnew_0242: + +Whats New in 0.24.2 (February XX, 2019) +--------------------------------------- + +.. warning:: + + The 0.24.x series of releases will be the last to support Python 2. Future feature + releases will support Python 3 only. See :ref:`install.dropping-27` for more. + +{{ header }} + +These are the changes in pandas 0.24.2. See :ref:`release` for a full changelog +including other versions of pandas. + +.. _whatsnew_0242.regressions: + +Fixed Regressions +^^^^^^^^^^^^^^^^^ + +- Fixed regression in :meth:`DataFrame.all` and :meth:`DataFrame.any` where ``bool_only=True`` was ignored (:issue:`25101`) +- Fixed issue in ``DataFrame`` construction with passing a mixed list of mixed types could segfault. (:issue:`25075`) +- Fixed regression in :meth:`DataFrame.apply` causing ``RecursionError`` when ``dict``-like classes were passed as argument. (:issue:`25196`) + +.. _whatsnew_0242.enhancements: + +Enhancements +^^^^^^^^^^^^ + +- +- + +.. 
_whatsnew_0242.bug_fixes: + +Bug Fixes +~~~~~~~~~ + +**Conversion** + +- +- +- + +**Indexing** + +- +- +- + +**I/O** + +- Bug in reading a HDF5 table-format ``DataFrame`` created in Python 2, in Python 3 (:issue:`24925`) +- Bug in reading a JSON with ``orient='table'`` generated by :meth:`DataFrame.to_json` with ``index=False`` (:issue:`25170`) +- Bug where float indexes could have misaligned values when printing (:issue:`25061`) +- + +**Categorical** + +- +- +- + +**Timezones** + +- +- +- + +**Timedelta** + +- +- +- + +**Reshaping** + +- +- +- + +**Visualization** + +- +- +- + +**Other** + +- Bug in :meth:`Series.is_unique` where single occurrences of ``NaN`` were not considered unique (:issue:`25180`) +- +- + +.. _whatsnew_0.242.contributors: + +Contributors +~~~~~~~~~~~~ + +.. contributors:: v0.24.1..v0.24.2 diff --git a/doc/source/whatsnew/v0.25.0.rst b/doc/source/whatsnew/v0.25.0.rst index 24298b3025169..601c230296f7d 100644 --- a/doc/source/whatsnew/v0.25.0.rst +++ b/doc/source/whatsnew/v0.25.0.rst @@ -1,10 +1,13 @@ -:orphan: - .. _whatsnew_0250: What's New in 0.25.0 (April XX, 2019) ------------------------------------- +.. warning:: + + Starting with the 0.25.x series of releases, pandas only supports Python 3.5 and higher. + See :ref:`install.dropping-27` for more details. + {{ header }} These are the changes in pandas 0.25.0. See :ref:`release` for a full changelog @@ -16,10 +19,9 @@ including other versions of pandas. Other Enhancements ^^^^^^^^^^^^^^^^^^ +- :meth:`Timestamp.replace` now supports the ``fold`` argument to disambiguate DST transition times (:issue:`25017`) - - -- - .. _whatsnew_0250.api_breaking: @@ -40,16 +42,13 @@ Other API Changes Deprecations ~~~~~~~~~~~~ -- -- -- - +- Deprecated the `M (months)` and `Y (year)` `units` parameter of :func: `pandas.to_timedelta`, :func: `pandas.Timedelta` and :func: `pandas.TimedeltaIndex` (:issue:`16344`) .. _whatsnew_0250.prior_deprecations: Removal of prior version deprecations/changes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - +- Removed (parts of) :class:`Panel` (:issue:`25047`,:issue:`25191`,:issue:`25231`) - - - @@ -59,8 +58,8 @@ Removal of prior version deprecations/changes Performance Improvements ~~~~~~~~~~~~~~~~~~~~~~~~ -- -- +- Significant speedup in `SparseArray` initialization that benefits most operations, fixing performance regression introduced in v0.20.0 (:issue:`24985`) +- `DataFrame.to_stata()` is now faster when outputting data with any string or non-native endian columns (:issue:`25045`) - @@ -93,13 +92,15 @@ Timedelta Timezones ^^^^^^^^^ -- +- Bug in :func:`to_datetime` with ``utc=True`` and datetime strings that would apply previously parsed UTC offsets to subsequent arguments (:issue:`24992`) - - Numeric ^^^^^^^ +- Bug in :meth:`to_numeric` in which large negative numbers were being improperly handled (:issue:`24910`) +- Bug in :meth:`to_numeric` in which numbers were being coerced to float, even though ``errors`` was not ``coerce`` (:issue:`24910`) - - - @@ -153,9 +154,11 @@ MultiIndex I/O ^^^ +- Fixed bug in missing text when using :meth:`to_clipboard` if copying utf-16 characters in Python 3 on Windows (:issue:`25040`) - Bug in :func:`DataFrame.to_html()` where values were truncated using display options instead of outputting the full content (:issue:`17004`) - - +- Plotting @@ -176,15 +179,16 @@ Groupby/Resample/Rolling Reshaping ^^^^^^^^^ -- -- +- Bug in :func:`pandas.merge` adds a string of ``None`` if ``None`` is assigned in suffixes instead of remain the column name as-is (:issue:`24782`). 
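[Editor's note, not part of the patch: a sketch of the ``suffixes`` fix described in the bullet above (:issue:`24782`), assuming the post-fix 0.25 behavior; the toy frames are mine:]

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2], 'val': [10, 20]})
right = pd.DataFrame({'key': [1, 2], 'val': [30, 40]})

# A None suffix now leaves that side's column name untouched
# (previously the literal string 'None' was appended)
pd.merge(left, right, on='key', suffixes=('_left', None))
# resulting columns: key, val_left, val
```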
+- Bug in :func:`merge` when merging by index name would sometimes result in an incorrectly numbered index (:issue:`24212`) +- :func:`to_records` now accepts dtypes to its `column_dtypes` parameter (:issue:`24895`) - Sparse ^^^^^^ -- +- Significant speedup in `SparseArray` initialization that benefits most operations, fixing performance regression introduced in v0.20.0 (:issue:`24985`) - - @@ -203,4 +207,3 @@ Contributors ~~~~~~~~~~~~ .. contributors:: v0.24.x..HEAD - diff --git a/pandas/_libs/algos.pyx b/pandas/_libs/algos.pyx index b3c519ab99b6e..663411ad984c2 100644 --- a/pandas/_libs/algos.pyx +++ b/pandas/_libs/algos.pyx @@ -76,7 +76,7 @@ class NegInfinity(object): @cython.wraparound(False) @cython.boundscheck(False) -cpdef ndarray[int64_t, ndim=1] unique_deltas(ndarray[int64_t] arr): +cpdef ndarray[int64_t, ndim=1] unique_deltas(const int64_t[:] arr): """ Efficiently find the unique first-differences of the given array. @@ -150,7 +150,7 @@ def is_lexsorted(list_of_arrays: list) -> bint: @cython.boundscheck(False) @cython.wraparound(False) -def groupsort_indexer(ndarray[int64_t] index, Py_ssize_t ngroups): +def groupsort_indexer(const int64_t[:] index, Py_ssize_t ngroups): """ compute a 1-d indexer that is an ordering of the passed index, ordered by the groups. This is a reverse of the label @@ -230,7 +230,7 @@ def kth_smallest(numeric[:] a, Py_ssize_t k) -> numeric: @cython.boundscheck(False) @cython.wraparound(False) -def nancorr(ndarray[float64_t, ndim=2] mat, bint cov=0, minp=None): +def nancorr(const float64_t[:, :] mat, bint cov=0, minp=None): cdef: Py_ssize_t i, j, xi, yi, N, K bint minpv @@ -294,7 +294,7 @@ def nancorr(ndarray[float64_t, ndim=2] mat, bint cov=0, minp=None): @cython.boundscheck(False) @cython.wraparound(False) -def nancorr_spearman(ndarray[float64_t, ndim=2] mat, Py_ssize_t minp=1): +def nancorr_spearman(const float64_t[:, :] mat, Py_ssize_t minp=1): cdef: Py_ssize_t i, j, xi, yi, N, K ndarray[float64_t, ndim=2] result @@ -435,8 +435,8 @@ def pad(ndarray[algos_t] old, ndarray[algos_t] new, limit=None): @cython.boundscheck(False) @cython.wraparound(False) -def pad_inplace(ndarray[algos_t] values, - ndarray[uint8_t, cast=True] mask, +def pad_inplace(algos_t[:] values, + const uint8_t[:] mask, limit=None): cdef: Py_ssize_t i, N @@ -472,8 +472,8 @@ def pad_inplace(ndarray[algos_t] values, @cython.boundscheck(False) @cython.wraparound(False) -def pad_2d_inplace(ndarray[algos_t, ndim=2] values, - ndarray[uint8_t, ndim=2] mask, +def pad_2d_inplace(algos_t[:, :] values, + const uint8_t[:, :] mask, limit=None): cdef: Py_ssize_t i, j, N, K @@ -602,8 +602,8 @@ def backfill(ndarray[algos_t] old, ndarray[algos_t] new, limit=None): @cython.boundscheck(False) @cython.wraparound(False) -def backfill_inplace(ndarray[algos_t] values, - ndarray[uint8_t, cast=True] mask, +def backfill_inplace(algos_t[:] values, + const uint8_t[:] mask, limit=None): cdef: Py_ssize_t i, N @@ -639,8 +639,8 @@ def backfill_inplace(ndarray[algos_t] values, @cython.boundscheck(False) @cython.wraparound(False) -def backfill_2d_inplace(ndarray[algos_t, ndim=2] values, - ndarray[uint8_t, ndim=2] mask, +def backfill_2d_inplace(algos_t[:, :] values, + const uint8_t[:, :] mask, limit=None): cdef: Py_ssize_t i, j, N, K @@ -678,7 +678,7 @@ def backfill_2d_inplace(ndarray[algos_t, ndim=2] values, @cython.wraparound(False) @cython.boundscheck(False) -def arrmap(ndarray[algos_t] index, object func): +def arrmap(algos_t[:] index, object func): cdef: Py_ssize_t length = index.shape[0] Py_ssize_t i = 0 diff --git 
a/pandas/_libs/groupby.pyx b/pandas/_libs/groupby.pyx index e6036654c71c3..950ba3f89ffb7 100644 --- a/pandas/_libs/groupby.pyx +++ b/pandas/_libs/groupby.pyx @@ -2,6 +2,7 @@ import cython from cython import Py_ssize_t +from cython cimport floating from libc.stdlib cimport malloc, free @@ -382,5 +383,55 @@ def group_any_all(uint8_t[:] out, out[lab] = flag_val +@cython.wraparound(False) +@cython.boundscheck(False) +def _group_add(floating[:, :] out, + int64_t[:] counts, + floating[:, :] values, + const int64_t[:] labels, + Py_ssize_t min_count=0): + """ + Only aggregates on axis=0 + """ + cdef: + Py_ssize_t i, j, N, K, lab, ncounts = len(counts) + floating val, count + ndarray[floating, ndim=2] sumx, nobs + + if not len(values) == len(labels): + raise AssertionError("len(index) != len(labels)") + + nobs = np.zeros_like(out) + sumx = np.zeros_like(out) + + N, K = (values).shape + + with nogil: + + for i in range(N): + lab = labels[i] + if lab < 0: + continue + + counts[lab] += 1 + for j in range(K): + val = values[i, j] + + # not nan + if val == val: + nobs[lab, j] += 1 + sumx[lab, j] += val + + for i in range(ncounts): + for j in range(K): + if nobs[i, j] < min_count: + out[i, j] = NAN + else: + out[i, j] = sumx[i, j] + + +group_add_float32 = _group_add['float'] +group_add_float64 = _group_add['double'] + # generated from template include "groupby_helper.pxi" diff --git a/pandas/_libs/groupby_helper.pxi.in b/pandas/_libs/groupby_helper.pxi.in index abac9f147848e..db7018e1a7254 100644 --- a/pandas/_libs/groupby_helper.pxi.in +++ b/pandas/_libs/groupby_helper.pxi.in @@ -9,7 +9,7 @@ cdef extern from "numpy/npy_math.h": _int64_max = np.iinfo(np.int64).max # ---------------------------------------------------------------------- -# group_add, group_prod, group_var, group_mean, group_ohlc +# group_prod, group_var, group_mean, group_ohlc # ---------------------------------------------------------------------- {{py: @@ -29,57 +29,10 @@ def get_dispatch(dtypes): @cython.wraparound(False) @cython.boundscheck(False) -def group_add_{{name}}(ndarray[{{c_type}}, ndim=2] out, - ndarray[int64_t] counts, - ndarray[{{c_type}}, ndim=2] values, - ndarray[int64_t] labels, - Py_ssize_t min_count=0): - """ - Only aggregates on axis=0 - """ - cdef: - Py_ssize_t i, j, N, K, lab, ncounts = len(counts) - {{c_type}} val, count - ndarray[{{c_type}}, ndim=2] sumx, nobs - - if not len(values) == len(labels): - raise AssertionError("len(index) != len(labels)") - - nobs = np.zeros_like(out) - sumx = np.zeros_like(out) - - N, K = (values).shape - - with nogil: - - for i in range(N): - lab = labels[i] - if lab < 0: - continue - - counts[lab] += 1 - for j in range(K): - val = values[i, j] - - # not nan - if val == val: - nobs[lab, j] += 1 - sumx[lab, j] += val - - for i in range(ncounts): - for j in range(K): - if nobs[i, j] < min_count: - out[i, j] = NAN - else: - out[i, j] = sumx[i, j] - - -@cython.wraparound(False) -@cython.boundscheck(False) -def group_prod_{{name}}(ndarray[{{c_type}}, ndim=2] out, - ndarray[int64_t] counts, - ndarray[{{c_type}}, ndim=2] values, - ndarray[int64_t] labels, +def group_prod_{{name}}({{c_type}}[:, :] out, + int64_t[:] counts, + {{c_type}}[:, :] values, + const int64_t[:] labels, Py_ssize_t min_count=0): """ Only aggregates on axis=0 @@ -123,10 +76,10 @@ def group_prod_{{name}}(ndarray[{{c_type}}, ndim=2] out, @cython.wraparound(False) @cython.boundscheck(False) @cython.cdivision(True) -def group_var_{{name}}(ndarray[{{c_type}}, ndim=2] out, - ndarray[int64_t] counts, - ndarray[{{c_type}}, 
ndim=2] values, - ndarray[int64_t] labels, +def group_var_{{name}}({{c_type}}[:, :] out, + int64_t[:] counts, + {{c_type}}[:, :] values, + const int64_t[:] labels, Py_ssize_t min_count=-1): cdef: Py_ssize_t i, j, N, K, lab, ncounts = len(counts) @@ -175,10 +128,10 @@ def group_var_{{name}}(ndarray[{{c_type}}, ndim=2] out, @cython.wraparound(False) @cython.boundscheck(False) -def group_mean_{{name}}(ndarray[{{c_type}}, ndim=2] out, - ndarray[int64_t] counts, - ndarray[{{c_type}}, ndim=2] values, - ndarray[int64_t] labels, +def group_mean_{{name}}({{c_type}}[:, :] out, + int64_t[:] counts, + {{c_type}}[:, :] values, + const int64_t[:] labels, Py_ssize_t min_count=-1): cdef: Py_ssize_t i, j, N, K, lab, ncounts = len(counts) @@ -220,11 +173,11 @@ def group_mean_{{name}}(ndarray[{{c_type}}, ndim=2] out, @cython.wraparound(False) @cython.boundscheck(False) -def group_ohlc_{{name}}(ndarray[{{c_type}}, ndim=2] out, - ndarray[int64_t] counts, - ndarray[{{c_type}}, ndim=2] values, - ndarray[int64_t] labels, - Py_ssize_t min_count=-1): +def group_ohlc_{{name}}({{c_type}}[:, :] out, + int64_t[:] counts, + {{c_type}}[:, :] values, + const int64_t[:] labels, + Py_ssize_t min_count=-1): """ Only aggregates on axis=0 """ @@ -293,10 +246,10 @@ def get_dispatch(dtypes): @cython.wraparound(False) @cython.boundscheck(False) -def group_last_{{name}}(ndarray[{{c_type}}, ndim=2] out, - ndarray[int64_t] counts, - ndarray[{{c_type}}, ndim=2] values, - ndarray[int64_t] labels, +def group_last_{{name}}({{c_type}}[:, :] out, + int64_t[:] counts, + {{c_type}}[:, :] values, + const int64_t[:] labels, Py_ssize_t min_count=-1): """ Only aggregates on axis=0 @@ -350,10 +303,10 @@ def group_last_{{name}}(ndarray[{{c_type}}, ndim=2] out, @cython.wraparound(False) @cython.boundscheck(False) -def group_nth_{{name}}(ndarray[{{c_type}}, ndim=2] out, - ndarray[int64_t] counts, - ndarray[{{c_type}}, ndim=2] values, - ndarray[int64_t] labels, int64_t rank, +def group_nth_{{name}}({{c_type}}[:, :] out, + int64_t[:] counts, + {{c_type}}[:, :] values, + const int64_t[:] labels, int64_t rank, Py_ssize_t min_count=-1): """ Only aggregates on axis=0 @@ -411,9 +364,9 @@ def group_nth_{{name}}(ndarray[{{c_type}}, ndim=2] out, @cython.boundscheck(False) @cython.wraparound(False) -def group_rank_{{name}}(ndarray[float64_t, ndim=2] out, - ndarray[{{c_type}}, ndim=2] values, - ndarray[int64_t] labels, +def group_rank_{{name}}(float64_t[:, :] out, + {{c_type}}[:, :] values, + const int64_t[:] labels, bint is_datetimelike, object ties_method, bint ascending, bint pct, object na_option): """ @@ -606,10 +559,10 @@ ctypedef fused groupby_t: @cython.wraparound(False) @cython.boundscheck(False) -def group_max(ndarray[groupby_t, ndim=2] out, - ndarray[int64_t] counts, - ndarray[groupby_t, ndim=2] values, - ndarray[int64_t] labels, +def group_max(groupby_t[:, :] out, + int64_t[:] counts, + groupby_t[:, :] values, + const int64_t[:] labels, Py_ssize_t min_count=-1): """ Only aggregates on axis=0 @@ -669,10 +622,10 @@ def group_max(ndarray[groupby_t, ndim=2] out, @cython.wraparound(False) @cython.boundscheck(False) -def group_min(ndarray[groupby_t, ndim=2] out, - ndarray[int64_t] counts, - ndarray[groupby_t, ndim=2] values, - ndarray[int64_t] labels, +def group_min(groupby_t[:, :] out, + int64_t[:] counts, + groupby_t[:, :] values, + const int64_t[:] labels, Py_ssize_t min_count=-1): """ Only aggregates on axis=0 @@ -731,9 +684,9 @@ def group_min(ndarray[groupby_t, ndim=2] out, @cython.boundscheck(False) @cython.wraparound(False) -def 
group_cummin(ndarray[groupby_t, ndim=2] out, - ndarray[groupby_t, ndim=2] values, - ndarray[int64_t] labels, +def group_cummin(groupby_t[:, :] out, + groupby_t[:, :] values, + const int64_t[:] labels, bint is_datetimelike): """ Only transforms on axis=0 @@ -779,9 +732,9 @@ def group_cummin(ndarray[groupby_t, ndim=2] out, @cython.boundscheck(False) @cython.wraparound(False) -def group_cummax(ndarray[groupby_t, ndim=2] out, - ndarray[groupby_t, ndim=2] values, - ndarray[int64_t] labels, +def group_cummax(groupby_t[:, :] out, + groupby_t[:, :] values, + const int64_t[:] labels, bint is_datetimelike): """ Only transforms on axis=0 diff --git a/pandas/_libs/hashtable.pyx b/pandas/_libs/hashtable.pyx index 47fa5932290af..8d0c451ad0ab8 100644 --- a/pandas/_libs/hashtable.pyx +++ b/pandas/_libs/hashtable.pyx @@ -52,9 +52,10 @@ include "hashtable_class_helper.pxi" include "hashtable_func_helper.pxi" cdef class Factorizer: - cdef public PyObjectHashTable table - cdef public ObjectVector uniques - cdef public Py_ssize_t count + cdef public: + PyObjectHashTable table + ObjectVector uniques + Py_ssize_t count def __init__(self, size_hint): self.table = PyObjectHashTable(size_hint) @@ -96,9 +97,10 @@ cdef class Factorizer: cdef class Int64Factorizer: - cdef public Int64HashTable table - cdef public Int64Vector uniques - cdef public Py_ssize_t count + cdef public: + Int64HashTable table + Int64Vector uniques + Py_ssize_t count def __init__(self, size_hint): self.table = Int64HashTable(size_hint) @@ -140,7 +142,7 @@ cdef class Int64Factorizer: @cython.wraparound(False) @cython.boundscheck(False) -def unique_label_indices(ndarray[int64_t, ndim=1] labels): +def unique_label_indices(const int64_t[:] labels): """ indices of the first occurrences of the unique labels *excluding* -1. 
equivalent to: @@ -168,6 +170,6 @@ def unique_label_indices(ndarray[int64_t, ndim=1] labels): kh_destroy_int64(table) arr = idx.to_array() - arr = arr[labels[arr].argsort()] + arr = arr[np.asarray(labels)[arr].argsort()] return arr[1:] if arr.size != 0 and labels[arr[0]] == -1 else arr diff --git a/pandas/_libs/hashtable_class_helper.pxi.in b/pandas/_libs/hashtable_class_helper.pxi.in index eac35588b6fc3..3644928d8dedc 100644 --- a/pandas/_libs/hashtable_class_helper.pxi.in +++ b/pandas/_libs/hashtable_class_helper.pxi.in @@ -322,7 +322,7 @@ cdef class {{name}}HashTable(HashTable): self.table.vals[k] = values[i] @cython.boundscheck(False) - def map_locations(self, ndarray[{{dtype}}_t, ndim=1] values): + def map_locations(self, const {{dtype}}_t[:] values): cdef: Py_ssize_t i, n = len(values) int ret = 0 diff --git a/pandas/_libs/internals.pyx b/pandas/_libs/internals.pyx index 72a1cf16f96b6..f23d2666b4bf4 100644 --- a/pandas/_libs/internals.pyx +++ b/pandas/_libs/internals.pyx @@ -23,10 +23,11 @@ from pandas._libs.algos import ensure_int64 cdef class BlockPlacement: # __slots__ = '_as_slice', '_as_array', '_len' - cdef slice _as_slice - cdef object _as_array + cdef: + slice _as_slice + object _as_array - cdef bint _has_slice, _has_array, _is_known_slice_like + bint _has_slice, _has_array, _is_known_slice_like def __init__(self, val): cdef: diff --git a/pandas/_libs/interval.pyx b/pandas/_libs/interval.pyx index 3147f36dcc835..eb511b1adb28a 100644 --- a/pandas/_libs/interval.pyx +++ b/pandas/_libs/interval.pyx @@ -18,7 +18,6 @@ cnp.import_array() cimport pandas._libs.util as util -util.import_array() from pandas._libs.hashtable cimport Int64Vector, Int64VectorData diff --git a/pandas/_libs/join.pyx b/pandas/_libs/join.pyx index e4440ac3d9fd8..503867058b3c8 100644 --- a/pandas/_libs/join.pyx +++ b/pandas/_libs/join.pyx @@ -14,7 +14,7 @@ from pandas._libs.algos import groupsort_indexer, ensure_platform_int from pandas.core.algorithms import take_nd -def inner_join(ndarray[int64_t] left, ndarray[int64_t] right, +def inner_join(const int64_t[:] left, const int64_t[:] right, Py_ssize_t max_groups): cdef: Py_ssize_t i, j, k, count = 0 @@ -65,7 +65,7 @@ def inner_join(ndarray[int64_t] left, ndarray[int64_t] right, _get_result_indexer(right_sorter, right_indexer)) -def left_outer_join(ndarray[int64_t] left, ndarray[int64_t] right, +def left_outer_join(const int64_t[:] left, const int64_t[:] right, Py_ssize_t max_groups, sort=True): cdef: Py_ssize_t i, j, k, count = 0 @@ -139,7 +139,7 @@ def left_outer_join(ndarray[int64_t] left, ndarray[int64_t] right, return left_indexer, right_indexer -def full_outer_join(ndarray[int64_t] left, ndarray[int64_t] right, +def full_outer_join(const int64_t[:] left, const int64_t[:] right, Py_ssize_t max_groups): cdef: Py_ssize_t i, j, k, count = 0 @@ -213,7 +213,7 @@ def _get_result_indexer(sorter, indexer): return res -def ffill_indexer(ndarray[int64_t] indexer): +def ffill_indexer(const int64_t[:] indexer): cdef: Py_ssize_t i, n = len(indexer) ndarray[int64_t] result @@ -252,7 +252,7 @@ ctypedef fused join_t: @cython.wraparound(False) @cython.boundscheck(False) -def left_join_indexer_unique(ndarray[join_t] left, ndarray[join_t] right): +def left_join_indexer_unique(join_t[:] left, join_t[:] right): cdef: Py_ssize_t i, j, nleft, nright ndarray[int64_t] indexer @@ -677,10 +677,10 @@ ctypedef fused by_t: uint64_t -def asof_join_backward_on_X_by_Y(ndarray[asof_t] left_values, - ndarray[asof_t] right_values, - ndarray[by_t] left_by_values, - ndarray[by_t] 
right_by_values, +def asof_join_backward_on_X_by_Y(asof_t[:] left_values, + asof_t[:] right_values, + by_t[:] left_by_values, + by_t[:] right_by_values, bint allow_exact_matches=1, tolerance=None): @@ -746,10 +746,10 @@ def asof_join_backward_on_X_by_Y(ndarray[asof_t] left_values, return left_indexer, right_indexer -def asof_join_forward_on_X_by_Y(ndarray[asof_t] left_values, - ndarray[asof_t] right_values, - ndarray[by_t] left_by_values, - ndarray[by_t] right_by_values, +def asof_join_forward_on_X_by_Y(asof_t[:] left_values, + asof_t[:] right_values, + by_t[:] left_by_values, + by_t[:] right_by_values, bint allow_exact_matches=1, tolerance=None): @@ -815,10 +815,10 @@ def asof_join_forward_on_X_by_Y(ndarray[asof_t] left_values, return left_indexer, right_indexer -def asof_join_nearest_on_X_by_Y(ndarray[asof_t] left_values, - ndarray[asof_t] right_values, - ndarray[by_t] left_by_values, - ndarray[by_t] right_by_values, +def asof_join_nearest_on_X_by_Y(asof_t[:] left_values, + asof_t[:] right_values, + by_t[:] left_by_values, + by_t[:] right_by_values, bint allow_exact_matches=1, tolerance=None): @@ -864,8 +864,8 @@ def asof_join_nearest_on_X_by_Y(ndarray[asof_t] left_values, # asof_join # ---------------------------------------------------------------------- -def asof_join_backward(ndarray[asof_t] left_values, - ndarray[asof_t] right_values, +def asof_join_backward(asof_t[:] left_values, + asof_t[:] right_values, bint allow_exact_matches=1, tolerance=None): @@ -917,8 +917,8 @@ def asof_join_backward(ndarray[asof_t] left_values, return left_indexer, right_indexer -def asof_join_forward(ndarray[asof_t] left_values, - ndarray[asof_t] right_values, +def asof_join_forward(asof_t[:] left_values, + asof_t[:] right_values, bint allow_exact_matches=1, tolerance=None): @@ -971,8 +971,8 @@ def asof_join_forward(ndarray[asof_t] left_values, return left_indexer, right_indexer -def asof_join_nearest(ndarray[asof_t] left_values, - ndarray[asof_t] right_values, +def asof_join_nearest(asof_t[:] left_values, + asof_t[:] right_values, bint allow_exact_matches=1, tolerance=None): diff --git a/pandas/_libs/lib.pyx b/pandas/_libs/lib.pyx index f845a5437ded4..1f0f0a408aee8 100644 --- a/pandas/_libs/lib.pyx +++ b/pandas/_libs/lib.pyx @@ -40,11 +40,12 @@ cdef extern from "numpy/arrayobject.h": # Use PyDataType_* macros when possible, however there are no macros # for accessing some of the fields, so some are defined. Please # ask on cython-dev if you need more. - cdef int type_num - cdef int itemsize "elsize" - cdef char byteorder - cdef object fields - cdef tuple names + cdef: + int type_num + int itemsize "elsize" + char byteorder + object fields + tuple names cdef extern from "src/parse_helper.h": @@ -67,12 +68,13 @@ from pandas._libs.missing cimport ( # constants that will be compared to potentially arbitrarily large # python int -cdef object oINT64_MAX = INT64_MAX -cdef object oINT64_MIN = INT64_MIN -cdef object oUINT64_MAX = UINT64_MAX +cdef: + object oINT64_MAX = INT64_MAX + object oINT64_MIN = INT64_MIN + object oUINT64_MAX = UINT64_MAX -cdef bint PY2 = sys.version_info[0] == 2 -cdef float64_t NaN = np.NaN + bint PY2 = sys.version_info[0] == 2 + float64_t NaN = np.NaN def values_from_object(obj: object): @@ -231,10 +233,11 @@ def fast_unique_multiple(list arrays, sort: bool=True): if val not in table: table[val] = stub uniques.append(val) - if sort: + if sort is None: try: uniques.sort() except Exception: + # TODO: RuntimeWarning? 
pass return uniques @@ -376,7 +379,7 @@ def fast_zip(list ndarrays): return result -def get_reverse_indexer(ndarray[int64_t] indexer, Py_ssize_t length): +def get_reverse_indexer(const int64_t[:] indexer, Py_ssize_t length): """ Reverse indexing operation. @@ -405,7 +408,7 @@ def get_reverse_indexer(ndarray[int64_t] indexer, Py_ssize_t length): @cython.wraparound(False) @cython.boundscheck(False) -def has_infs_f4(ndarray[float32_t] arr) -> bool: +def has_infs_f4(const float32_t[:] arr) -> bool: cdef: Py_ssize_t i, n = len(arr) float32_t inf, neginf, val @@ -422,7 +425,7 @@ def has_infs_f4(ndarray[float32_t] arr) -> bool: @cython.wraparound(False) @cython.boundscheck(False) -def has_infs_f8(ndarray[float64_t] arr) -> bool: +def has_infs_f8(const float64_t[:] arr) -> bool: cdef: Py_ssize_t i, n = len(arr) float64_t inf, neginf, val @@ -660,7 +663,7 @@ def clean_index_list(obj: list): # is a general, O(max(len(values), len(binner))) method. @cython.boundscheck(False) @cython.wraparound(False) -def generate_bins_dt64(ndarray[int64_t] values, ndarray[int64_t] binner, +def generate_bins_dt64(ndarray[int64_t] values, const int64_t[:] binner, object closed='left', bint hasnans=0): """ Int64 (datetime64) version of generic python version in groupby.py @@ -723,7 +726,7 @@ def generate_bins_dt64(ndarray[int64_t] values, ndarray[int64_t] binner, @cython.boundscheck(False) @cython.wraparound(False) -def row_bool_subset(ndarray[float64_t, ndim=2] values, +def row_bool_subset(const float64_t[:, :] values, ndarray[uint8_t, cast=True] mask): cdef: Py_ssize_t i, j, n, k, pos = 0 @@ -767,8 +770,8 @@ def row_bool_subset_object(ndarray[object, ndim=2] values, @cython.boundscheck(False) @cython.wraparound(False) -def get_level_sorter(ndarray[int64_t, ndim=1] label, - ndarray[int64_t, ndim=1] starts): +def get_level_sorter(const int64_t[:] label, + const int64_t[:] starts): """ argsort for a single level of a multi-index, keeping the order of higher levels unchanged. 
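Many hunks in this file swap `ndarray[int64_t]` declarations for `const int64_t[:]` memoryviews, which is why call sites in the hunks above (`unique_label_indices`) and just below (`get_level_sorter`) grow an `np.asarray(...)` shim: a typed memoryview exposes the buffer but not ndarray methods such as `argsort` or fancy indexing. A minimal pure-Python sketch of that zero-copy round trip (names illustrative):

```python
import numpy as np

labels = np.array([3, 1, 2, 1], dtype=np.int64)
view = memoryview(labels)   # stand-in for a Cython ``const int64_t[:]`` argument

arr = np.asarray(view)      # ndarray view over the same buffer, no copy
assert np.shares_memory(arr, labels)

# ndarray-only operations (argsort, fancy indexing) now work again:
order = arr.argsort(kind='mergesort')
print(arr[order])           # [1 1 2 3]
```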
`starts` points to starts of same-key indices w.r.t @@ -780,10 +783,11 @@ def get_level_sorter(ndarray[int64_t, ndim=1] label, int64_t l, r Py_ssize_t i ndarray[int64_t, ndim=1] out = np.empty(len(label), dtype=np.int64) + ndarray[int64_t, ndim=1] label_arr = np.asarray(label) for i in range(len(starts) - 1): l, r = starts[i], starts[i + 1] - out[l:r] = l + label[l:r].argsort(kind='mergesort') + out[l:r] = l + label_arr[l:r].argsort(kind='mergesort') return out @@ -791,7 +795,7 @@ def get_level_sorter(ndarray[int64_t, ndim=1] label, @cython.boundscheck(False) @cython.wraparound(False) def count_level_2d(ndarray[uint8_t, ndim=2, cast=True] mask, - ndarray[int64_t, ndim=1] labels, + const int64_t[:] labels, Py_ssize_t max_bin, int axis): cdef: @@ -818,7 +822,7 @@ def count_level_2d(ndarray[uint8_t, ndim=2, cast=True] mask, return counts -def generate_slices(ndarray[int64_t] labels, Py_ssize_t ngroups): +def generate_slices(const int64_t[:] labels, Py_ssize_t ngroups): cdef: Py_ssize_t i, group_size, n, start int64_t lab @@ -847,7 +851,7 @@ def generate_slices(ndarray[int64_t] labels, Py_ssize_t ngroups): return starts, ends -def indices_fast(object index, ndarray[int64_t] labels, list keys, +def indices_fast(object index, const int64_t[:] labels, list keys, list sorted_labels): cdef: Py_ssize_t i, j, k, lab, cur, start, n = len(labels) @@ -1825,7 +1829,7 @@ def maybe_convert_numeric(ndarray[object] values, set na_values, except (ValueError, OverflowError, TypeError): pass - # otherwise, iterate and do full infererence + # Otherwise, iterate and do full inference. cdef: int status, maybe_int Py_ssize_t i, n = values.size @@ -1862,10 +1866,10 @@ def maybe_convert_numeric(ndarray[object] values, set na_values, else: seen.float_ = True - if val <= oINT64_MAX: + if oINT64_MIN <= val <= oINT64_MAX: ints[i] = val - if seen.sint_ and seen.uint_: + if val < oINT64_MIN or (seen.sint_ and seen.uint_): seen.float_ = True elif util.is_bool_object(val): @@ -1907,23 +1911,28 @@ def maybe_convert_numeric(ndarray[object] values, set na_values, else: seen.saw_int(as_int) - if not (seen.float_ or as_int in na_values): + if as_int not in na_values: if as_int < oINT64_MIN or as_int > oUINT64_MAX: - raise ValueError('Integer out of range.') + if seen.coerce_numeric: + seen.float_ = True + else: + raise ValueError("Integer out of range.") + else: + if as_int >= 0: + uints[i] = as_int - if as_int >= 0: - uints[i] = as_int - if as_int <= oINT64_MAX: - ints[i] = as_int + if as_int <= oINT64_MAX: + ints[i] = as_int seen.float_ = seen.float_ or (seen.uint_ and seen.sint_) else: seen.float_ = True except (TypeError, ValueError) as e: if not seen.coerce_numeric: - raise type(e)(str(e) + ' at position {pos}'.format(pos=i)) + raise type(e)(str(e) + " at position {pos}".format(pos=i)) elif "uint64" in str(e): # Exception from check functions. 
raise + seen.saw_null() floats[i] = NaN @@ -2146,7 +2155,7 @@ def maybe_convert_objects(ndarray[object] objects, bint try_float=0, @cython.boundscheck(False) @cython.wraparound(False) -def map_infer_mask(ndarray arr, object f, ndarray[uint8_t] mask, +def map_infer_mask(ndarray arr, object f, const uint8_t[:] mask, bint convert=1): """ Substitute for np.vectorize with pandas-friendly dtype inference @@ -2268,7 +2277,7 @@ def to_object_array(rows: object, int min_width=0): result = np.empty((n, k), dtype=object) for i in range(n): - row = input_rows[i] + row = list(input_rows[i]) for j in range(len(row)): result[i, j] = row[j] diff --git a/pandas/_libs/missing.pyx b/pandas/_libs/missing.pyx index 229edbac4992d..ab0e4cd6cc765 100644 --- a/pandas/_libs/missing.pyx +++ b/pandas/_libs/missing.pyx @@ -16,10 +16,11 @@ from pandas._libs.tslibs.nattype cimport ( checknull_with_nat, c_NaT as NaT, is_null_datetimelike) -cdef float64_t INF = np.inf -cdef float64_t NEGINF = -INF +cdef: + float64_t INF = np.inf + float64_t NEGINF = -INF -cdef int64_t NPY_NAT = util.get_nat() + int64_t NPY_NAT = util.get_nat() cpdef bint checknull(object val): diff --git a/pandas/_libs/parsers.pyx b/pandas/_libs/parsers.pyx index 6cb6ed749f87b..f679746643643 100644 --- a/pandas/_libs/parsers.pyx +++ b/pandas/_libs/parsers.pyx @@ -64,10 +64,11 @@ from pandas.errors import (ParserError, DtypeWarning, CParserError = ParserError -cdef bint PY3 = (sys.version_info[0] >= 3) +cdef: + bint PY3 = (sys.version_info[0] >= 3) -cdef float64_t INF = np.inf -cdef float64_t NEGINF = -INF + float64_t INF = np.inf + float64_t NEGINF = -INF cdef extern from "errno.h": @@ -735,7 +736,7 @@ cdef class TextReader: int status int64_t hr, data_line char *errors = "strict" - cdef StringPath path = _string_path(self.c_encoding) + StringPath path = _string_path(self.c_encoding) header = [] unnamed_cols = set() @@ -1389,8 +1390,9 @@ cdef class TextReader: return None -cdef object _true_values = [b'True', b'TRUE', b'true'] -cdef object _false_values = [b'False', b'FALSE', b'false'] +cdef: + object _true_values = [b'True', b'TRUE', b'true'] + object _false_values = [b'False', b'FALSE', b'false'] def _ensure_encoded(list lst): @@ -1637,7 +1639,7 @@ cdef _categorical_convert(parser_t *parser, int64_t col, int64_t current_category = 0 char *errors = "strict" - cdef StringPath path = _string_path(encoding) + StringPath path = _string_path(encoding) int ret = 0 kh_str_t *table @@ -1727,9 +1729,10 @@ cdef inline void _to_fw_string_nogil(parser_t *parser, int64_t col, data += width -cdef char* cinf = b'inf' -cdef char* cposinf = b'+inf' -cdef char* cneginf = b'-inf' +cdef: + char* cinf = b'inf' + char* cposinf = b'+inf' + char* cneginf = b'-inf' cdef _try_double(parser_t *parser, int64_t col, diff --git a/pandas/_libs/reduction.pyx b/pandas/_libs/reduction.pyx index ca39c4de4d309..507567cf480d7 100644 --- a/pandas/_libs/reduction.pyx +++ b/pandas/_libs/reduction.pyx @@ -494,7 +494,7 @@ class InvalidApply(Exception): def apply_frame_axis0(object frame, object f, object names, - ndarray[int64_t] starts, ndarray[int64_t] ends): + const int64_t[:] starts, const int64_t[:] ends): cdef: BlockSlider slider Py_ssize_t i, n = len(starts) diff --git a/pandas/_libs/skiplist.pyx b/pandas/_libs/skiplist.pyx index 6698fcb767d7c..2fdee72f9d588 100644 --- a/pandas/_libs/skiplist.pyx +++ b/pandas/_libs/skiplist.pyx @@ -57,8 +57,9 @@ cdef class IndexableSkiplist: return self.get(i) cpdef get(self, Py_ssize_t i): - cdef Py_ssize_t level - cdef Node node + cdef: + Py_ssize_t 
level + Node node node = self.head i += 1 @@ -71,9 +72,10 @@ cdef class IndexableSkiplist: return node.value cpdef insert(self, double value): - cdef Py_ssize_t level, steps, d - cdef Node node, prevnode, newnode, next_at_level, tmp - cdef list chain, steps_at_level + cdef: + Py_ssize_t level, steps, d + Node node, prevnode, newnode, next_at_level, tmp + list chain, steps_at_level # find first node on each level where node.next[levels].value > value chain = [None] * self.maxlevels @@ -110,9 +112,10 @@ cdef class IndexableSkiplist: self.size += 1 cpdef remove(self, double value): - cdef Py_ssize_t level, d - cdef Node node, prevnode, tmpnode, next_at_level - cdef list chain + cdef: + Py_ssize_t level, d + Node node, prevnode, tmpnode, next_at_level + list chain # find first node on each level where node.next[levels].value >= value chain = [None] * self.maxlevels diff --git a/pandas/_libs/sparse.pyx b/pandas/_libs/sparse.pyx index f5980998f6db4..5471c8184e458 100644 --- a/pandas/_libs/sparse.pyx +++ b/pandas/_libs/sparse.pyx @@ -72,9 +72,6 @@ cdef class IntIndex(SparseIndex): A ValueError is raised if any of these conditions is violated. """ - cdef: - int32_t index, prev = -1 - if self.npoints > self.length: msg = ("Too many indices. Expected " "{exp} but found {act}").format( @@ -86,17 +83,15 @@ cdef class IntIndex(SparseIndex): if self.npoints == 0: return - if min(self.indices) < 0: + if self.indices.min() < 0: raise ValueError("No index can be less than zero") - if max(self.indices) >= self.length: + if self.indices.max() >= self.length: raise ValueError("All indices must be less than the length") - for index in self.indices: - if prev != -1 and index <= prev: - raise ValueError("Indices must be strictly increasing") - - prev = index + monotonic = np.all(self.indices[:-1] < self.indices[1:]) + if not monotonic: + raise ValueError("Indices must be strictly increasing") def equals(self, other): if not isinstance(other, IntIndex): diff --git a/pandas/_libs/sparse_op_helper.pxi.in b/pandas/_libs/sparse_op_helper.pxi.in index c6621ab5977ca..5949a3fd0ed81 100644 --- a/pandas/_libs/sparse_op_helper.pxi.in +++ b/pandas/_libs/sparse_op_helper.pxi.in @@ -125,10 +125,10 @@ def get_dispatch(dtypes): @cython.wraparound(False) @cython.boundscheck(False) -cdef inline tuple block_op_{{opname}}_{{dtype}}(ndarray x_, +cdef inline tuple block_op_{{opname}}_{{dtype}}({{dtype}}_t[:] x_, BlockIndex xindex, {{dtype}}_t xfill, - ndarray y_, + {{dtype}}_t[:] y_, BlockIndex yindex, {{dtype}}_t yfill): ''' @@ -142,7 +142,7 @@ cdef inline tuple block_op_{{opname}}_{{dtype}}(ndarray x_, int32_t xloc, yloc Py_ssize_t xblock = 0, yblock = 0 # block numbers - ndarray[{{dtype}}_t, ndim=1] x, y + {{dtype}}_t[:] x, y ndarray[{{rdtype}}_t, ndim=1] out # to suppress Cython warning @@ -226,16 +226,18 @@ cdef inline tuple block_op_{{opname}}_{{dtype}}(ndarray x_, @cython.wraparound(False) @cython.boundscheck(False) -cdef inline tuple int_op_{{opname}}_{{dtype}}(ndarray x_, IntIndex xindex, +cdef inline tuple int_op_{{opname}}_{{dtype}}({{dtype}}_t[:] x_, + IntIndex xindex, {{dtype}}_t xfill, - ndarray y_, IntIndex yindex, + {{dtype}}_t[:] y_, + IntIndex yindex, {{dtype}}_t yfill): cdef: IntIndex out_index Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices int32_t xloc, yloc - ndarray[int32_t, ndim=1] xindices, yindices, out_indices - ndarray[{{dtype}}_t, ndim=1] x, y + int32_t[:] xindices, yindices, out_indices + {{dtype}}_t[:] x, y ndarray[{{rdtype}}_t, ndim=1] out # suppress Cython compiler warnings due to inlining @@ 
-284,9 +286,9 @@ cdef inline tuple int_op_{{opname}}_{{dtype}}(ndarray x_, IntIndex xindex, return out, out_index, {{(opname, 'xfill', 'yfill', dtype) | get_op}} -cpdef sparse_{{opname}}_{{dtype}}(ndarray[{{dtype}}_t, ndim=1] x, +cpdef sparse_{{opname}}_{{dtype}}({{dtype}}_t[:] x, SparseIndex xindex, {{dtype}}_t xfill, - ndarray[{{dtype}}_t, ndim=1] y, + {{dtype}}_t[:] y, SparseIndex yindex, {{dtype}}_t yfill): if isinstance(xindex, BlockIndex): diff --git a/pandas/_libs/tslib.pyx b/pandas/_libs/tslib.pyx index 798e338d5581b..f932e236b5218 100644 --- a/pandas/_libs/tslib.pyx +++ b/pandas/_libs/tslib.pyx @@ -645,6 +645,8 @@ cpdef array_to_datetime(ndarray[object] values, str errors='raise', out_tzoffset_vals.add(out_tzoffset * 60.) tz = pytz.FixedOffset(out_tzoffset) value = tz_convert_single(value, tz, UTC) + out_local = 0 + out_tzoffset = 0 else: # Add a marker for naive string, to track if we are # parsing mixed naive and aware strings diff --git a/pandas/_libs/tslibs/conversion.pyx b/pandas/_libs/tslibs/conversion.pyx index 6c8b732928bc3..1c0adaaa288a9 100644 --- a/pandas/_libs/tslibs/conversion.pyx +++ b/pandas/_libs/tslibs/conversion.pyx @@ -147,7 +147,7 @@ def ensure_timedelta64ns(arr: ndarray, copy: bool=True): @cython.boundscheck(False) @cython.wraparound(False) -def datetime_to_datetime64(values: object[:]): +def datetime_to_datetime64(object[:] values): """ Convert ndarray of datetime-like objects to int64 array representing nanosecond timestamps. diff --git a/pandas/_libs/tslibs/fields.pyx b/pandas/_libs/tslibs/fields.pyx index 5cda7992369fc..240f008394099 100644 --- a/pandas/_libs/tslibs/fields.pyx +++ b/pandas/_libs/tslibs/fields.pyx @@ -381,7 +381,7 @@ def get_start_end_field(int64_t[:] dtindex, object field, @cython.wraparound(False) @cython.boundscheck(False) -def get_date_field(ndarray[int64_t] dtindex, object field): +def get_date_field(int64_t[:] dtindex, object field): """ Given a int64-based datetime index, extract the year, month, etc., field and return an array of these values. diff --git a/pandas/_libs/tslibs/nattype.pyx b/pandas/_libs/tslibs/nattype.pyx index a55d15a7c4e85..b64c3479f23fe 100644 --- a/pandas/_libs/tslibs/nattype.pyx +++ b/pandas/_libs/tslibs/nattype.pyx @@ -183,7 +183,9 @@ cdef class _NaT(datetime): return np.datetime64(NPY_NAT, 'ns') def to_datetime64(self): - """ Returns a numpy.datetime64 object with 'ns' precision """ + """ + Return a numpy.datetime64 object with 'ns' precision. + """ return np.datetime64('NaT', 'ns') def __repr__(self): @@ -382,7 +384,7 @@ class NaTType(_NaT): ) combine = _make_error_func('combine', # noqa:E128 """ - Timsetamp.combine(date, time) + Timestamp.combine(date, time) date, time -> datetime with same date and time fields """ @@ -448,7 +450,7 @@ class NaTType(_NaT): """ Timestamp.now(tz=None) - Returns new Timestamp object representing current time local to + Return new Timestamp object representing current time local to tz. 
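The `to_datetime64` docstring cleanup above describes behavior that is easy to verify; a quick check against `NaT` (a sketch, assuming a build of this branch):

```python
import numpy as np
import pandas as pd

result = pd.NaT.to_datetime64()
print(result)          # NaT
print(result.dtype)    # datetime64[ns], matching the docstring above
assert np.isnat(result)
```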
Parameters @@ -669,7 +671,6 @@ class NaTType(_NaT): nanosecond : int, optional tzinfo : tz-convertible, optional fold : int, optional, default is 0 - added in 3.6, NotImplemented Returns ------- diff --git a/pandas/_libs/tslibs/offsets.pyx b/pandas/_libs/tslibs/offsets.pyx index 856aa52f82cf5..e28462f7103b9 100644 --- a/pandas/_libs/tslibs/offsets.pyx +++ b/pandas/_libs/tslibs/offsets.pyx @@ -18,6 +18,7 @@ from numpy cimport int64_t cnp.import_array() +from pandas._libs.tslibs cimport util from pandas._libs.tslibs.util cimport is_string_object, is_integer_object from pandas._libs.tslibs.ccalendar import MONTHS, DAYS @@ -408,6 +409,10 @@ class _BaseOffset(object): return self.apply(other) def __mul__(self, other): + if hasattr(other, "_typ"): + return NotImplemented + if util.is_array(other): + return np.array([self * x for x in other]) return type(self)(n=other * self.n, normalize=self.normalize, **self.kwds) @@ -458,6 +463,9 @@ class _BaseOffset(object): TypeError if `int(n)` raises ValueError if n != int(n) """ + if util.is_timedelta64_object(n): + raise TypeError('`n` argument must be an integer, ' + 'got {ntype}'.format(ntype=type(n))) try: nint = int(n) except (ValueError, TypeError): @@ -533,12 +541,20 @@ class _Tick(object): can do isinstance checks on _Tick and avoid importing tseries.offsets """ + # ensure that reversed-ops with numpy scalars return NotImplemented + __array_priority__ = 1000 + def __truediv__(self, other): result = self.delta.__truediv__(other) return _wrap_timedelta_result(result) + def __rtruediv__(self, other): + result = self.delta.__rtruediv__(other) + return _wrap_timedelta_result(result) + if PY2: __div__ = __truediv__ + __rdiv__ = __rtruediv__ # ---------------------------------------------------------------------- diff --git a/pandas/_libs/tslibs/parsing.pyx b/pandas/_libs/tslibs/parsing.pyx index 82719de2dbdbd..7759e165b7193 100644 --- a/pandas/_libs/tslibs/parsing.pyx +++ b/pandas/_libs/tslibs/parsing.pyx @@ -44,9 +44,10 @@ class DateParseError(ValueError): _DEFAULT_DATETIME = datetime(1, 1, 1).replace(hour=0, minute=0, second=0, microsecond=0) -cdef object _TIMEPAT = re.compile(r'^([01]?[0-9]|2[0-3]):([0-5][0-9])') +cdef: + object _TIMEPAT = re.compile(r'^([01]?[0-9]|2[0-3]):([0-5][0-9])') -cdef set _not_datelike_strings = {'a', 'A', 'm', 'M', 'p', 'P', 't', 'T'} + set _not_datelike_strings = {'a', 'A', 'm', 'M', 'p', 'P', 't', 'T'} # ---------------------------------------------------------------------- diff --git a/pandas/_libs/tslibs/period.pyx b/pandas/_libs/tslibs/period.pyx index 2f4edb7de8f95..e38e9a1ca5df6 100644 --- a/pandas/_libs/tslibs/period.pyx +++ b/pandas/_libs/tslibs/period.pyx @@ -52,9 +52,10 @@ from pandas._libs.tslibs.nattype cimport ( from pandas._libs.tslibs.offsets cimport to_offset from pandas._libs.tslibs.offsets import _Tick -cdef bint PY2 = str == bytes -cdef enum: - INT32_MIN = -2147483648 +cdef: + bint PY2 = str == bytes + enum: + INT32_MIN = -2147483648 ctypedef struct asfreq_info: diff --git a/pandas/_libs/tslibs/resolution.pyx b/pandas/_libs/tslibs/resolution.pyx index f80c1e9841abe..13a4f5ba48557 100644 --- a/pandas/_libs/tslibs/resolution.pyx +++ b/pandas/_libs/tslibs/resolution.pyx @@ -16,15 +16,16 @@ from pandas._libs.tslibs.ccalendar cimport get_days_in_month # ---------------------------------------------------------------------- # Constants -cdef int64_t NPY_NAT = get_nat() - -cdef int RESO_NS = 0 -cdef int RESO_US = 1 -cdef int RESO_MS = 2 -cdef int RESO_SEC = 3 -cdef int RESO_MIN = 4 -cdef int RESO_HR = 5 -cdef 
int RESO_DAY = 6 +cdef: + int64_t NPY_NAT = get_nat() + + int RESO_NS = 0 + int RESO_US = 1 + int RESO_MS = 2 + int RESO_SEC = 3 + int RESO_MIN = 4 + int RESO_HR = 5 + int RESO_DAY = 6 # ---------------------------------------------------------------------- diff --git a/pandas/_libs/tslibs/timedeltas.pyx b/pandas/_libs/tslibs/timedeltas.pyx index 0a19d8749fc7c..f08a57375a301 100644 --- a/pandas/_libs/tslibs/timedeltas.pyx +++ b/pandas/_libs/tslibs/timedeltas.pyx @@ -1158,6 +1158,11 @@ class Timedelta(_Timedelta): "[weeks, days, hours, minutes, seconds, " "milliseconds, microseconds, nanoseconds]") + if unit in {'Y', 'y', 'M'}: + warnings.warn("M and Y units are deprecated and " + "will be removed in a future version.", + FutureWarning, stacklevel=1) + if isinstance(value, Timedelta): value = value.value elif is_string_object(value): diff --git a/pandas/_libs/tslibs/timestamps.pyx b/pandas/_libs/tslibs/timestamps.pyx index fe0564cb62c30..25b0b4069cf7c 100644 --- a/pandas/_libs/tslibs/timestamps.pyx +++ b/pandas/_libs/tslibs/timestamps.pyx @@ -1,4 +1,5 @@ # -*- coding: utf-8 -*- +import sys import warnings from cpython cimport (PyObject_RichCompareBool, PyObject_RichCompare, @@ -43,10 +44,11 @@ from pandas._libs.tslibs.timezones import UTC # Constants _zero_time = datetime_time(0, 0) _no_input = object() - +PY36 = sys.version_info >= (3, 6) # ---------------------------------------------------------------------- + def maybe_integer_op_deprecated(obj): # GH#22535 add/sub of integers and int-arrays is deprecated if obj.freq is not None: @@ -197,7 +199,7 @@ def round_nsint64(values, mode, freq): # This is PITA. Because we inherit from datetime, which has very specific # construction requirements, we need to do object instantiation in python -# (see Timestamp class above). This will serve as a C extension type that +# (see Timestamp class below). This will serve as a C extension type that # shadows the python class, where we do any heavy lifting. cdef class _Timestamp(datetime): @@ -338,7 +340,9 @@ cdef class _Timestamp(datetime): self.microsecond, self.tzinfo) cpdef to_datetime64(self): - """ Returns a numpy.datetime64 object with 'ns' precision """ + """ + Return a numpy.datetime64 object with 'ns' precision. + """ return np.datetime64(self.value, 'ns') def __add__(self, other): @@ -500,6 +504,9 @@ cdef class _Timestamp(datetime): @property def asm8(self): + """ + Return numpy datetime64 format in nanoseconds. + """ return np.datetime64(self.value, 'ns') @property @@ -566,15 +573,18 @@ class Timestamp(_Timestamp): Using the primary calling convention: This converts a datetime-like string + >>> pd.Timestamp('2017-01-01T12') Timestamp('2017-01-01 12:00:00') This converts a float representing a Unix epoch in units of seconds + >>> pd.Timestamp(1513393355.5, unit='s') Timestamp('2017-12-16 03:02:35.500000') This converts an int representing a Unix-epoch in units of seconds and for a particular timezone + >>> pd.Timestamp(1513393355, unit='s', tz='US/Pacific') Timestamp('2017-12-15 19:02:35-0800', tz='US/Pacific') @@ -612,7 +622,7 @@ class Timestamp(_Timestamp): """ Timestamp.now(tz=None) - Returns new Timestamp object representing current time local to + Return new Timestamp object representing current time local to tz. 
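The `Timedelta` hunk above deprecates the calendar-dependent 'M' and 'Y' units; a small check of the new warning (a sketch, assuming this branch's behavior):

```python
import warnings
import pandas as pd

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pd.Timedelta(1, unit='M')   # month/year units are ambiguous, now deprecated
assert any(issubclass(w.category, FutureWarning) for w in caught)
```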
    Parameters
@@ -670,7 +680,7 @@ class Timestamp(_Timestamp):
    @classmethod
    def combine(cls, date, time):
        """
-        Timsetamp.combine(date, time)
+        Timestamp.combine(date, time)

        date, time -> datetime with same date and time fields
        """
@@ -930,6 +940,9 @@ class Timestamp(_Timestamp):

    @property
    def dayofweek(self):
+        """
+        Return day of the week.
+        """
        return self.weekday()

    def day_name(self, locale=None):
@@ -979,30 +992,48 @@ class Timestamp(_Timestamp):

    @property
    def dayofyear(self):
+        """
+        Return the day of the year.
+        """
        return ccalendar.get_day_of_year(self.year, self.month, self.day)

    @property
    def week(self):
+        """
+        Return the week number of the year.
+        """
        return ccalendar.get_week_of_year(self.year, self.month, self.day)

    weekofyear = week

    @property
    def quarter(self):
+        """
+        Return the quarter of the year.
+        """
        return ((self.month - 1) // 3) + 1

    @property
    def days_in_month(self):
+        """
+        Return the number of days in the month.
+        """
        return ccalendar.get_days_in_month(self.year, self.month)

    daysinmonth = days_in_month

    @property
    def freqstr(self):
+        """
+        Return the frequency string.
+        """
        return getattr(self.freq, 'freqstr', self.freq)

    @property
    def is_month_start(self):
+        """
+        Return True if date is first day of month.
+        """
        if self.freq is None:
            # fast-path for non-business frequencies
            return self.day == 1
@@ -1010,6 +1041,9 @@ class Timestamp(_Timestamp):

    @property
    def is_month_end(self):
+        """
+        Return True if date is last day of month.
+        """
        if self.freq is None:
            # fast-path for non-business frequencies
            return self.day == self.days_in_month
@@ -1017,6 +1051,9 @@ class Timestamp(_Timestamp):

    @property
    def is_quarter_start(self):
+        """
+        Return True if date is first day of the quarter.
+        """
        if self.freq is None:
            # fast-path for non-business frequencies
            return self.day == 1 and self.month % 3 == 1
@@ -1024,6 +1061,9 @@ class Timestamp(_Timestamp):

    @property
    def is_quarter_end(self):
+        """
+        Return True if date is last day of the quarter.
+        """
        if self.freq is None:
            # fast-path for non-business frequencies
            return (self.month % 3) == 0 and self.day == self.days_in_month
@@ -1031,6 +1071,9 @@ class Timestamp(_Timestamp):

    @property
    def is_year_start(self):
+        """
+        Return True if date is first day of the year.
+        """
        if self.freq is None:
            # fast-path for non-business frequencies
            return self.day == self.month == 1
@@ -1038,6 +1081,9 @@ class Timestamp(_Timestamp):

    @property
    def is_year_end(self):
+        """
+        Return True if date is last day of the year.
+        """
        if self.freq is None:
            # fast-path for non-business frequencies
            return self.month == 12 and self.day == 31
@@ -1045,6 +1091,9 @@ class Timestamp(_Timestamp):

    @property
    def is_leap_year(self):
+        """
+        Return True if year is a leap year.
+ """ return bool(ccalendar.is_leapyear(self.year)) def tz_localize(self, tz, ambiguous='raise', nonexistent='raise', @@ -1195,7 +1244,6 @@ class Timestamp(_Timestamp): nanosecond : int, optional tzinfo : tz-convertible, optional fold : int, optional, default is 0 - added in 3.6, NotImplemented Returns ------- @@ -1252,12 +1300,16 @@ class Timestamp(_Timestamp): # see GH#18319 ts_input = _tzinfo.localize(datetime(dts.year, dts.month, dts.day, dts.hour, dts.min, dts.sec, - dts.us)) + dts.us), + is_dst=not bool(fold)) _tzinfo = ts_input.tzinfo else: - ts_input = datetime(dts.year, dts.month, dts.day, - dts.hour, dts.min, dts.sec, dts.us, - tzinfo=_tzinfo) + kwargs = {'year': dts.year, 'month': dts.month, 'day': dts.day, + 'hour': dts.hour, 'minute': dts.min, 'second': dts.sec, + 'microsecond': dts.us, 'tzinfo': _tzinfo} + if PY36: + kwargs['fold'] = fold + ts_input = datetime(**kwargs) ts = convert_datetime_to_tsobject(ts_input, _tzinfo) value = ts.value + (dts.ps // 1000) diff --git a/pandas/_libs/window.pyx b/pandas/_libs/window.pyx index e8f3de64c3823..cc5b3b63f5b04 100644 --- a/pandas/_libs/window.pyx +++ b/pandas/_libs/window.pyx @@ -26,13 +26,14 @@ from pandas._libs.skiplist cimport ( skiplist_t, skiplist_init, skiplist_destroy, skiplist_get, skiplist_insert, skiplist_remove) -cdef float32_t MINfloat32 = np.NINF -cdef float64_t MINfloat64 = np.NINF +cdef: + float32_t MINfloat32 = np.NINF + float64_t MINfloat64 = np.NINF -cdef float32_t MAXfloat32 = np.inf -cdef float64_t MAXfloat64 = np.inf + float32_t MAXfloat32 = np.inf + float64_t MAXfloat64 = np.inf -cdef float64_t NaN = np.NaN + float64_t NaN = np.NaN cdef inline int int_max(int a, int b): return a if a >= b else b cdef inline int int_min(int a, int b): return a if a <= b else b @@ -242,7 +243,7 @@ cdef class VariableWindowIndexer(WindowIndexer): # max window size self.win = (self.end - self.start).max() - def build(self, ndarray[int64_t] index, int64_t win, bint left_closed, + def build(self, const int64_t[:] index, int64_t win, bint left_closed, bint right_closed): cdef: diff --git a/pandas/compat/__init__.py b/pandas/compat/__init__.py index f9c659106a516..d7ca7f8963f70 100644 --- a/pandas/compat/__init__.py +++ b/pandas/compat/__init__.py @@ -9,7 +9,6 @@ * lists: lrange(), lmap(), lzip(), lfilter() * unicode: u() [no unicode builtin in Python 3] * longs: long (int in Python 3) -* callable * iterable method compatibility: iteritems, iterkeys, itervalues * Uses the original method if available, otherwise uses items, keys, values. * types: @@ -378,14 +377,6 @@ class ResourceWarning(Warning): string_and_binary_types = string_types + (binary_type,) -try: - # callable reintroduced in later versions of Python - callable = callable -except NameError: - def callable(obj): - return any("__call__" in klass.__dict__ for klass in type(obj).__mro__) - - if PY2: # In PY2 functools.wraps doesn't provide metadata pytest needs to generate # decorated tests using parametrization. 
See pytest GH issue #2782 @@ -411,8 +402,6 @@ def wrapper(cls): return metaclass(cls.__name__, cls.__bases__, orig_vars) return wrapper -from collections import OrderedDict, Counter - if PY3: def raise_with_traceback(exc, traceback=Ellipsis): if traceback == Ellipsis: diff --git a/pandas/compat/numpy/__init__.py b/pandas/compat/numpy/__init__.py index 5e67cf2ee2837..bc9af01a97467 100644 --- a/pandas/compat/numpy/__init__.py +++ b/pandas/compat/numpy/__init__.py @@ -12,6 +12,7 @@ _np_version_under1p13 = _nlv < LooseVersion('1.13') _np_version_under1p14 = _nlv < LooseVersion('1.14') _np_version_under1p15 = _nlv < LooseVersion('1.15') +_np_version_under1p16 = _nlv < LooseVersion('1.16') if _nlv < '1.12': @@ -64,5 +65,6 @@ def np_array_datetime64_compat(arr, *args, **kwargs): __all__ = ['np', '_np_version_under1p13', '_np_version_under1p14', - '_np_version_under1p15' + '_np_version_under1p15', + '_np_version_under1p16' ] diff --git a/pandas/compat/numpy/function.py b/pandas/compat/numpy/function.py index 417ddd0d8af17..f15783ad642b4 100644 --- a/pandas/compat/numpy/function.py +++ b/pandas/compat/numpy/function.py @@ -17,10 +17,10 @@ and methods that are spread throughout the codebase. This module will make it easier to adjust to future upstream changes in the analogous numpy signatures. """ +from collections import OrderedDict from numpy import ndarray -from pandas.compat import OrderedDict from pandas.errors import UnsupportedFunctionCall from pandas.util._validators import ( validate_args, validate_args_and_kwargs, validate_kwargs) diff --git a/pandas/compat/pickle_compat.py b/pandas/compat/pickle_compat.py index 61295b8249f58..8f16f8154b952 100644 --- a/pandas/compat/pickle_compat.py +++ b/pandas/compat/pickle_compat.py @@ -201,7 +201,7 @@ def load_newobj_ex(self): pass -def load(fh, encoding=None, compat=False, is_verbose=False): +def load(fh, encoding=None, is_verbose=False): """load a pickle, with a provided encoding if compat is True: @@ -212,7 +212,6 @@ def load(fh, encoding=None, compat=False, is_verbose=False): ---------- fh : a filelike object encoding : an optional encoding - compat : provide Series compatibility mode, boolean, default False is_verbose : show exception output """ diff --git a/pandas/core/accessor.py b/pandas/core/accessor.py index 961488ff12e58..050749741e7bd 100644 --- a/pandas/core/accessor.py +++ b/pandas/core/accessor.py @@ -16,11 +16,15 @@ class DirNamesMixin(object): ['asobject', 'base', 'data', 'flags', 'itemsize', 'strides']) def _dir_deletions(self): - """ delete unwanted __dir__ for this object """ + """ + Delete unwanted __dir__ for this object. + """ return self._accessors | self._deprecations def _dir_additions(self): - """ add additional __dir__ for this object """ + """ + Add additional __dir__ for this object. + """ rv = set() for accessor in self._accessors: try: @@ -33,7 +37,7 @@ def _dir_additions(self): def __dir__(self): """ Provide method name lookup and completion - Only provide 'public' methods + Only provide 'public' methods. """ rv = set(dir(type(self))) rv = (rv - self._dir_deletions()) | self._dir_additions() @@ -42,7 +46,7 @@ def __dir__(self): class PandasDelegate(object): """ - an abstract base class for delegating methods/properties + An abstract base class for delegating methods/properties. 
""" def _delegate_property_get(self, name, *args, **kwargs): @@ -65,10 +69,10 @@ def _add_delegate_accessors(cls, delegate, accessors, typ, ---------- cls : the class to add the methods/properties to delegate : the class to get methods/properties & doc-strings - acccessors : string list of accessors to add + accessors : string list of accessors to add typ : 'property' or 'method' overwrite : boolean, default False - overwrite the method/property in the target class if it exists + overwrite the method/property in the target class if it exists. """ def _create_delegator_property(name): @@ -117,7 +121,7 @@ def delegate_names(delegate, accessors, typ, overwrite=False): ---------- delegate : object the class to get methods/properties & doc-strings - acccessors : Sequence[str] + accessors : Sequence[str] List of accessor to add typ : {'property', 'method'} overwrite : boolean, default False diff --git a/pandas/core/algorithms.py b/pandas/core/algorithms.py index b473a7aef929e..a70a3ff06f202 100644 --- a/pandas/core/algorithms.py +++ b/pandas/core/algorithms.py @@ -289,14 +289,14 @@ def unique(values): Returns ------- unique values. - - If the input is an Index, the return is an Index - - If the input is a Categorical dtype, the return is a Categorical - - If the input is a Series/ndarray, the return will be an ndarray + If the input is an Index, the return is an Index + If the input is a Categorical dtype, the return is a Categorical + If the input is a Series/ndarray, the return will be an ndarray See Also -------- - pandas.Index.unique - pandas.Series.unique + Index.unique + Series.unique Examples -------- diff --git a/pandas/core/api.py b/pandas/core/api.py index afc929c39086c..8c92287e212a6 100644 --- a/pandas/core/api.py +++ b/pandas/core/api.py @@ -4,7 +4,6 @@ import numpy as np -from pandas.core.arrays import IntervalArray from pandas.core.arrays.integer import ( Int8Dtype, Int16Dtype, diff --git a/pandas/core/arrays/array_.py b/pandas/core/arrays/array_.py index c7be8e3f745c4..41d623c7efd9c 100644 --- a/pandas/core/arrays/array_.py +++ b/pandas/core/arrays/array_.py @@ -50,7 +50,7 @@ def array(data, # type: Sequence[object] ============================== ===================================== Scalar Type Array Type ============================== ===================================== - :class:`pandas.Interval` :class:`pandas.IntervalArray` + :class:`pandas.Interval` :class:`pandas.arrays.IntervalArray` :class:`pandas.Period` :class:`pandas.arrays.PeriodArray` :class:`datetime.datetime` :class:`pandas.arrays.DatetimeArray` :class:`datetime.timedelta` :class:`pandas.arrays.TimedeltaArray` diff --git a/pandas/core/arrays/categorical.py b/pandas/core/arrays/categorical.py index 35b662eaae9a5..ab58f86e0a6bc 100644 --- a/pandas/core/arrays/categorical.py +++ b/pandas/core/arrays/categorical.py @@ -214,7 +214,7 @@ def contains(cat, key, container): class Categorical(ExtensionArray, PandasObject): """ - Represents a categorical variable in classic R / S-plus fashion + Represent a categorical variable in classic R / S-plus fashion `Categoricals` can only take on only a limited, and usually fixed, number of possible values (`categories`). In contrast to statistical categorical @@ -276,7 +276,7 @@ class Categorical(ExtensionArray, PandasObject): See Also -------- - pandas.api.types.CategoricalDtype : Type for categorical data. + api.types.CategoricalDtype : Type for categorical data. CategoricalIndex : An Index with an underlying ``Categorical``. 
Notes @@ -747,7 +747,7 @@ def _set_dtype(self, dtype): def set_ordered(self, value, inplace=False): """ - Sets the ordered attribute to the boolean value + Set the ordered attribute to the boolean value Parameters ---------- @@ -793,7 +793,7 @@ def as_unordered(self, inplace=False): def set_categories(self, new_categories, ordered=None, rename=False, inplace=False): """ - Sets the categories to the specified new_categories. + Set the categories to the specified new_categories. `new_categories` can include new categories (which will result in unused categories) or remove old categories (which results in values @@ -864,7 +864,7 @@ def set_categories(self, new_categories, ordered=None, rename=False, def rename_categories(self, new_categories, inplace=False): """ - Renames categories. + Rename categories. Parameters ---------- @@ -958,7 +958,7 @@ def rename_categories(self, new_categories, inplace=False): def reorder_categories(self, new_categories, ordered=None, inplace=False): """ - Reorders categories as specified in new_categories. + Reorder categories as specified in new_categories. `new_categories` need to include all old categories and no new category items. @@ -1051,7 +1051,7 @@ def add_categories(self, new_categories, inplace=False): def remove_categories(self, removals, inplace=False): """ - Removes the specified categories. + Remove the specified categories. `removals` must be included in the old categories. Values which were in the removed categories will be set to NaN @@ -1104,7 +1104,7 @@ def remove_categories(self, removals, inplace=False): def remove_unused_categories(self, inplace=False): """ - Removes categories which are not used. + Remove categories which are not used. Parameters ---------- @@ -1454,7 +1454,7 @@ def dropna(self): def value_counts(self, dropna=True): """ - Returns a Series containing counts of each category. + Return a Series containing counts of each category. Every category will have an entry, even those with a count of 0. @@ -1570,7 +1570,7 @@ def argsort(self, *args, **kwargs): def sort_values(self, inplace=False, ascending=True, na_position='last'): """ - Sorts the Categorical by category value returning a new + Sort the Categorical by category value returning a new Categorical by default. 
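The `_reverse_indexer` hunk below replaces index arithmetic over `range(len(counts) - 1)` with pairwise `zip` slicing; the pattern in isolation (values illustrative):

```python
counts = [0, 2, 5, 7]                    # cumulative group boundaries
r = list('abcdefg')
windows = list(zip(counts, counts[1:]))  # [(0, 2), (2, 5), (5, 7)]
print([r[start:end] for start, end in windows])
# [['a', 'b'], ['c', 'd', 'e'], ['f', 'g']]
```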
While an ordering is applied to the category values, sorting in this @@ -2167,8 +2167,7 @@ def _reverse_indexer(self): r, counts = libalgos.groupsort_indexer(self.codes.astype('int64'), categories.size) counts = counts.cumsum() - result = [r[counts[indexer]:counts[indexer + 1]] - for indexer in range(len(counts) - 1)] + result = (r[start:end] for start, end in zip(counts, counts[1:])) result = dict(zip(categories, result)) return result @@ -2321,8 +2320,7 @@ def _values_for_factorize(self): @classmethod def _from_factorized(cls, uniques, original): return original._constructor(original.categories.take(uniques), - categories=original.categories, - ordered=original.ordered) + dtype=original.dtype) def equals(self, other): """ @@ -2674,9 +2672,7 @@ def _factorize_from_iterable(values): if is_categorical(values): if isinstance(values, (ABCCategoricalIndex, ABCSeries)): values = values._values - categories = CategoricalIndex(values.categories, - categories=values.categories, - ordered=values.ordered) + categories = CategoricalIndex(values.categories, dtype=values.dtype) codes = values.codes else: # The value of ordered is irrelevant since we don't use cat as such, diff --git a/pandas/core/arrays/datetimes.py b/pandas/core/arrays/datetimes.py index f2aeb1c1309de..1b2a4da389dc4 100644 --- a/pandas/core/arrays/datetimes.py +++ b/pandas/core/arrays/datetimes.py @@ -128,7 +128,7 @@ def _dt_array_cmp(cls, op): Wrap comparison operations to convert datetime-like to datetime64 """ opname = '__{name}__'.format(name=op.__name__) - nat_result = True if opname == '__ne__' else False + nat_result = opname == '__ne__' def wrapper(self, other): if isinstance(other, (ABCDataFrame, ABCSeries, ABCIndexClass)): @@ -218,6 +218,13 @@ class DatetimeArray(dtl.DatetimeLikeArrayMixin, .. versionadded:: 0.24.0 + .. warning:: + + DatetimeArray is currently experimental, and its API may change + without warning. In particular, :attr:`DatetimeArray.dtype` is + expected to change to always be an instance of an ``ExtensionDtype`` + subclass. + Parameters ---------- values : Series, Index, DatetimeArray, ndarray @@ -511,6 +518,12 @@ def dtype(self): """ The dtype for the DatetimeArray. + .. warning:: + + A future version of pandas will change dtype to never be a + ``numpy.dtype``. Instead, :attr:`DatetimeArray.dtype` will + always be an instance of an ``ExtensionDtype`` subclass. + Returns ------- numpy.dtype or DatetimeTZDtype @@ -2045,7 +2058,7 @@ def validate_tz_from_dtype(dtype, tz): # tz-naive dtype (i.e. datetime64[ns]) if tz is not None and not timezones.tz_compare(tz, dtz): raise ValueError("cannot supply both a tz and a " - "timezone-naive dtype (i.e. datetime64[ns]") + "timezone-naive dtype (i.e. datetime64[ns])") return tz diff --git a/pandas/core/arrays/integer.py b/pandas/core/arrays/integer.py index b3dde6bf2bd93..fd90aec3b5e8c 100644 --- a/pandas/core/arrays/integer.py +++ b/pandas/core/arrays/integer.py @@ -225,24 +225,57 @@ class IntegerArray(ExtensionArray, ExtensionOpsMixin): """ Array of integer (optional missing) values. + .. versionadded:: 0.24.0 + + .. warning:: + + IntegerArray is currently experimental, and its API or internal + implementation may change without warning. + We represent an IntegerArray with 2 numpy arrays: - data: contains a numpy integer array of the appropriate dtype - mask: a boolean array holding a mask on the data, True is missing To construct an IntegerArray from generic array-like input, use - ``integer_array`` function instead. 
+    :func:`pandas.array` with one of the integer dtypes (see examples).
+
+    See :ref:`integer_na` for more.

    Parameters
    ----------
-    values : integer 1D numpy array
-    mask : boolean 1D numpy array
+    values : numpy.ndarray
+        A 1-d integer-dtype array.
+    mask : numpy.ndarray
+        A 1-d boolean-dtype array indicating missing values.
    copy : bool, default False
+        Whether to copy the `values` and `mask`.

    Returns
    -------
    IntegerArray

+    Examples
+    --------
+    Create an IntegerArray with :func:`pandas.array`.
+
+    >>> int_array = pd.array([1, None, 3], dtype=pd.Int32Dtype())
+    >>> int_array
+    <IntegerArray>
+    [1, NaN, 3]
+    Length: 3, dtype: Int32
+
+    String aliases for the dtypes are also available. They are capitalized.
+
+    >>> pd.array([1, None, 3], dtype='Int32')
+    <IntegerArray>
+    [1, NaN, 3]
+    Length: 3, dtype: Int32
+
+    >>> pd.array([1, None, 3], dtype='UInt16')
+    <IntegerArray>
+    [1, NaN, 3]
+    Length: 3, dtype: UInt16
    """

    @cache_readonly
@@ -528,7 +561,7 @@ def cmp_method(self, other):
            else:
                mask = self._mask | mask

-            result[mask] = True if op_name == 'ne' else False
+            result[mask] = op_name == 'ne'
            return result

        name = '__{name}__'.format(name=op.__name__)
diff --git a/pandas/core/arrays/interval.py b/pandas/core/arrays/interval.py
index 45470e03c041a..1e671c7bd956a 100644
--- a/pandas/core/arrays/interval.py
+++ b/pandas/core/arrays/interval.py
@@ -32,6 +32,7 @@

 _shared_docs_kwargs = dict(
     klass='IntervalArray',
+    qualname='arrays.IntervalArray',
     name=''
 )

@@ -115,7 +116,7 @@
 A new ``IntervalArray`` can be constructed directly from an array-like of
 ``Interval`` objects:

-    >>> pd.IntervalArray([pd.Interval(0, 1), pd.Interval(1, 5)])
+    >>> pd.arrays.IntervalArray([pd.Interval(0, 1), pd.Interval(1, 5)])
    IntervalArray([(0, 1], (1, 5]],
                  closed='right',
                  dtype='interval[int64]')

@@ -248,8 +249,8 @@ def _from_factorized(cls, values, original):
    Examples
    --------
-    >>> pd.%(klass)s.from_breaks([0, 1, 2, 3])
-    %(klass)s([(0, 1], (1, 2], (2, 3]]
+    >>> pd.%(qualname)s.from_breaks([0, 1, 2, 3])
+    %(klass)s([(0, 1], (1, 2], (2, 3]],
              closed='right',
              dtype='interval[int64]')
    """

@@ -311,7 +312,7 @@ def from_breaks(cls, breaks, closed='right', copy=False, dtype=None):
    Examples
    --------
    >>> %(klass)s.from_arrays([0, 1, 2], [1, 2, 3])
-    %(klass)s([(0, 1], (1, 2], (2, 3]]
+    %(klass)s([(0, 1], (1, 2], (2, 3]],
              closed='right',
              dtype='interval[int64]')
    """

@@ -354,16 +355,16 @@ def from_arrays(cls, left, right, closed='right', copy=False, dtype=None):
    Examples
    --------
-    >>> pd.%(klass)s.from_intervals([pd.Interval(0, 1),
+    >>> pd.%(qualname)s.from_intervals([pd.Interval(0, 1),
    ...
pd.Interval(1, 2)]) - %(klass)s([(0, 1], (1, 2]] + %(klass)s([(0, 1], (1, 2]], closed='right', dtype='interval[int64]') The generic Index constructor work identically when it infers an array of all intervals: >>> pd.Index([pd.Interval(0, 1), pd.Interval(1, 2)]) - %(klass)s([(0, 1], (1, 2]] + %(klass)s([(0, 1], (1, 2]], closed='right', dtype='interval[int64]') """ @@ -394,7 +395,7 @@ def from_arrays(cls, left, right, closed='right', copy=False, dtype=None): Examples -------- - >>> pd.%(klass)s.from_tuples([(0, 1), (1, 2)]) + >>> pd.%(qualname)s.from_tuples([(0, 1), (1, 2)]) %(klass)s([(0, 1], (1, 2]], closed='right', dtype='interval[int64]') """ @@ -891,13 +892,13 @@ def closed(self): Examples -------- - >>> index = pd.interval_range(0, 3) - >>> index - %(klass)s([(0, 1], (1, 2], (2, 3]] + >>> index = pd.interval_range(0, 3) + >>> index + IntervalIndex([(0, 1], (1, 2], (2, 3]], closed='right', dtype='interval[int64]') - >>> index.set_closed('both') - %(klass)s([[0, 1], [1, 2], [2, 3]] + >>> index.set_closed('both') + IntervalIndex([[0, 1], [1, 2], [2, 3]], closed='both', dtype='interval[int64]') """ @@ -1039,7 +1040,7 @@ def repeat(self, repeats, axis=None): Examples -------- - >>> intervals = pd.%(klass)s.from_tuples([(0, 1), (1, 3), (2, 4)]) + >>> intervals = pd.%(qualname)s.from_tuples([(0, 1), (1, 3), (2, 4)]) >>> intervals %(klass)s([(0, 1], (1, 3], (2, 4]], closed='right', diff --git a/pandas/core/arrays/numpy_.py b/pandas/core/arrays/numpy_.py index 47517782e2bbf..791ff44303e96 100644 --- a/pandas/core/arrays/numpy_.py +++ b/pandas/core/arrays/numpy_.py @@ -222,7 +222,7 @@ def __getitem__(self, item): item = item._ndarray result = self._ndarray[item] - if not lib.is_scalar(result): + if not lib.is_scalar(item): result = type(self)(result) return result diff --git a/pandas/core/arrays/period.py b/pandas/core/arrays/period.py index e0c71b5609096..3ddceb8c2839d 100644 --- a/pandas/core/arrays/period.py +++ b/pandas/core/arrays/period.py @@ -46,7 +46,7 @@ def _period_array_cmp(cls, op): Wrap comparison operations to convert Period-like to PeriodDtype """ opname = '__{name}__'.format(name=op.__name__) - nat_result = True if opname == '__ne__' else False + nat_result = opname == '__ne__' def wrapper(self, other): op = getattr(self.asi8, opname) diff --git a/pandas/core/arrays/timedeltas.py b/pandas/core/arrays/timedeltas.py index 910cb96a86216..06e2bf76fcf96 100644 --- a/pandas/core/arrays/timedeltas.py +++ b/pandas/core/arrays/timedeltas.py @@ -62,7 +62,7 @@ def _td_array_cmp(cls, op): Wrap comparison operations to convert timedelta-like to timedelta64 """ opname = '__{name}__'.format(name=op.__name__) - nat_result = True if opname == '__ne__' else False + nat_result = opname == '__ne__' def wrapper(self, other): if isinstance(other, (ABCDataFrame, ABCSeries, ABCIndexClass)): @@ -107,6 +107,29 @@ def wrapper(self, other): class TimedeltaArray(dtl.DatetimeLikeArrayMixin, dtl.TimelikeOps): + """ + Pandas ExtensionArray for timedelta data. + + .. versionadded:: 0.24.0 + + .. warning:: + + TimedeltaArray is currently experimental, and its API may change + without warning. In particular, :attr:`TimedeltaArray.dtype` is + expected to change to be an instance of an ``ExtensionDtype`` + subclass. + + Parameters + ---------- + values : array-like + The timedelta data. + + dtype : numpy.dtype + Currently, only ``numpy.dtype("timedelta64[ns]")`` is accepted. + freq : Offset, optional + copy : bool, default False + Whether to copy the underlying array of data. 
+ """ _typ = "timedeltaarray" _scalar_type = Timedelta __array_priority__ = 1000 @@ -128,6 +151,19 @@ def _box_func(self): @property def dtype(self): + """ + The dtype for the TimedeltaArray. + + .. warning:: + + A future version of pandas will change dtype to be an instance + of a :class:`pandas.api.extensions.ExtensionDtype` subclass, + not a ``numpy.dtype``. + + Returns + ------- + numpy.dtype + """ return _TD_DTYPE # ---------------------------------------------------------------- diff --git a/pandas/core/base.py b/pandas/core/base.py index c02ba88ea7fda..5a98e83c65884 100644 --- a/pandas/core/base.py +++ b/pandas/core/base.py @@ -1,6 +1,7 @@ """ Base and utility classes for pandas objects. """ +from collections import OrderedDict import textwrap import warnings @@ -8,7 +9,7 @@ import pandas._libs.lib as lib import pandas.compat as compat -from pandas.compat import PYPY, OrderedDict, builtins, map, range +from pandas.compat import PYPY, builtins, map, range from pandas.compat.numpy import function as nv from pandas.errors import AbstractMethodError from pandas.util._decorators import Appender, Substitution, cache_readonly @@ -376,7 +377,7 @@ def nested_renaming_depr(level=4): # eg. {'A' : ['mean']}, normalize all to # be list-likes if any(is_aggregator(x) for x in compat.itervalues(arg)): - new_arg = compat.OrderedDict() + new_arg = OrderedDict() for k, v in compat.iteritems(arg): if not isinstance(v, (tuple, list, dict)): new_arg[k] = [v] @@ -444,14 +445,14 @@ def _agg(arg, func): run the aggregations over the arg with func return an OrderedDict """ - result = compat.OrderedDict() + result = OrderedDict() for fname, agg_how in compat.iteritems(arg): result[fname] = func(fname, agg_how) return result # set the final keys keys = list(compat.iterkeys(arg)) - result = compat.OrderedDict() + result = OrderedDict() # nested renamer if is_nested_renamer: @@ -459,7 +460,7 @@ def _agg(arg, func): if all(isinstance(r, dict) for r in result): - result, results = compat.OrderedDict(), result + result, results = OrderedDict(), result for r in results: result.update(r) keys = list(compat.iterkeys(result)) @@ -1234,7 +1235,7 @@ def value_counts(self, normalize=False, sort=True, ascending=False, If True then the object returned will contain the relative frequencies of the unique values. sort : boolean, default True - Sort by values. + Sort by frequencies. ascending : boolean, default False Sort in ascending order. bins : integer, optional @@ -1323,12 +1324,31 @@ def nunique(self, dropna=True): Parameters ---------- - dropna : boolean, default True + dropna : bool, default True Don't include NaN in the count. Returns ------- - nunique : int + int + + See Also + -------- + DataFrame.nunique: Method nunique for DataFrame. + Series.count: Count non-NA/null observations in the Series. 
+ + Examples + -------- + >>> s = pd.Series([1, 3, 5, 7, 7]) + >>> s + 0 1 + 1 3 + 2 5 + 3 7 + 4 7 + dtype: int64 + + >>> s.nunique() + 4 """ uniqs = self.unique() n = len(uniqs) @@ -1345,7 +1365,7 @@ def is_unique(self): ------- is_unique : boolean """ - return self.nunique() == len(self) + return self.nunique(dropna=False) == len(self) @property def is_monotonic(self): diff --git a/pandas/core/common.py b/pandas/core/common.py index b4de0daa13b16..5b83cb344b1e7 100644 --- a/pandas/core/common.py +++ b/pandas/core/common.py @@ -5,6 +5,7 @@ """ import collections +from collections import OrderedDict from datetime import datetime, timedelta from functools import partial import inspect @@ -13,7 +14,7 @@ from pandas._libs import lib, tslibs import pandas.compat as compat -from pandas.compat import PY36, OrderedDict, iteritems +from pandas.compat import PY36, iteritems from pandas.core.dtypes.cast import construct_1d_object_array_from_listlike from pandas.core.dtypes.common import ( @@ -32,7 +33,8 @@ class SettingWithCopyWarning(Warning): def flatten(l): - """Flatten an arbitrarily nested sequence. + """ + Flatten an arbitrarily nested sequence. Parameters ---------- @@ -160,12 +162,16 @@ def cast_scalar_indexer(val): def _not_none(*args): - """Returns a generator consisting of the arguments that are not None""" + """ + Returns a generator consisting of the arguments that are not None. + """ return (arg for arg in args if arg is not None) def _any_none(*args): - """Returns a boolean indicating if any argument is None""" + """ + Returns a boolean indicating if any argument is None. + """ for arg in args: if arg is None: return True @@ -173,7 +179,9 @@ def _any_none(*args): def _all_none(*args): - """Returns a boolean indicating if all arguments are None""" + """ + Returns a boolean indicating if all arguments are None. + """ for arg in args: if arg is not None: return False @@ -181,7 +189,9 @@ def _all_none(*args): def _any_not_none(*args): - """Returns a boolean indicating if any argument is not None""" + """ + Returns a boolean indicating if any argument is not None. + """ for arg in args: if arg is not None: return True @@ -189,7 +199,9 @@ def _any_not_none(*args): def _all_not_none(*args): - """Returns a boolean indicating if all arguments are not None""" + """ + Returns a boolean indicating if all arguments are not None. + """ for arg in args: if arg is None: return False @@ -197,7 +209,9 @@ def _all_not_none(*args): def count_not_none(*args): - """Returns the count of arguments that are not None""" + """ + Returns the count of arguments that are not None. + """ return sum(x is not None for x in args) @@ -277,7 +291,9 @@ def maybe_make_list(obj): def is_null_slice(obj): - """ we have a null slice """ + """ + We have a null slice. + """ return (isinstance(obj, slice) and obj.start is None and obj.stop is None and obj.step is None) @@ -291,7 +307,9 @@ def is_true_slices(l): # TODO: used only once in indexing; belongs elsewhere? def is_full_slice(obj, l): - """ we have a full length slice """ + """ + We have a full length slice. + """ return (isinstance(obj, slice) and obj.start == 0 and obj.stop == l and obj.step is None) @@ -316,7 +334,7 @@ def get_callable_name(obj): def apply_if_callable(maybe_callable, obj, **kwargs): """ Evaluate possibly callable input using obj and kwargs if it is callable, - otherwise return as it is + otherwise return as it is. 
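The `is_unique` hunk above switches the comparison to `nunique(dropna=False)`; a short illustration of why that matters (values chosen for demonstration):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan])
print(s.nunique())               # 1 -- NaN is dropped by default
print(s.nunique(dropna=False))   # 2 -- NaN counts as a distinct value
# is_unique now compares 2 == len(s) and correctly reports True;
# the old nunique() == len(s) check returned False for this Series.
```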
Parameters ---------- @@ -333,7 +351,8 @@ def apply_if_callable(maybe_callable, obj, **kwargs): def dict_compat(d): """ - Helper function to convert datetimelike-keyed dicts to Timestamp-keyed dict + Helper function to convert datetimelike-keyed dicts + to Timestamp-keyed dict. Parameters ---------- @@ -383,13 +402,6 @@ def standardize_mapping(into): return into -def sentinel_factory(): - class Sentinel(object): - pass - - return Sentinel() - - def random_state(state=None): """ Helper function for processing random_state arguments. diff --git a/pandas/core/computation/eval.py b/pandas/core/computation/eval.py index b768ed6df303e..23c3e0eaace81 100644 --- a/pandas/core/computation/eval.py +++ b/pandas/core/computation/eval.py @@ -205,7 +205,7 @@ def eval(expr, parser='pandas', engine=None, truediv=True, A list of objects implementing the ``__getitem__`` special method that you can use to inject an additional collection of namespaces to use for variable lookup. For example, this is used in the - :meth:`~pandas.DataFrame.query` method to inject the + :meth:`~DataFrame.query` method to inject the ``DataFrame.index`` and ``DataFrame.columns`` variables that refer to their respective :class:`~pandas.DataFrame` instance attributes. @@ -248,8 +248,8 @@ def eval(expr, parser='pandas', engine=None, truediv=True, See Also -------- - pandas.DataFrame.query - pandas.DataFrame.eval + DataFrame.query + DataFrame.eval Notes ----- diff --git a/pandas/core/computation/expr.py b/pandas/core/computation/expr.py index 9a44198ba3b86..d840bf6ae71a2 100644 --- a/pandas/core/computation/expr.py +++ b/pandas/core/computation/expr.py @@ -18,7 +18,6 @@ UndefinedVariableError, _arith_ops_syms, _bool_ops_syms, _cmp_ops_syms, _mathops, _reductions, _unary_ops_syms, is_term) from pandas.core.computation.scope import Scope -from pandas.core.reshape.util import compose import pandas.io.formats.printing as printing @@ -103,8 +102,19 @@ def _replace_locals(tok): return toknum, tokval -def _preparse(source, f=compose(_replace_locals, _replace_booleans, - _rewrite_assign)): +def _compose2(f, g): + """Compose 2 callables""" + return lambda *args, **kwargs: f(g(*args, **kwargs)) + + +def _compose(*funcs): + """Compose 2 or more callables""" + assert len(funcs) > 1, 'At least 2 callables must be passed to compose' + return reduce(_compose2, funcs) + + +def _preparse(source, f=_compose(_replace_locals, _replace_booleans, + _rewrite_assign)): """Compose a collection of tokenization functions Parameters @@ -701,8 +711,8 @@ def visitor(x, y): class PandasExprVisitor(BaseExprVisitor): def __init__(self, env, engine, parser, - preparser=partial(_preparse, f=compose(_replace_locals, - _replace_booleans))): + preparser=partial(_preparse, f=_compose(_replace_locals, + _replace_booleans))): super(PandasExprVisitor, self).__init__(env, engine, parser, preparser) diff --git a/pandas/core/computation/ops.py b/pandas/core/computation/ops.py index 8c3218a976b6b..5c70255982e54 100644 --- a/pandas/core/computation/ops.py +++ b/pandas/core/computation/ops.py @@ -8,11 +8,11 @@ import numpy as np +from pandas._libs.tslibs import Timestamp from pandas.compat import PY3, string_types, text_type from pandas.core.dtypes.common import is_list_like, is_scalar -import pandas as pd from pandas.core.base import StringMixin import pandas.core.common as com from pandas.core.computation.common import _ensure_decoded, _result_type_many @@ -399,8 +399,9 @@ def evaluate(self, env, engine, parser, term_type, eval_in_python): if self.op in eval_in_python: res = 
self.func(left.value, right.value) else: - res = pd.eval(self, local_dict=env, engine=engine, - parser=parser) + from pandas.core.computation.eval import eval + res = eval(self, local_dict=env, engine=engine, + parser=parser) name = env.add_tmp(res) return term_type(name, env=env) @@ -422,7 +423,7 @@ def stringify(value): v = rhs.value if isinstance(v, (int, float)): v = stringify(v) - v = pd.Timestamp(_ensure_decoded(v)) + v = Timestamp(_ensure_decoded(v)) if v.tz is not None: v = v.tz_convert('UTC') self.rhs.update(v) @@ -431,7 +432,7 @@ def stringify(value): v = lhs.value if isinstance(v, (int, float)): v = stringify(v) - v = pd.Timestamp(_ensure_decoded(v)) + v = Timestamp(_ensure_decoded(v)) if v.tz is not None: v = v.tz_convert('UTC') self.lhs.update(v) diff --git a/pandas/core/computation/pytables.py b/pandas/core/computation/pytables.py index 00de29b07c75d..18f13e17c046e 100644 --- a/pandas/core/computation/pytables.py +++ b/pandas/core/computation/pytables.py @@ -5,6 +5,7 @@ import numpy as np +from pandas._libs.tslibs import Timedelta, Timestamp from pandas.compat import DeepChainMap, string_types, u from pandas.core.dtypes.common import is_list_like @@ -185,12 +186,12 @@ def stringify(value): if isinstance(v, (int, float)): v = stringify(v) v = _ensure_decoded(v) - v = pd.Timestamp(v) + v = Timestamp(v) if v.tz is not None: v = v.tz_convert('UTC') return TermValue(v, v.value, kind) elif kind == u('timedelta64') or kind == u('timedelta'): - v = pd.Timedelta(v, unit='s').value + v = Timedelta(v, unit='s').value return TermValue(int(v), v, kind) elif meta == u('category'): metadata = com.values_from_object(self.metadata) @@ -251,7 +252,7 @@ def evaluate(self): .format(slf=self)) rhs = self.conform(self.rhs) - values = [TermValue(v, v, self.kind) for v in rhs] + values = [TermValue(v, v, self.kind).value for v in rhs] if self.is_in_table: @@ -262,7 +263,7 @@ def evaluate(self): self.filter = ( self.lhs, filter_op, - pd.Index([v.value for v in values])) + pd.Index(values)) return self return None @@ -274,7 +275,7 @@ def evaluate(self): self.filter = ( self.lhs, filter_op, - pd.Index([v.value for v in values])) + pd.Index(values)) else: raise TypeError("passing a filterable condition to a non-table " diff --git a/pandas/core/computation/scope.py b/pandas/core/computation/scope.py index 33c5a1c2e0f0a..e158bc8c568eb 100644 --- a/pandas/core/computation/scope.py +++ b/pandas/core/computation/scope.py @@ -11,9 +11,9 @@ import numpy as np +from pandas._libs.tslibs import Timestamp from pandas.compat import DeepChainMap, StringIO, map -import pandas as pd # noqa from pandas.core.base import StringMixin import pandas.core.computation as compu @@ -48,7 +48,7 @@ def _raw_hex_id(obj): _DEFAULT_GLOBALS = { - 'Timestamp': pd._libs.tslib.Timestamp, + 'Timestamp': Timestamp, 'datetime': datetime.datetime, 'True': True, 'False': False, diff --git a/pandas/core/config.py b/pandas/core/config.py index 0f43ca65d187a..01664fffb1e27 100644 --- a/pandas/core/config.py +++ b/pandas/core/config.py @@ -282,8 +282,8 @@ def __doc__(self): Note: partial matches are supported for convenience, but unless you use the full option name (e.g. x.y.z.option_name), your code may break in future versions if new options with similar names are introduced. -value : - new value of option. +value : object + New value of option. 
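For context on the `expr.py` hunk above: the new `_compose`/`_compose2` helpers replace the `compose` previously imported from `pandas.core.reshape.util` with a local right-to-left composition built on `functools.reduce`. A self-contained sketch of the same pattern:

```python
from functools import reduce

def _compose2(f, g):
    """Compose two callables: _compose2(f, g)(x) == f(g(x))."""
    return lambda *args, **kwargs: f(g(*args, **kwargs))

def _compose(*funcs):
    """Compose two or more callables, applied right to left."""
    assert len(funcs) > 1, 'At least 2 callables must be passed to compose'
    return reduce(_compose2, funcs)

add_one = lambda x: x + 1
double = lambda x: x * 2
# The rightmost callable runs first: double(10) == 20, then add_one -> 21.
print(_compose(add_one, double)(10))  # 21
```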
Returns ------- diff --git a/pandas/core/dtypes/base.py b/pandas/core/dtypes/base.py index ab1cb9cf2499a..88bbdcf342d66 100644 --- a/pandas/core/dtypes/base.py +++ b/pandas/core/dtypes/base.py @@ -153,8 +153,8 @@ class ExtensionDtype(_DtypeOpsMixin): See Also -------- - pandas.api.extensions.register_extension_dtype - pandas.api.extensions.ExtensionArray + extensions.register_extension_dtype + extensions.ExtensionArray Notes ----- @@ -173,7 +173,7 @@ class ExtensionDtype(_DtypeOpsMixin): Optionally one can override construct_array_type for construction with the name of this dtype via the Registry. See - :meth:`pandas.api.extensions.register_extension_dtype`. + :meth:`extensions.register_extension_dtype`. * construct_array_type diff --git a/pandas/core/dtypes/cast.py b/pandas/core/dtypes/cast.py index ad62146dda268..f6561948df99a 100644 --- a/pandas/core/dtypes/cast.py +++ b/pandas/core/dtypes/cast.py @@ -1111,11 +1111,9 @@ def find_common_type(types): # this is different from numpy, which casts bool with float/int as int has_bools = any(is_bool_dtype(t) for t in types) if has_bools: - has_ints = any(is_integer_dtype(t) for t in types) - has_floats = any(is_float_dtype(t) for t in types) - has_complex = any(is_complex_dtype(t) for t in types) - if has_ints or has_floats or has_complex: - return np.object + for t in types: + if is_integer_dtype(t) or is_float_dtype(t) or is_complex_dtype(t): + return np.object return np.find_common_type(types, []) diff --git a/pandas/core/dtypes/common.py b/pandas/core/dtypes/common.py index e9bf0f87088db..4be7eb8ddb890 100644 --- a/pandas/core/dtypes/common.py +++ b/pandas/core/dtypes/common.py @@ -139,7 +139,8 @@ def is_object_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array-like or dtype is of the object dtype. + boolean + Whether or not the array-like or dtype is of the object dtype. Examples -------- @@ -230,8 +231,8 @@ def is_scipy_sparse(arr): Returns ------- - boolean : Whether or not the array-like is a - scipy.sparse.spmatrix instance. + boolean + Whether or not the array-like is a scipy.sparse.spmatrix instance. Notes ----- @@ -270,7 +271,8 @@ def is_categorical(arr): Returns ------- - boolean : Whether or not the array-like is of a Categorical instance. + boolean + Whether or not the array-like is of a Categorical instance. Examples -------- @@ -305,8 +307,9 @@ def is_datetimetz(arr): Returns ------- - boolean : Whether or not the array-like is a datetime array-like with - a timezone component in its dtype. + boolean + Whether or not the array-like is a datetime array-like with a + timezone component in its dtype. Examples -------- @@ -347,7 +350,8 @@ def is_offsetlike(arr_or_obj): Returns ------- - boolean : Whether the object is a DateOffset or listlike of DatetOffsets + boolean + Whether the object is a DateOffset or listlike of DateOffsets Examples -------- @@ -381,7 +385,8 @@ def is_period(arr): Returns ------- - boolean : Whether or not the array-like is a periodical index. + boolean + Whether or not the array-like is a periodical index. Examples -------- @@ -411,8 +416,8 @@ def is_datetime64_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array-like or dtype is of - the datetime64 dtype. + boolean + Whether or not the array-like or dtype is of the datetime64 dtype. Examples -------- @@ -442,8 +447,8 @@ def is_datetime64tz_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array-like or dtype is of - a DatetimeTZDtype dtype.
+ boolean + Whether or not the array-like or dtype is of a DatetimeTZDtype dtype. Examples -------- @@ -480,8 +485,8 @@ def is_timedelta64_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array-like or dtype is - of the timedelta64 dtype. + boolean + Whether or not the array-like or dtype is of the timedelta64 dtype. Examples -------- @@ -511,7 +516,8 @@ def is_period_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array-like or dtype is of the Period dtype. + boolean + Whether or not the array-like or dtype is of the Period dtype. Examples -------- @@ -544,8 +550,8 @@ def is_interval_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array-like or dtype is - of the Interval dtype. + boolean + Whether or not the array-like or dtype is of the Interval dtype. Examples -------- @@ -580,8 +586,8 @@ def is_categorical_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array-like or dtype is - of the Categorical dtype. + boolean + Whether or not the array-like or dtype is of the Categorical dtype. Examples -------- @@ -613,7 +619,8 @@ def is_string_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of the string dtype. + boolean + Whether or not the array or dtype is of the string dtype. Examples -------- @@ -647,8 +654,9 @@ def is_period_arraylike(arr): Returns ------- - boolean : Whether or not the array-like is a periodical - array-like or PeriodIndex instance. + boolean + Whether or not the array-like is a periodical array-like or + PeriodIndex instance. Examples -------- @@ -678,8 +686,9 @@ def is_datetime_arraylike(arr): Returns ------- - boolean : Whether or not the array-like is a datetime - array-like or DatetimeIndex. + boolean + Whether or not the array-like is a datetime array-like or + DatetimeIndex. Examples -------- @@ -713,7 +722,8 @@ def is_datetimelike(arr): Returns ------- - boolean : Whether or not the array-like is a datetime-like array-like. + boolean + Whether or not the array-like is a datetime-like array-like. Examples -------- @@ -754,7 +764,8 @@ def is_dtype_equal(source, target): Returns ---------- - boolean : Whether or not the two dtypes are equal. + boolean + Whether or not the two dtypes are equal. Examples -------- @@ -794,7 +805,8 @@ def is_dtype_union_equal(source, target): Returns ---------- - boolean : Whether or not the two dtypes are equal. + boolean + Whether or not the two dtypes are equal. >>> is_dtype_equal("int", int) True @@ -835,7 +847,8 @@ def is_any_int_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of an integer dtype. + boolean + Whether or not the array or dtype is of an integer dtype. Examples -------- @@ -883,8 +896,9 @@ def is_integer_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of an integer dtype - and not an instance of timedelta64. + boolean + Whether or not the array or dtype is of an integer dtype and + not an instance of timedelta64. Examples -------- @@ -938,8 +952,9 @@ def is_signed_integer_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of a signed integer dtype - and not an instance of timedelta64. + boolean + Whether or not the array or dtype is of a signed integer dtype + and not an instance of timedelta64. Examples -------- @@ -993,8 +1008,8 @@ def is_unsigned_integer_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of an - unsigned integer dtype. 
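All of these `common.py` hunks apply the same numpydoc fix: the return type stands alone on the first line of the `Returns` section, with the description indented beneath it instead of tacked on after a colon. Schematically, using `is_object_dtype` as the example:

```python
def is_object_dtype(arr_or_dtype):
    """
    Check whether an array-like or dtype is of the object dtype.

    Returns
    -------
    boolean
        Whether or not the array-like or dtype is of the object dtype.
    """
```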
+ boolean + Whether or not the array or dtype is of an unsigned integer dtype. Examples -------- @@ -1036,7 +1051,8 @@ def is_int64_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of the int64 dtype. + boolean + Whether or not the array or dtype is of the int64 dtype. Notes ----- @@ -1086,7 +1102,8 @@ def is_datetime64_any_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of the datetime64 dtype. + boolean + Whether or not the array or dtype is of the datetime64 dtype. Examples -------- @@ -1126,7 +1143,8 @@ def is_datetime64_ns_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of the datetime64[ns] dtype. + boolean + Whether or not the array or dtype is of the datetime64[ns] dtype. Examples -------- @@ -1178,8 +1196,8 @@ def is_timedelta64_ns_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of the - timedelta64[ns] dtype. + boolean + Whether or not the array or dtype is of the timedelta64[ns] dtype. Examples -------- @@ -1207,8 +1225,9 @@ def is_datetime_or_timedelta_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of a - timedelta64, or datetime64 dtype. + boolean + Whether or not the array or dtype is of a timedelta64, + or datetime64 dtype. Examples -------- @@ -1248,7 +1267,8 @@ def _is_unorderable_exception(e): Returns ------- - boolean : Whether or not the exception raised is an unorderable exception. + boolean + Whether or not the exception raised is an unorderable exception. """ if PY36: @@ -1275,8 +1295,8 @@ def is_numeric_v_string_like(a, b): Returns ------- - boolean : Whether we return a comparing a string-like - object to a numeric array. + boolean + Whether we return a comparing a string-like object to a numeric array. Examples -------- @@ -1332,8 +1352,8 @@ def is_datetimelike_v_numeric(a, b): Returns ------- - boolean : Whether we return a comparing a datetime-like - to a numeric object. + boolean + Whether we return a comparing a datetime-like to a numeric object. Examples -------- @@ -1388,8 +1408,8 @@ def is_datetimelike_v_object(a, b): Returns ------- - boolean : Whether we return a comparing a datetime-like - to an object instance. + boolean + Whether we return a comparing a datetime-like to an object instance. Examples -------- @@ -1442,7 +1462,8 @@ def needs_i8_conversion(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype should be converted to int64. + boolean + Whether or not the array or dtype should be converted to int64. Examples -------- @@ -1480,7 +1501,8 @@ def is_numeric_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of a numeric dtype. + boolean + Whether or not the array or dtype is of a numeric dtype. Examples -------- @@ -1524,7 +1546,8 @@ def is_string_like_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of the string dtype. + boolean + Whether or not the array or dtype is of the string dtype. Examples -------- @@ -1555,7 +1578,8 @@ def is_float_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of a float dtype. + boolean + Whether or not the array or dtype is of a float dtype. Examples -------- @@ -1586,7 +1610,8 @@ def is_bool_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of a boolean dtype. + boolean + Whether or not the array or dtype is of a boolean dtype. 
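Looping back to the `find_common_type` rewrite in `cast.py` earlier in this section: the new loop keeps pandas' documented deviation from NumPy, where mixing bool with any numeric dtype falls back to object rather than casting bools to integers. A quick demonstration of the user-visible rule (my reading of the behavior, not code from the patch):

```python
import pandas as pd

bools = pd.Series([True, False])
ints = pd.Series([1, 2])

# NumPy would cast the bools to integers; pandas instead refuses to
# find a common numeric dtype and falls back to object.
print(pd.concat([bools, ints]).dtype)  # object
```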
Notes ----- @@ -1655,8 +1680,8 @@ def is_extension_type(arr): Returns ------- - boolean : Whether or not the array-like is of a pandas - extension class instance. + boolean + Whether or not the array-like is of a pandas extension class instance. Examples -------- @@ -1760,7 +1785,8 @@ def is_complex_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of a compex dtype. + boolean + Whether or not the array or dtype is of a complex dtype. Examples -------- @@ -1980,7 +2006,7 @@ def _validate_date_like_dtype(dtype): def pandas_dtype(dtype): """ - Converts input into a pandas only dtype object or a numpy dtype object. + Convert input into a pandas only dtype object or a numpy dtype object. Parameters ---------- diff --git a/pandas/core/dtypes/concat.py b/pandas/core/dtypes/concat.py index aada777decaa7..10e903acbe538 100644 --- a/pandas/core/dtypes/concat.py +++ b/pandas/core/dtypes/concat.py @@ -123,8 +123,6 @@ def is_nonempty(x): except Exception: return True - nonempty = [x for x in to_concat if is_nonempty(x)] - # If all arrays are empty, there's nothing to convert, just short-cut to # the concatenation, #3121. # @@ -148,11 +146,11 @@ def is_nonempty(x): elif 'sparse' in typs: return _concat_sparse(to_concat, axis=axis, typs=typs) - extensions = [is_extension_array_dtype(x) for x in to_concat] - if any(extensions) and axis == 1: + all_empty = all(not is_nonempty(x) for x in to_concat) + if any(is_extension_array_dtype(x) for x in to_concat) and axis == 1: to_concat = [np.atleast_2d(x.astype('object')) for x in to_concat] - if not nonempty: + if all_empty: # we have all empties, but may need to coerce the result dtype to # object if we have non-numeric type operands (numpy would otherwise # cast this to float) diff --git a/pandas/core/dtypes/dtypes.py b/pandas/core/dtypes/dtypes.py index f84471c3b04e8..8b9ac680493a1 100644 --- a/pandas/core/dtypes/dtypes.py +++ b/pandas/core/dtypes/dtypes.py @@ -17,7 +17,8 @@ def register_extension_dtype(cls): - """Class decorator to register an ExtensionType with pandas. + """ + Register an ExtensionType with pandas as a class decorator. .. versionadded:: 0.24.0 @@ -194,7 +195,7 @@ class CategoricalDtype(PandasExtensionDtype, ExtensionDtype): See Also -------- - pandas.Categorical + Categorical Notes ----- @@ -413,8 +414,7 @@ def _hash_categories(categories, ordered=True): cat_array = hash_tuples(categories) else: if categories.dtype == 'O': - types = [type(x) for x in categories] - if not len(set(types)) == 1: + if len({type(x) for x in categories}) != 1: # TODO: hash_array doesn't handle mixed types. It casts # everything to a str first, which means we treat # {'1', '2'} the same as {'1', 2} diff --git a/pandas/core/dtypes/inference.py b/pandas/core/dtypes/inference.py index b11542622451c..1a02623fa6072 100644 --- a/pandas/core/dtypes/inference.py +++ b/pandas/core/dtypes/inference.py @@ -44,7 +44,7 @@ def is_number(obj): See Also -------- - pandas.api.types.is_integer: Checks a subgroup of numbers. + api.types.is_integer: Checks a subgroup of numbers.
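The `_hash_categories` tweak above folds a temporary list into a set comprehension; the check exists because `hash_array` stringifies everything, so mixed-type object categories must be detected before hashing. Roughly:

```python
# Mixed types must be caught up front because hashing stringifies
# values, which would make {'1', '2'} collide with {'1', 2}.
categories = ['1', 2, 3.0]

if len({type(x) for x in categories}) != 1:
    print('mixed-type categories: hash string representations instead')
```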
Examples -------- @@ -397,12 +397,15 @@ def is_dict_like(obj): True >>> is_dict_like([1, 2, 3]) False + >>> is_dict_like(dict) + False + >>> is_dict_like(dict()) + True """ - for attr in ("__getitem__", "keys", "__contains__"): - if not hasattr(obj, attr): - return False - - return True + dict_like_attrs = ("__getitem__", "keys", "__contains__") + return (all(hasattr(obj, attr) for attr in dict_like_attrs) + # [GH 25196] exclude classes + and not isinstance(obj, type)) def is_named_tuple(obj): diff --git a/pandas/core/frame.py b/pandas/core/frame.py index b4f79bda25517..e89aeb29f1625 100644 --- a/pandas/core/frame.py +++ b/pandas/core/frame.py @@ -13,6 +13,7 @@ from __future__ import division import collections +from collections import OrderedDict import functools import itertools import sys @@ -32,7 +33,7 @@ from pandas import compat from pandas.compat import (range, map, zip, lmap, lzip, StringIO, u, - OrderedDict, PY36, raise_with_traceback, + PY36, raise_with_traceback, string_and_binary_types) from pandas.compat.numpy import function as nv from pandas.core.dtypes.cast import ( @@ -318,7 +319,7 @@ class DataFrame(NDFrame): DataFrame.from_records : Constructor from tuples, also record arrays. DataFrame.from_dict : From dicts of Series, arrays, or dicts. DataFrame.from_items : From sequence of (key, value) pairs - pandas.read_csv, pandas.read_table, pandas.read_clipboard. + read_csv, pandas.read_table, pandas.read_clipboard. Examples -------- @@ -482,7 +483,7 @@ def axes(self): -------- >>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}) >>> df.axes - [RangeIndex(start=0, stop=2, step=1), Index(['coll', 'col2'], + [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='object')] """ return [self.index, self.columns] @@ -640,16 +641,6 @@ def _repr_html_(self): Mainly for IPython notebook. """ - # qtconsole doesn't report its line width, and also - # behaves badly when outputting an HTML table - # that doesn't fit the window, so disable it. - # XXX: In IPython 3.x and above, the Qt console will not attempt to - # display HTML, so this check can be removed when support for - # IPython 2.x is no longer needed. - if console.in_qtconsole(): - # 'HTML output is disabled in QtConsole' - return None - if self._info_repr(): buf = StringIO(u("")) self.info(buf=buf) @@ -727,7 +718,7 @@ def style(self): See Also -------- - pandas.io.formats.style.Styler + io.formats.style.Styler """ from pandas.io.formats.style import Styler return Styler(self) @@ -847,7 +838,7 @@ def itertuples(self, index=True, name="Pandas"): ---------- index : bool, default True If True, return the index as the first element of the tuple. - name : str, default "Pandas" + name : str or None, default "Pandas" The name of the returned namedtuples or None to return regular tuples. 
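The `is_dict_like` change above (GH 25196) deserves a closer look: a class object such as `dict` itself exposes `keys`, `__getitem__`, and `__contains__`, so the plain `hasattr` test wrongly accepted classes. The added `isinstance(obj, type)` guard rejects them while still accepting instances; a standalone sketch:

```python
def is_dict_like(obj):
    # A class object such as `dict` carries these attributes too, so
    # exclude classes explicitly (GH 25196).
    dict_like_attrs = ("__getitem__", "keys", "__contains__")
    return (all(hasattr(obj, attr) for attr in dict_like_attrs)
            and not isinstance(obj, type))

print(is_dict_like({'a': 1}))  # True
print(is_dict_like(dict()))    # True
print(is_dict_like(dict))      # False -- the class, not an instance
```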
@@ -1290,23 +1281,26 @@ def to_dict(self, orient='dict', into=dict): ('columns', self.columns.tolist()), ('data', [ list(map(com.maybe_box_datetimelike, t)) - for t in self.itertuples(index=False)] - ))) + for t in self.itertuples(index=False, name=None) + ]))) elif orient.lower().startswith('s'): return into_c((k, com.maybe_box_datetimelike(v)) for k, v in compat.iteritems(self)) elif orient.lower().startswith('r'): + columns = self.columns.tolist() + rows = (dict(zip(columns, row)) + for row in self.itertuples(index=False, name=None)) return [ into_c((k, com.maybe_box_datetimelike(v)) - for k, v in compat.iteritems(row._asdict())) - for row in self.itertuples(index=False)] + for k, v in compat.iteritems(row)) + for row in rows] elif orient.lower().startswith('i'): if not self.index.is_unique: raise ValueError( "DataFrame index must be unique for orient='index'." ) return into_c((t[0], dict(zip(self.columns, t[1:]))) - for t in self.itertuples()) + for t in self.itertuples(name=None)) else: raise ValueError("orient '{o}' not understood".format(o=orient)) @@ -1406,7 +1400,7 @@ def to_gbq(self, destination_table, project_id=None, chunksize=None, See Also -------- pandas_gbq.to_gbq : This function in the pandas-gbq library. - pandas.read_gbq : Read a DataFrame from Google BigQuery. + read_gbq : Read a DataFrame from Google BigQuery. """ from pandas.io import gbq return gbq.to_gbq( @@ -1524,8 +1518,8 @@ def from_records(cls, data, index=None, exclude=None, columns=None, result_index = Index([], name=index) else: try: - to_remove = [arr_columns.get_loc(field) for field in index] - index_data = [arrays[i] for i in to_remove] + index_data = [arrays[arr_columns.get_loc(field)] + for field in index] result_index = ensure_index_from_sequences(index_data, names=index) @@ -1716,7 +1710,8 @@ def to_records(self, index=True, convert_datetime64=None, # string naming a type. if dtype_mapping is None: formats.append(v.dtype) - elif isinstance(dtype_mapping, (type, compat.string_types)): + elif isinstance(dtype_mapping, (type, np.dtype, + compat.string_types)): formats.append(dtype_mapping) else: element = "row" if i < index_len else "column" @@ -1831,14 +1826,14 @@ def from_csv(cls, path, header=0, sep=',', index_col=0, parse_dates=True, Read CSV file. .. deprecated:: 0.21.0 - Use :func:`pandas.read_csv` instead. + Use :func:`read_csv` instead. - It is preferable to use the more powerful :func:`pandas.read_csv` + It is preferable to use the more powerful :func:`read_csv` for most general purposes, but ``from_csv`` makes for an easy roundtrip to and from a file (the exact counterpart of ``to_csv``), especially with a DataFrame of time series data. - This method only differs from the preferred :func:`pandas.read_csv` + This method only differs from the preferred :func:`read_csv` in some defaults: - `index_col` is ``0`` instead of ``None`` (take first column as index @@ -1875,7 +1870,7 @@ def from_csv(cls, path, header=0, sep=',', index_col=0, parse_dates=True, See Also -------- - pandas.read_csv + read_csv """ warnings.warn("from_csv is deprecated. Please use read_csv(...) 
" @@ -1963,45 +1958,7 @@ def to_panel(self): ------- panel : Panel """ - # only support this kind for now - if (not isinstance(self.index, MultiIndex) or # pragma: no cover - len(self.index.levels) != 2): - raise NotImplementedError('Only 2-level MultiIndex are supported.') - - if not self.index.is_unique: - raise ValueError("Can't convert non-uniquely indexed " - "DataFrame to Panel") - - self._consolidate_inplace() - - # minor axis must be sorted - if self.index.lexsort_depth < 2: - selfsorted = self.sort_index(level=0) - else: - selfsorted = self - - major_axis, minor_axis = selfsorted.index.levels - major_codes, minor_codes = selfsorted.index.codes - shape = len(major_axis), len(minor_axis) - - # preserve names, if any - major_axis = major_axis.copy() - major_axis.name = self.index.names[0] - - minor_axis = minor_axis.copy() - minor_axis.name = self.index.names[1] - - # create new axes - new_axes = [selfsorted.columns, major_axis, minor_axis] - - # create new manager - new_mgr = selfsorted._data.reshape_nd(axes=new_axes, - labels=[major_codes, - minor_codes], - shape=shape, - ref_items=selfsorted.columns) - - return self._constructor_expanddim(new_mgr) + raise NotImplementedError("Panel is being removed in pandas 0.25.0.") @deprecate_kwarg(old_arg_name='encoding', new_arg_name=None) def to_stata(self, fname, convert_dates=None, write_index=True, @@ -2530,7 +2487,7 @@ def memory_usage(self, index=True, deep=False): numpy.ndarray.nbytes : Total bytes consumed by the elements of an ndarray. Series.memory_usage : Bytes consumed by a Series. - pandas.Categorical : Memory-efficient array for string values with + Categorical : Memory-efficient array for string values with many repeated values. DataFrame.info : Concise summary of a DataFrame. @@ -3005,28 +2962,30 @@ def query(self, expr, inplace=False, **kwargs): Parameters ---------- - expr : string + expr : str The query string to evaluate. You can refer to variables in the environment by prefixing them with an '@' character like ``@a + b``. inplace : bool Whether the query should modify the data in place or return - a modified copy + a modified copy. + **kwargs + See the documentation for :func:`eval` for complete details + on the keyword arguments accepted by :meth:`DataFrame.query`. .. versionadded:: 0.18.0 - kwargs : dict - See the documentation for :func:`pandas.eval` for complete details - on the keyword arguments accepted by :meth:`DataFrame.query`. - Returns ------- - q : DataFrame + DataFrame + DataFrame resulting from the provided query expression. See Also -------- - pandas.eval - DataFrame.eval + eval : Evaluate a string describing operations on + DataFrame columns. + DataFrame.eval : Evaluate a string describing operations on + DataFrame columns. Notes ----- @@ -3035,7 +2994,7 @@ def query(self, expr, inplace=False, **kwargs): multidimensional key (e.g., a DataFrame) then the result will be passed to :meth:`DataFrame.__getitem__`. - This method uses the top-level :func:`pandas.eval` function to + This method uses the top-level :func:`eval` function to evaluate the passed query. 
The :meth:`~pandas.DataFrame.query` method uses a slightly @@ -3065,9 +3024,23 @@ def query(self, expr, inplace=False, **kwargs): Examples -------- - >>> df = pd.DataFrame(np.random.randn(10, 2), columns=list('ab')) - >>> df.query('a > b') - >>> df[df.a > df.b] # same result as the previous expression + >>> df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)}) + >>> df + A B + 0 1 10 + 1 2 8 + 2 3 6 + 3 4 4 + 4 5 2 + >>> df.query('A > B') + A B + 4 5 2 + + The previous expression is equivalent to + + >>> df[df.A > df.B] + A B + 4 5 2 """ inplace = validate_bool_kwarg(inplace, 'inplace') if not isinstance(expr, compat.string_types): @@ -3108,7 +3081,7 @@ def eval(self, expr, inplace=False, **kwargs): .. versionadded:: 0.18.0. kwargs : dict - See the documentation for :func:`~pandas.eval` for complete details + See the documentation for :func:`eval` for complete details on the keyword arguments accepted by :meth:`~pandas.DataFrame.query`. @@ -3123,12 +3096,12 @@ def eval(self, expr, inplace=False, **kwargs): of a frame. DataFrame.assign : Can evaluate an expression or function to create new values for a column. - pandas.eval : Evaluate a Python expression as a string using various + eval : Evaluate a Python expression as a string using various backends. Notes ----- - For more details see the API documentation for :func:`~pandas.eval`. + For more details see the API documentation for :func:`~eval`. For detailed examples see :ref:`enhancing performance with eval `. @@ -3967,7 +3940,7 @@ def rename(self, *args, **kwargs): See Also -------- - pandas.DataFrame.rename_axis + DataFrame.rename_axis Examples -------- @@ -4127,33 +4100,8 @@ def set_index(self, keys, drop=True, append=False, inplace=False, 4 16 10 2014 31 """ inplace = validate_bool_kwarg(inplace, 'inplace') - - err_msg = ('The parameter "keys" may be a column key, one-dimensional ' - 'array, or a list containing only valid column keys and ' - 'one-dimensional arrays.') - - if (is_scalar(keys) or isinstance(keys, tuple) - or isinstance(keys, (ABCIndexClass, ABCSeries, np.ndarray))): - # make sure we have a container of keys/arrays we can iterate over - # tuples can appear as valid column keys! + if not isinstance(keys, list): keys = [keys] - elif not isinstance(keys, list): - raise ValueError(err_msg) - - missing = [] - for col in keys: - if (is_scalar(col) or isinstance(col, tuple)): - # if col is a valid column key, everything is fine - # tuples are always considered keys, never as list-likes - if col not in self: - missing.append(col) - elif (not isinstance(col, (ABCIndexClass, ABCSeries, - np.ndarray, list)) - or getattr(col, 'ndim', 1) > 1): - raise ValueError(err_msg) - - if missing: - raise KeyError('{}'.format(missing)) if inplace: frame = self @@ -4614,7 +4562,8 @@ def dropna(self, axis=0, how='any', thresh=None, subset=None, def drop_duplicates(self, subset=None, keep='first', inplace=False): """ Return DataFrame with duplicate rows removed, optionally only - considering certain columns. + considering certain columns. Indexes, including time indexes + are ignored. Parameters ---------- @@ -5130,8 +5079,7 @@ def _combine_const(self, other, func): def combine(self, other, func, fill_value=None, overwrite=True): """ - Perform column-wise combine with another DataFrame based on a - passed function. + Perform column-wise combine with another DataFrame. Combines a DataFrame with `other` DataFrame using `func` to element-wise combine columns. 
The row and column indexes of the @@ -5147,13 +5095,14 @@ def combine(self, other, func, fill_value=None, overwrite=True): fill_value : scalar value, default None The value to fill NaNs with prior to passing any column to the merge func. - overwrite : boolean, default True + overwrite : bool, default True If True, columns in `self` that do not exist in `other` will be overwritten with NaNs. Returns ------- - result : DataFrame + DataFrame + Combination of the provided DataFrames. See Also -------- @@ -5197,15 +5146,15 @@ def combine(self, other, func, fill_value=None, overwrite=True): >>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]}) >>> df2 = pd.DataFrame({'A': [1, 1], 'B': [None, 3]}) >>> df1.combine(df2, take_smaller, fill_value=-5) - A B - 0 0 NaN + A B + 0 0 -5.0 1 0 3.0 Example that demonstrates the use of `overwrite` and behavior when the axis differ between the dataframes. >>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]}) - >>> df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1],}, index=[1, 2]) + >>> df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1], }, index=[1, 2]) >>> df1.combine(df2, take_smaller) A B C 0 NaN NaN NaN @@ -5220,7 +5169,7 @@ def combine(self, other, func, fill_value=None, overwrite=True): Demonstrating the preference of the passed in dataframe. - >>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1],}, index=[1, 2]) + >>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1], }, index=[1, 2]) >>> df2.combine(df1, take_smaller) A B C 0 0.0 NaN NaN @@ -5704,19 +5653,19 @@ def pivot(self, index=None, columns=None, values=None): This first example aggregates values by taking the sum. - >>> table = pivot_table(df, values='D', index=['A', 'B'], + >>> table = pd.pivot_table(df, values='D', index=['A', 'B'], ... columns=['C'], aggfunc=np.sum) >>> table C large small A B - bar one 4 5 - two 7 6 - foo one 4 1 - two NaN 6 + bar one 4.0 5.0 + two 7.0 6.0 + foo one 4.0 1.0 + two NaN 6.0 We can also fill missing values using the `fill_value` parameter. - >>> table = pivot_table(df, values='D', index=['A', 'B'], + >>> table = pd.pivot_table(df, values='D', index=['A', 'B'], ... columns=['C'], aggfunc=np.sum, fill_value=0) >>> table C large small @@ -5728,12 +5677,11 @@ def pivot(self, index=None, columns=None, values=None): The next example aggregates by taking the mean across multiple columns. - >>> table = pivot_table(df, values=['D', 'E'], index=['A', 'C'], + >>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'], ... aggfunc={'D': np.mean, ... 'E': np.mean}) >>> table - D E - mean mean + D E A C bar large 5.500000 7.500000 small 5.500000 8.500000 @@ -5743,17 +5691,17 @@ def pivot(self, index=None, columns=None, values=None): We can also calculate multiple types of aggregations for any given value column. - >>> table = pivot_table(df, values=['D', 'E'], index=['A', 'C'], + >>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'], ... aggfunc={'D': np.mean, ... 
'E': [min, max, np.mean]}) >>> table - D E - mean max mean min + D E + mean max mean min A C - bar large 5.500000 9 7.500000 6 - small 5.500000 9 8.500000 8 - foo large 2.000000 5 4.500000 4 - small 2.333333 6 4.333333 2 + bar large 5.500000 9.0 7.500000 6.0 + small 5.500000 9.0 8.500000 8.0 + foo large 2.000000 5.0 4.500000 4.0 + small 2.333333 6.0 4.333333 2.0 """ @Substitution('') @@ -6001,7 +5949,7 @@ def unstack(self, level=-1, fill_value=None): return unstack(self, level, fill_value) _shared_docs['melt'] = (""" - Unpivots a DataFrame from wide format to long format, optionally + Unpivot a DataFrame from wide format to long format, optionally leaving identifier variables set. This function is useful to massage a DataFrame into a format where one @@ -6238,11 +6186,11 @@ def _gotitem(self, -------- DataFrame.apply : Perform any type of operations. DataFrame.transform : Perform transformation type operations. - pandas.core.groupby.GroupBy : Perform operations over groups. - pandas.core.resample.Resampler : Perform operations over resampled bins. - pandas.core.window.Rolling : Perform operations over rolling window. - pandas.core.window.Expanding : Perform operations over expanding window. - pandas.core.window.EWM : Perform operation over exponential weighted + core.groupby.GroupBy : Perform operations over groups. + core.resample.Resampler : Perform operations over resampled bins. + core.window.Rolling : Perform operations over rolling window. + core.window.Expanding : Perform operations over expanding window. + core.window.EWM : Perform operation over exponential weighted window. """) @@ -6594,7 +6542,7 @@ def append(self, other, ignore_index=False, See Also -------- - pandas.concat : General function to concatenate DataFrame, Series + concat : General function to concatenate DataFrame, Series or Panel objects. Notes @@ -6891,41 +6839,67 @@ def round(self, decimals=0, *args, **kwargs): columns not included in `decimals` will be left as is. Elements of `decimals` which are not columns of the input will be ignored. + *args + Additional keywords have no effect but might be accepted for + compatibility with numpy. + **kwargs + Additional keywords have no effect but might be accepted for + compatibility with numpy. Returns ------- - DataFrame + DataFrame : + A DataFrame with the affected columns rounded to the specified + number of decimal places. See Also -------- - numpy.around - Series.round + numpy.around : Round a numpy array to the given number of decimals. + Series.round : Round a Series to the given number of decimals. Examples -------- - >>> df = pd.DataFrame(np.random.random([3, 3]), - ... columns=['A', 'B', 'C'], index=['first', 'second', 'third']) + >>> df = pd.DataFrame([(.21, .32), (.01, .67), (.66, .03), (.21, .18)], + ... 
columns=['dogs', 'cats']) >>> df - A B C - first 0.028208 0.992815 0.173891 - second 0.038683 0.645646 0.577595 - third 0.877076 0.149370 0.491027 - >>> df.round(2) - A B C - first 0.03 0.99 0.17 - second 0.04 0.65 0.58 - third 0.88 0.15 0.49 - >>> df.round({'A': 1, 'C': 2}) - A B C - first 0.0 0.992815 0.17 - second 0.0 0.645646 0.58 - third 0.9 0.149370 0.49 - >>> decimals = pd.Series([1, 0, 2], index=['A', 'B', 'C']) + dogs cats + 0 0.21 0.32 + 1 0.01 0.67 + 2 0.66 0.03 + 3 0.21 0.18 + + By providing an integer each column is rounded to the same number + of decimal places. + + >>> df.round(1) + dogs cats + 0 0.2 0.3 + 1 0.0 0.7 + 2 0.7 0.0 + 3 0.2 0.2 + + With a dict, the number of places for specific columns can be + specified with the column names as key and the number of decimal + places as value. + + >>> df.round({'dogs': 1, 'cats': 0}) + dogs cats + 0 0.2 0.0 + 1 0.0 1.0 + 2 0.7 0.0 + 3 0.2 0.0 + + Using a Series, the number of places for specific columns can be + specified with the column names as index and the number of + decimal places as value. + + >>> decimals = pd.Series([0, 1], index=['cats', 'dogs']) >>> df.round(decimals) - A B C - first 0.0 1 0.17 - second 0.0 1 0.58 - third 0.9 0 0.49 + dogs cats + 0 0.2 0.0 + 1 0.0 1.0 + 2 0.7 0.0 + 3 0.2 0.0 """ from pandas.core.reshape.concat import concat @@ -7078,10 +7052,10 @@ def cov(self, min_periods=None): See Also -------- - pandas.Series.cov : Compute covariance with another Series. - pandas.core.window.EWM.cov: Exponential weighted sample covariance. - pandas.core.window.Expanding.cov : Expanding sample covariance. - pandas.core.window.Rolling.cov : Rolling sample covariance. + Series.cov : Compute covariance with another Series. + core.window.EWM.cov: Exponential weighted sample covariance. + core.window.Expanding.cov : Expanding sample covariance. + core.window.Rolling.cov : Rolling sample covariance. Notes ----- @@ -7464,7 +7438,8 @@ def f(x): if filter_type is None or filter_type == 'numeric': data = self._get_numeric_data() elif filter_type == 'bool': - data = self + # GH 25101, # GH 24434 + data = self._get_bool_data() if axis == 0 else self else: # pragma: no cover msg = ("Generating numeric_only data with filter_type {f}" "not supported.".format(f=filter_type)) @@ -7733,10 +7708,10 @@ def quantile(self, q=0.5, axis=0, numeric_only=True, ------- quantiles : Series or DataFrame - - If ``q`` is an array, a DataFrame will be returned where the + If ``q`` is an array, a DataFrame will be returned where the index is ``q``, the columns are the columns of self, and the values are the quantiles. - - If ``q`` is a float, a Series will be returned where the + If ``q`` is a float, a Series will be returned where the index is the columns of self and the values are the quantiles. See Also diff --git a/pandas/core/generic.py b/pandas/core/generic.py index 2b97661fe9ec3..ef629361c291a 100644 --- a/pandas/core/generic.py +++ b/pandas/core/generic.py @@ -61,6 +61,10 @@ by : str or list of str Name or list of names to sort by""") +# sentinel value to use as kwarg in place of None when None has special meaning +# and needs to be distinguished from a user explicitly passing None.
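The module-level `sentinel = object()` introduced here lets `rename_axis` (further down) distinguish "no mapper supplied" from an explicit `mapper=None`, which users pass to clear an axis name. The pattern in isolation, as a distilled sketch with a hypothetical function rather than the real pandas signature:

```python
sentinel = object()  # unique object no caller can accidentally pass

def rename_axis(mapper=sentinel):
    if mapper is sentinel:
        return 'mapper not supplied'
    return 'mapper explicitly set to {!r}'.format(mapper)

print(rename_axis())      # mapper not supplied
print(rename_axis(None))  # mapper explicitly set to None
```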
+sentinel = object() + def _single_replace(self, to_replace, method, inplace, limit): """ @@ -290,11 +294,16 @@ def _construct_axes_dict_for_slice(self, axes=None, **kwargs): d.update(kwargs) return d - def _construct_axes_from_arguments(self, args, kwargs, require_all=False): + def _construct_axes_from_arguments( + self, args, kwargs, require_all=False, sentinel=None): """Construct and returns axes if supplied in args/kwargs. If require_all, raise if all axis arguments are not supplied return a tuple of (axes, kwargs). + + sentinel specifies the default parameter when an axis is not + supplied; useful to distinguish when a user explicitly passes None + in scenarios where None has special meaning. """ # construct the args @@ -322,7 +331,7 @@ def _construct_axes_from_arguments(self, args, kwargs, require_all=False): raise TypeError("not enough/duplicate arguments " "specified!") - axes = {a: kwargs.pop(a, None) for a in self._AXIS_ORDERS} + axes = {a: kwargs.pop(a, sentinel) for a in self._AXIS_ORDERS} return axes, kwargs @classmethod @@ -530,7 +539,7 @@ def set_axis(self, labels, axis=0, inplace=None): The axis to update. The value 0 identifies the rows, and 1 identifies the columns. - inplace : boolean, default None + inplace : bool, default None Whether to return a new %(klass)s instance. .. warning:: @@ -977,7 +986,7 @@ def rename(self, *args, **kwargs): See Also -------- - pandas.NDFrame.rename_axis + NDFrame.rename_axis Examples -------- @@ -1089,7 +1098,7 @@ def rename(self, *args, **kwargs): @rewrite_axis_style_signature('mapper', [('copy', True), ('inplace', False)]) - def rename_axis(self, mapper=None, **kwargs): + def rename_axis(self, mapper=sentinel, **kwargs): """ Set the name of the axis for the index or columns. @@ -1218,7 +1227,8 @@ class name cat 4 0 monkey 2 2 """ - axes, kwargs = self._construct_axes_from_arguments((), kwargs) + axes, kwargs = self._construct_axes_from_arguments( + (), kwargs, sentinel=sentinel) copy = kwargs.pop('copy', True) inplace = kwargs.pop('inplace', False) axis = kwargs.pop('axis', 0) @@ -1231,7 +1241,7 @@ class name inplace = validate_bool_kwarg(inplace, 'inplace') - if (mapper is not None): + if (mapper is not sentinel): # Use v0.23 behavior if a scalar or list non_mapper = is_scalar(mapper) or (is_list_like(mapper) and not is_dict_like(mapper)) @@ -1254,7 +1264,7 @@ class name for axis in lrange(self._AXIS_LEN): v = axes.get(self._AXIS_NAMES[axis]) - if v is None: + if v is sentinel: continue non_mapper = is_scalar(v) or (is_list_like(v) and not is_dict_like(v)) @@ -1554,14 +1564,14 @@ def _is_label_reference(self, key, axis=0): ------- is_label: bool """ - axis = self._get_axis_number(axis) - other_axes = [ax for ax in range(self._AXIS_LEN) if ax != axis] - if self.ndim > 2: raise NotImplementedError( "_is_label_reference is not implemented for {type}" .format(type=type(self))) + axis = self._get_axis_number(axis) + other_axes = (ax for ax in range(self._AXIS_LEN) if ax != axis) + return (key is not None and is_hashable(key) and any(key in self.axes[ax] for ax in other_axes)) @@ -1613,15 +1623,14 @@ def _check_label_or_level_ambiguity(self, key, axis=0): ------ ValueError: `key` is ambiguous """ - - axis = self._get_axis_number(axis) - other_axes = [ax for ax in range(self._AXIS_LEN) if ax != axis] - if self.ndim > 2: raise NotImplementedError( "_check_label_or_level_ambiguity is not implemented for {type}" .format(type=type(self))) + axis = self._get_axis_number(axis) + other_axes = (ax for ax in range(self._AXIS_LEN) if ax != axis) + if 
(key is not None and is_hashable(key) and key in self.axes[axis].names and @@ -1679,15 +1688,14 @@ def _get_label_or_level_values(self, key, axis=0): if `key` is ambiguous. This will become an ambiguity error in a future version """ - - axis = self._get_axis_number(axis) - other_axes = [ax for ax in range(self._AXIS_LEN) if ax != axis] - if self.ndim > 2: raise NotImplementedError( "_get_label_or_level_values is not implemented for {type}" .format(type=type(self))) + axis = self._get_axis_number(axis) + other_axes = [ax for ax in range(self._AXIS_LEN) if ax != axis] + if self._is_label_reference(key, axis=axis): self._check_label_or_level_ambiguity(key, axis=axis) values = self.xs(key, axis=other_axes[0])._values @@ -1743,14 +1751,13 @@ def _drop_labels_or_levels(self, keys, axis=0): ValueError if any `keys` match neither a label nor a level """ - - axis = self._get_axis_number(axis) - if self.ndim > 2: raise NotImplementedError( "_drop_labels_or_levels is not implemented for {type}" .format(type=type(self))) + axis = self._get_axis_number(axis) + # Validate keys keys = com.maybe_make_list(keys) invalid_keys = [k for k in keys if not @@ -1851,8 +1858,8 @@ def empty(self): See Also -------- - pandas.Series.dropna - pandas.DataFrame.dropna + Series.dropna + DataFrame.dropna Notes ----- @@ -3966,35 +3973,37 @@ def add_suffix(self, suffix): def sort_values(self, by=None, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last'): """ - Sort by the values along either axis + Sort by the values along either axis. Parameters ----------%(optional_by)s axis : %(axes_single_arg)s, default 0 - Axis to be sorted + Axis to be sorted. ascending : bool or list of bool, default True Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the by. inplace : bool, default False - if True, perform operation in-place + If True, perform operation in-place. kind : {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort' Choice of sorting algorithm. See also ndarray.np.sort for more information. `mergesort` is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label. na_position : {'first', 'last'}, default 'last' - `first` puts NaNs at the beginning, `last` puts NaNs at the end + Puts NaNs at the beginning if `first`; `last` puts NaNs at the + end. Returns ------- - sorted_obj : %(klass)s + sorted_obj : DataFrame or None + DataFrame with sorted values if inplace=False, None otherwise. Examples -------- >>> df = pd.DataFrame({ - ... 'col1' : ['A', 'A', 'B', np.nan, 'D', 'C'], - ... 'col2' : [2, 1, 9, 8, 7, 4], + ... 'col1': ['A', 'A', 'B', np.nan, 'D', 'C'], + ... 'col2': [2, 1, 9, 8, 7, 4], ... 'col3': [0, 1, 9, 4, 2, 3], ... }) >>> df @@ -4056,32 +4065,35 @@ def sort_values(self, by=None, axis=0, ascending=True, inplace=False, def sort_index(self, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True): """ - Sort object by labels (along an axis) + Sort object by labels (along an axis). Parameters ---------- - axis : %(axes)s to direct sorting + axis : {0 or 'index', 1 or 'columns'}, default 0 + The axis along which to sort. The value 0 identifies the rows, + and 1 identifies the columns. level : int or level name or list of ints or list of level names - if not None, sort on values in specified index level(s) - ascending : boolean, default True - Sort ascending vs. 
descending + If not None, sort on values in specified index level(s). + ascending : bool, default True + Sort ascending vs. descending. inplace : bool, default False - if True, perform operation in-place + If True, perform operation in-place. kind : {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort' - Choice of sorting algorithm. See also ndarray.np.sort for more - information. `mergesort` is the only stable algorithm. For - DataFrames, this option is only applied when sorting on a single - column or label. + Choice of sorting algorithm. See also ndarray.np.sort for more + information. `mergesort` is the only stable algorithm. For + DataFrames, this option is only applied when sorting on a single + column or label. na_position : {'first', 'last'}, default 'last' - `first` puts NaNs at the beginning, `last` puts NaNs at the end. - Not implemented for MultiIndex. + Puts NaNs at the beginning if `first`; `last` puts NaNs at the end. + Not implemented for MultiIndex. sort_remaining : bool, default True - if true and sorting by level and index is multilevel, sort by other - levels too (in order) after sorting by specified level + If True and sorting by level and index is multilevel, sort by other + levels too (in order) after sorting by specified level. Returns ------- - sorted_obj : %(klass)s + sorted_obj : DataFrame or None + DataFrame with sorted index if inplace=False, None otherwise. """ inplace = validate_bool_kwarg(inplace, 'inplace') axis = self._get_axis_number(axis) @@ -4936,10 +4948,10 @@ def pipe(self, func, *args, **kwargs): Returns ------- DataFrame, Series or scalar - if DataFrame.agg is called with a single function, returns a Series - if DataFrame.agg is called with several functions, returns a DataFrame - if Series.agg is called with single function, returns a scalar - if Series.agg is called with several functions, returns a Series + If DataFrame.agg is called with a single function, returns a Series + If DataFrame.agg is called with several functions, returns a DataFrame + If Series.agg is called with single function, returns a scalar + If Series.agg is called with several functions, returns a Series %(see_also)s @@ -5257,8 +5269,8 @@ def values(self): See Also -------- DataFrame.to_numpy : Recommended alternative to this method. - pandas.DataFrame.index : Retrieve the index labels. - pandas.DataFrame.columns : Retrieving the column names. + DataFrame.index : Retrieve the index labels. + DataFrame.columns : Retrieving the column names. Notes ----- @@ -5329,7 +5341,7 @@ def get_values(self): Return an ndarray after converting sparse values to dense. This is the same as ``.values`` for non-sparse data. For sparse - data contained in a `pandas.SparseArray`, the data are first + data contained in a `SparseArray`, the data are first converted to a dense representation. Returns @@ -5340,7 +5352,7 @@ def get_values(self): See Also -------- values : Numpy representation of DataFrame. - pandas.SparseArray : Container for sparse data. + SparseArray : Container for sparse data. Examples -------- @@ -5461,7 +5473,7 @@ def dtypes(self): See Also -------- - pandas.DataFrame.ftypes : Dtype and sparsity information. + DataFrame.ftypes : Dtype and sparsity information. Examples -------- @@ -5497,8 +5509,8 @@ def ftypes(self): See Also -------- - pandas.DataFrame.dtypes: Series with just dtype information. - pandas.SparseDataFrame : Container for sparse tabular data. + DataFrame.dtypes: Series with just dtype information. + SparseDataFrame : Container for sparse tabular data. 
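The clarified `DataFrame or None` return annotations in the two sort docstrings above encode behavior that often surprises users: with `inplace=True` the object is mutated and the method returns `None`. For example:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['B', 'A', 'C'], 'col2': [2, 1, 3]})

result = df.sort_values(by='col1', inplace=True)
print(result)  # None -- the frame itself was sorted in place
print(df)
#   col1  col2
# 1    A     1
# 0    B     2
# 2    C     3
```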
Notes ----- @@ -6596,7 +6608,7 @@ def replace(self, to_replace=None, value=None, inplace=False, limit=None, 'barycentric', 'polynomial': Passed to `scipy.interpolate.interp1d`. Both 'polynomial' and 'spline' require that you also specify an `order` (int), - e.g. ``df.interpolate(method='polynomial', order=4)``. + e.g. ``df.interpolate(method='polynomial', order=5)``. These use the numerical values of the index. * 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima': Wrappers around the SciPy interpolation methods of similar @@ -6863,10 +6875,10 @@ def asof(self, where, subset=None): ------- scalar, Series, or DataFrame - * scalar : when `self` is a Series and `where` is a scalar - * Series: when `self` is a Series and `where` is an array-like, + Scalar : when `self` is a Series and `where` is a scalar + Series: when `self` is a Series and `where` is an array-like, or when `self` is a DataFrame and `where` is a scalar - * DataFrame : when `self` is a DataFrame and `where` is an + DataFrame : when `self` is a DataFrame and `where` is an array-like See Also @@ -8564,7 +8576,7 @@ def _where(self, cond, other=np.nan, inplace=False, axis=None, level=None, cond = self._constructor(cond, **self._construct_axes_dict()) # make sure we are boolean - fill_value = True if inplace else False + fill_value = bool(inplace) cond = cond.fillna(fill_value) msg = "Boolean array expected for the condition, not {dtype}" @@ -9979,8 +9991,7 @@ def _add_numeric_operations(cls): cls, 'all', name, name2, axis_descr, _all_desc, nanops.nanall, _all_see_also, _all_examples, empty_value=True) - @Substitution(outname='mad', - desc="Return the mean absolute deviation of the values " + @Substitution(desc="Return the mean absolute deviation of the values " "for the requested axis.", name1=name, name2=name2, axis_descr=axis_descr, min_count='', see_also='', examples='') @@ -10021,8 +10032,7 @@ def mad(self, axis=None, skipna=None, level=None): "ddof argument", nanops.nanstd) - @Substitution(outname='compounded', - desc="Return the compound percentage of the values for " + @Substitution(desc="Return the compound percentage of the values for " "the requested axis.", name1=name, name2=name2, axis_descr=axis_descr, min_count='', see_also='', examples='') @@ -10112,7 +10122,7 @@ def nanptp(values, axis=0, skipna=True): cls.ptp = _make_stat_function( cls, 'ptp', name, name2, axis_descr, - """Returns the difference between the maximum value and the + """Return the difference between the maximum value and the minimum value in the object. This is the equivalent of the ``numpy.ndarray`` method ``ptp``.\n\n.. 
deprecated:: 0.24.0 Use numpy.ptp instead""", @@ -10230,8 +10240,8 @@ def last_valid_index(self): def _doc_parms(cls): """Return a tuple of the doc parms.""" - axis_descr = "{%s}" % ', '.join(["{0} ({1})".format(a, i) - for i, a in enumerate(cls._AXIS_ORDERS)]) + axis_descr = "{%s}" % ', '.join("{0} ({1})".format(a, i) + for i, a in enumerate(cls._AXIS_ORDERS)) name = (cls._constructor_sliced.__name__ if cls._AXIS_LEN > 1 else 'scalar') name2 = cls.__name__ @@ -10259,7 +10269,7 @@ def _doc_parms(cls): Returns ------- -%(outname)s : %(name1)s or %(name2)s (if level specified) +%(name1)s or %(name2)s (if level specified) %(see_also)s %(examples)s\ """ @@ -10285,7 +10295,7 @@ def _doc_parms(cls): Returns ------- -%(outname)s : %(name1)s or %(name2)s (if level specified)\n""" +%(name1)s or %(name2)s (if level specified)\n""" _bool_doc = """ %(desc)s @@ -10404,7 +10414,7 @@ def _doc_parms(cls): Returns ------- -%(outname)s : %(name1)s or %(name2)s\n +%(name1)s or %(name2)s\n See Also -------- core.window.Expanding.%(accum_func_name)s : Similar functionality @@ -10897,7 +10907,7 @@ def _doc_parms(cls): def _make_min_count_stat_function(cls, name, name1, name2, axis_descr, desc, f, see_also='', examples=''): - @Substitution(outname=name, desc=desc, name1=name1, name2=name2, + @Substitution(desc=desc, name1=name1, name2=name2, axis_descr=axis_descr, min_count=_min_count_stub, see_also=see_also, examples=examples) @Appender(_num_doc) @@ -10925,7 +10935,7 @@ def stat_func(self, axis=None, skipna=None, level=None, numeric_only=None, def _make_stat_function(cls, name, name1, name2, axis_descr, desc, f, see_also='', examples=''): - @Substitution(outname=name, desc=desc, name1=name1, name2=name2, + @Substitution(desc=desc, name1=name1, name2=name2, axis_descr=axis_descr, min_count='', see_also=see_also, examples=examples) @Appender(_num_doc) @@ -10949,7 +10959,7 @@ def stat_func(self, axis=None, skipna=None, level=None, numeric_only=None, def _make_stat_function_ddof(cls, name, name1, name2, axis_descr, desc, f): - @Substitution(outname=name, desc=desc, name1=name1, name2=name2, + @Substitution(desc=desc, name1=name1, name2=name2, axis_descr=axis_descr) @Appender(_num_ddof_doc) def stat_func(self, axis=None, skipna=None, level=None, ddof=1, @@ -10970,7 +10980,7 @@ def stat_func(self, axis=None, skipna=None, level=None, ddof=1, def _make_cum_function(cls, name, name1, name2, axis_descr, desc, accum_func, accum_func_name, mask_a, mask_b, examples): - @Substitution(outname=name, desc=desc, name1=name1, name2=name2, + @Substitution(desc=desc, name1=name1, name2=name2, axis_descr=axis_descr, accum_func_name=accum_func_name, examples=examples) @Appender(_cnum_doc) @@ -11005,7 +11015,7 @@ def cum_func(self, axis=None, skipna=True, *args, **kwargs): def _make_logical_function(cls, name, name1, name2, axis_descr, desc, f, see_also, examples, empty_value): - @Substitution(outname=name, desc=desc, name1=name1, name2=name2, + @Substitution(desc=desc, name1=name1, name2=name2, axis_descr=axis_descr, see_also=see_also, examples=examples, empty_value=empty_value) @Appender(_bool_doc) diff --git a/pandas/core/groupby/__init__.py b/pandas/core/groupby/__init__.py index 9c15a5ebfe0f2..ac35f3825e5e8 100644 --- a/pandas/core/groupby/__init__.py +++ b/pandas/core/groupby/__init__.py @@ -1,4 +1,4 @@ from pandas.core.groupby.groupby import GroupBy # noqa: F401 from pandas.core.groupby.generic import ( # noqa: F401 - SeriesGroupBy, DataFrameGroupBy, PanelGroupBy) + SeriesGroupBy, DataFrameGroupBy) from 
pandas.core.groupby.grouper import Grouper # noqa: F401 diff --git a/pandas/core/groupby/generic.py b/pandas/core/groupby/generic.py index c5142a4ee98cc..27e13e86a6e9e 100644 --- a/pandas/core/groupby/generic.py +++ b/pandas/core/groupby/generic.py @@ -1,5 +1,5 @@ """ -Define the SeriesGroupBy, DataFrameGroupBy, and PanelGroupBy +Define the SeriesGroupBy and DataFrameGroupBy classes that hold the groupby interfaces (and some implementations). These are user facing as the result of the ``df.groupby(...)`` operations, @@ -39,7 +39,6 @@ from pandas.core.index import CategoricalIndex, Index, MultiIndex import pandas.core.indexes.base as ibase from pandas.core.internals import BlockManager, make_block -from pandas.core.panel import Panel from pandas.core.series import Series from pandas.plotting._core import boxplot_frame_groupby @@ -1021,7 +1020,9 @@ def true_and_notna(x, *args, **kwargs): return filtered def nunique(self, dropna=True): - """ Returns number of unique elements in the group """ + """ + Return number of unique elements in the group. + """ ids, _, _ = self.grouper.group_info val = self.obj.get_values() @@ -1461,8 +1462,8 @@ def _reindex_output(self, result): # reindex `result`, and then reset the in-axis grouper columns. # Select in-axis groupers - in_axis_grps = [(i, ping.name) for (i, ping) - in enumerate(groupings) if ping.in_axis] + in_axis_grps = ((i, ping.name) for (i, ping) + in enumerate(groupings) if ping.in_axis) g_nums, g_names = zip(*in_axis_grps) result = result.drop(labels=list(g_names), axis=1) @@ -1584,90 +1585,3 @@ def groupby_series(obj, col=None): return results boxplot = boxplot_frame_groupby - - -class PanelGroupBy(NDFrameGroupBy): - - def aggregate(self, arg, *args, **kwargs): - return super(PanelGroupBy, self).aggregate(arg, *args, **kwargs) - - agg = aggregate - - def _iterate_slices(self): - if self.axis == 0: - # kludge - if self._selection is None: - slice_axis = self._selected_obj.items - else: - slice_axis = self._selection_list - slicer = lambda x: self._selected_obj[x] - else: - raise NotImplementedError("axis other than 0 is not supported") - - for val in slice_axis: - if val in self.exclusions: - continue - - yield val, slicer(val) - - def aggregate(self, arg, *args, **kwargs): - """ - Aggregate using input function or dict of {column -> function} - - Parameters - ---------- - arg : function or dict - Function to use for aggregating groups. If a function, must either - work when passed a Panel or when passed to Panel.apply. 
If - pass a dict, the keys must be DataFrame column names - - Returns - ------- - aggregated : Panel - """ - if isinstance(arg, compat.string_types): - return getattr(self, arg)(*args, **kwargs) - - return self._aggregate_generic(arg, *args, **kwargs) - - def _wrap_generic_output(self, result, obj): - if self.axis == 0: - new_axes = list(obj.axes) - new_axes[0] = self.grouper.result_index - elif self.axis == 1: - x, y, z = obj.axes - new_axes = [self.grouper.result_index, z, x] - else: - x, y, z = obj.axes - new_axes = [self.grouper.result_index, y, x] - - result = Panel._from_axes(result, new_axes) - - if self.axis == 1: - result = result.swapaxes(0, 1).swapaxes(0, 2) - elif self.axis == 2: - result = result.swapaxes(0, 2) - - return result - - def _aggregate_item_by_item(self, func, *args, **kwargs): - obj = self._obj_with_exclusions - result = {} - - if self.axis > 0: - for item in obj: - try: - itemg = DataFrameGroupBy(obj[item], - axis=self.axis - 1, - grouper=self.grouper) - result[item] = itemg.aggregate(func, *args, **kwargs) - except (ValueError, TypeError): - raise - new_axes = list(obj.axes) - new_axes[self.axis] = self.grouper.result_index - return Panel._from_axes(result, new_axes) - else: - raise ValueError("axis value must be greater than 0") - - def _wrap_aggregated_output(self, output, names=None): - raise AbstractMethodError(self) diff --git a/pandas/core/groupby/groupby.py b/pandas/core/groupby/groupby.py index 8766fdbc29755..c7f1aa697c2e8 100644 --- a/pandas/core/groupby/groupby.py +++ b/pandas/core/groupby/groupby.py @@ -18,7 +18,7 @@ class providing the base-class of operations. from pandas._libs import Timestamp, groupby as libgroupby import pandas.compat as compat -from pandas.compat import callable, range, set_function_name, zip +from pandas.compat import range, set_function_name, zip from pandas.compat.numpy import function as nv from pandas.errors import AbstractMethodError from pandas.util._decorators import Appender, Substitution, cache_readonly @@ -44,9 +44,9 @@ class providing the base-class of operations. _common_see_also = """ See Also -------- - pandas.Series.%(name)s - pandas.DataFrame.%(name)s - pandas.Panel.%(name)s + Series.%(name)s + DataFrame.%(name)s + Panel.%(name)s """ _apply_docs = dict( @@ -206,8 +206,8 @@ class providing the base-class of operations. See Also -------- -pandas.Series.pipe : Apply a function with arguments to a series. -pandas.DataFrame.pipe: Apply a function with arguments to a dataframe. +Series.pipe : Apply a function with arguments to a series. +DataFrame.pipe: Apply a function with arguments to a dataframe. apply : Apply function to each group instead of to the full %(klass)s object. @@ -443,12 +443,12 @@ def get_converter(s): raise ValueError(msg) converters = [get_converter(s) for s in index_sample] - names = [tuple(f(n) for f, n in zip(converters, name)) - for name in names] + names = (tuple(f(n) for f, n in zip(converters, name)) + for name in names) else: converter = get_converter(index_sample) - names = [converter(name) for name in names] + names = (converter(name) for name in names) return [self.indices.get(name, []) for name in names] @@ -625,7 +625,7 @@ def curried(x): def get_group(self, name, obj=None): """ - Constructs NDFrame from group with provided name. + Construct NDFrame from group with provided name. Parameters ---------- @@ -1047,7 +1047,7 @@ def result_to_bool(result): @Appender(_common_see_also) def any(self, skipna=True): """ - Returns True if any value in the group is truthful, else False. 
+ Return True if any value in the group is truthful, else False. Parameters ---------- @@ -1060,7 +1060,7 @@ def any(self, skipna=True): @Appender(_common_see_also) def all(self, skipna=True): """ - Returns True if all values in the group are truthful, else False. + Return True if all values in the group are truthful, else False. Parameters ---------- @@ -1351,7 +1351,7 @@ def resample(self, rule, *args, **kwargs): See Also -------- - pandas.Grouper : Specify a frequency to resample with when + Grouper : Specify a frequency to resample with when grouping by a key. DatetimeIndex.resample : Frequency conversion and resampling of time series. @@ -1813,7 +1813,7 @@ def cumcount(self, ascending=True): def rank(self, method='average', ascending=True, na_option='keep', pct=False, axis=0): """ - Provides the rank of values within each group. + Provide the rank of values within each group. Parameters ---------- @@ -2039,7 +2039,7 @@ def pct_change(self, periods=1, fill_method='pad', limit=None, freq=None, @Substitution(name='groupby', see_also=_common_see_also) def head(self, n=5): """ - Returns first n rows of each group. + Return first n rows of each group. Essentially equivalent to ``.apply(lambda x: x.head(n))``, except ignores as_index flag. @@ -2067,7 +2067,7 @@ def head(self, n=5): @Substitution(name='groupby', see_also=_common_see_also) def tail(self, n=5): """ - Returns last n rows of each group. + Return last n rows of each group. Essentially equivalent to ``.apply(lambda x: x.tail(n))``, except ignores as_index flag. diff --git a/pandas/core/groupby/grouper.py b/pandas/core/groupby/grouper.py index 633a1643f6cdd..edba9439a675e 100644 --- a/pandas/core/groupby/grouper.py +++ b/pandas/core/groupby/grouper.py @@ -8,7 +8,7 @@ import numpy as np import pandas.compat as compat -from pandas.compat import callable, zip +from pandas.compat import zip from pandas.util._decorators import cache_readonly from pandas.core.dtypes.common import ( @@ -195,9 +195,9 @@ def groups(self): return self.grouper.groups def __repr__(self): - attrs_list = ["{}={!r}".format(attr_name, getattr(self, attr_name)) + attrs_list = ("{}={!r}".format(attr_name, getattr(self, attr_name)) for attr_name in self._attributes - if getattr(self, attr_name) is not None] + if getattr(self, attr_name) is not None) attrs = ", ".join(attrs_list) cls_name = self.__class__.__name__ return "{}({})".format(cls_name, attrs) @@ -299,6 +299,7 @@ def __init__(self, index, grouper=None, obj=None, name=None, level=None, self._labels = self.grouper.codes if observed: codes = algorithms.unique1d(self.grouper.codes) + codes = codes[codes != -1] else: codes = np.arange(len(categories)) diff --git a/pandas/core/groupby/ops.py b/pandas/core/groupby/ops.py index 87f48d5a40554..78c9aa9187135 100644 --- a/pandas/core/groupby/ops.py +++ b/pandas/core/groupby/ops.py @@ -380,7 +380,7 @@ def get_func(fname): # otherwise find dtype-specific version, falling back to object for dt in [dtype_str, 'object']: f = getattr(libgroupby, "{fname}_{dtype_str}".format( - fname=fname, dtype_str=dtype_str), None) + fname=fname, dtype_str=dt), None) if f is not None: return f diff --git a/pandas/core/indexes/accessors.py b/pandas/core/indexes/accessors.py index c43469d3c3a81..602e11a08b4ed 100644 --- a/pandas/core/indexes/accessors.py +++ b/pandas/core/indexes/accessors.py @@ -140,7 +140,7 @@ def to_pydatetime(self): Returns ------- numpy.ndarray - object dtype array containing native Python datetime objects. + Object dtype array containing native Python datetime objects. 
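The `codes = codes[codes != -1]` line added to `grouper.py` above is the substance of that hunk: with `observed=True`, missing values in a categorical grouper are coded as `-1` internally, and the unique-codes array must drop them so they do not surface as a phantom category. A minimal sketch of the behaviour being protected (an illustration, not part of the patch):

```python
import pandas as pd

# NaN in a categorical is coded as -1 internally; with observed=True
# only the categories actually present ('a' and 'b') may appear.
cat = pd.Categorical(["a", "b", None, "a"], categories=["a", "b", "c"])
df = pd.DataFrame({"key": cat, "val": [1, 2, 3, 4]})
print(df.groupby("key", observed=True)["val"].sum())
# key
# a    5
# b    2
# Name: val, dtype: int64
```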
See Also -------- @@ -208,7 +208,7 @@ def to_pytimedelta(self): Returns ------- a : numpy.ndarray - 1D array containing data with `datetime.timedelta` type. + 1D array of `datetime.timedelta` objects. See Also -------- diff --git a/pandas/core/indexes/api.py b/pandas/core/indexes/api.py index 684a19c56c92f..6299fc482d0df 100644 --- a/pandas/core/indexes/api.py +++ b/pandas/core/indexes/api.py @@ -112,7 +112,7 @@ def _get_combined_index(indexes, intersect=False, sort=False): elif intersect: index = indexes[0] for other in indexes[1:]: - index = index.intersection(other, sort=sort) + index = index.intersection(other) else: index = _union_indexes(indexes, sort=sort) index = ensure_index(index) diff --git a/pandas/core/indexes/base.py b/pandas/core/indexes/base.py index 767da81c5c43a..cf813f4c3030b 100644 --- a/pandas/core/indexes/base.py +++ b/pandas/core/indexes/base.py @@ -1832,9 +1832,9 @@ def isna(self): See Also -------- - pandas.Index.notna : Boolean inverse of isna. - pandas.Index.dropna : Omit entries with missing values. - pandas.isna : Top-level isna. + Index.notna : Boolean inverse of isna. + Index.dropna : Omit entries with missing values. + isna : Top-level isna. Series.isna : Detect missing values in Series object. Examples -------- @@ -1892,7 +1892,7 @@ def notna(self): -------- Index.notnull : Alias of notna. Index.isna: Inverse of notna. - pandas.notna : Top-level notna. + notna : Top-level notna. Examples -------- @@ -2074,9 +2074,9 @@ def duplicated(self, keep='first'): See Also -------- - pandas.Series.duplicated : Equivalent method on pandas.Series. - pandas.DataFrame.duplicated : Equivalent method on pandas.DataFrame. - pandas.Index.drop_duplicates : Remove duplicate values from Index. + Series.duplicated : Equivalent method on pandas.Series. + DataFrame.duplicated : Equivalent method on pandas.DataFrame. + Index.drop_duplicates : Remove duplicate values from Index. Examples -------- @@ -2245,18 +2245,37 @@ def _get_reconciled_name_object(self, other): return self._shallow_copy(name=name) return self - def union(self, other, sort=True): + def _validate_sort_keyword(self, sort): + if sort not in [None, False]: + raise ValueError("The 'sort' keyword only takes the values of " + "None or False; {0} was passed.".format(sort)) + + def union(self, other, sort=None): """ Form the union of two Index objects. Parameters ---------- other : Index or array-like - sort : bool, default True - Sort the resulting index if possible + sort : bool or None, default None + Whether to sort the resulting Index. + + * None : Sort the result, except when + + 1. `self` and `other` are equal. + 2. `self` or `other` has length 0. + 3. Some values in `self` or `other` cannot be compared. + A RuntimeWarning is issued in this case. + + * False : do not sort the result. .. versionadded:: 0.24.0 + .. versionchanged:: 0.24.1 + + Changed the default value from ``True`` to ``None`` + (without change in behaviour).
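For reference, the keyword contract that the new `_validate_sort_keyword` helper enforces looks like this in use (a sketch assuming the patch is applied; not part of the diff):

```python
import pandas as pd

idx1 = pd.Index([3, 1, 2])
idx2 = pd.Index([5, 4])

idx1.union(idx2)              # sort=None (new default): sorted result
idx1.union(idx2, sort=False)  # operand order is preserved
idx1.union(idx2, sort=True)   # ValueError: only None or False are accepted
```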
+ Returns ------- union : Index @@ -2269,6 +2288,7 @@ def union(self, other, sort=True): >>> idx1.union(idx2) Int64Index([1, 2, 3, 4, 5, 6], dtype='int64') """ + self._validate_sort_keyword(sort) self._assert_can_do_setop(other) other = ensure_index(other) @@ -2319,7 +2339,7 @@ def union(self, other, sort=True): else: result = lvals - if sort: + if sort is None: try: result = sorting.safe_sort(result) except TypeError as e: @@ -2333,7 +2353,7 @@ def union(self, other, sort=True): def _wrap_setop_result(self, other, result): return self._constructor(result, name=get_op_result_name(self, other)) - def intersection(self, other, sort=True): + def intersection(self, other, sort=False): """ Form the intersection of two Index objects. @@ -2342,11 +2362,20 @@ def intersection(self, other, sort=True): Parameters ---------- other : Index or array-like - sort : bool, default True - Sort the resulting index if possible + sort : False or None, default False + Whether to sort the resulting index. + + * False : do not sort the result. + * None : sort the result, except when `self` and `other` are equal + or when the values cannot be compared. .. versionadded:: 0.24.0 + .. versionchanged:: 0.24.1 + + Changed the default from ``True`` to ``False``, to match + the behaviour of 0.23.4 and earlier. + Returns ------- intersection : Index @@ -2359,6 +2388,7 @@ def intersection(self, other, sort=True): >>> idx1.intersection(idx2) Int64Index([3, 4], dtype='int64') """ + self._validate_sort_keyword(sort) self._assert_can_do_setop(other) other = ensure_index(other) @@ -2398,7 +2428,7 @@ def intersection(self, other, sort=True): taken = other.take(indexer) - if sort: + if sort is None: taken = sorting.safe_sort(taken.values) if self.name != other.name: name = None @@ -2411,7 +2441,7 @@ def intersection(self, other, sort=True): return taken - def difference(self, other, sort=True): + def difference(self, other, sort=None): """ Return a new Index with elements from the index that are not in `other`. @@ -2421,11 +2451,22 @@ def difference(self, other, sort=True): Parameters ---------- other : Index or array-like - sort : bool, default True - Sort the resulting index if possible + sort : False or None, default None + Whether to sort the resulting index. By default, the + values are attempted to be sorted, but any TypeError from + incomparable elements is caught by pandas. + + * None : Attempt to sort the result, but catch any TypeErrors + from comparing incomparable elements. + * False : Do not sort the result. .. versionadded:: 0.24.0 + .. versionchanged:: 0.24.1 + + Changed the default value from ``True`` to ``None`` + (without change in behaviour). + Returns ------- difference : Index @@ -2440,6 +2481,7 @@ def difference(self, other, sort=True): >>> idx1.difference(idx2, sort=False) Int64Index([2, 1], dtype='int64') """ + self._validate_sort_keyword(sort) self._assert_can_do_setop(other) if self.equals(other): @@ -2456,7 +2498,7 @@ def difference(self, other, sort=True): label_diff = np.setdiff1d(np.arange(this.size), indexer, assume_unique=True) the_diff = this.values.take(label_diff) - if sort: + if sort is None: try: the_diff = sorting.safe_sort(the_diff) except TypeError: @@ -2464,7 +2506,7 @@ def difference(self, other, sort=True): return this._shallow_copy(the_diff, name=result_name, freq=None) - def symmetric_difference(self, other, result_name=None, sort=True): + def symmetric_difference(self, other, result_name=None, sort=None): """ Compute the symmetric difference of two Index objects. 
@@ -2472,11 +2514,22 @@ def symmetric_difference(self, other, result_name=None, sort=True): ---------- other : Index or array-like result_name : str - sort : bool, default True - Sort the resulting index if possible + sort : False or None, default None + Whether to sort the resulting index. By default, the + values are attempted to be sorted, but any TypeError from + incomparable elements is caught by pandas. + + * None : Attempt to sort the result, but catch any TypeErrors + from comparing incomparable elements. + * False : Do not sort the result. .. versionadded:: 0.24.0 + .. versionchanged:: 0.24.1 + + Changed the default value from ``True`` to ``None`` + (without change in behaviour). + Returns ------- symmetric_difference : Index @@ -2500,6 +2553,7 @@ def symmetric_difference(self, other, result_name=None, sort=True): >>> idx1 ^ idx2 Int64Index([1, 5], dtype='int64') """ + self._validate_sort_keyword(sort) self._assert_can_do_setop(other) other, result_name_update = self._convert_can_do_setop(other) if result_name is None: @@ -2520,7 +2574,7 @@ def symmetric_difference(self, other, result_name=None, sort=True): right_diff = other.values.take(right_indexer) the_diff = _concat._concat_compat([left_diff, right_diff]) - if sort: + if sort is None: try: the_diff = sorting.safe_sort(the_diff) except TypeError: @@ -3995,7 +4049,7 @@ def putmask(self, mask, value): def equals(self, other): """ - Determines if two Index objects contain the same elements. + Determine if two Index objects contain the same elements. """ if self.is_(other): return True @@ -4090,7 +4144,7 @@ def asof(self, label): def asof_locs(self, where, mask): """ - Finds the locations (indices) of the labels from the index for + Find the locations (indices) of the labels from the index for every entry in the `where` argument. As in the `asof` function, if the label (a particular entry in @@ -4150,8 +4204,8 @@ def sort_values(self, return_indexer=False, ascending=True): See Also -------- - pandas.Series.sort_values : Sort values of a Series. - pandas.DataFrame.sort_values : Sort values in a DataFrame. + Series.sort_values : Sort values of a Series. + DataFrame.sort_values : Sort values in a DataFrame. Examples -------- @@ -4205,7 +4259,7 @@ def shift(self, periods=1, freq=None): Returns ------- pandas.Index - shifted index + Shifted index See Also -------- @@ -5123,9 +5177,9 @@ def _add_logical_methods(cls): See Also -------- - pandas.Index.any : Return whether any element in an Index is True. - pandas.Series.any : Return whether any element in a Series is True. - pandas.Series.all : Return whether all elements in a Series are True. + Index.any : Return whether any element in an Index is True. + Series.any : Return whether any element in a Series is True. + Series.all : Return whether all elements in a Series are True. Notes ----- @@ -5163,8 +5217,8 @@ def _add_logical_methods(cls): See Also -------- - pandas.Index.all : Return whether all elements are True. - pandas.Series.all : Return whether all elements are True. + Index.all : Return whether all elements are True. + Series.all : Return whether all elements are True. Notes ----- diff --git a/pandas/core/indexes/category.py b/pandas/core/indexes/category.py index e43b64827d02a..c6d31339f950d 100644 --- a/pandas/core/indexes/category.py +++ b/pandas/core/indexes/category.py @@ -232,7 +232,7 @@ def _is_dtype_compat(self, other): def equals(self, other): """ - Determines if two CategorialIndex objects contain the same elements. 
+ Determine if two CategoricalIndex objects contain the same elements. """ if self.is_(other): return True @@ -780,8 +780,8 @@ def _concat_same_dtype(self, to_concat, name): Concatenate to_concat which has the same class ValueError if other is not in the categories """ - to_concat = [self._is_dtype_compat(c) for c in to_concat] - codes = np.concatenate([c.codes for c in to_concat]) + codes = np.concatenate([self._is_dtype_compat(c).codes + for c in to_concat]) result = self._create_from_codes(codes, name=name) # if name is None, _create_from_codes sets self.name result.name = name diff --git a/pandas/core/indexes/datetimes.py b/pandas/core/indexes/datetimes.py index cc373c06efcc9..df91c71cfe238 100644 --- a/pandas/core/indexes/datetimes.py +++ b/pandas/core/indexes/datetimes.py @@ -594,7 +594,7 @@ def _wrap_setop_result(self, other, result): name = get_op_result_name(self, other) return self._shallow_copy(result, name=name, freq=None, tz=self.tz) - def intersection(self, other, sort=True): + def intersection(self, other, sort=False): """ Specialized intersection for DatetimeIndex objects. May be much faster than Index.intersection @@ -602,11 +602,21 @@ def intersection(self, other, sort=True): Parameters ---------- other : DatetimeIndex or array-like + sort : False or None, default False + Sort the resulting index if possible. + + .. versionadded:: 0.24.0 + + .. versionchanged:: 0.24.1 + + Changed the default to ``False`` to match the behaviour + from before 0.24.0. Returns ------- y : Index or DatetimeIndex """ + self._validate_sort_keyword(sort) self._assert_can_do_setop(other) if self.equals(other): @@ -1274,7 +1284,7 @@ def delete(self, loc): def indexer_at_time(self, time, asof=False): """ - Returns index locations of index values at particular time of day + Return index locations of index values at particular time of day (e.g. 9:30AM). Parameters @@ -1403,10 +1413,10 @@ def date_range(start=None, end=None, periods=None, freq=None, tz=None, See Also -------- - pandas.DatetimeIndex : An immutable container for datetimes. - pandas.timedelta_range : Return a fixed frequency TimedeltaIndex. - pandas.period_range : Return a fixed frequency PeriodIndex. - pandas.interval_range : Return a fixed frequency IntervalIndex. + DatetimeIndex : An immutable container for datetimes. + timedelta_range : Return a fixed frequency TimedeltaIndex. + period_range : Return a fixed frequency PeriodIndex. + interval_range : Return a fixed frequency IntervalIndex.
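`DatetimeIndex.intersection` gets the same keyword treatment, but with `sort=False` as the default so that 0.24.1 matches pre-0.24.0 ordering. A small usage sketch (mine, not from the patch):

```python
import pandas as pd

a = pd.date_range("2019-01-01", periods=4, freq="D")
b = a[:3]

# sort=False is the default again, so the original ordering is kept;
# pass sort=None to opt in to sorting.
a.intersection(b)
# DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03'],
#               dtype='datetime64[ns]', freq='D')
```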
Notes ----- diff --git a/pandas/core/indexes/interval.py b/pandas/core/indexes/interval.py index 2a6044fb0a08b..2c63fe33c57fe 100644 --- a/pandas/core/indexes/interval.py +++ b/pandas/core/indexes/interval.py @@ -38,6 +38,7 @@ _index_doc_kwargs.update( dict(klass='IntervalIndex', + qualname="IntervalIndex", target_klass='IntervalIndex or list of Intervals', name=textwrap.dedent("""\ name : object, optional @@ -282,10 +283,10 @@ def contains(self, key): examples=""" Examples -------- - >>> idx = pd.IntervalIndex.from_arrays([0, np.nan, 2], [1, np.nan, 3]) - >>> idx.to_tuples() + >>> idx = pd.IntervalIndex.from_arrays([0, np.nan, 2], [1, np.nan, 3]) + >>> idx.to_tuples() Index([(0.0, 1.0), (nan, nan), (2.0, 3.0)], dtype='object') - >>> idx.to_tuples(na_tuple=False) + >>> idx.to_tuples(na_tuple=False) Index([(0.0, 1.0), nan, (2.0, 3.0)], dtype='object')""", )) def to_tuples(self, na_tuple=True): @@ -1092,8 +1093,8 @@ def equals(self, other): def overlaps(self, other): return self._data.overlaps(other) - def _setop(op_name): - def func(self, other, sort=True): + def _setop(op_name, sort=None): + def func(self, other, sort=sort): other = self._as_like_interval_index(other) # GH 19016: ensure set op will not return a prohibited dtype @@ -1127,7 +1128,7 @@ def is_all_dates(self): return False union = _setop('union') - intersection = _setop('intersection') + intersection = _setop('intersection', sort=False) difference = _setop('difference') symmetric_difference = _setop('symmetric_difference') @@ -1201,15 +1202,15 @@ def interval_range(start=None, end=None, periods=None, freq=None, Numeric ``start`` and ``end`` is supported. >>> pd.interval_range(start=0, end=5) - IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4], (4, 5]] + IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4], (4, 5]], closed='right', dtype='interval[int64]') Additionally, datetime-like input is also supported. >>> pd.interval_range(start=pd.Timestamp('2017-01-01'), - end=pd.Timestamp('2017-01-04')) + ... end=pd.Timestamp('2017-01-04')) IntervalIndex([(2017-01-01, 2017-01-02], (2017-01-02, 2017-01-03], - (2017-01-03, 2017-01-04]] + (2017-01-03, 2017-01-04]], closed='right', dtype='interval[datetime64[ns]]') The ``freq`` parameter specifies the frequency between the left and right. @@ -1217,23 +1218,23 @@ def interval_range(start=None, end=None, periods=None, freq=None, numeric ``start`` and ``end``, the frequency must also be numeric. >>> pd.interval_range(start=0, periods=4, freq=1.5) - IntervalIndex([(0.0, 1.5], (1.5, 3.0], (3.0, 4.5], (4.5, 6.0]] + IntervalIndex([(0.0, 1.5], (1.5, 3.0], (3.0, 4.5], (4.5, 6.0]], closed='right', dtype='interval[float64]') Similarly, for datetime-like ``start`` and ``end``, the frequency must be convertible to a DateOffset. >>> pd.interval_range(start=pd.Timestamp('2017-01-01'), - periods=3, freq='MS') + ... periods=3, freq='MS') IntervalIndex([(2017-01-01, 2017-02-01], (2017-02-01, 2017-03-01], - (2017-03-01, 2017-04-01]] + (2017-03-01, 2017-04-01]], closed='right', dtype='interval[datetime64[ns]]') Specify ``start``, ``end``, and ``periods``; the frequency is generated automatically (linearly spaced). >>> pd.interval_range(start=0, end=6, periods=4) - IntervalIndex([(0.0, 1.5], (1.5, 3.0], (3.0, 4.5], (4.5, 6.0]] + IntervalIndex([(0.0, 1.5], (1.5, 3.0], (3.0, 4.5], (4.5, 6.0]], closed='right', dtype='interval[float64]') @@ -1241,7 +1242,7 @@ def interval_range(start=None, end=None, periods=None, freq=None, intervals within the ``IntervalIndex`` are closed. 
>>> pd.interval_range(end=5, periods=4, closed='both') - IntervalIndex([[1, 2], [2, 3], [3, 4], [4, 5]] + IntervalIndex([[1, 2], [2, 3], [3, 4], [4, 5]], closed='both', dtype='interval[int64]') """ start = com.maybe_box_datetimelike(start) diff --git a/pandas/core/indexes/multi.py b/pandas/core/indexes/multi.py index e4d01a40bd181..e2237afbcac0f 100644 --- a/pandas/core/indexes/multi.py +++ b/pandas/core/indexes/multi.py @@ -1391,7 +1391,7 @@ def get_level_values(self, level): Returns ------- values : Index - ``values`` is a level of this MultiIndex converted to + Values is a level of this MultiIndex converted to a single :class:`Index` (or subclass thereof). Examples @@ -2879,30 +2879,47 @@ def equal_levels(self, other): return False return True - def union(self, other, sort=True): + def union(self, other, sort=None): """ Form the union of two MultiIndex objects Parameters ---------- other : MultiIndex or array / Index of tuples - sort : bool, default True - Sort the resulting MultiIndex if possible + sort : False or None, default None + Whether to sort the resulting Index. + + * None : Sort the result, except when + + 1. `self` and `other` are equal. + 2. `self` has length 0. + 3. Some values in `self` or `other` cannot be compared. + A RuntimeWarning is issued in this case. + + * False : do not sort the result. .. versionadded:: 0.24.0 + .. versionchanged:: 0.24.1 + + Changed the default value from ``True`` to ``None`` + (without change in behaviour). + Returns ------- Index >>> index.union(index2) """ + self._validate_sort_keyword(sort) self._assert_can_do_setop(other) other, result_names = self._convert_can_do_setop(other) if len(other) == 0 or self.equals(other): return self + # TODO: Index.union returns other when `len(self)` is 0. + uniq_tuples = lib.fast_unique_multiple([self._ndarray_values, other._ndarray_values], sort=sort) @@ -2910,22 +2927,28 @@ def union(self, other, sort=True): return MultiIndex.from_arrays(lzip(*uniq_tuples), sortorder=0, names=result_names) - def intersection(self, other, sort=True): + def intersection(self, other, sort=False): """ Form the intersection of two MultiIndex objects. Parameters ---------- other : MultiIndex or array / Index of tuples - sort : bool, default True + sort : False or None, default False Sort the resulting MultiIndex if possible .. versionadded:: 0.24.0 + .. versionchanged:: 0.24.1 + + Changed the default from ``True`` to ``False``, to match + behaviour from before 0.24.0 + Returns ------- Index """ + self._validate_sort_keyword(sort) self._assert_can_do_setop(other) other, result_names = self._convert_can_do_setop(other) @@ -2936,7 +2959,7 @@ def intersection(self, other, sort=True): other_tuples = other._ndarray_values uniq_tuples = set(self_tuples) & set(other_tuples) - if sort: + if sort is None: uniq_tuples = sorted(uniq_tuples) if len(uniq_tuples) == 0: @@ -2947,22 +2970,28 @@ def intersection(self, other, sort=True): return MultiIndex.from_arrays(lzip(*uniq_tuples), sortorder=0, names=result_names) - def difference(self, other, sort=True): + def difference(self, other, sort=None): """ Compute set difference of two MultiIndex objects Parameters ---------- other : MultiIndex - sort : bool, default True + sort : False or None, default None Sort the resulting MultiIndex if possible .. versionadded:: 0.24.0 + .. versionchanged:: 0.24.1 + + Changed the default value from ``True`` to ``None`` + (without change in behaviour). 
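The MultiIndex set operations follow the same scheme: `union` and `difference` default to `sort=None` (sort, but swallow comparison errors), while `intersection` reverts to `sort=False`. A compact sketch (not part of the patch):

```python
import pandas as pd

mi1 = pd.MultiIndex.from_tuples([("b", 2), ("a", 1)])
mi2 = pd.MultiIndex.from_tuples([("a", 1), ("c", 3)])

mi1.union(mi2)                   # sort=None: tuples come back sorted
mi1.intersection(mi2)            # sort=False is now the default
mi1.difference(mi2, sort=False)  # keeps mi1's own ordering: [('b', 2)]
```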
+ Returns ------- diff : MultiIndex """ + self._validate_sort_keyword(sort) self._assert_can_do_setop(other) other, result_names = self._convert_can_do_setop(other) @@ -2982,7 +3011,7 @@ def difference(self, other, sort=True): label_diff = np.setdiff1d(np.arange(this.size), indexer, assume_unique=True) difference = this.values.take(label_diff) - if sort: + if sort is None: difference = sorted(difference) if len(difference) == 0: diff --git a/pandas/core/indexes/range.py b/pandas/core/indexes/range.py index ebf5b279563cf..5aafe9734b6a0 100644 --- a/pandas/core/indexes/range.py +++ b/pandas/core/indexes/range.py @@ -343,22 +343,28 @@ def equals(self, other): return super(RangeIndex, self).equals(other) - def intersection(self, other, sort=True): + def intersection(self, other, sort=False): """ Form the intersection of two Index objects. Parameters ---------- other : Index or array-like - sort : bool, default True + sort : False or None, default False Sort the resulting index if possible .. versionadded:: 0.24.0 + .. versionchanged:: 0.24.1 + + Changed the default to ``False`` to match the behaviour + from before 0.24.0. + Returns ------- intersection : Index """ + self._validate_sort_keyword(sort) if self.equals(other): return self._get_reconciled_name_object(other) @@ -401,7 +407,7 @@ def intersection(self, other, sort=True): if (self._step < 0 and other._step < 0) is not (new_index._step < 0): new_index = new_index[::-1] - if sort: + if sort is None: new_index = new_index.sort_values() return new_index diff --git a/pandas/core/indexes/timedeltas.py b/pandas/core/indexes/timedeltas.py index cbe5ae198838f..830925535dab1 100644 --- a/pandas/core/indexes/timedeltas.py +++ b/pandas/core/indexes/timedeltas.py @@ -207,6 +207,11 @@ def __new__(cls, data=None, unit=None, freq=None, start=None, end=None, 'collection of some kind, {data} was passed' .format(cls=cls.__name__, data=repr(data))) + if unit in {'Y', 'y', 'M'}: + warnings.warn("M and Y units are deprecated and " + "will be removed in a future version.", + FutureWarning, stacklevel=2) + if isinstance(data, TimedeltaArray): if copy: data = data.copy() diff --git a/pandas/core/indexing.py b/pandas/core/indexing.py index bbcde8f3b3305..539da0beaefb4 100755 --- a/pandas/core/indexing.py +++ b/pandas/core/indexing.py @@ -347,10 +347,10 @@ def _setitem_with_indexer(self, indexer, value): # must have all defined axes if we have a scalar # or a list-like on the non-info axes if we have a # list-like - len_non_info_axes = [ + len_non_info_axes = ( len(_ax) for _i, _ax in enumerate(self.obj.axes) if _i != i - ] + ) if any(not l for l in len_non_info_axes): if not is_list_like_indexer(value): raise ValueError("cannot set a frame with no " diff --git a/pandas/core/internals/__init__.py b/pandas/core/internals/__init__.py index 7878613a8b1b1..a662e1d3ae197 100644 --- a/pandas/core/internals/__init__.py +++ b/pandas/core/internals/__init__.py @@ -1,6 +1,6 @@ # -*- coding: utf-8 -*- from .blocks import ( # noqa:F401 - _block2d_to_blocknd, _factor_indexer, _block_shape, # io.pytables + _block_shape, # io.pytables _safe_reshape, # io.packers make_block, # io.pytables, io.packers FloatBlock, IntBlock, ComplexBlock, BoolBlock, ObjectBlock, diff --git a/pandas/core/internals/blocks.py b/pandas/core/internals/blocks.py index df764aa4ba666..ac7d21de442db 100644 --- a/pandas/core/internals/blocks.py +++ b/pandas/core/internals/blocks.py @@ -87,7 +87,8 @@ def __init__(self, values, placement, ndim=None): '{mgr}'.format(val=len(self.values), 
mgr=len(self.mgr_locs))) def _check_ndim(self, values, ndim): - """ndim inference and validation. + """ + ndim inference and validation. Infers ndim from 'values' if not provided to __init__. Validates that values.ndim and ndim are consistent if and only if @@ -267,20 +268,6 @@ def _slice(self, slicer): """ return a slice of my values """ return self.values[slicer] - def reshape_nd(self, labels, shape, ref_items): - """ - Parameters - ---------- - labels : list of new axis labels - shape : new shape - ref_items : new ref_items - - return a new block that is transformed to a nd block - """ - return _block2d_to_blocknd(values=self.get_values().T, - placement=self.mgr_locs, shape=shape, - labels=labels, ref_items=ref_items) - def getitem_block(self, slicer, new_mgr_locs=None): """ Perform __getitem__-like, return result as block. @@ -2072,17 +2059,9 @@ def get_values(self, dtype=None): return object dtype as boxed values, such as Timestamps/Timedelta """ if is_object_dtype(dtype): - values = self.values - - if self.ndim > 1: - values = values.ravel() - - values = lib.map_infer(values, self._box_func) - - if self.ndim > 1: - values = values.reshape(self.values.shape) - - return values + values = self.values.ravel() + result = self._holder(values).astype(object) + return result.reshape(self.values.shape) return self.values @@ -3155,31 +3134,6 @@ def _merge_blocks(blocks, dtype=None, _can_consolidate=True): return blocks -def _block2d_to_blocknd(values, placement, shape, labels, ref_items): - """ pivot to the labels shape """ - panel_shape = (len(placement),) + shape - - # TODO: lexsort depth needs to be 2!! - - # Create observation selection vector using major and minor - # labels, for converting to panel format. - selector = _factor_indexer(shape[1:], labels) - mask = np.zeros(np.prod(shape), dtype=bool) - mask.put(selector, True) - - if mask.all(): - pvalues = np.empty(panel_shape, dtype=values.dtype) - else: - dtype, fill_value = maybe_promote(values.dtype) - pvalues = np.empty(panel_shape, dtype=dtype) - pvalues.fill(fill_value) - - for i in range(len(placement)): - pvalues[i].flat[mask] = values[:, i] - - return make_block(pvalues, placement=placement) - - def _safe_reshape(arr, new_shape): """ If possible, reshape `arr` to have shape `new_shape`, @@ -3202,16 +3156,6 @@ def _safe_reshape(arr, new_shape): return arr -def _factor_indexer(shape, labels): - """ - given a tuple of shape and a list of Categorical labels, return the - expanded label indexer - """ - mult = np.array(shape)[::-1].cumprod()[::-1] - return ensure_platform_int( - np.sum(np.array(labels).T * np.append(mult, [1]), axis=1).T) - - def _putmask_smart(v, m, n): """ Return a new ndarray, try to preserve dtype if possible. diff --git a/pandas/core/internals/concat.py b/pandas/core/internals/concat.py index 4a16707a376e9..640587b7f9f31 100644 --- a/pandas/core/internals/concat.py +++ b/pandas/core/internals/concat.py @@ -183,7 +183,7 @@ def get_reindexed_values(self, empty_dtype, upcasted_na): is_datetime64tz_dtype(empty_dtype)): if self.block is None: array = empty_dtype.construct_array_type() - return array(np.full(self.shape[1], fill_value), + return array(np.full(self.shape[1], fill_value.value), dtype=empty_dtype) pass elif getattr(self.block, 'is_categorical', False): @@ -335,8 +335,10 @@ def get_empty_dtype_and_na(join_units): elif 'category' in upcast_classes: return np.dtype(np.object_), np.nan elif 'datetimetz' in upcast_classes: + # GH-25014. 
We use NaT instead of iNaT, since this eventually + # ends up in DatetimeArray.take, which does not allow iNaT. dtype = upcast_classes['datetimetz'] - return dtype[0], tslibs.iNaT + return dtype[0], tslibs.NaT elif 'datetime' in upcast_classes: return np.dtype('M8[ns]'), tslibs.iNaT elif 'timedelta' in upcast_classes: diff --git a/pandas/core/internals/construction.py b/pandas/core/internals/construction.py index 7af347a141781..7e97512682720 100644 --- a/pandas/core/internals/construction.py +++ b/pandas/core/internals/construction.py @@ -93,7 +93,7 @@ def masked_rec_array_to_mgr(data, index, columns, dtype, copy): if columns is None: columns = arr_columns - mgr = arrays_to_mgr(arrays, arr_columns, index, columns) + mgr = arrays_to_mgr(arrays, arr_columns, index, columns, dtype) if copy: mgr = mgr.copy() @@ -197,18 +197,12 @@ def init_dict(data, index, columns, dtype=None): arrays.loc[missing] = [val] * missing.sum() else: - - for key in data: - if (isinstance(data[key], ABCDatetimeIndex) and - data[key].tz is not None): - # GH#24096 need copy to be deep for datetime64tz case - # TODO: See if we can avoid these copies - data[key] = data[key].copy(deep=True) - keys = com.dict_keys_to_ordered_list(data) columns = data_names = Index(keys) - arrays = [data[k] for k in keys] - + # GH#24096 need copy to be deep for datetime64tz case + # TODO: See if we can avoid these copies + arrays = [data[k] if not is_datetime64tz_dtype(data[k]) else + data[k].copy(deep=True) for k in keys] return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype) diff --git a/pandas/core/internals/managers.py b/pandas/core/internals/managers.py index 050c3d3e87fc6..38b719db1709f 100644 --- a/pandas/core/internals/managers.py +++ b/pandas/core/internals/managers.py @@ -584,10 +584,6 @@ def comp(s, regex=False): bm._consolidate_inplace() return bm - def reshape_nd(self, axes, **kwargs): - """ a 2d-nd reshape operation on a BlockManager """ - return self.apply('reshape_nd', axes=axes, **kwargs) - def is_consolidated(self): """ Return True if more than one block with the same dtype @@ -1971,16 +1967,28 @@ def items_overlap_with_suffix(left, lsuffix, right, rsuffix): raise ValueError('columns overlap but no suffix specified: ' '{rename}'.format(rename=to_rename)) - def lrenamer(x): - if x in to_rename: - return '{x}{lsuffix}'.format(x=x, lsuffix=lsuffix) - return x + def renamer(x, suffix): + """Rename the left and right indices. + + If there is overlap, and suffix is not None, add + suffix, otherwise, leave it as-is. - def rrenamer(x): - if x in to_rename: - return '{x}{rsuffix}'.format(x=x, rsuffix=rsuffix) + Parameters + ---------- + x : original column name + suffix : str or None + + Returns + ------- + x : renamed column name + """ + if x in to_rename and suffix is not None: + return '{x}{suffix}'.format(x=x, suffix=suffix) return x + lrenamer = partial(renamer, suffix=lsuffix) + rrenamer = partial(renamer, suffix=rsuffix) + return (_transform_index(left, lrenamer), _transform_index(right, rrenamer)) diff --git a/pandas/core/missing.py b/pandas/core/missing.py index 15538b8196684..cc7bdf95200d1 100644 --- a/pandas/core/missing.py +++ b/pandas/core/missing.py @@ -1,5 +1,5 @@ """ -Routines for filling missing data +Routines for filling missing data. """ from distutils.version import LooseVersion import operator @@ -116,7 +116,7 @@ def interpolate_1d(xvalues, yvalues, method='linear', limit=None, xvalues and yvalues will each be 1-d arrays of the same length. 
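Stepping back to the consolidated `renamer` helper in `managers.py` above: it is what ultimately applies merge suffixes, and treating `suffix=None` as "leave the name alone" is what enables the new `suffixes=(None, ...)` form. A hedged sketch of the user-facing effect (the `None` form assumes the 0.25.0 behaviour this patch introduces):

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2], "v": [10, 20]})
right = pd.DataFrame({"key": [1, 2], "v": [30, 40]})

# Classic behaviour: both overlapping columns get a suffix.
left.merge(right, on="key", suffixes=("_l", "_r")).columns.tolist()
# ['key', 'v_l', 'v_r']

# With this patch, None means "no suffix" for that side.
left.merge(right, on="key", suffixes=(None, "_r")).columns.tolist()
# ['key', 'v', 'v_r']
```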
Bounds_error is currently hardcoded to False since non-scipy ones don't - take it as an argumnet. + take it as an argument. """ # Treat the original, non-scipy methods first. @@ -244,9 +244,9 @@ def interpolate_1d(xvalues, yvalues, method='linear', limit=None, def _interpolate_scipy_wrapper(x, y, new_x, method, fill_value=None, bounds_error=False, order=None, **kwargs): """ - passed off to scipy.interpolate.interp1d. method is scipy's kind. + Passed off to scipy.interpolate.interp1d. method is scipy's kind. Returns an array interpolated at new_x. Add any new methods to - the list in _clean_interp_method + the list in _clean_interp_method. """ try: from scipy import interpolate @@ -314,7 +314,7 @@ def _interpolate_scipy_wrapper(x, y, new_x, method, fill_value=None, def _from_derivatives(xi, yi, x, order=None, der=0, extrapolate=False): """ - Convenience function for interpolate.BPoly.from_derivatives + Convenience function for interpolate.BPoly.from_derivatives. Construct a piecewise polynomial in the Bernstein basis, compatible with the specified values and derivatives at breakpoints. @@ -325,7 +325,7 @@ def _from_derivatives(xi, yi, x, order=None, der=0, extrapolate=False): sorted 1D array of x-coordinates yi : array_like or list of array-likes yi[i][j] is the j-th derivative known at xi[i] - orders : None or int or array_like of ints. Default: None. + order : None or int or array_like of ints. Default: None. Specifies the degree of local polynomials. If not None, some derivatives are ignored. der : int or list @@ -344,8 +344,7 @@ def _from_derivatives(xi, yi, x, order=None, der=0, extrapolate=False): Returns ------- y : scalar or array_like - The result, of length R or length M or M by R, - + The result, of length R or length M or M by R. """ import scipy from scipy import interpolate @@ -418,8 +417,9 @@ def _akima_interpolate(xi, yi, x, der=0, axis=0): def interpolate_2d(values, method='pad', axis=0, limit=None, fill_value=None, dtype=None): - """ perform an actual interpolation of values, values will be make 2-d if - needed fills inplace, returns the result + """ + Perform an actual interpolation of values; values will be made 2-d if + needed, fills inplace, returns the result. """ transf = (lambda x: x) if axis == 0 else (lambda x: x.T) @@ -533,13 +533,13 @@ def clean_reindex_fill_method(method): def fill_zeros(result, x, y, name, fill): """ - if this is a reversed op, then flip x,y + If this is a reversed op, then flip x,y - if we have an integer value (or array in y) + If we have an integer value (or array in y) and we have 0's, fill them with the fill, - return the result + return the result. - mask the nan's from x + Mask the nan's from x.
""" if fill is None or is_float_dtype(result): return result diff --git a/pandas/core/nanops.py b/pandas/core/nanops.py index cafd3a9915fa0..86c3c380636c9 100644 --- a/pandas/core/nanops.py +++ b/pandas/core/nanops.py @@ -14,7 +14,8 @@ _get_dtype, is_any_int_dtype, is_bool_dtype, is_complex, is_complex_dtype, is_datetime64_dtype, is_datetime64tz_dtype, is_datetime_or_timedelta_dtype, is_float, is_float_dtype, is_integer, is_integer_dtype, is_numeric_dtype, - is_object_dtype, is_scalar, is_timedelta64_dtype) + is_object_dtype, is_scalar, is_timedelta64_dtype, pandas_dtype) +from pandas.core.dtypes.dtypes import DatetimeTZDtype from pandas.core.dtypes.missing import isna, na_value_for_dtype, notna import pandas.core.common as com @@ -57,7 +58,7 @@ class disallow(object): def __init__(self, *dtypes): super(disallow, self).__init__() - self.dtypes = tuple(np.dtype(dtype).type for dtype in dtypes) + self.dtypes = tuple(pandas_dtype(dtype).type for dtype in dtypes) def check(self, obj): return hasattr(obj, 'dtype') and issubclass(obj.dtype.type, @@ -437,6 +438,7 @@ def nansum(values, axis=None, skipna=True, min_count=0, mask=None): return _wrap_results(the_sum, dtype) +@disallow('M8', DatetimeTZDtype) @bottleneck_switch() def nanmean(values, axis=None, skipna=True, mask=None): """ diff --git a/pandas/core/ops.py b/pandas/core/ops.py index 10cebc6f94b92..dbdabecafae3a 100644 --- a/pandas/core/ops.py +++ b/pandas/core/ops.py @@ -447,7 +447,7 @@ def _get_op_name(op, special): _op_descriptions[reverse_op]['reverse'] = key _flex_doc_SERIES = """ -{desc} of series and other, element-wise (binary operator `{op_name}`). +Return {desc} of series and other, element-wise (binary operator `{op_name}`). Equivalent to ``{equiv}``, but with support to substitute a fill_value for missing data in one of the inputs. @@ -459,14 +459,15 @@ def _get_op_name(op, special): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing - the result will be missing + the result will be missing. level : int or name Broadcast across a level, matching Index values on the - passed MultiIndex level + passed MultiIndex level. Returns ------- -result : Series +Series + The result of the operation. See Also -------- @@ -495,6 +496,27 @@ def _get_op_name(op, special): d 1.0 e NaN dtype: float64 +>>> a.subtract(b, fill_value=0) +a 0.0 +b 1.0 +c 1.0 +d -1.0 +e NaN +dtype: float64 +>>> a.multiply(b) +a 1.0 +b NaN +c NaN +d NaN +e NaN +dtype: float64 +>>> a.divide(b, fill_value=0) +a 1.0 +b inf +c inf +d 0.0 +e NaN +dtype: float64 """ _arith_doc_FRAME = """ @@ -525,7 +547,7 @@ def _get_op_name(op, special): """ _flex_doc_FRAME = """ -{desc} of dataframe and other, element-wise (binary operator `{op_name}`). +Get {desc} of dataframe and other, element-wise (binary operator `{op_name}`). Equivalent to ``{equiv}``, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, `{reverse}`. @@ -679,7 +701,7 @@ def _get_op_name(op, special): """ _flex_comp_doc_FRAME = """ -{desc} of dataframe and other, element-wise (binary operator `{op_name}`). +Get {desc} of dataframe and other, element-wise (binary operator `{op_name}`). Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison operators. @@ -825,7 +847,7 @@ def _get_op_name(op, special): """ _flex_doc_PANEL = """ -{desc} of series and other, element-wise (binary operator `{op_name}`). 
+Return {desc} of series and other, element-wise (binary operator `{op_name}`). Equivalent to ``{equiv}``. Parameters diff --git a/pandas/core/panel.py b/pandas/core/panel.py index 540192d1a592c..16bcc17a6b4ea 100644 --- a/pandas/core/panel.py +++ b/pandas/core/panel.py @@ -4,12 +4,13 @@ # pylint: disable=E1103,W0231,W0212,W0621 from __future__ import division +from collections import OrderedDict import warnings import numpy as np import pandas.compat as compat -from pandas.compat import OrderedDict, map, range, u, zip +from pandas.compat import map, range, u, zip from pandas.compat.numpy import function as nv from pandas.util._decorators import Appender, Substitution, deprecate_kwarg from pandas.util._validators import validate_axis_style_args @@ -802,7 +803,7 @@ def major_xs(self, key): Returns ------- y : DataFrame - index -> minor axis, columns -> items + Index -> minor axis, columns -> items Notes ----- @@ -826,7 +827,7 @@ def minor_xs(self, key): Returns ------- y : DataFrame - index -> major axis, columns -> items + Index -> major axis, columns -> items Notes ----- @@ -917,9 +918,7 @@ def groupby(self, function, axis='major'): ------- grouped : PanelGroupBy """ - from pandas.core.groupby import PanelGroupBy - axis = self._get_axis_number(axis) - return PanelGroupBy(self, function, axis=axis) + raise NotImplementedError("Panel is removed in pandas 0.25.0") def to_frame(self, filter_observations=True): """ @@ -999,7 +998,7 @@ def construct_index_parts(idx, major=True): def apply(self, func, axis='major', **kwargs): """ - Applies function along axis (or axes) of the Panel. + Apply function along axis (or axes) of the Panel. Parameters ---------- diff --git a/pandas/core/resample.py b/pandas/core/resample.py index 6822225273906..ff4dd7da15bd1 100644 --- a/pandas/core/resample.py +++ b/pandas/core/resample.py @@ -20,7 +20,7 @@ import pandas.core.algorithms as algos from pandas.core.generic import _shared_docs from pandas.core.groupby.base import GroupByMixin -from pandas.core.groupby.generic import PanelGroupBy, SeriesGroupBy +from pandas.core.groupby.generic import SeriesGroupBy from pandas.core.groupby.groupby import ( GroupBy, _GroupBy, _pipe_template, groupby) from pandas.core.groupby.grouper import Grouper @@ -30,14 +30,12 @@ from pandas.core.indexes.timedeltas import TimedeltaIndex, timedelta_range from pandas.tseries.frequencies import to_offset -from pandas.tseries.offsets import ( - DateOffset, Day, Nano, Tick, delta_to_nanoseconds) +from pandas.tseries.offsets import DateOffset, Day, Nano, Tick _shared_docs_kwargs = dict() class Resampler(_GroupBy): - """ Class for resampling datetimelike data, a groupby-like operation. See aggregate, transform, and apply functions on this object. @@ -85,9 +83,9 @@ def __unicode__(self): """ Provide a nice str repr of our rolling object. """ - attrs = ["{k}={v}".format(k=k, v=getattr(self.groupby, k)) + attrs = ("{k}={v}".format(k=k, v=getattr(self.groupby, k)) for k in self._attributes if - getattr(self.groupby, k, None) is not None] + getattr(self.groupby, k, None) is not None) return "{klass} [{attrs}]".format(klass=self.__class__.__name__, attrs=', '.join(attrs)) @@ -108,7 +106,7 @@ def __iter__(self): Returns ------- Generator yielding sequence of (name, subsetted object) - for each group + for each group. 
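The `__iter__` contract documented above is easiest to see in use; a short sketch (not from the patch):

```python
import pandas as pd

s = pd.Series(range(4),
              index=pd.date_range("2019-01-01", periods=4, freq="12H"))

# Iterating a Resampler yields (bin label, subsetted object) pairs.
for name, group in s.resample("D"):
    print(name, group.sum())
# 2019-01-01 00:00:00 1
# 2019-01-02 00:00:00 5
```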
See Also -------- @@ -215,9 +213,9 @@ def pipe(self, func, *args, **kwargs): _agg_see_also_doc = dedent(""" See Also -------- - pandas.DataFrame.groupby.aggregate - pandas.DataFrame.resample.transform - pandas.DataFrame.aggregate + DataFrame.groupby.aggregate + DataFrame.resample.transform + DataFrame.aggregate """) _agg_examples_doc = dedent(""" @@ -287,8 +285,8 @@ def transform(self, arg, *args, **kwargs): Parameters ---------- - func : function - To apply to each group. Should return a Series with the same index + arg : function + To apply to each group. Should return a Series with the same index. Returns ------- @@ -342,15 +340,10 @@ def _groupby_and_aggregate(self, how, grouper=None, *args, **kwargs): obj = self._selected_obj - try: - grouped = groupby(obj, by=None, grouper=grouper, axis=self.axis) - except TypeError: - - # panel grouper - grouped = PanelGroupBy(obj, grouper=grouper, axis=self.axis) + grouped = groupby(obj, by=None, grouper=grouper, axis=self.axis) try: - if isinstance(obj, ABCDataFrame) and compat.callable(how): + if isinstance(obj, ABCDataFrame) and callable(how): # Check if the function is reducing or not. result = grouped._aggregate_item_by_item(how, *args, **kwargs) else: @@ -424,7 +417,7 @@ def pad(self, limit=None): Returns ------- - an upsampled Series + An upsampled Series. See Also -------- @@ -524,9 +517,9 @@ def backfill(self, limit=None): 'backfill'. nearest : Fill NaN values with nearest neighbor starting from center. pad : Forward fill NaN values. - pandas.Series.fillna : Fill NaN values in the Series using the + Series.fillna : Fill NaN values in the Series using the specified method, which can be 'backfill'. - pandas.DataFrame.fillna : Fill NaN values in the DataFrame using the + DataFrame.fillna : Fill NaN values in the DataFrame using the specified method, which can be 'backfill'. References @@ -637,9 +630,9 @@ def fillna(self, method, limit=None): nearest : Fill NaN values in the resampled data with nearest neighbor starting from center. interpolate : Fill NaN values using interpolation. - pandas.Series.fillna : Fill NaN values in the Series using the + Series.fillna : Fill NaN values in the Series using the specified method, which can be 'bfill' and 'ffill'. - pandas.DataFrame.fillna : Fill NaN values in the DataFrame using the + DataFrame.fillna : Fill NaN values in the DataFrame using the specified method, which can be 'bfill' and 'ffill'. References @@ -1613,20 +1606,20 @@ def _get_timestamp_range_edges(first, last, offset, closed='left', base=0): A tuple of length 2, containing the adjusted pd.Timestamp objects. """ if isinstance(offset, Tick): - is_day = isinstance(offset, Day) - day_nanos = delta_to_nanoseconds(timedelta(1)) - - # #1165 and #24127 - if (is_day and not offset.nanos % day_nanos) or not is_day: - first, last = _adjust_dates_anchored(first, last, offset, - closed=closed, base=base) - if is_day and first.tz is not None: - # _adjust_dates_anchored assumes 'D' means 24H, but first/last - # might contain a DST transition (23H, 24H, or 25H). - # Ensure first/last snap to midnight. - first = first.normalize() - last = last.normalize() - return first, last + if isinstance(offset, Day): + # _adjust_dates_anchored assumes 'D' means 24H, but first/last + # might contain a DST transition (23H, 24H, or 25H). 
+ # So "pretend" the dates are naive when adjusting the endpoints + tz = first.tz + first = first.tz_localize(None) + last = last.tz_localize(None) + + first, last = _adjust_dates_anchored(first, last, offset, + closed=closed, base=base) + if isinstance(offset, Day): + first = first.tz_localize(tz) + last = last.tz_localize(tz) + return first, last else: first = first.normalize() diff --git a/pandas/core/reshape/merge.py b/pandas/core/reshape/merge.py index 0a51f2ee0dce7..ad3327e694b67 100644 --- a/pandas/core/reshape/merge.py +++ b/pandas/core/reshape/merge.py @@ -159,9 +159,15 @@ def merge_ordered(left, right, on=None, left DataFrame fill_method : {'ffill', None}, default None Interpolation method for data - suffixes : 2-length sequence (tuple, list, ...) - Suffix to apply to overlapping column names in the left and right - side, respectively + suffixes : Sequence, default is ("_x", "_y") + A length-2 sequence where each element is optionally a string + indicating the suffix to add to overlapping column names in + `left` and `right` respectively. Pass a value of `None` instead + of a string to indicate that the column name from `left` or + `right` should be left as-is, with no suffix. At least one of the + values must not be None. + + .. versionchanged:: 0.25.0 how : {'left', 'right', 'outer', 'inner'}, default 'outer' * left: use only keys from left frame (SQL: left outer join) * right: use only keys from right frame (SQL: right outer join) @@ -760,6 +766,7 @@ def _get_join_info(self): join_index = self._create_join_index(self.left.index, self.right.index, left_indexer, + right_indexer, how='right') else: join_index = self.right.index.take(right_indexer) @@ -769,6 +776,7 @@ def _get_join_info(self): join_index = self._create_join_index(self.right.index, self.left.index, right_indexer, + left_indexer, how='left') else: join_index = self.left.index.take(left_indexer) @@ -780,7 +788,8 @@ def _get_join_info(self): join_index = join_index.astype(object) return join_index, left_indexer, right_indexer - def _create_join_index(self, index, other_index, indexer, how='left'): + def _create_join_index(self, index, other_index, indexer, + other_indexer, how='left'): """ Create a join index by rearranging one index to match another @@ -806,7 +815,8 @@ def _create_join_index(self, index, other_index, indexer, how='left'): # if values missing (-1) from target index, # take from other_index instead join_list = join_index.to_numpy() - join_list[mask] = other_index.to_numpy()[mask] + other_list = other_index.take(other_indexer).to_numpy() + join_list[mask] = other_list[mask] join_index = Index(join_list, dtype=join_index.dtype, name=join_index.name) return join_index diff --git a/pandas/core/reshape/pivot.py b/pandas/core/reshape/pivot.py index c7c447d18b6b1..54f11646fc753 100644 --- a/pandas/core/reshape/pivot.py +++ b/pandas/core/reshape/pivot.py @@ -88,9 +88,9 @@ def pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', # the original values are ints # as we grouped with a NaN value # and then dropped, coercing to floats - for v in [v for v in values if v in data and v in agged]: - if (is_integer_dtype(data[v]) and - not is_integer_dtype(agged[v])): + for v in values: + if (v in data and is_integer_dtype(data[v]) and + v in agged and not is_integer_dtype(agged[v])): agged[v] = maybe_downcast_to_dtype(agged[v], data[v].dtype) table = agged diff --git a/pandas/core/reshape/tile.py b/pandas/core/reshape/tile.py index c107ed51226b0..2a654fec36a9f 100644 --- a/pandas/core/reshape/tile.py 
+++ b/pandas/core/reshape/tile.py @@ -35,7 +35,7 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3, ---------- x : array-like The input array to be binned. Must be 1-dimensional. - bins : int, sequence of scalars, or pandas.IntervalIndex + bins : int, sequence of scalars, or IntervalIndex The criteria to bin by. * int : Defines the number of equal-width bins in the range of `x`. The @@ -70,16 +70,16 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3, Returns ------- - out : pandas.Categorical, Series, or ndarray + out : Categorical, Series, or ndarray An array-like object representing the respective bin for each value of `x`. The type depends on the value of `labels`. * True (default) : returns a Series for Series `x` or a - pandas.Categorical for all other inputs. The values stored within + Categorical for all other inputs. The values stored within are Interval dtype. * sequence of scalars : returns a Series for Series `x` or a - pandas.Categorical for all other inputs. The values stored within + Categorical for all other inputs. The values stored within are whatever the type in the sequence is. * False : returns an ndarray of integers. @@ -94,16 +94,15 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3, -------- qcut : Discretize variable into equal-sized buckets based on rank or based on sample quantiles. - pandas.Categorical : Array type for storing data that come from a + Categorical : Array type for storing data that come from a fixed set of values. Series : One-dimensional array with axis labels (including time series). - pandas.IntervalIndex : Immutable Index implementing an ordered, - sliceable set. + IntervalIndex : Immutable Index implementing an ordered, sliceable set. Notes ----- Any NA values will be NA in the result. Out of bounds values will be NA in - the resulting Series or pandas.Categorical object. + the resulting Series or Categorical object. 
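As a compact illustration of the return-type rules spelled out in the `cut` docstring above (an example of mine, not part of the patch):

```python
import numpy as np
import pandas as pd

ages = np.array([2, 10, 25, 40])

pd.cut(ages, bins=[0, 18, 65])      # Categorical of Interval values
pd.cut(ages, bins=2, labels=False)  # ndarray of integer bin codes
```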
Examples -------- @@ -373,14 +372,6 @@ def _bins_to_cuts(x, bins, right=True, labels=None, return result, bins -def _trim_zeros(x): - while len(x) > 1 and x[-1] == '0': - x = x[:-1] - if len(x) > 1 and x[-1] == '.': - x = x[:-1] - return x - - def _coerce_to_type(x): """ if the passed data is of datetime/timedelta type, diff --git a/pandas/core/reshape/util.py b/pandas/core/reshape/util.py index 7f43a0e9719b8..9d4135a7f310e 100644 --- a/pandas/core/reshape/util.py +++ b/pandas/core/reshape/util.py @@ -1,7 +1,5 @@ import numpy as np -from pandas.compat import reduce - from pandas.core.dtypes.common import is_list_like from pandas.core import common as com @@ -57,14 +55,3 @@ def cartesian_product(X): return [np.tile(np.repeat(np.asarray(com.values_from_object(x)), b[i]), np.product(a[i])) for i, x in enumerate(X)] - - -def _compose2(f, g): - """Compose 2 callables""" - return lambda *args, **kwargs: f(g(*args, **kwargs)) - - -def compose(*funcs): - """Compose 2 or more callables""" - assert len(funcs) > 1, 'At least 2 callables must be passed to compose' - return reduce(_compose2, funcs) diff --git a/pandas/core/series.py b/pandas/core/series.py index 0c8e697c572e8..b2011fdcdee98 100644 --- a/pandas/core/series.py +++ b/pandas/core/series.py @@ -3,6 +3,7 @@ """ from __future__ import division +from collections import OrderedDict from textwrap import dedent import warnings @@ -10,7 +11,7 @@ from pandas._libs import iNaT, index as libindex, lib, tslibs import pandas.compat as compat -from pandas.compat import PY36, OrderedDict, StringIO, u, zip +from pandas.compat import PY36, StringIO, u, zip from pandas.compat.numpy import function as nv from pandas.util._decorators import Appender, Substitution, deprecate from pandas.util._validators import validate_bool_kwarg @@ -129,7 +130,7 @@ class Series(base.IndexOpsMixin, generic.NDFrame): sequence are used, the index will override the keys found in the dict. dtype : str, numpy.dtype, or ExtensionDtype, optional - dtype for the output Series. If not specified, this will be + Data type for the output Series. If not specified, this will be inferred from `data`. See the :ref:`user guide <basics.dtypes>` for more usages. copy : bool, default False @@ -444,7 +445,7 @@ def values(self): Returns ------- - arr : numpy.ndarray or ndarray-like + numpy.ndarray or ndarray-like See Also -------- @@ -513,6 +514,11 @@ def ravel(self, order='C'): """ Return the flattened underlying data as an ndarray. + Returns + ------- + numpy.ndarray or ndarray-like + Flattened data of the Series. + See Also -------- numpy.ndarray.ravel @@ -580,7 +586,7 @@ def nonzero(self): def put(self, *args, **kwargs): """ - Applies the `put` method to its `values` attribute if it has one. + Apply the `put` method to its `values` attribute if it has one. See Also -------- @@ -687,7 +693,7 @@ def __array__(self, dtype=None): See Also -------- - pandas.array : Create a new array from data. + array : Create a new array from data. Series.array : Zero-copy view to the array backing the Series. Series.to_numpy : Series method for similar behavior. @@ -830,7 +836,7 @@ def _ixs(self, i, axis=0): Returns ------- - value : scalar (int) or Series (slice, sequence) + scalar (int) or Series (slice, sequence) """ try: @@ -1120,7 +1126,7 @@ def repeat(self, repeats, axis=None): Returns ------- - repeated_series : Series + Series Newly created Series with repeated elements.
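A quick sketch of the `repeat` behaviour whose Returns section was just tightened (not part of the patch):

```python
import pandas as pd

s = pd.Series(["a", "b"])
s.repeat(2)       # index labels are repeated along with the values
s.repeat([1, 3])  # per-element repeat counts also work
```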
See Also @@ -1173,7 +1179,7 @@ def get_value(self, label, takeable=False): Returns ------- - value : scalar value + scalar value """ warnings.warn("get_value is deprecated and will be removed " "in a future release. Please use " @@ -1207,7 +1213,7 @@ def set_value(self, label, value, takeable=False): Returns ------- - series : Series + Series If label is contained, will be reference to calling Series, otherwise a new object """ @@ -1394,29 +1400,30 @@ def to_string(self, buf=None, na_rep='NaN', float_format=None, header=True, Parameters ---------- buf : StringIO-like, optional - buffer to write to - na_rep : string, optional - string representation of NAN to use, default 'NaN' + Buffer to write to. + na_rep : str, optional + String representation of NaN to use, default 'NaN'. float_format : one-parameter function, optional - formatter function to apply to columns' elements if they are floats - default None - header : boolean, default True - Add the Series header (index name) + Formatter function to apply to columns' elements if they are + floats, default None. + header : bool, default True + Add the Series header (index name). index : bool, optional - Add index (row) labels, default True - length : boolean, default False - Add the Series length - dtype : boolean, default False - Add the Series dtype - name : boolean, default False - Add the Series name if not None + Add index (row) labels, default True. + length : bool, default False + Add the Series length. + dtype : bool, default False + Add the Series dtype. + name : bool, default False + Add the Series name if not None. max_rows : int, optional Maximum number of rows to show before truncating. If None, show all. Returns ------- - formatted : string (if not buffer passed) + str or None + String representation of Series if ``buf=None``, otherwise None. """ formatter = fmt.SeriesFormatter(self, name=name, length=length, @@ -1456,7 +1463,7 @@ def iteritems(self): def keys(self): """ - Alias for index. + Return alias for index. """ return self.index @@ -1476,7 +1483,8 @@ def to_dict(self, into=dict): Returns ------- - value_dict : collections.Mapping + collections.Mapping + Key-value representation of Series. Examples -------- @@ -1488,7 +1496,7 @@ def to_dict(self, into=dict): OrderedDict([(0, 1), (1, 2), (2, 3), (3, 4)]) >>> dd = defaultdict(list) >>> s.to_dict(dd) - defaultdict(, {0: 1, 1: 2, 2: 3, 3: 4}) + defaultdict(, {0: 1, 1: 2, 2: 3, 3: 4}) """ # GH16122 into_c = com.standardize_mapping(into) @@ -1506,7 +1514,18 @@ def to_frame(self, name=None): Returns ------- - data_frame : DataFrame + DataFrame + DataFrame representation of Series. + + Examples + -------- + >>> s = pd.Series(["a", "b", "c"], + ... name="vals") + >>> s.to_frame() + vals + 0 a + 1 b + 2 c """ if name is None: df = self._constructor_expanddim(self) @@ -1521,12 +1540,14 @@ def to_sparse(self, kind='block', fill_value=None): Parameters ---------- - kind : {'block', 'integer'} + kind : {'block', 'integer'}, default 'block' fill_value : float, defaults to NaN (missing) + Value to use for filling NaN values. Returns ------- - sp : SparseSeries + SparseSeries + Sparse representation of the Series. """ # TODO: deprecate from pandas.core.sparse.series import SparseSeries @@ -1564,11 +1585,18 @@ def count(self, level=None): ---------- level : int or level name, default None If the axis is a MultiIndex (hierarchical), count along a - particular level, collapsing into a smaller Series + particular level, collapsing into a smaller Series. 
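A small sketch of the ``level`` behavior described above, counting non-null values per outer level of a MultiIndex:

>>> import numpy as np
>>> idx = pd.MultiIndex.from_arrays([['a', 'a', 'b'], [0, 1, 0]])
>>> s = pd.Series([1.0, np.nan, 3.0], index=idx)
>>> s.count(level=0)
a    1
b    1
dtype: int64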
Returns ------- - nobs : int or Series (if level specified) + int or Series (if level specified) + Number of non-null values in the Series. + + Examples + -------- + >>> s = pd.Series([0.0, 1.0, np.nan]) + >>> s.count() + 2 """ if level is None: return notna(com.values_from_object(self)).sum() @@ -1597,14 +1625,15 @@ def mode(self, dropna=True): Parameters ---------- - dropna : boolean, default True + dropna : bool, default True Don't consider counts of NaN/NaT. .. versionadded:: 0.24.0 Returns ------- - modes : Series (sorted) + Series + Modes of the Series in sorted order. """ # TODO: Add option for bins like value_counts() return algorithms.mode(self, dropna=dropna) @@ -1677,12 +1706,13 @@ def drop_duplicates(self, keep='first', inplace=False): - 'first' : Drop duplicates except for the first occurrence. - 'last' : Drop duplicates except for the last occurrence. - ``False`` : Drop all duplicates. - inplace : boolean, default ``False`` + inplace : bool, default ``False`` If ``True``, performs operation inplace and returns None. Returns ------- - deduplicated : Series + Series + Series with duplicates dropped. See Also -------- @@ -1759,7 +1789,9 @@ def duplicated(self, keep='first'): Returns ------- - pandas.core.series.Series + Series + Series indicating whether each value has occurred in the + preceding values. See Also -------- @@ -1823,7 +1855,7 @@ def idxmin(self, axis=0, skipna=True, *args, **kwargs): Parameters ---------- - skipna : boolean, default True + skipna : bool, default True Exclude NA/null values. If the entire Series is NA, the result will be NA. axis : int, default 0 @@ -1835,7 +1867,8 @@ def idxmin(self, axis=0, skipna=True, *args, **kwargs): Returns ------- - idxmin : Index of minimum of values. + Index + Label of the minimum value. Raises ------ @@ -1860,7 +1893,7 @@ def idxmin(self, axis=0, skipna=True, *args, **kwargs): Examples -------- >>> s = pd.Series(data=[1, None, 4, 1], - ... index=['A' ,'B' ,'C' ,'D']) + ... index=['A', 'B', 'C', 'D']) >>> s A 1.0 B NaN @@ -1892,7 +1925,7 @@ def idxmax(self, axis=0, skipna=True, *args, **kwargs): Parameters ---------- - skipna : boolean, default True + skipna : bool, default True Exclude NA/null values. If the entire Series is NA, the result will be NA. axis : int, default 0 @@ -1904,7 +1937,8 @@ def idxmax(self, axis=0, skipna=True, *args, **kwargs): Returns ------- - idxmax : Index of maximum of values. + Index + Label of the maximum value. Raises ------ @@ -1988,12 +2022,22 @@ def round(self, decimals=0, *args, **kwargs): Returns ------- - Series object + Series + Rounded values of the Series. See Also -------- - numpy.around - DataFrame.round + numpy.around : Round values of an np.array. + DataFrame.round : Round values of a DataFrame. + + Examples + -------- + >>> s = pd.Series([0.1, 1.3, 2.7]) + >>> s.round() + 0 0.0 + 1 1.0 + 2 3.0 + dtype: float64 """ nv.validate_round(args, kwargs) result = com.values_from_object(self).round(decimals) @@ -2008,7 +2052,7 @@ def quantile(self, q=0.5, interpolation='linear'): Parameters ---------- q : float or array-like, default 0.5 (50% quantile) - 0 <= q <= 1, the quantile(s) to compute + 0 <= q <= 1, the quantile(s) to compute. interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'} .. versionadded:: 0.18.0 @@ -2024,9 +2068,10 @@ def quantile(self, q=0.5, interpolation='linear'): Returns ------- - quantile : float or Series - if ``q`` is an array, a Series will be returned where the - index is ``q`` and the values are the quantiles. 
+ float or Series + If ``q`` is an array, a Series will be returned where the + index is ``q`` and the values are the quantiles, otherwise + a float will be returned. See Also -------- @@ -2072,6 +2117,7 @@ def corr(self, other, method='pearson', min_periods=None): Parameters ---------- other : Series + Series with which to compute the correlation. method : {'pearson', 'kendall', 'spearman'} or callable * pearson : standard correlation coefficient * kendall : Kendall Tau correlation coefficient @@ -2081,16 +2127,18 @@ def corr(self, other, method='pearson', min_periods=None): .. versionadded:: 0.24.0 min_periods : int, optional - Minimum number of observations needed to have a valid result + Minimum number of observations needed to have a valid result. Returns ------- - correlation : float + float + Correlation with other. Examples -------- - >>> histogram_intersection = lambda a, b: np.minimum(a, b - ... ).sum().round(decimals=1) + >>> def histogram_intersection(a, b): + ... v = np.minimum(a, b).sum().round(decimals=1) + ... return v >>> s1 = pd.Series([.2, .0, .6, .2]) >>> s2 = pd.Series([.3, .6, .0, .1]) >>> s1.corr(s2, method=histogram_intersection) @@ -2115,14 +2163,22 @@ def cov(self, other, min_periods=None): Parameters ---------- other : Series + Series with which to compute the covariance. min_periods : int, optional - Minimum number of observations needed to have a valid result + Minimum number of observations needed to have a valid result. Returns ------- - covariance : float + float + Covariance between Series and other normalized by N-1 + (unbiased estimator). - Normalized by N-1 (unbiased estimator). + Examples + -------- + >>> s1 = pd.Series([0.90010907, 0.13484424, 0.62036035]) + >>> s2 = pd.Series([0.12528585, 0.26962463, 0.51111198]) + >>> s1.cov(s2) + -0.01685762652715874 """ this, other = self.align(other, join='inner', copy=False) if len(this) == 0: @@ -2145,7 +2201,8 @@ def diff(self, periods=1): Returns ------- - diffed : Series + Series + First differences of the Series. See Also -------- @@ -2279,7 +2336,7 @@ def dot(self, other): 8 >>> s @ other 8 - >>> df = pd.DataFrame([[0 ,1], [-2, 3], [4, -5], [6, 7]]) + >>> df = pd.DataFrame([[0, 1], [-2, 3], [4, -5], [6, 7]]) >>> s.dot(df) 0 24 1 14 @@ -2348,17 +2405,19 @@ def append(self, to_append, ignore_index=False, verify_integrity=False): Parameters ---------- to_append : Series or list/tuple of Series - ignore_index : boolean, default False + Series to append with self. + ignore_index : bool, default False If True, do not use the index labels. .. versionadded:: 0.19.0 - verify_integrity : boolean, default False - If True, raise Exception on creating index with duplicates + verify_integrity : bool, default False + If True, raise Exception on creating index with duplicates. Returns ------- - appended : Series + Series + Concatenated Series. See Also -------- @@ -2376,7 +2435,7 @@ def append(self, to_append, ignore_index=False, verify_integrity=False): -------- >>> s1 = pd.Series([1, 2, 3]) >>> s2 = pd.Series([4, 5, 6]) - >>> s3 = pd.Series([4, 5, 6], index=[3,4,5]) + >>> s3 = pd.Series([4, 5, 6], index=[3, 4, 5]) >>> s1.append(s2) 0 1 1 2 @@ -2439,7 +2498,7 @@ def _binop(self, other, func, level=None, fill_value=None): Returns ------- - combined : Series + Series """ if not isinstance(other, Series): raise AssertionError('Other operand must be Series') @@ -2857,13 +2916,13 @@ def sort_index(self, axis=0, level=None, ascending=True, inplace=False, If 'first' puts NaNs at the beginning, 'last' puts NaNs at the end. 
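Before the parameter notes continue, a hedged sketch of ``na_position`` (NaN labels move as a block; the remaining labels stay sorted):

>>> s = pd.Series([1, 2, 3], index=[np.nan, 1, 2])
>>> s.sort_index(na_position='first')
NaN    1
1.0    2
2.0    3
dtype: int64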
Not implemented for MultiIndex. sort_remaining : bool, default True - If true and sorting by level and index is multilevel, sort by other + If True and sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level. Returns ------- - pandas.Series - The original Series sorted by the labels + Series + The original Series sorted by the labels. See Also -------- @@ -2987,7 +3046,7 @@ def sort_index(self, axis=0, level=None, ascending=True, inplace=False, def argsort(self, axis=0, kind='quicksort', order=None): """ - Overrides ndarray.argsort. Argsorts the value, omitting NA/null values, + Override ndarray.argsort. Argsorts the value, omitting NA/null values, and places the result in the same locations as the non-NA values. Parameters @@ -3002,7 +3061,9 @@ def argsort(self, axis=0, kind='quicksort', order=None): Returns ------- - argsorted : Series, with -1 indicated where nan values are present + Series + Positions of values within the sort order with -1 indicating + nan values. See Also -------- @@ -3220,12 +3281,13 @@ def swaplevel(self, i=-2, j=-1, copy=True): Parameters ---------- - i, j : int, string (can be mixed) + i, j : int, str (can be mixed) Level of index to be swapped. Can pass level name as string. Returns ------- - swapped : Series + Series + Series with levels swapped in MultiIndex. .. versionchanged:: 0.18.1 @@ -3265,21 +3327,23 @@ def unstack(self, level=-1, fill_value=None): Parameters ---------- - level : int, string, or list of these, default last level - Level(s) to unstack, can pass level name - fill_value : replace NaN with this value if the unstack produces - missing values + level : int, str, or list of these, default last level + Level(s) to unstack, can pass level name. + fill_value : scalar value, default None + Value to use when replacing NaN values. .. versionadded:: 0.18.0 Returns ------- - unstacked : DataFrame + DataFrame + Unstacked Series. Examples -------- >>> s = pd.Series([1, 2, 3, 4], - ... index=pd.MultiIndex.from_product([['one', 'two'], ['a', 'b']])) + ... index=pd.MultiIndex.from_product([['one', 'two'], + ... ['a', 'b']])) >>> s one a 1 b 2 @@ -3679,7 +3743,7 @@ def rename(self, index=None, **kwargs): Scalar or hashable sequence-like will alter the ``Series.name`` attribute. copy : bool, default True - Also copy underlying data + Whether to copy underlying data. inplace : bool, default False Whether to return a new Series. If True then value of copy is ignored. @@ -3689,11 +3753,12 @@ def rename(self, index=None, **kwargs): Returns ------- - renamed : Series (new object) + Series + Series with index labels or name altered. See Also -------- - Series.rename_axis + Series.rename_axis : Set the name of the axis. Examples -------- @@ -3703,7 +3768,7 @@ def rename(self, index=None, **kwargs): 1 2 2 3 dtype: int64 - >>> s.rename("my_name") # scalar, changes Series.name + >>> s.rename("my_name") # scalar, changes Series.name 0 1 1 2 2 3 @@ -3762,7 +3827,8 @@ def drop(self, labels=None, axis=0, index=None, columns=None, Returns ------- - dropped : pandas.Series + Series + Series with specified index labels removed. 
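The Raises section below covers ``KeyError``; for contrast, a sketch of the ``errors='ignore'`` option from the full ``drop`` signature (not visible in this hunk), which silently skips labels absent from the index:

>>> s = pd.Series(range(3), index=['A', 'B', 'C'])
>>> s.drop(labels=['D'], errors='ignore')
A    0
B    1
C    2
dtype: int64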
Raises ------ @@ -3778,7 +3844,7 @@ def drop(self, labels=None, axis=0, index=None, columns=None, Examples -------- - >>> s = pd.Series(data=np.arange(3), index=['A','B','C']) + >>> s = pd.Series(data=np.arange(3), index=['A', 'B', 'C']) >>> s A 0 B 1 @@ -3787,7 +3853,7 @@ def drop(self, labels=None, axis=0, index=None, columns=None, Drop labels B en C - >>> s.drop(labels=['B','C']) + >>> s.drop(labels=['B', 'C']) A 0 dtype: int64 @@ -3960,7 +4026,8 @@ def isin(self, values): Returns ------- - isin : Series (bool dtype) + Series + Series of booleans indicating if each element is in values. Raises ------ @@ -4019,7 +4086,8 @@ def between(self, left, right, inclusive=True): Returns ------- Series - Each element will be a boolean. + Series representing whether each element is between left and + right (inclusive). See Also -------- @@ -4101,27 +4169,27 @@ def from_csv(cls, path, sep=',', parse_dates=True, header=None, Parameters ---------- - path : string file path or file handle / StringIO - sep : string, default ',' - Field delimiter - parse_dates : boolean, default True - Parse dates. Different default from read_table + path : str, file path, or file handle / StringIO + sep : str, default ',' + Field delimiter. + parse_dates : bool, default True + Parse dates. Different default from read_table. header : int, default None - Row to use as header (skip prior rows) + Row to use as header (skip prior rows). index_col : int or sequence, default 0 Column to use for index. If a sequence is given, a MultiIndex - is used. Different default from read_table - encoding : string, optional - a string representing the encoding to use if the contents are - non-ascii, for python versions prior to 3 - infer_datetime_format : boolean, default False + is used. Different default from read_table. + encoding : str, optional + A string representing the encoding to use if the contents are + non-ascii, for python versions prior to 3. + infer_datetime_format : bool, default False If True and `parse_dates` is True for a column, try to infer the datetime format based on the first datetime string. If the format can be inferred, there often will be a large parsing speed-up. Returns ------- - y : Series + Series See Also -------- @@ -4322,19 +4390,21 @@ def valid(self, inplace=False, **kwargs): def to_timestamp(self, freq=None, how='start', copy=True): """ - Cast to datetimeindex of timestamps, at *beginning* of period. + Cast to DatetimeIndex of Timestamps, at *beginning* of period. Parameters ---------- - freq : string, default frequency of PeriodIndex - Desired frequency + freq : str, default frequency of PeriodIndex + Desired frequency. how : {'s', 'e', 'start', 'end'} Convention for converting period to timestamp; start of period - vs. end + vs. end. + copy : bool, default True + Whether or not to return a copy. Returns ------- - ts : Series with DatetimeIndex + Series with DatetimeIndex """ new_values = self._values if copy: @@ -4351,11 +4421,15 @@ def to_period(self, freq=None, copy=True): Parameters ---------- - freq : string, default + freq : str, default None + Frequency associated with the PeriodIndex. + copy : bool, default True + Whether or not to return a copy. Returns ------- - ts : Series with PeriodIndex + Series + Series with index converted to PeriodIndex. 
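A short sketch of the conversion documented above: the DatetimeIndex becomes a PeriodIndex at the requested frequency.

>>> s = pd.Series([1, 2],
...               index=pd.to_datetime(['2019-01-31', '2019-02-28']))
>>> s.to_period(freq='M').index
PeriodIndex(['2019-01', '2019-02'], dtype='period[M]', freq='M')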
""" new_values = self._values if copy: diff --git a/pandas/core/sparse/frame.py b/pandas/core/sparse/frame.py index 586193fe11850..e0af11d13774c 100644 --- a/pandas/core/sparse/frame.py +++ b/pandas/core/sparse/frame.py @@ -194,7 +194,9 @@ def sp_maker(x): return to_manager(sdict, columns, index) def _init_matrix(self, data, index, columns, dtype=None): - """ Init self from ndarray or list of lists """ + """ + Init self from ndarray or list of lists. + """ data = prep_ndarray(data, copy=False) index, columns = self._prep_index(data, index, columns) data = {idx: data[:, i] for i, idx in enumerate(columns)} @@ -202,7 +204,9 @@ def _init_matrix(self, data, index, columns, dtype=None): def _init_spmatrix(self, data, index, columns, dtype=None, fill_value=None): - """ Init self from scipy.sparse matrix """ + """ + Init self from scipy.sparse matrix. + """ index, columns = self._prep_index(data, index, columns) data = data.tocoo() N = len(index) @@ -302,7 +306,9 @@ def __getstate__(self): _default_kind=self._default_kind) def _unpickle_sparse_frame_compat(self, state): - """ original pickle format """ + """ + Original pickle format + """ series, cols, idx, fv, kind = state if not isinstance(cols, Index): # pragma: no cover @@ -338,7 +344,9 @@ def to_dense(self): return DataFrame(data, index=self.index, columns=self.columns) def _apply_columns(self, func): - """ get new SparseDataFrame applying func to each columns """ + """ + Get new SparseDataFrame applying func to each columns + """ new_data = {col: func(series) for col, series in compat.iteritems(self)} diff --git a/pandas/core/sparse/scipy_sparse.py b/pandas/core/sparse/scipy_sparse.py index 2d0ce2d5e5951..5a39a1529a33a 100644 --- a/pandas/core/sparse/scipy_sparse.py +++ b/pandas/core/sparse/scipy_sparse.py @@ -3,7 +3,9 @@ Currently only includes SparseSeries.to_coo helpers. """ -from pandas.compat import OrderedDict, lmap +from collections import OrderedDict + +from pandas.compat import lmap from pandas.core.index import Index, MultiIndex from pandas.core.series import Series @@ -90,7 +92,8 @@ def _get_index_subset_to_coord_dict(index, subset, sort_labels=False): def _sparse_series_to_coo(ss, row_levels=(0, ), column_levels=(1, ), sort_labels=False): - """ Convert a SparseSeries to a scipy.sparse.coo_matrix using index + """ + Convert a SparseSeries to a scipy.sparse.coo_matrix using index levels row_levels, column_levels as the row and column labels respectively. Returns the sparse_matrix, row and column labels. """ @@ -116,7 +119,8 @@ def _sparse_series_to_coo(ss, row_levels=(0, ), column_levels=(1, ), def _coo_to_sparse_series(A, dense_index=False): - """ Convert a scipy.sparse.coo_matrix to a SparseSeries. + """ + Convert a scipy.sparse.coo_matrix to a SparseSeries. Use the defaults given in the SparseSeries constructor. """ s = Series(A.data, MultiIndex.from_arrays((A.row, A.col))) diff --git a/pandas/core/strings.py b/pandas/core/strings.py index ca79dcd9408d8..183a91c952140 100644 --- a/pandas/core/strings.py +++ b/pandas/core/strings.py @@ -1872,7 +1872,7 @@ def _wrap_result(self, result, use_codes=True, if expand is None: # infer from ndim if expand is not specified - expand = False if result.ndim == 1 else True + expand = result.ndim != 1 elif expand is True and not isinstance(self._orig, Index): # required when expand=True is explicitly specified @@ -2869,7 +2869,7 @@ def rindex(self, sub, start=0, end=None): return self._wrap_result(result) _shared_docs['len'] = (""" - Computes the length of each element in the Series/Index. 
The element may be + Compute the length of each element in the Series/Index. The element may be a sequence (such as a string, tuple or list) or a collection (such as a dictionary). diff --git a/pandas/core/tools/datetimes.py b/pandas/core/tools/datetimes.py index e6478da400d76..3da349c570274 100644 --- a/pandas/core/tools/datetimes.py +++ b/pandas/core/tools/datetimes.py @@ -497,8 +497,8 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, See Also -------- - pandas.DataFrame.astype : Cast argument to a specified dtype. - pandas.to_timedelta : Convert argument to timedelta. + DataFrame.astype : Cast argument to a specified dtype. + to_timedelta : Convert argument to timedelta. Examples -------- diff --git a/pandas/core/tools/numeric.py b/pandas/core/tools/numeric.py index 803723dab46ff..b8a7eb5b0c570 100644 --- a/pandas/core/tools/numeric.py +++ b/pandas/core/tools/numeric.py @@ -19,9 +19,17 @@ def to_numeric(arg, errors='raise', downcast=None): depending on the data supplied. Use the `downcast` parameter to obtain other dtypes. + Please note that precision loss may occur if really large numbers + are passed in. Due to the internal limitations of `ndarray`, if + numbers smaller than `-9223372036854775808` (np.iinfo(np.int64).min) + or larger than `18446744073709551615` (np.iinfo(np.uint64).max) are + passed in, it is very likely they will be converted to float so that + they can stored in an `ndarray`. These warnings apply similarly to + `Series` since it internally leverages `ndarray`. + Parameters ---------- - arg : list, tuple, 1-d array, or Series + arg : scalar, list, tuple, 1-d array, or Series errors : {'ignore', 'raise', 'coerce'}, default 'raise' - If 'raise', then invalid parsing will raise an exception - If 'coerce', then invalid parsing will be set as NaN @@ -55,9 +63,9 @@ def to_numeric(arg, errors='raise', downcast=None): See Also -------- - pandas.DataFrame.astype : Cast argument to a specified dtype. - pandas.to_datetime : Convert argument to datetime. - pandas.to_timedelta : Convert argument to timedelta. + DataFrame.astype : Cast argument to a specified dtype. + to_datetime : Convert argument to datetime. + to_timedelta : Convert argument to timedelta. numpy.ndarray.astype : Cast a numpy array to a specified type. 
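The precision-loss note added above, made concrete (a sketch; dtypes assume a 64-bit platform):

>>> vals = pd.to_numeric([-1, 2**63 + 1])  # no integer dtype fits both values
>>> vals.dtype
dtype('float64')
>>> int(vals[1]) == 2**63 + 1  # the low-order unit was lost in the float cast
False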
Examples @@ -130,7 +138,7 @@ def to_numeric(arg, errors='raise', downcast=None): values = values.astype(np.int64) else: values = ensure_object(values) - coerce_numeric = False if errors in ('ignore', 'raise') else True + coerce_numeric = errors not in ('ignore', 'raise') values = lib.maybe_convert_numeric(values, set(), coerce_numeric=coerce_numeric) diff --git a/pandas/core/tools/timedeltas.py b/pandas/core/tools/timedeltas.py index e3428146b91d8..30cb15f311b9f 100644 --- a/pandas/core/tools/timedeltas.py +++ b/pandas/core/tools/timedeltas.py @@ -2,6 +2,8 @@ timedelta support tools """ +import warnings + import numpy as np from pandas._libs.tslibs.timedeltas import Timedelta, parse_timedelta_unit @@ -90,6 +92,11 @@ def to_timedelta(arg, unit='ns', box=True, errors='raise'): raise ValueError("errors must be one of 'ignore', " "'raise', or 'coerce'}") + if unit in {'Y', 'y', 'M'}: + warnings.warn("M and Y units are deprecated and " + "will be removed in a future version.", + FutureWarning, stacklevel=2) + if arg is None: return arg elif isinstance(arg, ABCSeries): @@ -120,7 +127,8 @@ def _coerce_scalar_to_timedelta_type(r, unit='ns', box=True, errors='raise'): try: result = Timedelta(r, unit) if not box: - result = result.asm8 + # explicitly view as timedelta64 for case when result is pd.NaT + result = result.asm8.view('timedelta64[ns]') except ValueError: if errors == 'raise': raise diff --git a/pandas/core/window.py b/pandas/core/window.py index 5a9157b43ecd6..fb37d790f950c 100644 --- a/pandas/core/window.py +++ b/pandas/core/window.py @@ -164,9 +164,9 @@ def __unicode__(self): Provide a nice str repr of our rolling object. """ - attrs = ["{k}={v}".format(k=k, v=getattr(self, k)) + attrs = ("{k}={v}".format(k=k, v=getattr(self, k)) for k in self._attributes - if getattr(self, k, None) is not None] + if getattr(self, k, None) is not None) return "{klass} [{attrs}]".format(klass=self._window_type, attrs=','.join(attrs)) @@ -438,7 +438,7 @@ def aggregate(self, arg, *args, **kwargs): class Window(_Window): """ - Provides rolling window calculations. + Provide rolling window calculations. .. versionadded:: 0.18.0 @@ -900,9 +900,9 @@ class _Rolling_and_Expanding(_Rolling): See Also -------- - pandas.Series.%(name)s : Calling object with Series data. - pandas.DataFrame.%(name)s : Calling object with DataFrames. - pandas.DataFrame.count : Count of the full DataFrame. + Series.%(name)s : Calling object with Series data. + DataFrame.%(name)s : Calling object with DataFrames. + DataFrame.count : Count of the full DataFrame. Examples -------- @@ -1322,9 +1322,9 @@ def kurt(self, **kwargs): See Also -------- - pandas.Series.quantile : Computes value at the given quantile over all data + Series.quantile : Computes value at the given quantile over all data in Series. - pandas.DataFrame.quantile : Computes values at the given quantile over + DataFrame.quantile : Computes values at the given quantile over requested axis in DataFrame. Examples @@ -1626,8 +1626,8 @@ def _validate_freq(self): _agg_see_also_doc = dedent(""" See Also -------- - pandas.Series.rolling - pandas.DataFrame.rolling + Series.rolling + DataFrame.rolling """) _agg_examples_doc = dedent(""" @@ -1803,7 +1803,7 @@ def corr(self, other=None, pairwise=None, **kwargs): class RollingGroupby(_GroupByMixin, Rolling): """ - Provides a rolling groupby implementation. + Provide a rolling groupby implementation. .. 
versionadded:: 0.18.1 @@ -1834,7 +1834,7 @@ def _validate_monotonic(self): class Expanding(_Rolling_and_Expanding): """ - Provides expanding transformations. + Provide expanding transformations. .. versionadded:: 0.18.0 @@ -1916,9 +1916,9 @@ def _get_window(self, other=None): _agg_see_also_doc = dedent(""" See Also -------- - pandas.DataFrame.expanding.aggregate - pandas.DataFrame.rolling.aggregate - pandas.DataFrame.aggregate + DataFrame.expanding.aggregate + DataFrame.rolling.aggregate + DataFrame.aggregate """) _agg_examples_doc = dedent(""" @@ -2076,7 +2076,7 @@ def corr(self, other=None, pairwise=None, **kwargs): class ExpandingGroupby(_GroupByMixin, Expanding): """ - Provides a expanding groupby implementation. + Provide a expanding groupby implementation. .. versionadded:: 0.18.1 @@ -2117,7 +2117,7 @@ def _constructor(self): class EWM(_Rolling): r""" - Provides exponential weighted functions. + Provide exponential weighted functions. .. versionadded:: 0.18.0 @@ -2125,16 +2125,17 @@ class EWM(_Rolling): ---------- com : float, optional Specify decay in terms of center of mass, - :math:`\alpha = 1 / (1 + com),\text{ for } com \geq 0` + :math:`\alpha = 1 / (1 + com),\text{ for } com \geq 0`. span : float, optional Specify decay in terms of span, - :math:`\alpha = 2 / (span + 1),\text{ for } span \geq 1` + :math:`\alpha = 2 / (span + 1),\text{ for } span \geq 1`. halflife : float, optional Specify decay in terms of half-life, - :math:`\alpha = 1 - exp(log(0.5) / halflife),\text{ for } halflife > 0` + :math:`\alpha = 1 - exp(log(0.5) / halflife),\text{ for } + halflife > 0`. alpha : float, optional Specify smoothing factor :math:`\alpha` directly, - :math:`0 < \alpha \leq 1` + :math:`0 < \alpha \leq 1`. .. versionadded:: 0.18.0 @@ -2143,14 +2144,19 @@ class EWM(_Rolling): (otherwise result is NA). adjust : bool, default True Divide by decaying adjustment factor in beginning periods to account - for imbalance in relative weightings (viewing EWMA as a moving average) + for imbalance in relative weightings + (viewing EWMA as a moving average). ignore_na : bool, default False Ignore missing values when calculating weights; - specify True to reproduce pre-0.15.0 behavior + specify True to reproduce pre-0.15.0 behavior. + axis : {0 or 'index', 1 or 'columns'}, default 0 + The axis to use. The value 0 identifies the rows, and 1 + identifies the columns. Returns ------- - a Window sub-classed for the particular operation + DataFrame + A Window sub-classed for the particular operation. See Also -------- @@ -2188,6 +2194,7 @@ class EWM(_Rolling): -------- >>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]}) + >>> df B 0 0.0 1 1.0 diff --git a/pandas/errors/__init__.py b/pandas/errors/__init__.py index eb6a4674a7497..c57d27ff03ac6 100644 --- a/pandas/errors/__init__.py +++ b/pandas/errors/__init__.py @@ -45,8 +45,8 @@ class DtypeWarning(Warning): See Also -------- - pandas.read_csv : Read CSV (comma-separated) file into a DataFrame. - pandas.read_table : Read general delimited file into a DataFrame. + read_csv : Read CSV (comma-separated) file into a DataFrame. + read_table : Read general delimited file into a DataFrame. 
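For context on when this warning fires, a sketch adapted from the warning's own examples (the mixed column is read in chunks and ends up with dtype object):

>>> df = pd.DataFrame({'a': (['1'] * 100000 + ['X'] * 100000 + ['1'] * 100000),
...                    'b': ['b'] * 300000})
>>> df.to_csv('test.csv', index=False)
>>> df2 = pd.read_csv('test.csv')
... # DtypeWarning: Columns (0) have mixed types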
Notes ----- diff --git a/pandas/io/clipboard/windows.py b/pandas/io/clipboard/windows.py index 3d979a61b5f2d..4f5275af693b7 100644 --- a/pandas/io/clipboard/windows.py +++ b/pandas/io/clipboard/windows.py @@ -29,6 +29,7 @@ def init_windows_clipboard(): HINSTANCE, HMENU, BOOL, UINT, HANDLE) windll = ctypes.windll + msvcrt = ctypes.CDLL('msvcrt') safeCreateWindowExA = CheckedCall(windll.user32.CreateWindowExA) safeCreateWindowExA.argtypes = [DWORD, LPCSTR, LPCSTR, DWORD, INT, INT, @@ -71,6 +72,10 @@ def init_windows_clipboard(): safeGlobalUnlock.argtypes = [HGLOBAL] safeGlobalUnlock.restype = BOOL + wcslen = CheckedCall(msvcrt.wcslen) + wcslen.argtypes = [c_wchar_p] + wcslen.restype = UINT + GMEM_MOVEABLE = 0x0002 CF_UNICODETEXT = 13 @@ -129,13 +134,13 @@ def copy_windows(text): # If the hMem parameter identifies a memory object, # the object must have been allocated using the # function with the GMEM_MOVEABLE flag. - count = len(text) + 1 + count = wcslen(text) + 1 handle = safeGlobalAlloc(GMEM_MOVEABLE, count * sizeof(c_wchar)) locked_handle = safeGlobalLock(handle) - ctypes.memmove(c_wchar_p(locked_handle), - c_wchar_p(text), count * sizeof(c_wchar)) + ctypes.memmove(c_wchar_p(locked_handle), c_wchar_p(text), + count * sizeof(c_wchar)) safeGlobalUnlock(handle) safeSetClipboardData(CF_UNICODETEXT, handle) diff --git a/pandas/io/excel.py b/pandas/io/excel.py index 3a7c39ec65309..9e5e9f4f0d4f6 100644 --- a/pandas/io/excel.py +++ b/pandas/io/excel.py @@ -5,6 +5,7 @@ # --------------------------------------------------------------------- # ExcelFile class import abc +from collections import OrderedDict from datetime import date, datetime, time, timedelta from distutils.version import LooseVersion from io import UnsupportedOperation @@ -17,7 +18,7 @@ import pandas._libs.json as json import pandas.compat as compat from pandas.compat import ( - OrderedDict, add_metaclass, lrange, map, range, string_types, u, zip) + add_metaclass, lrange, map, range, string_types, u, zip) from pandas.errors import EmptyDataError from pandas.util._decorators import Appender, deprecate_kwarg @@ -274,7 +275,7 @@ def register_writer(klass): """Adds engine to the excel writer registry. You must use this method to integrate with ``to_excel``. Also adds config options for any new ``supported_extensions`` defined on the writer.""" - if not compat.callable(klass): + if not callable(klass): raise ValueError("Can only register callables as engines") engine_name = klass.engine _writers[engine_name] = klass @@ -375,60 +376,25 @@ def read_excel(io, **kwds) -class _XlrdReader(object): - - def __init__(self, filepath_or_buffer): - """Reader using xlrd engine. - - Parameters - ---------- - filepath_or_buffer : string, path object or Workbook - Object to be parsed. - """ - err_msg = "Install xlrd >= 1.0.0 for Excel support" - - try: - import xlrd - except ImportError: - raise ImportError(err_msg) - else: - if xlrd.__VERSION__ < LooseVersion("1.0.0"): - raise ImportError(err_msg + - ". 
Current version " + xlrd.__VERSION__) +@add_metaclass(abc.ABCMeta) +class _BaseExcelReader(object): - # If filepath_or_buffer is a url, want to keep the data as bytes so - # can't pass to get_filepath_or_buffer() - if _is_url(filepath_or_buffer): - filepath_or_buffer = _urlopen(filepath_or_buffer) - elif not isinstance(filepath_or_buffer, (ExcelFile, xlrd.Book)): - filepath_or_buffer, _, _, _ = get_filepath_or_buffer( - filepath_or_buffer) + @property + @abc.abstractmethod + def sheet_names(self): + pass - if isinstance(filepath_or_buffer, xlrd.Book): - self.book = filepath_or_buffer - elif not isinstance(filepath_or_buffer, xlrd.Book) and hasattr( - filepath_or_buffer, "read"): - # N.B. xlrd.Book has a read attribute too - if hasattr(filepath_or_buffer, 'seek'): - try: - # GH 19779 - filepath_or_buffer.seek(0) - except UnsupportedOperation: - # HTTPResponse does not support seek() - # GH 20434 - pass + @abc.abstractmethod + def get_sheet_by_name(self, name): + pass - data = filepath_or_buffer.read() - self.book = xlrd.open_workbook(file_contents=data) - elif isinstance(filepath_or_buffer, compat.string_types): - self.book = xlrd.open_workbook(filepath_or_buffer) - else: - raise ValueError('Must explicitly set engine if not passing in' - ' buffer or path for io.') + @abc.abstractmethod + def get_sheet_by_index(self, index): + pass - @property - def sheet_names(self): - return self.book.sheet_names() + @abc.abstractmethod + def get_sheet_data(self, sheet, convert_float): + pass def parse(self, sheet_name=0, @@ -455,48 +421,6 @@ def parse(self, _validate_header_arg(header) - from xlrd import (xldate, XL_CELL_DATE, - XL_CELL_ERROR, XL_CELL_BOOLEAN, - XL_CELL_NUMBER) - - epoch1904 = self.book.datemode - - def _parse_cell(cell_contents, cell_typ): - """converts the contents of the cell into a pandas - appropriate object""" - - if cell_typ == XL_CELL_DATE: - - # Use the newer xlrd datetime handling. - try: - cell_contents = xldate.xldate_as_datetime( - cell_contents, epoch1904) - except OverflowError: - return cell_contents - - # Excel doesn't distinguish between dates and time, - # so we treat dates on the epoch as times only. - # Also, Excel supports 1900 and 1904 epochs. - year = (cell_contents.timetuple())[0:3] - if ((not epoch1904 and year == (1899, 12, 31)) or - (epoch1904 and year == (1904, 1, 1))): - cell_contents = time(cell_contents.hour, - cell_contents.minute, - cell_contents.second, - cell_contents.microsecond) - - elif cell_typ == XL_CELL_ERROR: - cell_contents = np.nan - elif cell_typ == XL_CELL_BOOLEAN: - cell_contents = bool(cell_contents) - elif convert_float and cell_typ == XL_CELL_NUMBER: - # GH5394 - Excel 'numbers' are always floats - # it's a minimal perf hit and less surprising - val = int(cell_contents) - if val == cell_contents: - cell_contents = val - return cell_contents - ret_dict = False # Keep sheetname to maintain backwards compatibility. 
@@ -504,7 +428,7 @@ def _parse_cell(cell_contents, cell_typ): sheets = sheet_name ret_dict = True elif sheet_name is None: - sheets = self.book.sheet_names() + sheets = self.sheet_names ret_dict = True else: sheets = [sheet_name] @@ -519,19 +443,13 @@ def _parse_cell(cell_contents, cell_typ): print("Reading sheet {sheet}".format(sheet=asheetname)) if isinstance(asheetname, compat.string_types): - sheet = self.book.sheet_by_name(asheetname) + sheet = self.get_sheet_by_name(asheetname) else: # assume an integer if not a string - sheet = self.book.sheet_by_index(asheetname) + sheet = self.get_sheet_by_index(asheetname) - data = [] + data = self.get_sheet_data(sheet, convert_float) usecols = _maybe_convert_usecols(usecols) - for i in range(sheet.nrows): - row = [_parse_cell(value, typ) - for value, typ in zip(sheet.row_values(i), - sheet.row_types(i))] - data.append(row) - if sheet.nrows == 0: output[asheetname] = DataFrame() continue @@ -620,6 +538,120 @@ def _parse_cell(cell_contents, cell_typ): return output[asheetname] +class _XlrdReader(_BaseExcelReader): + + def __init__(self, filepath_or_buffer): + """Reader using xlrd engine. + + Parameters + ---------- + filepath_or_buffer : string, path object or Workbook + Object to be parsed. + """ + err_msg = "Install xlrd >= 1.0.0 for Excel support" + + try: + import xlrd + except ImportError: + raise ImportError(err_msg) + else: + if xlrd.__VERSION__ < LooseVersion("1.0.0"): + raise ImportError(err_msg + + ". Current version " + xlrd.__VERSION__) + + # If filepath_or_buffer is a url, want to keep the data as bytes so + # can't pass to get_filepath_or_buffer() + if _is_url(filepath_or_buffer): + filepath_or_buffer = _urlopen(filepath_or_buffer) + elif not isinstance(filepath_or_buffer, (ExcelFile, xlrd.Book)): + filepath_or_buffer, _, _, _ = get_filepath_or_buffer( + filepath_or_buffer) + + if isinstance(filepath_or_buffer, xlrd.Book): + self.book = filepath_or_buffer + elif hasattr(filepath_or_buffer, "read"): + # N.B. xlrd.Book has a read attribute too + if hasattr(filepath_or_buffer, 'seek'): + try: + # GH 19779 + filepath_or_buffer.seek(0) + except UnsupportedOperation: + # HTTPResponse does not support seek() + # GH 20434 + pass + + data = filepath_or_buffer.read() + self.book = xlrd.open_workbook(file_contents=data) + elif isinstance(filepath_or_buffer, compat.string_types): + self.book = xlrd.open_workbook(filepath_or_buffer) + else: + raise ValueError('Must explicitly set engine if not passing in' + ' buffer or path for io.') + + @property + def sheet_names(self): + return self.book.sheet_names() + + def get_sheet_by_name(self, name): + return self.book.sheet_by_name(name) + + def get_sheet_by_index(self, index): + return self.book.sheet_by_index(index) + + def get_sheet_data(self, sheet, convert_float): + from xlrd import (xldate, XL_CELL_DATE, + XL_CELL_ERROR, XL_CELL_BOOLEAN, + XL_CELL_NUMBER) + + epoch1904 = self.book.datemode + + def _parse_cell(cell_contents, cell_typ): + """converts the contents of the cell into a pandas + appropriate object""" + + if cell_typ == XL_CELL_DATE: + + # Use the newer xlrd datetime handling. + try: + cell_contents = xldate.xldate_as_datetime( + cell_contents, epoch1904) + except OverflowError: + return cell_contents + + # Excel doesn't distinguish between dates and time, + # so we treat dates on the epoch as times only. + # Also, Excel supports 1900 and 1904 epochs. 
+ year = (cell_contents.timetuple())[0:3] + if ((not epoch1904 and year == (1899, 12, 31)) or + (epoch1904 and year == (1904, 1, 1))): + cell_contents = time(cell_contents.hour, + cell_contents.minute, + cell_contents.second, + cell_contents.microsecond) + + elif cell_typ == XL_CELL_ERROR: + cell_contents = np.nan + elif cell_typ == XL_CELL_BOOLEAN: + cell_contents = bool(cell_contents) + elif convert_float and cell_typ == XL_CELL_NUMBER: + # GH5394 - Excel 'numbers' are always floats + # it's a minimal perf hit and less surprising + val = int(cell_contents) + if val == cell_contents: + cell_contents = val + return cell_contents + + data = [] + + for i in range(sheet.nrows): + row = [_parse_cell(value, typ) + for value, typ in zip(sheet.row_values(i), + sheet.row_types(i))] + data.append(row) + + return data + + class ExcelFile(object): """ Class for parsing tabular excel sheets into DataFrame objects. @@ -971,7 +1003,7 @@ class ExcelWriter(object): mode : {'w' or 'a'}, default 'w' File mode to use (write or append). - .. versionadded:: 0.24.0 + .. versionadded:: 0.24.0 Attributes ---------- diff --git a/pandas/io/formats/console.py b/pandas/io/formats/console.py index d5ef9f61bc132..ad63b3efdd832 100644 --- a/pandas/io/formats/console.py +++ b/pandas/io/formats/console.py @@ -108,44 +108,6 @@ def check_main(): return check_main() -def in_qtconsole(): - """ - check if we're inside an IPython qtconsole - - .. deprecated:: 0.14.1 - This is no longer needed, or working, in IPython 3 and above. - """ - try: - ip = get_ipython() # noqa - front_end = ( - ip.config.get('KernelApp', {}).get('parent_appname', "") or - ip.config.get('IPKernelApp', {}).get('parent_appname', "")) - if 'qtconsole' in front_end.lower(): - return True - except NameError: - return False - return False - - -def in_ipnb(): - """ - check if we're inside an IPython Notebook - - .. deprecated:: 0.14.1 - This is no longer needed, or working, in IPython 3 and above. 
- """ - try: - ip = get_ipython() # noqa - front_end = ( - ip.config.get('KernelApp', {}).get('parent_appname', "") or - ip.config.get('IPKernelApp', {}).get('parent_appname', "")) - if 'notebook' in front_end.lower(): - return True - except NameError: - return False - return False - - def in_ipython_frontend(): """ check if we're inside an an IPython zmq frontend diff --git a/pandas/io/formats/format.py b/pandas/io/formats/format.py index bdeed58d856cc..f68ef2cc39006 100644 --- a/pandas/io/formats/format.py +++ b/pandas/io/formats/format.py @@ -435,9 +435,6 @@ def _chk_truncate(self): """ from pandas.core.reshape.concat import concat - # Column of which first element is used to determine width of a dot col - self.tr_size_col = -1 - # Cut the data to the information actually printed max_cols = self.max_cols max_rows = self.max_rows @@ -556,10 +553,7 @@ def _to_str_columns(self): if truncate_h: col_num = self.tr_col_num - # infer from column header - col_width = self.adj.len(strcols[self.tr_size_col][0]) - strcols.insert(self.tr_col_num + 1, ['...'.center(col_width)] * - (len(str_index))) + strcols.insert(self.tr_col_num + 1, [' ...'] * (len(str_index))) if truncate_v: n_header_rows = len(str_index) - len(frame) row_num = self.tr_row_num @@ -577,8 +571,8 @@ def _to_str_columns(self): if ix == 0: dot_mode = 'left' elif is_dot_col: - cwidth = self.adj.len(strcols[self.tr_size_col][0]) - dot_mode = 'center' + cwidth = 4 + dot_mode = 'right' else: dot_mode = 'right' dot_str = self.adj.justify([my_str], cwidth, mode=dot_mode)[0] @@ -1066,19 +1060,26 @@ def get_result_as_array(self): def format_values_with(float_format): formatter = self._value_formatter(float_format, threshold) + # default formatter leaves a space to the left when formatting + # floats, must be consistent for left-justifying NaNs (GH #25061) + if self.justify == 'left': + na_rep = ' ' + self.na_rep + else: + na_rep = self.na_rep + # separate the wheat from the chaff values = self.values mask = isna(values) if hasattr(values, 'to_dense'): # sparse numpy ndarray values = values.to_dense() values = np.array(values, dtype='object') - values[mask] = self.na_rep + values[mask] = na_rep imask = (~mask).ravel() values.flat[imask] = np.array([formatter(val) for val in values.ravel()[imask]]) if self.fixed_width: - return _trim_zeros(values, self.na_rep) + return _trim_zeros(values, na_rep) return values @@ -1414,16 +1415,20 @@ def _trim_zeros(str_floats, na_rep='NaN'): """ trimmed = str_floats + def _is_number(x): + return (x != na_rep and not x.endswith('inf')) + def _cond(values): - non_na = [x for x in values if x != na_rep] - return (len(non_na) > 0 and all(x.endswith('0') for x in non_na) and - not (any(('e' in x) or ('E' in x) for x in non_na))) + finite = [x for x in values if _is_number(x)] + return (len(finite) > 0 and all(x.endswith('0') for x in finite) and + not (any(('e' in x) or ('E' in x) for x in finite))) while _cond(trimmed): - trimmed = [x[:-1] if x != na_rep else x for x in trimmed] + trimmed = [x[:-1] if _is_number(x) else x for x in trimmed] # leave one 0 after the decimal points if need be. 
- return [x + "0" if x.endswith('.') and x != na_rep else x for x in trimmed] + return [x + "0" if x.endswith('.') and _is_number(x) else x + for x in trimmed] def _has_names(index): diff --git a/pandas/io/formats/html.py b/pandas/io/formats/html.py index 722fcd6bb39af..66d13bf2668f9 100644 --- a/pandas/io/formats/html.py +++ b/pandas/io/formats/html.py @@ -5,14 +5,14 @@ from __future__ import print_function +from collections import OrderedDict from textwrap import dedent -from pandas.compat import OrderedDict, lzip, map, range, u, unichr, zip +from pandas.compat import lzip, map, range, u, unichr, zip from pandas.core.dtypes.generic import ABCMultiIndex from pandas import compat, option_context -import pandas.core.common as com from pandas.core.config import get_option from pandas.io.common import _is_url @@ -190,7 +190,7 @@ def _write_col_header(self, indent): if self.fmt.sparsify: # GH3547 - sentinel = com.sentinel_factory() + sentinel = object() else: sentinel = False levels = self.columns.format(sparsify=sentinel, adjoin=False, @@ -392,7 +392,7 @@ def _write_hierarchical_rows(self, fmt_values, indent): if self.fmt.sparsify: # GH3547 - sentinel = com.sentinel_factory() + sentinel = object() levels = frame.index.format(sparsify=sentinel, adjoin=False, names=False) diff --git a/pandas/io/formats/style.py b/pandas/io/formats/style.py index 598453eb92d25..d241528d9779b 100644 --- a/pandas/io/formats/style.py +++ b/pandas/io/formats/style.py @@ -81,7 +81,7 @@ class Styler(object): See Also -------- - pandas.DataFrame.style + DataFrame.style Notes ----- @@ -433,7 +433,7 @@ def render(self, **kwargs): Returns ------- rendered : str - the rendered HTML + The rendered HTML Notes ----- @@ -1223,7 +1223,7 @@ def from_custom_template(cls, searchpath, name): Returns ------- MyStyler : subclass of Styler - has the correct ``env`` and ``template`` class attributes set. + Has the correct ``env`` and ``template`` class attributes set. """ loader = ChoiceLoader([ FileSystemLoader(searchpath), @@ -1322,7 +1322,7 @@ def _get_level_lengths(index, hidden_elements=None): Result is a dictionary of (level, inital_position): span """ - sentinel = com.sentinel_factory() + sentinel = object() levels = index.format(sparsify=sentinel, adjoin=False, names=False) if hidden_elements is None: diff --git a/pandas/io/gbq.py b/pandas/io/gbq.py index 639b68d433ac6..a6cec7ea8fb16 100644 --- a/pandas/io/gbq.py +++ b/pandas/io/gbq.py @@ -127,7 +127,7 @@ def read_gbq(query, project_id=None, index_col=None, col_order=None, See Also -------- pandas_gbq.read_gbq : This function in the pandas-gbq library. - pandas.DataFrame.to_gbq : Write a DataFrame to Google BigQuery. + DataFrame.to_gbq : Write a DataFrame to Google BigQuery. """ pandas_gbq = _try_import() diff --git a/pandas/io/html.py b/pandas/io/html.py index 74934740a6957..347bb3eec54af 100644 --- a/pandas/io/html.py +++ b/pandas/io/html.py @@ -988,7 +988,7 @@ def read_html(io, match='.+', flavor=None, header=None, index_col=None, latest information on table attributes for the modern web. parse_dates : bool, optional - See :func:`~pandas.read_csv` for more details. + See :func:`~read_csv` for more details. tupleize_cols : bool, optional If ``False`` try to parse multiple header rows into a @@ -1043,7 +1043,7 @@ def read_html(io, match='.+', flavor=None, header=None, index_col=None, See Also -------- - pandas.read_csv + read_csv Notes ----- @@ -1066,7 +1066,7 @@ def read_html(io, match='.+', flavor=None, header=None, index_col=None, .. 
versionadded:: 0.21.0 - Similar to :func:`~pandas.read_csv` the `header` argument is applied + Similar to :func:`~read_csv` the `header` argument is applied **after** `skiprows` is applied. This function will *always* return a list of :class:`DataFrame` *or* diff --git a/pandas/io/json/table_schema.py b/pandas/io/json/table_schema.py index 2bd93b19d4225..971386c91944e 100644 --- a/pandas/io/json/table_schema.py +++ b/pandas/io/json/table_schema.py @@ -314,12 +314,13 @@ def parse_table_schema(json, precise_float): df = df.astype(dtypes) - df = df.set_index(table['schema']['primaryKey']) - if len(df.index.names) == 1: - if df.index.name == 'index': - df.index.name = None - else: - df.index.names = [None if x.startswith('level_') else x for x in - df.index.names] + if 'primaryKey' in table['schema']: + df = df.set_index(table['schema']['primaryKey']) + if len(df.index.names) == 1: + if df.index.name == 'index': + df.index.name = None + else: + df.index.names = [None if x.startswith('level_') else x for x in + df.index.names] return df diff --git a/pandas/io/msgpack/_packer.pyx b/pandas/io/msgpack/_packer.pyx index d67c632188e62..8e2d943d8ddb1 100644 --- a/pandas/io/msgpack/_packer.pyx +++ b/pandas/io/msgpack/_packer.pyx @@ -74,14 +74,15 @@ cdef class Packer(object): Use bin type introduced in msgpack spec 2.0 for bytes. It also enable str8 type for unicode. """ - cdef msgpack_packer pk - cdef object _default - cdef object _bencoding - cdef object _berrors - cdef char *encoding - cdef char *unicode_errors - cdef bint use_float - cdef bint autoreset + cdef: + msgpack_packer pk + object _default + object _bencoding + object _berrors + char *encoding + char *unicode_errors + bint use_float + bint autoreset def __cinit__(self): cdef int buf_size = 1024 * 1024 @@ -123,16 +124,17 @@ cdef class Packer(object): cdef int _pack(self, object o, int nest_limit=DEFAULT_RECURSE_LIMIT) except -1: - cdef long long llval - cdef unsigned long long ullval - cdef long longval - cdef float fval - cdef double dval - cdef char* rawval - cdef int ret - cdef dict d - cdef size_t L - cdef int default_used = 0 + cdef: + long long llval + unsigned long long ullval + long longval + float fval + double dval + char* rawval + int ret + dict d + size_t L + int default_used = 0 if nest_limit < 0: raise PackValueError("recursion limit exceeded.") diff --git a/pandas/io/msgpack/_unpacker.pyx b/pandas/io/msgpack/_unpacker.pyx index 0c50aa5e68103..9bbfe749ef9ba 100644 --- a/pandas/io/msgpack/_unpacker.pyx +++ b/pandas/io/msgpack/_unpacker.pyx @@ -120,14 +120,15 @@ def unpackb(object packed, object object_hook=None, object list_hook=None, See :class:`Unpacker` for options. """ - cdef unpack_context ctx - cdef size_t off = 0 - cdef int ret + cdef: + unpack_context ctx + size_t off = 0 + int ret - cdef char* buf - cdef Py_ssize_t buf_len - cdef char* cenc = NULL - cdef char* cerr = NULL + char* buf + Py_ssize_t buf_len + char* cenc = NULL + char* cerr = NULL PyObject_AsReadBuffer(packed, &buf, &buf_len) @@ -243,16 +244,17 @@ cdef class Unpacker(object): for o in unpacker: process(o) """ - cdef unpack_context ctx - cdef char* buf - cdef size_t buf_size, buf_head, buf_tail - cdef object file_like - cdef object file_like_read - cdef Py_ssize_t read_size - # To maintain refcnt. 
- cdef object object_hook, object_pairs_hook, list_hook, ext_hook - cdef object encoding, unicode_errors - cdef size_t max_buffer_size + cdef: + unpack_context ctx + char* buf + size_t buf_size, buf_head, buf_tail + object file_like + object file_like_read + Py_ssize_t read_size + # To maintain refcnt. + object object_hook, object_pairs_hook, list_hook, ext_hook + object encoding, unicode_errors + size_t max_buffer_size def __cinit__(self): self.buf = NULL @@ -270,8 +272,9 @@ cdef class Unpacker(object): Py_ssize_t max_array_len=2147483647, Py_ssize_t max_map_len=2147483647, Py_ssize_t max_ext_len=2147483647): - cdef char *cenc=NULL, - cdef char *cerr=NULL + cdef: + char *cenc=NULL, + char *cerr=NULL self.object_hook = object_hook self.object_pairs_hook = object_pairs_hook @@ -388,9 +391,10 @@ cdef class Unpacker(object): cdef object _unpack(self, execute_fn execute, object write_bytes, bint iter=0): - cdef int ret - cdef object obj - cdef size_t prev_head + cdef: + int ret + object obj + size_t prev_head if self.buf_head >= self.buf_tail and self.file_like is not None: self.read_from_file() diff --git a/pandas/io/packers.py b/pandas/io/packers.py index efe4e3a91c69c..588d63d73515f 100644 --- a/pandas/io/packers.py +++ b/pandas/io/packers.py @@ -219,7 +219,7 @@ def read(fh): finally: if fh is not None: fh.close() - elif hasattr(path_or_buf, 'read') and compat.callable(path_or_buf.read): + elif hasattr(path_or_buf, 'read') and callable(path_or_buf.read): # treat as a buffer like return read(path_or_buf) diff --git a/pandas/io/parsers.py b/pandas/io/parsers.py index b31d3f665f47f..4163a571df800 100755 --- a/pandas/io/parsers.py +++ b/pandas/io/parsers.py @@ -203,9 +203,14 @@ * dict, e.g. {{'foo' : [1, 3]}} -> parse columns 1, 3 as date and call result 'foo' - If a column or index contains an unparseable date, the entire column or - index will be returned unaltered as an object data type. For non-standard - datetime parsing, use ``pd.to_datetime`` after ``pd.read_csv`` + If a column or index cannot be represented as an array of datetimes, + say because of an unparseable value or a mixture of timezones, the column + or index will be returned unaltered as an object data type. For + non-standard datetime parsing, use ``pd.to_datetime`` after + ``pd.read_csv``. To parse an index or column with a mixture of timezones, + specify ``date_parser`` to be a partially-applied + :func:`pandas.to_datetime` with ``utc=True``. See + :ref:`io.csv.mixed_timezones` for more. Note: A fast-path exists for iso8601-formatted dates. 
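The partially-applied ``to_datetime`` recipe above, spelled out (a sketch; the two-row CSV is invented for illustration):

>>> from functools import partial
>>> from pandas.compat import StringIO
>>> data = "a\n2000-01-01T00:00:00+05:00\n2000-01-01T00:00:00+06:00"
>>> df = pd.read_csv(StringIO(data), parse_dates=['a'],
...                  date_parser=partial(pd.to_datetime, utc=True))
>>> df['a'].dtype
datetime64[ns, UTC]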
infer_datetime_format : bool, default False diff --git a/pandas/io/pickle.py b/pandas/io/pickle.py index 789f55a62dc58..ab4a266853a78 100644 --- a/pandas/io/pickle.py +++ b/pandas/io/pickle.py @@ -1,8 +1,7 @@ """ pickle compat """ import warnings -import numpy as np -from numpy.lib.format import read_array, write_array +from numpy.lib.format import read_array from pandas.compat import PY3, BytesIO, cPickle as pkl, pickle_compat as pc @@ -76,6 +75,7 @@ def to_pickle(obj, path, compression='infer', protocol=pkl.HIGHEST_PROTOCOL): try: f.write(pkl.dumps(obj, protocol=protocol)) finally: + f.close() for _f in fh: _f.close() @@ -138,63 +138,32 @@ def read_pickle(path, compression='infer'): >>> os.remove("./dummy.pkl") """ path = _stringify_path(path) + f, fh = _get_handle(path, 'rb', compression=compression, is_text=False) + + # 1) try with cPickle + # 2) try with the compat pickle to handle subclass changes + # 3) pass encoding only if its not None as py2 doesn't handle the param - def read_wrapper(func): - # wrapper file handle open/close operation - f, fh = _get_handle(path, 'rb', - compression=compression, - is_text=False) - try: - return func(f) - finally: - for _f in fh: - _f.close() - - def try_read(path, encoding=None): - # try with cPickle - # try with current pickle, if we have a Type Error then - # try with the compat pickle to handle subclass changes - # pass encoding only if its not None as py2 doesn't handle - # the param - - # cpickle - # GH 6899 - try: - with warnings.catch_warnings(record=True): - # We want to silence any warnings about, e.g. moved modules. - warnings.simplefilter("ignore", Warning) - return read_wrapper(lambda f: pkl.load(f)) - except Exception: # noqa: E722 - # reg/patched pickle - # compat not used in pandas/compat/pickle_compat.py::load - # TODO: remove except block OR modify pc.load to use compat - try: - return read_wrapper( - lambda f: pc.load(f, encoding=encoding, compat=False)) - # compat pickle - except Exception: # noqa: E722 - return read_wrapper( - lambda f: pc.load(f, encoding=encoding, compat=True)) try: - return try_read(path) + with warnings.catch_warnings(record=True): + # We want to silence any warnings about, e.g. moved modules. 
+ warnings.simplefilter("ignore", Warning) + return pkl.load(f) except Exception: # noqa: E722 - if PY3: - return try_read(path, encoding='latin1') - raise - + try: + return pc.load(f, encoding=None) + except Exception: # noqa: E722 + if PY3: + return pc.load(f, encoding='latin1') + raise + finally: + f.close() + for _f in fh: + _f.close() # compat with sparse pickle / unpickle -def _pickle_array(arr): - arr = arr.view(np.ndarray) - - buf = BytesIO() - write_array(buf, arr) - - return buf.getvalue() - - def _unpickle_array(bytes): arr = read_array(BytesIO(bytes)) diff --git a/pandas/io/pytables.py b/pandas/io/pytables.py index 4e103482f48a2..2ee8759b9bdd8 100644 --- a/pandas/io/pytables.py +++ b/pandas/io/pytables.py @@ -15,34 +15,29 @@ import numpy as np -from pandas._libs import algos, lib, writers as libwriters +from pandas._libs import lib, writers as libwriters from pandas._libs.tslibs import timezones from pandas.compat import PY3, filter, lrange, range, string_types from pandas.errors import PerformanceWarning from pandas.core.dtypes.common import ( - ensure_int64, ensure_object, ensure_platform_int, is_categorical_dtype, - is_datetime64_dtype, is_datetime64tz_dtype, is_list_like, - is_timedelta64_dtype) + ensure_object, is_categorical_dtype, is_datetime64_dtype, + is_datetime64tz_dtype, is_list_like, is_timedelta64_dtype) from pandas.core.dtypes.missing import array_equivalent from pandas import ( - DataFrame, DatetimeIndex, Index, Int64Index, MultiIndex, Panel, - PeriodIndex, Series, SparseDataFrame, SparseSeries, TimedeltaIndex, compat, - concat, isna, to_datetime) + DataFrame, DatetimeIndex, Index, Int64Index, MultiIndex, PeriodIndex, + Series, SparseDataFrame, SparseSeries, TimedeltaIndex, compat, concat, + isna, to_datetime) from pandas.core import config -from pandas.core.algorithms import match, unique -from pandas.core.arrays.categorical import ( - Categorical, _factorize_from_iterables) +from pandas.core.arrays.categorical import Categorical from pandas.core.arrays.sparse import BlockIndex, IntIndex from pandas.core.base import StringMixin import pandas.core.common as com from pandas.core.computation.pytables import Expr, maybe_expression from pandas.core.config import get_option from pandas.core.index import ensure_index -from pandas.core.internals import ( - BlockManager, _block2d_to_blocknd, _block_shape, _factor_indexer, - make_block) +from pandas.core.internals import BlockManager, _block_shape, make_block from pandas.io.common import _stringify_path from pandas.io.formats.printing import adjoin, pprint_thing @@ -175,7 +170,6 @@ class DuplicateWarning(Warning): SparseSeries: u'sparse_series', DataFrame: u'frame', SparseDataFrame: u'sparse_frame', - Panel: u'wide', } # storer class map @@ -187,7 +181,6 @@ class DuplicateWarning(Warning): u'sparse_series': 'SparseSeriesFixed', u'frame': 'FrameFixed', u'sparse_frame': 'SparseFrameFixed', - u'wide': 'PanelFixed', } # table class map @@ -197,16 +190,12 @@ class DuplicateWarning(Warning): u'appendable_multiseries': 'AppendableMultiSeriesTable', u'appendable_frame': 'AppendableFrameTable', u'appendable_multiframe': 'AppendableMultiFrameTable', - u'appendable_panel': 'AppendablePanelTable', u'worm': 'WORMTable', - u'legacy_frame': 'LegacyFrameTable', - u'legacy_panel': 'LegacyPanelTable', } # axes map _AXES_MAP = { DataFrame: [0], - Panel: [1, 2] } # register our configuration options @@ -326,8 +315,8 @@ def read_hdf(path_or_buf, key=None, mode='r', **kwargs): See Also -------- - pandas.DataFrame.to_hdf : Write a HDF file 
from a DataFrame. - pandas.HDFStore : Low-level access to HDF files. + DataFrame.to_hdf : Write a HDF file from a DataFrame. + HDFStore : Low-level access to HDF files. Examples -------- @@ -865,7 +854,7 @@ def put(self, key, value, format=None, append=False, **kwargs): Parameters ---------- key : object - value : {Series, DataFrame, Panel} + value : {Series, DataFrame} format : 'fixed(f)|table(t)', default is 'fixed' fixed(f) : Fixed format Fast writing/reading. Not-appendable, nor searchable @@ -947,7 +936,7 @@ def append(self, key, value, format=None, append=True, columns=None, Parameters ---------- key : object - value : {Series, DataFrame, Panel} + value : {Series, DataFrame} format : 'table' is the default table(t) : table format Write as a PyTables Table structure which may perform @@ -3028,16 +3017,6 @@ class FrameFixed(BlockManagerFixed): obj_type = DataFrame -class PanelFixed(BlockManagerFixed): - pandas_kind = u'wide' - obj_type = Panel - is_shape_reversed = True - - def write(self, obj, **kwargs): - obj._consolidate_inplace() - return super(PanelFixed, self).write(obj, **kwargs) - - class Table(Fixed): """ represent a table: @@ -3288,7 +3267,7 @@ def get_attrs(self): self.nan_rep = getattr(self.attrs, 'nan_rep', None) self.encoding = _ensure_encoding( getattr(self.attrs, 'encoding', None)) - self.errors = getattr(self.attrs, 'errors', 'strict') + self.errors = _ensure_decoded(getattr(self.attrs, 'errors', 'strict')) self.levels = getattr( self.attrs, 'levels', None) or [] self.index_axes = [ @@ -3900,107 +3879,11 @@ def read(self, where=None, columns=None, **kwargs): if not self.read_axes(where=where, **kwargs): return None - lst_vals = [a.values for a in self.index_axes] - labels, levels = _factorize_from_iterables(lst_vals) - # labels and levels are tuples but lists are expected - labels = list(labels) - levels = list(levels) - N = [len(lvl) for lvl in levels] - - # compute the key - key = _factor_indexer(N[1:], labels) - - objs = [] - if len(unique(key)) == len(key): - - sorter, _ = algos.groupsort_indexer( - ensure_int64(key), np.prod(N)) - sorter = ensure_platform_int(sorter) - - # create the objs - for c in self.values_axes: - - # the data need to be sorted - sorted_values = c.take_data().take(sorter, axis=0) - if sorted_values.ndim == 1: - sorted_values = sorted_values.reshape( - (sorted_values.shape[0], 1)) - - take_labels = [l.take(sorter) for l in labels] - items = Index(c.values) - block = _block2d_to_blocknd( - values=sorted_values, placement=np.arange(len(items)), - shape=tuple(N), labels=take_labels, ref_items=items) - - # create the object - mgr = BlockManager([block], [items] + levels) - obj = self.obj_type(mgr) - - # permute if needed - if self.is_transposed: - obj = obj.transpose( - *tuple(Series(self.data_orientation).argsort())) - - objs.append(obj) - - else: - warnings.warn(duplicate_doc, DuplicateWarning, stacklevel=5) - - # reconstruct - long_index = MultiIndex.from_arrays( - [i.values for i in self.index_axes]) - - for c in self.values_axes: - lp = DataFrame(c.data, index=long_index, columns=c.values) - - # need a better algorithm - tuple_index = long_index.values - - unique_tuples = unique(tuple_index) - unique_tuples = com.asarray_tuplesafe(unique_tuples) - - indexer = match(unique_tuples, tuple_index) - indexer = ensure_platform_int(indexer) - - new_index = long_index.take(indexer) - new_values = lp.values.take(indexer, axis=0) - - lp = DataFrame(new_values, index=new_index, columns=lp.columns) - objs.append(lp.to_panel()) - - # create the composite 
object - if len(objs) == 1: - wp = objs[0] - else: - wp = concat(objs, axis=0, verify_integrity=False)._consolidate() - - # apply the selection filters & axis orderings - wp = self.process_axes(wp, columns=columns) - - return wp - - -class LegacyFrameTable(LegacyTable): - - """ support the legacy frame table """ - pandas_kind = u'frame_table' - table_type = u'legacy_frame' - obj_type = Panel - - def read(self, *args, **kwargs): - return super(LegacyFrameTable, self).read(*args, **kwargs)['value'] - - -class LegacyPanelTable(LegacyTable): - - """ support the legacy panel table """ - table_type = u'legacy_panel' - obj_type = Panel + raise NotImplementedError("Panel is removed in pandas 0.25.0") class AppendableTable(LegacyTable): - - """ suppor the new appendable table formats """ + """ support the new appendable table formats """ _indexables = None table_type = u'appendable' @@ -4232,8 +4115,7 @@ def delete(self, where=None, start=None, stop=None, **kwargs): class AppendableFrameTable(AppendableTable): - - """ suppor the new appendable table formats """ + """ support the new appendable table formats """ pandas_kind = u'frame_table' table_type = u'appendable_frame' ndim = 2 @@ -4442,24 +4324,6 @@ def read(self, **kwargs): return df -class AppendablePanelTable(AppendableTable): - - """ suppor the new appendable table formats """ - table_type = u'appendable_panel' - ndim = 3 - obj_type = Panel - - def get_object(self, obj): - """ these are written transposed """ - if self.is_transposed: - obj = obj.transpose(*self.data_orientation) - return obj - - @property - def is_transposed(self): - return self.data_orientation != tuple(range(self.ndim)) - - def _reindex_axis(obj, axis, labels, other=None): ax = obj._get_axis(axis) labels = ensure_index(labels) @@ -4875,16 +4739,3 @@ def select_coords(self): return self.coordinates return np.arange(start, stop) - -# utilities ### - - -def timeit(key, df, fn=None, remove=True, **kwargs): - if fn is None: - fn = 'timeit.h5' - store = HDFStore(fn, mode='w') - store.append(key, df, **kwargs) - store.close() - - if remove: - os.remove(fn) diff --git a/pandas/io/sas/sas.pyx b/pandas/io/sas/sas.pyx index a5bfd5866a261..9b8fba16741f6 100644 --- a/pandas/io/sas/sas.pyx +++ b/pandas/io/sas/sas.pyx @@ -203,11 +203,12 @@ cdef enum ColumnTypes: # type the page_data types -cdef int page_meta_type = const.page_meta_type -cdef int page_mix_types_0 = const.page_mix_types[0] -cdef int page_mix_types_1 = const.page_mix_types[1] -cdef int page_data_type = const.page_data_type -cdef int subheader_pointers_offset = const.subheader_pointers_offset +cdef: + int page_meta_type = const.page_meta_type + int page_mix_types_0 = const.page_mix_types[0] + int page_mix_types_1 = const.page_mix_types[1] + int page_data_type = const.page_data_type + int subheader_pointers_offset = const.subheader_pointers_offset cdef class Parser(object): diff --git a/pandas/io/sql.py b/pandas/io/sql.py index 5d1163b3e0024..aaface5415384 100644 --- a/pandas/io/sql.py +++ b/pandas/io/sql.py @@ -381,7 +381,8 @@ def read_sql(sql, con, index_col=None, coerce_float=True, params=None, try: _is_table_name = pandas_sql.has_table(sql) - except (ImportError, AttributeError): + except Exception: + # using generic exception to catch errors from sql drivers (GH24988) _is_table_name = False if _is_table_name: diff --git a/pandas/io/stata.py b/pandas/io/stata.py index 1b0660171ecac..62a9dbdc4657e 100644 --- a/pandas/io/stata.py +++ b/pandas/io/stata.py @@ -100,8 +100,8 @@ See Also -------- -pandas.io.stata.StataReader : 
Low-level reader for Stata data files. -pandas.DataFrame.to_stata: Export Stata data files. +io.stata.StataReader : Low-level reader for Stata data files. +DataFrame.to_stata: Export Stata data files. Examples -------- @@ -119,7 +119,7 @@ _iterator_params) _data_method_doc = """\ -Reads observations from Stata file, converting them into a dataframe +Read observations from Stata file, converting them into a dataframe .. deprecated:: This is a legacy method. Use `read` in new code. @@ -1726,18 +1726,22 @@ def _do_convert_categoricals(self, data, value_label_dict, lbllist, return data def data_label(self): - """Returns data label of Stata file""" + """ + Return data label of Stata file. + """ return self.data_label def variable_labels(self): - """Returns variable labels as a dict, associating each variable name - with corresponding label + """ + Return variable labels as a dict, associating each variable name + with corresponding label. """ return dict(zip(self.varlist, self._variable_labels)) def value_labels(self): - """Returns a dict, associating each variable name a dict, associating - each value its corresponding label + """ + Return a dict, associating each variable name a dict, associating + each value its corresponding label. """ if not self._value_labels_read: self._read_value_labels() @@ -1747,7 +1751,7 @@ def value_labels(self): def _open_file_binary_write(fname): """ - Open a binary file or no-op if file-like + Open a binary file or no-op if file-like. Parameters ---------- @@ -1778,14 +1782,14 @@ def _set_endianness(endianness): def _pad_bytes(name, length): """ - Takes a char string and pads it with null bytes until it's length chars + Take a char string and pads it with null bytes until it's length chars. """ return name + "\x00" * (length - len(name)) def _convert_datetime_to_stata_type(fmt): """ - Converts from one of the stata date formats to a type in TYPE_MAP + Convert from one of the stata date formats to a type in TYPE_MAP. """ if fmt in ["tc", "%tc", "td", "%td", "tw", "%tw", "tm", "%tm", "tq", "%tq", "th", "%th", "ty", "%ty"]: @@ -1812,7 +1816,7 @@ def _maybe_convert_to_int_keys(convert_dates, varlist): def _dtype_to_stata_type(dtype, column): """ - Converts dtype types to stata types. Returns the byte of the given ordinal. + Convert dtype types to stata types. Returns the byte of the given ordinal. See TYPE_MAP and comments for an explanation. This is also explained in the dta spec. 1 - 244 are strings of this length @@ -1850,7 +1854,7 @@ def _dtype_to_stata_type(dtype, column): def _dtype_to_default_stata_fmt(dtype, column, dta_version=114, force_strl=False): """ - Maps numpy dtype to stata's default format for this type. Not terribly + Map numpy dtype to stata's default format for this type. Not terribly important since users can change this in Stata. Semantics are object -> "%DDs" where DD is the length of the string. If not a string, @@ -2385,32 +2389,22 @@ def _prepare_data(self): data = self._convert_strls(data) # 3. 
Convert bad string data to '' and pad to correct length - dtypes = [] - data_cols = [] - has_strings = False + dtypes = {} native_byteorder = self._byteorder == _set_endianness(sys.byteorder) for i, col in enumerate(data): typ = typlist[i] if typ <= self._max_string_length: - has_strings = True data[col] = data[col].fillna('').apply(_pad_bytes, args=(typ,)) stype = 'S{type}'.format(type=typ) - dtypes.append(('c' + str(i), stype)) - string = data[col].str.encode(self._encoding) - data_cols.append(string.values.astype(stype)) + dtypes[col] = stype + data[col] = data[col].str.encode(self._encoding).astype(stype) else: - values = data[col].values dtype = data[col].dtype if not native_byteorder: dtype = dtype.newbyteorder(self._byteorder) - dtypes.append(('c' + str(i), dtype)) - data_cols.append(values) - dtypes = np.dtype(dtypes) + dtypes[col] = dtype - if has_strings or not native_byteorder: - self.data = np.fromiter(zip(*data_cols), dtype=dtypes) - else: - self.data = data.to_records(index=False) + self.data = data.to_records(index=False, column_dtypes=dtypes) def _write_data(self): data = self.data diff --git a/pandas/plotting/_core.py b/pandas/plotting/_core.py index 3ba06c0638317..a525b9cff1182 100644 --- a/pandas/plotting/_core.py +++ b/pandas/plotting/_core.py @@ -26,7 +26,6 @@ from pandas.core.generic import _shared_doc_kwargs, _shared_docs from pandas.io.formats.printing import pprint_thing -from pandas.plotting import _misc as misc from pandas.plotting._compat import _mpl_ge_3_0_0 from pandas.plotting._style import _get_standard_colors, plot_params from pandas.plotting._tools import ( @@ -40,7 +39,7 @@ else: _HAS_MPL = True if get_option('plotting.matplotlib.register_converters'): - _converter.register(explicit=True) + _converter.register(explicit=False) def _raise_if_no_mpl(): @@ -2549,7 +2548,7 @@ def boxplot_frame_groupby(grouped, subplots=True, column=None, fontsize=None, Parameters ---------- grouped : Grouped DataFrame - subplots : + subplots : bool * ``False`` - no subplots will be used * ``True`` - create a subplot for each group column : column name or list of names, or vector @@ -2906,15 +2905,6 @@ def pie(self, **kwds): """ return self(kind='pie', **kwds) - def lag(self, *args, **kwds): - return misc.lag_plot(self._parent, *args, **kwds) - - def autocorrelation(self, *args, **kwds): - return misc.autocorrelation_plot(self._parent, *args, **kwds) - - def bootstrap(self, *args, **kwds): - return misc.bootstrap_plot(self._parent, *args, **kwds) - class FramePlotMethods(BasePlotMethods): """DataFrame plotting accessor and method @@ -2967,7 +2957,7 @@ def line(self, x=None, y=None, **kwds): Either the location or the label of the columns to be used. By default, it will use the remaining DataFrame numeric columns. **kwds - Keyword arguments to pass on to :meth:`pandas.DataFrame.plot`. + Keyword arguments to pass on to :meth:`DataFrame.plot`. Returns ------- @@ -3032,7 +3022,7 @@ def bar(self, x=None, y=None, **kwds): all numerical columns are used. **kwds Additional keyword arguments are documented in - :meth:`pandas.DataFrame.plot`. + :meth:`DataFrame.plot`. Returns ------- @@ -3042,8 +3032,8 @@ def bar(self, x=None, y=None, **kwds): See Also -------- - pandas.DataFrame.plot.barh : Horizontal bar plot. - pandas.DataFrame.plot : Make plots of a DataFrame. + DataFrame.plot.barh : Horizontal bar plot. + DataFrame.plot : Make plots of a DataFrame. matplotlib.pyplot.bar : Make a bar plot with matplotlib. 
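Review note on the `_prepare_data` hunk in `pandas/io/stata.py` above: instead of hand-assembling a structured dtype and rebuilding rows with `np.fromiter`, the writer now collects per-column dtypes in a dict and delegates to `DataFrame.to_records(index=False, column_dtypes=...)`. A small sketch of that call with toy data (the particular widths and byte order are illustrative, not what the writer computes):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['a', 'bb', 'ccc'], 'x': [1, 2, 3]})

# Fixed-width byte strings for text, an explicit byte order for numbers,
# mirroring the kinds of dtypes the Stata writer requests per column.
rec = df.to_records(index=False,
                    column_dtypes={'name': 'S3', 'x': np.dtype('>i4')})
print(rec.dtype)  # e.g. (numpy.record, [('name', 'S3'), ('x', '>i4')])
```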
Examples @@ -3114,7 +3104,7 @@ def barh(self, x=None, y=None, **kwds): y : label or position, default All numeric columns in dataframe Columns to be plotted from the DataFrame. **kwds - Keyword arguments to pass on to :meth:`pandas.DataFrame.plot`. + Keyword arguments to pass on to :meth:`DataFrame.plot`. Returns ------- @@ -3122,8 +3112,8 @@ def barh(self, x=None, y=None, **kwds): See Also -------- - pandas.DataFrame.plot.bar: Vertical bar plot. - pandas.DataFrame.plot : Make plots of DataFrame using matplotlib. + DataFrame.plot.bar: Vertical bar plot. + DataFrame.plot : Make plots of DataFrame using matplotlib. matplotlib.axes.Axes.bar : Plot a vertical bar plot using matplotlib. Examples @@ -3201,7 +3191,7 @@ def box(self, by=None, **kwds): Column in the DataFrame to group by. **kwds : optional Additional keywords are documented in - :meth:`pandas.DataFrame.plot`. + :meth:`DataFrame.plot`. Returns ------- @@ -3209,8 +3199,8 @@ def box(self, by=None, **kwds): See Also -------- - pandas.DataFrame.boxplot: Another method to draw a box plot. - pandas.Series.plot.box: Draw a box plot from a Series object. + DataFrame.boxplot: Another method to draw a box plot. + Series.plot.box: Draw a box plot from a Series object. matplotlib.pyplot.boxplot: Draw a box plot in matplotlib. Examples @@ -3244,7 +3234,7 @@ def hist(self, by=None, bins=10, **kwds): Number of histogram bins to be used. **kwds Additional keyword arguments are documented in - :meth:`pandas.DataFrame.plot`. + :meth:`DataFrame.plot`. Returns ------- @@ -3337,7 +3327,7 @@ def area(self, x=None, y=None, **kwds): unstacked plot. **kwds : optional Additional keyword arguments are documented in - :meth:`pandas.DataFrame.plot`. + :meth:`DataFrame.plot`. Returns ------- @@ -3408,7 +3398,7 @@ def pie(self, y=None, **kwds): Label or position of the column to plot. If not provided, ``subplots=True`` argument must be passed. **kwds - Keyword arguments to pass on to :meth:`pandas.DataFrame.plot`. + Keyword arguments to pass on to :meth:`DataFrame.plot`. Returns ------- @@ -3484,7 +3474,7 @@ def scatter(self, x, y, s=None, c=None, **kwds): marker points according to a colormap. **kwds - Keyword arguments to pass on to :meth:`pandas.DataFrame.plot`. + Keyword arguments to pass on to :meth:`DataFrame.plot`. Returns ------- @@ -3558,7 +3548,7 @@ def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None, y-direction. **kwds Additional keyword arguments are documented in - :meth:`pandas.DataFrame.plot`. + :meth:`DataFrame.plot`. 
Returns ------- @@ -3610,16 +3600,3 @@ def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None, if gridsize is not None: kwds['gridsize'] = gridsize return self(kind='hexbin', x=x, y=y, C=C, **kwds) - - def scatter_matrix(self, *args, **kwds): - return misc.scatter_matrix(self._parent, *args, **kwds) - - def andrews_curves(self, class_column, *args, **kwds): - return misc.andrews_curves(self._parent, class_column, *args, **kwds) - - def parallel_coordinates(self, class_column, *args, **kwds): - return misc.parallel_coordinates(self._parent, class_column, - *args, **kwds) - - def radviz(self, class_column, *args, **kwds): - return misc.radviz(self._parent, class_column, *args, **kwds) diff --git a/pandas/plotting/_misc.py b/pandas/plotting/_misc.py index 1c69c03025e00..62a33245f99ef 100644 --- a/pandas/plotting/_misc.py +++ b/pandas/plotting/_misc.py @@ -182,7 +182,7 @@ def radviz(frame, class_column, ax=None, color=None, colormap=None, **kwds): See Also -------- - pandas.plotting.andrews_curves : Plot clustering visualization. + plotting.andrews_curves : Plot clustering visualization. Examples -------- @@ -273,7 +273,7 @@ def normalize(series): def andrews_curves(frame, class_column, ax=None, samples=200, color=None, colormap=None, **kwds): """ - Generates a matplotlib plot of Andrews curves, for visualising clusters of + Generate a matplotlib plot of Andrews curves, for visualising clusters of multivariate data. Andrews curves have the functional form: @@ -394,8 +394,8 @@ def bootstrap_plot(series, fig=None, size=50, samples=500, **kwds): See Also -------- - pandas.DataFrame.plot : Basic plotting for DataFrame objects. - pandas.Series.plot : Basic plotting for Series objects. + DataFrame.plot : Basic plotting for DataFrame objects. + Series.plot : Basic plotting for Series objects. Examples -------- @@ -598,7 +598,8 @@ def lag_plot(series, lag=1, ax=None, **kwds): def autocorrelation_plot(series, ax=None, **kwds): - """Autocorrelation plot for time series. + """ + Autocorrelation plot for time series. 
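Review note on the accessor removals above (`lag`, `autocorrelation`, `bootstrap` on Series; `scatter_matrix`, `andrews_curves`, `parallel_coordinates`, `radviz` on DataFrame): the underlying functions remain available in the `pandas.plotting` module, so callers switch to the module-level API. A short usage sketch (requires matplotlib; data is arbitrary):

```python
import numpy as np
import pandas as pd
from pandas.plotting import autocorrelation_plot, scatter_matrix

s = pd.Series(np.random.randn(200))
df = pd.DataFrame(np.random.randn(200, 3), columns=list('abc'))

ax = autocorrelation_plot(s)   # was: s.plot.autocorrelation()
axes = scatter_matrix(df)      # was: df.plot.scatter_matrix()
```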
Parameters: ----------- diff --git a/pandas/tests/api/test_api.py b/pandas/tests/api/test_api.py index 07cf358c765b3..599ab9a3c5f7c 100644 --- a/pandas/tests/api/test_api.py +++ b/pandas/tests/api/test_api.py @@ -46,7 +46,6 @@ class TestPDApi(Base): 'Series', 'SparseArray', 'SparseDataFrame', 'SparseDtype', 'SparseSeries', 'Timedelta', 'TimedeltaIndex', 'Timestamp', 'Interval', 'IntervalIndex', - 'IntervalArray', 'CategoricalDtype', 'PeriodDtype', 'IntervalDtype', 'DatetimeTZDtype', 'Int8Dtype', 'Int16Dtype', 'Int32Dtype', 'Int64Dtype', diff --git a/pandas/tests/arithmetic/test_datetime64.py b/pandas/tests/arithmetic/test_datetime64.py index f97a1651163e8..405dc0805a285 100644 --- a/pandas/tests/arithmetic/test_datetime64.py +++ b/pandas/tests/arithmetic/test_datetime64.py @@ -124,14 +124,14 @@ def test_comparison_invalid(self, box_with_array): result = x != y expected = tm.box_expected([True] * 5, xbox) tm.assert_equal(result, expected) - - with pytest.raises(TypeError): + msg = 'Invalid comparison between' + with pytest.raises(TypeError, match=msg): x >= y - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): x > y - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): x < y - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): x <= y @pytest.mark.parametrize('data', [ @@ -327,9 +327,10 @@ def test_comparison_tzawareness_compat(self, op): # raise naive_series = Series(dr) aware_series = Series(dz) - with pytest.raises(TypeError): + msg = 'Cannot compare tz-naive and tz-aware' + with pytest.raises(TypeError, match=msg): op(dz, naive_series) - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): op(dr, aware_series) # TODO: implement _assert_tzawareness_compat for the reverse @@ -428,14 +429,14 @@ def test_dti_cmp_null_scalar_inequality(self, tz_naive_fixture, other, dti = pd.date_range('2016-01-01', periods=2, tz=tz) # FIXME: ValueError with transpose dtarr = tm.box_expected(dti, box_with_array, transpose=False) - - with pytest.raises(TypeError): + msg = 'Invalid comparison between' + with pytest.raises(TypeError, match=msg): dtarr < other - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): dtarr <= other - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): dtarr > other - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): dtarr >= other @pytest.mark.parametrize('dtype', [None, object]) @@ -584,22 +585,23 @@ def test_comparison_tzawareness_compat(self, op, box_with_array): dr = tm.box_expected(dr, box_with_array, transpose=False) dz = tm.box_expected(dz, box_with_array, transpose=False) - with pytest.raises(TypeError): + msg = 'Cannot compare tz-naive and tz-aware' + with pytest.raises(TypeError, match=msg): op(dr, dz) if box_with_array is not pd.DataFrame: # DataFrame op is invalid until transpose bug is fixed - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): op(dr, list(dz)) - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): op(dr, np.array(list(dz), dtype=object)) - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): op(dz, dr) if box_with_array is not pd.DataFrame: # DataFrame op is invalid until transpose bug is fixed - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): op(dz, list(dr)) - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): op(dz, np.array(list(dr), dtype=object)) # Check that there 
isn't a problem aware-aware and naive-naive do not @@ -617,15 +619,15 @@ def test_comparison_tzawareness_compat(self, op, box_with_array): ts_tz = pd.Timestamp('2000-03-14 01:59', tz='Europe/Amsterdam') assert_all(dr > ts) - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): op(dr, ts_tz) assert_all(dz > ts_tz) - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): op(dz, ts) # GH#12601: Check comparison against Timestamps and DatetimeIndex - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): op(ts, dz) @pytest.mark.parametrize('op', [operator.eq, operator.ne, @@ -641,10 +643,10 @@ def test_scalar_comparison_tzawareness(self, op, other, tz_aware_fixture, # FIXME: ValueError with transpose dtarr = tm.box_expected(dti, box_with_array, transpose=False) - - with pytest.raises(TypeError): + msg = 'Cannot compare tz-naive and tz-aware' + with pytest.raises(TypeError, match=msg): op(dtarr, other) - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): op(other, dtarr) @pytest.mark.parametrize('op', [operator.eq, operator.ne, @@ -714,14 +716,14 @@ def test_dt64arr_cmp_scalar_invalid(self, other, tz_naive_fixture, expected = np.array([True] * 10) expected = tm.box_expected(expected, xbox, transpose=False) tm.assert_equal(result, expected) - - with pytest.raises(TypeError): + msg = 'Invalid comparison between' + with pytest.raises(TypeError, match=msg): rng < other - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): rng <= other - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): rng > other - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): rng >= other def test_dti_cmp_list(self): @@ -749,14 +751,14 @@ def test_dti_cmp_tdi_tzawareness(self, other): result = dti != other expected = np.array([True] * 10) tm.assert_numpy_array_equal(result, expected) - - with pytest.raises(TypeError): + msg = 'Invalid comparison between' + with pytest.raises(TypeError, match=msg): dti < other - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): dti <= other - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): dti > other - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): dti >= other def test_dti_cmp_object_dtype(self): @@ -770,7 +772,8 @@ def test_dti_cmp_object_dtype(self): tm.assert_numpy_array_equal(result, expected) other = dti.tz_localize(None) - with pytest.raises(TypeError): + msg = 'Cannot compare tz-naive and tz-aware' + with pytest.raises(TypeError, match=msg): # tzawareness failure dti != other @@ -778,8 +781,8 @@ def test_dti_cmp_object_dtype(self): result = dti == other expected = np.array([True] * 5 + [False] * 5) tm.assert_numpy_array_equal(result, expected) - - with pytest.raises(TypeError): + msg = "Cannot compare type" + with pytest.raises(TypeError, match=msg): dti >= other @@ -898,7 +901,8 @@ def test_dt64arr_add_sub_td64_nat(self, box_with_array, tz_naive_fixture): tm.assert_equal(result, expected) result = obj - other tm.assert_equal(result, expected) - with pytest.raises(TypeError): + msg = 'cannot subtract' + with pytest.raises(TypeError, match=msg): other - obj def test_dt64arr_add_sub_td64ndarray(self, tz_naive_fixture, @@ -927,8 +931,8 @@ def test_dt64arr_add_sub_td64ndarray(self, tz_naive_fixture, result = dtarr - tdarr tm.assert_equal(result, expected) - - with pytest.raises(TypeError): + msg = 'cannot subtract' + with pytest.raises(TypeError, 
match=msg): tdarr - dtarr # ----------------------------------------------------------------- @@ -1028,10 +1032,10 @@ def test_dt64arr_aware_sub_dt64ndarray_raises(self, tz_aware_fixture, dt64vals = dti.values dtarr = tm.box_expected(dti, box_with_array) - - with pytest.raises(TypeError): + msg = 'DatetimeArray subtraction must have the same timezones or' + with pytest.raises(TypeError, match=msg): dtarr - dt64vals - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): dt64vals - dtarr # ------------------------------------------------------------- @@ -1048,17 +1052,17 @@ def test_dt64arr_add_dt64ndarray_raises(self, tz_naive_fixture, dt64vals = dti.values dtarr = tm.box_expected(dti, box_with_array) - - with pytest.raises(TypeError): + msg = 'cannot add' + with pytest.raises(TypeError, match=msg): dtarr + dt64vals - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): dt64vals + dtarr def test_dt64arr_add_timestamp_raises(self, box_with_array): # GH#22163 ensure DataFrame doesn't cast Timestamp to i8 idx = DatetimeIndex(['2011-01-01', '2011-01-02']) idx = tm.box_expected(idx, box_with_array) - msg = "cannot add" + msg = 'cannot add' with pytest.raises(TypeError, match=msg): idx + Timestamp('2011-01-01') with pytest.raises(TypeError, match=msg): @@ -1071,13 +1075,14 @@ def test_dt64arr_add_timestamp_raises(self, box_with_array): def test_dt64arr_add_sub_float(self, other, box_with_array): dti = DatetimeIndex(['2011-01-01', '2011-01-02'], freq='D') dtarr = tm.box_expected(dti, box_with_array) - with pytest.raises(TypeError): + msg = '|'.join(['unsupported operand type', 'cannot (add|subtract)']) + with pytest.raises(TypeError, match=msg): dtarr + other - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): other + dtarr - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): dtarr - other - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): other - dtarr @pytest.mark.parametrize('pi_freq', ['D', 'W', 'Q', 'H']) @@ -1090,14 +1095,15 @@ def test_dt64arr_add_sub_parr(self, dti_freq, pi_freq, dtarr = tm.box_expected(dti, box_with_array) parr = tm.box_expected(pi, box_with_array2) - - with pytest.raises(TypeError): + msg = '|'.join(['cannot (add|subtract)', 'unsupported operand', + 'descriptor.*requires', 'ufunc.*cannot use operands']) + with pytest.raises(TypeError, match=msg): dtarr + parr - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): parr + dtarr - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): dtarr - parr - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): parr - dtarr @pytest.mark.parametrize('dti_freq', [None, 'D']) @@ -1108,14 +1114,14 @@ def test_dt64arr_add_sub_period_scalar(self, dti_freq, box_with_array): idx = pd.DatetimeIndex(['2011-01-01', '2011-01-02'], freq=dti_freq) dtarr = tm.box_expected(idx, box_with_array) - - with pytest.raises(TypeError): + msg = '|'.join(['unsupported operand type', 'cannot (add|subtract)']) + with pytest.raises(TypeError, match=msg): dtarr + per - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): per + dtarr - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): dtarr - per - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): per - dtarr @@ -1156,8 +1162,8 @@ def test_dt64arr_series_sub_tick_DateOffset(self, box_with_array): result2 = -pd.offsets.Second(5) + ser tm.assert_equal(result2, 
expected) - - with pytest.raises(TypeError): + msg = "bad operand type for unary" + with pytest.raises(TypeError, match=msg): pd.offsets.Second(5) - ser @pytest.mark.parametrize('cls_name', ['Day', 'Hour', 'Minute', 'Second', @@ -1239,8 +1245,8 @@ def test_dt64arr_add_sub_relativedelta_offsets(self, box_with_array): expected = DatetimeIndex([x - off for x in vec_items]) expected = tm.box_expected(expected, box_with_array) tm.assert_equal(expected, vec - off) - - with pytest.raises(TypeError): + msg = "bad operand type for unary" + with pytest.raises(TypeError, match=msg): off - vec # ------------------------------------------------------------- @@ -1320,8 +1326,8 @@ def test_dt64arr_add_sub_DateOffsets(self, box_with_array, expected = DatetimeIndex([offset + x for x in vec_items]) expected = tm.box_expected(expected, box_with_array) tm.assert_equal(expected, offset + vec) - - with pytest.raises(TypeError): + msg = "bad operand type for unary" + with pytest.raises(TypeError, match=msg): offset - vec def test_dt64arr_add_sub_DateOffset(self, box_with_array): @@ -1440,13 +1446,14 @@ def test_dt64_series_arith_overflow(self): td = pd.Timedelta('20000 Days') dti = pd.date_range('1949-09-30', freq='100Y', periods=4) ser = pd.Series(dti) - with pytest.raises(OverflowError): + msg = 'Overflow in int64 addition' + with pytest.raises(OverflowError, match=msg): ser - dt - with pytest.raises(OverflowError): + with pytest.raises(OverflowError, match=msg): dt - ser - with pytest.raises(OverflowError): + with pytest.raises(OverflowError, match=msg): ser + td - with pytest.raises(OverflowError): + with pytest.raises(OverflowError, match=msg): td + ser ser.iloc[-1] = pd.NaT @@ -1480,9 +1487,9 @@ def test_datetimeindex_sub_timestamp_overflow(self): tspos.to_pydatetime(), tspos.to_datetime64().astype('datetime64[ns]'), tspos.to_datetime64().astype('datetime64[D]')] - + msg = 'Overflow in int64 addition' for variant in ts_neg_variants: - with pytest.raises(OverflowError): + with pytest.raises(OverflowError, match=msg): dtimax - variant expected = pd.Timestamp.max.value - tspos.value @@ -1496,7 +1503,7 @@ def test_datetimeindex_sub_timestamp_overflow(self): assert res[1].value == expected for variant in ts_pos_variants: - with pytest.raises(OverflowError): + with pytest.raises(OverflowError, match=msg): dtimin - variant def test_datetimeindex_sub_datetimeindex_overflow(self): @@ -1515,22 +1522,22 @@ def test_datetimeindex_sub_datetimeindex_overflow(self): expected = pd.Timestamp.min.value - ts_neg[1].value result = dtimin - ts_neg assert result[1].value == expected - - with pytest.raises(OverflowError): + msg = 'Overflow in int64 addition' + with pytest.raises(OverflowError, match=msg): dtimax - ts_neg - with pytest.raises(OverflowError): + with pytest.raises(OverflowError, match=msg): dtimin - ts_pos # Edge cases tmin = pd.to_datetime([pd.Timestamp.min]) t1 = tmin + pd.Timedelta.max + pd.Timedelta('1us') - with pytest.raises(OverflowError): + with pytest.raises(OverflowError, match=msg): t1 - tmin tmax = pd.to_datetime([pd.Timestamp.max]) t2 = tmax + pd.Timedelta.min - pd.Timedelta('1us') - with pytest.raises(OverflowError): + with pytest.raises(OverflowError, match=msg): tmax - t2 @@ -1543,7 +1550,8 @@ def test_empty_series_add_sub(self): tm.assert_series_equal(a, a + b) tm.assert_series_equal(a, a - b) tm.assert_series_equal(a, b + a) - with pytest.raises(TypeError): + msg = 'cannot subtract' + with pytest.raises(TypeError, match=msg): b - a def test_operators_datetimelike(self): @@ -1688,12 +1696,13 @@ def 
test_datetime64_ops_nat(self): # subtraction tm.assert_series_equal(-NaT + datetime_series, nat_series_dtype_timestamp) - with pytest.raises(TypeError): + msg = 'Unary negative expects' + with pytest.raises(TypeError, match=msg): -single_nat_dtype_datetime + datetime_series tm.assert_series_equal(-NaT + nat_series_dtype_timestamp, nat_series_dtype_timestamp) - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): -single_nat_dtype_datetime + nat_series_dtype_timestamp # addition @@ -1718,15 +1727,16 @@ def test_datetime64_ops_nat(self): @pytest.mark.parametrize('one', [1, 1.0, np.array(1)]) def test_dt64_mul_div_numeric_invalid(self, one, dt64_series): # multiplication - with pytest.raises(TypeError): + msg = 'cannot perform .* with this index type' + with pytest.raises(TypeError, match=msg): dt64_series * one - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): one * dt64_series # division - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): dt64_series / one - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): one / dt64_series @pytest.mark.parametrize('op', ['__add__', '__radd__', @@ -1740,13 +1750,17 @@ def test_dt64_series_add_intlike(self, tz, op): other = Series([20, 30, 40], dtype='uint8') method = getattr(ser, op) - with pytest.raises(TypeError): + msg = '|'.join(['incompatible type for a .* operation', + 'cannot evaluate a numeric op', + 'ufunc .* cannot use operands', + 'cannot (add|subtract)']) + with pytest.raises(TypeError, match=msg): method(1) - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): method(other) - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): method(other.values) - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): method(pd.Index(other)) # ------------------------------------------------------------- @@ -1783,13 +1797,14 @@ def test_operators_datetimelike_with_timezones(self): result = dt1 - td1[0] exp = (dt1.dt.tz_localize(None) - td1[0]).dt.tz_localize(tz) tm.assert_series_equal(result, exp) - with pytest.raises(TypeError): + msg = "bad operand type for unary" + with pytest.raises(TypeError, match=msg): td1[0] - dt1 result = dt2 - td2[0] exp = (dt2.dt.tz_localize(None) - td2[0]).dt.tz_localize(tz) tm.assert_series_equal(result, exp) - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): td2[0] - dt2 result = dt1 + td1 @@ -1807,10 +1822,10 @@ def test_operators_datetimelike_with_timezones(self): result = dt2 - td2 exp = (dt2.dt.tz_localize(None) - td2).dt.tz_localize(tz) tm.assert_series_equal(result, exp) - - with pytest.raises(TypeError): + msg = 'cannot (add|subtract)' + with pytest.raises(TypeError, match=msg): td1 - dt1 - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): td2 - dt2 @@ -1909,13 +1924,15 @@ def test_dti_add_intarray_no_freq(self, int_holder): # GH#19959 dti = pd.DatetimeIndex(['2016-01-01', 'NaT', '2017-04-05 06:07:08']) other = int_holder([9, 4, -1]) - with pytest.raises(NullFrequencyError): + nfmsg = 'Cannot shift with no freq' + tmsg = 'cannot subtract DatetimeArray from' + with pytest.raises(NullFrequencyError, match=nfmsg): dti + other - with pytest.raises(NullFrequencyError): + with pytest.raises(NullFrequencyError, match=nfmsg): other + dti - with pytest.raises(NullFrequencyError): + with pytest.raises(NullFrequencyError, match=nfmsg): dti - other - with pytest.raises(TypeError): + with pytest.raises(TypeError, 
match=tmsg): other - dti # ------------------------------------------------------------- @@ -2057,14 +2074,14 @@ def test_sub_dti_dti(self): result = dti_tz - dti_tz tm.assert_index_equal(result, expected) - - with pytest.raises(TypeError): + msg = 'DatetimeArray subtraction must have the same timezones or' + with pytest.raises(TypeError, match=msg): dti_tz - dti - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): dti - dti_tz - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): dti_tz - dti_tz2 # isub @@ -2074,7 +2091,8 @@ def test_sub_dti_dti(self): # different length raises ValueError dti1 = date_range('20130101', periods=3) dti2 = date_range('20130101', periods=4) - with pytest.raises(ValueError): + msg = 'cannot add indices of unequal length' + with pytest.raises(ValueError, match=msg): dti1 - dti2 # NaN propagation @@ -2148,8 +2166,8 @@ def test_ops_nat_mixed_datetime64_timedelta64(self): tm.assert_series_equal(-single_nat_dtype_timedelta + nat_series_dtype_timestamp, nat_series_dtype_timestamp) - - with pytest.raises(TypeError): + msg = 'cannot subtract a datelike' + with pytest.raises(TypeError, match=msg): timedelta_series - single_nat_dtype_datetime # addition diff --git a/pandas/tests/arrays/test_array.py b/pandas/tests/arrays/test_array.py index 4a51fd63d963b..9fea1989e46df 100644 --- a/pandas/tests/arrays/test_array.py +++ b/pandas/tests/arrays/test_array.py @@ -74,7 +74,7 @@ # Interval ([pd.Interval(1, 2), pd.Interval(3, 4)], 'interval', - pd.IntervalArray.from_tuples([(1, 2), (3, 4)])), + pd.arrays.IntervalArray.from_tuples([(1, 2), (3, 4)])), # Sparse ([0, 1], 'Sparse[int64]', pd.SparseArray([0, 1], dtype='int64')), @@ -129,7 +129,7 @@ def test_array_copy(): # interval ([pd.Interval(0, 1), pd.Interval(1, 2)], - pd.IntervalArray.from_breaks([0, 1, 2])), + pd.arrays.IntervalArray.from_breaks([0, 1, 2])), # datetime ([pd.Timestamp('2000',), pd.Timestamp('2001')], diff --git a/pandas/tests/arrays/test_integer.py b/pandas/tests/arrays/test_integer.py index 09298bb5cd08d..67e7db5460e6d 100644 --- a/pandas/tests/arrays/test_integer.py +++ b/pandas/tests/arrays/test_integer.py @@ -339,7 +339,7 @@ def _compare_other(self, data, op_name, other): expected = pd.Series(op(data._data, other)) # fill the nan locations - expected[data._mask] = True if op_name == '__ne__' else False + expected[data._mask] = op_name == '__ne__' tm.assert_series_equal(result, expected) @@ -351,7 +351,7 @@ def _compare_other(self, data, op_name, other): expected = op(expected, other) # fill the nan locations - expected[data._mask] = True if op_name == '__ne__' else False + expected[data._mask] = op_name == '__ne__' tm.assert_series_equal(result, expected) diff --git a/pandas/tests/dtypes/test_generic.py b/pandas/tests/dtypes/test_generic.py index 1622088d05f4d..2bb3559d56d61 100644 --- a/pandas/tests/dtypes/test_generic.py +++ b/pandas/tests/dtypes/test_generic.py @@ -1,6 +1,6 @@ # -*- coding: utf-8 -*- -from warnings import catch_warnings, simplefilter +from warnings import catch_warnings import numpy as np @@ -39,9 +39,6 @@ def test_abc_types(self): assert isinstance(pd.Int64Index([1, 2, 3]), gt.ABCIndexClass) assert isinstance(pd.Series([1, 2, 3]), gt.ABCSeries) assert isinstance(self.df, gt.ABCDataFrame) - with catch_warnings(record=True): - simplefilter('ignore', FutureWarning) - assert isinstance(self.df.to_panel(), gt.ABCPanel) assert isinstance(self.sparse_series, gt.ABCSparseSeries) assert isinstance(self.sparse_array, gt.ABCSparseArray) assert 
isinstance(self.sparse_frame, gt.ABCSparseDataFrame) diff --git a/pandas/tests/dtypes/test_inference.py b/pandas/tests/dtypes/test_inference.py index 89662b70a39ad..49a66efaffc11 100644 --- a/pandas/tests/dtypes/test_inference.py +++ b/pandas/tests/dtypes/test_inference.py @@ -159,13 +159,15 @@ def test_is_nested_list_like_fails(obj): @pytest.mark.parametrize( - "ll", [{}, {'A': 1}, Series([1])]) + "ll", [{}, {'A': 1}, Series([1]), collections.defaultdict()]) def test_is_dict_like_passes(ll): assert inference.is_dict_like(ll) -@pytest.mark.parametrize( - "ll", ['1', 1, [1, 2], (1, 2), range(2), Index([1])]) +@pytest.mark.parametrize("ll", [ + '1', 1, [1, 2], (1, 2), range(2), Index([1]), + dict, collections.defaultdict, Series +]) def test_is_dict_like_fails(ll): assert not inference.is_dict_like(ll) diff --git a/pandas/tests/dtypes/test_missing.py b/pandas/tests/dtypes/test_missing.py index d913d2ad299ce..7ca01e13a33a9 100644 --- a/pandas/tests/dtypes/test_missing.py +++ b/pandas/tests/dtypes/test_missing.py @@ -2,7 +2,7 @@ from datetime import datetime from decimal import Decimal -from warnings import catch_warnings, filterwarnings, simplefilter +from warnings import catch_warnings, filterwarnings import numpy as np import pytest @@ -94,15 +94,6 @@ def test_isna_isnull(self, isna_f): expected = df.apply(isna_f) tm.assert_frame_equal(result, expected) - # panel - with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - for p in [tm.makePanel(), tm.makePeriodPanel(), - tm.add_nans(tm.makePanel())]: - result = isna_f(p) - expected = p.apply(isna_f) - tm.assert_panel_equal(result, expected) - def test_isna_lists(self): result = isna([[False]]) exp = np.array([[False]]) diff --git a/pandas/tests/extension/base/groupby.py b/pandas/tests/extension/base/groupby.py index dd406ca0cd5ed..1929dad075695 100644 --- a/pandas/tests/extension/base/groupby.py +++ b/pandas/tests/extension/base/groupby.py @@ -55,19 +55,14 @@ def test_groupby_extension_transform(self, data_for_grouping): self.assert_series_equal(result, expected) - @pytest.mark.parametrize('op', [ - lambda x: 1, - lambda x: [1] * len(x), - lambda x: pd.Series([1] * len(x)), - lambda x: x, - ], ids=['scalar', 'list', 'series', 'object']) - def test_groupby_extension_apply(self, data_for_grouping, op): + def test_groupby_extension_apply( + self, data_for_grouping, groupby_apply_op): df = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3, 1, 4], "B": data_for_grouping}) - df.groupby("B").apply(op) - df.groupby("B").A.apply(op) - df.groupby("A").apply(op) - df.groupby("A").B.apply(op) + df.groupby("B").apply(groupby_apply_op) + df.groupby("B").A.apply(groupby_apply_op) + df.groupby("A").apply(groupby_apply_op) + df.groupby("A").B.apply(groupby_apply_op) def test_in_numeric_groupby(self, data_for_grouping): df = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3, 1, 4], diff --git a/pandas/tests/extension/base/methods.py b/pandas/tests/extension/base/methods.py index f64df7a84b7c0..1852edaa9e748 100644 --- a/pandas/tests/extension/base/methods.py +++ b/pandas/tests/extension/base/methods.py @@ -240,7 +240,6 @@ def test_shift_fill_value(self, data): expected = data.take([2, 3, 0, 0]) self.assert_extension_array_equal(result, expected) - @pytest.mark.parametrize("as_frame", [True, False]) def test_hash_pandas_object_works(self, data, as_frame): # https://github.com/pandas-dev/pandas/issues/23066 data = pd.Series(data) @@ -250,7 +249,6 @@ def test_hash_pandas_object_works(self, data, as_frame): b = pd.util.hash_pandas_object(data) self.assert_equal(a, b) - 
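Review note: the `@pytest.mark.parametrize` decorators being dropped in these extension-test hunks (`as_frame`, `as_series`, `use_numpy`, `method`, ...) reappear as shared fixtures in `pandas/tests/extension/conftest.py` further down, so every test that names the fixture is parametrized consistently. A minimal sketch of the pattern; `test_hash_roundtrip` is a hypothetical test for illustration:

```python
import pandas as pd
import pytest


@pytest.fixture(params=[True, False])
def as_frame(request):
    """Boolean fixture shared across tests, replacing per-test parametrize."""
    return request.param


def test_hash_roundtrip(as_frame):
    obj = pd.Series([1, 2, 3])
    if as_frame:
        obj = obj.to_frame(name='a')
    # hashing is deterministic, so two calls must agree
    assert pd.util.hash_pandas_object(obj).equals(
        pd.util.hash_pandas_object(obj))
```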
@pytest.mark.parametrize("as_series", [True, False]) def test_searchsorted(self, data_for_sorting, as_series): b, c, a = data_for_sorting arr = type(data_for_sorting)._from_sequence([a, b, c]) @@ -275,7 +273,6 @@ def test_searchsorted(self, data_for_sorting, as_series): sorter = np.array([1, 2, 0]) assert data_for_sorting.searchsorted(a, sorter=sorter) == 0 - @pytest.mark.parametrize("as_frame", [True, False]) def test_where_series(self, data, na_value, as_frame): assert data[0] != data[1] cls = type(data) @@ -309,8 +306,6 @@ def test_where_series(self, data, na_value, as_frame): expected = expected.to_frame(name='a') self.assert_equal(result, expected) - @pytest.mark.parametrize("use_numpy", [True, False]) - @pytest.mark.parametrize("as_series", [True, False]) @pytest.mark.parametrize("repeats", [0, 1, 2, [1, 2, 3]]) def test_repeat(self, data, repeats, as_series, use_numpy): arr = type(data)._from_sequence(data[:3], dtype=data.dtype) @@ -327,7 +322,6 @@ def test_repeat(self, data, repeats, as_series, use_numpy): self.assert_equal(result, expected) - @pytest.mark.parametrize("use_numpy", [True, False]) @pytest.mark.parametrize('repeats, kwargs, error, msg', [ (2, dict(axis=1), ValueError, "'axis"), (-1, dict(), ValueError, "negative"), diff --git a/pandas/tests/extension/base/missing.py b/pandas/tests/extension/base/missing.py index 2fe547e50a34b..834f49f0461f0 100644 --- a/pandas/tests/extension/base/missing.py +++ b/pandas/tests/extension/base/missing.py @@ -1,5 +1,4 @@ import numpy as np -import pytest import pandas as pd import pandas.util.testing as tm @@ -89,14 +88,13 @@ def test_fillna_series(self, data_missing): result = ser.fillna(ser) self.assert_series_equal(result, ser) - @pytest.mark.parametrize('method', ['ffill', 'bfill']) - def test_fillna_series_method(self, data_missing, method): + def test_fillna_series_method(self, data_missing, fillna_method): fill_value = data_missing[1] - if method == 'ffill': + if fillna_method == 'ffill': data_missing = data_missing[::-1] - result = pd.Series(data_missing).fillna(method=method) + result = pd.Series(data_missing).fillna(method=fillna_method) expected = pd.Series(data_missing._from_sequence( [fill_value, fill_value], dtype=data_missing.dtype)) diff --git a/pandas/tests/extension/base/setitem.py b/pandas/tests/extension/base/setitem.py index 42fda982f7339..db6328e39e6cc 100644 --- a/pandas/tests/extension/base/setitem.py +++ b/pandas/tests/extension/base/setitem.py @@ -24,7 +24,6 @@ def test_setitem_sequence(self, data, box_in_series): assert data[0] == original[1] assert data[1] == original[0] - @pytest.mark.parametrize('as_array', [True, False]) def test_setitem_sequence_mismatched_length_raises(self, data, as_array): ser = pd.Series(data) original = ser.copy() diff --git a/pandas/tests/extension/conftest.py b/pandas/tests/extension/conftest.py index 5349dd919f2a2..3cc2d313b09f5 100644 --- a/pandas/tests/extension/conftest.py +++ b/pandas/tests/extension/conftest.py @@ -2,6 +2,8 @@ import pytest +from pandas import Series + @pytest.fixture def dtype(): @@ -108,3 +110,58 @@ def data_for_grouping(): def box_in_series(request): """Whether to box the data in a Series""" return request.param + + +@pytest.fixture(params=[ + lambda x: 1, + lambda x: [1] * len(x), + lambda x: Series([1] * len(x)), + lambda x: x, +], ids=['scalar', 'list', 'series', 'object']) +def groupby_apply_op(request): + """ + Functions to test groupby.apply(). 
+ """ + return request.param + + +@pytest.fixture(params=[True, False]) +def as_frame(request): + """ + Boolean fixture to support Series and Series.to_frame() comparison testing. + """ + return request.param + + +@pytest.fixture(params=[True, False]) +def as_series(request): + """ + Boolean fixture to support arr and Series(arr) comparison testing. + """ + return request.param + + +@pytest.fixture(params=[True, False]) +def use_numpy(request): + """ + Boolean fixture to support comparison testing of ExtensionDtype array + and numpy array. + """ + return request.param + + +@pytest.fixture(params=['ffill', 'bfill']) +def fillna_method(request): + """ + Parametrized fixture giving method parameters 'ffill' and 'bfill' for + Series.fillna(method=) testing. + """ + return request.param + + +@pytest.fixture(params=[True, False]) +def as_array(request): + """ + Boolean fixture to support ExtensionDtype _from_sequence method testing. + """ + return request.param diff --git a/pandas/tests/extension/test_numpy.py b/pandas/tests/extension/test_numpy.py index 7ca6882c7441b..41f5beb8c885d 100644 --- a/pandas/tests/extension/test_numpy.py +++ b/pandas/tests/extension/test_numpy.py @@ -1,6 +1,8 @@ import numpy as np import pytest +from pandas.compat.numpy import _np_version_under1p16 + import pandas as pd from pandas import compat from pandas.core.arrays.numpy_ import PandasArray, PandasDtype @@ -9,9 +11,9 @@ from . import base -@pytest.fixture -def dtype(): - return PandasDtype(np.dtype('float')) +@pytest.fixture(params=['float', 'object']) +def dtype(request): + return PandasDtype(np.dtype(request.param)) @pytest.fixture @@ -38,11 +40,19 @@ def allow_in_pandas(monkeypatch): @pytest.fixture def data(allow_in_pandas, dtype): + if dtype.numpy_dtype == 'object': + return pd.Series([(i,) for i in range(100)]).array return PandasArray(np.arange(1, 101, dtype=dtype._dtype)) @pytest.fixture -def data_missing(allow_in_pandas): +def data_missing(allow_in_pandas, dtype): + # For NumPy <1.16, np.array([np.nan, (1,)]) raises + # ValueError: setting an array element with a sequence. + if dtype.numpy_dtype == 'object': + if _np_version_under1p16: + raise pytest.skip("Skipping for NumPy <1.16") + return PandasArray(np.array([np.nan, (1,)])) return PandasArray(np.array([np.nan, 1.0])) @@ -59,49 +69,84 @@ def cmp(a, b): @pytest.fixture -def data_for_sorting(allow_in_pandas): +def data_for_sorting(allow_in_pandas, dtype): """Length-3 array with a known sort order. This should be three items [B, C, A] with A < B < C """ + if dtype.numpy_dtype == 'object': + # Use an empty tuple for first element, then remove, + # to disable np.array's shape inference. + return PandasArray( + np.array([(), (2,), (3,), (1,)])[1:] + ) return PandasArray( np.array([1, 2, 0]) ) @pytest.fixture -def data_missing_for_sorting(allow_in_pandas): +def data_missing_for_sorting(allow_in_pandas, dtype): """Length-3 array with a known sort order. This should be three items [B, NA, A] with A < B and NA missing. """ + if dtype.numpy_dtype == 'object': + return PandasArray( + np.array([(1,), np.nan, (0,)]) + ) return PandasArray( np.array([1, np.nan, 0]) ) @pytest.fixture -def data_for_grouping(allow_in_pandas): +def data_for_grouping(allow_in_pandas, dtype): """Data for factorization, grouping, and unique tests. 
Expected to be like [B, B, NA, NA, A, A, B, C] Where A < B < C and NA is missing """ - a, b, c = np.arange(3) + if dtype.numpy_dtype == 'object': + a, b, c = (1,), (2,), (3,) + else: + a, b, c = np.arange(3) return PandasArray(np.array( [b, b, np.nan, np.nan, a, a, b, c] )) +@pytest.fixture +def skip_numpy_object(dtype): + """ + Tests for PandasArray with nested data. Users typically won't create + these objects via `pd.array`, but they can show up through `.array` + on a Series with nested data. Many of the base tests fail, as they aren't + appropriate for nested data. + + This fixture allows these tests to be skipped when used as a usefixtures + marker to either an individual test or a test class. + """ + if dtype == 'object': + raise pytest.skip("Skipping for object dtype.") + + +skip_nested = pytest.mark.usefixtures('skip_numpy_object') + + class BaseNumPyTests(object): pass class TestCasting(BaseNumPyTests, base.BaseCastingTests): - pass + + @skip_nested + def test_astype_str(self, data): + # ValueError: setting an array element with a sequence + super(TestCasting, self).test_astype_str(data) class TestConstructors(BaseNumPyTests, base.BaseConstructorsTests): @@ -110,6 +155,11 @@ class TestConstructors(BaseNumPyTests, base.BaseConstructorsTests): def test_from_dtype(self, data): pass + @skip_nested + def test_array_from_scalars(self, data): + # ValueError: PandasArray must be 1-dimensional. + super(TestConstructors, self).test_array_from_scalars(data) + class TestDtype(BaseNumPyTests, base.BaseDtypeTests): @@ -120,15 +170,32 @@ def test_check_dtype(self, data): class TestGetitem(BaseNumPyTests, base.BaseGetitemTests): - pass + + @skip_nested + def test_getitem_scalar(self, data): + # AssertionError + super(TestGetitem, self).test_getitem_scalar(data) + + @skip_nested + def test_take_series(self, data): + # ValueError: PandasArray must be 1-dimensional. + super(TestGetitem, self).test_take_series(data) class TestGroupby(BaseNumPyTests, base.BaseGroupbyTests): - pass + @skip_nested + def test_groupby_extension_apply( + self, data_for_grouping, groupby_apply_op): + # ValueError: Names should be list-like for a MultiIndex + super(TestGroupby, self).test_groupby_extension_apply( + data_for_grouping, groupby_apply_op) class TestInterface(BaseNumPyTests, base.BaseInterfaceTests): - pass + @skip_nested + def test_array_interface(self, data): + # NumPy array shape inference + super(TestInterface, self).test_array_interface(data) class TestMethods(BaseNumPyTests, base.BaseMethodsTests): @@ -143,7 +210,57 @@ def test_value_counts(self, all_data, dropna): def test_combine_le(self, data_repeated): super(TestMethods, self).test_combine_le(data_repeated) - + @skip_nested + def test_combine_add(self, data_repeated): + # Not numeric + super(TestMethods, self).test_combine_add(data_repeated) + + @skip_nested + def test_shift_fill_value(self, data): + # np.array shape inference. Shift implementation fails. + super(TestMethods, self).test_shift_fill_value(data) + + @skip_nested + @pytest.mark.parametrize('box', [pd.Series, lambda x: x]) + @pytest.mark.parametrize('method', [lambda x: x.unique(), pd.unique]) + def test_unique(self, data, box, method): + # Fails creating expected + super(TestMethods, self).test_unique(data, box, method) + + @skip_nested + def test_fillna_copy_frame(self, data_missing): + # The "scalar" for this array isn't a scalar. 
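Review note on the `skip_nested` skips: the new object-dtype parametrization feeds `PandasArray` elements that are tuples, so the "scalar" of the array is itself a sequence and scalar-oriented base tests (fillna with a value, shift fill values, etc.) cannot apply. A quick illustration of how such nested data arises, per the fixture comments above (behavior as of pandas 0.24/0.25):

```python
import pandas as pd

# A Series of tuples has object dtype; .array wraps it in a PandasArray.
s = pd.Series([(1,), (2, 3)])
arr = s.array
print(type(arr).__name__)  # PandasArray
# Each element is a tuple, not a scalar, so e.g. arr.fillna(value) has
# no sensible scalar fill -- hence the skip_nested usefixtures marker.
```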
+ super(TestMethods, self).test_fillna_copy_frame(data_missing) + + @skip_nested + def test_fillna_copy_series(self, data_missing): + # The "scalar" for this array isn't a scalar. + super(TestMethods, self).test_fillna_copy_series(data_missing) + + @skip_nested + def test_hash_pandas_object_works(self, data, as_frame): + # ndarray of tuples not hashable + super(TestMethods, self).test_hash_pandas_object_works(data, as_frame) + + @skip_nested + def test_searchsorted(self, data_for_sorting, as_series): + # Test setup fails. + super(TestMethods, self).test_searchsorted(data_for_sorting, as_series) + + @skip_nested + def test_where_series(self, data, na_value, as_frame): + # Test setup fails. + super(TestMethods, self).test_where_series(data, na_value, as_frame) + + @skip_nested + @pytest.mark.parametrize("repeats", [0, 1, 2, [1, 2, 3]]) + def test_repeat(self, data, repeats, as_series, use_numpy): + # Fails creating expected + super(TestMethods, self).test_repeat( + data, repeats, as_series, use_numpy) + + +@skip_nested class TestArithmetics(BaseNumPyTests, base.BaseArithmeticOpsTests): divmod_exc = None series_scalar_exc = None @@ -183,6 +300,7 @@ class TestPrinting(BaseNumPyTests, base.BasePrintingTests): pass +@skip_nested class TestNumericReduce(BaseNumPyTests, base.BaseNumericReduceTests): def check_reduce(self, s, op_name, skipna): @@ -192,12 +310,33 @@ def check_reduce(self, s, op_name, skipna): tm.assert_almost_equal(result, expected) +@skip_nested class TestBooleanReduce(BaseNumPyTests, base.BaseBooleanReduceTests): pass -class TestMising(BaseNumPyTests, base.BaseMissingTests): - pass +class TestMissing(BaseNumPyTests, base.BaseMissingTests): + + @skip_nested + def test_fillna_scalar(self, data_missing): + # Non-scalar "scalar" values. + super(TestMissing, self).test_fillna_scalar(data_missing) + + @skip_nested + def test_fillna_series_method(self, data_missing, fillna_method): + # Non-scalar "scalar" values. + super(TestMissing, self).test_fillna_series_method( + data_missing, fillna_method) + + @skip_nested + def test_fillna_series(self, data_missing): + # Non-scalar "scalar" values. + super(TestMissing, self).test_fillna_series(data_missing) + + @skip_nested + def test_fillna_frame(self, data_missing): + # Non-scalar "scalar" values. 
+ super(TestMissing, self).test_fillna_frame(data_missing) class TestReshaping(BaseNumPyTests, base.BaseReshapingTests): @@ -207,10 +346,85 @@ class TestReshaping(BaseNumPyTests, base.BaseReshapingTests): def test_concat_mixed_dtypes(self, data): super(TestReshaping, self).test_concat_mixed_dtypes(data) + @skip_nested + def test_merge(self, data, na_value): + # Fails creating expected + super(TestReshaping, self).test_merge(data, na_value) -class TestSetitem(BaseNumPyTests, base.BaseSetitemTests): - pass + @skip_nested + def test_merge_on_extension_array(self, data): + # Fails creating expected + super(TestReshaping, self).test_merge_on_extension_array(data) + @skip_nested + def test_merge_on_extension_array_duplicates(self, data): + # Fails creating expected + super(TestReshaping, self).test_merge_on_extension_array_duplicates( + data) + + +class TestSetitem(BaseNumPyTests, base.BaseSetitemTests): + @skip_nested + def test_setitem_scalar_series(self, data, box_in_series): + # AssertionError + super(TestSetitem, self).test_setitem_scalar_series( + data, box_in_series) + + @skip_nested + def test_setitem_sequence(self, data, box_in_series): + # ValueError: shape mismatch: value array of shape (2,1) could not + # be broadcast to indexing result of shape (2,) + super(TestSetitem, self).test_setitem_sequence(data, box_in_series) + + @skip_nested + def test_setitem_sequence_mismatched_length_raises(self, data, as_array): + # ValueError: PandasArray must be 1-dimensional. + (super(TestSetitem, self). + test_setitem_sequence_mismatched_length_raises(data, as_array)) + + @skip_nested + def test_setitem_sequence_broadcasts(self, data, box_in_series): + # ValueError: cannot set using a list-like indexer with a different + # length than the value + super(TestSetitem, self).test_setitem_sequence_broadcasts( + data, box_in_series) + + @skip_nested + def test_setitem_loc_scalar_mixed(self, data): + # AssertionError + super(TestSetitem, self).test_setitem_loc_scalar_mixed(data) + + @skip_nested + def test_setitem_loc_scalar_multiple_homogoneous(self, data): + # AssertionError + super(TestSetitem, self).test_setitem_loc_scalar_multiple_homogoneous( + data) + + @skip_nested + def test_setitem_iloc_scalar_mixed(self, data): + # AssertionError + super(TestSetitem, self).test_setitem_iloc_scalar_mixed(data) + + @skip_nested + def test_setitem_iloc_scalar_multiple_homogoneous(self, data): + # AssertionError + super(TestSetitem, self).test_setitem_iloc_scalar_multiple_homogoneous( + data) + + @skip_nested + @pytest.mark.parametrize('setter', ['loc', None]) + def test_setitem_mask_broadcast(self, data, setter): + # ValueError: cannot set using a list-like indexer with a different + # length than the value + super(TestSetitem, self).test_setitem_mask_broadcast(data, setter) + + @skip_nested + def test_setitem_scalar_key_sequence_raise(self, data): + # Failed: DID NOT RAISE + super(TestSetitem, self).test_setitem_scalar_key_sequence_raise(data) + + +@skip_nested class TestParsing(BaseNumPyTests, base.BaseParsingTests): pass diff --git a/pandas/tests/extension/test_sparse.py b/pandas/tests/extension/test_sparse.py index 21dbf9524961c..146dea2b65d83 100644 --- a/pandas/tests/extension/test_sparse.py +++ b/pandas/tests/extension/test_sparse.py @@ -287,11 +287,10 @@ def test_combine_first(self, data): pytest.skip("TODO(SparseArray.__setitem__ will preserve dtype.") super(TestMethods, self).test_combine_first(data) - @pytest.mark.parametrize("as_series", [True, False]) def test_searchsorted(self, data_for_sorting, 
as_series):
        with tm.assert_produces_warning(PerformanceWarning):
            super(TestMethods, self).test_searchsorted(data_for_sorting,
-                                                      as_series=as_series)
+                                                      as_series)


 class TestCasting(BaseSparseTests, base.BaseCastingTests):
diff --git a/pandas/tests/frame/test_alter_axes.py b/pandas/tests/frame/test_alter_axes.py
index c2355742199dc..cc3687f856b4e 100644
--- a/pandas/tests/frame/test_alter_axes.py
+++ b/pandas/tests/frame/test_alter_axes.py
@@ -253,23 +253,129 @@ def test_set_index_raise_keys(self, frame_of_index_cols, drop, append):
             df.set_index(['A', df['A'], tuple(df['A'])],
                          drop=drop, append=append)

+    @pytest.mark.xfail(reason='broken due to revert, see GH 25085')
     @pytest.mark.parametrize('append', [True, False])
     @pytest.mark.parametrize('drop', [True, False])
-    @pytest.mark.parametrize('box', [set, iter])
+    @pytest.mark.parametrize('box', [set, iter, lambda x: (y for y in x)],
+                             ids=['set', 'iter', 'generator'])
     def test_set_index_raise_on_type(self, frame_of_index_cols, box,
                                      drop, append):
         df = frame_of_index_cols

         msg = 'The parameter "keys" may be a column key, .*'

-        # forbidden type, e.g. set/tuple/iter
-        with pytest.raises(ValueError, match=msg):
+        # forbidden type, e.g. set/iter/generator
+        with pytest.raises(TypeError, match=msg):
             df.set_index(box(df['A']), drop=drop, append=append)

-        # forbidden type in list, e.g. set/tuple/iter
-        with pytest.raises(ValueError, match=msg):
+        # forbidden type in list, e.g. set/iter/generator
+        with pytest.raises(TypeError, match=msg):
             df.set_index(['A', df['A'], box(df['A'])],
                          drop=drop, append=append)

+    def test_set_index_custom_label_type(self):
+        # GH 24969
+
+        class Thing(object):
+            def __init__(self, name, color):
+                self.name = name
+                self.color = color
+
+            def __str__(self):
+                return "<Thing %r>" % (self.name,)
+
+            # necessary for pretty KeyError
+            __repr__ = __str__
+
+        thing1 = Thing('One', 'red')
+        thing2 = Thing('Two', 'blue')
+        df = DataFrame({thing1: [0, 1], thing2: [2, 3]})
+        expected = DataFrame({thing1: [0, 1]},
+                             index=Index([2, 3], name=thing2))
+
+        # use custom label directly
+        result = df.set_index(thing2)
+        tm.assert_frame_equal(result, expected)
+
+        # custom label wrapped in list
+        result = df.set_index([thing2])
+        tm.assert_frame_equal(result, expected)
+
+        # missing key
+        thing3 = Thing('Three', 'pink')
+        msg = "<Thing 'Three'>"
+        with pytest.raises(KeyError, match=msg):
+            # missing label directly
+            df.set_index(thing3)
+
+        with pytest.raises(KeyError, match=msg):
+            # missing label in list
+            df.set_index([thing3])
+
+    def test_set_index_custom_label_hashable_iterable(self):
+        # GH 24969
+
+        # actual example discussed in GH 24984 was e.g. for shapely.geometry
+        # objects (e.g. a collection of Points) that can be both hashable and
+        # iterable; using frozenset as a stand-in for testing here
+
+        class Thing(frozenset):
+            # need to stabilize repr for KeyError (due to random order in sets)
+            def __repr__(self):
+                tmp = sorted(list(self))
+                # double curly brace prints one brace in format string
+                return "frozenset({{{}}})".format(', '.join(map(repr, tmp)))
+
+        thing1 = Thing(['One', 'red'])
+        thing2 = Thing(['Two', 'blue'])
+        df = DataFrame({thing1: [0, 1], thing2: [2, 3]})
+        expected = DataFrame({thing1: [0, 1]},
+                             index=Index([2, 3], name=thing2))
+
+        # use custom label directly
+        result = df.set_index(thing2)
+        tm.assert_frame_equal(result, expected)
+
+        # custom label wrapped in list
+        result = df.set_index([thing2])
+        tm.assert_frame_equal(result, expected)
+
+        # missing key
+        thing3 = Thing(['Three', 'pink'])
+        msg = '.*'  # due to revert, see GH 25085
+        with pytest.raises(KeyError, match=msg):
+            # missing label directly
+            df.set_index(thing3)
+
+        with pytest.raises(KeyError, match=msg):
+            # missing label in list
+            df.set_index([thing3])
+
+    def test_set_index_custom_label_type_raises(self):
+        # GH 24969
+
+        # purposefully inherit from something unhashable
+        class Thing(set):
+            def __init__(self, name, color):
+                self.name = name
+                self.color = color
+
+            def __str__(self):
+                return "<Thing %r>" % (self.name,)
+
+        thing1 = Thing('One', 'red')
+        thing2 = Thing('Two', 'blue')
+        df = DataFrame([[0, 2], [1, 3]], columns=[thing1, thing2])
+
+        msg = 'unhashable type.*'
+
+        with pytest.raises(TypeError, match=msg):
+            # use custom label directly
+            df.set_index(thing2)
+
+        with pytest.raises(TypeError, match=msg):
+            # custom label wrapped in list
+            df.set_index([thing2])
+
     def test_construction_with_categorical_index(self):
         ci = tm.makeCategoricalIndex(10)
         ci.name = 'B'
@@ -600,6 +706,26 @@ def test_rename_axis_mapper(self):
         with pytest.raises(TypeError, match='bogus'):
             df.rename_axis(bogus=None)

+    @pytest.mark.parametrize('kwargs, rename_index, rename_columns', [
+        ({'mapper': None, 'axis': 0}, True, False),
+        ({'mapper': None, 'axis': 1}, False, True),
+        ({'index': None}, True, False),
+        ({'columns': None}, False, True),
+        ({'index': None, 'columns': None}, True, True),
+        ({}, False, False)])
+    def test_rename_axis_none(self, kwargs, rename_index, rename_columns):
+        # GH 25034
+        index = Index(list('abc'), name='foo')
+        columns = Index(['col1', 'col2'], name='bar')
+        data = np.arange(6).reshape(3, 2)
+        df = DataFrame(data, index, columns)
+
+        result = df.rename_axis(**kwargs)
+        expected_index = index.rename(None) if rename_index else index
+        expected_columns = columns.rename(None) if rename_columns else columns
+        expected = DataFrame(data, expected_index, expected_columns)
+        tm.assert_frame_equal(result, expected)
+
     def test_rename_multiindex(self):
         tuples_index = [('foo1', 'bar1'), ('foo2', 'bar2')]
diff --git a/pandas/tests/frame/test_analytics.py b/pandas/tests/frame/test_analytics.py
index f2c3f50c291c3..2e690ebbfa121 100644
--- a/pandas/tests/frame/test_analytics.py
+++ b/pandas/tests/frame/test_analytics.py
@@ -231,9 +231,9 @@ def assert_bool_op_api(opname, bool_frame_with_na, float_string_frame,
             getattr(bool_frame_with_na, opname)(axis=1, bool_only=False)


-class TestDataFrameAnalytics():
+class TestDataFrameAnalytics(object):

-    # ---------------------------------------------------------------------=
+    # ---------------------------------------------------------------------
     # Correlation and covariance

     @td.skip_if_no_scipy
@@ -502,6 +502,9 @@ def test_corrwith_kendall(self):
expected = Series(np.ones(len(result))) tm.assert_series_equal(result, expected) + # --------------------------------------------------------------------- + # Describe + def test_bool_describe_in_mixed_frame(self): df = DataFrame({ 'string_data': ['a', 'b', 'c', 'd', 'e'], @@ -693,82 +696,113 @@ def test_describe_tz_values(self, tz_naive_fixture): result = df.describe(include='all') tm.assert_frame_equal(result, expected) - def test_reduce_mixed_frame(self): - # GH 6806 - df = DataFrame({ - 'bool_data': [True, True, False, False, False], - 'int_data': [10, 20, 30, 40, 50], - 'string_data': ['a', 'b', 'c', 'd', 'e'], - }) - df.reindex(columns=['bool_data', 'int_data', 'string_data']) - test = df.sum(axis=0) - tm.assert_numpy_array_equal(test.values, - np.array([2, 150, 'abcde'], dtype=object)) - tm.assert_series_equal(test, df.T.sum(axis=1)) + # --------------------------------------------------------------------- + # Reductions - def test_count(self, float_frame_with_na, float_frame, float_string_frame): - f = lambda s: notna(s).sum() - assert_stat_op_calc('count', f, float_frame_with_na, has_skipna=False, - check_dtype=False, check_dates=True) + def test_stat_op_api(self, float_frame, float_string_frame): assert_stat_op_api('count', float_frame, float_string_frame, has_numeric_only=True) + assert_stat_op_api('sum', float_frame, float_string_frame, + has_numeric_only=True) - # corner case - frame = DataFrame() - ct1 = frame.count(1) - assert isinstance(ct1, Series) + assert_stat_op_api('nunique', float_frame, float_string_frame) + assert_stat_op_api('mean', float_frame, float_string_frame) + assert_stat_op_api('product', float_frame, float_string_frame) + assert_stat_op_api('median', float_frame, float_string_frame) + assert_stat_op_api('min', float_frame, float_string_frame) + assert_stat_op_api('max', float_frame, float_string_frame) + assert_stat_op_api('mad', float_frame, float_string_frame) + assert_stat_op_api('var', float_frame, float_string_frame) + assert_stat_op_api('std', float_frame, float_string_frame) + assert_stat_op_api('sem', float_frame, float_string_frame) + assert_stat_op_api('median', float_frame, float_string_frame) - ct2 = frame.count(0) - assert isinstance(ct2, Series) + try: + from scipy.stats import skew, kurtosis # noqa:F401 + assert_stat_op_api('skew', float_frame, float_string_frame) + assert_stat_op_api('kurt', float_frame, float_string_frame) + except ImportError: + pass - # GH 423 - df = DataFrame(index=lrange(10)) - result = df.count(1) - expected = Series(0, index=df.index) - tm.assert_series_equal(result, expected) + def test_stat_op_calc(self, float_frame_with_na, mixed_float_frame): - df = DataFrame(columns=lrange(10)) - result = df.count(0) - expected = Series(0, index=df.columns) - tm.assert_series_equal(result, expected) + def count(s): + return notna(s).sum() - df = DataFrame() - result = df.count() - expected = Series(0, index=[]) - tm.assert_series_equal(result, expected) + def nunique(s): + return len(algorithms.unique1d(s.dropna())) - def test_nunique(self, float_frame_with_na, float_frame, - float_string_frame): - f = lambda s: len(algorithms.unique1d(s.dropna())) - assert_stat_op_calc('nunique', f, float_frame_with_na, + def mad(x): + return np.abs(x - x.mean()).mean() + + def var(x): + return np.var(x, ddof=1) + + def std(x): + return np.std(x, ddof=1) + + def sem(x): + return np.std(x, ddof=1) / np.sqrt(len(x)) + + def skewness(x): + from scipy.stats import skew # noqa:F811 + if len(x) < 3: + return np.nan + return skew(x, bias=False) + + 
def kurt(x):
+            from scipy.stats import kurtosis  # noqa:F811
+            if len(x) < 4:
+                return np.nan
+            return kurtosis(x, bias=False)
+
+        assert_stat_op_calc('nunique', nunique, float_frame_with_na,
                             has_skipna=False, check_dtype=False,
                             check_dates=True)
-        assert_stat_op_api('nunique', float_frame, float_string_frame)
-
-        df = DataFrame({'A': [1, 1, 1],
-                        'B': [1, 2, 3],
-                        'C': [1, np.nan, 3]})
-        tm.assert_series_equal(df.nunique(), Series({'A': 1, 'B': 3, 'C': 2}))
-        tm.assert_series_equal(df.nunique(dropna=False),
-                               Series({'A': 1, 'B': 3, 'C': 3}))
-        tm.assert_series_equal(df.nunique(axis=1), Series({0: 1, 1: 2, 2: 2}))
-        tm.assert_series_equal(df.nunique(axis=1, dropna=False),
-                               Series({0: 1, 1: 3, 2: 2}))
-
-    def test_sum(self, float_frame_with_na, mixed_float_frame,
-                 float_frame, float_string_frame):
-        assert_stat_op_api('sum', float_frame, float_string_frame,
-                           has_numeric_only=True)
-        assert_stat_op_calc('sum', np.sum, float_frame_with_na,
-                            skipna_alternative=np.nansum)

         # mixed types (with upcasting happening)
         assert_stat_op_calc('sum', np.sum,
                             mixed_float_frame.astype('float32'),
                             check_dtype=False, check_less_precise=True)

+        assert_stat_op_calc('sum', np.sum, float_frame_with_na,
+                            skipna_alternative=np.nansum)
+        assert_stat_op_calc('mean', np.mean, float_frame_with_na,
+                            check_dates=True)
+        assert_stat_op_calc('product', np.prod, float_frame_with_na)
+
+        assert_stat_op_calc('mad', mad, float_frame_with_na)
+        assert_stat_op_calc('var', var, float_frame_with_na)
+        assert_stat_op_calc('std', std, float_frame_with_na)
+        assert_stat_op_calc('sem', sem, float_frame_with_na)
+
+        assert_stat_op_calc('count', count, float_frame_with_na,
+                            has_skipna=False, check_dtype=False,
+                            check_dates=True)
+
+        try:
+            from scipy.stats import skew, kurtosis  # noqa:F401
+            assert_stat_op_calc('skew', skewness, float_frame_with_na)
+            assert_stat_op_calc('kurt', kurt, float_frame_with_na)
+        except ImportError:
+            pass
+
+    # TODO: Ensure warning isn't emitted in the first place
+    @pytest.mark.filterwarnings("ignore:All-NaN:RuntimeWarning")
+    def test_median(self, float_frame_with_na, int_frame):
+        def wrapper(x):
+            if isna(x).any():
+                return np.nan
+            return np.median(x)
+
+        assert_stat_op_calc('median', wrapper, float_frame_with_na,
+                            check_dates=True)
+        assert_stat_op_calc('median', wrapper, int_frame, check_dtype=False,
+                            check_dates=True)
+
     @pytest.mark.parametrize('method', ['sum', 'mean', 'prod', 'var',
                                         'std', 'skew', 'min', 'max'])
     def test_stat_operators_attempt_obj_array(self, method):
-        # GH 676
+        # GH#676
         data = {
             'a': [-0.00049987540199591344, -0.0016467257772919831,
                   0.00067695870775883013],
@@ -789,108 +823,65 @@ def test_stat_operators_attempt_obj_array(self, method):
         if method in ['sum', 'prod']:
             tm.assert_series_equal(result, expected)

-    def test_mean(self, float_frame_with_na, float_frame, float_string_frame):
-        assert_stat_op_calc('mean', np.mean, float_frame_with_na,
-                            check_dates=True)
-        assert_stat_op_api('mean', float_frame, float_string_frame)
-
-    def test_product(self, float_frame_with_na, float_frame,
-                     float_string_frame):
-        assert_stat_op_calc('product', np.prod, float_frame_with_na)
-        assert_stat_op_api('product', float_frame, float_string_frame)
-
-    # TODO: Ensure warning isn't emitted in the first place
-    @pytest.mark.filterwarnings("ignore:All-NaN:RuntimeWarning")
-    def test_median(self, float_frame_with_na, float_frame,
-                    float_string_frame):
-        def wrapper(x):
-            if isna(x).any():
-                return np.nan
-            return np.median(x)
-
-        assert_stat_op_calc('median', wrapper, float_frame_with_na,
-
check_dates=True) - assert_stat_op_api('median', float_frame, float_string_frame) - - def test_min(self, float_frame_with_na, int_frame, - float_frame, float_string_frame): - with warnings.catch_warnings(record=True): - warnings.simplefilter("ignore", RuntimeWarning) - assert_stat_op_calc('min', np.min, float_frame_with_na, - check_dates=True) - assert_stat_op_calc('min', np.min, int_frame) - assert_stat_op_api('min', float_frame, float_string_frame) - - def test_cummin(self, datetime_frame): - datetime_frame.loc[5:10, 0] = np.nan - datetime_frame.loc[10:15, 1] = np.nan - datetime_frame.loc[15:, 2] = np.nan - - # axis = 0 - cummin = datetime_frame.cummin() - expected = datetime_frame.apply(Series.cummin) - tm.assert_frame_equal(cummin, expected) - - # axis = 1 - cummin = datetime_frame.cummin(axis=1) - expected = datetime_frame.apply(Series.cummin, axis=1) - tm.assert_frame_equal(cummin, expected) - - # it works - df = DataFrame({'A': np.arange(20)}, index=np.arange(20)) - result = df.cummin() # noqa - - # fix issue - cummin_xs = datetime_frame.cummin(axis=1) - assert np.shape(cummin_xs) == np.shape(datetime_frame) - - def test_cummax(self, datetime_frame): - datetime_frame.loc[5:10, 0] = np.nan - datetime_frame.loc[10:15, 1] = np.nan - datetime_frame.loc[15:, 2] = np.nan - - # axis = 0 - cummax = datetime_frame.cummax() - expected = datetime_frame.apply(Series.cummax) - tm.assert_frame_equal(cummax, expected) - - # axis = 1 - cummax = datetime_frame.cummax(axis=1) - expected = datetime_frame.apply(Series.cummax, axis=1) - tm.assert_frame_equal(cummax, expected) + @pytest.mark.parametrize('op', ['mean', 'std', 'var', + 'skew', 'kurt', 'sem']) + def test_mixed_ops(self, op): + # GH#16116 + df = DataFrame({'int': [1, 2, 3, 4], + 'float': [1., 2., 3., 4.], + 'str': ['a', 'b', 'c', 'd']}) - # it works - df = DataFrame({'A': np.arange(20)}, index=np.arange(20)) - result = df.cummax() # noqa + result = getattr(df, op)() + assert len(result) == 2 - # fix issue - cummax_xs = datetime_frame.cummax(axis=1) - assert np.shape(cummax_xs) == np.shape(datetime_frame) + with pd.option_context('use_bottleneck', False): + result = getattr(df, op)() + assert len(result) == 2 - def test_max(self, float_frame_with_na, int_frame, - float_frame, float_string_frame): - with warnings.catch_warnings(record=True): - warnings.simplefilter("ignore", RuntimeWarning) - assert_stat_op_calc('max', np.max, float_frame_with_na, - check_dates=True) - assert_stat_op_calc('max', np.max, int_frame) - assert_stat_op_api('max', float_frame, float_string_frame) + def test_reduce_mixed_frame(self): + # GH 6806 + df = DataFrame({ + 'bool_data': [True, True, False, False, False], + 'int_data': [10, 20, 30, 40, 50], + 'string_data': ['a', 'b', 'c', 'd', 'e'], + }) + df.reindex(columns=['bool_data', 'int_data', 'string_data']) + test = df.sum(axis=0) + tm.assert_numpy_array_equal(test.values, + np.array([2, 150, 'abcde'], dtype=object)) + tm.assert_series_equal(test, df.T.sum(axis=1)) - def test_mad(self, float_frame_with_na, float_frame, float_string_frame): - f = lambda x: np.abs(x - x.mean()).mean() - assert_stat_op_calc('mad', f, float_frame_with_na) - assert_stat_op_api('mad', float_frame, float_string_frame) + def test_nunique(self): + df = DataFrame({'A': [1, 1, 1], + 'B': [1, 2, 3], + 'C': [1, np.nan, 3]}) + tm.assert_series_equal(df.nunique(), Series({'A': 1, 'B': 3, 'C': 2})) + tm.assert_series_equal(df.nunique(dropna=False), + Series({'A': 1, 'B': 3, 'C': 3})) + tm.assert_series_equal(df.nunique(axis=1), Series({0: 1, 1: 2, 
2: 2})) + tm.assert_series_equal(df.nunique(axis=1, dropna=False), + Series({0: 1, 1: 3, 2: 2})) - def test_var_std(self, float_frame_with_na, datetime_frame, float_frame, - float_string_frame): - alt = lambda x: np.var(x, ddof=1) - assert_stat_op_calc('var', alt, float_frame_with_na) - assert_stat_op_api('var', float_frame, float_string_frame) + @pytest.mark.parametrize('tz', [None, 'UTC']) + def test_mean_mixed_datetime_numeric(self, tz): + # https://github.com/pandas-dev/pandas/issues/24752 + df = pd.DataFrame({"A": [1, 1], + "B": [pd.Timestamp('2000', tz=tz)] * 2}) + result = df.mean() + expected = pd.Series([1.0], index=['A']) + tm.assert_series_equal(result, expected) - alt = lambda x: np.std(x, ddof=1) - assert_stat_op_calc('std', alt, float_frame_with_na) - assert_stat_op_api('std', float_frame, float_string_frame) + @pytest.mark.parametrize('tz', [None, 'UTC']) + def test_mean_excludeds_datetimes(self, tz): + # https://github.com/pandas-dev/pandas/issues/24752 + # Our long-term desired behavior is unclear, but the behavior in + # 0.24.0rc1 was buggy. + df = pd.DataFrame({"A": [pd.Timestamp('2000', tz=tz)] * 2}) + result = df.mean() + expected = pd.Series() + tm.assert_series_equal(result, expected) + def test_var_std(self, datetime_frame): result = datetime_frame.std(ddof=4) expected = datetime_frame.apply(lambda x: x.std(ddof=4)) tm.assert_almost_equal(result, expected) @@ -933,79 +924,7 @@ def test_numeric_only_flag(self, meth): pytest.raises(TypeError, lambda: getattr(df2, meth)( axis=1, numeric_only=False)) - @pytest.mark.parametrize('op', ['mean', 'std', 'var', - 'skew', 'kurt', 'sem']) - def test_mixed_ops(self, op): - # GH 16116 - df = DataFrame({'int': [1, 2, 3, 4], - 'float': [1., 2., 3., 4.], - 'str': ['a', 'b', 'c', 'd']}) - - result = getattr(df, op)() - assert len(result) == 2 - - with pd.option_context('use_bottleneck', False): - result = getattr(df, op)() - assert len(result) == 2 - - def test_cumsum(self, datetime_frame): - datetime_frame.loc[5:10, 0] = np.nan - datetime_frame.loc[10:15, 1] = np.nan - datetime_frame.loc[15:, 2] = np.nan - - # axis = 0 - cumsum = datetime_frame.cumsum() - expected = datetime_frame.apply(Series.cumsum) - tm.assert_frame_equal(cumsum, expected) - - # axis = 1 - cumsum = datetime_frame.cumsum(axis=1) - expected = datetime_frame.apply(Series.cumsum, axis=1) - tm.assert_frame_equal(cumsum, expected) - - # works - df = DataFrame({'A': np.arange(20)}, index=np.arange(20)) - result = df.cumsum() # noqa - - # fix issue - cumsum_xs = datetime_frame.cumsum(axis=1) - assert np.shape(cumsum_xs) == np.shape(datetime_frame) - - def test_cumprod(self, datetime_frame): - datetime_frame.loc[5:10, 0] = np.nan - datetime_frame.loc[10:15, 1] = np.nan - datetime_frame.loc[15:, 2] = np.nan - - # axis = 0 - cumprod = datetime_frame.cumprod() - expected = datetime_frame.apply(Series.cumprod) - tm.assert_frame_equal(cumprod, expected) - - # axis = 1 - cumprod = datetime_frame.cumprod(axis=1) - expected = datetime_frame.apply(Series.cumprod, axis=1) - tm.assert_frame_equal(cumprod, expected) - - # fix issue - cumprod_xs = datetime_frame.cumprod(axis=1) - assert np.shape(cumprod_xs) == np.shape(datetime_frame) - - # ints - df = datetime_frame.fillna(0).astype(int) - df.cumprod(0) - df.cumprod(1) - - # ints32 - df = datetime_frame.fillna(0).astype(np.int32) - df.cumprod(0) - df.cumprod(1) - - def test_sem(self, float_frame_with_na, datetime_frame, - float_frame, float_string_frame): - alt = lambda x: np.std(x, ddof=1) / np.sqrt(len(x)) - 
assert_stat_op_calc('sem', alt, float_frame_with_na) - assert_stat_op_api('sem', float_frame, float_string_frame) - + def test_sem(self, datetime_frame): result = datetime_frame.sem(ddof=4) expected = datetime_frame.apply( lambda x: x.std(ddof=4) / np.sqrt(len(x))) @@ -1020,29 +939,7 @@ def test_sem(self, float_frame_with_na, datetime_frame, assert not (result < 0).any() @td.skip_if_no_scipy - def test_skew(self, float_frame_with_na, float_frame, float_string_frame): - from scipy.stats import skew - - def alt(x): - if len(x) < 3: - return np.nan - return skew(x, bias=False) - - assert_stat_op_calc('skew', alt, float_frame_with_na) - assert_stat_op_api('skew', float_frame, float_string_frame) - - @td.skip_if_no_scipy - def test_kurt(self, float_frame_with_na, float_frame, float_string_frame): - from scipy.stats import kurtosis - - def alt(x): - if len(x) < 4: - return np.nan - return kurtosis(x, bias=False) - - assert_stat_op_calc('kurt', alt, float_frame_with_na) - assert_stat_op_api('kurt', float_frame, float_string_frame) - + def test_kurt(self): index = MultiIndex(levels=[['bar'], ['one', 'two', 'three'], [0, 1]], codes=[[0, 0, 0, 0, 0, 0], [0, 1, 2, 0, 1, 2], @@ -1304,20 +1201,146 @@ def test_stats_mixed_type(self, float_string_frame): float_string_frame.mean(1) float_string_frame.skew(1) - # TODO: Ensure warning isn't emitted in the first place - @pytest.mark.filterwarnings("ignore:All-NaN:RuntimeWarning") - def test_median_corner(self, int_frame, float_frame, float_string_frame): - def wrapper(x): - if isna(x).any(): - return np.nan - return np.median(x) + def test_sum_bools(self): + df = DataFrame(index=lrange(1), columns=lrange(10)) + bools = isna(df) + assert bools.sum(axis=1)[0] == 10 - assert_stat_op_calc('median', wrapper, int_frame, check_dtype=False, - check_dates=True) - assert_stat_op_api('median', float_frame, float_string_frame) + # --------------------------------------------------------------------- + # Cumulative Reductions - cumsum, cummax, ... 
+ + def test_cumsum_corner(self): + dm = DataFrame(np.arange(20).reshape(4, 5), + index=lrange(4), columns=lrange(5)) + # ?(wesm) + result = dm.cumsum() # noqa + def test_cumsum(self, datetime_frame): + datetime_frame.loc[5:10, 0] = np.nan + datetime_frame.loc[10:15, 1] = np.nan + datetime_frame.loc[15:, 2] = np.nan + + # axis = 0 + cumsum = datetime_frame.cumsum() + expected = datetime_frame.apply(Series.cumsum) + tm.assert_frame_equal(cumsum, expected) + + # axis = 1 + cumsum = datetime_frame.cumsum(axis=1) + expected = datetime_frame.apply(Series.cumsum, axis=1) + tm.assert_frame_equal(cumsum, expected) + + # works + df = DataFrame({'A': np.arange(20)}, index=np.arange(20)) + result = df.cumsum() # noqa + + # fix issue + cumsum_xs = datetime_frame.cumsum(axis=1) + assert np.shape(cumsum_xs) == np.shape(datetime_frame) + + def test_cumprod(self, datetime_frame): + datetime_frame.loc[5:10, 0] = np.nan + datetime_frame.loc[10:15, 1] = np.nan + datetime_frame.loc[15:, 2] = np.nan + + # axis = 0 + cumprod = datetime_frame.cumprod() + expected = datetime_frame.apply(Series.cumprod) + tm.assert_frame_equal(cumprod, expected) + + # axis = 1 + cumprod = datetime_frame.cumprod(axis=1) + expected = datetime_frame.apply(Series.cumprod, axis=1) + tm.assert_frame_equal(cumprod, expected) + + # fix issue + cumprod_xs = datetime_frame.cumprod(axis=1) + assert np.shape(cumprod_xs) == np.shape(datetime_frame) + + # ints + df = datetime_frame.fillna(0).astype(int) + df.cumprod(0) + df.cumprod(1) + + # ints32 + df = datetime_frame.fillna(0).astype(np.int32) + df.cumprod(0) + df.cumprod(1) + + def test_cummin(self, datetime_frame): + datetime_frame.loc[5:10, 0] = np.nan + datetime_frame.loc[10:15, 1] = np.nan + datetime_frame.loc[15:, 2] = np.nan + + # axis = 0 + cummin = datetime_frame.cummin() + expected = datetime_frame.apply(Series.cummin) + tm.assert_frame_equal(cummin, expected) + + # axis = 1 + cummin = datetime_frame.cummin(axis=1) + expected = datetime_frame.apply(Series.cummin, axis=1) + tm.assert_frame_equal(cummin, expected) + + # it works + df = DataFrame({'A': np.arange(20)}, index=np.arange(20)) + result = df.cummin() # noqa + + # fix issue + cummin_xs = datetime_frame.cummin(axis=1) + assert np.shape(cummin_xs) == np.shape(datetime_frame) + + def test_cummax(self, datetime_frame): + datetime_frame.loc[5:10, 0] = np.nan + datetime_frame.loc[10:15, 1] = np.nan + datetime_frame.loc[15:, 2] = np.nan + + # axis = 0 + cummax = datetime_frame.cummax() + expected = datetime_frame.apply(Series.cummax) + tm.assert_frame_equal(cummax, expected) + + # axis = 1 + cummax = datetime_frame.cummax(axis=1) + expected = datetime_frame.apply(Series.cummax, axis=1) + tm.assert_frame_equal(cummax, expected) + + # it works + df = DataFrame({'A': np.arange(20)}, index=np.arange(20)) + result = df.cummax() # noqa + + # fix issue + cummax_xs = datetime_frame.cummax(axis=1) + assert np.shape(cummax_xs) == np.shape(datetime_frame) + + # --------------------------------------------------------------------- # Miscellanea + def test_count(self): + # corner case + frame = DataFrame() + ct1 = frame.count(1) + assert isinstance(ct1, Series) + + ct2 = frame.count(0) + assert isinstance(ct2, Series) + + # GH#423 + df = DataFrame(index=lrange(10)) + result = df.count(1) + expected = Series(0, index=df.index) + tm.assert_series_equal(result, expected) + + df = DataFrame(columns=lrange(10)) + result = df.count(0) + expected = Series(0, index=df.columns) + tm.assert_series_equal(result, expected) + + df = DataFrame() + result = 
df.count() + expected = Series(0, index=[]) + tm.assert_series_equal(result, expected) + def test_count_objects(self, float_string_frame): dm = DataFrame(float_string_frame._series) df = DataFrame(float_string_frame._series) @@ -1325,17 +1348,23 @@ def test_count_objects(self, float_string_frame): tm.assert_series_equal(dm.count(), df.count()) tm.assert_series_equal(dm.count(1), df.count(1)) - def test_cumsum_corner(self): - dm = DataFrame(np.arange(20).reshape(4, 5), - index=lrange(4), columns=lrange(5)) - # ?(wesm) - result = dm.cumsum() # noqa + def test_pct_change(self): + # GH#11150 + pnl = DataFrame([np.arange(0, 40, 10), + np.arange(0, 40, 10), + np.arange(0, 40, 10)]).astype(np.float64) + pnl.iat[1, 0] = np.nan + pnl.iat[1, 1] = np.nan + pnl.iat[2, 3] = 60 - def test_sum_bools(self): - df = DataFrame(index=lrange(1), columns=lrange(10)) - bools = isna(df) - assert bools.sum(axis=1)[0] == 10 + for axis in range(2): + expected = pnl.ffill(axis=axis) / pnl.ffill(axis=axis).shift( + axis=axis) - 1 + result = pnl.pct_change(axis=axis, fill_method='pad') + tm.assert_frame_equal(result, expected) + + # ---------------------------------------------------------------------- # Index of max / min def test_idxmin(self, float_frame, int_frame): @@ -1423,6 +1452,26 @@ def test_any_datetime(self): expected = Series([True, True, True, False]) tm.assert_series_equal(result, expected) + def test_any_all_bool_only(self): + + # GH 25101 + df = DataFrame({"col1": [1, 2, 3], + "col2": [4, 5, 6], + "col3": [None, None, None]}) + + result = df.all(bool_only=True) + expected = Series(dtype=np.bool) + tm.assert_series_equal(result, expected) + + df = DataFrame({"col1": [1, 2, 3], + "col2": [4, 5, 6], + "col3": [None, None, None], + "col4": [False, False, True]}) + + result = df.all(bool_only=True) + expected = Series({"col4": False}) + tm.assert_series_equal(result, expected) + @pytest.mark.parametrize('func, data, expected', [ (np.any, {}, False), (np.all, {}, True), @@ -1661,7 +1710,9 @@ def test_isin_empty_datetimelike(self): result = df1_td.isin(df3) tm.assert_frame_equal(result, expected) + # --------------------------------------------------------------------- # Rounding + def test_round(self): # GH 2665 @@ -1849,22 +1900,9 @@ def test_round_nonunique_categorical(self): tm.assert_frame_equal(result, expected) - def test_pct_change(self): - # GH 11150 - pnl = DataFrame([np.arange(0, 40, 10), np.arange(0, 40, 10), np.arange( - 0, 40, 10)]).astype(np.float64) - pnl.iat[1, 0] = np.nan - pnl.iat[1, 1] = np.nan - pnl.iat[2, 3] = 60 - - for axis in range(2): - expected = pnl.ffill(axis=axis) / pnl.ffill(axis=axis).shift( - axis=axis) - 1 - result = pnl.pct_change(axis=axis, fill_method='pad') - - tm.assert_frame_equal(result, expected) - + # --------------------------------------------------------------------- # Clip + def test_clip(self, float_frame): median = float_frame.median().median() original = float_frame.copy() @@ -2037,7 +2075,9 @@ def test_clip_with_na_args(self, float_frame): 'col_2': [np.nan, np.nan, np.nan]}) tm.assert_frame_equal(result, expected) + # --------------------------------------------------------------------- # Matrix-like + def test_dot(self): a = DataFrame(np.random.randn(3, 4), index=['a', 'b', 'c'], columns=['p', 'q', 'r', 's']) diff --git a/pandas/tests/frame/test_apply.py b/pandas/tests/frame/test_apply.py index ade527a16c902..a4cd1aa3bacb6 100644 --- a/pandas/tests/frame/test_apply.py +++ b/pandas/tests/frame/test_apply.py @@ -318,6 +318,13 @@ def 
test_apply_reduce_Series(self, float_frame):
         result = float_frame.apply(np.mean, axis=1)
         assert_series_equal(result, expected)

+    def test_apply_reduce_rows_to_dict(self):
+        # GH 25196
+        data = pd.DataFrame([[1, 2], [3, 4]])
+        expected = pd.Series([{0: 1, 1: 3}, {0: 2, 1: 4}])
+        result = data.apply(dict)
+        assert_series_equal(result, expected)
+
     def test_apply_differently_indexed(self):
         df = DataFrame(np.random.randn(20, 10))
diff --git a/pandas/tests/frame/test_constructors.py b/pandas/tests/frame/test_constructors.py
index 4f6a2e2bfbebf..a8a78b26e317c 100644
--- a/pandas/tests/frame/test_constructors.py
+++ b/pandas/tests/frame/test_constructors.py
@@ -2,6 +2,7 @@

 from __future__ import print_function

+from collections import OrderedDict
 from datetime import datetime, timedelta
 import functools
 import itertools
@@ -11,8 +12,7 @@
 import pytest

 from pandas.compat import (
-    PY3, PY36, OrderedDict, is_platform_little_endian, lmap, long, lrange,
-    lzip, range, zip)
+    PY3, PY36, is_platform_little_endian, lmap, long, lrange, lzip, range, zip)

 from pandas.core.dtypes.cast import construct_1d_object_array_from_listlike
 from pandas.core.dtypes.common import is_integer_dtype
@@ -787,6 +787,17 @@ def test_constructor_maskedarray_hardened(self):
                              dtype=float)
         tm.assert_frame_equal(result, expected)

+    def test_constructor_maskedrecarray_dtype(self):
+        # Ensure constructor honors dtype
+        data = np.ma.array(
+            np.ma.zeros(5, dtype=[('date', '<f8'), ('price', '<f8')]),
+            mask=[False] * 5)
+        data = data.view(mrecords.mrecarray)
+        result = pd.DataFrame(data, dtype=int)
+        expected = pd.DataFrame(np.zeros((5, 2), dtype=int),
+                                columns=['date', 'price'])
+        tm.assert_frame_equal(result, expected)
diff --git a/pandas/tests/indexes/datetimes/test_scalar_compat.py b/pandas/tests/indexes/datetimes/test_scalar_compat.py
--- a/pandas/tests/indexes/datetimes/test_scalar_compat.py
+++ b/pandas/tests/indexes/datetimes/test_scalar_compat.py
+    @pytest.mark.parametrize('freq, error_msg', [
+        ('Y', '<YearEnd: month=12> is a non-fixed frequency'),
+        ('M', '<MonthEnd> is a non-fixed frequency'),
+        ('foobar', 'Invalid frequency: foobar')])
+    def test_round_invalid(self, freq, error_msg):
+        dti = date_range('20130101 09:10:11', periods=5)
+        dti = dti.tz_localize('UTC').tz_convert('US/Eastern')
+        with pytest.raises(ValueError, match=error_msg):
+            dti.round(freq)

     def test_round(self, tz_naive_fixture):
         tz = tz_naive_fixture
diff --git a/pandas/tests/indexes/datetimes/test_setops.py b/pandas/tests/indexes/datetimes/test_setops.py
index bd37cc815d0f7..19009e45ee83a 100644
--- a/pandas/tests/indexes/datetimes/test_setops.py
+++ b/pandas/tests/indexes/datetimes/test_setops.py
@@ -138,7 +138,7 @@ def test_intersection2(self):

     @pytest.mark.parametrize("tz", [None, 'Asia/Tokyo', 'US/Eastern',
                                     'dateutil/US/Pacific'])
-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_intersection(self, tz, sort):
         # GH 4690 (with tz)
         base = date_range('6/1/2000', '6/30/2000', freq='D', name='idx')
@@ -187,7 +187,7 @@ def test_intersection(self, tz, sort):
         for (rng, expected) in [(rng2, expected2), (rng3, expected3),
                                 (rng4, expected4)]:
             result = base.intersection(rng, sort=sort)
-            if sort:
+            if sort is None:
                 expected = expected.sort_values()
             tm.assert_index_equal(result, expected)
             assert result.name == expected.name
@@ -212,7 +212,7 @@ def test_intersection_bug_1708(self):
         assert len(result) == 0

     @pytest.mark.parametrize("tz", tz)
-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_difference(self, tz, sort):
         rng_dates = ['1/2/2000', '1/3/2000', '1/1/2000',
                      '1/4/2000', '1/5/2000']
@@ -233,11 +233,11 @@ def test_difference(self, tz, sort):
                                      (rng2, other2, expected2),
                                      (rng3, other3, expected3)]:
             result_diff = rng.difference(other, sort)
-            if sort:
+            if sort is None:
                 expected = expected.sort_values()
             tm.assert_index_equal(result_diff, expected)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_difference_freq(self, sort):
         # GH14323: difference of DatetimeIndex should not preserve frequency
@@ -254,7 +254,7 @@ def test_difference_freq(self, sort):
         tm.assert_index_equal(idx_diff, expected)
         tm.assert_attr_equal('freq', idx_diff, expected)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_datetimeindex_diff(self, sort):
         dti1 = date_range(freq='Q-JAN', start=datetime(1997, 12, 31),
                           periods=100)
diff --git a/pandas/tests/indexes/datetimes/test_timezones.py b/pandas/tests/indexes/datetimes/test_timezones.py
index 8bcc9296cb010..12c1b15733895 100644
--- a/pandas/tests/indexes/datetimes/test_timezones.py
+++ b/pandas/tests/indexes/datetimes/test_timezones.py
@@ -434,24 +434,19 @@ def test_dti_tz_localize_utc_conversion(self, tz):
         with pytest.raises(pytz.NonExistentTimeError):
             rng.tz_localize(tz)

-    @pytest.mark.parametrize('idx', [
-        date_range(start='2014-01-01', end='2014-12-31', freq='M'),
-        date_range(start='2014-01-01', end='2014-12-31', freq='D'),
-        date_range(start='2014-01-01', end='2014-03-01', freq='H'),
-        date_range(start='2014-08-01', end='2014-10-31', freq='T')
-    ])
-    def test_dti_tz_localize_roundtrip(self, tz_aware_fixture, idx):
+    def test_dti_tz_localize_roundtrip(self, tz_aware_fixture):
+        # note: this tests that a tz-naive index can be localized
+        # and de-localized successfully, when there are no DST transitions
+        # in the range.
+        idx = date_range(start='2014-06-01', end='2014-08-30', freq='15T')
         tz = tz_aware_fixture
         localized = idx.tz_localize(tz)
-        expected = date_range(start=idx[0], end=idx[-1], freq=idx.freq,
-                              tz=tz)
-        tm.assert_index_equal(localized, expected)
+        # can't localize a tz-aware object
         with pytest.raises(TypeError):
             localized.tz_localize(tz)
         reset = localized.tz_localize(None)
-        tm.assert_index_equal(reset, idx)
         assert reset.tzinfo is None
+        tm.assert_index_equal(reset, idx)

     def test_dti_tz_localize_naive(self):
         rng = date_range('1/1/2011', periods=100, freq='H')
diff --git a/pandas/tests/indexes/datetimes/test_tools.py b/pandas/tests/indexes/datetimes/test_tools.py
index bec2fa66c43cd..b94935d2521eb 100644
--- a/pandas/tests/indexes/datetimes/test_tools.py
+++ b/pandas/tests/indexes/datetimes/test_tools.py
@@ -346,12 +346,16 @@ def test_to_datetime_dt64s(self, cache):
         for dt in in_bound_dts:
             assert pd.to_datetime(dt, cache=cache) == Timestamp(dt)

-        oob_dts = [np.datetime64('1000-01-01'), np.datetime64('5000-01-02'), ]
-
-        for dt in oob_dts:
-            pytest.raises(ValueError, pd.to_datetime, dt, errors='raise')
-            pytest.raises(ValueError, Timestamp, dt)
-            assert pd.to_datetime(dt, errors='coerce', cache=cache) is NaT
+    @pytest.mark.parametrize('dt', [np.datetime64('1000-01-01'),
+                                    np.datetime64('5000-01-02')])
+    @pytest.mark.parametrize('cache', [True, False])
+    def test_to_datetime_dt64s_out_of_bounds(self, cache, dt):
+        msg = "Out of bounds nanosecond timestamp: {}".format(dt)
+        with pytest.raises(OutOfBoundsDatetime, match=msg):
+            pd.to_datetime(dt, errors='raise')
+        with pytest.raises(OutOfBoundsDatetime, match=msg):
+            Timestamp(dt)
+        assert pd.to_datetime(dt, errors='coerce', cache=cache) is NaT

     @pytest.mark.parametrize('cache', [True, False])
     def test_to_datetime_array_of_dt64s(self, cache):
@@ -367,8 +371,9 @@ def test_to_datetime_array_of_dt64s(self, cache):

         # A list of datetimes where the last one is out of bounds
         dts_with_oob = dts + [np.datetime64('9999-01-01')]

-        pytest.raises(ValueError, pd.to_datetime, dts_with_oob,
-                      errors='raise')
+        msg = "Out of bounds nanosecond timestamp: 9999-01-01 00:00:00"
+        with
pytest.raises(OutOfBoundsDatetime, match=msg): + pd.to_datetime(dts_with_oob, errors='raise') tm.assert_numpy_array_equal( pd.to_datetime(dts_with_oob, box=False, errors='coerce', @@ -410,7 +415,10 @@ def test_to_datetime_tz(self, cache): # mixed tzs will raise arr = [pd.Timestamp('2013-01-01 13:00:00', tz='US/Pacific'), pd.Timestamp('2013-01-02 14:00:00', tz='US/Eastern')] - pytest.raises(ValueError, lambda: pd.to_datetime(arr, cache=cache)) + msg = ("Tz-aware datetime.datetime cannot be converted to datetime64" + " unless utc=True") + with pytest.raises(ValueError, match=msg): + pd.to_datetime(arr, cache=cache) @pytest.mark.parametrize('cache', [True, False]) def test_to_datetime_tz_pytz(self, cache): @@ -706,6 +714,29 @@ def test_iso_8601_strings_with_different_offsets(self): NaT], tz='UTC') tm.assert_index_equal(result, expected) + def test_iss8601_strings_mixed_offsets_with_naive(self): + # GH 24992 + result = pd.to_datetime([ + '2018-11-28T00:00:00', + '2018-11-28T00:00:00+12:00', + '2018-11-28T00:00:00', + '2018-11-28T00:00:00+06:00', + '2018-11-28T00:00:00' + ], utc=True) + expected = pd.to_datetime([ + '2018-11-28T00:00:00', + '2018-11-27T12:00:00', + '2018-11-28T00:00:00', + '2018-11-27T18:00:00', + '2018-11-28T00:00:00' + ], utc=True) + tm.assert_index_equal(result, expected) + + items = ['2018-11-28T00:00:00+12:00', '2018-11-28T00:00:00'] + result = pd.to_datetime(items, utc=True) + expected = pd.to_datetime(list(reversed(items)), utc=True)[::-1] + tm.assert_index_equal(result, expected) + def test_non_iso_strings_with_tz_offset(self): result = to_datetime(['March 1, 2018 12:00:00+0400'] * 2) expected = DatetimeIndex([datetime(2018, 3, 1, 12, @@ -1088,9 +1119,9 @@ def test_to_datetime_on_datetime64_series(self, cache): def test_to_datetime_with_space_in_series(self, cache): # GH 6428 s = Series(['10/18/2006', '10/18/2008', ' ']) - pytest.raises(ValueError, lambda: to_datetime(s, - errors='raise', - cache=cache)) + msg = r"(\(u?')?String does not contain a date(:', ' '\))?" 
+ with pytest.raises(ValueError, match=msg): + to_datetime(s, errors='raise', cache=cache) result_coerce = to_datetime(s, errors='coerce', cache=cache) expected_coerce = Series([datetime(2006, 10, 18), datetime(2008, 10, 18), @@ -1111,13 +1142,12 @@ def test_to_datetime_with_apply(self, cache): assert_series_equal(result, expected) td = pd.Series(['May 04', 'Jun 02', ''], index=[1, 2, 3]) - pytest.raises(ValueError, - lambda: pd.to_datetime(td, format='%b %y', - errors='raise', - cache=cache)) - pytest.raises(ValueError, - lambda: td.apply(pd.to_datetime, format='%b %y', - errors='raise', cache=cache)) + msg = r"time data '' does not match format '%b %y' \(match\)" + with pytest.raises(ValueError, match=msg): + pd.to_datetime(td, format='%b %y', errors='raise', cache=cache) + with pytest.raises(ValueError, match=msg): + td.apply(pd.to_datetime, format='%b %y', + errors='raise', cache=cache) expected = pd.to_datetime(td, format='%b %y', errors='coerce', cache=cache) @@ -1168,8 +1198,9 @@ def test_to_datetime_unprocessable_input(self, cache, box, klass): result = to_datetime([1, '1'], errors='ignore', cache=cache, box=box) expected = klass(np.array([1, '1'], dtype='O')) tm.assert_equal(result, expected) - pytest.raises(TypeError, to_datetime, [1, '1'], errors='raise', - cache=cache, box=box) + msg = "invalid string coercion to datetime" + with pytest.raises(TypeError, match=msg): + to_datetime([1, '1'], errors='raise', cache=cache, box=box) def test_to_datetime_other_datetime64_units(self): # 5/25/2012 @@ -1225,17 +1256,18 @@ def test_string_na_nat_conversion(self, cache): malformed = np.array(['1/100/2000', np.nan], dtype=object) # GH 10636, default is now 'raise' - pytest.raises(ValueError, - lambda: to_datetime(malformed, errors='raise', - cache=cache)) + msg = (r"\(u?'Unknown string format:', '1/100/2000'\)|" + "day is out of range for month") + with pytest.raises(ValueError, match=msg): + to_datetime(malformed, errors='raise', cache=cache) result = to_datetime(malformed, errors='ignore', cache=cache) # GH 21864 expected = Index(malformed) tm.assert_index_equal(result, expected) - pytest.raises(ValueError, to_datetime, malformed, errors='raise', - cache=cache) + with pytest.raises(ValueError, match=msg): + to_datetime(malformed, errors='raise', cache=cache) idx = ['a', 'b', 'c', 'd', 'e'] series = Series(['1/1/2000', np.nan, '1/3/2000', np.nan, @@ -1414,14 +1446,24 @@ def test_day_not_in_month_coerce(self, cache): @pytest.mark.parametrize('cache', [True, False]) def test_day_not_in_month_raise(self, cache): - pytest.raises(ValueError, to_datetime, '2015-02-29', - errors='raise', cache=cache) - pytest.raises(ValueError, to_datetime, '2015-02-29', - errors='raise', format="%Y-%m-%d", cache=cache) - pytest.raises(ValueError, to_datetime, '2015-02-32', - errors='raise', format="%Y-%m-%d", cache=cache) - pytest.raises(ValueError, to_datetime, '2015-04-31', - errors='raise', format="%Y-%m-%d", cache=cache) + msg = "day is out of range for month" + with pytest.raises(ValueError, match=msg): + to_datetime('2015-02-29', errors='raise', cache=cache) + + msg = "time data 2015-02-29 doesn't match format specified" + with pytest.raises(ValueError, match=msg): + to_datetime('2015-02-29', errors='raise', format="%Y-%m-%d", + cache=cache) + + msg = "time data 2015-02-32 doesn't match format specified" + with pytest.raises(ValueError, match=msg): + to_datetime('2015-02-32', errors='raise', format="%Y-%m-%d", + cache=cache) + + msg = "time data 2015-04-31 doesn't match format specified" + with 
pytest.raises(ValueError, match=msg): + to_datetime('2015-04-31', errors='raise', format="%Y-%m-%d", + cache=cache) @pytest.mark.parametrize('cache', [True, False]) def test_day_not_in_month_ignore(self, cache): @@ -1656,7 +1698,9 @@ def test_parsers_time(self): assert tools.to_time(time_string) == expected new_string = "14.15" - pytest.raises(ValueError, tools.to_time, new_string) + msg = r"Cannot convert arg \['14\.15'\] to a time" + with pytest.raises(ValueError, match=msg): + tools.to_time(new_string) assert tools.to_time(new_string, format="%H.%M") == expected arg = ["14:15", "20:20"] diff --git a/pandas/tests/indexes/interval/test_interval.py b/pandas/tests/indexes/interval/test_interval.py index db69258c1d3d2..e4f25ff143273 100644 --- a/pandas/tests/indexes/interval/test_interval.py +++ b/pandas/tests/indexes/interval/test_interval.py @@ -242,12 +242,10 @@ def test_take(self, closed): [0, 0, 1], [1, 1, 2], closed=closed) tm.assert_index_equal(result, expected) - def test_unique(self, closed): - # unique non-overlapping - idx = IntervalIndex.from_tuples( - [(0, 1), (2, 3), (4, 5)], closed=closed) - assert idx.is_unique is True - + def test_is_unique_interval(self, closed): + """ + Interval specific tests for is_unique in addition to base class tests + """ # unique overlapping - distinct endpoints idx = IntervalIndex.from_tuples([(0, 1), (0.5, 1.5)], closed=closed) assert idx.is_unique is True @@ -261,15 +259,6 @@ def test_unique(self, closed): idx = IntervalIndex.from_tuples([(-1, 1), (-2, 2)], closed=closed) assert idx.is_unique is True - # duplicate - idx = IntervalIndex.from_tuples( - [(0, 1), (0, 1), (2, 3)], closed=closed) - assert idx.is_unique is False - - # empty - idx = IntervalIndex([], closed=closed) - assert idx.is_unique is True - def test_monotonic(self, closed): # increasing non-overlapping idx = IntervalIndex.from_tuples( @@ -783,19 +772,19 @@ def test_non_contiguous(self, closed): assert 1.5 not in index - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_union(self, closed, sort): index = self.create_index(closed=closed) other = IntervalIndex.from_breaks(range(5, 13), closed=closed) expected = IntervalIndex.from_breaks(range(13), closed=closed) result = index[::-1].union(other, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(result, expected) assert tm.equalContents(result, expected) result = other[::-1].union(index, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(result, expected) assert tm.equalContents(result, expected) @@ -812,19 +801,19 @@ def test_union(self, closed, sort): result = index.union(other, sort=sort) tm.assert_index_equal(result, index) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_intersection(self, closed, sort): index = self.create_index(closed=closed) other = IntervalIndex.from_breaks(range(5, 13), closed=closed) expected = IntervalIndex.from_breaks(range(5, 11), closed=closed) result = index[::-1].intersection(other, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(result, expected) assert tm.equalContents(result, expected) result = other[::-1].intersection(index, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(result, expected) assert tm.equalContents(result, expected) @@ -842,14 +831,14 @@ def test_intersection(self, closed, sort): result = index.intersection(other, sort=sort) tm.assert_index_equal(result, expected) - @pytest.mark.parametrize("sort", [True, False]) 
+ @pytest.mark.parametrize("sort", [None, False]) def test_difference(self, closed, sort): index = IntervalIndex.from_arrays([1, 0, 3, 2], [1, 2, 3, 4], closed=closed) result = index.difference(index[:1], sort=sort) expected = index[1:] - if sort: + if sort is None: expected = expected.sort_values() tm.assert_index_equal(result, expected) @@ -864,19 +853,19 @@ def test_difference(self, closed, sort): result = index.difference(other, sort=sort) tm.assert_index_equal(result, expected) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_symmetric_difference(self, closed, sort): index = self.create_index(closed=closed) result = index[1:].symmetric_difference(index[:-1], sort=sort) expected = IntervalIndex([index[0], index[-1]]) - if sort: + if sort is None: tm.assert_index_equal(result, expected) assert tm.equalContents(result, expected) # GH 19101: empty result, same dtype result = index.symmetric_difference(index, sort=sort) expected = IntervalIndex(np.array([], dtype='int64'), closed=closed) - if sort: + if sort is None: tm.assert_index_equal(result, expected) assert tm.equalContents(result, expected) @@ -888,7 +877,7 @@ def test_symmetric_difference(self, closed, sort): @pytest.mark.parametrize('op_name', [ 'union', 'intersection', 'difference', 'symmetric_difference']) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_set_operation_errors(self, closed, op_name, sort): index = self.create_index(closed=closed) set_op = getattr(index, op_name) diff --git a/pandas/tests/indexes/multi/test_analytics.py b/pandas/tests/indexes/multi/test_analytics.py index dca6180f39664..27a5ba9e5434a 100644 --- a/pandas/tests/indexes/multi/test_analytics.py +++ b/pandas/tests/indexes/multi/test_analytics.py @@ -3,7 +3,8 @@ import numpy as np import pytest -from pandas.compat import lrange +from pandas.compat import PY2, lrange +from pandas.compat.numpy import _np_version_under1p16 import pandas as pd from pandas import Index, MultiIndex, date_range, period_range @@ -13,8 +14,11 @@ def test_shift(idx): # GH8083 test the base class for shift - pytest.raises(NotImplementedError, idx.shift, 1) - pytest.raises(NotImplementedError, idx.shift, 1, 2) + msg = "Not supported for type MultiIndex" + with pytest.raises(NotImplementedError, match=msg): + idx.shift(1) + with pytest.raises(NotImplementedError, match=msg): + idx.shift(1, 2) def test_groupby(idx): @@ -50,25 +54,26 @@ def test_truncate(): result = index.truncate(before=1, after=2) assert len(result.levels[0]) == 2 - # after < before - pytest.raises(ValueError, index.truncate, 3, 1) + msg = "after < before" + with pytest.raises(ValueError, match=msg): + index.truncate(3, 1) def test_where(): i = MultiIndex.from_tuples([('A', 1), ('A', 2)]) - with pytest.raises(NotImplementedError): + msg = r"\.where is not supported for MultiIndex operations" + with pytest.raises(NotImplementedError, match=msg): i.where(True) -def test_where_array_like(): +@pytest.mark.parametrize('klass', [list, tuple, np.array, pd.Series]) +def test_where_array_like(klass): i = MultiIndex.from_tuples([('A', 1), ('A', 2)]) - klasses = [list, tuple, np.array, pd.Series] cond = [False, True] - - for klass in klasses: - with pytest.raises(NotImplementedError): - i.where(klass(cond)) + msg = r"\.where is not supported for MultiIndex operations" + with pytest.raises(NotImplementedError, match=msg): + i.where(klass(cond)) # TODO: reshape @@ -141,7 +146,8 @@ def test_take(idx): # if not 
isinstance(idx, # (DatetimeIndex, PeriodIndex, TimedeltaIndex)): # GH 10791 - with pytest.raises(AttributeError): + msg = "'MultiIndex' object has no attribute 'freq'" + with pytest.raises(AttributeError, match=msg): idx.freq @@ -199,7 +205,8 @@ def test_take_fill_value(): with pytest.raises(ValueError, match=msg): idx.take(np.array([1, 0, -5]), fill_value=True) - with pytest.raises(IndexError): + msg = "index -5 is out of bounds for size 4" + with pytest.raises(IndexError, match=msg): idx.take(np.array([1, -5])) @@ -215,13 +222,15 @@ def test_sub(idx): first = idx # - now raises (previously was set op difference) - with pytest.raises(TypeError): + msg = "cannot perform __sub__ with this index type: MultiIndex" + with pytest.raises(TypeError, match=msg): first - idx[-3:] - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): idx[-3:] - first - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): idx[-3:] - first.tolist() - with pytest.raises(TypeError): + msg = "cannot perform __rsub__ with this index type: MultiIndex" + with pytest.raises(TypeError, match=msg): first.tolist() - idx[-3:] @@ -266,56 +275,35 @@ def test_map_dictlike(idx, mapper): tm.assert_index_equal(result, expected) +@pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") @pytest.mark.parametrize('func', [ np.exp, np.exp2, np.expm1, np.log, np.log2, np.log10, np.log1p, np.sqrt, np.sin, np.cos, np.tan, np.arcsin, np.arccos, np.arctan, np.sinh, np.cosh, np.tanh, np.arcsinh, np.arccosh, np.arctanh, np.deg2rad, np.rad2deg -]) -def test_numpy_ufuncs(func): +], ids=lambda func: func.__name__) +def test_numpy_ufuncs(idx, func): # test ufuncs of numpy. see: # http://docs.scipy.org/doc/numpy/reference/ufuncs.html - # copy and paste from idx fixture as pytest doesn't support - # parameters and fixtures at the same time. - major_axis = Index(['foo', 'bar', 'baz', 'qux']) - minor_axis = Index(['one', 'two']) - major_codes = np.array([0, 0, 1, 2, 3, 3]) - minor_codes = np.array([0, 1, 0, 1, 0, 1]) - index_names = ['first', 'second'] - - idx = MultiIndex( - levels=[major_axis, minor_axis], - codes=[major_codes, minor_codes], - names=index_names, - verify_integrity=False - ) - - with pytest.raises(Exception): - with np.errstate(all='ignore'): - func(idx) + if _np_version_under1p16: + expected_exception = AttributeError + msg = "'tuple' object has no attribute '{}'".format(func.__name__) + else: + expected_exception = TypeError + msg = ("loop of ufunc does not support argument 0 of type tuple which" + " has no callable {} method").format(func.__name__) + with pytest.raises(expected_exception, match=msg): + func(idx) @pytest.mark.parametrize('func', [ np.isfinite, np.isinf, np.isnan, np.signbit -]) -def test_numpy_type_funcs(func): - # for func in [np.isfinite, np.isinf, np.isnan, np.signbit]: - # copy and paste from idx fixture as pytest doesn't support - # parameters and fixtures at the same time. 
- major_axis = Index(['foo', 'bar', 'baz', 'qux']) - minor_axis = Index(['one', 'two']) - major_codes = np.array([0, 0, 1, 2, 3, 3]) - minor_codes = np.array([0, 1, 0, 1, 0, 1]) - index_names = ['first', 'second'] - - idx = MultiIndex( - levels=[major_axis, minor_axis], - codes=[major_codes, minor_codes], - names=index_names, - verify_integrity=False - ) - - with pytest.raises(Exception): +], ids=lambda func: func.__name__) +def test_numpy_type_funcs(idx, func): + msg = ("ufunc '{}' not supported for the input types, and the inputs" + " could not be safely coerced to any supported types according to" + " the casting rule ''safe''").format(func.__name__) + with pytest.raises(TypeError, match=msg): func(idx) diff --git a/pandas/tests/indexes/multi/test_compat.py b/pandas/tests/indexes/multi/test_compat.py index f405fc659c709..89685b9feec27 100644 --- a/pandas/tests/indexes/multi/test_compat.py +++ b/pandas/tests/indexes/multi/test_compat.py @@ -124,8 +124,6 @@ def test_compat(indices): def test_pickle_compat_construction(holder): # this is testing for pickle compat - if holder is None: - return - # need an object to create with - pytest.raises(TypeError, holder) + with pytest.raises(TypeError, match="Must pass both levels and codes"): + holder() diff --git a/pandas/tests/indexes/multi/test_constructor.py b/pandas/tests/indexes/multi/test_constructor.py index e6678baf8a996..055d54c613260 100644 --- a/pandas/tests/indexes/multi/test_constructor.py +++ b/pandas/tests/indexes/multi/test_constructor.py @@ -1,7 +1,6 @@ # -*- coding: utf-8 -*- from collections import OrderedDict -import re import numpy as np import pytest @@ -30,10 +29,10 @@ def test_constructor_no_levels(): with pytest.raises(ValueError, match=msg): MultiIndex(levels=[], codes=[]) - both_re = re.compile('Must pass both levels and codes') - with pytest.raises(TypeError, match=both_re): + msg = "Must pass both levels and codes" + with pytest.raises(TypeError, match=msg): MultiIndex(levels=[]) - with pytest.raises(TypeError, match=both_re): + with pytest.raises(TypeError, match=msg): MultiIndex(codes=[]) @@ -42,8 +41,8 @@ def test_constructor_nonhashable_names(): levels = [[1, 2], [u'one', u'two']] codes = [[0, 0, 1, 1], [0, 1, 0, 1]] names = (['foo'], ['bar']) - message = "MultiIndex.name must be a hashable type" - with pytest.raises(TypeError, match=message): + msg = r"MultiIndex\.name must be a hashable type" + with pytest.raises(TypeError, match=msg): MultiIndex(levels=levels, codes=codes, names=names) # With .rename() @@ -51,11 +50,11 @@ def test_constructor_nonhashable_names(): codes=[[0, 0, 1, 1], [0, 1, 0, 1]], names=('foo', 'bar')) renamed = [['foor'], ['barr']] - with pytest.raises(TypeError, match=message): + with pytest.raises(TypeError, match=msg): mi.rename(names=renamed) # With .set_names() - with pytest.raises(TypeError, match=message): + with pytest.raises(TypeError, match=msg): mi.set_names(names=renamed) @@ -67,8 +66,9 @@ def test_constructor_mismatched_codes_levels(idx): with pytest.raises(ValueError, match=msg): MultiIndex(levels=levels, codes=codes) - length_error = re.compile('>= length of level') - label_error = re.compile(r'Unequal code lengths: \[4, 2\]') + length_error = (r"On level 0, code max \(3\) >= length of level \(1\)\." + " NOTE: this index is in an inconsistent state") + label_error = r"Unequal code lengths: \[4, 2\]" # important to check that it's looking at the right thing. 
     with pytest.raises(ValueError, match=length_error):
@@ -253,21 +253,14 @@ def test_from_arrays_empty():
         tm.assert_index_equal(result, expected)

-@pytest.mark.parametrize('invalid_array', [
-    (1),
-    ([1]),
-    ([1, 2]),
-    ([[1], 2]),
-    ('a'),
-    (['a']),
-    (['a', 'b']),
-    ([['a'], 'b']),
-])
-def test_from_arrays_invalid_input(invalid_array):
-    invalid_inputs = [1, [1], [1, 2], [[1], 2],
-                      'a', ['a'], ['a', 'b'], [['a'], 'b']]
-    for i in invalid_inputs:
-        pytest.raises(TypeError, MultiIndex.from_arrays, arrays=i)
+@pytest.mark.parametrize('invalid_sequence_of_arrays', [
+    1, [1], [1, 2], [[1], 2], 'a', ['a'], ['a', 'b'], [['a'], 'b']])
+def test_from_arrays_invalid_input(invalid_sequence_of_arrays):
+    msg = (r"Input must be a list / sequence of array-likes|"
+           r"Input must be list-like|"
+           r"object of type 'int' has no len\(\)")
+    with pytest.raises(TypeError, match=msg):
+        MultiIndex.from_arrays(arrays=invalid_sequence_of_arrays)

 @pytest.mark.parametrize('idx1, idx2', [
@@ -332,9 +325,10 @@ def test_tuples_with_name_string():
     # GH 15110 and GH 14848
     li = [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
-    with pytest.raises(ValueError):
+    msg = "Names should be list-like for a MultiIndex"
+    with pytest.raises(ValueError, match=msg):
         pd.Index(li, name='abc')
-    with pytest.raises(ValueError):
+    with pytest.raises(ValueError, match=msg):
         pd.Index(li, name='a')
@@ -398,7 +392,10 @@ def test_from_product_empty_three_levels(N):
     [['a'], 'b'],
 ])
 def test_from_product_invalid_input(invalid_input):
-    pytest.raises(TypeError, MultiIndex.from_product, iterables=invalid_input)
+    msg = (r"Input must be a list / sequence of iterables|"
+           "Input must be list-like")
+    with pytest.raises(TypeError, match=msg):
+        MultiIndex.from_product(iterables=invalid_input)

 def test_from_product_datetimeindex():
@@ -563,15 +560,15 @@ def test_from_frame_valid_names(names_in, names_out):
     assert mi.names == names_out

-@pytest.mark.parametrize('names_in,names_out', [
-    ('bad_input', ValueError("Names should be list-like for a MultiIndex")),
-    (['a', 'b', 'c'], ValueError("Length of names must match number of "
-                                 "levels in MultiIndex."))
+@pytest.mark.parametrize('names,expected_error_msg', [
+    ('bad_input', "Names should be list-like for a MultiIndex"),
+    (['a', 'b', 'c'],
+     "Length of names must match number of levels in MultiIndex")
 ])
-def test_from_frame_invalid_names(names_in, names_out):
+def test_from_frame_invalid_names(names, expected_error_msg):
     # GH 22420
     df = pd.DataFrame([['a', 'a'], ['a', 'b'], ['b', 'a'], ['b', 'b']],
                       columns=pd.MultiIndex.from_tuples([('L1', 'x'),
                                                          ('L2', 'y')]))
-    with pytest.raises(type(names_out), match=names_out.args[0]):
-        pd.MultiIndex.from_frame(df, names=names_in)
+    with pytest.raises(ValueError, match=expected_error_msg):
+        pd.MultiIndex.from_frame(df, names=names)
diff --git a/pandas/tests/indexes/multi/test_contains.py b/pandas/tests/indexes/multi/test_contains.py
index b73ff11a4dd4e..56836b94a6b03 100644
--- a/pandas/tests/indexes/multi/test_contains.py
+++ b/pandas/tests/indexes/multi/test_contains.py
@@ -83,15 +83,24 @@ def test_isin_level_kwarg():
     tm.assert_numpy_array_equal(expected, idx.isin(vals_1, level=1))
     tm.assert_numpy_array_equal(expected, idx.isin(vals_1, level=-1))

-    pytest.raises(IndexError, idx.isin, vals_0, level=5)
-    pytest.raises(IndexError, idx.isin, vals_0, level=-5)
-
-    pytest.raises(KeyError, idx.isin, vals_0, level=1.0)
-    pytest.raises(KeyError, idx.isin, vals_1, level=-1.0)
-    pytest.raises(KeyError, idx.isin, vals_1, level='A')
+    msg = "Too many levels: Index has only 2 levels, not 6"
+    with pytest.raises(IndexError, match=msg):
+        idx.isin(vals_0, level=5)
+    msg = ("Too many levels: Index has only 2 levels, -5 is not a valid level"
+           " number")
+    with pytest.raises(IndexError, match=msg):
+        idx.isin(vals_0, level=-5)
+
+    with pytest.raises(KeyError, match=r"'Level 1\.0 not found'"):
+        idx.isin(vals_0, level=1.0)
+    with pytest.raises(KeyError, match=r"'Level -1\.0 not found'"):
+        idx.isin(vals_1, level=-1.0)
+    with pytest.raises(KeyError, match="'Level A not found'"):
+        idx.isin(vals_1, level='A')

     idx.names = ['A', 'B']
     tm.assert_numpy_array_equal(expected, idx.isin(vals_0, level='A'))
     tm.assert_numpy_array_equal(expected, idx.isin(vals_1, level='B'))

-    pytest.raises(KeyError, idx.isin, vals_1, level='C')
+    with pytest.raises(KeyError, match="'Level C not found'"):
+        idx.isin(vals_1, level='C')
diff --git a/pandas/tests/indexes/multi/test_drop.py b/pandas/tests/indexes/multi/test_drop.py
index 0cf73d3d752ad..ac167c126fd13 100644
--- a/pandas/tests/indexes/multi/test_drop.py
+++ b/pandas/tests/indexes/multi/test_drop.py
@@ -4,7 +4,7 @@
 import numpy as np
 import pytest

-from pandas.compat import lrange
+from pandas.compat import PY2, lrange
 from pandas.errors import PerformanceWarning

 import pandas as pd
@@ -12,6 +12,7 @@
 import pandas.util.testing as tm

+@pytest.mark.skipif(PY2, reason="pytest.raises match regex fails")
 def test_drop(idx):
     dropped = idx.drop([('foo', 'two'), ('qux', 'one')])
@@ -31,13 +32,17 @@ def test_drop(idx):
     tm.assert_index_equal(dropped, expected)

     index = MultiIndex.from_tuples([('bar', 'two')])
-    pytest.raises(KeyError, idx.drop, [('bar', 'two')])
-    pytest.raises(KeyError, idx.drop, index)
-    pytest.raises(KeyError, idx.drop, ['foo', 'two'])
+    with pytest.raises(KeyError, match=r"^10$"):
+        idx.drop([('bar', 'two')])
+    with pytest.raises(KeyError, match=r"^10$"):
+        idx.drop(index)
+    with pytest.raises(KeyError, match=r"^'two'$"):
+        idx.drop(['foo', 'two'])

     # partially correct argument
     mixed_index = MultiIndex.from_tuples([('qux', 'one'), ('bar', 'two')])
-    pytest.raises(KeyError, idx.drop, mixed_index)
+    with pytest.raises(KeyError, match=r"^10$"):
+        idx.drop(mixed_index)

     # error='ignore'
     dropped = idx.drop(index, errors='ignore')
@@ -59,7 +64,8 @@ def test_drop(idx):

     # mixed partial / full drop / error='ignore'
     mixed_index = ['foo', ('qux', 'one'), 'two']
-    pytest.raises(KeyError, idx.drop, mixed_index)
+    with pytest.raises(KeyError, match=r"^'two'$"):
+        idx.drop(mixed_index)
     dropped = idx.drop(mixed_index, errors='ignore')
     expected = idx[[2, 3, 5]]
     tm.assert_index_equal(dropped, expected)
@@ -98,10 +104,12 @@ def test_droplevel_list():
     expected = index[:2]
     assert dropped.equals(expected)

-    with pytest.raises(ValueError):
+    msg = ("Cannot remove 3 levels from an index with 3 levels: at least one"
+           " level must be left")
+    with pytest.raises(ValueError, match=msg):
         index[:2].droplevel(['one', 'two', 'three'])

-    with pytest.raises(KeyError):
+    with pytest.raises(KeyError, match="'Level four not found'"):
         index[:2].droplevel(['one', 'four'])
diff --git a/pandas/tests/indexes/multi/test_duplicates.py b/pandas/tests/indexes/multi/test_duplicates.py
index af15026de2b34..35034dc57b4b8 100644
--- a/pandas/tests/indexes/multi/test_duplicates.py
+++ b/pandas/tests/indexes/multi/test_duplicates.py
@@ -143,6 +143,18 @@ def test_has_duplicates(idx, idx_dup):
         assert mi.is_unique is False
         assert mi.has_duplicates is True

+    # single instance of NaN
+    mi_nan = MultiIndex(levels=[['a', 'b'], [0, 1]],
+                        codes=[[-1, 0, 0, 1, 1], [-1, 0, 1, 0, 1]])
+    assert mi_nan.is_unique is True
+    assert mi_nan.has_duplicates is False
+
+    # multiple instances of NaN
+    mi_nan_dup = MultiIndex(levels=[['a', 'b'], [0, 1]],
+                            codes=[[-1, -1, 0, 0, 1, 1], [-1, -1, 0, 1, 0, 1]])
+    assert mi_nan_dup.is_unique is False
+    assert mi_nan_dup.has_duplicates is True
+

 def test_has_duplicates_from_tuples():
     # GH 9075
diff --git a/pandas/tests/indexes/multi/test_get_set.py b/pandas/tests/indexes/multi/test_get_set.py
index d201cb2eb178b..62911c7032aca 100644
--- a/pandas/tests/indexes/multi/test_get_set.py
+++ b/pandas/tests/indexes/multi/test_get_set.py
@@ -25,7 +25,9 @@ def test_get_level_number_integer(idx):
     idx.names = [1, 0]
     assert idx._get_level_number(1) == 0
     assert idx._get_level_number(0) == 1
-    pytest.raises(IndexError, idx._get_level_number, 2)
+    msg = "Too many levels: Index has only 2 levels, not 3"
+    with pytest.raises(IndexError, match=msg):
+        idx._get_level_number(2)
     with pytest.raises(KeyError, match='Level fourth not found'):
         idx._get_level_number('fourth')
@@ -62,7 +64,7 @@ def test_get_value_duplicates():
                        names=['tag', 'day'])

     assert index.get_loc('D') == slice(0, 3)
-    with pytest.raises(KeyError):
+    with pytest.raises(KeyError, match=r"^'D'$"):
         index._engine.get_value(np.array([]), 'D')
@@ -125,7 +127,8 @@ def test_set_name_methods(idx, index_names):
     ind = idx.set_names(new_names)
     assert idx.names == index_names
     assert ind.names == new_names
-    with pytest.raises(ValueError, match="^Length"):
+    msg = "Length of names must match number of levels in MultiIndex"
+    with pytest.raises(ValueError, match=msg):
         ind.set_names(new_names + new_names)
     new_names2 = [name + "SUFFIX2" for name in new_names]
     res = ind.set_names(new_names2, inplace=True)
@@ -163,10 +166,10 @@ def test_set_levels_codes_directly(idx):
     minor_codes = [(x + 1) % 1 for x in minor_codes]
     new_codes = [major_codes, minor_codes]

-    with pytest.raises(AttributeError):
+    msg = "can't set attribute"
+    with pytest.raises(AttributeError, match=msg):
         idx.levels = new_levels
-
-    with pytest.raises(AttributeError):
+    with pytest.raises(AttributeError, match=msg):
         idx.codes = new_codes
diff --git a/pandas/tests/indexes/multi/test_indexing.py b/pandas/tests/indexes/multi/test_indexing.py
index c40ecd9e82a07..c2af3b2050d8d 100644
--- a/pandas/tests/indexes/multi/test_indexing.py
+++ b/pandas/tests/indexes/multi/test_indexing.py
@@ -6,7 +6,7 @@
 import numpy as np
 import pytest

-from pandas.compat import lrange
+from pandas.compat import PY2, lrange

 import pandas as pd
 from pandas import (
@@ -112,13 +112,14 @@ def test_slice_locs_not_contained():

 def test_putmask_with_wrong_mask(idx):
     # GH18368
-    with pytest.raises(ValueError):
+    msg = "putmask: mask and data must be the same size"
+    with pytest.raises(ValueError, match=msg):
         idx.putmask(np.ones(len(idx) + 1, np.bool), 1)

-    with pytest.raises(ValueError):
+    with pytest.raises(ValueError, match=msg):
         idx.putmask(np.ones(len(idx) - 1, np.bool), 1)

-    with pytest.raises(ValueError):
+    with pytest.raises(ValueError, match=msg):
         idx.putmask('foo', 1)
@@ -176,9 +177,12 @@ def test_get_indexer():

 def test_get_indexer_nearest():
     midx = MultiIndex.from_tuples([('a', 1), ('b', 2)])
-    with pytest.raises(NotImplementedError):
+    msg = ("method='nearest' not implemented yet for MultiIndex; see GitHub"
+           " issue 9365")
+    with pytest.raises(NotImplementedError, match=msg):
         midx.get_indexer(['a'], method='nearest')
-    with pytest.raises(NotImplementedError):
+    msg = "tolerance not implemented yet for MultiIndex"
+    with pytest.raises(NotImplementedError, match=msg):
         midx.get_indexer(['a'], method='pad', tolerance=2)
@@ -251,20 +255,26 @@ def test_getitem_bool_index_single(ind1, ind2):
     tm.assert_index_equal(idx[ind2], expected)

+@pytest.mark.skipif(PY2, reason="pytest.raises match regex fails")
 def test_get_loc(idx):
     assert idx.get_loc(('foo', 'two')) == 1
     assert idx.get_loc(('baz', 'two')) == 3
-    pytest.raises(KeyError, idx.get_loc, ('bar', 'two'))
-    pytest.raises(KeyError, idx.get_loc, 'quux')
+    with pytest.raises(KeyError, match=r"^10$"):
+        idx.get_loc(('bar', 'two'))
+    with pytest.raises(KeyError, match=r"^'quux'$"):
+        idx.get_loc('quux')

-    pytest.raises(NotImplementedError, idx.get_loc, 'foo',
-                  method='nearest')
+    msg = ("only the default get_loc method is currently supported for"
+           " MultiIndex")
+    with pytest.raises(NotImplementedError, match=msg):
+        idx.get_loc('foo', method='nearest')

     # 3 levels
     index = MultiIndex(levels=[Index(lrange(4)), Index(lrange(4)), Index(
         lrange(4))], codes=[np.array([0, 0, 1, 2, 2, 2, 3, 3]), np.array(
             [0, 1, 0, 0, 0, 1, 0, 1]), np.array([1, 0, 1, 1, 0, 0, 1, 0])])
-    pytest.raises(KeyError, index.get_loc, (1, 1))
+    with pytest.raises(KeyError, match=r"^\(1, 1\)$"):
+        index.get_loc((1, 1))
     assert index.get_loc((2, 0)) == slice(3, 5)
@@ -297,11 +307,14 @@ def test_get_loc_level():
     assert loc == expected
     assert new_index is None

-    pytest.raises(KeyError, index.get_loc_level, (2, 2))
+    with pytest.raises(KeyError, match=r"^\(2, 2\)$"):
+        index.get_loc_level((2, 2))
     # GH 22221: unused label
-    pytest.raises(KeyError, index.drop(2).get_loc_level, 2)
+    with pytest.raises(KeyError, match=r"^2$"):
+        index.drop(2).get_loc_level(2)
     # Unused label on unsorted level:
-    pytest.raises(KeyError, index.drop(1, level=2).get_loc_level, 2, 2)
+    with pytest.raises(KeyError, match=r"^2$"):
+        index.drop(1, level=2).get_loc_level(2, level=2)

     index = MultiIndex(levels=[[2000], lrange(4)], codes=[np.array(
         [0, 0, 0, 0]), np.array([0, 1, 2, 3])])
@@ -342,8 +355,10 @@ def test_get_loc_cast_bool():
     assert idx.get_loc((0, 1)) == 1
     assert idx.get_loc((1, 0)) == 2

-    pytest.raises(KeyError, idx.get_loc, (False, True))
-    pytest.raises(KeyError, idx.get_loc, (True, False))
+    with pytest.raises(KeyError, match=r"^\(False, True\)$"):
+        idx.get_loc((False, True))
+    with pytest.raises(KeyError, match=r"^\(True, False\)$"):
+        idx.get_loc((True, False))

 @pytest.mark.parametrize('level', [0, 1])
@@ -361,9 +376,12 @@ def test_get_loc_missing_nan():
     # GH 8569
     idx = MultiIndex.from_arrays([[1.0, 2.0], [3.0, 4.0]])
     assert isinstance(idx.get_loc(1), slice)
-    pytest.raises(KeyError, idx.get_loc, 3)
-    pytest.raises(KeyError, idx.get_loc, np.nan)
-    pytest.raises(KeyError, idx.get_loc, [np.nan])
+    with pytest.raises(KeyError, match=r"^3\.0$"):
+        idx.get_loc(3)
+    with pytest.raises(KeyError, match=r"^nan$"):
+        idx.get_loc(np.nan)
+    with pytest.raises(KeyError, match=r"^\[nan\]$"):
+        idx.get_loc([np.nan])

 def test_get_indexer_categorical_time():
diff --git a/pandas/tests/indexes/multi/test_integrity.py b/pandas/tests/indexes/multi/test_integrity.py
index c1638a9cde660..a7dc093147725 100644
--- a/pandas/tests/indexes/multi/test_integrity.py
+++ b/pandas/tests/indexes/multi/test_integrity.py
@@ -159,7 +159,8 @@ def test_isna_behavior(idx):
     # should not segfault GH5123
     # NOTE: if MI representation changes, may make sense to allow
    # isna(MI)
-    with pytest.raises(NotImplementedError):
+    msg = "isna is not defined for MultiIndex"
+    with pytest.raises(NotImplementedError, match=msg):
         pd.isna(idx)
@@ -168,16 +169,16 @@ def test_large_multiindex_error():
     df_below_1000000 = pd.DataFrame(
         1, index=pd.MultiIndex.from_product([[1, 2], range(499999)]),
         columns=['dest'])
-    with pytest.raises(KeyError):
+    with pytest.raises(KeyError, match=r"^\(-1, 0\)$"):
         df_below_1000000.loc[(-1, 0), 'dest']
-    with pytest.raises(KeyError):
+    with pytest.raises(KeyError, match=r"^\(3, 0\)$"):
         df_below_1000000.loc[(3, 0), 'dest']
     df_above_1000000 = pd.DataFrame(
         1, index=pd.MultiIndex.from_product([[1, 2], range(500001)]),
         columns=['dest'])
-    with pytest.raises(KeyError):
+    with pytest.raises(KeyError, match=r"^\(-1, 0\)$"):
         df_above_1000000.loc[(-1, 0), 'dest']
-    with pytest.raises(KeyError):
+    with pytest.raises(KeyError, match=r"^\(3, 0\)$"):
         df_above_1000000.loc[(3, 0), 'dest']
@@ -260,7 +261,9 @@ def test_hash_error(indices):
 def test_mutability(indices):
     if not len(indices):
         return
-    pytest.raises(TypeError, indices.__setitem__, 0, indices[0])
+    msg = "Index does not support mutable operations"
+    with pytest.raises(TypeError, match=msg):
+        indices[0] = indices[0]

 def test_wrong_number_names(indices):
diff --git a/pandas/tests/indexes/multi/test_set_ops.py b/pandas/tests/indexes/multi/test_set_ops.py
index 208d6cf1c639f..41a0e1e59e8a5 100644
--- a/pandas/tests/indexes/multi/test_set_ops.py
+++ b/pandas/tests/indexes/multi/test_set_ops.py
@@ -9,7 +9,7 @@

 @pytest.mark.parametrize("case", [0.5, "xxx"])
-@pytest.mark.parametrize("sort", [True, False])
+@pytest.mark.parametrize("sort", [None, False])
 @pytest.mark.parametrize("method", ["intersection", "union",
                                     "difference", "symmetric_difference"])
 def test_set_ops_error_cases(idx, case, sort, method):
@@ -19,13 +19,13 @@ def test_set_ops_error_cases(idx, case, sort, method):
         getattr(idx, method)(case, sort=sort)

-@pytest.mark.parametrize("sort", [True, False])
+@pytest.mark.parametrize("sort", [None, False])
 def test_intersection_base(idx, sort):
     first = idx[:5]
     second = idx[:3]
     intersect = first.intersection(second, sort=sort)

-    if sort:
+    if sort is None:
         tm.assert_index_equal(intersect, second.sort_values())
     assert tm.equalContents(intersect, second)
@@ -34,7 +34,7 @@ def test_intersection_base(idx, sort):
              for klass in [np.array, Series, list]]
     for case in cases:
         result = first.intersection(case, sort=sort)
-        if sort:
+        if sort is None:
             tm.assert_index_equal(result, second.sort_values())
         assert tm.equalContents(result, second)
@@ -43,13 +43,13 @@ def test_intersection_base(idx, sort):
         first.intersection([1, 2, 3], sort=sort)

-@pytest.mark.parametrize("sort", [True, False])
+@pytest.mark.parametrize("sort", [None, False])
 def test_union_base(idx, sort):
     first = idx[3:]
     second = idx[:5]
     everything = idx
     union = first.union(second, sort=sort)
-    if sort:
+    if sort is None:
         tm.assert_index_equal(union, everything.sort_values())
     assert tm.equalContents(union, everything)
@@ -58,7 +58,7 @@ def test_union_base(idx, sort):
              for klass in [np.array, Series, list]]
     for case in cases:
         result = first.union(case, sort=sort)
-        if sort:
+        if sort is None:
             tm.assert_index_equal(result, everything.sort_values())
         assert tm.equalContents(result, everything)
@@ -67,13 +67,13 @@ def test_union_base(idx, sort):
         first.union([1, 2, 3], sort=sort)

-@pytest.mark.parametrize("sort", [True, False])
+@pytest.mark.parametrize("sort", [None, False])
 def test_difference_base(idx, sort):
     second = idx[4:]
     answer = idx[:4]
     result = idx.difference(second, sort=sort)

-    if sort:
+    if sort is None:
         answer = answer.sort_values()

     assert result.equals(answer)
@@ -91,14 +91,14 @@ def test_difference_base(idx, sort):
         idx.difference([1, 2, 3], sort=sort)
-@pytest.mark.parametrize("sort", [True, False])
+@pytest.mark.parametrize("sort", [None, False])
 def test_symmetric_difference(idx, sort):
     first = idx[1:]
     second = idx[:-1]
     answer = idx[[-1, 0]]
     result = first.symmetric_difference(second, sort=sort)

-    if sort:
+    if sort is None:
         answer = answer.sort_values()

     tm.assert_index_equal(result, answer)
@@ -121,14 +121,14 @@ def test_empty(idx):
     assert idx[:0].empty

-@pytest.mark.parametrize("sort", [True, False])
+@pytest.mark.parametrize("sort", [None, False])
 def test_difference(idx, sort):

     first = idx
     result = first.difference(idx[-3:], sort=sort)
     vals = idx[:-3].values

-    if sort:
+    if sort is None:
         vals = sorted(vals)

     expected = MultiIndex.from_tuples(vals,
@@ -189,14 +189,62 @@ def test_difference(idx, sort):
         first.difference([1, 2, 3, 4, 5], sort=sort)

-@pytest.mark.parametrize("sort", [True, False])
+def test_difference_sort_special():
+    # GH-24959
+    idx = pd.MultiIndex.from_product([[1, 0], ['a', 'b']])
+    # sort=None, the default
+    result = idx.difference([])
+    tm.assert_index_equal(result, idx)
+
+
+@pytest.mark.xfail(reason="Not implemented.")
+def test_difference_sort_special_true():
+    # TODO decide on True behaviour
+    idx = pd.MultiIndex.from_product([[1, 0], ['a', 'b']])
+    result = idx.difference([], sort=True)
+    expected = pd.MultiIndex.from_product([[0, 1], ['a', 'b']])
+    tm.assert_index_equal(result, expected)
+
+
+def test_difference_sort_incomparable():
+    # GH-24959
+    idx = pd.MultiIndex.from_product([[1, pd.Timestamp('2000'), 2],
+                                      ['a', 'b']])
+
+    other = pd.MultiIndex.from_product([[3, pd.Timestamp('2000'), 4],
+                                        ['c', 'd']])
+    # sort=None, the default
+    # MultiIndex.difference deviates here from other difference
+    # implementations in not catching the TypeError
+    with pytest.raises(TypeError):
+        result = idx.difference(other)
+
+    # sort=False
+    result = idx.difference(other, sort=False)
+    tm.assert_index_equal(result, idx)
+
+
+@pytest.mark.xfail(reason="Not implemented.")
+def test_difference_sort_incomparable_true():
+    # TODO decide on True behaviour
+    # # sort=True, raises
+    idx = pd.MultiIndex.from_product([[1, pd.Timestamp('2000'), 2],
+                                      ['a', 'b']])
+    other = pd.MultiIndex.from_product([[3, pd.Timestamp('2000'), 4],
+                                        ['c', 'd']])
+
+    with pytest.raises(TypeError):
+        idx.difference(other, sort=True)
+
+
+@pytest.mark.parametrize("sort", [None, False])
 def test_union(idx, sort):
     piece1 = idx[:5][::-1]
     piece2 = idx[3:]

     the_union = piece1.union(piece2, sort=sort)

-    if sort:
+    if sort is None:
         tm.assert_index_equal(the_union, idx.sort_values())

     assert tm.equalContents(the_union, idx)
@@ -225,14 +273,14 @@ def test_union(idx, sort):
     #     assert result.equals(result2)

-@pytest.mark.parametrize("sort", [True, False])
+@pytest.mark.parametrize("sort", [None, False])
 def test_intersection(idx, sort):
     piece1 = idx[:5][::-1]
     piece2 = idx[3:]

     the_int = piece1.intersection(piece2, sort=sort)

-    if sort:
+    if sort is None:
         tm.assert_index_equal(the_int, idx[3:5])
     assert tm.equalContents(the_int, idx[3:5])
@@ -249,3 +297,76 @@ def test_intersection(idx, sort):
     #     tuples = _index.values
     #     result = _index & tuples
     #     assert result.equals(tuples)
+
+
+def test_intersect_equal_sort():
+    # GH-24959
+    idx = pd.MultiIndex.from_product([[1, 0], ['a', 'b']])
+    tm.assert_index_equal(idx.intersection(idx, sort=False), idx)
+    tm.assert_index_equal(idx.intersection(idx, sort=None), idx)
+
+
+@pytest.mark.xfail(reason="Not implemented.")
+def test_intersect_equal_sort_true():
+    # TODO decide on True behaviour
+    idx = pd.MultiIndex.from_product([[1, 0], ['a', 'b']])
+    sorted_ = pd.MultiIndex.from_product([[0, 1], ['a', 'b']])
+    tm.assert_index_equal(idx.intersection(idx, sort=True), sorted_)
+
+
+@pytest.mark.parametrize('slice_', [slice(None), slice(0)])
+def test_union_sort_other_empty(slice_):
+    # https://github.com/pandas-dev/pandas/issues/24959
+    idx = pd.MultiIndex.from_product([[1, 0], ['a', 'b']])
+
+    # default, sort=None
+    other = idx[slice_]
+    tm.assert_index_equal(idx.union(other), idx)
+    # MultiIndex does not special case empty.union(idx)
+    # tm.assert_index_equal(other.union(idx), idx)
+
+    # sort=False
+    tm.assert_index_equal(idx.union(other, sort=False), idx)
+
+
+@pytest.mark.xfail(reason="Not implemented.")
+def test_union_sort_other_empty_sort(slice_):
+    # TODO decide on True behaviour
+    # # sort=True
+    idx = pd.MultiIndex.from_product([[1, 0], ['a', 'b']])
+    other = idx[:0]
+    result = idx.union(other, sort=True)
+    expected = pd.MultiIndex.from_product([[0, 1], ['a', 'b']])
+    tm.assert_index_equal(result, expected)
+
+
+def test_union_sort_other_incomparable():
+    # https://github.com/pandas-dev/pandas/issues/24959
+    idx = pd.MultiIndex.from_product([[1, pd.Timestamp('2000')], ['a', 'b']])
+
+    # default, sort=None
+    result = idx.union(idx[:1])
+    tm.assert_index_equal(result, idx)
+
+    # sort=False
+    result = idx.union(idx[:1], sort=False)
+    tm.assert_index_equal(result, idx)
+
+
+@pytest.mark.xfail(reason="Not implemented.")
+def test_union_sort_other_incomparable_sort():
+    # TODO decide on True behaviour
+    # # sort=True
+    idx = pd.MultiIndex.from_product([[1, pd.Timestamp('2000')], ['a', 'b']])
+    with pytest.raises(TypeError, match='Cannot compare'):
+        idx.union(idx[:1], sort=True)
+
+
+@pytest.mark.parametrize("method", ['union', 'intersection', 'difference',
+                                    'symmetric_difference'])
+def test_setops_disallow_true(method):
+    idx1 = pd.MultiIndex.from_product([['a', 'b'], [1, 2]])
+    idx2 = pd.MultiIndex.from_product([['b', 'c'], [1, 2]])
+
+    with pytest.raises(ValueError, match="The 'sort' keyword only takes"):
+        getattr(idx1, method)(idx2, sort=True)
diff --git a/pandas/tests/indexes/period/test_asfreq.py b/pandas/tests/indexes/period/test_asfreq.py
index 2dd49e7e0845e..30b416e3fe9dd 100644
--- a/pandas/tests/indexes/period/test_asfreq.py
+++ b/pandas/tests/indexes/period/test_asfreq.py
@@ -67,7 +67,9 @@ def test_asfreq(self):
         assert pi7.asfreq('H', 'S') == pi5
         assert pi7.asfreq('Min', 'S') == pi6

-        pytest.raises(ValueError, pi7.asfreq, 'T', 'foo')
+        msg = "How must be one of S or E"
+        with pytest.raises(ValueError, match=msg):
+            pi7.asfreq('T', 'foo')
         result1 = pi1.asfreq('3M')
         result2 = pi1.asfreq('M')
         expected = period_range(freq='M', start='2001-12', end='2001-12')
diff --git a/pandas/tests/indexes/period/test_construction.py b/pandas/tests/indexes/period/test_construction.py
index 916260c4cee7e..f1adeca7245f6 100644
--- a/pandas/tests/indexes/period/test_construction.py
+++ b/pandas/tests/indexes/period/test_construction.py
@@ -1,6 +1,7 @@
 import numpy as np
 import pytest

+from pandas._libs.tslibs.period import IncompatibleFrequency
 from pandas.compat import PY3, lmap, lrange, text_type

 from pandas.core.dtypes.dtypes import PeriodDtype
@@ -66,12 +67,17 @@ def test_constructor_field_arrays(self):
         years = [2007, 2007, 2007]
         months = [1, 2]

-        pytest.raises(ValueError, PeriodIndex, year=years, month=months,
-                      freq='M')
-        pytest.raises(ValueError, PeriodIndex, year=years, month=months,
-                      freq='2M')
-        pytest.raises(ValueError, PeriodIndex, year=years, month=months,
-                      freq='M', start=Period('2007-01', freq='M'))
+
+        msg = "Mismatched Period array lengths"
+        with pytest.raises(ValueError, match=msg):
+            PeriodIndex(year=years, month=months, freq='M')
+        with pytest.raises(ValueError, match=msg):
+            PeriodIndex(year=years, month=months, freq='2M')
+
+        msg = "Can either instantiate from fields or endpoints, but not both"
+        with pytest.raises(ValueError, match=msg):
+            PeriodIndex(year=years, month=months, freq='M',
+                        start=Period('2007-01', freq='M'))

         years = [2007, 2007, 2007]
         months = [1, 2, 3]
@@ -81,8 +87,8 @@ def test_constructor_field_arrays(self):

     def test_constructor_U(self):
         # U was used as undefined period
-        pytest.raises(ValueError, period_range, '2007-1-1', periods=500,
-                      freq='X')
+        with pytest.raises(ValueError, match="Invalid frequency: X"):
+            period_range('2007-1-1', periods=500, freq='X')

     def test_constructor_nano(self):
         idx = period_range(start=Period(ordinal=1, freq='N'),
@@ -103,17 +109,29 @@ def test_constructor_arrays_negative_year(self):
         tm.assert_index_equal(pindex.quarter, pd.Index(quarters))

     def test_constructor_invalid_quarters(self):
-        pytest.raises(ValueError, PeriodIndex, year=lrange(2000, 2004),
-                      quarter=lrange(4), freq='Q-DEC')
+        msg = "Quarter must be 1 <= q <= 4"
+        with pytest.raises(ValueError, match=msg):
+            PeriodIndex(year=lrange(2000, 2004), quarter=lrange(4),
+                        freq='Q-DEC')

     def test_constructor_corner(self):
-        pytest.raises(ValueError, PeriodIndex, periods=10, freq='A')
+        msg = "Not enough parameters to construct Period range"
+        with pytest.raises(ValueError, match=msg):
+            PeriodIndex(periods=10, freq='A')

         start = Period('2007', freq='A-JUN')
         end = Period('2010', freq='A-DEC')
-        pytest.raises(ValueError, PeriodIndex, start=start, end=end)
-        pytest.raises(ValueError, PeriodIndex, start=start)
-        pytest.raises(ValueError, PeriodIndex, end=end)
+
+        msg = "start and end must have same freq"
+        with pytest.raises(ValueError, match=msg):
+            PeriodIndex(start=start, end=end)
+
+        msg = ("Of the three parameters: start, end, and periods, exactly two"
+               " must be specified")
+        with pytest.raises(ValueError, match=msg):
+            PeriodIndex(start=start)
+        with pytest.raises(ValueError, match=msg):
+            PeriodIndex(end=end)

         result = period_range('2007-01', periods=10.5, freq='M')
         exp = period_range('2007-01', periods=10, freq='M')
@@ -126,10 +144,15 @@ def test_constructor_fromarraylike(self):
         tm.assert_index_equal(PeriodIndex(idx.values), idx)
         tm.assert_index_equal(PeriodIndex(list(idx.values)), idx)

-        pytest.raises(ValueError, PeriodIndex, idx._ndarray_values)
-        pytest.raises(ValueError, PeriodIndex, list(idx._ndarray_values))
-        pytest.raises(TypeError, PeriodIndex,
-                      data=Period('2007', freq='A'))
+        msg = "freq not specified and cannot be inferred"
+        with pytest.raises(ValueError, match=msg):
+            PeriodIndex(idx._ndarray_values)
+        with pytest.raises(ValueError, match=msg):
+            PeriodIndex(list(idx._ndarray_values))
+
+        msg = "'Period' object is not iterable"
+        with pytest.raises(TypeError, match=msg):
+            PeriodIndex(data=Period('2007', freq='A'))

         result = PeriodIndex(iter(idx))
         tm.assert_index_equal(result, idx)
@@ -160,7 +183,9 @@ def test_constructor_datetime64arr(self):
         vals = np.arange(100000, 100000 + 10000, 100, dtype=np.int64)
         vals = vals.view(np.dtype('M8[us]'))

-        pytest.raises(ValueError, PeriodIndex, vals, freq='D')
+        msg = r"Wrong dtype: datetime64\[us\]"
+        with pytest.raises(ValueError, match=msg):
+            PeriodIndex(vals, freq='D')

     @pytest.mark.parametrize('box', [None, 'series', 'index'])
     def test_constructor_datetime64arr_ok(self, box):
@@ -300,17 +325,20 @@ def test_constructor_simple_new_empty(self):

     @pytest.mark.parametrize('floats', [[1.1, 2.1], np.array([1.1, 2.1])])
     def test_constructor_floats(self, floats):
-        with pytest.raises(TypeError):
+        msg = r"PeriodIndex\._simple_new does not accept floats"
+        with pytest.raises(TypeError, match=msg):
             pd.PeriodIndex._simple_new(floats, freq='M')

-        with pytest.raises(TypeError):
+        msg = "PeriodIndex does not allow floating point in construction"
+        with pytest.raises(TypeError, match=msg):
             pd.PeriodIndex(floats, freq='M')

     def test_constructor_nat(self):
-        pytest.raises(ValueError, period_range, start='NaT',
-                      end='2011-01-01', freq='M')
-        pytest.raises(ValueError, period_range, start='2011-01-01',
-                      end='NaT', freq='M')
+        msg = "start and end must not be NaT"
+        with pytest.raises(ValueError, match=msg):
+            period_range(start='NaT', end='2011-01-01', freq='M')
+        with pytest.raises(ValueError, match=msg):
+            period_range(start='2011-01-01', end='NaT', freq='M')

     def test_constructor_year_and_quarter(self):
         year = pd.Series([2001, 2002, 2003])
@@ -455,9 +483,12 @@ def test_constructor(self):

         # Mixed freq should fail
         vals = [end_intv, Period('2006-12-31', 'w')]
-        pytest.raises(ValueError, PeriodIndex, vals)
+        msg = r"Input has different freq=W-SUN from PeriodIndex\(freq=B\)"
+        with pytest.raises(IncompatibleFrequency, match=msg):
+            PeriodIndex(vals)
         vals = np.array(vals)
-        pytest.raises(ValueError, PeriodIndex, vals)
+        with pytest.raises(IncompatibleFrequency, match=msg):
+            PeriodIndex(vals)

     def test_constructor_error(self):
         start = Period('02-Apr-2005', 'B')
@@ -508,7 +539,8 @@ def setup_method(self, method):
         self.series = Series(period_range('2000-01-01', periods=10,
                                           freq='D'))

     def test_constructor_cant_cast_period(self):
-        with pytest.raises(TypeError):
+        msg = "Cannot cast PeriodArray to dtype float64"
+        with pytest.raises(TypeError, match=msg):
             Series(period_range('2000-01-01', periods=10, freq='D'),
                    dtype=float)
diff --git a/pandas/tests/indexes/period/test_indexing.py b/pandas/tests/indexes/period/test_indexing.py
index 47c2edfd13395..fa8199b4e6163 100644
--- a/pandas/tests/indexes/period/test_indexing.py
+++ b/pandas/tests/indexes/period/test_indexing.py
@@ -84,7 +84,8 @@ def test_getitem_partial(self):
         rng = period_range('2007-01', periods=50, freq='M')
         ts = Series(np.random.randn(len(rng)), rng)

-        pytest.raises(KeyError, ts.__getitem__, '2006')
+        with pytest.raises(KeyError, match=r"^'2006'$"):
+            ts['2006']

         result = ts['2008']
         assert (result.index.year == 2008).all()
@@ -326,7 +327,8 @@ def test_take_fill_value(self):
         with pytest.raises(ValueError, match=msg):
             idx.take(np.array([1, 0, -5]), fill_value=True)

-        with pytest.raises(IndexError):
+        msg = "index -5 is out of bounds for size 3"
+        with pytest.raises(IndexError, match=msg):
             idx.take(np.array([1, -5]))
@@ -335,7 +337,8 @@ class TestIndexing(object):
     def test_get_loc_msg(self):
         idx = period_range('2000-1-1', freq='A', periods=10)
         bad_period = Period('2012', 'A')
-        pytest.raises(KeyError, idx.get_loc, bad_period)
+        with pytest.raises(KeyError, match=r"^Period\('2012', 'A-DEC'\)$"):
+            idx.get_loc(bad_period)

         try:
             idx.get_loc(bad_period)
@@ -373,8 +376,13 @@ def test_get_loc(self):
         msg = "Cannot interpret 'foo' as period"
         with pytest.raises(KeyError, match=msg):
             idx0.get_loc('foo')
-        pytest.raises(KeyError, idx0.get_loc, 1.1)
-        pytest.raises(TypeError, idx0.get_loc, idx0)
+        with pytest.raises(KeyError, match=r"^1\.1$"):
+            idx0.get_loc(1.1)
+
+        msg = (r"'PeriodIndex\(\['2017-09-01', '2017-09-02', '2017-09-03'\],"
+               r" dtype='period\[D\]', freq='D'\)' is an invalid key")
+        with pytest.raises(TypeError, match=msg):
+            idx0.get_loc(idx0)

         # get the location of p1/p2 from
         # monotonic increasing PeriodIndex with duplicate
@@ -391,8 +399,13 @@ def test_get_loc(self):
         with pytest.raises(KeyError, match=msg):
             idx1.get_loc('foo')
-        pytest.raises(KeyError, idx1.get_loc, 1.1)
-        pytest.raises(TypeError, idx1.get_loc, idx1)
+        with pytest.raises(KeyError, match=r"^1\.1$"):
+            idx1.get_loc(1.1)
+
+        msg = (r"'PeriodIndex\(\['2017-09-02', '2017-09-02', '2017-09-03'\],"
+               r" dtype='period\[D\]', freq='D'\)' is an invalid key")
+        with pytest.raises(TypeError, match=msg):
+            idx1.get_loc(idx1)

         # get the location of p1/p2 from
         # non-monotonic increasing/decreasing PeriodIndex with duplicate
@@ -441,18 +454,6 @@ def test_is_monotonic_decreasing(self):
         assert idx_dec1.is_monotonic_decreasing is True
         assert idx.is_monotonic_decreasing is False

-    def test_is_unique(self):
-        # GH 17717
-        p0 = pd.Period('2017-09-01')
-        p1 = pd.Period('2017-09-02')
-        p2 = pd.Period('2017-09-03')
-
-        idx0 = pd.PeriodIndex([p0, p1, p2])
-        assert idx0.is_unique is True
-
-        idx1 = pd.PeriodIndex([p1, p1, p2])
-        assert idx1.is_unique is False
-
     def test_contains(self):
         # GH 17717
         p0 = pd.Period('2017-09-01')
@@ -581,7 +582,7 @@ def test_get_loc2(self):
         msg = 'Input has different freq=None from PeriodArray\\(freq=D\\)'
         with pytest.raises(ValueError, match=msg):
             idx.get_loc('2000-01-10', method='nearest', tolerance='1 hour')
-        with pytest.raises(KeyError):
+        with pytest.raises(KeyError, match=r"^Period\('2000-01-10', 'D'\)$"):
             idx.get_loc('2000-01-10', method='nearest', tolerance='1 day')
         with pytest.raises(
                 ValueError,
diff --git a/pandas/tests/indexes/period/test_period.py b/pandas/tests/indexes/period/test_period.py
index 464ff7aa5d58d..89bcf56dbda71 100644
--- a/pandas/tests/indexes/period/test_period.py
+++ b/pandas/tests/indexes/period/test_period.py
@@ -71,13 +71,15 @@ def test_fillna_period(self):
             pd.Period('2011-01-01', freq='D')), exp)

     def test_no_millisecond_field(self):
-        with pytest.raises(AttributeError):
+        msg = "type object 'DatetimeIndex' has no attribute 'millisecond'"
+        with pytest.raises(AttributeError, match=msg):
             DatetimeIndex.millisecond

-        with pytest.raises(AttributeError):
+        msg = "'DatetimeIndex' object has no attribute 'millisecond'"
+        with pytest.raises(AttributeError, match=msg):
             DatetimeIndex([]).millisecond

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_difference_freq(self, sort):
         # GH14323: difference of Period MUST preserve frequency
         # but the ability to union results must be preserved
@@ -98,8 +100,8 @@ def test_difference_freq(self, sort):

     def test_hash_error(self):
         index = period_range('20010101', periods=10)
-        with pytest.raises(TypeError, match=("unhashable type: %r" %
-                                             type(index).__name__)):
+        msg = "unhashable type: '{}'".format(type(index).__name__)
+        with pytest.raises(TypeError, match=msg):
             hash(index)

     def test_make_time_series(self):
@@ -124,7 +126,8 @@ def test_shallow_copy_i8(self):

     def test_shallow_copy_changing_freq_raises(self):
         pi = period_range("2018-01-01", periods=3, freq="2D")
-        with pytest.raises(IncompatibleFrequency, match="are different"):
+        msg = "specified freq and dtype are different"
+        with pytest.raises(IncompatibleFrequency, match=msg):
             pi._shallow_copy(pi, freq="H")

     def test_dtype_str(self):
@@ -214,21 +217,17 @@ def test_period_index_length(self):
         assert (i1 == i2).all()
         assert i1.freq == i2.freq

-        try:
+        msg = "start and end must have same freq"
+        with pytest.raises(ValueError, match=msg):
             period_range(start=start, end=end_intv)
-            raise AssertionError('Cannot allow mixed freq for start and end')
-        except ValueError:
-            pass

         end_intv = Period('2005-05-01', 'B')
         i1 = period_range(start=start, end=end_intv)

-        try:
+        msg = ("Of the three parameters: start, end, and periods, exactly two"
+               " must be specified")
+        with pytest.raises(ValueError, match=msg):
             period_range(start=start)
-            raise AssertionError(
-                'Must specify periods if missing start or end')
-        except ValueError:
-            pass

         # infer freq from first element
         i2 = PeriodIndex([end_intv, Period('2005-05-05', 'B')])
@@ -241,9 +240,12 @@ def test_period_index_length(self):

         # Mixed freq should fail
         vals = [end_intv, Period('2006-12-31', 'w')]
-        pytest.raises(ValueError, PeriodIndex, vals)
+        msg = r"Input has different freq=W-SUN from PeriodIndex\(freq=B\)"
+        with pytest.raises(IncompatibleFrequency, match=msg):
+            PeriodIndex(vals)
         vals = np.array(vals)
-        pytest.raises(ValueError, PeriodIndex, vals)
+        with pytest.raises(ValueError, match=msg):
+            PeriodIndex(vals)

     def test_fields(self):
         # year, month, day, hour, minute
@@ -381,7 +383,9 @@ def test_contains_nat(self):
         assert np.nan in idx

     def test_periods_number_check(self):
-        with pytest.raises(ValueError):
+        msg = ("Of the three parameters: start, end, and periods, exactly two"
+               " must be specified")
+        with pytest.raises(ValueError, match=msg):
             period_range('2011-1-1', '2012-1-1', 'B')

     def test_start_time(self):
@@ -500,7 +504,8 @@ def test_is_full(self):
         assert index.is_full

         index = PeriodIndex([2006, 2005, 2005], freq='A')
-        pytest.raises(ValueError, getattr, index, 'is_full')
+        with pytest.raises(ValueError, match="Index is not monotonic"):
+            index.is_full

         assert index[:0].is_full
@@ -574,5 +579,6 @@ def test_maybe_convert_timedelta():
     assert pi._maybe_convert_timedelta(2) == 2

     offset = offsets.BusinessDay()
-    with pytest.raises(ValueError, match='freq'):
+    msg = r"Input has different freq=B from PeriodIndex\(freq=D\)"
+    with pytest.raises(ValueError, match=msg):
         pi._maybe_convert_timedelta(offset)
diff --git a/pandas/tests/indexes/period/test_setops.py b/pandas/tests/indexes/period/test_setops.py
index a97ab47bcda16..bf29edad4841e 100644
--- a/pandas/tests/indexes/period/test_setops.py
+++ b/pandas/tests/indexes/period/test_setops.py
@@ -38,7 +38,7 @@ def test_join_does_not_recur(self):
                          df.columns[0], df.columns[1]], object)
         tm.assert_index_equal(res, expected)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_union(self, sort):
         # union
         other1 = pd.period_range('1/1/2000', freq='D', periods=5)
@@ -97,11 +97,11 @@ def test_union(self, sort):
                                   (rng8, other8, expected8)]:

             result_union = rng.union(other, sort=sort)
-            if sort:
+            if sort is None:
                 expected = expected.sort_values()
             tm.assert_index_equal(result_union, expected)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_union_misc(self, sort):
         index = period_range('1/1/2000', '1/20/2000', freq='D')
@@ -110,7 +110,7 @@ def test_union_misc(self, sort):

         # not in order
         result = _permute(index[:-5]).union(_permute(index[10:]), sort=sort)
-        if sort:
+        if sort is None:
             tm.assert_index_equal(result, index)
         assert tm.equalContents(result, index)
@@ -139,7 +139,7 @@ def test_union_dataframe_index(self):
         exp = pd.period_range('1/1/1980', '1/1/2012', freq='M')
         tm.assert_index_equal(df.index, exp)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_intersection(self, sort):
         index = period_range('1/1/2000', '1/20/2000', freq='D')
@@ -150,7 +150,7 @@ def test_intersection(self, sort):
         left = _permute(index[:-5])
         right = _permute(index[10:])
         result = left.intersection(right, sort=sort)
-        if sort:
+        if sort is None:
             tm.assert_index_equal(result, index[10:-5])
         assert tm.equalContents(result, index[10:-5])
@@ -164,7 +164,7 @@ def test_intersection(self, sort):
         with pytest.raises(period.IncompatibleFrequency):
             index.intersection(index3, sort=sort)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_intersection_cases(self, sort):
         base = period_range('6/1/2000', '6/30/2000', freq='D', name='idx')
@@ -210,7 +210,7 @@ def test_intersection_cases(self, sort):
         for (rng, expected) in [(rng2, expected2), (rng3, expected3),
                                 (rng4, expected4)]:
             result = base.intersection(rng, sort=sort)
-            if sort:
+            if sort is None:
                 expected = expected.sort_values()
             tm.assert_index_equal(result, expected)
             assert result.name == expected.name
@@ -224,7 +224,7 @@ def test_intersection_cases(self, sort):
         result = rng.intersection(rng[0:0])
         assert len(result) == 0

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_difference(self, sort):
         # diff
         period_rng = ['1/3/2000', '1/2/2000', '1/1/2000', '1/5/2000',
@@ -276,6 +276,6 @@ def test_difference(self, sort):
                                    (rng6, other6, expected6),
                                    (rng7, other7, expected7), ]:
             result_difference = rng.difference(other, sort=sort)
-            if sort:
+            if sort is None:
                 expected = expected.sort_values()
             tm.assert_index_equal(result_difference, expected)
diff --git a/pandas/tests/indexes/test_base.py b/pandas/tests/indexes/test_base.py
index f3e9d835c7391..c99007cef90d4 100644
--- a/pandas/tests/indexes/test_base.py
+++ b/pandas/tests/indexes/test_base.py
@@ -3,6 +3,7 @@
 from collections import defaultdict
 from datetime import datetime, timedelta
 import math
+import operator
 import sys

 import numpy as np
@@ -684,12 +685,12 @@ def test_empty_fancy_raises(self, attr):
         # np.ndarray only accepts ndarray of int & bool dtypes, so should Index
         pytest.raises(IndexError, index.__getitem__, empty_farr)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_intersection(self, sort):
         first = self.strIndex[:20]
         second = self.strIndex[:10]
         intersect = first.intersection(second, sort=sort)

-        if sort:
+        if sort is None:
             tm.assert_index_equal(intersect, second.sort_values())

         assert tm.equalContents(intersect, second)
@@ -701,7 +702,7 @@ def test_intersection(self, sort):
         (Index([3, 4, 5, 6, 7], name="index"), True),   # preserve same name
         (Index([3, 4, 5, 6, 7], name="other"), False),  # drop diff names
         (Index([3, 4, 5, 6, 7]), False)])
-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_intersection_name_preservation(self, index2, keeps_name, sort):
         index1 = Index([1, 2, 3, 4, 5], name='index')
         expected = Index([3, 4, 5])
@@ -715,7 +716,7 @@ def test_intersection_name_preservation(self, index2, keeps_name, sort):

     @pytest.mark.parametrize("first_name,second_name,expected_name", [
         ('A', 'A', 'A'), ('A', 'B', None), (None, 'B', None)])
-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_intersection_name_preservation2(self, first_name, second_name,
@@ -728,7 +729,7 @@ def test_intersection_name_preservation2(self, first_name, second_name,
     @pytest.mark.parametrize("index2,keeps_name", [
         (Index([4, 7, 6, 5, 3], name='index'), True),
         (Index([4, 7, 6, 5, 3], name='other'), False)])
-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_intersection_monotonic(self, index2, keeps_name, sort):
         index1 = Index([5, 3, 2, 4, 1], name='index')
         expected = Index([5, 3, 4])
@@ -737,25 +738,25 @@ def test_intersection_monotonic(self, index2, keeps_name, sort):
             expected.name = "index"

         result = index1.intersection(index2, sort=sort)
-        if sort:
+        if sort is None:
             expected = expected.sort_values()
         tm.assert_index_equal(result, expected)

     @pytest.mark.parametrize("index2,expected_arr", [
         (Index(['B', 'D']), ['B']),
         (Index(['B', 'D', 'A']), ['A', 'B', 'A'])])
-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_intersection_non_monotonic_non_unique(self, index2,
                                                    expected_arr, sort):
         # non-monotonic non-unique
         index1 = Index(['A', 'B', 'A', 'C'])
         expected = Index(expected_arr, dtype='object')
         result = index1.intersection(index2, sort=sort)
-        if sort:
+        if sort is None:
             expected = expected.sort_values()
         tm.assert_index_equal(result, expected)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_intersect_str_dates(self, sort):
         dt_dates = [datetime(2012, 2, 9), datetime(2012, 2, 22)]
@@ -765,7 +766,24 @@ def test_intersect_str_dates(self, sort):

         assert len(result) == 0

-    @pytest.mark.parametrize("sort", [True, False])
+    def test_intersect_nosort(self):
+        result = pd.Index(['c', 'b', 'a']).intersection(['b', 'a'])
+        expected = pd.Index(['b', 'a'])
+        tm.assert_index_equal(result, expected)
+
+    def test_intersection_equal_sort(self):
+        idx = pd.Index(['c', 'a', 'b'])
+        tm.assert_index_equal(idx.intersection(idx, sort=False), idx)
+        tm.assert_index_equal(idx.intersection(idx, sort=None), idx)
+
+    @pytest.mark.xfail(reason="Not implemented")
+    def test_intersection_equal_sort_true(self):
+        # TODO decide on True behaviour
+        idx = pd.Index(['c', 'a', 'b'])
+        sorted_ = pd.Index(['a', 'b', 'c'])
+        tm.assert_index_equal(idx.intersection(idx, sort=True), sorted_)
+
+    @pytest.mark.parametrize("sort", [None, False])
     def test_chained_union(self, sort):
         # Chained unions handles names correctly
         i1 = Index([1, 2], name='i1')
@@ -782,7 +800,7 @@ def test_chained_union(self, sort):
         expected = j1.union(j2, sort=sort).union(j3, sort=sort)
         tm.assert_index_equal(union, expected)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_union(self, sort):
         # TODO: Replace with fixturesult
         first = self.strIndex[5:20]
@@ -790,13 +808,65 @@ def test_union(self, sort):
         everything = self.strIndex[:20]

         union = first.union(second, sort=sort)
-        if sort:
+        if sort is None:
             tm.assert_index_equal(union, everything.sort_values())
         assert tm.equalContents(union, everything)

+    @pytest.mark.parametrize('slice_', [slice(None), slice(0)])
+    def test_union_sort_other_special(self, slice_):
+        # https://github.com/pandas-dev/pandas/issues/24959
+
+        idx = pd.Index([1, 0, 2])
+        # default, sort=None
+        other = idx[slice_]
+        tm.assert_index_equal(idx.union(other), idx)
+        tm.assert_index_equal(other.union(idx), idx)
+
+        # sort=False
+        tm.assert_index_equal(idx.union(other, sort=False), idx)
+
+    @pytest.mark.xfail(reason="Not implemented")
+    @pytest.mark.parametrize('slice_', [slice(None), slice(0)])
+    def test_union_sort_special_true(self, slice_):
+        # TODO decide on True behaviour
+        # sort=True
+        idx = pd.Index([1, 0, 2])
+        # default, sort=None
+        other = idx[slice_]
+
+        result = idx.union(other, sort=True)
+        expected = pd.Index([0, 1, 2])
+        tm.assert_index_equal(result, expected)
+
+    def test_union_sort_other_incomparable(self):
+        # https://github.com/pandas-dev/pandas/issues/24959
+        idx = pd.Index([1, pd.Timestamp('2000')])
+        # default (sort=None)
+        with tm.assert_produces_warning(RuntimeWarning):
+            result = idx.union(idx[:1])
+
+        tm.assert_index_equal(result, idx)
+
+        # sort=None
+        with tm.assert_produces_warning(RuntimeWarning):
+            result = idx.union(idx[:1], sort=None)
+        tm.assert_index_equal(result, idx)
+
+        # sort=False
+        result = idx.union(idx[:1], sort=False)
+        tm.assert_index_equal(result, idx)
+
+    @pytest.mark.xfail(reason="Not implemented")
+    def test_union_sort_other_incomparable_true(self):
+        # TODO decide on True behaviour
+        # sort=True
+        idx = pd.Index([1, pd.Timestamp('2000')])
+        with pytest.raises(TypeError, match='.*'):
+            idx.union(idx[:1], sort=True)
+
     @pytest.mark.parametrize("klass", [
         np.array, Series, list])
-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_union_from_iterables(self, klass, sort):
         # GH 10149
         # TODO: Replace with fixturesult
@@ -806,29 +876,30 @@ def test_union_from_iterables(self, klass, sort):
         case = klass(second.values)
         result = first.union(case, sort=sort)
-        if sort:
+        if sort is None:
             tm.assert_index_equal(result, everything.sort_values())
         assert tm.equalContents(result, everything)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_union_identity(self, sort):
         # TODO: replace with fixturesult
         first = self.strIndex[5:20]

         union = first.union(first, sort=sort)
-        assert union is first
+        # i.e. identity is not preserved when sort is True
+        assert (union is first) is (not sort)

         union = first.union([], sort=sort)
-        assert union is first
+        assert (union is first) is (not sort)

         union = Index([]).union(first, sort=sort)
-        assert union is first
+        assert (union is first) is (not sort)

     @pytest.mark.parametrize("first_list", [list('ba'), list()])
     @pytest.mark.parametrize("second_list", [list('ab'), list()])
     @pytest.mark.parametrize("first_name, second_name, expected_name", [
         ('A', 'B', None), (None, 'B', None), ('A', None, None)])
-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_union_name_preservation(self, first_list, second_list, first_name,
                                      second_name, expected_name, sort):
         first = Index(first_list, name=first_name)
@@ -837,14 +908,14 @@ def test_union_name_preservation(self, first_list, second_list, first_name,

         vals = set(first_list).union(second_list)

-        if sort and len(first_list) > 0 and len(second_list) > 0:
+        if sort is None and len(first_list) > 0 and len(second_list) > 0:
             expected = Index(sorted(vals), name=expected_name)
             tm.assert_index_equal(union, expected)
         else:
             expected = Index(vals, name=expected_name)
             assert tm.equalContents(union, expected)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_union_dt_as_obj(self, sort):
         # TODO: Replace with fixturesult
         firstCat = self.strIndex.union(self.dateIndex)
@@ -861,6 +932,15 @@ def test_union_dt_as_obj(self, sort):
         tm.assert_contains_all(self.strIndex, secondCat)
         tm.assert_contains_all(self.dateIndex, firstCat)

+    @pytest.mark.parametrize("method", ['union', 'intersection', 'difference',
+                                        'symmetric_difference'])
+    def test_setops_disallow_true(self, method):
+        idx1 = pd.Index(['a', 'b'])
+        idx2 = pd.Index(['b', 'c'])
+
+        with pytest.raises(ValueError, match="The 'sort' keyword only takes"):
+            getattr(idx1, method)(idx2, sort=True)
+
     def test_map_identity_mapping(self):
         # GH 12766
         # TODO: replace with fixture
@@ -982,7 +1062,7 @@ def test_append_empty_preserve_name(self, name, expected):

     @pytest.mark.parametrize("second_name,expected", [
         (None, None), ('name', 'name')])
-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_difference_name_preservation(self, second_name, expected, sort):
         # TODO: replace with fixturesult
         first = self.strIndex[5:20]
@@ -1000,7 +1080,7 @@ def test_difference_name_preservation(self, second_name, expected, sort):
         else:
             assert result.name == expected

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_difference_empty_arg(self, sort):
         first = self.strIndex[5:20]
         first.name == 'name'
@@ -1009,7 +1089,7 @@ def test_difference_empty_arg(self, sort):
         assert tm.equalContents(result, first)
         assert result.name == first.name

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_difference_identity(self, sort):
         first = self.strIndex[5:20]
         first.name == 'name'
@@ -1018,7 +1098,7 @@ def test_difference_identity(self, sort):
         assert len(result) == 0
         assert result.name == first.name

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_difference_sort(self, sort):
         first = self.strIndex[5:20]
         second = self.strIndex[:10]

         result = first.difference(second, sort)
         expected = self.strIndex[10:20]

-        if sort:
+        if sort is None:
             expected = expected.sort_values()

         tm.assert_index_equal(result, expected)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_symmetric_difference(self, sort):
         # smoke
         index1 = Index([5, 2, 3, 4], name='index1')
@@ -1040,7 +1120,7 @@ def test_symmetric_difference(self, sort):
         expected = Index([5, 1])
         assert tm.equalContents(result, expected)
         assert result.name is None
-        if sort:
+        if sort is None:
             expected = expected.sort_values()
         tm.assert_index_equal(result, expected)
@@ -1049,13 +1129,43 @@ def test_symmetric_difference(self, sort):
         assert tm.equalContents(result, expected)
         assert result.name is None

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize('opname', ['difference', 'symmetric_difference'])
+    def test_difference_incomparable(self, opname):
+        a = pd.Index([3, pd.Timestamp('2000'), 1])
+        b = pd.Index([2, pd.Timestamp('1999'), 1])
+        op = operator.methodcaller(opname, b)
+
+        # sort=None, the default
+        result = op(a)
+        expected = pd.Index([3, pd.Timestamp('2000'), 2, pd.Timestamp('1999')])
+        if opname == 'difference':
+            expected = expected[:2]
+        tm.assert_index_equal(result, expected)
+
+        # sort=False
+        op = operator.methodcaller(opname, b, sort=False)
+        result = op(a)
+        tm.assert_index_equal(result, expected)
+
+    @pytest.mark.xfail(reason="Not implemented")
+    @pytest.mark.parametrize('opname', ['difference', 'symmetric_difference'])
+    def test_difference_incomparable_true(self, opname):
+        # TODO decide on True behaviour
+        # # sort=True, raises
+        a = pd.Index([3, pd.Timestamp('2000'), 1])
+        b = pd.Index([2, pd.Timestamp('1999'), 1])
+        op = operator.methodcaller(opname, b, sort=True)
+
+        with pytest.raises(TypeError, match='Cannot compare'):
+            op(a)
+
+    @pytest.mark.parametrize("sort", [None, False])
     def test_symmetric_difference_mi(self, sort):
         index1 = MultiIndex.from_tuples(self.tuples)
         index2 = MultiIndex.from_tuples([('foo', 1), ('bar', 3)])
         result = index1.symmetric_difference(index2, sort=sort)
         expected = MultiIndex.from_tuples([('bar', 2), ('baz', 3), ('bar', 3)])
-        if sort:
+        if sort is None:
             expected = expected.sort_values()
         tm.assert_index_equal(result, expected)
         assert tm.equalContents(result, expected)
@@ -1063,18 +1173,18 @@ def test_symmetric_difference_mi(self, sort):
     @pytest.mark.parametrize("index2,expected", [
         (Index([0, 1, np.nan]), Index([2.0, 3.0, 0.0])),
         (Index([0, 1]), Index([np.nan, 2.0, 3.0, 0.0]))])
-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_symmetric_difference_missing(self, index2, expected, sort):
         # GH 13514 change: {nan} - {nan} == {}
         # (GH 6444, sorting of nans, is no longer an issue)
         index1 = Index([1, np.nan, 2, 3])

         result = index1.symmetric_difference(index2, sort=sort)
-        if sort:
+        if sort is None:
             expected = expected.sort_values()
         tm.assert_index_equal(result, expected)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_symmetric_difference_non_index(self, sort):
         index1 = Index([1, 2, 3, 4], name='index1')
         index2 = np.array([2, 3, 4, 5])
@@ -1088,7 +1198,7 @@ def test_symmetric_difference_non_index(self, sort):
         assert tm.equalContents(result, expected)
         assert result.name == 'new_name'

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_difference_type(self, sort):
         # GH 20040
         # If taking difference of a set and itself, it
         # needs to preserve the type of the index
         skip_index_keys = ['repeats']
         for key, index in self.generate_index_types(skip_index_keys):
             result = index.difference(index, sort=sort)
             expected = index.drop(index)
             tm.assert_index_equal(result, expected)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_intersection_difference(self, sort):
         # GH 20040
         # Test that the intersection of an index with an
@@ -1595,20 +1705,27 @@ def test_drop_tuple(self, values, to_drop):
         for drop_me in to_drop[1], [to_drop[1]]:
             pytest.raises(KeyError, removed.drop, drop_me)

-    @pytest.mark.parametrize("method,expected", [
+    @pytest.mark.parametrize("method,expected,sort", [
+        ('intersection', np.array([(1, 'A'), (2, 'A'), (1, 'B'), (2, 'B')],
+                                  dtype=[('num', int), ('let', 'a1')]),
+         False),
+
         ('intersection', np.array([(1, 'A'), (1, 'B'), (2, 'A'), (2, 'B')],
-                                  dtype=[('num', int), ('let', 'a1')])),
+                                  dtype=[('num', int), ('let', 'a1')]),
+         None),
+
         ('union', np.array([(1, 'A'), (1, 'B'), (1, 'C'), (2, 'A'), (2, 'B'),
-                            (2, 'C')], dtype=[('num', int), ('let', 'a1')]))
+                            (2, 'C')], dtype=[('num', int), ('let', 'a1')]),
+         None)
     ])
-    def test_tuple_union_bug(self, method, expected):
+    def test_tuple_union_bug(self, method, expected, sort):
         index1 = Index(np.array([(1, 'A'), (2, 'A'), (1, 'B'), (2, 'B')],
                                 dtype=[('num', int), ('let', 'a1')]))
         index2 = Index(np.array([(1, 'A'), (2, 'A'), (1, 'B'), (2, 'B'),
                                  (1, 'C'), (2, 'C')],
                                 dtype=[('num', int), ('let', 'a1')]))

-        result = getattr(index1, method)(index2)
+        result = getattr(index1, method)(index2, sort=sort)
         assert result.ndim == 1

         expected = Index(expected)
@@ -2247,20 +2364,20 @@ def test_unique_na(self):
         result = idx.unique()
         tm.assert_index_equal(result, expected)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_intersection_base(self, sort):
         # (same results for py2 and py3 but sortedness not tested elsewhere)
         index = self.create_index()
         first = index[:5]
         second = index[:3]

-        expected = Index([0, 1, 'a']) if sort else Index([0, 'a', 1])
+        expected = Index([0, 1, 'a']) if sort is None else Index([0, 'a', 1])
         result = first.intersection(second, sort=sort)
         tm.assert_index_equal(result, expected)

     @pytest.mark.parametrize("klass", [
         np.array, Series, list])
-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_intersection_different_type_base(self, klass, sort):
         # GH 10149
         index = self.create_index()
@@ -2270,7 +2387,7 @@ def test_intersection_different_type_base(self, klass, sort):
         result = first.intersection(klass(second.values), sort=sort)
         assert tm.equalContents(result, second)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_difference_base(self, sort):
         # (same results for py2 and py3 but sortedness not tested elsewhere)
         index = self.create_index()
         first = index[:4]
         second = index[3:]

         result = first.difference(second, sort)
         expected = Index([0, 'a', 1])
-        if sort:
+        if sort is None:
             expected = Index(safe_sort(expected))
         tm.assert_index_equal(result, expected)
diff --git a/pandas/tests/indexes/test_category.py b/pandas/tests/indexes/test_category.py
index 582d466c6178e..d889135160ae2 100644
--- a/pandas/tests/indexes/test_category.py
+++ b/pandas/tests/indexes/test_category.py
@@ -611,15 +611,6 @@ def test_is_monotonic(self, data, non_lexsorted_data):
         assert c.is_monotonic_increasing is True
         assert c.is_monotonic_decreasing is False

-    @pytest.mark.parametrize('values, expected', [
-        ([1, 2, 3], True),
-        ([1, 3, 1], False),
-        (list('abc'), True),
-        (list('aba'), False)])
-    def test_is_unique(self, values, expected):
-        ci = CategoricalIndex(values)
-        assert ci.is_unique is expected
-
     def test_has_duplicates(self):

         idx = CategoricalIndex([0, 0, 0], name='foo')
diff --git a/pandas/tests/indexes/test_range.py b/pandas/tests/indexes/test_range.py
index bbd1e0ccc19b1..96cf83d477376 100644
--- a/pandas/tests/indexes/test_range.py
+++ b/pandas/tests/indexes/test_range.py
@@ -503,7 +503,7 @@ def test_join_self(self):
             joined = self.index.join(self.index, how=kind)
             assert self.index is joined

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_intersection(self, sort):
         # intersect with Int64Index
         other = Index(np.arange(1, 6))
diff --git a/pandas/tests/indexes/timedeltas/test_timedelta.py b/pandas/tests/indexes/timedeltas/test_timedelta.py
index 547366ec79094..3cbd9942f9d84 100644
--- a/pandas/tests/indexes/timedeltas/test_timedelta.py
+++ b/pandas/tests/indexes/timedeltas/test_timedelta.py
@@ -1,4 +1,5 @@
 from datetime import timedelta
+import re

 import numpy as np
 import pytest
@@ -51,7 +52,7 @@ def test_fillna_timedelta(self):
             [pd.Timedelta('1 day'), 'x', pd.Timedelta('3 day')], dtype=object)
         tm.assert_index_equal(idx.fillna('x'), exp)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_difference_freq(self, sort):
         # GH14323: Difference of TimedeltaIndex should not preserve frequency
@@ -69,7 +70,7 @@ def test_difference_freq(self, sort):
         tm.assert_index_equal(idx_diff, expected)
         tm.assert_attr_equal('freq', idx_diff, expected)

-    @pytest.mark.parametrize("sort", [True, False])
+    @pytest.mark.parametrize("sort", [None, False])
     def test_difference_sort(self, sort):

         index = pd.TimedeltaIndex(["5 days", "3 days", "2 days", "4 days",
@@ -80,7 +81,7 @@ def test_difference_sort(self, sort):

         expected = TimedeltaIndex(["5 days", "0 days"], freq=None)

-        if sort:
+        if sort is None:
             expected = expected.sort_values()

         tm.assert_index_equal(idx_diff, expected)
@@ -90,7 +91,7 @@ def test_difference_sort(self, sort):
         idx_diff = index.difference(other, sort)
         expected = TimedeltaIndex(["1 days", "0 days"], freq=None)

-        if sort:
+        if sort is None:
             expected = expected.sort_values()

         tm.assert_index_equal(idx_diff, expected)
@@ -325,6 +326,13 @@ def test_freq_conversion(self):
         result = td.astype('timedelta64[s]')
         assert_index_equal(result, expected)

+    @pytest.mark.parametrize('unit', ['Y', 'y', 'M'])
+    def test_unit_m_y_deprecated(self, unit):
+        with tm.assert_produces_warning(FutureWarning) as w:
+            TimedeltaIndex([1, 3, 7], unit)
+        msg = r'.* units are deprecated .*'
+        assert re.match(msg, str(w[0].message))
+

 class TestTimeSeries(object):
diff --git a/pandas/tests/indexing/multiindex/test_loc.py b/pandas/tests/indexing/multiindex/test_loc.py
index ea451d40eb5d3..073d40001a16b 100644
--- a/pandas/tests/indexing/multiindex/test_loc.py
+++ b/pandas/tests/indexing/multiindex/test_loc.py
@@ -123,10 +123,12 @@ def test_loc_multiindex(self):
         tm.assert_frame_equal(rs, xp)

         # missing label
-        pytest.raises(KeyError, lambda: mi_int.loc[2])
+        with pytest.raises(KeyError, match=r"^2L?$"):
+            mi_int.loc[2]
         with catch_warnings(record=True):
             # GH 21593
-            pytest.raises(KeyError, lambda: mi_int.ix[2])
+            with pytest.raises(KeyError, match=r"^2L?$"):
+                mi_int.ix[2]

     def test_loc_multiindex_indexer_none(self):
diff --git a/pandas/tests/indexing/multiindex/test_partial.py b/pandas/tests/indexing/multiindex/test_partial.py
index 2e37ebe4a0629..473463def2b87 100644
--- a/pandas/tests/indexing/multiindex/test_partial.py
+++ b/pandas/tests/indexing/multiindex/test_partial.py
@@ -104,8 +104,8 @@ def test_getitem_partial_column_select(self):
             result = df.ix[('a', 'y'), [1, 0]]
             tm.assert_frame_equal(result, expected)

-        pytest.raises(KeyError, df.loc.__getitem__,
-                      (('a', 'foo'), slice(None, None)))
+        with pytest.raises(KeyError, match=r"\('a', 'foo'\)"):
+            df.loc[('a', 'foo'), :]

     def test_partial_set(
             self, multiindex_year_month_day_dataframe_random_data):
diff --git a/pandas/tests/indexing/multiindex/test_slice.py b/pandas/tests/indexing/multiindex/test_slice.py
index fcecb2b454eb6..db7d079186708 100644
--- a/pandas/tests/indexing/multiindex/test_slice.py
+++ b/pandas/tests/indexing/multiindex/test_slice.py
@@ -107,7 +107,8 @@ def test_per_axis_per_level_getitem(self):
         # ambiguous cases
         # these can be multiply interpreted (e.g. in this case
         # as df.loc[slice(None),[1]] as well
-        pytest.raises(KeyError, lambda: df.loc[slice(None), [1]])
+        with pytest.raises(KeyError, match=r"'\[1\] not in index'"):
+            df.loc[slice(None), [1]]

         result = df.loc[(slice(None), [1]), :]
         expected = df.iloc[[0, 3]]
diff --git a/pandas/tests/indexing/test_categorical.py b/pandas/tests/indexing/test_categorical.py
index b7443e242137b..317aac1766cf8 100644
--- a/pandas/tests/indexing/test_categorical.py
+++ b/pandas/tests/indexing/test_categorical.py
@@ -53,23 +53,20 @@ def test_loc_scalar(self):
         assert_frame_equal(df, expected)

         # value not in the categories
-        pytest.raises(KeyError, lambda: df.loc['d'])
+        with pytest.raises(KeyError, match=r"^'d'$"):
+            df.loc['d']

-        def f():
+        msg = "cannot append a non-category item to a CategoricalIndex"
+        with pytest.raises(TypeError, match=msg):
             df.loc['d'] = 10

-        pytest.raises(TypeError, f)
-
-        def f():
+        msg = ("cannot insert an item into a CategoricalIndex that is not"
+               " already an existing category")
+        with pytest.raises(TypeError, match=msg):
             df.loc['d', 'A'] = 10
-
-        pytest.raises(TypeError, f)
-
-        def f():
+        with pytest.raises(TypeError, match=msg):
             df.loc['d', 'C'] = 10

-        pytest.raises(TypeError, f)
-
     def test_getitem_scalar(self):

         cats = Categorical([Timestamp('12-31-1999'),
@@ -318,7 +315,8 @@ def test_loc_listlike(self):
         assert_frame_equal(result, expected, check_index_type=True)

         # element in the categories but not in the values
-        pytest.raises(KeyError, lambda: self.df2.loc['e'])
+        with pytest.raises(KeyError, match=r"^'e'$"):
+            self.df2.loc['e']

         # assign is ok
         df = self.df2.copy()
@@ -616,22 +614,29 @@ def test_reindexing(self):
         assert_frame_equal(result, expected, check_index_type=True)

         # passed duplicate indexers are not allowed
-        pytest.raises(ValueError, lambda: self.df2.reindex(['a', 'a']))
+        msg = "cannot reindex with a non-unique indexer"
+        with pytest.raises(ValueError, match=msg):
+            self.df2.reindex(['a', 'a'])

         # args NotImplemented ATM
-        pytest.raises(NotImplementedError,
-                      lambda: self.df2.reindex(['a'], method='ffill'))
-        pytest.raises(NotImplementedError,
-                      lambda: self.df2.reindex(['a'], level=1))
-        pytest.raises(NotImplementedError,
-                      lambda: self.df2.reindex(['a'], limit=2))
+        msg = r"argument {} is not implemented for CategoricalIndex\.reindex"
+        with pytest.raises(NotImplementedError, match=msg.format('method')):
+            self.df2.reindex(['a'], method='ffill')
+        with pytest.raises(NotImplementedError, match=msg.format('level')):
+            self.df2.reindex(['a'], level=1)
+        with pytest.raises(NotImplementedError, match=msg.format('limit')):
+            self.df2.reindex(['a'], limit=2)

     def test_loc_slice(self):
         # slicing
         # not implemented ATM
         # GH9748
-        pytest.raises(TypeError, lambda: self.df.loc[1:5])
+        msg = ("cannot do slice indexing on {klass} with these "
+               r"indexers \[1\] of {kind}".format(
+                   klass=str(CategoricalIndex), kind=str(int)))
+        with pytest.raises(TypeError, match=msg):
+            self.df.loc[1:5]

         # result = df.loc[1:5]
         # expected = df.iloc[[1,2,3,4]]
@@ -679,8 +684,11 @@ def test_boolean_selection(self):
         #        categories=[3, 2, 1],
         #        ordered=False,
         #        name=u'B')
-        pytest.raises(TypeError, lambda: df4[df4.index < 2])
-        pytest.raises(TypeError, lambda: df4[df4.index > 1])
+        msg = "Unordered Categoricals can only compare equality or not"
+        with pytest.raises(TypeError, match=msg):
+            df4[df4.index < 2]
+        with pytest.raises(TypeError, match=msg):
+            df4[df4.index > 1]

     def test_indexing_with_category(self):
diff --git a/pandas/tests/indexing/test_chaining_and_caching.py b/pandas/tests/indexing/test_chaining_and_caching.py
index e38c1b16b3b60..6070edca075c2 100644
--- a/pandas/tests/indexing/test_chaining_and_caching.py
+++ b/pandas/tests/indexing/test_chaining_and_caching.py
@@ -302,11 +302,11 @@ def test_setting_with_copy_bug(self):
                         'c': ['a', 'b', np.nan, 'd']})
         mask = pd.isna(df.c)

-        def f():
+        msg = ("A value is trying to be set on a copy of a slice from a"
+               " DataFrame")
+        with pytest.raises(com.SettingWithCopyError, match=msg):
             df[['c']][mask] = df[['b']][mask]

-        pytest.raises(com.SettingWithCopyError, f)
-
         # invalid warning as we are returning a new object
         # GH 8730
         df1 = DataFrame({'x': Series(['a', 'b', 'c']),
@@ -357,7 +357,6 @@ def check(result, expected):
         check(result4, expected)

     @pytest.mark.filterwarnings("ignore::DeprecationWarning")
-    @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
     def test_cache_updating(self):
         # GH 4939, make sure to update the cache on setitem

@@ -367,12 +366,6 @@ def test_cache_updating(self):
         assert "Hello Friend" in df['A'].index
         assert "Hello Friend" in df['B'].index

-        panel = tm.makePanel()
-        panel.ix[0]  # get first item into cache
-        panel.ix[:, :, 'A+1'] = panel.ix[:, :, 'A'] + 1
-        assert "A+1" in panel.ix[0].columns
-        assert "A+1" in panel.ix[1].columns
-
         # 10264
         df = DataFrame(np.zeros((5, 5), dtype='int64'), columns=[
             'a', 'b', 'c', 'd', 'e'], index=range(5))
diff --git a/pandas/tests/indexing/test_floats.py b/pandas/tests/indexing/test_floats.py
index de91b8f4a796c..b9b47338c9de2 100644
--- a/pandas/tests/indexing/test_floats.py
+++ b/pandas/tests/indexing/test_floats.py
@@ -6,7 +6,7 @@
 import pytest

 from pandas import (
-    DataFrame, Float64Index, Index, Int64Index, RangeIndex, Series)
+    DataFrame, Float64Index, Index, Int64Index, RangeIndex, Series, compat)
 import pandas.util.testing as tm
 from pandas.util.testing import assert_almost_equal, assert_series_equal
@@ -54,9 +54,11 @@ def test_scalar_error(self):
             with pytest.raises(TypeError, match=msg):
                 s.iloc[3.0]

-            def f():
+            msg = ("cannot do positional indexing on {klass} with these "
+                   r"indexers \[3\.0\] of {kind}".format(
+                       klass=type(i), kind=str(float)))
+            with pytest.raises(TypeError, match=msg):
                 s.iloc[3.0] = 0
-            pytest.raises(TypeError, f)

     @ignore_ix
     def test_scalar_non_numeric(self):
@@ -82,35 +84,46 @@ def test_scalar_non_numeric(self):
                             (lambda x: x.iloc, False),
                             (lambda x: x, True)]:

-                def f():
-                    with catch_warnings(record=True):
-                        idxr(s)[3.0]
-
                 # gettitem on a DataFrame is a KeyError as it is indexing
                 # via labels on the columns
                 if getitem and isinstance(s, DataFrame):
                     error = KeyError
+                    msg = r"^3(\.0)?$"
                 else:
                     error = TypeError
-                pytest.raises(error, f)
+                    msg = (r"cannot do (label|index|positional) indexing"
+                           r" on {klass} with these indexers
\[3\.0\] of" + r" {kind}|" + "Cannot index by location index with a" + " non-integer key" + .format(klass=type(i), kind=str(float))) + with catch_warnings(record=True): + with pytest.raises(error, match=msg): + idxr(s)[3.0] # label based can be a TypeError or KeyError - def f(): - s.loc[3.0] - if s.index.inferred_type in ['string', 'unicode', 'mixed']: error = KeyError + msg = r"^3$" else: error = TypeError - pytest.raises(error, f) + msg = (r"cannot do (label|index) indexing" + r" on {klass} with these indexers \[3\.0\] of" + r" {kind}" + .format(klass=type(i), kind=str(float))) + with pytest.raises(error, match=msg): + s.loc[3.0] # contains assert 3.0 not in s # setting with a float fails with iloc - def f(): + msg = (r"cannot do (label|index|positional) indexing" + r" on {klass} with these indexers \[3\.0\] of" + r" {kind}" + .format(klass=type(i), kind=str(float))) + with pytest.raises(TypeError, match=msg): s.iloc[3.0] = 0 - pytest.raises(TypeError, f) # setting with an indexer if s.index.inferred_type in ['categorical']: @@ -145,7 +158,12 @@ def f(): # fallsback to position selection, series only s = Series(np.arange(len(i)), index=i) s[3] - pytest.raises(TypeError, lambda: s[3.0]) + msg = (r"cannot do (label|index) indexing" + r" on {klass} with these indexers \[3\.0\] of" + r" {kind}" + .format(klass=type(i), kind=str(float))) + with pytest.raises(TypeError, match=msg): + s[3.0] @ignore_ix def test_scalar_with_mixed(self): @@ -153,19 +171,23 @@ def test_scalar_with_mixed(self): s2 = Series([1, 2, 3], index=['a', 'b', 'c']) s3 = Series([1, 2, 3], index=['a', 'b', 1.5]) - # lookup in a pure string index + # lookup in a pure stringstr # with an invalid indexer for idxr in [lambda x: x.ix, lambda x: x, lambda x: x.iloc]: - def f(): - with catch_warnings(record=True): + msg = (r"cannot do label indexing" + r" on {klass} with these indexers \[1\.0\] of" + r" {kind}|" + "Cannot index by location index with a non-integer key" + .format(klass=str(Index), kind=str(float))) + with catch_warnings(record=True): + with pytest.raises(TypeError, match=msg): idxr(s2)[1.0] - pytest.raises(TypeError, f) - - pytest.raises(KeyError, lambda: s2.loc[1.0]) + with pytest.raises(KeyError, match=r"^1$"): + s2.loc[1.0] result = s2.loc['b'] expected = 2 @@ -175,11 +197,13 @@ def f(): # indexing for idxr in [lambda x: x]: - def f(): + msg = (r"cannot do label indexing" + r" on {klass} with these indexers \[1\.0\] of" + r" {kind}" + .format(klass=str(Index), kind=str(float))) + with pytest.raises(TypeError, match=msg): idxr(s3)[1.0] - pytest.raises(TypeError, f) - result = idxr(s3)[1] expected = 2 assert result == expected @@ -189,17 +213,22 @@ def f(): for idxr in [lambda x: x.ix]: with catch_warnings(record=True): - def f(): + msg = (r"cannot do label indexing" + r" on {klass} with these indexers \[1\.0\] of" + r" {kind}" + .format(klass=str(Index), kind=str(float))) + with pytest.raises(TypeError, match=msg): idxr(s3)[1.0] - pytest.raises(TypeError, f) - result = idxr(s3)[1] expected = 2 assert result == expected - pytest.raises(TypeError, lambda: s3.iloc[1.0]) - pytest.raises(KeyError, lambda: s3.loc[1.0]) + msg = "Cannot index by location index with a non-integer key" + with pytest.raises(TypeError, match=msg): + s3.iloc[1.0] + with pytest.raises(KeyError, match=r"^1$"): + s3.loc[1.0] result = s3.loc[1.5] expected = 3 @@ -280,16 +309,14 @@ def test_scalar_float(self): # setting s2 = s.copy() - def f(): - with catch_warnings(record=True): - idxr(s2)[indexer] = expected with catch_warnings(record=True): 
result = idxr(s2)[indexer] self.check(result, s, 3, getitem) # random integer is a KeyError with catch_warnings(record=True): - pytest.raises(KeyError, lambda: idxr(s)[3.5]) + with pytest.raises(KeyError, match=r"^3\.5$"): + idxr(s)[3.5] # contains assert 3.0 in s @@ -303,11 +330,16 @@ def f(): self.check(result, s, 3, False) # iloc raises with a float - pytest.raises(TypeError, lambda: s.iloc[3.0]) + msg = "Cannot index by location index with a non-integer key" + with pytest.raises(TypeError, match=msg): + s.iloc[3.0] - def g(): + msg = (r"cannot do positional indexing" + r" on {klass} with these indexers \[3\.0\] of" + r" {kind}" + .format(klass=str(Float64Index), kind=str(float))) + with pytest.raises(TypeError, match=msg): s2.iloc[3.0] = 0 - pytest.raises(TypeError, g) @ignore_ix def test_slice_non_numeric(self): @@ -329,37 +361,55 @@ def test_slice_non_numeric(self): slice(3, 4.0), slice(3.0, 4.0)]: - def f(): + msg = ("cannot do slice indexing" + r" on {klass} with these indexers \[(3|4)\.0\] of" + " {kind}" + .format(klass=type(index), kind=str(float))) + with pytest.raises(TypeError, match=msg): s.iloc[l] - pytest.raises(TypeError, f) for idxr in [lambda x: x.ix, lambda x: x.loc, lambda x: x.iloc, lambda x: x]: - def f(): - with catch_warnings(record=True): + msg = ("cannot do slice indexing" + r" on {klass} with these indexers" + r" \[(3|4)(\.0)?\]" + r" of ({kind_float}|{kind_int})" + .format(klass=type(index), + kind_float=str(float), + kind_int=str(int))) + with catch_warnings(record=True): + with pytest.raises(TypeError, match=msg): idxr(s)[l] - pytest.raises(TypeError, f) # setitem for l in [slice(3.0, 4), slice(3, 4.0), slice(3.0, 4.0)]: - def f(): + msg = ("cannot do slice indexing" + r" on {klass} with these indexers \[(3|4)\.0\] of" + " {kind}" + .format(klass=type(index), kind=str(float))) + with pytest.raises(TypeError, match=msg): s.iloc[l] = 0 - pytest.raises(TypeError, f) for idxr in [lambda x: x.ix, lambda x: x.loc, lambda x: x.iloc, lambda x: x]: - def f(): - with catch_warnings(record=True): + msg = ("cannot do slice indexing" + r" on {klass} with these indexers" + r" \[(3|4)(\.0)?\]" + r" of ({kind_float}|{kind_int})" + .format(klass=type(index), + kind_float=str(float), + kind_int=str(int))) + with catch_warnings(record=True): + with pytest.raises(TypeError, match=msg): idxr(s)[l] = 0 - pytest.raises(TypeError, f) @ignore_ix def test_slice_integer(self): @@ -396,11 +446,13 @@ def test_slice_integer(self): self.check(result, s, indexer, False) # positional indexing - def f(): + msg = ("cannot do slice indexing" + r" on {klass} with these indexers \[(3|4)\.0\] of" + " {kind}" + .format(klass=type(index), kind=str(float))) + with pytest.raises(TypeError, match=msg): s[l] - pytest.raises(TypeError, f) - # getitem out-of-bounds for l in [slice(-6, 6), slice(-6.0, 6.0)]: @@ -420,11 +472,13 @@ def f(): self.check(result, s, indexer, False) # positional indexing - def f(): + msg = ("cannot do slice indexing" + r" on {klass} with these indexers \[-6\.0\] of" + " {kind}" + .format(klass=type(index), kind=str(float))) + with pytest.raises(TypeError, match=msg): s[slice(-6.0, 6.0)] - pytest.raises(TypeError, f) - # getitem odd floats for l, res1 in [(slice(2.5, 4), slice(3, 5)), (slice(2, 3.5), slice(2, 4)), @@ -443,11 +497,13 @@ def f(): self.check(result, s, res, False) # positional indexing - def f(): + msg = ("cannot do slice indexing" + r" on {klass} with these indexers \[(2|3)\.5\] of" + " {kind}" + .format(klass=type(index), kind=str(float))) + with 
pytest.raises(TypeError, match=msg): s[l] - pytest.raises(TypeError, f) - # setitem for l in [slice(3.0, 4), slice(3, 4.0), @@ -462,11 +518,13 @@ def f(): assert (result == 0).all() # positional indexing - def f(): + msg = ("cannot do slice indexing" + r" on {klass} with these indexers \[(3|4)\.0\] of" + " {kind}" + .format(klass=type(index), kind=str(float))) + with pytest.raises(TypeError, match=msg): s[l] = 0 - pytest.raises(TypeError, f) - def test_integer_positional_indexing(self): """ make sure that we are raising on positional indexing w.r.t. an integer index """ @@ -484,11 +542,17 @@ def test_integer_positional_indexing(self): slice(2.0, 4), slice(2.0, 4.0)]: - def f(): + if compat.PY2: + klass = Int64Index + else: + klass = RangeIndex + msg = ("cannot do slice indexing" + r" on {klass} with these indexers \[(2|4)\.0\] of" + " {kind}" + .format(klass=str(klass), kind=str(float))) + with pytest.raises(TypeError, match=msg): idxr(s)[l] - pytest.raises(TypeError, f) - @ignore_ix def test_slice_integer_frame_getitem(self): @@ -509,11 +573,13 @@ def f(idxr): self.check(result, s, indexer, False) # positional indexing - def f(): + msg = ("cannot do slice indexing" + r" on {klass} with these indexers \[(0|1)\.0\] of" + " {kind}" + .format(klass=type(index), kind=str(float))) + with pytest.raises(TypeError, match=msg): s[l] - pytest.raises(TypeError, f) - # getitem out-of-bounds for l in [slice(-10, 10), slice(-10.0, 10.0)]: @@ -522,11 +588,13 @@ def f(): self.check(result, s, slice(-10, 10), True) # positional indexing - def f(): + msg = ("cannot do slice indexing" + r" on {klass} with these indexers \[-10\.0\] of" + " {kind}" + .format(klass=type(index), kind=str(float))) + with pytest.raises(TypeError, match=msg): s[slice(-10.0, 10.0)] - pytest.raises(TypeError, f) - # getitem odd floats for l, res in [(slice(0.5, 1), slice(1, 2)), (slice(0, 0.5), slice(0, 1)), @@ -536,11 +604,13 @@ def f(): self.check(result, s, res, False) # positional indexing - def f(): + msg = ("cannot do slice indexing" + r" on {klass} with these indexers \[0\.5\] of" + " {kind}" + .format(klass=type(index), kind=str(float))) + with pytest.raises(TypeError, match=msg): s[l] - pytest.raises(TypeError, f) - # setitem for l in [slice(3.0, 4), slice(3, 4.0), @@ -552,11 +622,13 @@ def f(): assert (result == 0).all() # positional indexing - def f(): + msg = ("cannot do slice indexing" + r" on {klass} with these indexers \[(3|4)\.0\] of" + " {kind}" + .format(klass=type(index), kind=str(float))) + with pytest.raises(TypeError, match=msg): s[l] = 0 - pytest.raises(TypeError, f) - f(lambda x: x.loc) with catch_warnings(record=True): f(lambda x: x.ix) @@ -632,9 +704,12 @@ def test_floating_misc(self): # value not found (and no fallbacking at all) # scalar integers - pytest.raises(KeyError, lambda: s.loc[4]) - pytest.raises(KeyError, lambda: s.loc[4]) - pytest.raises(KeyError, lambda: s[4]) + with pytest.raises(KeyError, match=r"^4\.0$"): + s.loc[4] + with pytest.raises(KeyError, match=r"^4\.0$"): + s.loc[4] + with pytest.raises(KeyError, match=r"^4\.0$"): + s[4] # fancy floats/integers create the correct entry (as nan) # fancy tests diff --git a/pandas/tests/indexing/test_iloc.py b/pandas/tests/indexing/test_iloc.py index a867387db4b46..5c87d553daba3 100644 --- a/pandas/tests/indexing/test_iloc.py +++ b/pandas/tests/indexing/test_iloc.py @@ -26,26 +26,33 @@ def test_iloc_exceeds_bounds(self): msg = 'positional indexers are out-of-bounds' with pytest.raises(IndexError, match=msg): df.iloc[:, [0, 1, 2, 3, 4, 5]] - 
pytest.raises(IndexError, lambda: df.iloc[[1, 30]]) - pytest.raises(IndexError, lambda: df.iloc[[1, -30]]) - pytest.raises(IndexError, lambda: df.iloc[[100]]) + with pytest.raises(IndexError, match=msg): + df.iloc[[1, 30]] + with pytest.raises(IndexError, match=msg): + df.iloc[[1, -30]] + with pytest.raises(IndexError, match=msg): + df.iloc[[100]] s = df['A'] - pytest.raises(IndexError, lambda: s.iloc[[100]]) - pytest.raises(IndexError, lambda: s.iloc[[-100]]) + with pytest.raises(IndexError, match=msg): + s.iloc[[100]] + with pytest.raises(IndexError, match=msg): + s.iloc[[-100]] # still raise on a single indexer msg = 'single positional indexer is out-of-bounds' with pytest.raises(IndexError, match=msg): df.iloc[30] - pytest.raises(IndexError, lambda: df.iloc[-30]) + with pytest.raises(IndexError, match=msg): + df.iloc[-30] # GH10779 # single positive/negative indexer exceeding Series bounds should raise # an IndexError with pytest.raises(IndexError, match=msg): s.iloc[30] - pytest.raises(IndexError, lambda: s.iloc[-30]) + with pytest.raises(IndexError, match=msg): + s.iloc[-30] # slices are ok result = df.iloc[:, 4:10] # 0 < start < len < stop @@ -104,8 +111,12 @@ def check(result, expected): check(dfl.iloc[:, 1:3], dfl.iloc[:, [1]]) check(dfl.iloc[4:6], dfl.iloc[[4]]) - pytest.raises(IndexError, lambda: dfl.iloc[[4, 5, 6]]) - pytest.raises(IndexError, lambda: dfl.iloc[:, 4]) + msg = "positional indexers are out-of-bounds" + with pytest.raises(IndexError, match=msg): + dfl.iloc[[4, 5, 6]] + msg = "single positional indexer is out-of-bounds" + with pytest.raises(IndexError, match=msg): + dfl.iloc[:, 4] def test_iloc_getitem_int(self): @@ -437,10 +448,16 @@ def test_iloc_getitem_labelled_frame(self): assert result == exp # out-of-bounds exception - pytest.raises(IndexError, df.iloc.__getitem__, tuple([10, 5])) + msg = "single positional indexer is out-of-bounds" + with pytest.raises(IndexError, match=msg): + df.iloc[10, 5] # trying to use a label - pytest.raises(ValueError, df.iloc.__getitem__, tuple(['j', 'D'])) + msg = (r"Location based indexing can only have \[integer, integer" + r" slice \(START point is INCLUDED, END point is EXCLUDED\)," + r" listlike of integers, boolean array\] types") + with pytest.raises(ValueError, match=msg): + df.iloc['j', 'D'] def test_iloc_getitem_doc_issue(self): @@ -555,10 +572,15 @@ def test_iloc_mask(self): # GH 3631, iloc with a mask (of a series) should raise df = DataFrame(lrange(5), list('ABCDE'), columns=['a']) mask = (df.a % 2 == 0) - pytest.raises(ValueError, df.iloc.__getitem__, tuple([mask])) + msg = ("iLocation based boolean indexing cannot use an indexable as" + " a mask") + with pytest.raises(ValueError, match=msg): + df.iloc[mask] mask.index = lrange(len(mask)) - pytest.raises(NotImplementedError, df.iloc.__getitem__, - tuple([mask])) + msg = ("iLocation based boolean indexing on an integer type is not" + " available") + with pytest.raises(NotImplementedError, match=msg): + df.iloc[mask] # ndarray ok result = df.iloc[np.array([True] * len(mask), dtype=bool)] diff --git a/pandas/tests/indexing/test_ix.py b/pandas/tests/indexing/test_ix.py index 35805bce07705..fb4dfbb39ce94 100644 --- a/pandas/tests/indexing/test_ix.py +++ b/pandas/tests/indexing/test_ix.py @@ -102,7 +102,12 @@ def compare(result, expected): with catch_warnings(record=True): df.ix[key] - pytest.raises(TypeError, lambda: df.loc[key]) + msg = (r"cannot do slice indexing" + r" on {klass} with these indexers \[(0|1)\] of" + r" {kind}" + .format(klass=type(df.index), 
kind=str(int))) + with pytest.raises(TypeError, match=msg): + df.loc[key] df = DataFrame(np.random.randn(5, 4), columns=list('ABCD'), index=pd.date_range('2012-01-01', periods=5)) @@ -122,7 +127,8 @@ def compare(result, expected): with catch_warnings(record=True): expected = df.ix[key] except KeyError: - pytest.raises(KeyError, lambda: df.loc[key]) + with pytest.raises(KeyError, match=r"^'2012-01-31'$"): + df.loc[key] continue result = df.loc[key] @@ -279,14 +285,18 @@ def test_ix_setitem_out_of_bounds_axis_0(self): np.random.randn(2, 5), index=["row%s" % i for i in range(2)], columns=["col%s" % i for i in range(5)]) with catch_warnings(record=True): - pytest.raises(ValueError, df.ix.__setitem__, (2, 0), 100) + msg = "cannot set by positional indexing with enlargement" + with pytest.raises(ValueError, match=msg): + df.ix[2, 0] = 100 def test_ix_setitem_out_of_bounds_axis_1(self): df = DataFrame( np.random.randn(5, 2), index=["row%s" % i for i in range(5)], columns=["col%s" % i for i in range(2)]) with catch_warnings(record=True): - pytest.raises(ValueError, df.ix.__setitem__, (0, 2), 100) + msg = "cannot set by positional indexing with enlargement" + with pytest.raises(ValueError, match=msg): + df.ix[0, 2] = 100 def test_ix_empty_list_indexer_is_ok(self): with catch_warnings(record=True): diff --git a/pandas/tests/indexing/test_loc.py b/pandas/tests/indexing/test_loc.py index 17e107c7a1130..3bf4a6bee4af9 100644 --- a/pandas/tests/indexing/test_loc.py +++ b/pandas/tests/indexing/test_loc.py @@ -233,8 +233,10 @@ def test_loc_to_fail(self): columns=['e', 'f', 'g']) # raise a KeyError? - pytest.raises(KeyError, df.loc.__getitem__, - tuple([[1, 2], [1, 2]])) + msg = (r"\"None of \[Int64Index\(\[1, 2\], dtype='int64'\)\] are" + r" in the \[index\]\"") + with pytest.raises(KeyError, match=msg): + df.loc[[1, 2], [1, 2]] # GH 7496 # loc should not fallback @@ -243,10 +245,18 @@ def test_loc_to_fail(self): s.loc[1] = 1 s.loc['a'] = 2 - pytest.raises(KeyError, lambda: s.loc[-1]) - pytest.raises(KeyError, lambda: s.loc[[-1, -2]]) + with pytest.raises(KeyError, match=r"^-1$"): + s.loc[-1] + + msg = (r"\"None of \[Int64Index\(\[-1, -2\], dtype='int64'\)\] are" + r" in the \[index\]\"") + with pytest.raises(KeyError, match=msg): + s.loc[[-1, -2]] - pytest.raises(KeyError, lambda: s.loc[['4']]) + msg = (r"\"None of \[Index\(\[u?'4'\], dtype='object'\)\] are" + r" in the \[index\]\"") + with pytest.raises(KeyError, match=msg): + s.loc[['4']] s.loc[-1] = 3 with tm.assert_produces_warning(FutureWarning, @@ -256,29 +266,28 @@ def test_loc_to_fail(self): tm.assert_series_equal(result, expected) s['a'] = 2 - pytest.raises(KeyError, lambda: s.loc[[-2]]) + msg = (r"\"None of \[Int64Index\(\[-2\], dtype='int64'\)\] are" + r" in the \[index\]\"") + with pytest.raises(KeyError, match=msg): + s.loc[[-2]] del s['a'] - def f(): + with pytest.raises(KeyError, match=msg): s.loc[[-2]] = 0 - pytest.raises(KeyError, f) - # inconsistency between .loc[values] and .loc[values,:] # GH 7999 df = DataFrame([['a'], ['b']], index=[1, 2], columns=['value']) - def f(): + msg = (r"\"None of \[Int64Index\(\[3\], dtype='int64'\)\] are" + r" in the \[index\]\"") + with pytest.raises(KeyError, match=msg): df.loc[[3], :] - pytest.raises(KeyError, f) - - def f(): + with pytest.raises(KeyError, match=msg): df.loc[[3]] - pytest.raises(KeyError, f) - def test_loc_getitem_list_with_fail(self): # 15747 # should KeyError if *any* missing labels @@ -600,11 +609,15 @@ def test_loc_non_unique(self): # these are going to raise because the we are non 
monotonic df = DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': [3, 4, 5, 6, 7, 8]}, index=[0, 1, 0, 1, 2, 3]) - pytest.raises(KeyError, df.loc.__getitem__, - tuple([slice(1, None)])) - pytest.raises(KeyError, df.loc.__getitem__, - tuple([slice(0, None)])) - pytest.raises(KeyError, df.loc.__getitem__, tuple([slice(1, 2)])) + msg = "'Cannot get left slice bound for non-unique label: 1'" + with pytest.raises(KeyError, match=msg): + df.loc[1:] + msg = "'Cannot get left slice bound for non-unique label: 0'" + with pytest.raises(KeyError, match=msg): + df.loc[0:] + msg = "'Cannot get left slice bound for non-unique label: 1'" + with pytest.raises(KeyError, match=msg): + df.loc[1:2] # monotonic are ok df = DataFrame({'A': [1, 2, 3, 4, 5, 6], diff --git a/pandas/tests/indexing/test_panel.py b/pandas/tests/indexing/test_panel.py index 34708e1148c90..8530adec011be 100644 --- a/pandas/tests/indexing/test_panel.py +++ b/pandas/tests/indexing/test_panel.py @@ -122,28 +122,6 @@ def test_panel_getitem(self): test1 = panel.loc[:, "2002"] tm.assert_panel_equal(test1, test2) - # GH8710 - # multi-element getting with a list - panel = tm.makePanel() - - expected = panel.iloc[[0, 1]] - - result = panel.loc[['ItemA', 'ItemB']] - tm.assert_panel_equal(result, expected) - - result = panel.loc[['ItemA', 'ItemB'], :, :] - tm.assert_panel_equal(result, expected) - - result = panel[['ItemA', 'ItemB']] - tm.assert_panel_equal(result, expected) - - result = panel.loc['ItemA':'ItemB'] - tm.assert_panel_equal(result, expected) - - with catch_warnings(record=True): - result = panel.ix[['ItemA', 'ItemB']] - tm.assert_panel_equal(result, expected) - # with an object-like # GH 9140 class TestObject(object): diff --git a/pandas/tests/indexing/test_partial.py b/pandas/tests/indexing/test_partial.py index b863afe02c2e8..5b6a5ab9ecf7b 100644 --- a/pandas/tests/indexing/test_partial.py +++ b/pandas/tests/indexing/test_partial.py @@ -246,7 +246,10 @@ def test_series_partial_set(self): tm.assert_series_equal(result, expected, check_index_type=True) # raises as nothing in in the index - pytest.raises(KeyError, lambda: ser.loc[[3, 3, 3]]) + msg = (r"\"None of \[Int64Index\(\[3, 3, 3\], dtype='int64'\)\] are" + r" in the \[index\]\"") + with pytest.raises(KeyError, match=msg): + ser.loc[[3, 3, 3]] expected = Series([0.2, 0.2, np.nan], index=[2, 2, 3]) with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): @@ -342,7 +345,10 @@ def test_series_partial_set_with_name(self): tm.assert_series_equal(result, expected, check_index_type=True) # raises as nothing in in the index - pytest.raises(KeyError, lambda: ser.loc[[3, 3, 3]]) + msg = (r"\"None of \[Int64Index\(\[3, 3, 3\], dtype='int64'," + r" name=u?'idx'\)\] are in the \[index\]\"") + with pytest.raises(KeyError, match=msg): + ser.loc[[3, 3, 3]] exp_idx = Index([2, 2, 3], dtype='int64', name='idx') expected = Series([0.2, 0.2, np.nan], index=exp_idx, name='s') diff --git a/pandas/tests/indexing/test_scalar.py b/pandas/tests/indexing/test_scalar.py index e4b8181a67514..6d607ce86c08e 100644 --- a/pandas/tests/indexing/test_scalar.py +++ b/pandas/tests/indexing/test_scalar.py @@ -30,7 +30,9 @@ def _check(f, func, values=False): for f in [d['labels'], d['ts'], d['floats']]: if f is not None: - pytest.raises(ValueError, self.check_values, f, 'iat') + msg = "iAt based indexing can only have integer indexers" + with pytest.raises(ValueError, match=msg): + self.check_values(f, 'iat') # at for f in [d['ints'], d['uints'], d['labels'], @@ -57,7 +59,9 @@ def _check(f, func, values=False): 
for f in [d['labels'], d['ts'], d['floats']]: if f is not None: - pytest.raises(ValueError, _check, f, 'iat') + msg = "iAt based indexing can only have integer indexers" + with pytest.raises(ValueError, match=msg): + _check(f, 'iat') # at for f in [d['ints'], d['uints'], d['labels'], @@ -107,8 +111,12 @@ def test_imethods_with_dups(self): result = s.iat[2] assert result == 2 - pytest.raises(IndexError, lambda: s.iat[10]) - pytest.raises(IndexError, lambda: s.iat[-10]) + msg = "index 10 is out of bounds for axis 0 with size 5" + with pytest.raises(IndexError, match=msg): + s.iat[10] + msg = "index -10 is out of bounds for axis 0 with size 5" + with pytest.raises(IndexError, match=msg): + s.iat[-10] result = s.iloc[[2, 3]] expected = Series([2, 3], [2, 2], dtype='int64') @@ -128,22 +136,30 @@ def test_at_to_fail(self): s = Series([1, 2, 3], index=list('abc')) result = s.at['a'] assert result == 1 - pytest.raises(ValueError, lambda: s.at[0]) + msg = ("At based indexing on an non-integer index can only have" + " non-integer indexers") + with pytest.raises(ValueError, match=msg): + s.at[0] df = DataFrame({'A': [1, 2, 3]}, index=list('abc')) result = df.at['a', 'A'] assert result == 1 - pytest.raises(ValueError, lambda: df.at['a', 0]) + with pytest.raises(ValueError, match=msg): + df.at['a', 0] s = Series([1, 2, 3], index=[3, 2, 1]) result = s.at[1] assert result == 3 - pytest.raises(ValueError, lambda: s.at['a']) + msg = ("At based indexing on an integer index can only have integer" + " indexers") + with pytest.raises(ValueError, match=msg): + s.at['a'] df = DataFrame({0: [1, 2, 3]}, index=[3, 2, 1]) result = df.at[1, 0] assert result == 3 - pytest.raises(ValueError, lambda: df.at['a', 0]) + with pytest.raises(ValueError, match=msg): + df.at['a', 0] # GH 13822, incorrect error string with non-unique columns when missing # column is accessed diff --git a/pandas/tests/internals/test_internals.py b/pandas/tests/internals/test_internals.py index fe0706efdc4f8..bda486411e01e 100644 --- a/pandas/tests/internals/test_internals.py +++ b/pandas/tests/internals/test_internals.py @@ -1,6 +1,6 @@ # -*- coding: utf-8 -*- # pylint: disable=W0102 - +from collections import OrderedDict from datetime import date, datetime from distutils.version import LooseVersion import itertools @@ -12,7 +12,7 @@ import pytest from pandas._libs.internals import BlockPlacement -from pandas.compat import OrderedDict, lrange, u, zip +from pandas.compat import lrange, u, zip import pandas as pd from pandas import ( diff --git a/pandas/tests/io/data/legacy_hdf/legacy_table.h5 b/pandas/tests/io/data/legacy_hdf/legacy_table.h5 deleted file mode 100644 index 1c90382d9125c..0000000000000 Binary files a/pandas/tests/io/data/legacy_hdf/legacy_table.h5 and /dev/null differ diff --git a/pandas/tests/io/data/legacy_hdf/legacy_table_py2.h5 b/pandas/tests/io/data/legacy_hdf/legacy_table_py2.h5 new file mode 100644 index 0000000000000..3863d714a315b Binary files /dev/null and b/pandas/tests/io/data/legacy_hdf/legacy_table_py2.h5 differ diff --git a/pandas/tests/io/formats/test_format.py b/pandas/tests/io/formats/test_format.py index 52dce572c6d4f..b0cf5a2f17609 100644 --- a/pandas/tests/io/formats/test_format.py +++ b/pandas/tests/io/formats/test_format.py @@ -12,6 +12,7 @@ import os import re import sys +import textwrap import warnings import dateutil @@ -345,6 +346,15 @@ def test_repr_truncates_terminal_size_full(self, monkeypatch): lambda: terminal_size) assert "..." 
not in str(df) + def test_repr_truncation_column_size(self): + # dataframe with last column very wide -> check it is not used to + # determine size of truncation (...) column + df = pd.DataFrame({'a': [108480, 30830], 'b': [12345, 12345], + 'c': [12345, 12345], 'd': [12345, 12345], + 'e': ['a' * 50] * 2}) + assert "..." in str(df) + assert " ... " not in str(df) + def test_repr_max_columns_max_rows(self): term_width, term_height = get_terminal_size() if term_width < 10 or term_height < 10: @@ -543,7 +553,7 @@ def test_to_string_with_formatters_unicode(self): formatters={u('c/\u03c3'): lambda x: '{x}'.format(x=x)}) assert result == u(' c/\u03c3\n') + '0 1\n1 2\n2 3' - def test_east_asian_unicode_frame(self): + def test_east_asian_unicode_false(self): if PY3: _rep = repr else: @@ -643,17 +653,23 @@ def test_east_asian_unicode_frame(self): u'ああああ': [u'さ', u'し', u'す', u'せ']}, columns=['a', 'b', 'c', u'ああああ']) - expected = (u" a ... ああああ\n0 あああああ ... さ\n" - u".. ... ... ...\n3 えええ ... せ\n" + expected = (u" a ... ああああ\n0 あああああ ... さ\n" + u".. ... ... ...\n3 えええ ... せ\n" u"\n[4 rows x 4 columns]") assert _rep(df) == expected df.index = [u'あああ', u'いいいい', u'う', 'aaa'] - expected = (u" a ... ああああ\nあああ あああああ ... さ\n" - u".. ... ... ...\naaa えええ ... せ\n" + expected = (u" a ... ああああ\nあああ あああああ ... さ\n" + u".. ... ... ...\naaa えええ ... せ\n" u"\n[4 rows x 4 columns]") assert _rep(df) == expected + def test_east_asian_unicode_true(self): + if PY3: + _rep = repr + else: + _rep = unicode # noqa + # Emable Unicode option ----------------------------------------- with option_context('display.unicode.east_asian_width', True): @@ -757,18 +773,18 @@ def test_east_asian_unicode_frame(self): u'ああああ': [u'さ', u'し', u'す', u'せ']}, columns=['a', 'b', 'c', u'ああああ']) - expected = (u" a ... ああああ\n" - u"0 あああああ ... さ\n" - u".. ... ... ...\n" - u"3 えええ ... せ\n" + expected = (u" a ... ああああ\n" + u"0 あああああ ... さ\n" + u".. ... ... ...\n" + u"3 えええ ... せ\n" u"\n[4 rows x 4 columns]") assert _rep(df) == expected df.index = [u'あああ', u'いいいい', u'う', 'aaa'] - expected = (u" a ... ああああ\n" - u"あああ あああああ ... さ\n" - u"... ... ... ...\n" - u"aaa えええ ... せ\n" + expected = (u" a ... ああああ\n" + u"あああ あああああ ... さ\n" + u"... ... ... ...\n" + u"aaa えええ ... 
せ\n" u"\n[4 rows x 4 columns]") assert _rep(df) == expected @@ -1465,6 +1481,39 @@ def test_to_string_format_na(self): '4 4.0 bar') assert result == expected + def test_to_string_format_inf(self): + # Issue #24861 + tm.reset_display_options() + df = DataFrame({ + 'A': [-np.inf, np.inf, -1, -2.1234, 3, 4], + 'B': [-np.inf, np.inf, 'foo', 'foooo', 'fooooo', 'bar'] + }) + result = df.to_string() + + expected = (' A B\n' + '0 -inf -inf\n' + '1 inf inf\n' + '2 -1.0000 foo\n' + '3 -2.1234 foooo\n' + '4 3.0000 fooooo\n' + '5 4.0000 bar') + assert result == expected + + df = DataFrame({ + 'A': [-np.inf, np.inf, -1., -2., 3., 4.], + 'B': [-np.inf, np.inf, 'foo', 'foooo', 'fooooo', 'bar'] + }) + result = df.to_string() + + expected = (' A B\n' + '0 -inf -inf\n' + '1 inf inf\n' + '2 -1.0 foo\n' + '3 -2.0 foooo\n' + '4 3.0 fooooo\n' + '5 4.0 bar') + assert result == expected + def test_to_string_decimal(self): # Issue #23614 df = DataFrame({'A': [6.0, 3.1, 2.2]}) @@ -2729,3 +2778,17 @@ def test_format_percentiles(): fmt.format_percentiles([2, 0.1, 0.5]) with pytest.raises(ValueError, match=msg): fmt.format_percentiles([0.1, 0.5, 'a']) + + +def test_repr_html_ipython_config(ip): + code = textwrap.dedent("""\ + import pandas as pd + df = pd.DataFrame({"A": [1, 2]}) + df._repr_html_() + + cfg = get_ipython().config + cfg['IPKernelApp']['parent_appname'] + df._repr_html_() + """) + result = ip.run_cell(code) + assert not result.error_in_exec diff --git a/pandas/tests/io/json/test_pandas.py b/pandas/tests/io/json/test_pandas.py index 23c40276072d6..0ffc8c978a228 100644 --- a/pandas/tests/io/json/test_pandas.py +++ b/pandas/tests/io/json/test_pandas.py @@ -1,5 +1,6 @@ # -*- coding: utf-8 -*- # pylint: disable-msg=W0612,E1101 +from collections import OrderedDict from datetime import timedelta import json import os @@ -7,8 +8,7 @@ import numpy as np import pytest -from pandas.compat import ( - OrderedDict, StringIO, is_platform_32bit, lrange, range) +from pandas.compat import StringIO, is_platform_32bit, lrange, range import pandas.util._test_decorators as td import pandas as pd @@ -1262,3 +1262,13 @@ def test_index_false_error_to_json(self, orient): "'orient' is 'split' or 'table'") with pytest.raises(ValueError, match=msg): df.to_json(orient=orient, index=False) + + @pytest.mark.parametrize('orient', ['split', 'table']) + @pytest.mark.parametrize('index', [True, False]) + def test_index_false_from_json_to_json(self, orient, index): + # GH25170 + # Test index=False in from_json to_json + expected = DataFrame({'a': [1, 2], 'b': [3, 4]}) + dfjson = expected.to_json(orient=orient, index=index) + result = read_json(dfjson, orient=orient) + assert_frame_equal(result, expected) diff --git a/pandas/tests/io/msgpack/test_pack.py b/pandas/tests/io/msgpack/test_pack.py index 8c82d0d2cf870..078d9f4ceb649 100644 --- a/pandas/tests/io/msgpack/test_pack.py +++ b/pandas/tests/io/msgpack/test_pack.py @@ -1,10 +1,10 @@ # coding: utf-8 - +from collections import OrderedDict import struct import pytest -from pandas.compat import OrderedDict, u +from pandas.compat import u from pandas import compat diff --git a/pandas/tests/io/test_clipboard.py b/pandas/tests/io/test_clipboard.py index 8eb26d9f3dec5..565db92210b0a 100644 --- a/pandas/tests/io/test_clipboard.py +++ b/pandas/tests/io/test_clipboard.py @@ -12,6 +12,7 @@ from pandas.util import testing as tm from pandas.util.testing import makeCustomDataframe as mkdf +from pandas.io.clipboard import clipboard_get, clipboard_set from pandas.io.clipboard.exceptions import 
PyperclipException try: @@ -30,8 +31,8 @@ def build_kwargs(sep, excel): return kwargs -@pytest.fixture(params=['delims', 'utf8', 'string', 'long', 'nonascii', - 'colwidth', 'mixed', 'float', 'int']) +@pytest.fixture(params=['delims', 'utf8', 'utf16', 'string', 'long', + 'nonascii', 'colwidth', 'mixed', 'float', 'int']) def df(request): data_type = request.param @@ -41,6 +42,10 @@ def df(request): elif data_type == 'utf8': return pd.DataFrame({'a': ['µasd', 'Ωœ∑´'], 'b': ['øπ∆˚¬', 'œ∑´®']}) + elif data_type == 'utf16': + return pd.DataFrame({'a': ['\U0001f44d\U0001f44d', + '\U0001f44d\U0001f44d'], + 'b': ['abc', 'def']}) elif data_type == 'string': return mkdf(5, 3, c_idx_type='s', r_idx_type='i', c_idx_names=[None], r_idx_names=[None]) @@ -225,3 +230,14 @@ def test_invalid_encoding(self, df): @pytest.mark.parametrize('enc', ['UTF-8', 'utf-8', 'utf8']) def test_round_trip_valid_encodings(self, enc, df): self.check_round_trip_frame(df, encoding=enc) + + +@pytest.mark.single +@pytest.mark.clipboard +@pytest.mark.skipif(not _DEPS_INSTALLED, + reason="clipboard primitives not installed") +@pytest.mark.parametrize('data', [u'\U0001f44d...', u'Ωœ∑´...', 'abcd...']) +def test_raw_roundtrip(data): + # PR #25040 wide unicode wasn't copied correctly on PY3 on windows + clipboard_set(data) + assert data == clipboard_get() diff --git a/pandas/tests/io/test_excel.py b/pandas/tests/io/test_excel.py index 717e9bc23c6b1..8c92db734168b 100644 --- a/pandas/tests/io/test_excel.py +++ b/pandas/tests/io/test_excel.py @@ -5,7 +5,6 @@ from functools import partial import os import warnings -from warnings import catch_warnings import numpy as np from numpy import nan @@ -2382,15 +2381,12 @@ def check_called(func): assert isinstance(writer, DummyClass) df = tm.makeCustomDataframe(1, 1) - with catch_warnings(record=True): - panel = tm.makePanel() - func = lambda: df.to_excel('something.test') - check_called(func) - check_called(lambda: panel.to_excel('something.test')) - check_called(lambda: df.to_excel('something.xlsx')) - check_called( - lambda: df.to_excel( - 'something.xls', engine='dummy')) + func = lambda: df.to_excel('something.test') + check_called(func) + check_called(lambda: df.to_excel('something.xlsx')) + check_called( + lambda: df.to_excel( + 'something.xls', engine='dummy')) @pytest.mark.parametrize('engine', [ diff --git a/pandas/tests/io/test_pytables.py b/pandas/tests/io/test_pytables.py index 517a3e059469c..b464903d8b4e0 100644 --- a/pandas/tests/io/test_pytables.py +++ b/pandas/tests/io/test_pytables.py @@ -19,11 +19,11 @@ import pandas as pd from pandas import ( Categorical, DataFrame, DatetimeIndex, Index, Int64Index, MultiIndex, - Panel, RangeIndex, Series, Timestamp, bdate_range, compat, concat, - date_range, isna, timedelta_range) + RangeIndex, Series, Timestamp, bdate_range, compat, concat, date_range, + isna, timedelta_range) import pandas.util.testing as tm from pandas.util.testing import ( - assert_frame_equal, assert_panel_equal, assert_series_equal, set_timezone) + assert_frame_equal, assert_series_equal, set_timezone) from pandas.io import pytables as pytables # noqa:E402 from pandas.io.formats.printing import pprint_thing @@ -141,7 +141,6 @@ def teardown_method(self, method): @pytest.mark.single -@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") class TestHDFStore(Base): def test_format_kwarg_in_constructor(self): @@ -185,11 +184,6 @@ def roundtrip(key, obj, **kwargs): o = tm.makeDataFrame() assert_frame_equal(o, roundtrip('frame', o)) - with catch_warnings(record=True): - 
- o = tm.makePanel() - assert_panel_equal(o, roundtrip('panel', o)) - # table df = DataFrame(dict(A=lrange(5), B=lrange(5))) df.to_hdf(path, 'table', append=True) @@ -348,11 +342,9 @@ def test_keys(self): store['a'] = tm.makeTimeSeries() store['b'] = tm.makeStringSeries() store['c'] = tm.makeDataFrame() - with catch_warnings(record=True): - store['d'] = tm.makePanel() - store['foo/bar'] = tm.makePanel() - assert len(store) == 5 - expected = {'/a', '/b', '/c', '/d', '/foo/bar'} + + assert len(store) == 3 + expected = {'/a', '/b', '/c'} assert set(store.keys()) == expected assert set(store) == expected @@ -388,11 +380,6 @@ def test_repr(self): store['b'] = tm.makeStringSeries() store['c'] = tm.makeDataFrame() - with catch_warnings(record=True): - store['d'] = tm.makePanel() - store['foo/bar'] = tm.makePanel() - store.append('e', tm.makePanel()) - df = tm.makeDataFrame() df['obj1'] = 'foo' df['obj2'] = 'bar' @@ -936,21 +923,6 @@ def test_append(self): store.append('/df3 foo', df[10:]) tm.assert_frame_equal(store['df3 foo'], df) - # panel - wp = tm.makePanel() - _maybe_remove(store, 'wp1') - store.append('wp1', wp.iloc[:, :10, :]) - store.append('wp1', wp.iloc[:, 10:, :]) - assert_panel_equal(store['wp1'], wp) - - # test using differt order of items on the non-index axes - _maybe_remove(store, 'wp1') - wp_append1 = wp.iloc[:, :10, :] - store.append('wp1', wp_append1) - wp_append2 = wp.iloc[:, 10:, :].reindex(items=wp.items[::-1]) - store.append('wp1', wp_append2) - assert_panel_equal(store['wp1'], wp) - # dtype issues - mizxed type in a single object column df = DataFrame(data=[[1, 2], [0, 1], [1, 2], [0, 0]]) df['mixed_column'] = 'testing' @@ -1254,22 +1226,6 @@ def test_append_all_nans(self): reloaded = read_hdf(path, 'df_with_missing') tm.assert_frame_equal(df_with_missing, reloaded) - matrix = [[[np.nan, np.nan, np.nan], [1, np.nan, np.nan]], - [[np.nan, np.nan, np.nan], [np.nan, 5, 6]], - [[np.nan, np.nan, np.nan], [np.nan, 3, np.nan]]] - - with catch_warnings(record=True): - panel_with_missing = Panel(matrix, - items=['Item1', 'Item2', 'Item3'], - major_axis=[1, 2], - minor_axis=['A', 'B', 'C']) - - with ensure_clean_path(self.path) as path: - panel_with_missing.to_hdf( - path, 'panel_with_missing', format='table') - reloaded_panel = read_hdf(path, 'panel_with_missing') - tm.assert_panel_equal(panel_with_missing, reloaded_panel) - def test_append_frame_column_oriented(self): with ensure_clean_store(self.path) as store: @@ -1342,40 +1298,11 @@ def test_append_with_strings(self): with ensure_clean_store(self.path) as store: with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - wp = tm.makePanel() - wp2 = wp.rename( - minor_axis={x: "%s_extra" % x for x in wp.minor_axis}) def check_col(key, name, size): assert getattr(store.get_storer(key) .table.description, name).itemsize == size - store.append('s1', wp, min_itemsize=20) - store.append('s1', wp2) - expected = concat([wp, wp2], axis=2) - expected = expected.reindex( - minor_axis=sorted(expected.minor_axis)) - assert_panel_equal(store['s1'], expected) - check_col('s1', 'minor_axis', 20) - - # test dict format - store.append('s2', wp, min_itemsize={'minor_axis': 20}) - store.append('s2', wp2) - expected = concat([wp, wp2], axis=2) - expected = expected.reindex( - minor_axis=sorted(expected.minor_axis)) - assert_panel_equal(store['s2'], expected) - check_col('s2', 'minor_axis', 20) - - # apply the wrong field (similar to #1) - store.append('s3', wp, min_itemsize={'major_axis': 20}) - pytest.raises(ValueError, 
store.append, 's3', wp2) - - # test truncation of bigger strings - store.append('s4', wp) - pytest.raises(ValueError, store.append, 's4', wp2) - # avoid truncation on elements df = DataFrame([[123, 'asdqwerty'], [345, 'dggnhebbsdfbdfb']]) store.append('df_big', df) @@ -1674,32 +1601,6 @@ def check_col(key, name, size): (df_dc.string == 'foo')] tm.assert_frame_equal(result, expected) - with ensure_clean_store(self.path) as store: - with catch_warnings(record=True): - # panel - # GH5717 not handling data_columns - np.random.seed(1234) - p = tm.makePanel() - - store.append('p1', p) - tm.assert_panel_equal(store.select('p1'), p) - - store.append('p2', p, data_columns=True) - tm.assert_panel_equal(store.select('p2'), p) - - result = store.select('p2', where='ItemA>0') - expected = p.to_frame() - expected = expected[expected['ItemA'] > 0] - tm.assert_frame_equal(result.to_frame(), expected) - - result = store.select( - 'p2', where='ItemA>0 & minor_axis=["A","B"]') - expected = p.to_frame() - expected = expected[expected['ItemA'] > 0] - expected = expected[expected.reset_index( - level=['major']).index.isin(['A', 'B'])] - tm.assert_frame_equal(result.to_frame(), expected) - def test_create_table_index(self): with ensure_clean_store(self.path) as store: @@ -1708,37 +1609,6 @@ def test_create_table_index(self): def col(t, column): return getattr(store.get_storer(t).table.cols, column) - # index=False - wp = tm.makePanel() - store.append('p5', wp, index=False) - store.create_table_index('p5', columns=['major_axis']) - assert(col('p5', 'major_axis').is_indexed is True) - assert(col('p5', 'minor_axis').is_indexed is False) - - # index=True - store.append('p5i', wp, index=True) - assert(col('p5i', 'major_axis').is_indexed is True) - assert(col('p5i', 'minor_axis').is_indexed is True) - - # default optlevels - store.get_storer('p5').create_index() - assert(col('p5', 'major_axis').index.optlevel == 6) - assert(col('p5', 'minor_axis').index.kind == 'medium') - - # let's change the indexing scheme - store.create_table_index('p5') - assert(col('p5', 'major_axis').index.optlevel == 6) - assert(col('p5', 'minor_axis').index.kind == 'medium') - store.create_table_index('p5', optlevel=9) - assert(col('p5', 'major_axis').index.optlevel == 9) - assert(col('p5', 'minor_axis').index.kind == 'medium') - store.create_table_index('p5', kind='full') - assert(col('p5', 'major_axis').index.optlevel == 9) - assert(col('p5', 'minor_axis').index.kind == 'full') - store.create_table_index('p5', optlevel=1, kind='light') - assert(col('p5', 'major_axis').index.optlevel == 1) - assert(col('p5', 'minor_axis').index.kind == 'light') - # data columns df = tm.makeTimeDataFrame() df['string'] = 'foo' @@ -1761,19 +1631,6 @@ def col(t, column): store.put('f2', df) pytest.raises(TypeError, store.create_table_index, 'f2') - def test_append_diff_item_order(self): - - with catch_warnings(record=True): - wp = tm.makePanel() - wp1 = wp.iloc[:, :10, :] - wp2 = wp.iloc[wp.items.get_indexer(['ItemC', 'ItemB', 'ItemA']), - 10:, :] - - with ensure_clean_store(self.path) as store: - store.put('panel', wp1, format='table') - pytest.raises(ValueError, store.put, 'panel', wp2, - append=True) - def test_append_hierarchical(self): index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']], @@ -1987,10 +1844,6 @@ def check(obj, comparator): df['time2'] = Timestamp('20130102') check(df, tm.assert_frame_equal) - with catch_warnings(record=True): - p = tm.makePanel() - check(p, assert_panel_equal) - # empty frame, GH4273 with 
ensure_clean_store(self.path) as store: @@ -2011,24 +1864,6 @@ def check(obj, comparator): store.put('df2', df) assert_frame_equal(store.select('df2'), df) - with catch_warnings(record=True): - - # 0 len - p_empty = Panel(items=list('ABC')) - store.append('p', p_empty) - pytest.raises(KeyError, store.select, 'p') - - # repeated append of 0/non-zero frames - p = Panel(np.random.randn(3, 4, 5), items=list('ABC')) - store.append('p', p) - assert_panel_equal(store.select('p'), p) - store.append('p', p_empty) - assert_panel_equal(store.select('p'), p) - - # store - store.put('p2', p_empty) - assert_panel_equal(store.select('p2'), p_empty) - def test_append_raise(self): with ensure_clean_store(self.path) as store: @@ -2143,24 +1978,6 @@ def test_table_mixed_dtypes(self): store.append('df1_mixed', df) tm.assert_frame_equal(store.select('df1_mixed'), df) - with catch_warnings(record=True): - - # panel - wp = tm.makePanel() - wp['obj1'] = 'foo' - wp['obj2'] = 'bar' - wp['bool1'] = wp['ItemA'] > 0 - wp['bool2'] = wp['ItemB'] > 0 - wp['int1'] = 1 - wp['int2'] = 2 - wp = wp._consolidate() - - with catch_warnings(record=True): - - with ensure_clean_store(self.path) as store: - store.append('p1_mixed', wp) - assert_panel_equal(store.select('p1_mixed'), wp) - def test_unimplemented_dtypes_table_columns(self): with ensure_clean_store(self.path) as store: @@ -2308,193 +2125,6 @@ def test_remove(self): del store['b'] assert len(store) == 0 - def test_remove_where(self): - - with ensure_clean_store(self.path) as store: - - with catch_warnings(record=True): - - # non-existance - crit1 = 'index>foo' - pytest.raises(KeyError, store.remove, 'a', [crit1]) - - # try to remove non-table (with crit) - # non-table ok (where = None) - wp = tm.makePanel(30) - store.put('wp', wp, format='table') - store.remove('wp', ["minor_axis=['A', 'D']"]) - rs = store.select('wp') - expected = wp.reindex(minor_axis=['B', 'C']) - assert_panel_equal(rs, expected) - - # empty where - _maybe_remove(store, 'wp') - store.put('wp', wp, format='table') - - # deleted number (entire table) - n = store.remove('wp', []) - assert n == 120 - - # non - empty where - _maybe_remove(store, 'wp') - store.put('wp', wp, format='table') - pytest.raises(ValueError, store.remove, - 'wp', ['foo']) - - def test_remove_startstop(self): - # GH #4835 and #6177 - - with ensure_clean_store(self.path) as store: - - with catch_warnings(record=True): - wp = tm.makePanel(30) - - # start - _maybe_remove(store, 'wp1') - store.put('wp1', wp, format='t') - n = store.remove('wp1', start=32) - assert n == 120 - 32 - result = store.select('wp1') - expected = wp.reindex(major_axis=wp.major_axis[:32 // 4]) - assert_panel_equal(result, expected) - - _maybe_remove(store, 'wp2') - store.put('wp2', wp, format='t') - n = store.remove('wp2', start=-32) - assert n == 32 - result = store.select('wp2') - expected = wp.reindex(major_axis=wp.major_axis[:-32 // 4]) - assert_panel_equal(result, expected) - - # stop - _maybe_remove(store, 'wp3') - store.put('wp3', wp, format='t') - n = store.remove('wp3', stop=32) - assert n == 32 - result = store.select('wp3') - expected = wp.reindex(major_axis=wp.major_axis[32 // 4:]) - assert_panel_equal(result, expected) - - _maybe_remove(store, 'wp4') - store.put('wp4', wp, format='t') - n = store.remove('wp4', stop=-32) - assert n == 120 - 32 - result = store.select('wp4') - expected = wp.reindex(major_axis=wp.major_axis[-32 // 4:]) - assert_panel_equal(result, expected) - - # start n stop - _maybe_remove(store, 'wp5') - store.put('wp5', wp, 
format='t') - n = store.remove('wp5', start=16, stop=-16) - assert n == 120 - 32 - result = store.select('wp5') - expected = wp.reindex( - major_axis=(wp.major_axis[:16 // 4] - .union(wp.major_axis[-16 // 4:]))) - assert_panel_equal(result, expected) - - _maybe_remove(store, 'wp6') - store.put('wp6', wp, format='t') - n = store.remove('wp6', start=16, stop=16) - assert n == 0 - result = store.select('wp6') - expected = wp.reindex(major_axis=wp.major_axis) - assert_panel_equal(result, expected) - - # with where - _maybe_remove(store, 'wp7') - - # TODO: unused? - date = wp.major_axis.take(np.arange(0, 30, 3)) # noqa - - crit = 'major_axis=date' - store.put('wp7', wp, format='t') - n = store.remove('wp7', where=[crit], stop=80) - assert n == 28 - result = store.select('wp7') - expected = wp.reindex(major_axis=wp.major_axis.difference( - wp.major_axis[np.arange(0, 20, 3)])) - assert_panel_equal(result, expected) - - def test_remove_crit(self): - - with ensure_clean_store(self.path) as store: - - with catch_warnings(record=True): - wp = tm.makePanel(30) - - # group row removal - _maybe_remove(store, 'wp3') - date4 = wp.major_axis.take([0, 1, 2, 4, 5, 6, 8, 9, 10]) - crit4 = 'major_axis=date4' - store.put('wp3', wp, format='t') - n = store.remove('wp3', where=[crit4]) - assert n == 36 - - result = store.select('wp3') - expected = wp.reindex( - major_axis=wp.major_axis.difference(date4)) - assert_panel_equal(result, expected) - - # upper half - _maybe_remove(store, 'wp') - store.put('wp', wp, format='table') - date = wp.major_axis[len(wp.major_axis) // 2] - - crit1 = 'major_axis>date' - crit2 = "minor_axis=['A', 'D']" - n = store.remove('wp', where=[crit1]) - assert n == 56 - - n = store.remove('wp', where=[crit2]) - assert n == 32 - - result = store['wp'] - expected = wp.truncate(after=date).reindex(minor=['B', 'C']) - assert_panel_equal(result, expected) - - # individual row elements - _maybe_remove(store, 'wp2') - store.put('wp2', wp, format='table') - - date1 = wp.major_axis[1:3] - crit1 = 'major_axis=date1' - store.remove('wp2', where=[crit1]) - result = store.select('wp2') - expected = wp.reindex( - major_axis=wp.major_axis.difference(date1)) - assert_panel_equal(result, expected) - - date2 = wp.major_axis[5] - crit2 = 'major_axis=date2' - store.remove('wp2', where=[crit2]) - result = store['wp2'] - expected = wp.reindex( - major_axis=(wp.major_axis - .difference(date1) - .difference(Index([date2])) - )) - assert_panel_equal(result, expected) - - date3 = [wp.major_axis[7], wp.major_axis[9]] - crit3 = 'major_axis=date3' - store.remove('wp2', where=[crit3]) - result = store['wp2'] - expected = wp.reindex(major_axis=wp.major_axis - .difference(date1) - .difference(Index([date2])) - .difference(Index(date3))) - assert_panel_equal(result, expected) - - # corners - _maybe_remove(store, 'wp4') - store.put('wp4', wp, format='table') - n = store.remove( - 'wp4', where="major_axis>wp.major_axis[-1]") - result = store.select('wp4') - assert_panel_equal(result, wp) - def test_invalid_terms(self): with ensure_clean_store(self.path) as store: @@ -2504,27 +2134,16 @@ def test_invalid_terms(self): df = tm.makeTimeDataFrame() df['string'] = 'foo' df.loc[0:4, 'string'] = 'bar' - wp = tm.makePanel() store.put('df', df, format='table') - store.put('wp', wp, format='table') # some invalid terms - pytest.raises(ValueError, store.select, - 'wp', "minor=['A', 'B']") - pytest.raises(ValueError, store.select, - 'wp', ["index=['20121114']"]) - pytest.raises(ValueError, store.select, 'wp', [ - "index=['20121114', 
'20121114']"]) pytest.raises(TypeError, Term) # more invalid pytest.raises( ValueError, store.select, 'df', 'df.index[3]') pytest.raises(SyntaxError, store.select, 'df', 'index>') - pytest.raises( - ValueError, store.select, 'wp', - "major_axis<'20000108' & minor_axis['A', 'B']") # from the docs with ensure_clean_path(self.path) as path: @@ -2546,127 +2165,6 @@ def test_invalid_terms(self): pytest.raises(ValueError, read_hdf, path, 'dfq', where="A>0 or C>0") - def test_terms(self): - - with ensure_clean_store(self.path) as store: - - with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - - wp = tm.makePanel() - wpneg = Panel.fromDict({-1: tm.makeDataFrame(), - 0: tm.makeDataFrame(), - 1: tm.makeDataFrame()}) - - store.put('wp', wp, format='table') - store.put('wpneg', wpneg, format='table') - - # panel - result = store.select( - 'wp', - "major_axis<'20000108' and minor_axis=['A', 'B']") - expected = wp.truncate( - after='20000108').reindex(minor=['A', 'B']) - assert_panel_equal(result, expected) - - # with deprecation - result = store.select( - 'wp', where=("major_axis<'20000108' " - "and minor_axis=['A', 'B']")) - expected = wp.truncate( - after='20000108').reindex(minor=['A', 'B']) - tm.assert_panel_equal(result, expected) - - with catch_warnings(record=True): - - # valid terms - terms = [('major_axis=20121114'), - ('major_axis>20121114'), - (("major_axis=['20121114', '20121114']"),), - ('major_axis=datetime.datetime(2012, 11, 14)'), - 'major_axis> 20121114', - 'major_axis >20121114', - 'major_axis > 20121114', - (("minor_axis=['A', 'B']"),), - (("minor_axis=['A', 'B']"),), - ((("minor_axis==['A', 'B']"),),), - (("items=['ItemA', 'ItemB']"),), - ('items=ItemA'), - ] - - for t in terms: - store.select('wp', t) - - with pytest.raises(TypeError, - match='Only named functions are supported'): - store.select( - 'wp', - 'major_axis == (lambda x: x)("20130101")') - - with catch_warnings(record=True): - # check USub node parsing - res = store.select('wpneg', 'items == -1') - expected = Panel({-1: wpneg[-1]}) - tm.assert_panel_equal(res, expected) - - msg = 'Unary addition not supported' - with pytest.raises(NotImplementedError, match=msg): - store.select('wpneg', 'items == +1') - - def test_term_compat(self): - with ensure_clean_store(self.path) as store: - - with catch_warnings(record=True): - wp = Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'], - major_axis=date_range('1/1/2000', periods=5), - minor_axis=['A', 'B', 'C', 'D']) - store.append('wp', wp) - - result = store.select( - 'wp', where=("major_axis>20000102 " - "and minor_axis=['A', 'B']")) - expected = wp.loc[:, wp.major_axis > - Timestamp('20000102'), ['A', 'B']] - assert_panel_equal(result, expected) - - store.remove('wp', 'major_axis>20000103') - result = store.select('wp') - expected = wp.loc[:, wp.major_axis <= Timestamp('20000103'), :] - assert_panel_equal(result, expected) - - with ensure_clean_store(self.path) as store: - - with catch_warnings(record=True): - wp = Panel(np.random.randn(2, 5, 4), - items=['Item1', 'Item2'], - major_axis=date_range('1/1/2000', periods=5), - minor_axis=['A', 'B', 'C', 'D']) - store.append('wp', wp) - - # stringified datetimes - result = store.select( - 'wp', 'major_axis>datetime.datetime(2000, 1, 2)') - expected = wp.loc[:, wp.major_axis > Timestamp('20000102')] - assert_panel_equal(result, expected) - - result = store.select( - 'wp', 'major_axis>datetime.datetime(2000, 1, 2)') - expected = wp.loc[:, wp.major_axis > Timestamp('20000102')] - assert_panel_equal(result, 
expected) - - result = store.select( - 'wp', - "major_axis=[datetime.datetime(2000, 1, 2, 0, 0), " - "datetime.datetime(2000, 1, 3, 0, 0)]") - expected = wp.loc[:, [Timestamp('20000102'), - Timestamp('20000103')]] - assert_panel_equal(result, expected) - - result = store.select( - 'wp', "minor_axis=['A', 'B']") - expected = wp.loc[:, :, ['A', 'B']] - assert_panel_equal(result, expected) - def test_same_name_scoping(self): with ensure_clean_store(self.path) as store: @@ -2982,12 +2480,6 @@ def _make_one(): self._check_roundtrip(df1['int1'], tm.assert_series_equal, compression=compression) - def test_wide(self): - - with catch_warnings(record=True): - wp = tm.makePanel() - self._check_roundtrip(wp, assert_panel_equal) - @pytest.mark.filterwarnings( "ignore:\\nduplicate:pandas.io.pytables.DuplicateWarning" ) @@ -3050,29 +2542,6 @@ def test_select_with_dups(self): result = store.select('df', columns=['B', 'A']) assert_frame_equal(result, expected, by_blocks=True) - @pytest.mark.filterwarnings( - "ignore:\\nduplicate:pandas.io.pytables.DuplicateWarning" - ) - def test_wide_table_dups(self): - with ensure_clean_store(self.path) as store: - with catch_warnings(record=True): - - wp = tm.makePanel() - store.put('panel', wp, format='table') - store.put('panel', wp, format='table', append=True) - - recons = store['panel'] - - assert_panel_equal(recons, wp) - - def test_long(self): - def _check(left, right): - assert_panel_equal(left.to_panel(), right.to_panel()) - - with catch_warnings(record=True): - wp = tm.makePanel() - self._check_roundtrip(wp.to_frame(), _check) - def test_overwrite_node(self): with ensure_clean_store(self.path) as store: @@ -3119,34 +2588,6 @@ def test_select(self): with ensure_clean_store(self.path) as store: with catch_warnings(record=True): - wp = tm.makePanel() - - # put/select ok - _maybe_remove(store, 'wp') - store.put('wp', wp, format='table') - store.select('wp') - - # non-table ok (where = None) - _maybe_remove(store, 'wp') - store.put('wp2', wp) - store.select('wp2') - - # selection on the non-indexable with a large number of columns - wp = Panel(np.random.randn(100, 100, 100), - items=['Item%03d' % i for i in range(100)], - major_axis=date_range('1/1/2000', periods=100), - minor_axis=['E%03d' % i for i in range(100)]) - - _maybe_remove(store, 'wp') - store.append('wp', wp) - items = ['Item%03d' % i for i in range(80)] - result = store.select('wp', 'items=items') - expected = wp.reindex(items=items) - assert_panel_equal(expected, result) - - # selectin non-table with a where - # pytest.raises(ValueError, store.select, - # 'wp2', ('column', ['A', 'D'])) # select with columns= df = tm.makeTimeDataFrame() @@ -3675,31 +3116,6 @@ def test_retain_index_attributes2(self): assert read_hdf(path, 'data').index.name is None - def test_panel_select(self): - - with ensure_clean_store(self.path) as store: - - with catch_warnings(record=True): - - wp = tm.makePanel() - - store.put('wp', wp, format='table') - date = wp.major_axis[len(wp.major_axis) // 2] - - crit1 = ('major_axis>=date') - crit2 = ("minor_axis=['A', 'D']") - - result = store.select('wp', [crit1, crit2]) - expected = wp.truncate(before=date).reindex(minor=['A', 'D']) - assert_panel_equal(result, expected) - - result = store.select( - 'wp', ['major_axis>="20000124"', - ("minor_axis=['A', 'B']")]) - expected = wp.truncate( - before='20000124').reindex(minor=['A', 'B']) - assert_panel_equal(result, expected) - def test_frame_select(self): df = tm.makeTimeDataFrame() @@ -4540,7 +3956,7 @@ def 
test_pytables_native2_read(self, datapath): def test_legacy_table_fixed_format_read_py2(self, datapath): # GH 24510 - # legacy table with fixed format written en Python 2 + # legacy table with fixed format written in Python 2 with ensure_clean_store( datapath('io', 'data', 'legacy_hdf', 'legacy_table_fixed_py2.h5'), @@ -4552,29 +3968,20 @@ def test_legacy_table_fixed_format_read_py2(self, datapath): name='INDEX_NAME')) assert_frame_equal(expected, result) - def test_legacy_table_read(self, datapath): - # legacy table types + def test_legacy_table_read_py2(self, datapath): + # issue: 24925 + # legacy table written in Python 2 with ensure_clean_store( - datapath('io', 'data', 'legacy_hdf', 'legacy_table.h5'), + datapath('io', 'data', 'legacy_hdf', + 'legacy_table_py2.h5'), mode='r') as store: + result = store.select('table') - with catch_warnings(): - simplefilter("ignore", pd.io.pytables.IncompatibilityWarning) - store.select('df1') - store.select('df2') - store.select('wp1') - - # force the frame - store.select('df2', typ='legacy_frame') - - # old version warning - pytest.raises( - Exception, store.select, 'wp1', 'minor_axis=B') - - df2 = store.select('df2') - result = store.select('df2', 'index>df2.index[2]') - expected = df2[df2.index > df2.index[2]] - assert_frame_equal(expected, result) + expected = pd.DataFrame({ + "a": ["a", "b"], + "b": [2, 3] + }) + assert_frame_equal(expected, result) def test_copy(self): @@ -5308,35 +4715,30 @@ def test_complex_mixed_table(self): reread = read_hdf(path, 'df') assert_frame_equal(df, reread) - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") def test_complex_across_dimensions_fixed(self): with catch_warnings(record=True): complex128 = np.array( [1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j]) s = Series(complex128, index=list('abcd')) df = DataFrame({'A': s, 'B': s}) - p = Panel({'One': df, 'Two': df}) - objs = [s, df, p] - comps = [tm.assert_series_equal, tm.assert_frame_equal, - tm.assert_panel_equal] + objs = [s, df] + comps = [tm.assert_series_equal, tm.assert_frame_equal] for obj, comp in zip(objs, comps): with ensure_clean_path(self.path) as path: obj.to_hdf(path, 'obj', format='fixed') reread = read_hdf(path, 'obj') comp(obj, reread) - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") def test_complex_across_dimensions(self): complex128 = np.array([1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j]) s = Series(complex128, index=list('abcd')) df = DataFrame({'A': s, 'B': s}) with catch_warnings(record=True): - p = Panel({'One': df, 'Two': df}) - objs = [df, p] - comps = [tm.assert_frame_equal, tm.assert_panel_equal] + objs = [df] + comps = [tm.assert_frame_equal] for obj, comp in zip(objs, comps): with ensure_clean_path(self.path) as path: obj.to_hdf(path, 'obj', format='table') diff --git a/pandas/tests/io/test_sql.py b/pandas/tests/io/test_sql.py index 75a6d8d009083..9d0bce3b342b4 100644 --- a/pandas/tests/io/test_sql.py +++ b/pandas/tests/io/test_sql.py @@ -605,12 +605,6 @@ def test_to_sql_series(self): s2 = sql.read_sql_query("SELECT * FROM test_series", self.conn) tm.assert_frame_equal(s.to_frame(), s2) - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") - def test_to_sql_panel(self): - panel = tm.makePanel() - pytest.raises(NotImplementedError, sql.to_sql, panel, - 'test_panel', self.conn) - def test_roundtrip(self): sql.to_sql(self.test_frame1, 'test_frame_roundtrip', con=self.conn) diff --git a/pandas/tests/plotting/test_frame.py b/pandas/tests/plotting/test_frame.py index 0e7672f4e2f9d..98b241f5c8206 
100644 --- a/pandas/tests/plotting/test_frame.py +++ b/pandas/tests/plotting/test_frame.py @@ -2988,22 +2988,6 @@ def test_secondary_axis_font_size(self, method): self._check_ticks_props(axes=ax.right_ax, ylabelsize=fontsize) - def test_misc_bindings(self, monkeypatch): - df = pd.DataFrame(randn(10, 10), columns=list('abcdefghij')) - monkeypatch.setattr('pandas.plotting._misc.scatter_matrix', - lambda x: 2) - monkeypatch.setattr('pandas.plotting._misc.andrews_curves', - lambda x, y: 2) - monkeypatch.setattr('pandas.plotting._misc.parallel_coordinates', - lambda x, y: 2) - monkeypatch.setattr('pandas.plotting._misc.radviz', - lambda x, y: 2) - - assert df.plot.scatter_matrix() == 2 - assert df.plot.andrews_curves('a') == 2 - assert df.plot.parallel_coordinates('a') == 2 - assert df.plot.radviz('a') == 2 - def _generate_4_axes_via_gridspec(): import matplotlib.pyplot as plt diff --git a/pandas/tests/plotting/test_series.py b/pandas/tests/plotting/test_series.py index 1e223c20f55b7..07a4b168a66f1 100644 --- a/pandas/tests/plotting/test_series.py +++ b/pandas/tests/plotting/test_series.py @@ -878,19 +878,6 @@ def test_custom_business_day_freq(self): _check_plot_works(s.plot) - def test_misc_bindings(self, monkeypatch): - s = Series(randn(10)) - monkeypatch.setattr('pandas.plotting._misc.lag_plot', - lambda x: 2) - monkeypatch.setattr('pandas.plotting._misc.autocorrelation_plot', - lambda x: 2) - monkeypatch.setattr('pandas.plotting._misc.bootstrap_plot', - lambda x: 2) - - assert s.plot.lag() == 2 - assert s.plot.autocorrelation() == 2 - assert s.plot.bootstrap() == 2 - @pytest.mark.xfail def test_plot_accessor_updates_on_inplace(self): s = Series([1, 2, 3, 4]) diff --git a/pandas/tests/resample/test_base.py b/pandas/tests/resample/test_base.py index 911cd990ab881..48debfa2848e7 100644 --- a/pandas/tests/resample/test_base.py +++ b/pandas/tests/resample/test_base.py @@ -95,7 +95,10 @@ def test_resample_interpolate_all_ts(frame): def test_raises_on_non_datetimelike_index(): # this is a non datetimelike index xp = DataFrame() - pytest.raises(TypeError, lambda: xp.resample('A').mean()) + msg = ("Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex," + " but got an instance of 'Index'") + with pytest.raises(TypeError, match=msg): + xp.resample('A').mean() @pytest.mark.parametrize('freq', ['M', 'D', 'H']) @@ -189,8 +192,10 @@ def test_resample_loffset_arg_type_all_ts(frame, create_index): # GH 13022, 7687 - TODO: fix resample w/ TimedeltaIndex if isinstance(expected.index, TimedeltaIndex): - with pytest.raises(AssertionError): + msg = "DataFrame are different" + with pytest.raises(AssertionError, match=msg): assert_frame_equal(result_agg, expected) + with pytest.raises(AssertionError, match=msg): assert_frame_equal(result_how, expected) else: assert_frame_equal(result_agg, expected) diff --git a/pandas/tests/resample/test_datetime_index.py b/pandas/tests/resample/test_datetime_index.py index 73995cbe79ecd..ceccb48194f85 100644 --- a/pandas/tests/resample/test_datetime_index.py +++ b/pandas/tests/resample/test_datetime_index.py @@ -1,6 +1,5 @@ from datetime import datetime, timedelta from functools import partial -from warnings import catch_warnings, simplefilter import numpy as np import pytest @@ -10,7 +9,7 @@ from pandas.errors import UnsupportedFunctionCall import pandas as pd -from pandas import DataFrame, Panel, Series, Timedelta, Timestamp, isna, notna +from pandas import DataFrame, Series, Timedelta, Timestamp, isna, notna from pandas.core.indexes.datetimes import date_range from 
pandas.core.indexes.period import Period, period_range from pandas.core.resample import ( @@ -113,16 +112,18 @@ def test_resample_basic_grouper(series): @pytest.mark.parametrize( '_index_start,_index_end,_index_name', [('1/1/2000 00:00:00', '1/1/2000 00:13:00', 'index')]) -@pytest.mark.parametrize('kwargs', [ - dict(label='righttt'), - dict(closed='righttt'), - dict(convention='starttt') +@pytest.mark.parametrize('keyword,value', [ + ('label', 'righttt'), + ('closed', 'righttt'), + ('convention', 'starttt') ]) -def test_resample_string_kwargs(series, kwargs): +def test_resample_string_kwargs(series, keyword, value): # see gh-19303 # Check that wrong keyword argument strings raise an error - with pytest.raises(ValueError, match='Unsupported value'): - series.resample('5min', **kwargs) + msg = "Unsupported value {value} for `{keyword}`".format( + value=value, keyword=keyword) + with pytest.raises(ValueError, match=msg): + series.resample('5min', **({keyword: value})) @pytest.mark.parametrize( @@ -676,7 +677,7 @@ def test_asfreq_non_unique(): ts = Series(np.random.randn(len(rng2)), index=rng2) msg = 'cannot reindex from a duplicate axis' - with pytest.raises(Exception, match=msg): + with pytest.raises(ValueError, match=msg): ts.asfreq('B') @@ -690,56 +691,6 @@ def test_resample_axis1(): tm.assert_frame_equal(result, expected) -def test_resample_panel(): - rng = date_range('1/1/2000', '6/30/2000') - n = len(rng) - - with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - panel = Panel(np.random.randn(3, n, 5), - items=['one', 'two', 'three'], - major_axis=rng, - minor_axis=['a', 'b', 'c', 'd', 'e']) - - result = panel.resample('M', axis=1).mean() - - def p_apply(panel, f): - result = {} - for item in panel.items: - result[item] = f(panel[item]) - return Panel(result, items=panel.items) - - expected = p_apply(panel, lambda x: x.resample('M').mean()) - tm.assert_panel_equal(result, expected) - - panel2 = panel.swapaxes(1, 2) - result = panel2.resample('M', axis=2).mean() - expected = p_apply(panel2, - lambda x: x.resample('M', axis=1).mean()) - tm.assert_panel_equal(result, expected) - - -@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") -def test_resample_panel_numpy(): - rng = date_range('1/1/2000', '6/30/2000') - n = len(rng) - - with catch_warnings(record=True): - panel = Panel(np.random.randn(3, n, 5), - items=['one', 'two', 'three'], - major_axis=rng, - minor_axis=['a', 'b', 'c', 'd', 'e']) - - result = panel.resample('M', axis=1).apply(lambda x: x.mean(1)) - expected = panel.resample('M', axis=1).mean() - tm.assert_panel_equal(result, expected) - - panel = panel.swapaxes(1, 2) - result = panel.resample('M', axis=2).apply(lambda x: x.mean(2)) - expected = panel.resample('M', axis=2).mean() - tm.assert_panel_equal(result, expected) - - def test_resample_anchored_ticks(): # If a fixed delta (5 minute, 4 hour) evenly divides a day, we should # "anchor" the origin at midnight so we get regular intervals rather @@ -1276,6 +1227,21 @@ def test_resample_across_dst(): assert_frame_equal(result, expected) +def test_groupby_with_dst_time_change(): + # GH 24972 + index = pd.DatetimeIndex([1478064900001000000, 1480037118776792000], + tz='UTC').tz_convert('America/Chicago') + + df = pd.DataFrame([1, 2], index=index) + result = df.groupby(pd.Grouper(freq='1d')).last() + expected_index_values = pd.date_range('2016-11-02', '2016-11-24', + freq='d', tz='America/Chicago') + + index = pd.DatetimeIndex(expected_index_values) + expected = pd.DataFrame([1.0] + ([np.nan] * 21) + 
[2.0], index=index) + assert_frame_equal(result, expected) + + def test_resample_dst_anchor(): # 5172 dti = DatetimeIndex([datetime(2012, 11, 4, 23)], tz='US/Eastern') diff --git a/pandas/tests/resample/test_period_index.py b/pandas/tests/resample/test_period_index.py index c2fbb5bbb088c..8abdf9034527b 100644 --- a/pandas/tests/resample/test_period_index.py +++ b/pandas/tests/resample/test_period_index.py @@ -11,6 +11,7 @@ import pandas as pd from pandas import DataFrame, Series, Timestamp +from pandas.core.indexes.base import InvalidIndexError from pandas.core.indexes.datetimes import date_range from pandas.core.indexes.period import Period, PeriodIndex, period_range from pandas.core.resample import _get_period_range_edges @@ -72,17 +73,19 @@ def test_asfreq_fill_value(self, series): @pytest.mark.parametrize('freq', ['H', '12H', '2D', 'W']) @pytest.mark.parametrize('kind', [None, 'period', 'timestamp']) - def test_selection(self, index, freq, kind): + @pytest.mark.parametrize('kwargs', [dict(on='date'), dict(level='d')]) + def test_selection(self, index, freq, kind, kwargs): # This is a bug, these should be implemented # GH 14008 rng = np.arange(len(index), dtype=np.int64) df = DataFrame({'date': index, 'a': rng}, index=pd.MultiIndex.from_arrays([rng, index], names=['v', 'd'])) - with pytest.raises(NotImplementedError): - df.resample(freq, on='date', kind=kind) - with pytest.raises(NotImplementedError): - df.resample(freq, level='d', kind=kind) + msg = ("Resampling from level= or on= selection with a PeriodIndex is" + r" not currently supported, use \.set_index\(\.\.\.\) to" + " explicitly set index") + with pytest.raises(NotImplementedError, match=msg): + df.resample(freq, kind=kind, **kwargs) @pytest.mark.parametrize('month', MONTHS) @pytest.mark.parametrize('meth', ['ffill', 'bfill']) @@ -110,13 +113,20 @@ def test_basic_downsample(self, simple_period_range_series): assert_series_equal(ts.resample('a-dec').mean(), result) assert_series_equal(ts.resample('a').mean(), result) - def test_not_subperiod(self, simple_period_range_series): + @pytest.mark.parametrize('rule,expected_error_msg', [ + ('a-dec', '<YearEnd: month=12>'), + ('q-mar', '<QuarterEnd: startingMonth=3>'), + ('M', '<MonthEnd>'), + ('w-thu', '<Week: weekday=3>') + ]) + def test_not_subperiod( + self, simple_period_range_series, rule, expected_error_msg): # These are incompatible period rules for resampling ts = simple_period_range_series('1/1/1990', '6/30/1995', freq='w-wed') - pytest.raises(ValueError, lambda: ts.resample('a-dec').mean()) - pytest.raises(ValueError, lambda: ts.resample('q-mar').mean()) - pytest.raises(ValueError, lambda: ts.resample('M').mean()) - pytest.raises(ValueError, lambda: ts.resample('w-thu').mean()) + msg = ("Frequency <Week: weekday=2> cannot be resampled to {}, as they" + " are not sub or super periods").format(expected_error_msg) + with pytest.raises(IncompatibleFrequency, match=msg): + ts.resample(rule).mean() @pytest.mark.parametrize('freq', ['D', '2D']) def test_basic_upsample(self, freq, simple_period_range_series): @@ -212,8 +222,9 @@ def test_resample_same_freq(self, resample_method): assert_series_equal(result, expected) def test_resample_incompat_freq(self): - - with pytest.raises(IncompatibleFrequency): + msg = ("Frequency <MonthEnd> cannot be resampled to <Week: weekday=6>," + " as they are not sub or super periods")
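+ # Editor's note (reviewer commentary, not part of the original change): + # the offset reprs in msg above are inferred from pandas' resample error + # format; a monthly PeriodIndex is neither a sub- nor a super-period of + # weekly, so this raises IncompatibleFrequency naming both offsets. + with pytest.raises(IncompatibleFrequency, match=msg): Series(range(3), index=pd.period_range( start='2000', periods=3, freq='M')).resample('W').mean() @@ -373,7 +384,9 @@ def test_resample_fill_missing(self): def test_cant_fill_missing_dups(self): rng = PeriodIndex([2000, 2005, 2005, 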
2007, 2007], freq='A') s = Series(np.random.randn(5), index=rng) - pytest.raises(Exception, lambda: s.resample('A').ffill()) + msg = "Reindexing only valid with uniquely valued Index objects" + with pytest.raises(InvalidIndexError, match=msg): + s.resample('A').ffill() @pytest.mark.parametrize('freq', ['5min']) @pytest.mark.parametrize('kind', ['period', None, 'timestamp']) diff --git a/pandas/tests/resample/test_resample_api.py b/pandas/tests/resample/test_resample_api.py index 69684daf05f3d..69acf4ba6bde8 100644 --- a/pandas/tests/resample/test_resample_api.py +++ b/pandas/tests/resample/test_resample_api.py @@ -1,11 +1,12 @@ # pylint: disable=E1101 +from collections import OrderedDict from datetime import datetime import numpy as np import pytest -from pandas.compat import OrderedDict, range +from pandas.compat import range import pandas as pd from pandas import DataFrame, Series @@ -113,16 +114,14 @@ def test_getitem(): test_frame.columns[[0, 1]]) -def test_select_bad_cols(): - +@pytest.mark.parametrize('key', [['D'], ['A', 'D']]) +def test_select_bad_cols(key): g = test_frame.resample('H') - pytest.raises(KeyError, g.__getitem__, ['D']) - - pytest.raises(KeyError, g.__getitem__, ['A', 'D']) - with pytest.raises(KeyError, match='^[^A]+$'): - # A should not be referenced as a bad column... - # will have to rethink regex if you change message! - g[['A', 'D']] + # 'A' should not be referenced as a bad column... + # will have to rethink regex if you change message! + msg = r"^\"Columns not found: 'D'\"$" + with pytest.raises(KeyError, match=msg): + g[key] def test_attribute_access(): @@ -216,7 +215,9 @@ def test_fillna(): result = r.fillna(method='bfill') assert_series_equal(result, expected) - with pytest.raises(ValueError): + msg = (r"Invalid fill method\. Expecting pad \(ffill\), backfill" + r" \(bfill\) or nearest\. 
Got 0") + with pytest.raises(ValueError, match=msg): r.fillna(0) @@ -437,12 +438,11 @@ def test_agg_misc(): # errors # invalid names in the agg specification + msg = "\"Column 'B' does not exist!\"" for t in cases: - with pytest.raises(KeyError): - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - t[['A']].agg({'A': ['sum', 'std'], - 'B': ['mean', 'std']}) + with pytest.raises(KeyError, match=msg): + t[['A']].agg({'A': ['sum', 'std'], + 'B': ['mean', 'std']}) def test_agg_nested_dicts(): @@ -464,11 +464,11 @@ def test_agg_nested_dicts(): df.groupby(pd.Grouper(freq='2D')) ] + msg = r"cannot perform renaming for r(1|2) with a nested dictionary" for t in cases: - def f(): + with pytest.raises(pd.core.base.SpecificationError, match=msg): t.aggregate({'r1': {'A': ['mean', 'sum']}, 'r2': {'B': ['mean', 'sum']}}) - pytest.raises(ValueError, f) for t in cases: expected = pd.concat([t['A'].mean(), t['A'].std(), t['B'].mean(), @@ -499,7 +499,8 @@ def test_try_aggregate_non_existing_column(): df = DataFrame(data).set_index('dt') # Error as we don't have 'z' column - with pytest.raises(KeyError): + msg = "\"Column 'z' does not exist!\"" + with pytest.raises(KeyError, match=msg): df.resample('30T').agg({'x': ['mean'], 'y': ['median'], 'z': ['sum']}) @@ -517,23 +518,29 @@ def test_selection_api_validation(): df_exp = DataFrame({'a': rng}, index=index) # non DatetimeIndex - with pytest.raises(TypeError): + msg = ("Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex," + " but got an instance of 'Int64Index'") + with pytest.raises(TypeError, match=msg): df.resample('2D', level='v') - with pytest.raises(ValueError): + msg = "The Grouper cannot specify both a key and a level!" + with pytest.raises(ValueError, match=msg): df.resample('2D', on='date', level='d') - with pytest.raises(TypeError): + msg = "unhashable type: 'list'" + with pytest.raises(TypeError, match=msg): df.resample('2D', on=['a', 'date']) - with pytest.raises(KeyError): + msg = r"\"Level \['a', 'date'\] not found\"" + with pytest.raises(KeyError, match=msg): df.resample('2D', level=['a', 'date']) # upsampling not allowed - with pytest.raises(ValueError): + msg = ("Upsampling from level= or on= selection is not supported, use" + r" \.set_index\(\.\.\.\) to explicitly set index to datetime-like") + with pytest.raises(ValueError, match=msg): df.resample('2D', level='d').asfreq() - - with pytest.raises(ValueError): + with pytest.raises(ValueError, match=msg): df.resample('2D', on='date').asfreq() exp = df_exp.resample('2D').sum() diff --git a/pandas/tests/resample/test_time_grouper.py b/pandas/tests/resample/test_time_grouper.py index ec29b55ac9d67..2f330d1f2484b 100644 --- a/pandas/tests/resample/test_time_grouper.py +++ b/pandas/tests/resample/test_time_grouper.py @@ -5,7 +5,7 @@ import pytest import pandas as pd -from pandas import DataFrame, Panel, Series +from pandas import DataFrame, Series from pandas.core.indexes.datetimes import date_range from pandas.core.resample import TimeGrouper import pandas.util.testing as tm @@ -79,27 +79,6 @@ def f(df): tm.assert_index_equal(result.index, df.index) -@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") -def test_panel_aggregation(): - ind = pd.date_range('1/1/2000', periods=100) - data = np.random.randn(2, len(ind), 4) - - wp = Panel(data, items=['Item1', 'Item2'], major_axis=ind, - minor_axis=['A', 'B', 'C', 'D']) - - tg = TimeGrouper('M', axis=1) - _, grouper, _ = tg._get_grouper(wp) - bingrouped = wp.groupby(grouper) - binagg = bingrouped.mean() - - 
def f(x): - assert (isinstance(x, Panel)) - return x.mean(1) - - result = bingrouped.agg(f) - tm.assert_panel_equal(result, binagg) - - @pytest.mark.parametrize('name, func', [ ('Int64Index', tm.makeIntIndex), ('Index', tm.makeUnicodeIndex), @@ -112,7 +91,7 @@ def test_fails_on_no_datetime_index(name, func): df = DataFrame({'a': np.random.randn(n)}, index=index) msg = ("Only valid with DatetimeIndex, TimedeltaIndex " - "or PeriodIndex, but got an instance of %r" % name) + "or PeriodIndex, but got an instance of '{}'".format(name)) with pytest.raises(TypeError, match=msg): df.groupby(TimeGrouper('D')) diff --git a/pandas/tests/reshape/merge/test_join.py b/pandas/tests/reshape/merge/test_join.py index e21f9d0291afa..5d7a9ab6f4cf0 100644 --- a/pandas/tests/reshape/merge/test_join.py +++ b/pandas/tests/reshape/merge/test_join.py @@ -1,7 +1,5 @@ # pylint: disable=E1103 -from warnings import catch_warnings - import numpy as np from numpy.random import randn import pytest @@ -657,95 +655,6 @@ def test_join_dups(self): 'y_y', 'x_x', 'y_x', 'x_y', 'y_y'] assert_frame_equal(dta, expected) - def test_panel_join(self): - with catch_warnings(record=True): - panel = tm.makePanel() - tm.add_nans(panel) - - p1 = panel.iloc[:2, :10, :3] - p2 = panel.iloc[2:, 5:, 2:] - - # left join - result = p1.join(p2) - expected = p1.copy() - expected['ItemC'] = p2['ItemC'] - tm.assert_panel_equal(result, expected) - - # right join - result = p1.join(p2, how='right') - expected = p2.copy() - expected['ItemA'] = p1['ItemA'] - expected['ItemB'] = p1['ItemB'] - expected = expected.reindex(items=['ItemA', 'ItemB', 'ItemC']) - tm.assert_panel_equal(result, expected) - - # inner join - result = p1.join(p2, how='inner') - expected = panel.iloc[:, 5:10, 2:3] - tm.assert_panel_equal(result, expected) - - # outer join - result = p1.join(p2, how='outer') - expected = p1.reindex(major=panel.major_axis, - minor=panel.minor_axis) - expected = expected.join(p2.reindex(major=panel.major_axis, - minor=panel.minor_axis)) - tm.assert_panel_equal(result, expected) - - def test_panel_join_overlap(self): - with catch_warnings(record=True): - panel = tm.makePanel() - tm.add_nans(panel) - - p1 = panel.loc[['ItemA', 'ItemB', 'ItemC']] - p2 = panel.loc[['ItemB', 'ItemC']] - - # Expected index is - # - # ItemA, ItemB_p1, ItemC_p1, ItemB_p2, ItemC_p2 - joined = p1.join(p2, lsuffix='_p1', rsuffix='_p2') - p1_suf = p1.loc[['ItemB', 'ItemC']].add_suffix('_p1') - p2_suf = p2.loc[['ItemB', 'ItemC']].add_suffix('_p2') - no_overlap = panel.loc[['ItemA']] - expected = no_overlap.join(p1_suf.join(p2_suf)) - tm.assert_panel_equal(joined, expected) - - def test_panel_join_many(self): - with catch_warnings(record=True): - tm.K = 10 - panel = tm.makePanel() - tm.K = 4 - - panels = [panel.iloc[:2], panel.iloc[2:6], panel.iloc[6:]] - - joined = panels[0].join(panels[1:]) - tm.assert_panel_equal(joined, panel) - - panels = [panel.iloc[:2, :-5], - panel.iloc[2:6, 2:], - panel.iloc[6:, 5:-7]] - - data_dict = {} - for p in panels: - data_dict.update(p.iteritems()) - - joined = panels[0].join(panels[1:], how='inner') - expected = pd.Panel.from_dict(data_dict, intersect=True) - tm.assert_panel_equal(joined, expected) - - joined = panels[0].join(panels[1:], how='outer') - expected = pd.Panel.from_dict(data_dict, intersect=False) - tm.assert_panel_equal(joined, expected) - - # edge cases - msg = "Suffixes not supported when passing multiple panels" - with pytest.raises(ValueError, match=msg): - panels[0].join(panels[1:], how='outer', lsuffix='foo', - rsuffix='bar') - 
msg = "Right join not supported with multiple panels" - with pytest.raises(ValueError, match=msg): - panels[0].join(panels[1:], how='right') - def test_join_multi_to_multi(self, join_type): # GH 20475 leftindex = MultiIndex.from_product([list('abc'), list('xy'), [1, 2]], diff --git a/pandas/tests/reshape/merge/test_merge.py b/pandas/tests/reshape/merge/test_merge.py index e123a5171769d..25487ccc76e62 100644 --- a/pandas/tests/reshape/merge/test_merge.py +++ b/pandas/tests/reshape/merge/test_merge.py @@ -616,6 +616,24 @@ def test_merge_on_datetime64tz(self): assert result['value_x'].dtype == 'datetime64[ns, US/Eastern]' assert result['value_y'].dtype == 'datetime64[ns, US/Eastern]' + def test_merge_on_datetime64tz_empty(self): + # https://github.com/pandas-dev/pandas/issues/25014 + dtz = pd.DatetimeTZDtype(tz='UTC') + right = pd.DataFrame({'date': [pd.Timestamp('2018', tz=dtz.tz)], + 'value': [4.0], + 'date2': [pd.Timestamp('2019', tz=dtz.tz)]}, + columns=['date', 'value', 'date2']) + left = right[:0] + result = left.merge(right, on='date') + expected = pd.DataFrame({ + 'value_x': pd.Series(dtype=float), + 'date2_x': pd.Series(dtype=dtz), + 'date': pd.Series(dtype=dtz), + 'value_y': pd.Series(dtype=float), + 'date2_y': pd.Series(dtype=dtz), + }, columns=['value_x', 'date2_x', 'date', 'value_y', 'date2_y']) + tm.assert_frame_equal(result, expected) + def test_merge_datetime64tz_with_dst_transition(self): # GH 18885 df1 = pd.DataFrame(pd.date_range( @@ -939,26 +957,40 @@ def test_merge_two_empty_df_no_division_error(self): with np.errstate(divide='raise'): merge(a, a, on=('a', 'b')) - @pytest.mark.parametrize('how', ['left', 'outer']) + @pytest.mark.parametrize('how', ['right', 'outer']) def test_merge_on_index_with_more_values(self, how): # GH 24212 - # pd.merge gets [-1, -1, 0, 1] as right_indexer, ensure that -1 is - # interpreted as a missing value instead of the last element - df1 = pd.DataFrame([[1, 2], [2, 4], [3, 6], [4, 8]], - columns=['a', 'b']) - df2 = pd.DataFrame([[3, 30], [4, 40]], - columns=['a', 'c']) - df1.set_index('a', drop=False, inplace=True) - df2.set_index('a', inplace=True) - result = pd.merge(df1, df2, left_index=True, right_on='a', how=how) - expected = pd.DataFrame([[1, 2, np.nan], - [2, 4, np.nan], - [3, 6, 30.0], - [4, 8, 40.0]], - columns=['a', 'b', 'c']) - expected.set_index('a', drop=False, inplace=True) + # pd.merge gets [0, 1, 2, -1, -1, -1] as left_indexer, ensure that + # -1 is interpreted as a missing value instead of the last element + df1 = pd.DataFrame({'a': [1, 2, 3], 'key': [0, 2, 2]}) + df2 = pd.DataFrame({'b': [1, 2, 3, 4, 5]}) + result = df1.merge(df2, left_on='key', right_index=True, how=how) + expected = pd.DataFrame([[1.0, 0, 1], + [2.0, 2, 3], + [3.0, 2, 3], + [np.nan, 1, 2], + [np.nan, 3, 4], + [np.nan, 4, 5]], + columns=['a', 'key', 'b']) + expected.set_index(Int64Index([0, 1, 2, 1, 3, 4]), inplace=True) assert_frame_equal(result, expected) + def test_merge_right_index_right(self): + # Note: the expected output here is probably incorrect. + # See https://github.com/pandas-dev/pandas/issues/17257 for more. + # We include this as a regression test for GH-24897. 
+ left = pd.DataFrame({'a': [1, 2, 3], 'key': [0, 1, 1]}) + right = pd.DataFrame({'b': [1, 2, 3]}) + + expected = pd.DataFrame({'a': [1, 2, 3, None], + 'key': [0, 1, 1, 2], + 'b': [1, 2, 2, 3]}, + columns=['a', 'key', 'b'], + index=[0, 1, 2, 2]) + result = left.merge(right, left_on='key', right_index=True, + how='right') + tm.assert_frame_equal(result, expected) + def _check_merge(x, y): for how in ['inner', 'left', 'outer']: @@ -1494,3 +1526,65 @@ def test_merge_series(on, left_on, right_on, left_index, right_index, nm): with pytest.raises(ValueError, match=msg): result = pd.merge(a, b, on=on, left_on=left_on, right_on=right_on, left_index=left_index, right_index=right_index) + + +@pytest.mark.parametrize("col1, col2, kwargs, expected_cols", [ + (0, 0, dict(suffixes=("", "_dup")), ["0", "0_dup"]), + (0, 0, dict(suffixes=(None, "_dup")), [0, "0_dup"]), + (0, 0, dict(suffixes=("_x", "_y")), ["0_x", "0_y"]), + ("a", 0, dict(suffixes=(None, "_y")), ["a", 0]), + (0.0, 0.0, dict(suffixes=("_x", None)), ["0.0_x", 0.0]), + ("b", "b", dict(suffixes=(None, "_y")), ["b", "b_y"]), + ("a", "a", dict(suffixes=("_x", None)), ["a_x", "a"]), + ("a", "b", dict(suffixes=("_x", None)), ["a", "b"]), + ("a", "a", dict(suffixes=[None, "_x"]), ["a", "a_x"]), + (0, 0, dict(suffixes=["_a", None]), ["0_a", 0]), + ("a", "a", dict(), ["a_x", "a_y"]), + (0, 0, dict(), ["0_x", "0_y"]) +]) +def test_merge_suffix(col1, col2, kwargs, expected_cols): + # issue: 24782 + a = pd.DataFrame({col1: [1, 2, 3]}) + b = pd.DataFrame({col2: [4, 5, 6]}) + + expected = pd.DataFrame([[1, 4], [2, 5], [3, 6]], + columns=expected_cols) + + result = a.merge(b, left_index=True, right_index=True, **kwargs) + tm.assert_frame_equal(result, expected) + + result = pd.merge(a, b, left_index=True, right_index=True, **kwargs) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("col1, col2, suffixes", [ + ("a", "a", [None, None]), + ("a", "a", (None, None)), + ("a", "a", ("", None)), + (0, 0, [None, None]), + (0, 0, (None, "")) +]) +def test_merge_suffix_error(col1, col2, suffixes): + # issue: 24782 + a = pd.DataFrame({col1: [1, 2, 3]}) + b = pd.DataFrame({col2: [3, 4, 5]}) + + # TODO: might reconsider current raise behaviour, see issue 24782 + msg = "columns overlap but no suffix specified" + with pytest.raises(ValueError, match=msg): + pd.merge(a, b, left_index=True, right_index=True, suffixes=suffixes) + + +@pytest.mark.parametrize("col1, col2, suffixes", [ + ("a", "a", None), + (0, 0, None) +]) +def test_merge_suffix_none_error(col1, col2, suffixes): + # issue: 24782 + a = pd.DataFrame({col1: [1, 2, 3]}) + b = pd.DataFrame({col2: [3, 4, 5]}) + + # TODO: might reconsider current raise behaviour, see GH24782 + msg = "iterable" + with pytest.raises(TypeError, match=msg): + pd.merge(a, b, left_index=True, right_index=True, suffixes=suffixes) diff --git a/pandas/tests/reshape/test_concat.py b/pandas/tests/reshape/test_concat.py index ec6123bae327e..a186d32ed8800 100644 --- a/pandas/tests/reshape/test_concat.py +++ b/pandas/tests/reshape/test_concat.py @@ -3,7 +3,7 @@ from datetime import datetime from decimal import Decimal from itertools import combinations -from warnings import catch_warnings, simplefilter +from warnings import catch_warnings import dateutil import numpy as np @@ -1499,15 +1499,6 @@ def test_concat_mixed_objs(self): result = concat([s1, df, s2], ignore_index=True) assert_frame_equal(result, expected) - # invalid concatente of mixed dims - with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - 
panel = tm.makePanel() - msg = ("cannot concatenate unaligned mixed dimensional NDFrame" - " objects") - with pytest.raises(ValueError, match=msg): - concat([panel, s1], axis=1) - def test_empty_dtype_coerce(self): # xref to #12411 @@ -1543,34 +1534,6 @@ def test_dtype_coerceion(self): result = concat([df.iloc[[0]], df.iloc[[1]]]) tm.assert_series_equal(result.dtypes, df.dtypes) - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") - def test_panel_concat_other_axes(self): - panel = tm.makePanel() - - p1 = panel.iloc[:, :5, :] - p2 = panel.iloc[:, 5:, :] - - result = concat([p1, p2], axis=1) - tm.assert_panel_equal(result, panel) - - p1 = panel.iloc[:, :, :2] - p2 = panel.iloc[:, :, 2:] - - result = concat([p1, p2], axis=2) - tm.assert_panel_equal(result, panel) - - # if things are a bit misbehaved - p1 = panel.iloc[:2, :, :2] - p2 = panel.iloc[:, :, 2:] - p1['ItemC'] = 'baz' - - result = concat([p1, p2], axis=2) - - expected = panel.copy() - expected['ItemC'] = expected['ItemC'].astype('O') - expected.loc['ItemC', :, :2] = 'baz' - tm.assert_panel_equal(result, expected) - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") # Panel.rename warning we don't care about @pytest.mark.filterwarnings("ignore:Using:FutureWarning") diff --git a/pandas/tests/reshape/test_reshape.py b/pandas/tests/reshape/test_reshape.py index 7b544b7981c1f..a5b6cffd1d86c 100644 --- a/pandas/tests/reshape/test_reshape.py +++ b/pandas/tests/reshape/test_reshape.py @@ -580,23 +580,28 @@ def test_get_dummies_duplicate_columns(self, df): class TestCategoricalReshape(object): - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") - def test_reshaping_panel_categorical(self): + def test_reshaping_multi_index_categorical(self): - p = tm.makePanel() - p['str'] = 'foo' - df = p.to_frame() + # construct a MultiIndexed DataFrame formerly created + # via `tm.makePanel().to_frame()` + cols = ['ItemA', 'ItemB', 'ItemC'] + data = {c: tm.makeTimeDataFrame() for c in cols} + df = pd.concat({c: data[c].stack() for c in data}, axis='columns') + df.index.names = ['major', 'minor'] + df['str'] = 'foo' + + dti = df.index.levels[0] df['category'] = df['str'].astype('category') result = df['category'].unstack() - c = Categorical(['foo'] * len(p.major_axis)) + c = Categorical(['foo'] * len(dti)) expected = DataFrame({'A': c.copy(), 'B': c.copy(), 'C': c.copy(), 'D': c.copy()}, columns=Index(list('ABCD'), name='minor'), - index=p.major_axis.set_names('major')) + index=dti) tm.assert_frame_equal(result, expected) diff --git a/pandas/tests/scalar/timedelta/test_timedelta.py b/pandas/tests/scalar/timedelta/test_timedelta.py index 9b5fdfb06a9fa..7d5b479810205 100644 --- a/pandas/tests/scalar/timedelta/test_timedelta.py +++ b/pandas/tests/scalar/timedelta/test_timedelta.py @@ -1,5 +1,6 @@ """ test the scalar Timedelta """ from datetime import timedelta +import re import numpy as np import pytest @@ -309,9 +310,15 @@ def test_iso_conversion(self): assert to_timedelta('P0DT0H0M1S') == expected def test_nat_converters(self): - assert to_timedelta('nat', box=False).astype('int64') == iNaT - assert to_timedelta('nan', box=False).astype('int64') == iNaT + result = to_timedelta('nat', box=False) + assert result.dtype.kind == 'm' + assert result.astype('int64') == iNaT + result = to_timedelta('nan', box=False) + assert result.dtype.kind == 'm' + assert result.astype('int64') == iNaT + + @pytest.mark.filterwarnings("ignore:M and Y units are deprecated") @pytest.mark.parametrize('units, np_unit', [(['Y', 'y'], 'Y'), (['M'], 'M'), 
@@ -371,6 +378,24 @@ def test_unit_parser(self, units, np_unit, wrapper): result = Timedelta('2{}'.format(unit)) assert result == expected + @pytest.mark.skipif(compat.PY2, reason="requires python3.5 or higher") + @pytest.mark.parametrize('unit', ['Y', 'y', 'M']) + def test_unit_m_y_deprecated(self, unit): + with tm.assert_produces_warning(FutureWarning) as w1: + Timedelta(10, unit) + msg = r'.* units are deprecated .*' + assert re.match(msg, str(w1[0].message)) + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False) as w2: + to_timedelta(10, unit) + msg = r'.* units are deprecated .*' + assert re.match(msg, str(w2[0].message)) + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False) as w3: + to_timedelta([1, 2], unit) + msg = r'.* units are deprecated .*' + assert re.match(msg, str(w3[0].message)) + def test_numeric_conversions(self): assert Timedelta(0) == np.timedelta64(0, 'ns') assert Timedelta(10) == np.timedelta64(10, 'ns') diff --git a/pandas/tests/scalar/timestamp/test_unary_ops.py b/pandas/tests/scalar/timestamp/test_unary_ops.py index 3f9a30d254126..adcf66200a672 100644 --- a/pandas/tests/scalar/timestamp/test_unary_ops.py +++ b/pandas/tests/scalar/timestamp/test_unary_ops.py @@ -8,7 +8,7 @@ from pandas._libs.tslibs import conversion from pandas._libs.tslibs.frequencies import INVALID_FREQ_ERR_MSG -from pandas.compat import PY3 +from pandas.compat import PY3, PY36 import pandas.util._test_decorators as td from pandas import NaT, Timestamp @@ -329,6 +329,19 @@ def test_replace_dst_border(self): expected = Timestamp('2013-11-3 03:00:00', tz='America/Chicago') assert result == expected + @pytest.mark.skipif(not PY36, reason='Fold not available until PY3.6') + @pytest.mark.parametrize('fold', [0, 1]) + @pytest.mark.parametrize('tz', ['dateutil/Europe/London', 'Europe/London']) + def test_replace_dst_fold(self, fold, tz): + # GH 25017 + d = datetime(2019, 10, 27, 2, 30) + ts = Timestamp(d, tz=tz) + result = ts.replace(hour=1, fold=fold) + expected = Timestamp(datetime(2019, 10, 27, 1, 30)).tz_localize( + tz, ambiguous=not fold + ) + assert result == expected + # -------------------------------------------------------------- # Timestamp.normalize diff --git a/pandas/tests/series/test_alter_axes.py b/pandas/tests/series/test_alter_axes.py index 04c54bcf8c22c..73adc7d4bf82f 100644 --- a/pandas/tests/series/test_alter_axes.py +++ b/pandas/tests/series/test_alter_axes.py @@ -258,6 +258,17 @@ def test_rename_axis_inplace(self, datetime_series): assert no_return is None tm.assert_series_equal(result, expected) + @pytest.mark.parametrize('kwargs', [{'mapper': None}, {'index': None}, {}]) + def test_rename_axis_none(self, kwargs): + # GH 25034 + index = Index(list('abc'), name='foo') + df = Series([1, 2, 3], index=index) + + result = df.rename_axis(**kwargs) + expected_index = index.rename(None) if kwargs else index + expected = Series([1, 2, 3], index=expected_index) + tm.assert_series_equal(result, expected) + def test_set_axis_inplace_axes(self, axis_series): # GH14636 ser = Series(np.arange(4), index=[1, 3, 5, 7], dtype='int64') diff --git a/pandas/tests/series/test_apply.py b/pandas/tests/series/test_apply.py index 90cf6916df0d1..162a27db34cb1 100644 --- a/pandas/tests/series/test_apply.py +++ b/pandas/tests/series/test_apply.py @@ -163,6 +163,18 @@ def test_apply_dict_depr(self): with tm.assert_produces_warning(FutureWarning): tsdf.A.agg({'foo': ['sum', 'mean']}) + @pytest.mark.parametrize('series', [ + ['1-1', '1-1', np.NaN], + ['1-1', '1-2', 
np.NaN]]) + def test_apply_categorical_with_nan_values(self, series): + # GH 20714 bug fixed in: GH 24275 + s = pd.Series(series, dtype='category') + result = s.apply(lambda x: x.split('-')[0]) + result = result.astype(object) + expected = pd.Series(['1', '1', np.NaN], dtype='category') + expected = expected.astype(object) + tm.assert_series_equal(result, expected) + class TestSeriesAggregate(): diff --git a/pandas/tests/series/test_dtypes.py b/pandas/tests/series/test_dtypes.py index e29974f56967f..d8046c4944afc 100644 --- a/pandas/tests/series/test_dtypes.py +++ b/pandas/tests/series/test_dtypes.py @@ -291,8 +291,8 @@ def test_astype_categorical_to_other(self): expected = s tm.assert_series_equal(s.astype('category'), expected) tm.assert_series_equal(s.astype(CategoricalDtype()), expected) - msg = (r"could not convert string to float: '(0 - 499|9500 - 9999)'|" - r"invalid literal for float\(\): (0 - 499|9500 - 9999)") + msg = (r"could not convert string to float|" + r"invalid literal for float\(\)") with pytest.raises(ValueError, match=msg): s.astype('float64') diff --git a/pandas/tests/series/test_duplicates.py b/pandas/tests/series/test_duplicates.py index fe47975711a17..a975edacc19c7 100644 --- a/pandas/tests/series/test_duplicates.py +++ b/pandas/tests/series/test_duplicates.py @@ -59,12 +59,18 @@ def test_unique_data_ownership(): Series(Series(["a", "c", "b"]).unique()).sort_values() -def test_is_unique(): - # GH11946 - s = Series(np.random.randint(0, 10, size=1000)) - assert s.is_unique is False - s = Series(np.arange(1000)) - assert s.is_unique is True +@pytest.mark.parametrize('data, expected', [ + (np.random.randint(0, 10, size=1000), False), + (np.arange(1000), True), + ([], True), + ([np.nan], True), + (['foo', 'bar', np.nan], True), + (['foo', 'foo', np.nan], False), + (['foo', 'bar', np.nan, np.nan], False)]) +def test_is_unique(data, expected): + # GH11946 / GH25180 + s = Series(data) + assert s.is_unique is expected def test_is_unique_class_ne(capsys): diff --git a/pandas/tests/series/test_period.py b/pandas/tests/series/test_period.py index 0a86bb0b67797..7e0feb418e8df 100644 --- a/pandas/tests/series/test_period.py +++ b/pandas/tests/series/test_period.py @@ -164,3 +164,12 @@ def test_end_time_timevalues(self, input_vals): result = s.dt.end_time expected = s.apply(lambda x: x.end_time) tm.assert_series_equal(result, expected) + + @pytest.mark.parametrize('input_vals', [ + ('2001'), ('NaT') + ]) + def test_to_period(self, input_vals): + # GH 21205 + expected = Series([input_vals], dtype='Period[D]') + result = Series([input_vals], dtype='datetime64[ns]').dt.to_period('D') + tm.assert_series_equal(result, expected) diff --git a/pandas/tests/series/test_repr.py b/pandas/tests/series/test_repr.py index b4e7708e2456e..842207f2a572f 100644 --- a/pandas/tests/series/test_repr.py +++ b/pandas/tests/series/test_repr.py @@ -198,6 +198,14 @@ def test_latex_repr(self): assert s._repr_latex_() is None + def test_index_repr_in_frame_with_nan(self): + # see gh-25061 + i = Index([1, np.nan]) + s = Series([1, 2], index=i) + exp = """1.0 1\nNaN 2\ndtype: int64""" + + assert repr(s) == exp + class TestCategoricalRepr(object): diff --git a/pandas/tests/test_nanops.py b/pandas/tests/test_nanops.py index 4bcd16a86e865..cf5ef6cf15eca 100644 --- a/pandas/tests/test_nanops.py +++ b/pandas/tests/test_nanops.py @@ -971,6 +971,9 @@ def prng(self): class TestDatetime64NaNOps(object): @pytest.mark.parametrize('tz', [None, 'UTC']) + @pytest.mark.xfail(reason="disabled") + # Enabling mean changes the 
behavior of DataFrame.mean + # See https://github.com/pandas-dev/pandas/issues/24752 def test_nanmean(self, tz): dti = pd.date_range('2016-01-01', periods=3, tz=tz) expected = dti[1] diff --git a/pandas/tests/test_panel.py b/pandas/tests/test_panel.py index ba0ad72e624f7..bfcafda1dc783 100644 --- a/pandas/tests/test_panel.py +++ b/pandas/tests/test_panel.py @@ -1,57 +1,29 @@ # -*- coding: utf-8 -*- # pylint: disable=W0612,E1101 - +from collections import OrderedDict from datetime import datetime -import operator -from warnings import catch_warnings, simplefilter import numpy as np import pytest -from pandas.compat import OrderedDict, StringIO, lrange, range, signature -import pandas.util._test_decorators as td - -from pandas.core.dtypes.common import is_float_dtype +from pandas.compat import lrange -from pandas import ( - DataFrame, Index, MultiIndex, Series, compat, date_range, isna, notna) -from pandas.core.nanops import nanall, nanany +from pandas import DataFrame, MultiIndex, Series, date_range, notna import pandas.core.panel as panelm from pandas.core.panel import Panel import pandas.util.testing as tm from pandas.util.testing import ( assert_almost_equal, assert_frame_equal, assert_panel_equal, - assert_series_equal, ensure_clean, makeCustomDataframe as mkdf, - makeMixedDataFrame) + assert_series_equal, makeCustomDataframe as mkdf, makeMixedDataFrame) from pandas.io.formats.printing import pprint_thing -from pandas.tseries.offsets import BDay, MonthEnd - - -def make_test_panel(): - with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - _panel = tm.makePanel() - tm.add_nans(_panel) - _panel = _panel.copy() - return _panel +from pandas.tseries.offsets import MonthEnd @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") class PanelTests(object): panel = None - def test_pickle(self): - unpickled = tm.round_trip_pickle(self.panel) - assert_frame_equal(unpickled['ItemA'], self.panel['ItemA']) - - def test_rank(self): - pytest.raises(NotImplementedError, lambda: self.panel.rank()) - - def test_cumsum(self): - cumsum = self.panel.cumsum() - assert_frame_equal(cumsum['ItemA'], self.panel['ItemA'].cumsum()) - def not_hashable(self): c_empty = Panel() c = Panel(Panel([[[1]]])) @@ -59,298 +31,9 @@ def not_hashable(self): pytest.raises(TypeError, hash, c) -@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") -class SafeForLongAndSparse(object): - - def test_repr(self): - repr(self.panel) - - def test_copy_names(self): - for attr in ('major_axis', 'minor_axis'): - getattr(self.panel, attr).name = None - cp = self.panel.copy() - getattr(cp, attr).name = 'foo' - assert getattr(self.panel, attr).name is None - - def test_iter(self): - tm.equalContents(list(self.panel), self.panel.items) - - def test_count(self): - f = lambda s: notna(s).sum() - self._check_stat_op('count', f, obj=self.panel, has_skipna=False) - - def test_sum(self): - self._check_stat_op('sum', np.sum, skipna_alternative=np.nansum) - - def test_mean(self): - self._check_stat_op('mean', np.mean) - - def test_prod(self): - self._check_stat_op('prod', np.prod, skipna_alternative=np.nanprod) - - @pytest.mark.filterwarnings("ignore:Invalid value:RuntimeWarning") - @pytest.mark.filterwarnings("ignore:All-NaN:RuntimeWarning") - def test_median(self): - def wrapper(x): - if isna(x).any(): - return np.nan - return np.median(x) - - self._check_stat_op('median', wrapper) - - @pytest.mark.filterwarnings("ignore:Invalid value:RuntimeWarning") - def test_min(self): - self._check_stat_op('min', np.min) - - 
@pytest.mark.filterwarnings("ignore:Invalid value:RuntimeWarning") - def test_max(self): - self._check_stat_op('max', np.max) - - @td.skip_if_no_scipy - def test_skew(self): - from scipy.stats import skew - - def this_skew(x): - if len(x) < 3: - return np.nan - return skew(x, bias=False) - - self._check_stat_op('skew', this_skew) - - def test_var(self): - def alt(x): - if len(x) < 2: - return np.nan - return np.var(x, ddof=1) - - self._check_stat_op('var', alt) - - def test_std(self): - def alt(x): - if len(x) < 2: - return np.nan - return np.std(x, ddof=1) - - self._check_stat_op('std', alt) - - def test_sem(self): - def alt(x): - if len(x) < 2: - return np.nan - return np.std(x, ddof=1) / np.sqrt(len(x)) - - self._check_stat_op('sem', alt) - - def _check_stat_op(self, name, alternative, obj=None, has_skipna=True, - skipna_alternative=None): - if obj is None: - obj = self.panel - - # # set some NAs - # obj.loc[5:10] = np.nan - # obj.loc[15:20, -2:] = np.nan - - f = getattr(obj, name) - - if has_skipna: - - skipna_wrapper = tm._make_skipna_wrapper(alternative, - skipna_alternative) - - def wrapper(x): - return alternative(np.asarray(x)) - - for i in range(obj.ndim): - result = f(axis=i, skipna=False) - assert_frame_equal(result, obj.apply(wrapper, axis=i)) - else: - skipna_wrapper = alternative - wrapper = alternative - - for i in range(obj.ndim): - result = f(axis=i) - if name in ['sum', 'prod']: - assert_frame_equal(result, obj.apply(skipna_wrapper, axis=i)) - - pytest.raises(Exception, f, axis=obj.ndim) - - # Unimplemented numeric_only parameter. - if 'numeric_only' in signature(f).args: - with pytest.raises(NotImplementedError, match=name): - f(numeric_only=True) - - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") class SafeForSparse(object): - def test_get_axis(self): - assert (self.panel._get_axis(0) is self.panel.items) - assert (self.panel._get_axis(1) is self.panel.major_axis) - assert (self.panel._get_axis(2) is self.panel.minor_axis) - - def test_set_axis(self): - new_items = Index(np.arange(len(self.panel.items))) - new_major = Index(np.arange(len(self.panel.major_axis))) - new_minor = Index(np.arange(len(self.panel.minor_axis))) - - # ensure propagate to potentially prior-cached items too - item = self.panel['ItemA'] - self.panel.items = new_items - - if hasattr(self.panel, '_item_cache'): - assert 'ItemA' not in self.panel._item_cache - assert self.panel.items is new_items - - # TODO: unused? - item = self.panel[0] # noqa - - self.panel.major_axis = new_major - assert self.panel[0].index is new_major - assert self.panel.major_axis is new_major - - # TODO: unused? - item = self.panel[0] # noqa - - self.panel.minor_axis = new_minor - assert self.panel[0].columns is new_minor - assert self.panel.minor_axis is new_minor - - def test_get_axis_number(self): - assert self.panel._get_axis_number('items') == 0 - assert self.panel._get_axis_number('major') == 1 - assert self.panel._get_axis_number('minor') == 2 - - with pytest.raises(ValueError, match="No axis named foo"): - self.panel._get_axis_number('foo') - - with pytest.raises(ValueError, match="No axis named foo"): - self.panel.__ge__(self.panel, axis='foo') - - def test_get_axis_name(self): - assert self.panel._get_axis_name(0) == 'items' - assert self.panel._get_axis_name(1) == 'major_axis' - assert self.panel._get_axis_name(2) == 'minor_axis' - - def test_get_plane_axes(self): - # what to do here? 
- - index, columns = self.panel._get_plane_axes('items') - index, columns = self.panel._get_plane_axes('major_axis') - index, columns = self.panel._get_plane_axes('minor_axis') - index, columns = self.panel._get_plane_axes(0) - - def test_truncate(self): - dates = self.panel.major_axis - start, end = dates[1], dates[5] - - trunced = self.panel.truncate(start, end, axis='major') - expected = self.panel['ItemA'].truncate(start, end) - - assert_frame_equal(trunced['ItemA'], expected) - - trunced = self.panel.truncate(before=start, axis='major') - expected = self.panel['ItemA'].truncate(before=start) - - assert_frame_equal(trunced['ItemA'], expected) - - trunced = self.panel.truncate(after=end, axis='major') - expected = self.panel['ItemA'].truncate(after=end) - - assert_frame_equal(trunced['ItemA'], expected) - - def test_arith(self): - self._test_op(self.panel, operator.add) - self._test_op(self.panel, operator.sub) - self._test_op(self.panel, operator.mul) - self._test_op(self.panel, operator.truediv) - self._test_op(self.panel, operator.floordiv) - self._test_op(self.panel, operator.pow) - - self._test_op(self.panel, lambda x, y: y + x) - self._test_op(self.panel, lambda x, y: y - x) - self._test_op(self.panel, lambda x, y: y * x) - self._test_op(self.panel, lambda x, y: y / x) - self._test_op(self.panel, lambda x, y: y ** x) - - self._test_op(self.panel, lambda x, y: x + y) # panel + 1 - self._test_op(self.panel, lambda x, y: x - y) # panel - 1 - self._test_op(self.panel, lambda x, y: x * y) # panel * 1 - self._test_op(self.panel, lambda x, y: x / y) # panel / 1 - self._test_op(self.panel, lambda x, y: x ** y) # panel ** 1 - - pytest.raises(Exception, self.panel.__add__, - self.panel['ItemA']) - - @staticmethod - def _test_op(panel, op): - result = op(panel, 1) - assert_frame_equal(result['ItemA'], op(panel['ItemA'], 1)) - - def test_keys(self): - tm.equalContents(list(self.panel.keys()), self.panel.items) - - def test_iteritems(self): - # Test panel.iteritems(), aka panel.iteritems() - # just test that it works - for k, v in self.panel.iteritems(): - pass - - assert len(list(self.panel.iteritems())) == len(self.panel.items) - - def test_combineFrame(self): - def check_op(op, name): - # items - df = self.panel['ItemA'] - - func = getattr(self.panel, name) - - result = func(df, axis='items') - - assert_frame_equal( - result['ItemB'], op(self.panel['ItemB'], df)) - - # major - xs = self.panel.major_xs(self.panel.major_axis[0]) - result = func(xs, axis='major') - - idx = self.panel.major_axis[1] - - assert_frame_equal(result.major_xs(idx), - op(self.panel.major_xs(idx), xs)) - - # minor - xs = self.panel.minor_xs(self.panel.minor_axis[0]) - result = func(xs, axis='minor') - - idx = self.panel.minor_axis[1] - - assert_frame_equal(result.minor_xs(idx), - op(self.panel.minor_xs(idx), xs)) - - ops = ['add', 'sub', 'mul', 'truediv', 'floordiv', 'pow', 'mod'] - if not compat.PY3: - ops.append('div') - - for op in ops: - try: - check_op(getattr(operator, op), op) - except AttributeError: - pprint_thing("Failing operation: %r" % op) - raise - if compat.PY3: - try: - check_op(operator.truediv, 'div') - except AttributeError: - pprint_thing("Failing operation: %r" % 'div') - raise - - def test_combinePanel(self): - result = self.panel.add(self.panel) - assert_panel_equal(result, self.panel * 2) - - def test_neg(self): - assert_panel_equal(-self.panel, self.panel * -1) - # issue 7692 def test_raise_when_not_implemented(self): p = Panel(np.arange(3 * 4 * 5).reshape(3, 4, 5), @@ -364,84 +47,11 @@ def 
test_raise_when_not_implemented(self): with pytest.raises(NotImplementedError): getattr(p, op)(d, axis=0) - def test_select(self): - p = self.panel - - # select items - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = p.select(lambda x: x in ('ItemA', 'ItemC'), axis='items') - expected = p.reindex(items=['ItemA', 'ItemC']) - assert_panel_equal(result, expected) - - # select major_axis - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = p.select(lambda x: x >= datetime( - 2000, 1, 15), axis='major') - new_major = p.major_axis[p.major_axis >= datetime(2000, 1, 15)] - expected = p.reindex(major=new_major) - assert_panel_equal(result, expected) - - # select minor_axis - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = p.select(lambda x: x in ('D', 'A'), axis=2) - expected = p.reindex(minor=['A', 'D']) - assert_panel_equal(result, expected) - - # corner case, empty thing - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = p.select(lambda x: x in ('foo', ), axis='items') - assert_panel_equal(result, p.reindex(items=[])) - - def test_get_value(self): - for item in self.panel.items: - for mjr in self.panel.major_axis[::2]: - for mnr in self.panel.minor_axis: - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - result = self.panel.get_value(item, mjr, mnr) - expected = self.panel[item][mnr][mjr] - assert_almost_equal(result, expected) - - def test_abs(self): - - result = self.panel.abs() - result2 = abs(self.panel) - expected = np.abs(self.panel) - assert_panel_equal(result, expected) - assert_panel_equal(result2, expected) - - df = self.panel['ItemA'] - result = df.abs() - result2 = abs(df) - expected = np.abs(df) - assert_frame_equal(result, expected) - assert_frame_equal(result2, expected) - - s = df['A'] - result = s.abs() - result2 = abs(s) - expected = np.abs(s) - assert_series_equal(result, expected) - assert_series_equal(result2, expected) - assert result.name == 'A' - assert result2.name == 'A' - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") class CheckIndexing(object): - def test_getitem(self): - pytest.raises(Exception, self.panel.__getitem__, 'ItemQ') - def test_delitem_and_pop(self): - expected = self.panel['ItemA'] - result = self.panel.pop('ItemA') - assert_frame_equal(expected, result) - assert 'ItemA' not in self.panel.items - - del self.panel['ItemB'] - assert 'ItemB' not in self.panel.items - pytest.raises(Exception, self.panel.__delitem__, 'ItemB') values = np.empty((3, 3, 3)) values[0] = 0 @@ -468,38 +78,6 @@ def test_delitem_and_pop(self): tm.assert_frame_equal(panelc[0], panel[0]) def test_setitem(self): - lp = self.panel.filter(['ItemA', 'ItemB']).to_frame() - - with pytest.raises(TypeError): - self.panel['ItemE'] = lp - - # DataFrame - df = self.panel['ItemA'][2:].filter(items=['A', 'B']) - self.panel['ItemF'] = df - self.panel['ItemE'] = df - - df2 = self.panel['ItemF'] - - assert_frame_equal(df, df2.reindex( - index=df.index, columns=df.columns)) - - # scalar - self.panel['ItemG'] = 1 - self.panel['ItemE'] = True - assert self.panel['ItemG'].values.dtype == np.int64 - assert self.panel['ItemE'].values.dtype == np.bool_ - - # object dtype - self.panel['ItemQ'] = 'foo' - assert self.panel['ItemQ'].values.dtype == np.object_ - - # boolean dtype - self.panel['ItemP'] = self.panel['ItemA'] > 0 - assert self.panel['ItemP'].values.dtype == np.bool_ - - pytest.raises(TypeError, self.panel.__setitem__, 'foo', 
- self.panel.loc[['ItemP']]) - # bad shape p = Panel(np.random.randn(4, 3, 2)) msg = (r"shape of value must be \(3, 2\), " @@ -537,159 +115,9 @@ def test_set_minor_major(self): assert_frame_equal(panel.loc[:, 'NewMajor', :], newmajor.astype(object)) - def test_major_xs(self): - ref = self.panel['ItemA'] - - idx = self.panel.major_axis[5] - xs = self.panel.major_xs(idx) - - result = xs['ItemA'] - assert_series_equal(result, ref.xs(idx), check_names=False) - assert result.name == 'ItemA' - - # not contained - idx = self.panel.major_axis[0] - BDay() - pytest.raises(Exception, self.panel.major_xs, idx) - - def test_major_xs_mixed(self): - self.panel['ItemD'] = 'foo' - xs = self.panel.major_xs(self.panel.major_axis[0]) - assert xs['ItemA'].dtype == np.float64 - assert xs['ItemD'].dtype == np.object_ - - def test_minor_xs(self): - ref = self.panel['ItemA'] - - idx = self.panel.minor_axis[1] - xs = self.panel.minor_xs(idx) - - assert_series_equal(xs['ItemA'], ref[idx], check_names=False) - - # not contained - pytest.raises(Exception, self.panel.minor_xs, 'E') - - def test_minor_xs_mixed(self): - self.panel['ItemD'] = 'foo' - - xs = self.panel.minor_xs('D') - assert xs['ItemA'].dtype == np.float64 - assert xs['ItemD'].dtype == np.object_ - - def test_xs(self): - itemA = self.panel.xs('ItemA', axis=0) - expected = self.panel['ItemA'] - tm.assert_frame_equal(itemA, expected) - - # Get a view by default. - itemA_view = self.panel.xs('ItemA', axis=0) - itemA_view.values[:] = np.nan - - assert np.isnan(self.panel['ItemA'].values).all() - - # Mixed-type yields a copy. - self.panel['strings'] = 'foo' - result = self.panel.xs('D', axis=2) - assert result._is_copy is not None - - def test_getitem_fancy_labels(self): - p = self.panel - - items = p.items[[1, 0]] - dates = p.major_axis[::2] - cols = ['D', 'C', 'F'] - - # all 3 specified - with catch_warnings(): - simplefilter("ignore", FutureWarning) - # XXX: warning in _validate_read_indexer - assert_panel_equal(p.loc[items, dates, cols], - p.reindex(items=items, major=dates, minor=cols)) - - # 2 specified - assert_panel_equal(p.loc[:, dates, cols], - p.reindex(major=dates, minor=cols)) - - assert_panel_equal(p.loc[items, :, cols], - p.reindex(items=items, minor=cols)) - - assert_panel_equal(p.loc[items, dates, :], - p.reindex(items=items, major=dates)) - - # only 1 - assert_panel_equal(p.loc[items, :, :], p.reindex(items=items)) - - assert_panel_equal(p.loc[:, dates, :], p.reindex(major=dates)) - - assert_panel_equal(p.loc[:, :, cols], p.reindex(minor=cols)) - def test_getitem_fancy_slice(self): pass - def test_getitem_fancy_ints(self): - p = self.panel - - # #1603 - result = p.iloc[:, -1, :] - expected = p.loc[:, p.major_axis[-1], :] - assert_frame_equal(result, expected) - - def test_getitem_fancy_xs(self): - p = self.panel - item = 'ItemB' - - date = p.major_axis[5] - col = 'C' - - # get DataFrame - # item - assert_frame_equal(p.loc[item], p[item]) - assert_frame_equal(p.loc[item, :], p[item]) - assert_frame_equal(p.loc[item, :, :], p[item]) - - # major axis, axis=1 - assert_frame_equal(p.loc[:, date], p.major_xs(date)) - assert_frame_equal(p.loc[:, date, :], p.major_xs(date)) - - # minor axis, axis=2 - assert_frame_equal(p.loc[:, :, 'C'], p.minor_xs('C')) - - # get Series - assert_series_equal(p.loc[item, date], p[item].loc[date]) - assert_series_equal(p.loc[item, date, :], p[item].loc[date]) - assert_series_equal(p.loc[item, :, col], p[item][col]) - assert_series_equal(p.loc[:, date, col], p.major_xs(date).loc[col]) - - def 
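The major_xs/minor_xs/xs tests above map onto DataFrame.xs over MultiIndex levels once the panel is flattened. A sketch with made-up labels:

    import numpy as np
    import pandas as pd

    idx = pd.MultiIndex.from_product(
        [pd.date_range("2000-01-03", periods=3), list("AB")],
        names=["major", "minor"])
    frame = pd.DataFrame({"ItemA": np.arange(6.0)}, index=idx)

    minor_slice = frame.xs("A", level="minor")               # like panel.minor_xs('A')
    major_slice = frame.xs(idx.levels[0][0], level="major")  # like panel.major_xs(date)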
test_getitem_fancy_xs_check_view(self): - item = 'ItemB' - date = self.panel.major_axis[5] - - # make sure it's always a view - NS = slice(None, None) - - # DataFrames - comp = assert_frame_equal - self._check_view(item, comp) - self._check_view((item, NS), comp) - self._check_view((item, NS, NS), comp) - self._check_view((NS, date), comp) - self._check_view((NS, date, NS), comp) - self._check_view((NS, NS, 'C'), comp) - - # Series - comp = assert_series_equal - self._check_view((item, date), comp) - self._check_view((item, date, NS), comp) - self._check_view((item, NS, 'C'), comp) - self._check_view((NS, date, 'C'), comp) - - def test_getitem_callable(self): - p = self.panel - # GH 12533 - - assert_frame_equal(p[lambda x: 'ItemB'], p.loc['ItemB']) - assert_panel_equal(p[lambda x: ['ItemB', 'ItemC']], - p.loc[['ItemB', 'ItemC']]) - def test_ix_setitem_slice_dataframe(self): a = Panel(items=[1, 2, 3], major_axis=[11, 22, 33], minor_axis=[111, 222, 333]) @@ -719,43 +147,6 @@ def test_ix_align(self): assert_series_equal(df.loc[0, 0, :].reindex(b.index), b) def test_ix_frame_align(self): - p_orig = tm.makePanel() - df = p_orig.iloc[0].copy() - assert_frame_equal(p_orig['ItemA'], df) - - p = p_orig.copy() - p.iloc[0, :, :] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.iloc[0] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.iloc[0, :, :] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.iloc[0] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.loc['ItemA'] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.loc['ItemA', :, :] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p['ItemA'] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.iloc[0, [0, 1, 3, 5], -2:] = df - out = p.iloc[0, [0, 1, 3, 5], -2:] - assert_frame_equal(out, df.iloc[[0, 1, 3, 5], [2, 3]]) - # GH3830, panel assignent by values/frame for dtype in ['float64', 'int64']: @@ -782,13 +173,6 @@ def test_ix_frame_align(self): tm.assert_frame_equal(panel.loc['a1'], df1) tm.assert_frame_equal(panel.loc['a2'], df2) - def _check_view(self, indexer, comp): - cp = self.panel.copy() - obj = cp.loc[indexer] - obj.values[:] = 0 - assert (obj.values == 0).all() - comp(cp.loc[indexer].reindex_like(obj), obj) - def test_logical_with_nas(self): d = Panel({'ItemA': {'a': [np.nan, False]}, 'ItemB': {'a': [True, True]}}) @@ -802,157 +186,11 @@ def test_logical_with_nas(self): expected = DataFrame({'a': [True, True]}) assert_frame_equal(result, expected) - def test_neg(self): - assert_panel_equal(-self.panel, -1 * self.panel) - - def test_invert(self): - assert_panel_equal(-(self.panel < 0), ~(self.panel < 0)) - - def test_comparisons(self): - p1 = tm.makePanel() - p2 = tm.makePanel() - - tp = p1.reindex(items=p1.items + ['foo']) - df = p1[p1.items[0]] - - def test_comp(func): - - # versus same index - result = func(p1, p2) - tm.assert_numpy_array_equal(result.values, - func(p1.values, p2.values)) - - # versus non-indexed same objs - pytest.raises(Exception, func, p1, tp) - - # versus different objs - pytest.raises(Exception, func, p1, df) - - # versus scalar - result3 = func(self.panel, 0) - tm.assert_numpy_array_equal(result3.values, - func(self.panel.values, 0)) - - with np.errstate(invalid='ignore'): - test_comp(operator.eq) - test_comp(operator.ne) - test_comp(operator.lt) - test_comp(operator.gt) - test_comp(operator.ge) - test_comp(operator.le) - - def test_get_value(self): - for item in self.panel.items: - for mjr in self.panel.major_axis[::2]: - for 
mnr in self.panel.minor_axis: - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - result = self.panel.get_value(item, mjr, mnr) - expected = self.panel[item][mnr][mjr] - assert_almost_equal(result, expected) - with catch_warnings(): - simplefilter("ignore", FutureWarning) - msg = "There must be an argument for each axis" - with pytest.raises(TypeError, match=msg): - self.panel.get_value('a') - - def test_set_value(self): - for item in self.panel.items: - for mjr in self.panel.major_axis[::2]: - for mnr in self.panel.minor_axis: - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - self.panel.set_value(item, mjr, mnr, 1.) - tm.assert_almost_equal(self.panel[item][mnr][mjr], 1.) - - # resize - with catch_warnings(): - simplefilter("ignore", FutureWarning) - res = self.panel.set_value('ItemE', 'foo', 'bar', 1.5) - assert isinstance(res, Panel) - assert res is not self.panel - assert res.get_value('ItemE', 'foo', 'bar') == 1.5 - - res3 = self.panel.set_value('ItemE', 'foobar', 'baz', 5) - assert is_float_dtype(res3['ItemE'].values) - - msg = ("There must be an argument for each " - "axis plus the value provided") - with pytest.raises(TypeError, match=msg): - self.panel.set_value('a') - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") -class TestPanel(PanelTests, CheckIndexing, SafeForLongAndSparse, - SafeForSparse): - - def setup_method(self, method): - self.panel = make_test_panel() - self.panel.major_axis.name = None - self.panel.minor_axis.name = None - self.panel.items.name = None - - def test_constructor(self): - # with BlockManager - wp = Panel(self.panel._data) - assert wp._data is self.panel._data - - wp = Panel(self.panel._data, copy=True) - assert wp._data is not self.panel._data - tm.assert_panel_equal(wp, self.panel) - - # strings handled prop - wp = Panel([[['foo', 'foo', 'foo', ], ['foo', 'foo', 'foo']]]) - assert wp.values.dtype == np.object_ - - vals = self.panel.values - - # no copy - wp = Panel(vals) - assert wp.values is vals - - # copy - wp = Panel(vals, copy=True) - assert wp.values is not vals - - # GH #8285, test when scalar data is used to construct a Panel - # if dtype is not passed, it should be inferred - value_and_dtype = [(1, 'int64'), (3.14, 'float64'), - ('foo', np.object_)] - for (val, dtype) in value_and_dtype: - wp = Panel(val, items=range(2), major_axis=range(3), - minor_axis=range(4)) - vals = np.empty((2, 3, 4), dtype=dtype) - vals.fill(val) - - tm.assert_panel_equal(wp, Panel(vals, dtype=dtype)) - - # test the case when dtype is passed - wp = Panel(1, items=range(2), major_axis=range(3), - minor_axis=range(4), - dtype='float32') - vals = np.empty((2, 3, 4), dtype='float32') - vals.fill(1) - - tm.assert_panel_equal(wp, Panel(vals, dtype='float32')) +class TestPanel(PanelTests, CheckIndexing, SafeForSparse): def test_constructor_cast(self): - zero_filled = self.panel.fillna(0) - - casted = Panel(zero_filled._data, dtype=int) - casted2 = Panel(zero_filled.values, dtype=int) - - exp_values = zero_filled.values.astype(int) - assert_almost_equal(casted.values, exp_values) - assert_almost_equal(casted2.values, exp_values) - - casted = Panel(zero_filled._data, dtype=np.int32) - casted2 = Panel(zero_filled.values, dtype=np.int32) - - exp_values = zero_filled.values.astype(np.int32) - assert_almost_equal(casted.values, exp_values) - assert_almost_equal(casted2.values, exp_values) - # can't cast data = [[['foo', 'bar', 'baz']]] pytest.raises(ValueError, Panel, data, dtype=float) @@ -1017,86 +255,6 @@ def 
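The get_value/set_value tests removed above exercised the same deprecated scalar accessors that .at/.iat replace on DataFrame. A small sketch:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.arange(6.0).reshape(2, 3), columns=list("ABC"))
    assert df.at[0, "B"] == 1.0   # scalar lookup, like get_value(...)
    df.at[0, "B"] = 9.0           # scalar assignment, like set_value(...)
    assert df.iat[0, 1] == 9.0    # positional twin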
test_constructor_fails_with_not_3d_input(self): with pytest.raises(ValueError, match=msg): Panel(np.random.randn(10, 2)) - def test_consolidate(self): - assert self.panel._data.is_consolidated() - - self.panel['foo'] = 1. - assert not self.panel._data.is_consolidated() - - panel = self.panel._consolidate() - assert panel._data.is_consolidated() - - def test_ctor_dict(self): - itema = self.panel['ItemA'] - itemb = self.panel['ItemB'] - - d = {'A': itema, 'B': itemb[5:]} - d2 = {'A': itema._series, 'B': itemb[5:]._series} - d3 = {'A': None, - 'B': DataFrame(itemb[5:]._series), - 'C': DataFrame(itema._series)} - - wp = Panel.from_dict(d) - wp2 = Panel.from_dict(d2) # nested Dict - - # TODO: unused? - wp3 = Panel.from_dict(d3) # noqa - - tm.assert_index_equal(wp.major_axis, self.panel.major_axis) - assert_panel_equal(wp, wp2) - - # intersect - wp = Panel.from_dict(d, intersect=True) - tm.assert_index_equal(wp.major_axis, itemb.index[5:]) - - # use constructor - assert_panel_equal(Panel(d), Panel.from_dict(d)) - assert_panel_equal(Panel(d2), Panel.from_dict(d2)) - assert_panel_equal(Panel(d3), Panel.from_dict(d3)) - - # a pathological case - d4 = {'A': None, 'B': None} - - # TODO: unused? - wp4 = Panel.from_dict(d4) # noqa - - assert_panel_equal(Panel(d4), Panel(items=['A', 'B'])) - - # cast - dcasted = {k: v.reindex(wp.major_axis).fillna(0) - for k, v in compat.iteritems(d)} - result = Panel(dcasted, dtype=int) - expected = Panel({k: v.astype(int) - for k, v in compat.iteritems(dcasted)}) - assert_panel_equal(result, expected) - - result = Panel(dcasted, dtype=np.int32) - expected = Panel({k: v.astype(np.int32) - for k, v in compat.iteritems(dcasted)}) - assert_panel_equal(result, expected) - - def test_constructor_dict_mixed(self): - data = {k: v.values for k, v in self.panel.iteritems()} - result = Panel(data) - exp_major = Index(np.arange(len(self.panel.major_axis))) - tm.assert_index_equal(result.major_axis, exp_major) - - result = Panel(data, items=self.panel.items, - major_axis=self.panel.major_axis, - minor_axis=self.panel.minor_axis) - assert_panel_equal(result, self.panel) - - data['ItemC'] = self.panel['ItemC'] - result = Panel(data) - assert_panel_equal(result, self.panel) - - # corner, blow up - data['ItemB'] = data['ItemB'][:-1] - pytest.raises(Exception, Panel, data) - - data['ItemB'] = self.panel['ItemB'].values[:, :-1] - pytest.raises(Exception, Panel, data) - def test_ctor_orderedDict(self): keys = list(set(np.random.randint(0, 5000, 100)))[ :50] # unique random int keys @@ -1107,30 +265,6 @@ def test_ctor_orderedDict(self): p = Panel.from_dict(d) assert list(p.items) == keys - def test_constructor_resize(self): - data = self.panel._data - items = self.panel.items[:-1] - major = self.panel.major_axis[:-1] - minor = self.panel.minor_axis[:-1] - - result = Panel(data, items=items, - major_axis=major, minor_axis=minor) - expected = self.panel.reindex( - items=items, major=major, minor=minor) - assert_panel_equal(result, expected) - - result = Panel(data, items=items, major_axis=major) - expected = self.panel.reindex(items=items, major=major) - assert_panel_equal(result, expected) - - result = Panel(data, items=items) - expected = self.panel.reindex(items=items) - assert_panel_equal(result, expected) - - result = Panel(data, minor_axis=minor) - expected = self.panel.reindex(minor=minor) - assert_panel_equal(result, expected) - def test_from_dict_mixed_orient(self): df = tm.makeDataFrame() df['foo'] = 'bar' @@ -1161,13 +295,6 @@ def test_constructor_error_msgs(self): 
Panel(np.random.randn(3, 4, 5), lrange(5), lrange(5), lrange(4)) - def test_conform(self): - df = self.panel['ItemA'][:-5].filter(items=['A', 'B']) - conformed = self.panel.conform(df) - - tm.assert_index_equal(conformed.index, self.panel.major_axis) - tm.assert_index_equal(conformed.columns, self.panel.minor_axis) - def test_convert_objects(self): # GH 4937 p = Panel(dict(A=dict(a=['1', '1.0']))) @@ -1175,12 +302,6 @@ def test_convert_objects(self): result = p._convert(numeric=True, coerce=True) assert_panel_equal(result, expected) - def test_dtypes(self): - - result = self.panel.dtypes - expected = Series(np.dtype('float64'), index=self.panel.items) - assert_series_equal(result, expected) - def test_astype(self): # GH7271 data = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]) @@ -1193,121 +314,7 @@ def test_astype(self): pytest.raises(NotImplementedError, panel.astype, {0: str}) - def test_apply(self): - # GH1148 - - # ufunc - applied = self.panel.apply(np.sqrt) - with np.errstate(invalid='ignore'): - expected = np.sqrt(self.panel.values) - assert_almost_equal(applied.values, expected) - - # ufunc same shape - result = self.panel.apply(lambda x: x * 2, axis='items') - expected = self.panel * 2 - assert_panel_equal(result, expected) - result = self.panel.apply(lambda x: x * 2, axis='major_axis') - expected = self.panel * 2 - assert_panel_equal(result, expected) - result = self.panel.apply(lambda x: x * 2, axis='minor_axis') - expected = self.panel * 2 - assert_panel_equal(result, expected) - - # reduction to DataFrame - result = self.panel.apply(lambda x: x.dtype, axis='items') - expected = DataFrame(np.dtype('float64'), - index=self.panel.major_axis, - columns=self.panel.minor_axis) - assert_frame_equal(result, expected) - result = self.panel.apply(lambda x: x.dtype, axis='major_axis') - expected = DataFrame(np.dtype('float64'), - index=self.panel.minor_axis, - columns=self.panel.items) - assert_frame_equal(result, expected) - result = self.panel.apply(lambda x: x.dtype, axis='minor_axis') - expected = DataFrame(np.dtype('float64'), - index=self.panel.major_axis, - columns=self.panel.items) - assert_frame_equal(result, expected) - - # reductions via other dims - expected = self.panel.sum(0) - result = self.panel.apply(lambda x: x.sum(), axis='items') - assert_frame_equal(result, expected) - expected = self.panel.sum(1) - result = self.panel.apply(lambda x: x.sum(), axis='major_axis') - assert_frame_equal(result, expected) - expected = self.panel.sum(2) - result = self.panel.apply(lambda x: x.sum(), axis='minor_axis') - assert_frame_equal(result, expected) - - # pass kwargs - result = self.panel.apply( - lambda x, y: x.sum() + y, axis='items', y=5) - expected = self.panel.sum(0) + 5 - assert_frame_equal(result, expected) - def test_apply_slabs(self): - - # same shape as original - result = self.panel.apply(lambda x: x * 2, - axis=['items', 'major_axis']) - expected = (self.panel * 2).transpose('minor_axis', 'major_axis', - 'items') - assert_panel_equal(result, expected) - result = self.panel.apply(lambda x: x * 2, - axis=['major_axis', 'items']) - assert_panel_equal(result, expected) - - result = self.panel.apply(lambda x: x * 2, - axis=['items', 'minor_axis']) - expected = (self.panel * 2).transpose('major_axis', 'minor_axis', - 'items') - assert_panel_equal(result, expected) - result = self.panel.apply(lambda x: x * 2, - axis=['minor_axis', 'items']) - assert_panel_equal(result, expected) - - result = self.panel.apply(lambda x: x * 2, - axis=['major_axis', 'minor_axis']) - expected = 
self.panel * 2 - assert_panel_equal(result, expected) - result = self.panel.apply(lambda x: x * 2, - axis=['minor_axis', 'major_axis']) - assert_panel_equal(result, expected) - - # reductions - result = self.panel.apply(lambda x: x.sum(0), axis=[ - 'items', 'major_axis' - ]) - expected = self.panel.sum(1).T - assert_frame_equal(result, expected) - - result = self.panel.apply(lambda x: x.sum(1), axis=[ - 'items', 'major_axis' - ]) - expected = self.panel.sum(0) - assert_frame_equal(result, expected) - - # transforms - f = lambda x: ((x.T - x.mean(1)) / x.std(1)).T - - # make sure that we don't trigger any warnings - result = self.panel.apply(f, axis=['items', 'major_axis']) - expected = Panel({ax: f(self.panel.loc[:, :, ax]) - for ax in self.panel.minor_axis}) - assert_panel_equal(result, expected) - - result = self.panel.apply(f, axis=['major_axis', 'minor_axis']) - expected = Panel({ax: f(self.panel.loc[ax]) - for ax in self.panel.items}) - assert_panel_equal(result, expected) - - result = self.panel.apply(f, axis=['minor_axis', 'items']) - expected = Panel({ax: f(self.panel.loc[:, ax]) - for ax in self.panel.major_axis}) - assert_panel_equal(result, expected) - # with multi-indexes # GH7469 index = MultiIndex.from_tuples([('one', 'a'), ('one', 'b'), ( @@ -1343,53 +350,6 @@ def test_apply_no_or_zero_ndim(self): assert_series_equal(result_float, expected_float) assert_series_equal(result_float64, expected_float64) - def test_reindex(self): - ref = self.panel['ItemB'] - - # items - result = self.panel.reindex(items=['ItemA', 'ItemB']) - assert_frame_equal(result['ItemB'], ref) - - # major - new_major = list(self.panel.major_axis[:10]) - result = self.panel.reindex(major=new_major) - assert_frame_equal(result['ItemB'], ref.reindex(index=new_major)) - - # raise exception put both major and major_axis - pytest.raises(Exception, self.panel.reindex, - major_axis=new_major, - major=new_major) - - # minor - new_minor = list(self.panel.minor_axis[:2]) - result = self.panel.reindex(minor=new_minor) - assert_frame_equal(result['ItemB'], ref.reindex(columns=new_minor)) - - # raise exception put both major and major_axis - pytest.raises(Exception, self.panel.reindex, - minor_axis=new_minor, - minor=new_minor) - - # this ok - result = self.panel.reindex() - assert_panel_equal(result, self.panel) - assert result is not self.panel - - # with filling - smaller_major = self.panel.major_axis[::5] - smaller = self.panel.reindex(major=smaller_major) - - larger = smaller.reindex(major=self.panel.major_axis, method='pad') - - assert_frame_equal(larger.major_xs(self.panel.major_axis[1]), - smaller.major_xs(smaller_major[0])) - - # don't necessarily copy - result = self.panel.reindex( - major=self.panel.major_axis, copy=False) - assert_panel_equal(result, self.panel) - assert result is self.panel - def test_reindex_axis_style(self): panel = Panel(np.random.rand(5, 5, 5)) expected0 = Panel(panel.values).iloc[[0, 1]] @@ -1410,22 +370,6 @@ def test_reindex_axis_style(self): def test_reindex_multi(self): - # with and without copy full reindexing - result = self.panel.reindex( - items=self.panel.items, - major=self.panel.major_axis, - minor=self.panel.minor_axis, copy=False) - - assert result.items is self.panel.items - assert result.major_axis is self.panel.major_axis - assert result.minor_axis is self.panel.minor_axis - - result = self.panel.reindex( - items=self.panel.items, - major=self.panel.major_axis, - minor=self.panel.minor_axis, copy=False) - assert_panel_equal(result, self.panel) - # multi-axis indexing 
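The apply reductions being deleted are easiest to restate on the raw ndarray: axis='items' reduced over axis 0, and so on. A sketch, assuming the (items, major, minor) layout:

    import numpy as np

    vals = np.arange(24.0).reshape(2, 3, 4)   # (items, major, minor)
    assert vals.sum(axis=0).shape == (3, 4)   # apply(lambda x: x.sum(), axis='items')
    assert vals.sum(axis=1).shape == (2, 4)   # axis='major_axis'
    assert vals.sum(axis=2).shape == (2, 3)   # axis='minor_axis'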
consistency # GH 5900 df = DataFrame(np.random.randn(4, 3)) @@ -1454,86 +398,7 @@ def test_reindex_multi(self): for i, r in enumerate(results): assert_panel_equal(expected, r) - def test_reindex_like(self): - # reindex_like - smaller = self.panel.reindex(items=self.panel.items[:-1], - major=self.panel.major_axis[:-1], - minor=self.panel.minor_axis[:-1]) - smaller_like = self.panel.reindex_like(smaller) - assert_panel_equal(smaller, smaller_like) - - def test_take(self): - # axis == 0 - result = self.panel.take([2, 0, 1], axis=0) - expected = self.panel.reindex(items=['ItemC', 'ItemA', 'ItemB']) - assert_panel_equal(result, expected) - - # axis >= 1 - result = self.panel.take([3, 0, 1, 2], axis=2) - expected = self.panel.reindex(minor=['D', 'A', 'B', 'C']) - assert_panel_equal(result, expected) - - # neg indices ok - expected = self.panel.reindex(minor=['D', 'D', 'B', 'C']) - result = self.panel.take([3, -1, 1, 2], axis=2) - assert_panel_equal(result, expected) - - pytest.raises(Exception, self.panel.take, [4, 0, 1, 2], axis=2) - - def test_sort_index(self): - import random - - ritems = list(self.panel.items) - rmajor = list(self.panel.major_axis) - rminor = list(self.panel.minor_axis) - random.shuffle(ritems) - random.shuffle(rmajor) - random.shuffle(rminor) - - random_order = self.panel.reindex(items=ritems) - sorted_panel = random_order.sort_index(axis=0) - assert_panel_equal(sorted_panel, self.panel) - - # descending - random_order = self.panel.reindex(items=ritems) - sorted_panel = random_order.sort_index(axis=0, ascending=False) - assert_panel_equal( - sorted_panel, - self.panel.reindex(items=self.panel.items[::-1])) - - random_order = self.panel.reindex(major=rmajor) - sorted_panel = random_order.sort_index(axis=1) - assert_panel_equal(sorted_panel, self.panel) - - random_order = self.panel.reindex(minor=rminor) - sorted_panel = random_order.sort_index(axis=2) - assert_panel_equal(sorted_panel, self.panel) - def test_fillna(self): - filled = self.panel.fillna(0) - assert np.isfinite(filled.values).all() - - filled = self.panel.fillna(method='backfill') - assert_frame_equal(filled['ItemA'], - self.panel['ItemA'].fillna(method='backfill')) - - panel = self.panel.copy() - panel['str'] = 'foo' - - filled = panel.fillna(method='backfill') - assert_frame_equal(filled['ItemA'], - panel['ItemA'].fillna(method='backfill')) - - empty = self.panel.reindex(items=[]) - filled = empty.fillna(0) - assert_panel_equal(filled, empty) - - pytest.raises(ValueError, self.panel.fillna) - pytest.raises(ValueError, self.panel.fillna, 5, method='ffill') - - pytest.raises(TypeError, self.panel.fillna, [1, 2]) - pytest.raises(TypeError, self.panel.fillna, (1, 2)) - # limit not implemented when only value is specified p = Panel(np.random.randn(3, 4, 5)) p.iloc[0:2, 0:2, 0:2] = np.nan @@ -1559,155 +424,6 @@ def test_fillna(self): p2.fillna(method='bfill', inplace=True) assert_panel_equal(p2, expected) - def test_ffill_bfill(self): - assert_panel_equal(self.panel.ffill(), - self.panel.fillna(method='ffill')) - assert_panel_equal(self.panel.bfill(), - self.panel.fillna(method='bfill')) - - def test_truncate_fillna_bug(self): - # #1823 - result = self.panel.truncate(before=None, after=None, axis='items') - - # it works! 
- result.fillna(value=0.0) - - def test_swapaxes(self): - result = self.panel.swapaxes('items', 'minor') - assert result.items is self.panel.minor_axis - - result = self.panel.swapaxes('items', 'major') - assert result.items is self.panel.major_axis - - result = self.panel.swapaxes('major', 'minor') - assert result.major_axis is self.panel.minor_axis - - panel = self.panel.copy() - result = panel.swapaxes('major', 'minor') - panel.values[0, 0, 1] = np.nan - expected = panel.swapaxes('major', 'minor') - assert_panel_equal(result, expected) - - # this should also work - result = self.panel.swapaxes(0, 1) - assert result.items is self.panel.major_axis - - # this works, but return a copy - result = self.panel.swapaxes('items', 'items') - assert_panel_equal(self.panel, result) - assert id(self.panel) != id(result) - - def test_transpose(self): - result = self.panel.transpose('minor', 'major', 'items') - expected = self.panel.swapaxes('items', 'minor') - assert_panel_equal(result, expected) - - # test kwargs - result = self.panel.transpose(items='minor', major='major', - minor='items') - expected = self.panel.swapaxes('items', 'minor') - assert_panel_equal(result, expected) - - # text mixture of args - result = self.panel.transpose( - 'minor', major='major', minor='items') - expected = self.panel.swapaxes('items', 'minor') - assert_panel_equal(result, expected) - - result = self.panel.transpose('minor', - 'major', - minor='items') - expected = self.panel.swapaxes('items', 'minor') - assert_panel_equal(result, expected) - - # duplicate axes - with pytest.raises(TypeError, - match='not enough/duplicate arguments'): - self.panel.transpose('minor', maj='major', minor='items') - - with pytest.raises(ValueError, - match='repeated axis in transpose'): - self.panel.transpose('minor', 'major', major='minor', - minor='items') - - result = self.panel.transpose(2, 1, 0) - assert_panel_equal(result, expected) - - result = self.panel.transpose('minor', 'items', 'major') - expected = self.panel.swapaxes('items', 'minor') - expected = expected.swapaxes('major', 'minor') - assert_panel_equal(result, expected) - - result = self.panel.transpose(2, 0, 1) - assert_panel_equal(result, expected) - - pytest.raises(ValueError, self.panel.transpose, 0, 0, 1) - - def test_transpose_copy(self): - panel = self.panel.copy() - result = panel.transpose(2, 0, 1, copy=True) - expected = panel.swapaxes('items', 'minor') - expected = expected.swapaxes('major', 'minor') - assert_panel_equal(result, expected) - - panel.values[0, 1, 1] = np.nan - assert notna(result.values[1, 0, 1]) - - def test_to_frame(self): - # filtered - filtered = self.panel.to_frame() - expected = self.panel.to_frame().dropna(how='any') - assert_frame_equal(filtered, expected) - - # unfiltered - unfiltered = self.panel.to_frame(filter_observations=False) - assert_panel_equal(unfiltered.to_panel(), self.panel) - - # names - assert unfiltered.index.names == ('major', 'minor') - - # unsorted, round trip - df = self.panel.to_frame(filter_observations=False) - unsorted = df.take(np.random.permutation(len(df))) - pan = unsorted.to_panel() - assert_panel_equal(pan, self.panel) - - # preserve original index names - df = DataFrame(np.random.randn(6, 2), - index=[['a', 'a', 'b', 'b', 'c', 'c'], - [0, 1, 0, 1, 0, 1]], - columns=['one', 'two']) - df.index.names = ['foo', 'bar'] - df.columns.name = 'baz' - - rdf = df.to_panel().to_frame() - assert rdf.index.names == df.index.names - assert rdf.columns.names == df.columns.names - - def test_to_frame_mixed(self): - panel = 
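The transpose/swapaxes tests in this block were checking plain ndarray axis permutations. A sketch:

    import numpy as np

    vals = np.random.randn(2, 3, 4)       # (items, major, minor)
    flipped = vals.transpose(2, 1, 0)     # panel.transpose('minor', 'major', 'items')
    assert flipped.shape == (4, 3, 2)
    # swapaxes('items', 'minor') is the same permutation in this 3-D case,
    # and both return views rather than copies:
    assert np.shares_memory(flipped, vals.swapaxes(0, 2))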
self.panel.fillna(0) - panel['str'] = 'foo' - panel['bool'] = panel['ItemA'] > 0 - - lp = panel.to_frame() - wp = lp.to_panel() - assert wp['bool'].values.dtype == np.bool_ - # Previously, this was mutating the underlying - # index and changing its name - assert_frame_equal(wp['bool'], panel['bool'], check_names=False) - - # GH 8704 - # with categorical - df = panel.to_frame() - df['category'] = df['str'].astype('category') - - # to_panel - # TODO: this converts back to object - p = df.to_panel() - expected = panel.copy() - expected['category'] = 'foo' - assert_panel_equal(p, expected) - def test_to_frame_multi_major(self): idx = MultiIndex.from_tuples( [(1, 'one'), (1, 'two'), (2, 'one'), (2, 'two')]) @@ -1808,22 +524,6 @@ def test_to_frame_multi_drop_level(self): expected = DataFrame({'i1': [1., 2], 'i2': [1., 2]}, index=exp_idx) assert_frame_equal(result, expected) - def test_to_panel_na_handling(self): - df = DataFrame(np.random.randint(0, 10, size=20).reshape((10, 2)), - index=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1], - [0, 1, 2, 3, 4, 5, 2, 3, 4, 5]]) - - panel = df.to_panel() - assert isna(panel[0].loc[1, [0, 1]]).all() - - def test_to_panel_duplicates(self): - # #2441 - df = DataFrame({'a': [0, 0, 1], 'b': [1, 1, 1], 'c': [1, 2, 3]}) - idf = df.set_index(['a', 'b']) - - with pytest.raises(ValueError, match='non-uniquely indexed'): - idf.to_panel() - def test_panel_dups(self): # GH 4960 @@ -1886,40 +586,7 @@ def test_panel_dups(self): def test_filter(self): pass - def test_compound(self): - compounded = self.panel.compound() - - assert_series_equal(compounded['ItemA'], - (1 + self.panel['ItemA']).product(0) - 1, - check_names=False) - def test_shift(self): - # major - idx = self.panel.major_axis[0] - idx_lag = self.panel.major_axis[1] - shifted = self.panel.shift(1) - assert_frame_equal(self.panel.major_xs(idx), - shifted.major_xs(idx_lag)) - - # minor - idx = self.panel.minor_axis[0] - idx_lag = self.panel.minor_axis[1] - shifted = self.panel.shift(1, axis='minor') - assert_frame_equal(self.panel.minor_xs(idx), - shifted.minor_xs(idx_lag)) - - # items - idx = self.panel.items[0] - idx_lag = self.panel.items[1] - shifted = self.panel.shift(1, axis='items') - assert_frame_equal(self.panel[idx], shifted[idx_lag]) - - # negative numbers, #2164 - result = self.panel.shift(-1) - expected = Panel({i: f.shift(-1)[:-1] - for i, f in self.panel.iteritems()}) - assert_panel_equal(result, expected) - # mixed dtypes #6959 data = [('item ' + ch, makeMixedDataFrame()) for ch in list('abcde')] @@ -1928,44 +595,6 @@ def test_shift(self): shifted = mixed_panel.shift(1) assert_series_equal(mixed_panel.dtypes, shifted.dtypes) - def test_tshift(self): - # PeriodIndex - ps = tm.makePeriodPanel() - shifted = ps.tshift(1) - unshifted = shifted.tshift(-1) - - assert_panel_equal(unshifted, ps) - - shifted2 = ps.tshift(freq='B') - assert_panel_equal(shifted, shifted2) - - shifted3 = ps.tshift(freq=BDay()) - assert_panel_equal(shifted, shifted3) - - with pytest.raises(ValueError, match='does not match'): - ps.tshift(freq='M') - - # DatetimeIndex - panel = make_test_panel() - shifted = panel.tshift(1) - unshifted = shifted.tshift(-1) - - assert_panel_equal(panel, unshifted) - - shifted2 = panel.tshift(freq=panel.major_axis.freq) - assert_panel_equal(shifted, shifted2) - - inferred_ts = Panel(panel.values, items=panel.items, - major_axis=Index(np.asarray(panel.major_axis)), - minor_axis=panel.minor_axis) - shifted = inferred_ts.tshift(1) - unshifted = shifted.tshift(-1) - assert_panel_equal(shifted, panel.tshift(1)) - 
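The tshift behaviour exercised above is also available from shift with a freq argument: the index moves and the values stay put. A sketch:

    import pandas as pd

    idx = pd.date_range("2000-01-03", periods=4, freq="B")
    df = pd.DataFrame({"a": range(4)}, index=idx)

    shifted = df.shift(1, freq="B")                # like tshift(1)
    assert (shifted.values == df.values).all()     # values untouched, index shifted
    assert shifted.shift(-1, freq="B").equals(df)  # round-trips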
assert_panel_equal(unshifted, inferred_ts) - - no_freq = panel.iloc[:, [0, 5, 7], :] - pytest.raises(ValueError, no_freq.tshift) - def test_pct_change(self): df1 = DataFrame({'c1': [1, 2, 5], 'c2': [3, 4, 6]}) df2 = df1 + 1 @@ -2078,97 +707,10 @@ def test_multiindex_get(self): MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)], names=['first', 'second']) - @pytest.mark.filterwarnings("ignore:Using a non-tuple:FutureWarning") - def test_multiindex_blocks(self): - ind = MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)], - names=['first', 'second']) - wp = Panel(self.panel._data) - wp.items = ind - f1 = wp['a'] - assert (f1.items == [1, 2]).all() - - f1 = wp[('b', 1)] - assert (f1.columns == ['A', 'B', 'C', 'D']).all() - def test_repr_empty(self): empty = Panel() repr(empty) - # ignore warning from us, because removing panel - @pytest.mark.filterwarnings("ignore:Using:FutureWarning") - def test_rename(self): - mapper = {'ItemA': 'foo', 'ItemB': 'bar', 'ItemC': 'baz'} - - renamed = self.panel.rename(items=mapper) - exp = Index(['foo', 'bar', 'baz']) - tm.assert_index_equal(renamed.items, exp) - - renamed = self.panel.rename(minor_axis=str.lower) - exp = Index(['a', 'b', 'c', 'd']) - tm.assert_index_equal(renamed.minor_axis, exp) - - # don't copy - renamed_nocopy = self.panel.rename(items=mapper, copy=False) - renamed_nocopy['foo'] = 3. - assert (self.panel['ItemA'].values == 3).all() - - def test_get_attr(self): - assert_frame_equal(self.panel['ItemA'], self.panel.ItemA) - - # specific cases from #3440 - self.panel['a'] = self.panel['ItemA'] - assert_frame_equal(self.panel['a'], self.panel.a) - self.panel['i'] = self.panel['ItemA'] - assert_frame_equal(self.panel['i'], self.panel.i) - - def test_from_frame_level1_unsorted(self): - tuples = [('MSFT', 3), ('MSFT', 2), ('AAPL', 2), ('AAPL', 1), - ('MSFT', 1)] - midx = MultiIndex.from_tuples(tuples) - df = DataFrame(np.random.rand(5, 4), index=midx) - p = df.to_panel() - assert_frame_equal(p.minor_xs(2), df.xs(2, level=1).sort_index()) - - def test_to_excel(self): - try: - import xlwt # noqa - import xlrd # noqa - import openpyxl # noqa - from pandas.io.excel import ExcelFile - except ImportError: - pytest.skip("need xlwt xlrd openpyxl") - - for ext in ['xls', 'xlsx']: - with ensure_clean('__tmp__.' + ext) as path: - self.panel.to_excel(path) - try: - reader = ExcelFile(path) - except ImportError: - pytest.skip("need xlwt xlrd openpyxl") - - for item, df in self.panel.iteritems(): - recdf = reader.parse(str(item), index_col=0) - assert_frame_equal(df, recdf) - - def test_to_excel_xlsxwriter(self): - try: - import xlrd # noqa - import xlsxwriter # noqa - from pandas.io.excel import ExcelFile - except ImportError: - pytest.skip("Requires xlrd and xlsxwriter. 
Skipping test.") - - with ensure_clean('__tmp__.xlsx') as path: - self.panel.to_excel(path, engine='xlsxwriter') - try: - reader = ExcelFile(path) - except ImportError as e: - pytest.skip("cannot write excel file: %s" % e) - - for item, df in self.panel.iteritems(): - recdf = reader.parse(str(item), index_col=0) - assert_frame_equal(df, recdf) - @pytest.mark.filterwarnings("ignore:'.reindex:FutureWarning") def test_dropna(self): p = Panel(np.random.randn(4, 5, 6), major_axis=list('abcde')) @@ -2369,242 +911,6 @@ def test_update_deprecation(self, raise_conflict): with tm.assert_produces_warning(FutureWarning): pan.update(other, raise_conflict=raise_conflict) - def test_all_any(self): - assert (self.panel.all(axis=0).values == nanall( - self.panel, axis=0)).all() - assert (self.panel.all(axis=1).values == nanall( - self.panel, axis=1).T).all() - assert (self.panel.all(axis=2).values == nanall( - self.panel, axis=2).T).all() - assert (self.panel.any(axis=0).values == nanany( - self.panel, axis=0)).all() - assert (self.panel.any(axis=1).values == nanany( - self.panel, axis=1).T).all() - assert (self.panel.any(axis=2).values == nanany( - self.panel, axis=2).T).all() - - def test_all_any_unhandled(self): - pytest.raises(NotImplementedError, self.panel.all, bool_only=True) - pytest.raises(NotImplementedError, self.panel.any, bool_only=True) - - # GH issue 15960 - def test_sort_values(self): - pytest.raises(NotImplementedError, self.panel.sort_values) - pytest.raises(NotImplementedError, self.panel.sort_values, 'ItemA') - - -@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") -class TestPanelFrame(object): - """ - Check that conversions to and from Panel to DataFrame work. - """ - - def setup_method(self, method): - panel = make_test_panel() - self.panel = panel.to_frame() - self.unfiltered_panel = panel.to_frame(filter_observations=False) - - def test_ops_differently_indexed(self): - # trying to set non-identically indexed panel - wp = self.panel.to_panel() - wp2 = wp.reindex(major=wp.major_axis[:-1]) - lp2 = wp2.to_frame() - - result = self.panel + lp2 - assert_frame_equal(result.reindex(lp2.index), lp2 * 2) - - # careful, mutation - self.panel['foo'] = lp2['ItemA'] - assert_series_equal(self.panel['foo'].reindex(lp2.index), - lp2['ItemA'], - check_names=False) - - def test_ops_scalar(self): - result = self.panel.mul(2) - expected = DataFrame.__mul__(self.panel, 2) - assert_frame_equal(result, expected) - - def test_combineFrame(self): - wp = self.panel.to_panel() - result = self.panel.add(wp['ItemA'].stack(), axis=0) - assert_frame_equal(result.to_panel()['ItemA'], wp['ItemA'] * 2) - - def test_combinePanel(self): - wp = self.panel.to_panel() - result = self.panel.add(self.panel) - wide_result = result.to_panel() - assert_frame_equal(wp['ItemA'] * 2, wide_result['ItemA']) - - # one item - result = self.panel.add(self.panel.filter(['ItemA'])) - - def test_combine_scalar(self): - result = self.panel.mul(2) - expected = DataFrame(self.panel._data) * 2 - assert_frame_equal(result, expected) - - def test_combine_series(self): - s = self.panel['ItemA'][:10] - result = self.panel.add(s, axis=0) - expected = DataFrame.add(self.panel, s, axis=0) - assert_frame_equal(result, expected) - - s = self.panel.iloc[5] - result = self.panel + s - expected = DataFrame.add(self.panel, s, axis=1) - assert_frame_equal(result, expected) - - def test_operators(self): - wp = self.panel.to_panel() - result = (self.panel + 1).to_panel() - assert_frame_equal(wp['ItemA'] + 1, result['ItemA']) - - def 
test_arith_flex_panel(self): - ops = ['add', 'sub', 'mul', 'div', - 'truediv', 'pow', 'floordiv', 'mod'] - if not compat.PY3: - aliases = {} - else: - aliases = {'div': 'truediv'} - self.panel = self.panel.to_panel() - - for n in [np.random.randint(-50, -1), np.random.randint(1, 50), 0]: - for op in ops: - alias = aliases.get(op, op) - f = getattr(operator, alias) - exp = f(self.panel, n) - result = getattr(self.panel, op)(n) - assert_panel_equal(result, exp, check_panel_type=True) - - # rops - r_f = lambda x, y: f(y, x) - exp = r_f(self.panel, n) - result = getattr(self.panel, 'r' + op)(n) - assert_panel_equal(result, exp) - - def test_sort(self): - def is_sorted(arr): - return (arr[1:] > arr[:-1]).any() - - sorted_minor = self.panel.sort_index(level=1) - assert is_sorted(sorted_minor.index.codes[1]) - - sorted_major = sorted_minor.sort_index(level=0) - assert is_sorted(sorted_major.index.codes[0]) - - def test_to_string(self): - buf = StringIO() - self.panel.to_string(buf) - - def test_to_sparse(self): - if isinstance(self.panel, Panel): - msg = 'sparsifying is not supported' - with pytest.raises(NotImplementedError, match=msg): - self.panel.to_sparse - - def test_truncate(self): - dates = self.panel.index.levels[0] - start, end = dates[1], dates[5] - - trunced = self.panel.truncate(start, end).to_panel() - expected = self.panel.to_panel()['ItemA'].truncate(start, end) - - # TODO truncate drops index.names - assert_frame_equal(trunced['ItemA'], expected, check_names=False) - - trunced = self.panel.truncate(before=start).to_panel() - expected = self.panel.to_panel()['ItemA'].truncate(before=start) - - # TODO truncate drops index.names - assert_frame_equal(trunced['ItemA'], expected, check_names=False) - - trunced = self.panel.truncate(after=end).to_panel() - expected = self.panel.to_panel()['ItemA'].truncate(after=end) - - # TODO truncate drops index.names - assert_frame_equal(trunced['ItemA'], expected, check_names=False) - - # truncate on dates that aren't in there - wp = self.panel.to_panel() - new_index = wp.major_axis[::5] - - wp2 = wp.reindex(major=new_index) - - lp2 = wp2.to_frame() - lp_trunc = lp2.truncate(wp.major_axis[2], wp.major_axis[-2]) - - wp_trunc = wp2.truncate(wp.major_axis[2], wp.major_axis[-2]) - - assert_panel_equal(wp_trunc, lp_trunc.to_panel()) - - # throw proper exception - pytest.raises(Exception, lp2.truncate, wp.major_axis[-2], - wp.major_axis[2]) - - def test_axis_dummies(self): - from pandas.core.reshape.reshape import make_axis_dummies - - minor_dummies = make_axis_dummies(self.panel, 'minor').astype(np.uint8) - assert len(minor_dummies.columns) == len(self.panel.index.levels[1]) - - major_dummies = make_axis_dummies(self.panel, 'major').astype(np.uint8) - assert len(major_dummies.columns) == len(self.panel.index.levels[0]) - - mapping = {'A': 'one', 'B': 'one', 'C': 'two', 'D': 'two'} - - transformed = make_axis_dummies(self.panel, 'minor', - transform=mapping.get).astype(np.uint8) - assert len(transformed.columns) == 2 - tm.assert_index_equal(transformed.columns, Index(['one', 'two'])) - - # TODO: test correctness - - def test_get_dummies(self): - from pandas.core.reshape.reshape import get_dummies, make_axis_dummies - - self.panel['Label'] = self.panel.index.codes[1] - minor_dummies = make_axis_dummies(self.panel, 'minor').astype(np.uint8) - dummies = get_dummies(self.panel['Label']) - tm.assert_numpy_array_equal(dummies.values, minor_dummies.values) - - def test_mean(self): - means = self.panel.mean(level='minor') - - # test versus Panel version - 
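The level-keyword reductions in this block (mean(level='minor'), sum(level='minor')) are spelt with an explicit groupby on the index level today. A sketch:

    import numpy as np
    import pandas as pd

    idx = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]],
                                     names=["major", "minor"])
    frame = pd.DataFrame({"ItemA": np.arange(4.0)}, index=idx)

    # frame.mean(level='minor') is equivalent to:
    means = frame.groupby(level="minor").mean()
    assert list(means.index) == ["x", "y"]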
wide_means = self.panel.to_panel().mean('major') - assert_frame_equal(means, wide_means) - - def test_sum(self): - sums = self.panel.sum(level='minor') - - # test versus Panel version - wide_sums = self.panel.to_panel().sum('major') - assert_frame_equal(sums, wide_sums) - - def test_count(self): - index = self.panel.index - - major_count = self.panel.count(level=0)['ItemA'] - level_codes = index.codes[0] - for i, idx in enumerate(index.levels[0]): - assert major_count[i] == (level_codes == i).sum() - - minor_count = self.panel.count(level=1)['ItemA'] - level_codes = index.codes[1] - for i, idx in enumerate(index.levels[1]): - assert minor_count[i] == (level_codes == i).sum() - - def test_join(self): - lp1 = self.panel.filter(['ItemA', 'ItemB']) - lp2 = self.panel.filter(['ItemC']) - - joined = lp1.join(lp2) - - assert len(joined.columns) == 3 - - pytest.raises(Exception, lp1.join, - self.panel.filter(['ItemB', 'ItemC'])) - def test_panel_index(): index = panelm.panel_index([1, 2, 3, 4], [1, 2, 3]) diff --git a/pandas/tests/tools/test_numeric.py b/pandas/tests/tools/test_numeric.py index 537881f3a5e85..97e1dc2f6aefc 100644 --- a/pandas/tests/tools/test_numeric.py +++ b/pandas/tests/tools/test_numeric.py @@ -4,437 +4,580 @@ from numpy import iinfo import pytest +import pandas.compat as compat + import pandas as pd -from pandas import to_numeric +from pandas import DataFrame, Index, Series, to_numeric from pandas.util import testing as tm -class TestToNumeric(object): +@pytest.fixture(params=[None, "ignore", "raise", "coerce"]) +def errors(request): + return request.param - def test_empty(self): - # see gh-16302 - s = pd.Series([], dtype=object) - res = to_numeric(s) - expected = pd.Series([], dtype=np.int64) +@pytest.fixture(params=[True, False]) +def signed(request): + return request.param - tm.assert_series_equal(res, expected) - # Original issue example - res = to_numeric(s, errors='coerce', downcast='integer') - expected = pd.Series([], dtype=np.int8) +@pytest.fixture(params=[lambda x: x, str], ids=["identity", "str"]) +def transform(request): + return request.param - tm.assert_series_equal(res, expected) - def test_series(self): - s = pd.Series(['1', '-3.14', '7']) - res = to_numeric(s) - expected = pd.Series([1, -3.14, 7]) - tm.assert_series_equal(res, expected) +@pytest.fixture(params=[ + 47393996303418497800, + 100000000000000000000 +]) +def large_val(request): + return request.param - s = pd.Series(['1', '-3.14', 7]) - res = to_numeric(s) - tm.assert_series_equal(res, expected) - def test_series_numeric(self): - s = pd.Series([1, 3, 4, 5], index=list('ABCD'), name='XXX') - res = to_numeric(s) - tm.assert_series_equal(res, s) +@pytest.fixture(params=[True, False]) +def multiple_elts(request): + return request.param - s = pd.Series([1., 3., 4., 5.], index=list('ABCD'), name='XXX') - res = to_numeric(s) - tm.assert_series_equal(res, s) - # bool is regarded as numeric - s = pd.Series([True, False, True, True], - index=list('ABCD'), name='XXX') - res = to_numeric(s) - tm.assert_series_equal(res, s) +@pytest.fixture(params=[ + (lambda x: Index(x, name="idx"), tm.assert_index_equal), + (lambda x: Series(x, name="ser"), tm.assert_series_equal), + (lambda x: np.array(Index(x).values), tm.assert_numpy_array_equal) +]) +def transform_assert_equal(request): + return request.param - def test_error(self): - s = pd.Series([1, -3.14, 'apple']) - msg = 'Unable to parse string "apple" at position 2' - with pytest.raises(ValueError, match=msg): - to_numeric(s, errors='raise') - res = to_numeric(s, 
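The rewritten test_numeric.py above swaps per-case methods for parametrized fixtures: any test that names one of these fixtures runs once per parameter. A minimal sketch of the mechanism:

    import pytest

    @pytest.fixture(params=[None, "ignore", "raise", "coerce"])
    def errors(request):
        return request.param   # pytest injects each value in turn

    def test_runs_once_per_mode(errors):
        assert errors in (None, "ignore", "raise", "coerce")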
errors='ignore') - expected = pd.Series([1, -3.14, 'apple']) - tm.assert_series_equal(res, expected) +@pytest.mark.parametrize("input_kwargs,result_kwargs", [ + (dict(), dict(dtype=np.int64)), + (dict(errors="coerce", downcast="integer"), dict(dtype=np.int8)) +]) +def test_empty(input_kwargs, result_kwargs): + # see gh-16302 + ser = Series([], dtype=object) + result = to_numeric(ser, **input_kwargs) - res = to_numeric(s, errors='coerce') - expected = pd.Series([1, -3.14, np.nan]) - tm.assert_series_equal(res, expected) + expected = Series([], **result_kwargs) + tm.assert_series_equal(result, expected) + + +@pytest.mark.parametrize("last_val", ["7", 7]) +def test_series(last_val): + ser = Series(["1", "-3.14", last_val]) + result = to_numeric(ser) + + expected = Series([1, -3.14, 7]) + tm.assert_series_equal(result, expected) + + +@pytest.mark.parametrize("data", [ + [1, 3, 4, 5], + [1., 3., 4., 5.], + + # Bool is regarded as numeric. + [True, False, True, True] +]) +def test_series_numeric(data): + ser = Series(data, index=list("ABCD"), name="EFG") + + result = to_numeric(ser) + tm.assert_series_equal(result, ser) + + +@pytest.mark.parametrize("data,msg", [ + ([1, -3.14, "apple"], + 'Unable to parse string "apple" at position 2'), + (["orange", 1, -3.14, "apple"], + 'Unable to parse string "orange" at position 0') +]) +def test_error(data, msg): + ser = Series(data) + + with pytest.raises(ValueError, match=msg): + to_numeric(ser, errors="raise") + + +@pytest.mark.parametrize("errors,exp_data", [ + ("ignore", [1, -3.14, "apple"]), + ("coerce", [1, -3.14, np.nan]) +]) +def test_ignore_error(errors, exp_data): + ser = Series([1, -3.14, "apple"]) + result = to_numeric(ser, errors=errors) + + expected = Series(exp_data) + tm.assert_series_equal(result, expected) + + +@pytest.mark.parametrize("errors,exp", [ + ("raise", 'Unable to parse string "apple" at position 2'), + ("ignore", [True, False, "apple"]), + + # Coerces to float. + ("coerce", [1., 0., np.nan]) +]) +def test_bool_handling(errors, exp): + ser = Series([True, False, "apple"]) + + if isinstance(exp, str): + with pytest.raises(ValueError, match=exp): + to_numeric(ser, errors=errors) + else: + result = to_numeric(ser, errors=errors) + expected = Series(exp) + + tm.assert_series_equal(result, expected) + + +def test_list(): + ser = ["1", "-3.14", "7"] + res = to_numeric(ser) + + expected = np.array([1, -3.14, 7]) + tm.assert_numpy_array_equal(res, expected) + + +@pytest.mark.parametrize("data,arr_kwargs", [ + ([1, 3, 4, 5], dict(dtype=np.int64)), + ([1., 3., 4., 5.], dict()), + + # Boolean is regarded as numeric. + ([True, False, True, True], dict()) +]) +def test_list_numeric(data, arr_kwargs): + result = to_numeric(data) + expected = np.array(data, **arr_kwargs) + tm.assert_numpy_array_equal(result, expected) - s = pd.Series(['orange', 1, -3.14, 'apple']) - msg = 'Unable to parse string "orange" at position 0' - with pytest.raises(ValueError, match=msg): - to_numeric(s, errors='raise') - def test_error_seen_bool(self): - s = pd.Series([True, False, 'apple']) - msg = 'Unable to parse string "apple" at position 2' +@pytest.mark.parametrize("kwargs", [ + dict(dtype="O"), dict() +]) +def test_numeric(kwargs): + data = [1, -3.14, 7] + + ser = Series(data, **kwargs) + result = to_numeric(ser) + + expected = Series(data) + tm.assert_series_equal(result, expected) + + +@pytest.mark.parametrize("columns", [ + # One column. + "a", + + # Multiple columns. 
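For reference, the three errors modes these tests pin down:

    import pandas as pd

    ser = pd.Series([1, -3.14, "apple"])

    pd.to_numeric(ser, errors="coerce")   # unparseable -> NaN: [1.0, -3.14, NaN]
    pd.to_numeric(ser, errors="ignore")   # input handed back unchanged
    # errors="raise" (the default) fails with
    # ValueError: Unable to parse string "apple" at position 2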
+ ["a", "b"] +]) +def test_numeric_df_columns(columns): + # see gh-14827 + df = DataFrame(dict( + a=[1.2, decimal.Decimal(3.14), decimal.Decimal("infinity"), "0.1"], + b=[1.0, 2.0, 3.0, 4.0], + )) + + expected = DataFrame(dict( + a=[1.2, 3.14, np.inf, 0.1], + b=[1.0, 2.0, 3.0, 4.0], + )) + + df_copy = df.copy() + df_copy[columns] = df_copy[columns].apply(to_numeric) + + tm.assert_frame_equal(df_copy, expected) + + +@pytest.mark.parametrize("data,exp_data", [ + ([[decimal.Decimal(3.14), 1.0], decimal.Decimal(1.6), 0.1], + [[3.14, 1.0], 1.6, 0.1]), + ([np.array([decimal.Decimal(3.14), 1.0]), 0.1], + [[3.14, 1.0], 0.1]) +]) +def test_numeric_embedded_arr_likes(data, exp_data): + # Test to_numeric with embedded lists and arrays + df = DataFrame(dict(a=data)) + df["a"] = df["a"].apply(to_numeric) + + expected = DataFrame(dict(a=exp_data)) + tm.assert_frame_equal(df, expected) + + +def test_all_nan(): + ser = Series(["a", "b", "c"]) + result = to_numeric(ser, errors="coerce") + + expected = Series([np.nan, np.nan, np.nan]) + tm.assert_series_equal(result, expected) + + +def test_type_check(errors): + # see gh-11776 + df = DataFrame({"a": [1, -3.14, 7], "b": ["4", "5", "6"]}) + kwargs = dict(errors=errors) if errors is not None else dict() + error_ctx = pytest.raises(TypeError, match="1-d array") + + with error_ctx: + to_numeric(df, **kwargs) + + +@pytest.mark.parametrize("val", [1, 1.1, 20001]) +def test_scalar(val, signed, transform): + val = -val if signed else val + assert to_numeric(transform(val)) == float(val) + + +def test_really_large_scalar(large_val, signed, transform, errors): + # see gh-24910 + kwargs = dict(errors=errors) if errors is not None else dict() + val = -large_val if signed else large_val + + val = transform(val) + val_is_string = isinstance(val, str) + + if val_is_string and errors in (None, "raise"): + msg = "Integer out of range. at position 0" with pytest.raises(ValueError, match=msg): - to_numeric(s, errors='raise') - - res = to_numeric(s, errors='ignore') - expected = pd.Series([True, False, 'apple']) - tm.assert_series_equal(res, expected) - - # coerces to float - res = to_numeric(s, errors='coerce') - expected = pd.Series([1., 0., np.nan]) - tm.assert_series_equal(res, expected) - - def test_list(self): - s = ['1', '-3.14', '7'] - res = to_numeric(s) - expected = np.array([1, -3.14, 7]) - tm.assert_numpy_array_equal(res, expected) - - def test_list_numeric(self): - s = [1, 3, 4, 5] - res = to_numeric(s) - tm.assert_numpy_array_equal(res, np.array(s, dtype=np.int64)) - - s = [1., 3., 4., 5.] 
- res = to_numeric(s) - tm.assert_numpy_array_equal(res, np.array(s)) - - # bool is regarded as numeric - s = [True, False, True, True] - res = to_numeric(s) - tm.assert_numpy_array_equal(res, np.array(s)) - - def test_numeric(self): - s = pd.Series([1, -3.14, 7], dtype='O') - res = to_numeric(s) - expected = pd.Series([1, -3.14, 7]) - tm.assert_series_equal(res, expected) - - s = pd.Series([1, -3.14, 7]) - res = to_numeric(s) - tm.assert_series_equal(res, expected) - - # GH 14827 - df = pd.DataFrame(dict( - a=[1.2, decimal.Decimal(3.14), decimal.Decimal("infinity"), '0.1'], - b=[1.0, 2.0, 3.0, 4.0], - )) - expected = pd.DataFrame(dict( - a=[1.2, 3.14, np.inf, 0.1], - b=[1.0, 2.0, 3.0, 4.0], - )) - - # Test to_numeric over one column - df_copy = df.copy() - df_copy['a'] = df_copy['a'].apply(to_numeric) - tm.assert_frame_equal(df_copy, expected) - - # Test to_numeric over multiple columns - df_copy = df.copy() - df_copy[['a', 'b']] = df_copy[['a', 'b']].apply(to_numeric) - tm.assert_frame_equal(df_copy, expected) - - def test_numeric_lists_and_arrays(self): - # Test to_numeric with embedded lists and arrays - df = pd.DataFrame(dict( - a=[[decimal.Decimal(3.14), 1.0], decimal.Decimal(1.6), 0.1] - )) - df['a'] = df['a'].apply(to_numeric) - expected = pd.DataFrame(dict( - a=[[3.14, 1.0], 1.6, 0.1], - )) - tm.assert_frame_equal(df, expected) - - df = pd.DataFrame(dict( - a=[np.array([decimal.Decimal(3.14), 1.0]), 0.1] - )) - df['a'] = df['a'].apply(to_numeric) - expected = pd.DataFrame(dict( - a=[[3.14, 1.0], 0.1], - )) - tm.assert_frame_equal(df, expected) - - def test_all_nan(self): - s = pd.Series(['a', 'b', 'c']) - res = to_numeric(s, errors='coerce') - expected = pd.Series([np.nan, np.nan, np.nan]) - tm.assert_series_equal(res, expected) - - @pytest.mark.parametrize("errors", [None, "ignore", "raise", "coerce"]) - def test_type_check(self, errors): - # see gh-11776 - df = pd.DataFrame({"a": [1, -3.14, 7], "b": ["4", "5", "6"]}) - kwargs = dict(errors=errors) if errors is not None else dict() - error_ctx = pytest.raises(TypeError, match="1-d array") - - with error_ctx: - to_numeric(df, **kwargs) - - def test_scalar(self): - assert pd.to_numeric(1) == 1 - assert pd.to_numeric(1.1) == 1.1 - - assert pd.to_numeric('1') == 1 - assert pd.to_numeric('1.1') == 1.1 - - with pytest.raises(ValueError): - to_numeric('XX', errors='raise') - - assert to_numeric('XX', errors='ignore') == 'XX' - assert np.isnan(to_numeric('XX', errors='coerce')) - - def test_numeric_dtypes(self): - idx = pd.Index([1, 2, 3], name='xxx') - res = pd.to_numeric(idx) - tm.assert_index_equal(res, idx) - - res = pd.to_numeric(pd.Series(idx, name='xxx')) - tm.assert_series_equal(res, pd.Series(idx, name='xxx')) - - res = pd.to_numeric(idx.values) - tm.assert_numpy_array_equal(res, idx.values) - - idx = pd.Index([1., np.nan, 3., np.nan], name='xxx') - res = pd.to_numeric(idx) - tm.assert_index_equal(res, idx) - - res = pd.to_numeric(pd.Series(idx, name='xxx')) - tm.assert_series_equal(res, pd.Series(idx, name='xxx')) - - res = pd.to_numeric(idx.values) - tm.assert_numpy_array_equal(res, idx.values) - - def test_str(self): - idx = pd.Index(['1', '2', '3'], name='xxx') - exp = np.array([1, 2, 3], dtype='int64') - res = pd.to_numeric(idx) - tm.assert_index_equal(res, pd.Index(exp, name='xxx')) - - res = pd.to_numeric(pd.Series(idx, name='xxx')) - tm.assert_series_equal(res, pd.Series(exp, name='xxx')) - - res = pd.to_numeric(idx.values) - tm.assert_numpy_array_equal(res, exp) - - idx = pd.Index(['1.5', '2.7', '3.4'], name='xxx') - 
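As the removed test_numeric_dtypes/test_str checks showed, to_numeric also preserves the container it is given. A sketch:

    import numpy as np
    import pandas as pd

    idx = pd.Index(["1", "2", "3"], name="xxx")
    assert isinstance(pd.to_numeric(idx), pd.Index)             # Index in, Index out
    assert isinstance(pd.to_numeric(pd.Series(idx)), pd.Series)
    assert isinstance(pd.to_numeric(idx.values), np.ndarray)    # ndarray in, ndarray out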
exp = np.array([1.5, 2.7, 3.4]) - res = pd.to_numeric(idx) - tm.assert_index_equal(res, pd.Index(exp, name='xxx')) - - res = pd.to_numeric(pd.Series(idx, name='xxx')) - tm.assert_series_equal(res, pd.Series(exp, name='xxx')) - - res = pd.to_numeric(idx.values) - tm.assert_numpy_array_equal(res, exp) - - def test_datetime_like(self, tz_naive_fixture): - idx = pd.date_range("20130101", periods=3, - tz=tz_naive_fixture, name="xxx") - res = pd.to_numeric(idx) - tm.assert_index_equal(res, pd.Index(idx.asi8, name="xxx")) - - res = pd.to_numeric(pd.Series(idx, name="xxx")) - tm.assert_series_equal(res, pd.Series(idx.asi8, name="xxx")) - - res = pd.to_numeric(idx.values) - tm.assert_numpy_array_equal(res, idx.asi8) - - def test_timedelta(self): - idx = pd.timedelta_range('1 days', periods=3, freq='D', name='xxx') - res = pd.to_numeric(idx) - tm.assert_index_equal(res, pd.Index(idx.asi8, name='xxx')) - - res = pd.to_numeric(pd.Series(idx, name='xxx')) - tm.assert_series_equal(res, pd.Series(idx.asi8, name='xxx')) - - res = pd.to_numeric(idx.values) - tm.assert_numpy_array_equal(res, idx.asi8) - - def test_period(self): - idx = pd.period_range('2011-01', periods=3, freq='M', name='xxx') - res = pd.to_numeric(idx) - tm.assert_index_equal(res, pd.Index(idx.asi8, name='xxx')) - - # TODO: enable when we can support native PeriodDtype - # res = pd.to_numeric(pd.Series(idx, name='xxx')) - # tm.assert_series_equal(res, pd.Series(idx.asi8, name='xxx')) + to_numeric(val, **kwargs) + else: + expected = float(val) if (errors == "coerce" and + val_is_string) else val + assert tm.assert_almost_equal(to_numeric(val, **kwargs), expected) + - def test_non_hashable(self): - # Test for Bug #13324 - s = pd.Series([[10.0, 2], 1.0, 'apple']) - res = pd.to_numeric(s, errors='coerce') - tm.assert_series_equal(res, pd.Series([np.nan, 1.0, np.nan])) +def test_really_large_in_arr(large_val, signed, transform, + multiple_elts, errors): + # see gh-24910 + kwargs = dict(errors=errors) if errors is not None else dict() + val = -large_val if signed else large_val + val = transform(val) - res = pd.to_numeric(s, errors='ignore') - tm.assert_series_equal(res, pd.Series([[10.0, 2], 1.0, 'apple'])) + extra_elt = "string" + arr = [val] + multiple_elts * [extra_elt] - with pytest.raises(TypeError, match="Invalid object type"): - pd.to_numeric(s) + val_is_string = isinstance(val, str) + coercing = errors == "coerce" - @pytest.mark.parametrize("data", [ - ["1", 2, 3], - [1, 2, 3], - np.array(["1970-01-02", "1970-01-03", - "1970-01-04"], dtype="datetime64[D]") - ]) - def test_downcast_basic(self, data): - # see gh-13352 - invalid_downcast = "unsigned-integer" - msg = "invalid downcasting method provided" + if errors in (None, "raise") and (val_is_string or multiple_elts): + if val_is_string: + msg = "Integer out of range. at position 0" + else: + msg = 'Unable to parse string "string" at position 1' with pytest.raises(ValueError, match=msg): - pd.to_numeric(data, downcast=invalid_downcast) - - expected = np.array([1, 2, 3], dtype=np.int64) - - # Basic function tests. - res = pd.to_numeric(data) - tm.assert_numpy_array_equal(res, expected) - - res = pd.to_numeric(data, downcast=None) - tm.assert_numpy_array_equal(res, expected) - - # Basic dtype support. - smallest_uint_dtype = np.dtype(np.typecodes["UnsignedInteger"][0]) - - # Support below np.float32 is rare and far between. 
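The downcast cases above reduce to: choose the smallest dtype of the requested family that can hold the data, with float bottoming out at float32. A sketch:

    import pandas as pd

    data = ["1", 2, 3]
    assert pd.to_numeric(data, downcast="unsigned").dtype == "uint8"
    assert pd.to_numeric(data, downcast="integer").dtype == "int8"
    assert pd.to_numeric(data, downcast="float").dtype == "float32"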
- float_32_char = np.dtype(np.float32).char - smallest_float_dtype = float_32_char - - expected = np.array([1, 2, 3], dtype=smallest_uint_dtype) - res = pd.to_numeric(data, downcast="unsigned") - tm.assert_numpy_array_equal(res, expected) - - expected = np.array([1, 2, 3], dtype=smallest_float_dtype) - res = pd.to_numeric(data, downcast="float") - tm.assert_numpy_array_equal(res, expected) - - @pytest.mark.parametrize("signed_downcast", ["integer", "signed"]) - @pytest.mark.parametrize("data", [ - ["1", 2, 3], - [1, 2, 3], - np.array(["1970-01-02", "1970-01-03", - "1970-01-04"], dtype="datetime64[D]") - ]) - def test_signed_downcast(self, data, signed_downcast): - # see gh-13352 - smallest_int_dtype = np.dtype(np.typecodes["Integer"][0]) - expected = np.array([1, 2, 3], dtype=smallest_int_dtype) - - res = pd.to_numeric(data, downcast=signed_downcast) - tm.assert_numpy_array_equal(res, expected) - - def test_ignore_downcast_invalid_data(self): - # If we can't successfully cast the given - # data to a numeric dtype, do not bother - # with the downcast parameter. - data = ["foo", 2, 3] - expected = np.array(data, dtype=object) - - res = pd.to_numeric(data, errors="ignore", - downcast="unsigned") - tm.assert_numpy_array_equal(res, expected) - - def test_ignore_downcast_neg_to_unsigned(self): - # Cannot cast to an unsigned integer - # because we have a negative number. - data = ["-1", 2, 3] - expected = np.array([-1, 2, 3], dtype=np.int64) - - res = pd.to_numeric(data, downcast="unsigned") - tm.assert_numpy_array_equal(res, expected) - - @pytest.mark.parametrize("downcast", ["integer", "signed", "unsigned"]) - @pytest.mark.parametrize("data,expected", [ - (["1.1", 2, 3], - np.array([1.1, 2, 3], dtype=np.float64)), - ([10000.0, 20000, 3000, 40000.36, 50000, 50000.00], - np.array([10000.0, 20000, 3000, - 40000.36, 50000, 50000.00], dtype=np.float64)) - ]) - def test_ignore_downcast_cannot_convert_float( - self, data, expected, downcast): - # Cannot cast to an integer (signed or unsigned) - # because we have a float number. 
- res = pd.to_numeric(data, downcast=downcast) - tm.assert_numpy_array_equal(res, expected) - - @pytest.mark.parametrize("downcast,expected_dtype", [ - ("integer", np.int16), - ("signed", np.int16), - ("unsigned", np.uint16) - ]) - def test_downcast_not8bit(self, downcast, expected_dtype): - # the smallest integer dtype need not be np.(u)int8 - data = ["256", 257, 258] - - expected = np.array([256, 257, 258], dtype=expected_dtype) - res = pd.to_numeric(data, downcast=downcast) - tm.assert_numpy_array_equal(res, expected) - - @pytest.mark.parametrize("dtype,downcast,min_max", [ - ("int8", "integer", [iinfo(np.int8).min, - iinfo(np.int8).max]), - ("int16", "integer", [iinfo(np.int16).min, - iinfo(np.int16).max]), - ('int32', "integer", [iinfo(np.int32).min, - iinfo(np.int32).max]), - ('int64', "integer", [iinfo(np.int64).min, - iinfo(np.int64).max]), - ('uint8', "unsigned", [iinfo(np.uint8).min, - iinfo(np.uint8).max]), - ('uint16', "unsigned", [iinfo(np.uint16).min, - iinfo(np.uint16).max]), - ('uint32', "unsigned", [iinfo(np.uint32).min, - iinfo(np.uint32).max]), - ('uint64', "unsigned", [iinfo(np.uint64).min, - iinfo(np.uint64).max]), - ('int16', "integer", [iinfo(np.int8).min, - iinfo(np.int8).max + 1]), - ('int32', "integer", [iinfo(np.int16).min, - iinfo(np.int16).max + 1]), - ('int64', "integer", [iinfo(np.int32).min, - iinfo(np.int32).max + 1]), - ('int16', "integer", [iinfo(np.int8).min - 1, - iinfo(np.int16).max]), - ('int32', "integer", [iinfo(np.int16).min - 1, - iinfo(np.int32).max]), - ('int64', "integer", [iinfo(np.int32).min - 1, - iinfo(np.int64).max]), - ('uint16', "unsigned", [iinfo(np.uint8).min, - iinfo(np.uint8).max + 1]), - ('uint32', "unsigned", [iinfo(np.uint16).min, - iinfo(np.uint16).max + 1]), - ('uint64', "unsigned", [iinfo(np.uint32).min, - iinfo(np.uint32).max + 1]) - ]) - def test_downcast_limits(self, dtype, downcast, min_max): - # see gh-14404: test the limits of each downcast. - series = pd.to_numeric(pd.Series(min_max), downcast=downcast) - assert series.dtype == dtype - - def test_coerce_uint64_conflict(self): - # see gh-17007 and gh-17125 - # - # Still returns float despite the uint64-nan conflict, - # which would normally force the casting to object. - df = pd.DataFrame({"a": [200, 300, "", "NaN", 30000000000000000000]}) - expected = pd.Series([200, 300, np.nan, np.nan, - 30000000000000000000], dtype=float, name="a") - result = to_numeric(df["a"], errors="coerce") - tm.assert_series_equal(result, expected) + to_numeric(arr, **kwargs) + else: + result = to_numeric(arr, **kwargs) + + exp_val = float(val) if (coercing and val_is_string) else val + expected = [exp_val] + + if multiple_elts: + if coercing: + expected.append(np.nan) + exp_dtype = float + else: + expected.append(extra_elt) + exp_dtype = object + else: + exp_dtype = float if isinstance(exp_val, ( + int, compat.long, float)) else object + + tm.assert_almost_equal(result, np.array(expected, dtype=exp_dtype)) + + +def test_really_large_in_arr_consistent(large_val, signed, + multiple_elts, errors): + # see gh-24910 + # + # Even if we discover that we have to hold float, that does not mean + # we should be lenient on subsequent elements that fail to be integer. + kwargs = dict(errors=errors) if errors is not None else dict() + arr = [str(-large_val if signed else large_val)] + + if multiple_elts: + arr.insert(0, large_val) + + if errors in (None, "raise"): + index = int(multiple_elts) + msg = "Integer out of range. 
at position {index}".format(index=index) - s = pd.Series(["12345678901234567890", "1234567890", "ITEM"]) - expected = pd.Series([12345678901234567890, - 1234567890, np.nan], dtype=float) - result = to_numeric(s, errors="coerce") + with pytest.raises(ValueError, match=msg): + to_numeric(arr, **kwargs) + else: + result = to_numeric(arr, **kwargs) + + if errors == "coerce": + expected = [float(i) for i in arr] + exp_dtype = float + else: + expected = arr + exp_dtype = object + + tm.assert_almost_equal(result, np.array(expected, dtype=exp_dtype)) + + +@pytest.mark.parametrize("errors,checker", [ + ("raise", 'Unable to parse string "fail" at position 0'), + ("ignore", lambda x: x == "fail"), + ("coerce", lambda x: np.isnan(x)) +]) +def test_scalar_fail(errors, checker): + scalar = "fail" + + if isinstance(checker, str): + with pytest.raises(ValueError, match=checker): + to_numeric(scalar, errors=errors) + else: + assert checker(to_numeric(scalar, errors=errors)) + + +@pytest.mark.parametrize("data", [ + [1, 2, 3], + [1., np.nan, 3, np.nan] +]) +def test_numeric_dtypes(data, transform_assert_equal): + transform, assert_equal = transform_assert_equal + data = transform(data) + + result = to_numeric(data) + assert_equal(result, data) + + +@pytest.mark.parametrize("data,exp", [ + (["1", "2", "3"], np.array([1, 2, 3], dtype="int64")), + (["1.5", "2.7", "3.4"], np.array([1.5, 2.7, 3.4])) +]) +def test_str(data, exp, transform_assert_equal): + transform, assert_equal = transform_assert_equal + result = to_numeric(transform(data)) + + expected = transform(exp) + assert_equal(result, expected) + + +def test_datetime_like(tz_naive_fixture, transform_assert_equal): + transform, assert_equal = transform_assert_equal + idx = pd.date_range("20130101", periods=3, tz=tz_naive_fixture) + + result = to_numeric(transform(idx)) + expected = transform(idx.asi8) + assert_equal(result, expected) + + +def test_timedelta(transform_assert_equal): + transform, assert_equal = transform_assert_equal + idx = pd.timedelta_range("1 days", periods=3, freq="D") + + result = to_numeric(transform(idx)) + expected = transform(idx.asi8) + assert_equal(result, expected) + + +def test_period(transform_assert_equal): + transform, assert_equal = transform_assert_equal + + idx = pd.period_range("2011-01", periods=3, freq="M", name="") + inp = transform(idx) + + if isinstance(inp, Index): + result = to_numeric(inp) + expected = transform(idx.asi8) + assert_equal(result, expected) + else: + # TODO: PeriodDtype, so support it in to_numeric. 
+ pytest.skip("Missing PeriodDtype support in to_numeric") + + +@pytest.mark.parametrize("errors,expected", [ + ("raise", "Invalid object type at position 0"), + ("ignore", Series([[10.0, 2], 1.0, "apple"])), + ("coerce", Series([np.nan, 1.0, np.nan])) +]) +def test_non_hashable(errors, expected): + # see gh-13324 + ser = Series([[10.0, 2], 1.0, "apple"]) + + if isinstance(expected, str): + with pytest.raises(TypeError, match=expected): + to_numeric(ser, errors=errors) + else: + result = to_numeric(ser, errors=errors) tm.assert_series_equal(result, expected) - # For completeness, check against "ignore" and "raise" - result = to_numeric(s, errors="ignore") - tm.assert_series_equal(result, s) - msg = "Unable to parse string" - with pytest.raises(ValueError, match=msg): - to_numeric(s, errors="raise") +def test_downcast_invalid_cast(): + # see gh-13352 + data = ["1", 2, 3] + invalid_downcast = "unsigned-integer" + msg = "invalid downcasting method provided" + + with pytest.raises(ValueError, match=msg): + to_numeric(data, downcast=invalid_downcast) + + +@pytest.mark.parametrize("data", [ + ["1", 2, 3], + [1, 2, 3], + np.array(["1970-01-02", "1970-01-03", + "1970-01-04"], dtype="datetime64[D]") +]) +@pytest.mark.parametrize("kwargs,exp_dtype", [ + # Basic function tests. + (dict(), np.int64), + (dict(downcast=None), np.int64), + + # Support below np.float32 is rare and far between. + (dict(downcast="float"), np.dtype(np.float32).char), + + # Basic dtype support. + (dict(downcast="unsigned"), np.dtype(np.typecodes["UnsignedInteger"][0])) +]) +def test_downcast_basic(data, kwargs, exp_dtype): + # see gh-13352 + result = to_numeric(data, **kwargs) + expected = np.array([1, 2, 3], dtype=exp_dtype) + tm.assert_numpy_array_equal(result, expected) + + +@pytest.mark.parametrize("signed_downcast", ["integer", "signed"]) +@pytest.mark.parametrize("data", [ + ["1", 2, 3], + [1, 2, 3], + np.array(["1970-01-02", "1970-01-03", + "1970-01-04"], dtype="datetime64[D]") +]) +def test_signed_downcast(data, signed_downcast): + # see gh-13352 + smallest_int_dtype = np.dtype(np.typecodes["Integer"][0]) + expected = np.array([1, 2, 3], dtype=smallest_int_dtype) + + res = to_numeric(data, downcast=signed_downcast) + tm.assert_numpy_array_equal(res, expected) + + +def test_ignore_downcast_invalid_data(): + # If we can't successfully cast the given + # data to a numeric dtype, do not bother + # with the downcast parameter. + data = ["foo", 2, 3] + expected = np.array(data, dtype=object) + + res = to_numeric(data, errors="ignore", + downcast="unsigned") + tm.assert_numpy_array_equal(res, expected) + + +def test_ignore_downcast_neg_to_unsigned(): + # Cannot cast to an unsigned integer + # because we have a negative number. + data = ["-1", 2, 3] + expected = np.array([-1, 2, 3], dtype=np.int64) + + res = to_numeric(data, downcast="unsigned") + tm.assert_numpy_array_equal(res, expected) + + +@pytest.mark.parametrize("downcast", ["integer", "signed", "unsigned"]) +@pytest.mark.parametrize("data,expected", [ + (["1.1", 2, 3], + np.array([1.1, 2, 3], dtype=np.float64)), + ([10000.0, 20000, 3000, 40000.36, 50000, 50000.00], + np.array([10000.0, 20000, 3000, + 40000.36, 50000, 50000.00], dtype=np.float64)) +]) +def test_ignore_downcast_cannot_convert_float(data, expected, downcast): + # Cannot cast to an integer (signed or unsigned) + # because we have a float number. 
+ res = to_numeric(data, downcast=downcast) + tm.assert_numpy_array_equal(res, expected) + + +@pytest.mark.parametrize("downcast,expected_dtype", [ + ("integer", np.int16), + ("signed", np.int16), + ("unsigned", np.uint16) +]) +def test_downcast_not8bit(downcast, expected_dtype): + # the smallest integer dtype need not be np.(u)int8 + data = ["256", 257, 258] + + expected = np.array([256, 257, 258], dtype=expected_dtype) + res = to_numeric(data, downcast=downcast) + tm.assert_numpy_array_equal(res, expected) + + +@pytest.mark.parametrize("dtype,downcast,min_max", [ + ("int8", "integer", [iinfo(np.int8).min, + iinfo(np.int8).max]), + ("int16", "integer", [iinfo(np.int16).min, + iinfo(np.int16).max]), + ("int32", "integer", [iinfo(np.int32).min, + iinfo(np.int32).max]), + ("int64", "integer", [iinfo(np.int64).min, + iinfo(np.int64).max]), + ("uint8", "unsigned", [iinfo(np.uint8).min, + iinfo(np.uint8).max]), + ("uint16", "unsigned", [iinfo(np.uint16).min, + iinfo(np.uint16).max]), + ("uint32", "unsigned", [iinfo(np.uint32).min, + iinfo(np.uint32).max]), + ("uint64", "unsigned", [iinfo(np.uint64).min, + iinfo(np.uint64).max]), + ("int16", "integer", [iinfo(np.int8).min, + iinfo(np.int8).max + 1]), + ("int32", "integer", [iinfo(np.int16).min, + iinfo(np.int16).max + 1]), + ("int64", "integer", [iinfo(np.int32).min, + iinfo(np.int32).max + 1]), + ("int16", "integer", [iinfo(np.int8).min - 1, + iinfo(np.int16).max]), + ("int32", "integer", [iinfo(np.int16).min - 1, + iinfo(np.int32).max]), + ("int64", "integer", [iinfo(np.int32).min - 1, + iinfo(np.int64).max]), + ("uint16", "unsigned", [iinfo(np.uint8).min, + iinfo(np.uint8).max + 1]), + ("uint32", "unsigned", [iinfo(np.uint16).min, + iinfo(np.uint16).max + 1]), + ("uint64", "unsigned", [iinfo(np.uint32).min, + iinfo(np.uint32).max + 1]) +]) +def test_downcast_limits(dtype, downcast, min_max): + # see gh-14404: test the limits of each downcast. + series = to_numeric(Series(min_max), downcast=downcast) + assert series.dtype == dtype + + +@pytest.mark.parametrize("data,exp_data", [ + ([200, 300, "", "NaN", 30000000000000000000], + [200, 300, np.nan, np.nan, 30000000000000000000]), + (["12345678901234567890", "1234567890", "ITEM"], + [12345678901234567890, 1234567890, np.nan]) +]) +def test_coerce_uint64_conflict(data, exp_data): + # see gh-17007 and gh-17125 + # + # Still returns float despite the uint64-nan conflict, + # which would normally force the casting to object. + result = to_numeric(Series(data), errors="coerce") + expected = Series(exp_data, dtype=float) + tm.assert_series_equal(result, expected) + + +@pytest.mark.parametrize("errors,exp", [ + ("ignore", Series(["12345678901234567890", "1234567890", "ITEM"])), + ("raise", "Unable to parse string") +]) +def test_non_coerce_uint64_conflict(errors, exp): + # see gh-17007 and gh-17125 + # + # For completeness. 
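+    # "ignore" should hand back the input Series unchanged, while
+    # "raise" should fail on the unparseable string "ITEM".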
+ ser = Series(["12345678901234567890", "1234567890", "ITEM"]) + + if isinstance(exp, str): + with pytest.raises(ValueError, match=exp): + to_numeric(ser, errors=errors) + else: + result = to_numeric(ser, errors=errors) + tm.assert_series_equal(result, ser) diff --git a/doc/source/api/scalars.rst b/pandas/tests/tseries/holiday/__init__.py similarity index 100% rename from doc/source/api/scalars.rst rename to pandas/tests/tseries/holiday/__init__.py diff --git a/pandas/tests/tseries/holiday/test_calendar.py b/pandas/tests/tseries/holiday/test_calendar.py new file mode 100644 index 0000000000000..a5cc4095ce583 --- /dev/null +++ b/pandas/tests/tseries/holiday/test_calendar.py @@ -0,0 +1,77 @@ +from datetime import datetime + +import pytest + +from pandas import DatetimeIndex +import pandas.util.testing as tm + +from pandas.tseries.holiday import ( + AbstractHolidayCalendar, Holiday, Timestamp, USFederalHolidayCalendar, + USThanksgivingDay, get_calendar) + + +@pytest.mark.parametrize("transform", [ + lambda x: x, + lambda x: x.strftime("%Y-%m-%d"), + lambda x: Timestamp(x) +]) +def test_calendar(transform): + start_date = datetime(2012, 1, 1) + end_date = datetime(2012, 12, 31) + + calendar = USFederalHolidayCalendar() + holidays = calendar.holidays(transform(start_date), transform(end_date)) + + expected = [ + datetime(2012, 1, 2), + datetime(2012, 1, 16), + datetime(2012, 2, 20), + datetime(2012, 5, 28), + datetime(2012, 7, 4), + datetime(2012, 9, 3), + datetime(2012, 10, 8), + datetime(2012, 11, 12), + datetime(2012, 11, 22), + datetime(2012, 12, 25) + ] + + assert list(holidays.to_pydatetime()) == expected + + +def test_calendar_caching(): + # see gh-9552. + + class TestCalendar(AbstractHolidayCalendar): + def __init__(self, name=None, rules=None): + super(TestCalendar, self).__init__(name=name, rules=rules) + + jan1 = TestCalendar(rules=[Holiday("jan1", year=2015, month=1, day=1)]) + jan2 = TestCalendar(rules=[Holiday("jan2", year=2015, month=1, day=2)]) + + # Getting holidays for Jan 1 should not alter results for Jan 2. + tm.assert_index_equal(jan1.holidays(), DatetimeIndex(["01-Jan-2015"])) + tm.assert_index_equal(jan2.holidays(), DatetimeIndex(["02-Jan-2015"])) + + +def test_calendar_observance_dates(): + # see gh-11477 + us_fed_cal = get_calendar("USFederalHolidayCalendar") + holidays0 = us_fed_cal.holidays(datetime(2015, 7, 3), datetime( + 2015, 7, 3)) # <-- same start and end dates + holidays1 = us_fed_cal.holidays(datetime(2015, 7, 3), datetime( + 2015, 7, 6)) # <-- different start and end dates + holidays2 = us_fed_cal.holidays(datetime(2015, 7, 3), datetime( + 2015, 7, 3)) # <-- same start and end dates + + # These should all produce the same result. + # + # In addition, calling with different start and end + # dates should not alter the output if we call the + # function again with the same start and end date. 
+ tm.assert_index_equal(holidays0, holidays1) + tm.assert_index_equal(holidays0, holidays2) + + +def test_rule_from_name(): + us_fed_cal = get_calendar("USFederalHolidayCalendar") + assert us_fed_cal.rule_from_name("Thanksgiving") == USThanksgivingDay diff --git a/pandas/tests/tseries/holiday/test_federal.py b/pandas/tests/tseries/holiday/test_federal.py new file mode 100644 index 0000000000000..62b5ab2b849ae --- /dev/null +++ b/pandas/tests/tseries/holiday/test_federal.py @@ -0,0 +1,36 @@ +from datetime import datetime + +from pandas.tseries.holiday import ( + AbstractHolidayCalendar, USMartinLutherKingJr, USMemorialDay) + + +def test_no_mlk_before_1986(): + # see gh-10278 + class MLKCalendar(AbstractHolidayCalendar): + rules = [USMartinLutherKingJr] + + holidays = MLKCalendar().holidays(start="1984", + end="1988").to_pydatetime().tolist() + + # Testing to make sure holiday is not incorrectly observed before 1986. + assert holidays == [datetime(1986, 1, 20, 0, 0), + datetime(1987, 1, 19, 0, 0)] + + +def test_memorial_day(): + class MemorialDay(AbstractHolidayCalendar): + rules = [USMemorialDay] + + holidays = MemorialDay().holidays(start="1971", + end="1980").to_pydatetime().tolist() + + # Fixes 5/31 error and checked manually against Wikipedia. + assert holidays == [datetime(1971, 5, 31, 0, 0), + datetime(1972, 5, 29, 0, 0), + datetime(1973, 5, 28, 0, 0), + datetime(1974, 5, 27, 0, 0), + datetime(1975, 5, 26, 0, 0), + datetime(1976, 5, 31, 0, 0), + datetime(1977, 5, 30, 0, 0), + datetime(1978, 5, 29, 0, 0), + datetime(1979, 5, 28, 0, 0)] diff --git a/pandas/tests/tseries/holiday/test_holiday.py b/pandas/tests/tseries/holiday/test_holiday.py new file mode 100644 index 0000000000000..27bba1cc89dee --- /dev/null +++ b/pandas/tests/tseries/holiday/test_holiday.py @@ -0,0 +1,193 @@ +from datetime import datetime + +import pytest +from pytz import utc + +import pandas.util.testing as tm + +from pandas.tseries.holiday import ( + MO, SA, AbstractHolidayCalendar, DateOffset, EasterMonday, GoodFriday, + Holiday, HolidayCalendarFactory, Timestamp, USColumbusDay, USLaborDay, + USMartinLutherKingJr, USMemorialDay, USPresidentsDay, USThanksgivingDay, + get_calendar, next_monday) + + +def _check_holiday_results(holiday, start, end, expected): + """ + Check that the dates for a given holiday match in date and timezone. + + Parameters + ---------- + holiday : Holiday + The holiday to check. + start : datetime-like + The start date of range in which to collect dates for a given holiday. + end : datetime-like + The end date of range in which to collect dates for a given holiday. + expected : list + The list of dates we expect to get. + """ + assert list(holiday.dates(start, end)) == expected + + # Verify that timezone info is preserved. 
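+    # Localizing both endpoints to UTC should localize every returned
+    # holiday date in the same way.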
+ assert (list(holiday.dates(utc.localize(Timestamp(start)), + utc.localize(Timestamp(end)))) == + [utc.localize(dt) for dt in expected]) + + +@pytest.mark.parametrize("holiday,start_date,end_date,expected", [ + (USMemorialDay, datetime(2011, 1, 1), datetime(2020, 12, 31), + [datetime(2011, 5, 30), datetime(2012, 5, 28), datetime(2013, 5, 27), + datetime(2014, 5, 26), datetime(2015, 5, 25), datetime(2016, 5, 30), + datetime(2017, 5, 29), datetime(2018, 5, 28), datetime(2019, 5, 27), + datetime(2020, 5, 25)]), + + (Holiday("July 4th Eve", month=7, day=3), "2001-01-01", "2003-03-03", + [Timestamp("2001-07-03 00:00:00"), Timestamp("2002-07-03 00:00:00")]), + (Holiday("July 4th Eve", month=7, day=3, days_of_week=(0, 1, 2, 3)), + "2001-01-01", "2008-03-03", [ + Timestamp("2001-07-03 00:00:00"), Timestamp("2002-07-03 00:00:00"), + Timestamp("2003-07-03 00:00:00"), Timestamp("2006-07-03 00:00:00"), + Timestamp("2007-07-03 00:00:00")]), + + (EasterMonday, datetime(2011, 1, 1), datetime(2020, 12, 31), + [Timestamp("2011-04-25 00:00:00"), Timestamp("2012-04-09 00:00:00"), + Timestamp("2013-04-01 00:00:00"), Timestamp("2014-04-21 00:00:00"), + Timestamp("2015-04-06 00:00:00"), Timestamp("2016-03-28 00:00:00"), + Timestamp("2017-04-17 00:00:00"), Timestamp("2018-04-02 00:00:00"), + Timestamp("2019-04-22 00:00:00"), Timestamp("2020-04-13 00:00:00")]), + (GoodFriday, datetime(2011, 1, 1), datetime(2020, 12, 31), + [Timestamp("2011-04-22 00:00:00"), Timestamp("2012-04-06 00:00:00"), + Timestamp("2013-03-29 00:00:00"), Timestamp("2014-04-18 00:00:00"), + Timestamp("2015-04-03 00:00:00"), Timestamp("2016-03-25 00:00:00"), + Timestamp("2017-04-14 00:00:00"), Timestamp("2018-03-30 00:00:00"), + Timestamp("2019-04-19 00:00:00"), Timestamp("2020-04-10 00:00:00")]), + + (USThanksgivingDay, datetime(2011, 1, 1), datetime(2020, 12, 31), + [datetime(2011, 11, 24), datetime(2012, 11, 22), datetime(2013, 11, 28), + datetime(2014, 11, 27), datetime(2015, 11, 26), datetime(2016, 11, 24), + datetime(2017, 11, 23), datetime(2018, 11, 22), datetime(2019, 11, 28), + datetime(2020, 11, 26)]) +]) +def test_holiday_dates(holiday, start_date, end_date, expected): + _check_holiday_results(holiday, start_date, end_date, expected) + + +@pytest.mark.parametrize("holiday,start,expected", [ + (USMemorialDay, datetime(2015, 7, 1), []), + (USMemorialDay, "2015-05-25", "2015-05-25"), + + (USLaborDay, datetime(2015, 7, 1), []), + (USLaborDay, "2015-09-07", "2015-09-07"), + + (USColumbusDay, datetime(2015, 7, 1), []), + (USColumbusDay, "2015-10-12", "2015-10-12"), + + (USThanksgivingDay, datetime(2015, 7, 1), []), + (USThanksgivingDay, "2015-11-26", "2015-11-26"), + + (USMartinLutherKingJr, datetime(2015, 7, 1), []), + (USMartinLutherKingJr, "2015-01-19", "2015-01-19"), + + (USPresidentsDay, datetime(2015, 7, 1), []), + (USPresidentsDay, "2015-02-16", "2015-02-16"), + + (GoodFriday, datetime(2015, 7, 1), []), + (GoodFriday, "2015-04-03", "2015-04-03"), + + (EasterMonday, "2015-04-06", "2015-04-06"), + (EasterMonday, datetime(2015, 7, 1), []), + (EasterMonday, "2015-04-05", []), + + ("New Years Day", "2015-01-01", "2015-01-01"), + ("New Years Day", "2010-12-31", "2010-12-31"), + ("New Years Day", datetime(2015, 7, 1), []), + ("New Years Day", "2011-01-01", []), + + ("July 4th", "2015-07-03", "2015-07-03"), + ("July 4th", datetime(2015, 7, 1), []), + ("July 4th", "2015-07-04", []), + + ("Veterans Day", "2012-11-12", "2012-11-12"), + ("Veterans Day", datetime(2015, 7, 1), []), + ("Veterans Day", "2012-11-11", []), + + ("Christmas", 
"2011-12-26", "2011-12-26"), + ("Christmas", datetime(2015, 7, 1), []), + ("Christmas", "2011-12-25", []), +]) +def test_holidays_within_dates(holiday, start, expected): + # see gh-11477 + # + # Fix holiday behavior where holiday.dates returned dates outside + # start/end date, or observed rules could not be applied because the + # holiday was not in the original date range (e.g., 7/4/2015 -> 7/3/2015). + if isinstance(holiday, str): + calendar = get_calendar("USFederalHolidayCalendar") + holiday = calendar.rule_from_name(holiday) + + if isinstance(expected, str): + expected = [Timestamp(expected)] + + _check_holiday_results(holiday, start, start, expected) + + +@pytest.mark.parametrize("transform", [ + lambda x: x.strftime("%Y-%m-%d"), + lambda x: Timestamp(x) +]) +def test_argument_types(transform): + start_date = datetime(2011, 1, 1) + end_date = datetime(2020, 12, 31) + + holidays = USThanksgivingDay.dates(start_date, end_date) + holidays2 = USThanksgivingDay.dates( + transform(start_date), transform(end_date)) + tm.assert_index_equal(holidays, holidays2) + + +@pytest.mark.parametrize("name,kwargs", [ + ("One-Time", dict(year=2012, month=5, day=28)), + ("Range", dict(month=5, day=28, start_date=datetime(2012, 1, 1), + end_date=datetime(2012, 12, 31), + offset=DateOffset(weekday=MO(1)))) +]) +def test_special_holidays(name, kwargs): + base_date = [datetime(2012, 5, 28)] + holiday = Holiday(name, **kwargs) + + start_date = datetime(2011, 1, 1) + end_date = datetime(2020, 12, 31) + + assert base_date == holiday.dates(start_date, end_date) + + +def test_get_calendar(): + class TestCalendar(AbstractHolidayCalendar): + rules = [] + + calendar = get_calendar("TestCalendar") + assert TestCalendar == calendar.__class__ + + +def test_factory(): + class_1 = HolidayCalendarFactory("MemorialDay", + AbstractHolidayCalendar, + USMemorialDay) + class_2 = HolidayCalendarFactory("Thanksgiving", + AbstractHolidayCalendar, + USThanksgivingDay) + class_3 = HolidayCalendarFactory("Combined", class_1, class_2) + + assert len(class_1.rules) == 1 + assert len(class_2.rules) == 1 + assert len(class_3.rules) == 2 + + +def test_both_offset_observance_raises(): + # see gh-10217 + msg = "Cannot use both offset and observance" + with pytest.raises(NotImplementedError, match=msg): + Holiday("Cyber Monday", month=11, day=1, + offset=[DateOffset(weekday=SA(4))], + observance=next_monday) diff --git a/pandas/tests/tseries/holiday/test_observance.py b/pandas/tests/tseries/holiday/test_observance.py new file mode 100644 index 0000000000000..1c22918b2efd8 --- /dev/null +++ b/pandas/tests/tseries/holiday/test_observance.py @@ -0,0 +1,93 @@ +from datetime import datetime + +import pytest + +from pandas.tseries.holiday import ( + after_nearest_workday, before_nearest_workday, nearest_workday, + next_monday, next_monday_or_tuesday, next_workday, previous_friday, + previous_workday, sunday_to_monday, weekend_to_monday) + +_WEDNESDAY = datetime(2014, 4, 9) +_THURSDAY = datetime(2014, 4, 10) +_FRIDAY = datetime(2014, 4, 11) +_SATURDAY = datetime(2014, 4, 12) +_SUNDAY = datetime(2014, 4, 13) +_MONDAY = datetime(2014, 4, 14) +_TUESDAY = datetime(2014, 4, 15) + + +@pytest.mark.parametrize("day", [_SATURDAY, _SUNDAY]) +def test_next_monday(day): + assert next_monday(day) == _MONDAY + + +@pytest.mark.parametrize("day,expected", [ + (_SATURDAY, _MONDAY), + (_SUNDAY, _TUESDAY), + (_MONDAY, _TUESDAY) +]) +def test_next_monday_or_tuesday(day, expected): + assert next_monday_or_tuesday(day) == expected + + +@pytest.mark.parametrize("day", 
[_SATURDAY, _SUNDAY]) +def test_previous_friday(day): + assert previous_friday(day) == _FRIDAY + + +def test_sunday_to_monday(): + assert sunday_to_monday(_SUNDAY) == _MONDAY + + +@pytest.mark.parametrize("day,expected", [ + (_SATURDAY, _FRIDAY), + (_SUNDAY, _MONDAY), + (_MONDAY, _MONDAY) +]) +def test_nearest_workday(day, expected): + assert nearest_workday(day) == expected + + +@pytest.mark.parametrize("day,expected", [ + (_SATURDAY, _MONDAY), + (_SUNDAY, _MONDAY), + (_MONDAY, _MONDAY) +]) +def test_weekend_to_monday(day, expected): + assert weekend_to_monday(day) == expected + + +@pytest.mark.parametrize("day,expected", [ + (_SATURDAY, _MONDAY), + (_SUNDAY, _MONDAY), + (_MONDAY, _TUESDAY) +]) +def test_next_workday(day, expected): + assert next_workday(day) == expected + + +@pytest.mark.parametrize("day,expected", [ + (_SATURDAY, _FRIDAY), + (_SUNDAY, _FRIDAY), + (_TUESDAY, _MONDAY) +]) +def test_previous_workday(day, expected): + assert previous_workday(day) == expected + + +@pytest.mark.parametrize("day,expected", [ + (_SATURDAY, _THURSDAY), + (_SUNDAY, _FRIDAY), + (_TUESDAY, _MONDAY) +]) +def test_before_nearest_workday(day, expected): + assert before_nearest_workday(day) == expected + + +@pytest.mark.parametrize("day,expected", [ + (_SATURDAY, _MONDAY), + (_SUNDAY, _TUESDAY), + (_FRIDAY, _MONDAY) +]) +def test_after_nearest_workday(day, expected): + assert after_nearest_workday(day) == expected diff --git a/pandas/tests/tseries/offsets/test_offsets.py b/pandas/tests/tseries/offsets/test_offsets.py index ac3955970587f..621572da57541 100644 --- a/pandas/tests/tseries/offsets/test_offsets.py +++ b/pandas/tests/tseries/offsets/test_offsets.py @@ -257,6 +257,26 @@ def test_offset_n(self, offset_types): mul_offset = offset * 3 assert mul_offset.n == 3 + def test_offset_timedelta64_arg(self, offset_types): + # check that offset._validate_n raises TypeError on a timedelta64 + # object + off = self._get_offset(offset_types) + + td64 = np.timedelta64(4567, 's') + with pytest.raises(TypeError, match="argument must be an integer"): + type(off)(n=td64, **off.kwds) + + def test_offset_mul_ndarray(self, offset_types): + off = self._get_offset(offset_types) + + expected = np.array([[off, off * 2], [off * 3, off * 4]]) + + result = np.array([[1, 2], [3, 4]]) * off + tm.assert_numpy_array_equal(result, expected) + + result = off * np.array([[1, 2], [3, 4]]) + tm.assert_numpy_array_equal(result, expected) + def test_offset_freqstr(self, offset_types): offset = self._get_offset(offset_types) diff --git a/pandas/tests/tseries/offsets/test_ticks.py b/pandas/tests/tseries/offsets/test_ticks.py index f4b012ec1897f..9a8251201f75f 100644 --- a/pandas/tests/tseries/offsets/test_ticks.py +++ b/pandas/tests/tseries/offsets/test_ticks.py @@ -11,6 +11,7 @@ import pytest from pandas import Timedelta, Timestamp +import pandas.util.testing as tm from pandas.tseries import offsets from pandas.tseries.offsets import Hour, Micro, Milli, Minute, Nano, Second @@ -262,6 +263,28 @@ def test_tick_division(cls): assert result.delta == off.delta / .001 +@pytest.mark.parametrize('cls', tick_classes) +def test_tick_rdiv(cls): + off = cls(10) + delta = off.delta + td64 = delta.to_timedelta64() + + with pytest.raises(TypeError): + 2 / off + with pytest.raises(TypeError): + 2.0 / off + + assert (td64 * 2.5) / off == 2.5 + + if cls is not Nano: + # skip pytimedelta for Nano since it gets dropped + assert (delta.to_pytimedelta() * 2) / off == 2 + + result = np.array([2 * td64, td64]) / off + expected = np.array([2., 1.]) + 
tm.assert_numpy_array_equal(result, expected) + + @pytest.mark.parametrize('cls1', tick_classes) @pytest.mark.parametrize('cls2', tick_classes) def test_tick_zero(cls1, cls2): diff --git a/pandas/tests/tseries/test_holiday.py b/pandas/tests/tseries/test_holiday.py deleted file mode 100644 index 86f154ed1acc2..0000000000000 --- a/pandas/tests/tseries/test_holiday.py +++ /dev/null @@ -1,382 +0,0 @@ -from datetime import datetime - -import pytest -from pytz import utc - -from pandas import DatetimeIndex, compat -import pandas.util.testing as tm - -from pandas.tseries.holiday import ( - MO, SA, AbstractHolidayCalendar, DateOffset, EasterMonday, GoodFriday, - Holiday, HolidayCalendarFactory, Timestamp, USColumbusDay, - USFederalHolidayCalendar, USLaborDay, USMartinLutherKingJr, USMemorialDay, - USPresidentsDay, USThanksgivingDay, after_nearest_workday, - before_nearest_workday, get_calendar, nearest_workday, next_monday, - next_monday_or_tuesday, next_workday, previous_friday, previous_workday, - sunday_to_monday, weekend_to_monday) - - -class TestCalendar(object): - - def setup_method(self, method): - self.holiday_list = [ - datetime(2012, 1, 2), - datetime(2012, 1, 16), - datetime(2012, 2, 20), - datetime(2012, 5, 28), - datetime(2012, 7, 4), - datetime(2012, 9, 3), - datetime(2012, 10, 8), - datetime(2012, 11, 12), - datetime(2012, 11, 22), - datetime(2012, 12, 25)] - - self.start_date = datetime(2012, 1, 1) - self.end_date = datetime(2012, 12, 31) - - def test_calendar(self): - - calendar = USFederalHolidayCalendar() - holidays = calendar.holidays(self.start_date, self.end_date) - - holidays_1 = calendar.holidays( - self.start_date.strftime('%Y-%m-%d'), - self.end_date.strftime('%Y-%m-%d')) - holidays_2 = calendar.holidays( - Timestamp(self.start_date), - Timestamp(self.end_date)) - - assert list(holidays.to_pydatetime()) == self.holiday_list - assert list(holidays_1.to_pydatetime()) == self.holiday_list - assert list(holidays_2.to_pydatetime()) == self.holiday_list - - def test_calendar_caching(self): - # Test for issue #9552 - - class TestCalendar(AbstractHolidayCalendar): - - def __init__(self, name=None, rules=None): - super(TestCalendar, self).__init__(name=name, rules=rules) - - jan1 = TestCalendar(rules=[Holiday('jan1', year=2015, month=1, day=1)]) - jan2 = TestCalendar(rules=[Holiday('jan2', year=2015, month=1, day=2)]) - - tm.assert_index_equal(jan1.holidays(), DatetimeIndex(['01-Jan-2015'])) - tm.assert_index_equal(jan2.holidays(), DatetimeIndex(['02-Jan-2015'])) - - def test_calendar_observance_dates(self): - # Test for issue 11477 - USFedCal = get_calendar('USFederalHolidayCalendar') - holidays0 = USFedCal.holidays(datetime(2015, 7, 3), datetime( - 2015, 7, 3)) # <-- same start and end dates - holidays1 = USFedCal.holidays(datetime(2015, 7, 3), datetime( - 2015, 7, 6)) # <-- different start and end dates - holidays2 = USFedCal.holidays(datetime(2015, 7, 3), datetime( - 2015, 7, 3)) # <-- same start and end dates - - tm.assert_index_equal(holidays0, holidays1) - tm.assert_index_equal(holidays0, holidays2) - - def test_rule_from_name(self): - USFedCal = get_calendar('USFederalHolidayCalendar') - assert USFedCal.rule_from_name('Thanksgiving') == USThanksgivingDay - - -class TestHoliday(object): - - def setup_method(self, method): - self.start_date = datetime(2011, 1, 1) - self.end_date = datetime(2020, 12, 31) - - def check_results(self, holiday, start, end, expected): - assert list(holiday.dates(start, end)) == expected - - # Verify that timezone info is preserved. 
- assert (list(holiday.dates(utc.localize(Timestamp(start)), - utc.localize(Timestamp(end)))) == - [utc.localize(dt) for dt in expected]) - - def test_usmemorialday(self): - self.check_results(holiday=USMemorialDay, - start=self.start_date, - end=self.end_date, - expected=[ - datetime(2011, 5, 30), - datetime(2012, 5, 28), - datetime(2013, 5, 27), - datetime(2014, 5, 26), - datetime(2015, 5, 25), - datetime(2016, 5, 30), - datetime(2017, 5, 29), - datetime(2018, 5, 28), - datetime(2019, 5, 27), - datetime(2020, 5, 25), - ], ) - - def test_non_observed_holiday(self): - - self.check_results( - Holiday('July 4th Eve', month=7, day=3), - start="2001-01-01", - end="2003-03-03", - expected=[ - Timestamp('2001-07-03 00:00:00'), - Timestamp('2002-07-03 00:00:00') - ] - ) - - self.check_results( - Holiday('July 4th Eve', month=7, day=3, days_of_week=(0, 1, 2, 3)), - start="2001-01-01", - end="2008-03-03", - expected=[ - Timestamp('2001-07-03 00:00:00'), - Timestamp('2002-07-03 00:00:00'), - Timestamp('2003-07-03 00:00:00'), - Timestamp('2006-07-03 00:00:00'), - Timestamp('2007-07-03 00:00:00'), - ] - ) - - def test_easter(self): - - self.check_results(EasterMonday, - start=self.start_date, - end=self.end_date, - expected=[ - Timestamp('2011-04-25 00:00:00'), - Timestamp('2012-04-09 00:00:00'), - Timestamp('2013-04-01 00:00:00'), - Timestamp('2014-04-21 00:00:00'), - Timestamp('2015-04-06 00:00:00'), - Timestamp('2016-03-28 00:00:00'), - Timestamp('2017-04-17 00:00:00'), - Timestamp('2018-04-02 00:00:00'), - Timestamp('2019-04-22 00:00:00'), - Timestamp('2020-04-13 00:00:00'), - ], ) - self.check_results(GoodFriday, - start=self.start_date, - end=self.end_date, - expected=[ - Timestamp('2011-04-22 00:00:00'), - Timestamp('2012-04-06 00:00:00'), - Timestamp('2013-03-29 00:00:00'), - Timestamp('2014-04-18 00:00:00'), - Timestamp('2015-04-03 00:00:00'), - Timestamp('2016-03-25 00:00:00'), - Timestamp('2017-04-14 00:00:00'), - Timestamp('2018-03-30 00:00:00'), - Timestamp('2019-04-19 00:00:00'), - Timestamp('2020-04-10 00:00:00'), - ], ) - - def test_usthanksgivingday(self): - - self.check_results(USThanksgivingDay, - start=self.start_date, - end=self.end_date, - expected=[ - datetime(2011, 11, 24), - datetime(2012, 11, 22), - datetime(2013, 11, 28), - datetime(2014, 11, 27), - datetime(2015, 11, 26), - datetime(2016, 11, 24), - datetime(2017, 11, 23), - datetime(2018, 11, 22), - datetime(2019, 11, 28), - datetime(2020, 11, 26), - ], ) - - def test_holidays_within_dates(self): - # Fix holiday behavior found in #11477 - # where holiday.dates returned dates outside start/end date - # or observed rules could not be applied as the holiday - # was not in the original date range (e.g., 7/4/2015 -> 7/3/2015) - start_date = datetime(2015, 7, 1) - end_date = datetime(2015, 7, 1) - - calendar = get_calendar('USFederalHolidayCalendar') - new_years = calendar.rule_from_name('New Years Day') - july_4th = calendar.rule_from_name('July 4th') - veterans_day = calendar.rule_from_name('Veterans Day') - christmas = calendar.rule_from_name('Christmas') - - # Holiday: (start/end date, holiday) - holidays = {USMemorialDay: ("2015-05-25", "2015-05-25"), - USLaborDay: ("2015-09-07", "2015-09-07"), - USColumbusDay: ("2015-10-12", "2015-10-12"), - USThanksgivingDay: ("2015-11-26", "2015-11-26"), - USMartinLutherKingJr: ("2015-01-19", "2015-01-19"), - USPresidentsDay: ("2015-02-16", "2015-02-16"), - GoodFriday: ("2015-04-03", "2015-04-03"), - EasterMonday: [("2015-04-06", "2015-04-06"), - ("2015-04-05", [])], - new_years: 
[("2015-01-01", "2015-01-01"), - ("2011-01-01", []), - ("2010-12-31", "2010-12-31")], - july_4th: [("2015-07-03", "2015-07-03"), - ("2015-07-04", [])], - veterans_day: [("2012-11-11", []), - ("2012-11-12", "2012-11-12")], - christmas: [("2011-12-25", []), - ("2011-12-26", "2011-12-26")]} - - for rule, dates in compat.iteritems(holidays): - empty_dates = rule.dates(start_date, end_date) - assert empty_dates.tolist() == [] - - if isinstance(dates, tuple): - dates = [dates] - - for start, expected in dates: - if len(expected): - expected = [Timestamp(expected)] - self.check_results(rule, start, start, expected) - - def test_argument_types(self): - holidays = USThanksgivingDay.dates(self.start_date, self.end_date) - - holidays_1 = USThanksgivingDay.dates( - self.start_date.strftime('%Y-%m-%d'), - self.end_date.strftime('%Y-%m-%d')) - - holidays_2 = USThanksgivingDay.dates( - Timestamp(self.start_date), - Timestamp(self.end_date)) - - tm.assert_index_equal(holidays, holidays_1) - tm.assert_index_equal(holidays, holidays_2) - - def test_special_holidays(self): - base_date = [datetime(2012, 5, 28)] - holiday_1 = Holiday('One-Time', year=2012, month=5, day=28) - holiday_2 = Holiday('Range', month=5, day=28, - start_date=datetime(2012, 1, 1), - end_date=datetime(2012, 12, 31), - offset=DateOffset(weekday=MO(1))) - - assert base_date == holiday_1.dates(self.start_date, self.end_date) - assert base_date == holiday_2.dates(self.start_date, self.end_date) - - def test_get_calendar(self): - class TestCalendar(AbstractHolidayCalendar): - rules = [] - - calendar = get_calendar('TestCalendar') - assert TestCalendar == calendar.__class__ - - def test_factory(self): - class_1 = HolidayCalendarFactory('MemorialDay', - AbstractHolidayCalendar, - USMemorialDay) - class_2 = HolidayCalendarFactory('Thansksgiving', - AbstractHolidayCalendar, - USThanksgivingDay) - class_3 = HolidayCalendarFactory('Combined', class_1, class_2) - - assert len(class_1.rules) == 1 - assert len(class_2.rules) == 1 - assert len(class_3.rules) == 2 - - -class TestObservanceRules(object): - - def setup_method(self, method): - self.we = datetime(2014, 4, 9) - self.th = datetime(2014, 4, 10) - self.fr = datetime(2014, 4, 11) - self.sa = datetime(2014, 4, 12) - self.su = datetime(2014, 4, 13) - self.mo = datetime(2014, 4, 14) - self.tu = datetime(2014, 4, 15) - - def test_next_monday(self): - assert next_monday(self.sa) == self.mo - assert next_monday(self.su) == self.mo - - def test_next_monday_or_tuesday(self): - assert next_monday_or_tuesday(self.sa) == self.mo - assert next_monday_or_tuesday(self.su) == self.tu - assert next_monday_or_tuesday(self.mo) == self.tu - - def test_previous_friday(self): - assert previous_friday(self.sa) == self.fr - assert previous_friday(self.su) == self.fr - - def test_sunday_to_monday(self): - assert sunday_to_monday(self.su) == self.mo - - def test_nearest_workday(self): - assert nearest_workday(self.sa) == self.fr - assert nearest_workday(self.su) == self.mo - assert nearest_workday(self.mo) == self.mo - - def test_weekend_to_monday(self): - assert weekend_to_monday(self.sa) == self.mo - assert weekend_to_monday(self.su) == self.mo - assert weekend_to_monday(self.mo) == self.mo - - def test_next_workday(self): - assert next_workday(self.sa) == self.mo - assert next_workday(self.su) == self.mo - assert next_workday(self.mo) == self.tu - - def test_previous_workday(self): - assert previous_workday(self.sa) == self.fr - assert previous_workday(self.su) == self.fr - assert previous_workday(self.tu) == self.mo 
- - def test_before_nearest_workday(self): - assert before_nearest_workday(self.sa) == self.th - assert before_nearest_workday(self.su) == self.fr - assert before_nearest_workday(self.tu) == self.mo - - def test_after_nearest_workday(self): - assert after_nearest_workday(self.sa) == self.mo - assert after_nearest_workday(self.su) == self.tu - assert after_nearest_workday(self.fr) == self.mo - - -class TestFederalHolidayCalendar(object): - - def test_no_mlk_before_1986(self): - # see gh-10278 - class MLKCalendar(AbstractHolidayCalendar): - rules = [USMartinLutherKingJr] - - holidays = MLKCalendar().holidays(start='1984', - end='1988').to_pydatetime().tolist() - - # Testing to make sure holiday is not incorrectly observed before 1986 - assert holidays == [datetime(1986, 1, 20, 0, 0), - datetime(1987, 1, 19, 0, 0)] - - def test_memorial_day(self): - class MemorialDay(AbstractHolidayCalendar): - rules = [USMemorialDay] - - holidays = MemorialDay().holidays(start='1971', - end='1980').to_pydatetime().tolist() - - # Fixes 5/31 error and checked manually against Wikipedia - assert holidays == [datetime(1971, 5, 31, 0, 0), - datetime(1972, 5, 29, 0, 0), - datetime(1973, 5, 28, 0, 0), - datetime(1974, 5, 27, 0, 0), - datetime(1975, 5, 26, 0, 0), - datetime(1976, 5, 31, 0, 0), - datetime(1977, 5, 30, 0, 0), - datetime(1978, 5, 29, 0, 0), - datetime(1979, 5, 28, 0, 0)] - - -class TestHolidayConflictingArguments(object): - - def test_both_offset_observance_raises(self): - # see gh-10217 - with pytest.raises(NotImplementedError): - Holiday("Cyber Monday", month=11, day=1, - offset=[DateOffset(weekday=SA(4))], - observance=next_monday) diff --git a/pandas/tests/util/test_hashing.py b/pandas/tests/util/test_hashing.py index d36de931e2610..c80b4483c0482 100644 --- a/pandas/tests/util/test_hashing.py +++ b/pandas/tests/util/test_hashing.py @@ -257,8 +257,7 @@ def test_categorical_with_nan_consistency(): assert result[1] in expected -@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") -@pytest.mark.parametrize("obj", [pd.Timestamp("20130101"), tm.makePanel()]) +@pytest.mark.parametrize("obj", [pd.Timestamp("20130101")]) def test_pandas_errors(obj): msg = "Unexpected type for hashing" with pytest.raises(TypeError, match=msg): diff --git a/pandas/tseries/frequencies.py b/pandas/tseries/frequencies.py index c454db3bbdffc..f591b24f5b648 100644 --- a/pandas/tseries/frequencies.py +++ b/pandas/tseries/frequencies.py @@ -77,7 +77,7 @@ def to_offset(freq): See Also -------- - pandas.DateOffset + DateOffset Examples -------- diff --git a/pandas/util/_decorators.py b/pandas/util/_decorators.py index 86cd8b1e698c6..e92051ebbea9a 100644 --- a/pandas/util/_decorators.py +++ b/pandas/util/_decorators.py @@ -4,12 +4,13 @@ import warnings from pandas._libs.properties import cache_readonly # noqa -from pandas.compat import PY2, callable, signature +from pandas.compat import PY2, signature def deprecate(name, alternative, version, alt_name=None, klass=None, stacklevel=2, msg=None): - """Return a new function that emits a deprecation warning on use. + """ + Return a new function that emits a deprecation warning on use. To use this method for a deprecated function, another function `alternative` with the same signature must exist. The deprecated diff --git a/pandas/util/move.c b/pandas/util/move.c index 62860adb1c1f6..9bb662d50cb3f 100644 --- a/pandas/util/move.c +++ b/pandas/util/move.c @@ -1,3 +1,12 @@ +/* +Copyright (c) 2019, PyData Development Team +All rights reserved. 
+ +Distributed under the terms of the BSD Simplified License. + +The full license is in the LICENSE file, distributed with this software. +*/ + #include <Python.h> #define COMPILING_IN_PY2 (PY_VERSION_HEX <= 0x03000000) @@ -31,15 +40,13 @@ typedef struct { static PyTypeObject stolenbuf_type; /* forward declare type */ static void -stolenbuf_dealloc(stolenbufobject *self) -{ +stolenbuf_dealloc(stolenbufobject *self) { Py_DECREF(self->invalid_bytes); PyObject_Del(self); } static int -stolenbuf_getbuffer(stolenbufobject *self, Py_buffer *view, int flags) -{ +stolenbuf_getbuffer(stolenbufobject *self, Py_buffer *view, int flags) { return PyBuffer_FillInfo(view, (PyObject*) self, (void*) PyString_AS_STRING(self->invalid_bytes), @@ -51,8 +58,8 @@ stolenbuf_getbuffer(stolenbufobject *self, Py_buffer *view, int flags) #if COMPILING_IN_PY2 static Py_ssize_t -stolenbuf_getreadwritebuf(stolenbufobject *self, Py_ssize_t segment, void **out) -{ +stolenbuf_getreadwritebuf(stolenbufobject *self, + Py_ssize_t segment, void **out) { if (segment != 0) { PyErr_SetString(PyExc_SystemError, "accessing non-existent string segment"); @@ -63,8 +70,7 @@ stolenbuf_getreadwritebuf(stolenbufobject *self, Py_ssize_t segment, void **out) } static Py_ssize_t -stolenbuf_getsegcount(stolenbufobject *self, Py_ssize_t *len) -{ +stolenbuf_getsegcount(stolenbufobject *self, Py_ssize_t *len) { if (len) { *len = PyString_GET_SIZE(self->invalid_bytes); } @@ -157,8 +163,7 @@ PyDoc_STRVAR( however, if called through *unpacking like ``stolenbuf(*(a,))`` it would only have the one reference (the tuple). */ static PyObject* -move_into_mutable_buffer(PyObject *self, PyObject *bytes_rvalue) -{ +move_into_mutable_buffer(PyObject *self, PyObject *bytes_rvalue) { stolenbufobject *ret; if (!PyString_CheckExact(bytes_rvalue)) { diff --git a/pandas/util/testing.py b/pandas/util/testing.py index f441dd20f3982..387f402348513 100644 --- a/pandas/util/testing.py +++ b/pandas/util/testing.py @@ -1,5 +1,6 @@ from __future__ import division +from collections import Counter from contextlib import contextmanager from datetime import datetime from functools import wraps @@ -20,8 +21,8 @@ from pandas._libs import testing as _testing import pandas.compat as compat from pandas.compat import ( - PY2, PY3, Counter, callable, filter, httplib, lmap, lrange, lzip, map, - raise_with_traceback, range, string_types, u, unichr, zip) + PY2, PY3, filter, httplib, lmap, lrange, lzip, map, raise_with_traceback, + range, string_types, u, unichr, zip) from pandas.core.dtypes.common import ( is_bool, is_categorical_dtype, is_datetime64_dtype, is_datetime64tz_dtype, @@ -33,7 +34,7 @@ import pandas as pd from pandas import ( Categorical, CategoricalIndex, DataFrame, DatetimeIndex, Index, - IntervalIndex, MultiIndex, Panel, RangeIndex, Series, bdate_range) + IntervalIndex, MultiIndex, RangeIndex, Series, bdate_range) from pandas.core.algorithms import take_1d from pandas.core.arrays import ( DatetimeArray, ExtensionArray, IntervalArray, PeriodArray, TimedeltaArray, @@ -2051,22 +2052,6 @@ def makePeriodFrame(nper=None): return DataFrame(data) -def makePanel(nper=None): - with warnings.catch_warnings(record=True): - warnings.filterwarnings("ignore", "\\nPanel", FutureWarning) - cols = ['Item' + c for c 
in string.ascii_uppercase[:K - 1]] - data = {c: makePeriodFrame(nper) for c in cols} - return Panel.fromDict(data) - - def makeCustomIndex(nentries, nlevels, prefix='#', names=False, ndupe_l=None, idx_type=None): """Create an index/multindex with given dimensions, levels, names, etc' @@ -2314,15 +2299,6 @@ def makeMissingDataframe(density=.9, random_state=None): return df -def add_nans(panel): - I, J, N = panel.shape - for i, item in enumerate(panel.items): - dm = panel[item] - for j, col in enumerate(dm.columns): - dm[col][:i + j] = np.NaN - return panel - - class TestSubDict(dict): def __init__(self, *args, **kwargs): diff --git a/scripts/validate_docstrings.py b/scripts/validate_docstrings.py index 4e389aed2b0d2..bce33f7e78daa 100755 --- a/scripts/validate_docstrings.py +++ b/scripts/validate_docstrings.py @@ -796,7 +796,8 @@ def validate_all(prefix, ignore_deprecated=False): seen = {} # functions from the API docs - api_doc_fnames = os.path.join(BASE_PATH, 'doc', 'source', 'api', '*.rst') + api_doc_fnames = os.path.join( + BASE_PATH, 'doc', 'source', 'reference', '*.rst') api_items = [] for api_doc_fname in glob.glob(api_doc_fnames): with open(api_doc_fname) as f: diff --git a/setup.cfg b/setup.cfg index 95c71826a80d4..b15c3ce8a110a 100644 --- a/setup.cfg +++ b/setup.cfg @@ -46,8 +46,8 @@ ignore = E402, # module level import not at top of file E711, # comparison to none should be 'if cond is none:' exclude = - doc/source/basics.rst - doc/source/contributing_docstring.rst + doc/source/getting_started/basics.rst + doc/source/development/contributing_docstring.rst [yapf] @@ -114,7 +114,6 @@ force_sort_within_sections=True skip= pandas/core/api.py, pandas/core/frame.py, - asv_bench/benchmarks/algorithms.py, asv_bench/benchmarks/attrs_caching.py, asv_bench/benchmarks/binary_ops.py, asv_bench/benchmarks/categoricals.py, diff --git a/setup.py b/setup.py index ed2d905f4358b..c8d29a2e4be5a 100755 --- a/setup.py +++ b/setup.py @@ -450,13 +450,19 @@ def run(self): # Note: if not using `cythonize`, coverage can be enabled by # pinning `ext.cython_directives = directives` to each ext in extensions. # github.com/cython/cython/wiki/enhancements-compilerdirectives#in-setuppy -directives = {'linetrace': False} +directives = {'linetrace': False, + 'language_level': 2} macros = [] if linetrace: # https://pypkg.com/pypi/pytest-cython/f/tests/example-project/setup.py directives['linetrace'] = True macros = [('CYTHON_TRACE', '1'), ('CYTHON_TRACE_NOGIL', '1')] +# in numpy>=1.16.0, silence build warnings about deprecated API usage +# we can't do anything about these warnings because they stem from +# cython+numpy version mismatches. +macros.append(('NPY_NO_DEPRECATED_API', '0')) + # ---------------------------------------------------------------------- # Specification of Dependencies