Merge remote-tracking branch 'upstream/master' into bug/categorical-indexing-1row-df

* upstream/master: (194 commits)
  DOC Remove Python 2 specific comments from documentation (pandas-dev#31198)
  Follow up PR: pandas-dev#28097 Simplify branch statement (pandas-dev#29243)
  BUG: DatetimeIndex.snap incorrectly setting freq (pandas-dev#31188)
  Move DataFrame.info() to live with similar functions (pandas-dev#31317)
  ENH: accept a dictionary in plot colors (pandas-dev#31071)
  PERF: add shortcut to Timestamp constructor (pandas-dev#30676)
  CLN/MAINT: Clean and annotate stata reader and writers (pandas-dev#31072)
  REF: define _get_slice_axis in correct classes (pandas-dev#31304)
  BUG: DataFrame.floordiv(ser, axis=0) not matching column-wise bheavior (pandas-dev#31271)
  PERF: optimize is_scalar, is_iterator (pandas-dev#31294)
  BUG: Series rolling count ignores min_periods (pandas-dev#30923)
  xfail sparse warning; closes pandas-dev#31310 (pandas-dev#31311)
  REF: DatetimeIndex.get_value wrap DTI.get_loc (pandas-dev#31314)
  CLN: internals.managers (pandas-dev#31316)
  PERF: avoid copies if possible in fill_binop (pandas-dev#31300)
  Add test for multiindex json (pandas-dev#31307)
  BUG: passing TDA and wrong freq to TimedeltaIndex (pandas-dev#31268)
  BUG: inconsistency between PeriodIndex.get_value vs get_loc (pandas-dev#31172)
  CLN: remove _set_subtyp (pandas-dev#31301)
  CI: Updated version of macos image (pandas-dev#31292)
  ...
keechongtan · Jan 27, 2020 · 41e6ce4
2 parents 241bd7c + ca3bfcc
commit 41e6ce4

Showing 403 changed files with 8,078 additions and 12,046 deletions.
diff --git a/.devcontainer.json b/.devcontainer.json
@@ -0,0 +1,28 @@
+// For format details, see https://aka.ms/vscode-remote/devcontainer.json or the definition README at
+// https://github.com/microsoft/vscode-dev-containers/tree/master/containers/python-3-miniconda
+{
+	"name": "pandas",
+	"context": ".",
+	"dockerFile": "Dockerfile",
+
+	// Use 'settings' to set *default* container specific settings.json values on container create.
+	// You can edit these settings after create using File > Preferences > Settings > Remote.
+	"settings": {
+		"terminal.integrated.shell.linux": "/bin/bash",
+		"python.condaPath": "/opt/conda/bin/conda",
+		"python.pythonPath": "/opt/conda/bin/python",
+		"python.formatting.provider": "black",
+		"python.linting.enabled": true,
+		"python.linting.flake8Enabled": true,
+		"python.linting.pylintEnabled": false,
+		"python.linting.mypyEnabled": true,
+		"python.testing.pytestEnabled": true,
+		"python.testing.cwd": "pandas/tests"
+	},
+
+	// Add the IDs of extensions you want installed when the container is created in the array below.
+	"extensions": [
+		"ms-python.python",
+		"ms-vscode.cpptools"
+	]
+}
diff --git a/.github/CODE_OF_CONDUCT.md b/.github/CODE_OF_CONDUCT.md
@@ -54,10 +54,10 @@ incident.
 
 This Code of Conduct is adapted from the [Contributor Covenant][homepage],
 version 1.3.0, available at
-[http://contributor-covenant.org/version/1/3/0/][version],
+[https://www.contributor-covenant.org/version/1/3/0/][version],
 and the [Swift Code of Conduct][swift].
 
-[homepage]: http://contributor-covenant.org
-[version]: http://contributor-covenant.org/version/1/3/0/
+[homepage]: https://www.contributor-covenant.org
+[version]: https://www.contributor-covenant.org/version/1/3/0/
 [swift]: https://swift.org/community/#code-of-conduct
 
diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
@@ -16,7 +16,7 @@ If you notice a bug in the code or documentation, or have suggestions for how we
 
 ## Contributing to the Codebase
 
-The code is hosted on [GitHub](https://www.github.com/pandas-dev/pandas), so you will need to use [Git](http://git-scm.com/) to clone the project and make changes to the codebase. Once you have obtained a copy of the code, you should create a development environment that is separate from your existing Python environment so that you can make and test changes without compromising your own work environment. For more information, please refer to the "[Working with the code](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#working-with-the-code)" section.
+The code is hosted on [GitHub](https://www.github.com/pandas-dev/pandas), so you will need to use [Git](https://git-scm.com/) to clone the project and make changes to the codebase. Once you have obtained a copy of the code, you should create a development environment that is separate from your existing Python environment so that you can make and test changes without compromising your own work environment. For more information, please refer to the "[Working with the code](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#working-with-the-code)" section.
 
 Before submitting your changes for review, make sure to check that your changes do not break any tests. You can find more information about our test suites in the "[Test-driven development/code writing](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#test-driven-development-code-writing)" section. We also have guidelines regarding coding style that will be enforced during testing, which can be found in the "[Code standards](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#code-standards)" section.
 

diff --git a/.github/workflows/assign.yml b/.github/workflows/assign.yml
@@ -7,9 +7,8 @@ jobs:
   one:
     runs-on: ubuntu-latest
     steps:
-      - name:
-        run: |
-            if [[ "${{ github.event.comment.body }}" == "take" ]]; then
-                echo "Assigning issue ${{ github.event.issue.number }} to ${{ github.event.comment.user.login }}"
-                curl -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" -d '{"assignees": ["${{ github.event.comment.user.login }}"]}' https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.issue.number }}/assignees
-            fi
+    - if: github.event.comment.body == 'take'
+      name:
+      run: |
+        echo "Assigning issue ${{ github.event.issue.number }} to ${{ github.event.comment.user.login }}"
+        curl -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" -d '{"assignees": ["${{ github.event.comment.user.login }}"]}' https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.issue.number }}/assignees
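The workflow above assigns an issue to whoever comments exactly `take`, by POSTing a JSON body to the GitHub REST endpoint `/repos/{owner}/{repo}/issues/{number}/assignees`. A minimal Python sketch of the same logic, assuming only the standard library's `urllib.request`; `build_assign_request`, the login, and the issue number are hypothetical, and the request is built but never sent.

```python
import json
import urllib.request
from typing import Optional


def build_assign_request(
    comment_body: str, login: str, repo: str, issue_number: int, token: str
) -> Optional[urllib.request.Request]:
    """Hypothetical helper: return the assignment request, or None if the comment is not 'take'."""
    # Mirrors `if: github.event.comment.body == 'take'`: an exact, case-sensitive match.
    if comment_body != "take":
        return None
    url = f"https://api.github.com/repos/{repo}/issues/{issue_number}/assignees"
    payload = json.dumps({"assignees": [login]}).encode("utf-8")
    # Sending a body makes this a POST, like `curl -d` in the workflow.
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Authorization": f"token {token}"},
        method="POST",
    )


# Hypothetical values for illustration only.
req = build_assign_request("take", "example-user", "pandas-dev/pandas", 12345, "dummy")
```

Moving the check into the step's `if:` condition means the step is skipped entirely for other comments, instead of starting a shell just to evaluate the comparison.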
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -20,11 +20,11 @@ repos:
     rev: v0.730
     hooks:
      -  id: mypy
-        # We run mypy over all files because of:
-        #  * changes in type definitions may affect non-touched files.
-        #  * Running it with `mypy pandas` and the filenames will lead to
-        #    spurious duplicate module errors,
-        #    see also https://github.com/pre-commit/mirrors-mypy/issues/5
-        pass_filenames: false
         args:
-        - pandas
+          # As long as some files are excluded from check-untyped-defs
+          # we have to exclude it from the pre-commit hook as the configuration
+          # is based on modules but the hook runs on files.
+          - --no-check-untyped-defs
+          - --follow-imports
+          - skip
+        files: pandas/
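Because `pass_filenames: false` was removed, pre-commit now appends the matching staged filenames to the hook's `args`, and `files: pandas/` acts as a regular expression searched against each path. A rough sketch of the resulting command line, assuming the mirrors-mypy hook's entry is `mypy` and approximating its Python file-type filter with a suffix check; `build_mypy_command` is hypothetical and not part of pre-commit, which may also split long file lists across several invocations.

```python
import re
from typing import List

# Args copied from the hook configuration above.
MYPY_ARGS = ["--no-check-untyped-defs", "--follow-imports", "skip"]
# `files:` is a regex search, not an anchored prefix match.
FILES_PATTERN = re.compile(r"pandas/")


def build_mypy_command(staged: List[str]) -> List[str]:
    """Hypothetical helper: the mypy invocation for a set of staged paths."""
    selected = [
        path
        for path in staged
        if FILES_PATTERN.search(path) and path.endswith((".py", ".pyi"))
    ]
    # pre-commit skips a hook when no files match, so there is nothing to run.
    if not selected:
        return []
    # Entry first, then the configured args, then the filenames.
    return ["mypy", *MYPY_ARGS, *selected]


cmd = build_mypy_command(["pandas/core/frame.py", "setup.py", "doc/source/conf.py"])
```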
diff --git a/.travis.yml b/.travis.yml
@@ -7,10 +7,10 @@ python: 3.7
 # travis cache --delete inside the project directory from the travis command line client
 # The cache directories will be deleted if anything in ci/ changes in a commit
 cache:
- ccache: true
- directories:
-  - $HOME/.cache # cython cache
-  - $HOME/.ccache # compiler cache
+  ccache: true
+  directories:
+    - $HOME/.cache # cython cache
+    - $HOME/.ccache # compiler cache
 
 env:
   global:
@@ -20,30 +20,30 @@ env:
     - secure: "EkWLZhbrp/mXJOx38CHjs7BnjXafsqHtwxPQrqWy457VDFWhIY1DMnIR/lOWG+a20Qv52sCsFtiZEmMfUjf0pLGXOqurdxbYBGJ7/ikFLk9yV2rDwiArUlVM9bWFnFxHvdz9zewBH55WurrY4ShZWyV+x2dWjjceWG5VpWeI6sA="
 
 git:
-    # for cloning
-    depth: false
+  # for cloning
+  depth: false
 
 matrix:
-    fast_finish: true
-    exclude:
-      # Exclude the default Python 3.5 build
-      - python: 3.5
+  fast_finish: true
 
-    include:
+  include:
     - env:
-        - JOB="3.8" ENV_FILE="ci/deps/travis-38.yaml" PATTERN="(not slow and not network)"
+        - JOB="3.8" ENV_FILE="ci/deps/travis-38.yaml" PATTERN="(not slow and not network and not clipboard)"
 
     - env:
-        - JOB="3.7" ENV_FILE="ci/deps/travis-37.yaml" PATTERN="(not slow and not network)"
+        - JOB="3.7" ENV_FILE="ci/deps/travis-37.yaml" PATTERN="(not slow and not network and not clipboard)"
 
     - env:
-        - JOB="3.6, locale" ENV_FILE="ci/deps/travis-36-locale.yaml" PATTERN="((not slow and not network) or (single and db))" LOCALE_OVERRIDE="zh_CN.UTF-8" SQL="1"
+        - JOB="3.6, locale" ENV_FILE="ci/deps/travis-36-locale.yaml" PATTERN="((not slow and not network and not clipboard) or (single and db))" LOCALE_OVERRIDE="zh_CN.UTF-8" SQL="1"
       services:
         - mysql
         - postgresql
 
     - env:
-        - JOB="3.6, coverage" ENV_FILE="ci/deps/travis-36-cov.yaml" PATTERN="((not slow and not network) or (single and db))" PANDAS_TESTING_MODE="deprecate" COVERAGE=true SQL="1"
+        # Enabling Deprecations when running tests
+        # PANDAS_TESTING_MODE="deprecate" causes DeprecationWarning messages to be displayed in the logs
+        # See pandas/_testing.py for more details.
+        - JOB="3.6, coverage" ENV_FILE="ci/deps/travis-36-cov.yaml" PATTERN="((not slow and not network and not clipboard) or (single and db))" PANDAS_TESTING_MODE="deprecate" COVERAGE=true SQL="1"
       services:
         - mysql
         - postgresql
@@ -73,7 +73,6 @@ before_install:
   # This overrides travis and tells it to look nowhere.
   - export BOTO_CONFIG=/dev/null
 
-
 install:
   - echo "install start"
   - ci/prep_cython_cache.sh
@@ -90,5 +89,5 @@ script:
 after_script:
   - echo "after_script start"
   - source activate pandas-dev && pushd /tmp && python -c "import pandas; pandas.show_versions();" && popd
-  - ci/print_skipped.py 
+  - ci/print_skipped.py
   - echo "after_script done"
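Each job's `PATTERN` is a pytest marker expression (presumably passed to `pytest -m`; the script that consumes it is not part of this diff), so appending `and not clipboard` deselects clipboard-marked tests in every job unless the `(single and db)` branch matches. A small sketch of how such an expression selects tests, assuming a marker name is true exactly when the test carries it; it uses `eval` on trusted strings purely for illustration and is not pytest's own parser.

```python
from typing import Set


class _Markers(dict):
    # Any marker name the test does not carry evaluates to False.
    def __missing__(self, key: str) -> bool:
        return False


def selects(pattern: str, markers: Set[str]) -> bool:
    """Return True if a test carrying `markers` is selected by `pattern`."""
    namespace = _Markers({name: True for name in markers})
    # Trusted input only; no builtins are exposed to the expression.
    return bool(eval(pattern, {"__builtins__": {}}, namespace))


OLD = "((not slow and not network) or (single and db))"
NEW = "((not slow and not network and not clipboard) or (single and db))"
```

For example, a test marked only `clipboard` is selected by `OLD` but not by `NEW`, while a test marked `single`, `db`, and `clipboard` is still selected by `NEW` through its second branch.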
diff --git a/AUTHORS.md b/AUTHORS.md
@@ -14,7 +14,7 @@ About the Copyright Holders
     The PyData Development Team is the collection of developers of the PyData
     project. This includes all of the PyData sub-projects, including pandas. The
     core team that coordinates development on GitHub can be found here:
-    http://github.com/pydata.
+    https://github.com/pydata.
 
 Full credits for pandas contributors can be found in the documentation.
 

diff --git a/Dockerfile b/Dockerfile
@@ -0,0 +1,47 @@
+FROM continuumio/miniconda3
+
+# if you forked pandas, you can pass in your own GitHub username to use your fork
+# i.e. gh_username=myname
+ARG gh_username=pandas-dev
+ARG pandas_home="/home/pandas"
+
+# Avoid warnings by switching to noninteractive
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Configure apt and install packages
+RUN apt-get update \
+    && apt-get -y install --no-install-recommends apt-utils dialog 2>&1 \
+    #
+    # Verify git, process tools, lsb-release (common in install instructions for CLIs) installed
+    && apt-get -y install git iproute2 procps iproute2 lsb-release \
+    #
+    # Install C compilers (gcc not enough, so just went with build-essential which admittedly might be overkill),
+    # needed to build pandas C extensions
+    && apt-get -y install build-essential \
+    #
+    # cleanup
+    && apt-get autoremove -y \
+    && apt-get clean -y \
+    && rm -rf /var/lib/apt/lists/*
+
+# Switch back to dialog for any ad-hoc use of apt-get
+ENV DEBIAN_FRONTEND=dialog
+
+# Clone pandas repo
+RUN mkdir "$pandas_home" \
+    && git clone "https://github.com/$gh_username/pandas.git" "$pandas_home" \
+    && cd "$pandas_home" \
+    && git remote add upstream "https://github.com/pandas-dev/pandas.git" \
+    && git pull upstream master
+
+# Because it is surprisingly difficult to activate a conda environment inside a Dockerfile
+# (from personal experience and per https://github.com/ContinuumIO/docker-images/issues/89),
+# we just update the base/root one from the 'environment.yml' file instead of creating a new one.
+#
+# Set up environment
+RUN conda env update -n base -f "$pandas_home/environment.yml"
+
+# Build C extensions and pandas
+RUN cd "$pandas_home" \
+    && python setup.py build_ext --inplace -j 4 \
+    && python -m pip install -e .
diff --git a/LICENSE b/LICENSE
@@ -1,8 +1,10 @@
 BSD 3-Clause License
 
-Copyright (c) 2008-2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
+Copyright (c) 2008-2011, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
 All rights reserved.
 
+Copyright (c) 2011-2020, Open source contributors.
+
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:
 

diff --git a/RELEASE.md b/RELEASE.md
@@ -3,4 +3,4 @@ Release Notes
 
 The list of changes to Pandas between each release can be found
 [here](https://pandas.pydata.org/pandas-docs/stable/whatsnew/index.html). For full
-details, see the commit logs at http://github.com/pandas-dev/pandas.
+details, see the commit logs at https://github.com/pandas-dev/pandas.
diff --git a/asv_bench/asv.conf.json b/asv_bench/asv.conf.json
@@ -43,6 +43,7 @@
         "matplotlib": [],
         "sqlalchemy": [],
         "scipy": [],
+        "numba": [],
         "numexpr": [],
         "pytables": [null, ""],  // platform dependent, see excludes below
         "tables": [null, ""],

diff --git a/asv_bench/benchmarks/attrs_caching.py b/asv_bench/benchmarks/attrs_caching.py
@@ -1,12 +1,18 @@
 import numpy as np
 
+import pandas as pd
 from pandas import DataFrame
 
 try:
     from pandas.util import cache_readonly
 except ImportError:
     from pandas.util.decorators import cache_readonly
 
+try:
+    from pandas.core.construction import extract_array
+except ImportError:
+    extract_array = None
+
 
 class DataFrameAttributes:
     def setup(self):
@@ -20,6 +26,33 @@ def time_set_index(self):
         self.df.index = self.cur_index
 
 
+class SeriesArrayAttribute:
+
+    params = [["numeric", "object", "category", "datetime64", "datetime64tz"]]
+    param_names = ["dtype"]
+
+    def setup(self, dtype):
+        if dtype == "numeric":
+            self.series = pd.Series([1, 2, 3])
+        elif dtype == "object":
+            self.series = pd.Series(["a", "b", "c"], dtype=object)
+        elif dtype == "category":
+            self.series = pd.Series(["a", "b", "c"], dtype="category")
+        elif dtype == "datetime64":
+            self.series = pd.Series(pd.date_range("2013", periods=3))
+        elif dtype == "datetime64tz":
+            self.series = pd.Series(pd.date_range("2013", periods=3, tz="UTC"))
+
+    def time_array(self, dtype):
+        self.series.array
+
+    def time_extract_array(self, dtype):
+        extract_array(self.series)
+
+    def time_extract_array_numpy(self, dtype):
+        extract_array(self.series, extract_numpy=True)
+
+
 class CacheReadonly:
     def setup(self):
         class Foo:
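The new `SeriesArrayAttribute` benchmark times `Series.array` and `extract_array` across several dtypes. A short sketch of what those calls return, assuming pandas 1.0 or later; `extract_array` lives in the private module `pandas.core.construction`, so its location and exact behaviour may change between versions.

```python
import numpy as np
import pandas as pd
from pandas.core.construction import extract_array  # private helper, may move

numeric = pd.Series([1, 2, 3])
category = pd.Series(["a", "b", "c"], dtype="category")
dt = pd.Series(pd.date_range("2013", periods=3))
dt_tz = pd.Series(pd.date_range("2013", periods=3, tz="UTC"))

# `.array` returns an ExtensionArray, never a bare ndarray.
numeric_array = numeric.array
category_array = category.array  # a Categorical
tz_array = dt_tz.array  # a DatetimeArray with tz-aware values

# extract_numpy=True unwraps numpy-backed data to a plain ndarray...
numeric_raw = extract_array(numeric, extract_numpy=True)
# ...but leaves genuine extension arrays, such as datetimes, as they are.
dt_raw = extract_array(dt, extract_numpy=True)
```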

diff --git a/asv_bench/benchmarks/pandas_vb_common.py b/asv_bench/benchmarks/pandas_vb_common.py
@@ -56,7 +56,7 @@
 def setup(*args, **kwargs):
     # This function just needs to be imported into each benchmark file to
     # set up the random seed before each function.
-    # http://asv.readthedocs.io/en/latest/writing_benchmarks.html
+    # https://asv.readthedocs.io/en/latest/writing_benchmarks.html
     np.random.seed(1234)
 
 

diff --git a/asv_bench/benchmarks/reshape.py b/asv_bench/benchmarks/reshape.py
@@ -161,6 +161,9 @@ def time_pivot_table_categorical_observed(self):
             observed=True,
         )
 
+    def time_pivot_table_margins_only_column(self):
+        self.df.pivot_table(columns=["key2", "key3"], margins=True)
+
 
 class Crosstab:
     def setup(self):

diff --git a/asv_bench/benchmarks/rolling.py b/asv_bench/benchmarks/rolling.py
@@ -44,6 +44,27 @@ def time_rolling(self, constructor, window, dtype, function, raw):
         self.roll.apply(function, raw=raw)
 
 
+class Engine:
+    params = (
+        ["DataFrame", "Series"],
+        ["int", "float"],
+        [np.sum, lambda x: np.sum(x) + 5],
+        ["cython", "numba"],
+    )
+    param_names = ["constructor", "dtype", "function", "engine"]
+
+    def setup(self, constructor, dtype, function, engine):
+        N = 10 ** 3
+        arr = (100 * np.random.random(N)).astype(dtype)
+        self.data = getattr(pd, constructor)(arr)
+
+    def time_rolling_apply(self, constructor, dtype, function, engine):
+        self.data.rolling(10).apply(function, raw=True, engine=engine)
+
+    def time_expanding_apply(self, constructor, dtype, function, engine):
+        self.data.expanding().apply(function, raw=True, engine=engine)
+
+
 class ExpandingMethods:
 
     params = (
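The `Engine` benchmark compares the `cython` and `numba` engines for `rolling(...).apply` and `expanding().apply`, always with `raw=True`, which the numba engine requires. A minimal check, assuming pandas 1.0 or later, that the cython engine reproduces the built-in rolling sum; the numba engine is left out so the example runs without numba installed.

```python
import numpy as np
import pandas as pd

np.random.seed(1234)
N = 10 ** 3
data = pd.Series((100 * np.random.random(N)).astype("float"))

# Same call shape as the benchmark, pinned to the cython engine.
applied = data.rolling(10).apply(np.sum, raw=True, engine="cython")
builtin = data.rolling(10).sum()

# Expanding windows accept the same engine argument.
expanded = data.expanding().apply(lambda x: np.sum(x) + 5, raw=True, engine="cython")
```

Switching `engine="cython"` to `engine="numba"` should give the same values when numba is available; the first call is slower because numba compiles the function.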

diff --git a/asv_bench/benchmarks/tslibs/timedelta.py b/asv_bench/benchmarks/tslibs/timedelta.py
@@ -10,6 +10,11 @@
 
 
 class TimedeltaConstructor:
+    def setup(self):
+        self.nptimedelta64 = np.timedelta64(3600)
+        self.dttimedelta = datetime.timedelta(seconds=3600)
+        self.td = Timedelta(3600, unit="s")
+
     def time_from_int(self):
         Timedelta(123456789)
 
@@ -28,10 +33,10 @@ def time_from_components(self):
         )
 
     def time_from_datetime_timedelta(self):
-        Timedelta(datetime.timedelta(days=1, seconds=1))
+        Timedelta(self.dttimedelta)
 
     def time_from_np_timedelta(self):
-        Timedelta(np.timedelta64(1, "ms"))
+        Timedelta(self.nptimedelta64)
 
     def time_from_string(self):
         Timedelta("1 days")
@@ -42,6 +47,9 @@ def time_from_iso_format(self):
     def time_from_missing(self):
         Timedelta("nat")
 
+    def time_from_pd_timedelta(self):
+        Timedelta(self.td)
+
 
 class TimedeltaProperties:
     def setup_cache(self):
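The `TimedeltaConstructor` changes build the input objects in `setup`, so each `time_from_*` method measures only the `Timedelta` constructor, and `time_from_pd_timedelta` covers passing an existing `Timedelta`. A quick sketch showing that these inputs describe the same duration, assuming a current pandas. It uses `np.timedelta64(3600, "s")` with an explicit unit rather than the benchmark's unit-less `np.timedelta64(3600)`, whose interpretation is not spelled out here.

```python
import datetime

import numpy as np
from pandas import Timedelta

one_hour = Timedelta(3600, unit="s")

# Each constructor input below describes the same one-hour duration.
from_stdlib = Timedelta(datetime.timedelta(seconds=3600))
from_numpy = Timedelta(np.timedelta64(3600, "s"))
from_string = Timedelta("1 hours")
from_timedelta = Timedelta(one_hour)  # the path exercised by time_from_pd_timedelta
```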