Merge branch 'master' into fix-25587

* master: (22 commits)
  Fixturize tests/frame/test_operators.py (pandas-dev#25641)
  Update ValueError message in corr (pandas-dev#25729)
  DOC: fix some grammar and inconsistency issues in the User Guide (pandas-dev#25728)
  ENH: Add public start, stop, and step attributes to RangeIndex (pandas-dev#25720)
  Make Rolling.apply documentation clearer (pandas-dev#25712)
  pandas-dev#25707 - Fixed flakiness in stata write test (pandas-dev#25714)
  Json normalize nan support (pandas-dev#25619)
  TST: resolve issues with test_constructor_dtype_datetime64 (pandas-dev#24868)
  DEPR: Deprecate box kwarg for to_timedelta and to_datetime (pandas-dev#24486)
  BUG: Preserve name in DatetimeIndex.snap (pandas-dev#25585)
  Fix concat not respecting order of OrderedDict (pandas-dev#25224)
  CLN: remove pandas.core.categorical (pandas-dev#25655)
  TST/CLN: Remove more Panel tests (pandas-dev#25675)
  Pinned pycodestyle (pandas-dev#25701)
  DOC: update date of 0.24.2 release notes (pandas-dev#25699)
  BUG: Fix error in replace with strings that are large numbers (pandas-dev#25616) (pandas-dev#25644)
  BUG: fix usage of na_sentinel with sort=True in factorize() (pandas-dev#25592)
  BUG: Fix to_string output when using header (pandas-dev#16718) (pandas-dev#25602)
  CLN: Remove unused test code (pandas-dev#25670)
  CLN: remove Panel from concat error message (pandas-dev#25676)
  ...

# Conflicts:
#	doc/source/whatsnew/v0.25.0.rst
sighingnow · Mar 14, 2019 · 4095c8c
2 parents aafd214 + 998e1de

Showing 51 changed files with 583 additions and 631 deletions.
diff --git a/doc/source/user_guide/text.rst b/doc/source/user_guide/text.rst
@@ -46,8 +46,8 @@ Since ``df.columns`` is an Index object, we can use the ``.str`` accessor
    df.columns.str.lower()
 
 These string methods can then be used to clean up the columns as needed.
-Here we are removing leading and trailing white spaces, lower casing all names,
-and replacing any remaining white spaces with underscores:
+Here we are removing leading and trailing whitespaces, lower casing all names,
+and replacing any remaining whitespaces with underscores:
 
 .. ipython:: python
 
@@ -65,7 +65,7 @@ and replacing any remaining white spaces with underscores:
     ``Series``.
 
     Please note that a ``Series`` of type ``category`` with string ``.categories`` has
-    some limitations in comparison of ``Series`` of type string (e.g. you can't add strings to
+    some limitations in comparison to ``Series`` of type string (e.g. you can't add strings to
     each other: ``s + " " + s`` won't work if ``s`` is a ``Series`` of type ``category``). Also,
     ``.str`` methods which operate on elements of type ``list`` are not available on such a
     ``Series``.

diff --git a/doc/source/whatsnew/v0.24.2.rst b/doc/source/whatsnew/v0.24.2.rst
@@ -2,8 +2,8 @@
 
 .. _whatsnew_0242:
 
-Whats New in 0.24.2 (February XX, 2019)
----------------------------------------
+Whats New in 0.24.2 (March 12, 2019)
+------------------------------------
 
 .. warning::
 
@@ -18,7 +18,7 @@ including other versions of pandas.
 .. _whatsnew_0242.regressions:
 
 Fixed Regressions
-^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~
 
 - Fixed regression in :meth:`DataFrame.all` and :meth:`DataFrame.any` where ``bool_only=True`` was ignored (:issue:`25101`)
 - Fixed issue in ``DataFrame`` construction with passing a mixed list of mixed types could segfault. (:issue:`25075`)
@@ -31,71 +31,32 @@ Fixed Regressions
 - Fixed regression in ``IntervalDtype`` construction where passing an incorrect string with 'Interval' as a prefix could result in a ``RecursionError``. (:issue:`25338`)
 - Fixed regression in creating a period-dtype array from a read-only NumPy array of period objects. (:issue:`25403`)
 - Fixed regression in :class:`Categorical`, where constructing it from a categorical ``Series`` and an explicit ``categories=`` that differed from that in the ``Series`` created an invalid object which could trigger segfaults. (:issue:`25318`)
+- Fixed regression in :func:`to_timedelta` losing precision when converting floating data to ``Timedelta`` data (:issue:`25077`).
 - Fixed pip installing from source into an environment without NumPy (:issue:`25193`)
+- Fixed regression in :meth:`DataFrame.replace` where large strings of numbers would be coerced into ``int64``, causing an ``OverflowError`` (:issue:`25616`)
+- Fixed regression in :func:`factorize` when passing a custom ``na_sentinel`` value with ``sort=True`` (:issue:`25409`).
 - Fixed regression in :meth:`DataFrame.to_csv` writing duplicate line endings with gzip compress (:issue:`25311`)
 
-.. _whatsnew_0242.enhancements:
-
-Enhancements
-^^^^^^^^^^^^
-
--
--
-
 .. _whatsnew_0242.bug_fixes:
 
 Bug Fixes
 ~~~~~~~~~
 
-**Conversion**
-
--
--
--
-
-**Indexing**
-
--
--
--
-
 **I/O**
 
 - Better handling of terminal printing when the terminal dimensions are not known (:issue:`25080`)
 - Bug in reading a HDF5 table-format ``DataFrame`` created in Python 2, in Python 3 (:issue:`24925`)
 - Bug in reading a JSON with ``orient='table'`` generated by :meth:`DataFrame.to_json` with ``index=False`` (:issue:`25170`)
 - Bug where float indexes could have misaligned values when printing (:issue:`25061`)
--
-
-**Categorical**
-
--
--
--
-
-**Timezones**
-
--
--
--
-
-**Timedelta**
-
--
--
--
 
 **Reshaping**
 
 - Bug in :meth:`~pandas.core.groupby.GroupBy.transform` where applying a function to a timezone aware column would return a timezone naive result (:issue:`24198`)
 - Bug in :func:`DataFrame.join` when joining on a timezone aware :class:`DatetimeIndex` (:issue:`23931`)
--
 
 **Visualization**
 
 - Bug in :meth:`Series.plot` where a secondary y axis could not be set to log scale (:issue:`25545`)
--
--
 
 **Other**
 
@@ -130,6 +91,7 @@ A total of 25 people contributed patches to this release. People with a "+" by t
 * Joris Van den Bossche
 * Josh
 * Justin Zheng
+* Kendall Masse
 * Matthew Roeschke
 * Max Bolingbroke +
 * rbenes +

diff --git a/doc/source/whatsnew/v0.25.0.rst b/doc/source/whatsnew/v0.25.0.rst
@@ -26,6 +26,7 @@ Other Enhancements
 - :meth:`DataFrame.set_index` now works for instances of ``abc.Iterator``, provided their output is of the same length as the calling frame (:issue:`22484`, :issue:`24984`)
 - :meth:`DatetimeIndex.union` now supports the ``sort`` argument. The behaviour of the sort parameter matches that of :meth:`Index.union` (:issue:`24994`)
 - :meth:`DataFrame.rename` now supports the ``errors`` argument to raise errors when attempting to rename nonexistent keys (:issue:`13473`)
+- :class:`RangeIndex` has gained :attr:`~RangeIndex.start`, :attr:`~RangeIndex.stop`, and :attr:`~RangeIndex.step` attributes (:issue:`25710`)
 
 .. _whatsnew_0250.api_breaking:
 
@@ -86,14 +87,15 @@ Other API Changes
 - :class:`DatetimeTZDtype` will now standardize pytz timezones to a common timezone instance (:issue:`24713`)
 - ``Timestamp`` and ``Timedelta`` scalars now implement the :meth:`to_numpy` method as aliases to :meth:`Timestamp.to_datetime64` and :meth:`Timedelta.to_timedelta64`, respectively. (:issue:`24653`)
 - :meth:`Timestamp.strptime` will now rise a ``NotImplementedError`` (:issue:`25016`)
--
+- Bug in :meth:`DatetimeIndex.snap` which did not preserve the ``name`` of the input :class:`Index` (:issue:`25575`)
 
 .. _whatsnew_0250.deprecations:
 
 Deprecations
 ~~~~~~~~~~~~
 
 - Deprecated the `M (months)` and `Y (year)` `units` parameter of :func: `pandas.to_timedelta`, :func: `pandas.Timedelta` and :func: `pandas.TimedeltaIndex` (:issue:`16344`)
+- The functions :func:`pandas.to_datetime` and :func:`pandas.to_timedelta` have deprecated the ``box`` keyword. Instead, use :meth:`to_numpy` or :meth:`Timestamp.to_datetime64`/:meth:`Timedelta.to_timedelta64`. (:issue:`24416`)
 
 .. _whatsnew_0250.prior_deprecations:
 
@@ -122,7 +124,7 @@ Bug Fixes
 ~~~~~~~~~
 - Bug in :func:`to_datetime` which would raise an (incorrect) ``ValueError`` when called with a date far into the future and the ``format`` argument specified instead of raising ``OutOfBoundsDatetime`` (:issue:`23830`)
 - Bug in an error message in :meth:`DataFrame.plot`. Improved the error message if non-numerics are passed to :meth:`DataFrame.plot` (:issue:`25481`)
-- Fixed bug where :class:`api.extensions.ExtensionArray` could not be used in matplotlib plotting (:issue:`25587`)
+- Bug in error messages in :meth:`DataFrame.corr` and :meth:`Series.corr`. Added the possibility of using a callable. (:issue:`25729`)
 
 Categorical
 ^^^^^^^^^^^
@@ -214,14 +216,16 @@ I/O
 - Bug in :func:`read_json` for ``orient='table'`` when it tries to infer dtypes by default, which is not applicable as dtypes are already defined in the JSON schema (:issue:`21345`)
 - Bug in :func:`read_json` for ``orient='table'`` and float index, as it infers index dtype by default, which is not applicable because index dtype is already defined in the JSON schema (:issue:`25433`)
 - Bug in :func:`read_json` for ``orient='table'`` and string of float column names, as it makes a column name type conversion to Timestamp, which is not applicable because column names are already defined in the JSON schema (:issue:`25435`)
+- Bug in :func:`json_normalize` for ``errors='ignore'`` where missing values in the input data, were filled in resulting ``DataFrame`` with the string "nan" instead of ``numpy.nan`` (:issue:`25468`)
 - :meth:`DataFrame.to_html` now raises ``TypeError`` when using an invalid type for the ``classes`` parameter instead of ``AsseertionError`` (:issue:`25608`)
--
+- Bug in :meth:`DataFrame.to_string` and :meth:`DataFrame.to_latex` that would lead to incorrect output when the ``header`` keyword is used (:issue:`16718`)
 -
 
 
 Plotting
 ^^^^^^^^
 
+- Fixed bug where :class:`api.extensions.ExtensionArray` could not be used in matplotlib plotting (:issue:`25587`)
 -
 -
 -
@@ -241,6 +245,7 @@ Reshaping
 - Bug in :func:`pandas.merge` adds a string of ``None`` if ``None`` is assigned in suffixes instead of remain the column name as-is (:issue:`24782`).
 - Bug in :func:`merge` when merging by index name would sometimes result in an incorrectly numbered index (:issue:`24212`)
 - :func:`to_records` now accepts dtypes to its `column_dtypes` parameter (:issue:`24895`)
+- Bug in :func:`concat` where order of ``OrderedDict`` (and ``dict`` in Python 3.6+) is not respected, when passed in as  ``objs`` argument (:issue:`21510`)
 
 
 Sparse
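
The ``RangeIndex`` entry in the release notes above can be tried with a short sketch like the following. It assumes a pandas version that already includes this change (0.25.0 or later); the variable names are only illustrative.

```python
import pandas as pd

# Explicit start/stop/step are reported back unchanged.
idx = pd.RangeIndex(start=2, stop=20, step=3)
print(idx.start, idx.stop, idx.step)

# When only a stop is given, start falls back to 0 and step to 1.
default_idx = pd.RangeIndex(5)
print(default_idx.start, default_idx.stop, default_idx.step)
```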

diff --git a/environment.yml b/environment.yml
@@ -19,6 +19,7 @@ dependencies:
   - hypothesis>=3.82
   - isort
   - moto
+  - pycodestyle=2.4
   - pytest>=4.0.2
   - pytest-mock
   - sphinx

diff --git a/pandas/_libs/tslibs/timedeltas.pyx b/pandas/_libs/tslibs/timedeltas.pyx
@@ -246,9 +246,11 @@ def array_to_timedelta64(object[:] values, unit='ns', errors='raise'):
     return iresult.base  # .base to access underlying np.ndarray
 
 
-cdef inline int64_t cast_from_unit(object ts, object unit) except? -1:
-    """ return a casting of the unit represented to nanoseconds
-        round the fractional part of a float to our precision, p """
+cpdef inline object precision_from_unit(object unit):
+    """
+    Return a casting of the unit represented to nanoseconds + the precision
+    to round the fractional part.
+    """
     cdef:
         int64_t m
         int p
@@ -285,6 +287,17 @@ cdef inline int64_t cast_from_unit(object ts, object unit) except? -1:
         p = 0
     else:
         raise ValueError("cannot cast unit {unit}".format(unit=unit))
+    return m, p
+
+
+cdef inline int64_t cast_from_unit(object ts, object unit) except? -1:
+    """ return a casting of the unit represented to nanoseconds
+        round the fractional part of a float to our precision, p """
+    cdef:
+        int64_t m
+        int p
+
+    m, p = precision_from_unit(unit)
 
     # just give me the unit back
     if ts is None:

diff --git a/pandas/core/algorithms.py b/pandas/core/algorithms.py
@@ -619,13 +619,19 @@ def factorize(values, sort=False, order=None, na_sentinel=-1, size_hint=None):
 
     if sort and len(uniques) > 0:
         from pandas.core.sorting import safe_sort
-        try:
-            order = uniques.argsort()
-            order2 = order.argsort()
-            labels = take_1d(order2, labels, fill_value=na_sentinel)
-            uniques = uniques.take(order)
-        except TypeError:
-            # Mixed types, where uniques.argsort fails.
+        if na_sentinel == -1:
+            # GH-25409 take_1d only works for na_sentinels of -1
+            try:
+                order = uniques.argsort()
+                order2 = order.argsort()
+                labels = take_1d(order2, labels, fill_value=na_sentinel)
+                uniques = uniques.take(order)
+            except TypeError:
+                # Mixed types, where uniques.argsort fails.
+                uniques, labels = safe_sort(uniques, labels,
+                                            na_sentinel=na_sentinel,
+                                            assume_unique=True)
+        else:
             uniques, labels = safe_sort(uniques, labels,
                                         na_sentinel=na_sentinel,
                                         assume_unique=True)
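
The reason the fast path above is guarded by ``na_sentinel == -1`` can be illustrated without pandas. The sketch below is not the pandas implementation; ``sort_factorized`` is a hypothetical pure-Python helper that sorts the uniques and remaps the labels while comparing against the sentinel explicitly, which is the behaviour the ``safe_sort`` fallback is relied on for.

```python
def sort_factorized(uniques, labels, na_sentinel=-1):
    """Sort `uniques` and remap `labels` so each label still points at its value.

    Hypothetical helper for illustration only; not part of pandas.
    """
    # order[new] is the old position of the value that lands at position `new`.
    order = sorted(range(len(uniques)), key=lambda i: uniques[i])
    # new_pos[old] is the new position of the value formerly at position `old`.
    new_pos = [0] * len(uniques)
    for new, old in enumerate(order):
        new_pos[old] = new
    sorted_uniques = [uniques[i] for i in order]
    # Check for the sentinel explicitly instead of assuming it is -1.
    new_labels = [
        na_sentinel if label == na_sentinel else new_pos[label]
        for label in labels
    ]
    return sorted_uniques, new_labels


uniques = ["b", "a", "c"]
labels = [0, 1, -10, 2, 0]  # -10 marks a missing value
sorted_uniques, new_labels = sort_factorized(uniques, labels, na_sentinel=-10)
print(sorted_uniques, new_labels)
```

A remapping step that treats only ``-1`` as missing would try to use ``-10`` as a real position, which is the failure mode the new branch avoids.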

diff --git a/pandas/core/arrays/timedeltas.py b/pandas/core/arrays/timedeltas.py
@@ -11,7 +11,7 @@
 from pandas._libs.tslibs import NaT, Timedelta, Timestamp, iNaT
 from pandas._libs.tslibs.fields import get_timedelta_field
 from pandas._libs.tslibs.timedeltas import (
-    array_to_timedelta64, parse_timedelta_unit)
+    array_to_timedelta64, parse_timedelta_unit, precision_from_unit)
 import pandas.compat as compat
 from pandas.util._decorators import Appender
 
@@ -918,12 +918,15 @@ def sequence_to_td64ns(data, copy=False, unit="ns", errors="raise"):
         copy = copy and not copy_made
 
     elif is_float_dtype(data.dtype):
-        # treat as multiples of the given unit.  If after converting to nanos,
-        #  there are fractional components left, these are truncated
-        #  (i.e. NOT rounded)
+        # cast the unit, multiply base/frac separately
+        # to avoid precision issues from float -> int
         mask = np.isnan(data)
-        coeff = np.timedelta64(1, unit) / np.timedelta64(1, 'ns')
-        data = (coeff * data).astype(np.int64).view('timedelta64[ns]')
+        m, p = precision_from_unit(unit)
+        base = data.astype(np.int64)
+        frac = data - base
+        if p:
+            frac = np.round(frac, p)
+        data = (base * m + (frac * m).astype(np.int64)).view('timedelta64[ns]')
         data[mask] = iNaT
         copy = False
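
The base/fraction split in the hunk above can be sketched for a single scalar in plain Python. This is an illustration under assumptions, not the pandas code: ``float_to_nanos`` is a hypothetical name, the unit table is an assumed subset of what ``precision_from_unit`` returns, and NaN handling (done with a mask in the real code) is left out.

```python
# Assumed (multiplier to nanoseconds, decimal places to round) per unit.
_UNIT_TABLE = {
    "s": (1_000_000_000, 9),
    "ms": (1_000_000, 6),
    "us": (1_000, 3),
    "ns": (1, 0),
}


def float_to_nanos(value, unit="s"):
    """Convert a float count of `unit` to integer nanoseconds."""
    m, p = _UNIT_TABLE[unit]
    base = int(value)      # whole units, truncated toward zero
    frac = value - base    # remaining fractional part
    if p:
        # Round the fraction to the precision the unit can represent.
        frac = round(frac, p)
    # Scale the integer part exactly; only the small fraction goes
    # through floating-point multiplication.
    return base * m + int(frac * m)


print(float_to_nanos(1.5, "s"))
print(float_to_nanos(2.25, "ms"))
```

Scaling the integer part separately keeps large values out of the float multiplication, which is where the earlier single ``coeff * data`` step could lose precision.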
 

diff --git a/pandas/core/categorical.py b/pandas/core/categorical.py
diff --git a/pandas/core/dtypes/cast.py b/pandas/core/dtypes/cast.py
@@ -794,10 +794,10 @@ def soft_convert_objects(values, datetime=True, numeric=True, timedelta=True,
         # Immediate return if coerce
         if datetime:
             from pandas import to_datetime
-            return to_datetime(values, errors='coerce', box=False)
+            return to_datetime(values, errors='coerce').to_numpy()
         elif timedelta:
             from pandas import to_timedelta
-            return to_timedelta(values, errors='coerce', box=False)
+            return to_timedelta(values, errors='coerce').to_numpy()
         elif numeric:
             from pandas import to_numeric
             return to_numeric(values, errors='coerce')
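
The replacement for ``box=False`` used in this hunk can be sketched as follows, assuming pandas 0.24 or later (where ``.to_numpy()`` is available on the returned index). The input strings are made up for illustration.

```python
import numpy as np
import pandas as pd

# Previously: pd.to_datetime(values, errors="coerce", box=False)
dates = pd.to_datetime(["2019-03-14", "not a date"], errors="coerce").to_numpy()

# Previously: pd.to_timedelta(values, errors="coerce", box=False)
deltas = pd.to_timedelta(["1 days", "not a delta"], errors="coerce").to_numpy()

# Both results are plain NumPy arrays; unparseable entries become NaT.
print(dates.dtype.kind, np.isnat(dates[1]))
print(deltas.dtype.kind, np.isnat(deltas[1]))
```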

diff --git a/pandas/core/frame.py b/pandas/core/frame.py
@@ -7088,8 +7088,8 @@ def corr(self, method='pearson', min_periods=1):
                     correl[j, i] = c
         else:
             raise ValueError("method must be either 'pearson', "
-                             "'spearman', or 'kendall', '{method}' "
-                             "was supplied".format(method=method))
+                             "'spearman', 'kendall', or a callable, "
+                             "'{method}' was supplied".format(method=method))
 
         return self._constructor(correl, index=idx, columns=cols)
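
The updated message mentions callables because ``corr`` accepts a function of two 1-D arrays as ``method``. A minimal sketch, assuming pandas 0.24 or later; ``histogram_intersection`` and the sample data are illustrative names and values, not part of this commit.

```python
import numpy as np
import pandas as pd


def histogram_intersection(a, b):
    # `a` and `b` arrive as 1-D NumPy arrays of paired observations.
    return float(np.minimum(a, b).sum())


df = pd.DataFrame({"dogs": [0.2, 0.0, 0.6, 0.2],
                   "cats": [0.3, 0.6, 0.0, 0.1]})
result = df.corr(method=histogram_intersection)
print(result)

# An unrecognised string still raises ValueError.
message = None
try:
    df.corr(method="bogus")
except ValueError as err:
    message = str(err)
print(message)
```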
 

diff --git a/pandas/core/groupby/generic.py b/pandas/core/groupby/generic.py
@@ -822,7 +822,7 @@ def _aggregate_multiple_funcs(self, arg, _level):
                     columns.append(com.get_callable_name(f))
             arg = lzip(columns, arg)
 
-        results = {}
+        results = collections.OrderedDict()
         for name, func in arg:
             obj = self
             if name in results:

diff --git a/pandas/core/indexes/datetimelike.py b/pandas/core/indexes/datetimelike.py
@@ -300,7 +300,8 @@ def asobject(self):
         return self.astype(object)
 
     def _convert_tolerance(self, tolerance, target):
-        tolerance = np.asarray(to_timedelta(tolerance, box=False))
+        tolerance = np.asarray(to_timedelta(tolerance).to_numpy())
+
         if target.size != tolerance.size and tolerance.size > 1:
             raise ValueError('list-like tolerance size must match '
                              'target index size')

diff --git a/pandas/core/indexes/datetimes.py b/pandas/core/indexes/datetimes.py
@@ -787,8 +787,8 @@ def snap(self, freq='S'):
             snapped[i] = s
 
         # we know it conforms; skip check
-        return DatetimeIndex._simple_new(snapped, freq=freq)
-        # TODO: what about self.name?  tz? if so, use shallow_copy?
+        return DatetimeIndex._simple_new(snapped, name=self.name, tz=self.tz,
+                                         freq=freq)
 
     def join(self, other, how='left', level=None, return_indexers=False,
              sort=False):
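
The effect of the ``snap`` change above can be checked with a short sketch, assuming a pandas version that includes this fix. The index values and the name ``my_dti`` are made up for illustration.

```python
import pandas as pd

idx = pd.DatetimeIndex(
    ["2002-01-01", "2002-01-02", "2002-01-03"], name="my_dti"
)

# Snap each timestamp to the nearest Monday-anchored weekly offset.
snapped = idx.snap(freq="W-MON")

# With the fix, the name of the original index is carried over.
print(snapped.name)
```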

diff --git a/pandas/core/indexes/range.py b/pandas/core/indexes/range.py
@@ -48,7 +48,9 @@ class RangeIndex(Int64Index):
 
     Attributes
     ----------
-    None
+    start
+    stop
+    step
 
     Methods
     -------
@@ -209,6 +211,29 @@ def _format_data(self, name=None):
         return None
 
     # --------------------------------------------------------------------
+    @property
+    def start(self):
+        """
+        The value of the `start` parameter (or ``0`` if this was not supplied)
+        """
+        # GH 25710
+        return self._start
+
+    @property
+    def stop(self):
+        """
+        The value of the `stop` parameter
+        """
+        # GH 25710
+        return self._stop
+
+    @property
+    def step(self):
+        """
+        The value of the `step` parameter (or ``1`` if this was not supplied)
+        """
+        # GH 25710
+        return self._step
 
     @cache_readonly
     def nbytes(self):