Skip to content

Commit

Permalink
Merge branch 'master' into fix-25587
Browse files Browse the repository at this point in the history
* master: (22 commits)
  Fixturize tests/frame/test_operators.py (pandas-dev#25641)
  Update ValueError message in corr (pandas-dev#25729)
  DOC: fix some grammar and inconsistency issues in the User Guide (pandas-dev#25728)
  ENH: Add public start, stop, and step attributes to RangeIndex (pandas-dev#25720)
  Make Rolling.apply documentation clearer (pandas-dev#25712)
  pandas-dev#25707 - Fixed flakiness in stata write test (pandas-dev#25714)
  Json normalize nan support (pandas-dev#25619)
  TST: resolve issues with test_constructor_dtype_datetime64 (pandas-dev#24868)
  DEPR: Deprecate box kwarg for to_timedelta and to_datetime (pandas-dev#24486)
  BUG: Preserve name in DatetimeIndex.snap (pandas-dev#25585)
  Fix concat not respecting order of OrderedDict (pandas-dev#25224)
  CLN: remove pandas.core.categorical (pandas-dev#25655)
  TST/CLN: Remove more Panel tests (pandas-dev#25675)
  Pinned pycodestyle (pandas-dev#25701)
  DOC: update date of 0.24.2 release notes (pandas-dev#25699)
  BUG: Fix error in replace with strings that are large numbers (pandas-dev#25616) (pandas-dev#25644)
  BUG: fix usage of na_sentinel with sort=True in factorize() (pandas-dev#25592)
  BUG: Fix to_string output when using header (pandas-dev#16718) (pandas-dev#25602)
  CLN: Remove unused test code (pandas-dev#25670)
  CLN: remove Panel from concat error message (pandas-dev#25676)
  ...

# Conflicts:
#	doc/source/whatsnew/v0.25.0.rst
  • Loading branch information
sighingnow committed Mar 14, 2019
2 parents aafd214 + 998e1de commit 4095c8c
Show file tree
Hide file tree
Showing 51 changed files with 583 additions and 631 deletions.
6 changes: 3 additions & 3 deletions doc/source/user_guide/text.rst
Original file line number Diff line number Diff line change
Expand Up @@ -46,8 +46,8 @@ Since ``df.columns`` is an Index object, we can use the ``.str`` accessor
df.columns.str.lower()
These string methods can then be used to clean up the columns as needed.
Here we are removing leading and trailing white spaces, lower casing all names,
and replacing any remaining white spaces with underscores:
Here we are removing leading and trailing whitespaces, lower casing all names,
and replacing any remaining whitespaces with underscores:

.. ipython:: python
Expand All @@ -65,7 +65,7 @@ and replacing any remaining white spaces with underscores:
``Series``.

Please note that a ``Series`` of type ``category`` with string ``.categories`` has
some limitations in comparison of ``Series`` of type string (e.g. you can't add strings to
some limitations in comparison to ``Series`` of type string (e.g. you can't add strings to
each other: ``s + " " + s`` won't work if ``s`` is a ``Series`` of type ``category``). Also,
``.str`` methods which operate on elements of type ``list`` are not available on such a
``Series``.
Expand Down
52 changes: 7 additions & 45 deletions doc/source/whatsnew/v0.24.2.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@

.. _whatsnew_0242:

Whats New in 0.24.2 (February XX, 2019)
---------------------------------------
Whats New in 0.24.2 (March 12, 2019)
------------------------------------

.. warning::

Expand All @@ -18,7 +18,7 @@ including other versions of pandas.
.. _whatsnew_0242.regressions:

Fixed Regressions
^^^^^^^^^^^^^^^^^
~~~~~~~~~~~~~~~~~

- Fixed regression in :meth:`DataFrame.all` and :meth:`DataFrame.any` where ``bool_only=True`` was ignored (:issue:`25101`)
- Fixed issue in ``DataFrame`` construction with passing a mixed list of mixed types could segfault. (:issue:`25075`)
Expand All @@ -31,71 +31,32 @@ Fixed Regressions
- Fixed regression in ``IntervalDtype`` construction where passing an incorrect string with 'Interval' as a prefix could result in a ``RecursionError``. (:issue:`25338`)
- Fixed regression in creating a period-dtype array from a read-only NumPy array of period objects. (:issue:`25403`)
- Fixed regression in :class:`Categorical`, where constructing it from a categorical ``Series`` and an explicit ``categories=`` that differed from that in the ``Series`` created an invalid object which could trigger segfaults. (:issue:`25318`)
- Fixed regression in :func:`to_timedelta` losing precision when converting floating data to ``Timedelta`` data (:issue:`25077`).
- Fixed pip installing from source into an environment without NumPy (:issue:`25193`)
- Fixed regression in :meth:`DataFrame.replace` where large strings of numbers would be coerced into ``int64``, causing an ``OverflowError`` (:issue:`25616`)
- Fixed regression in :func:`factorize` when passing a custom ``na_sentinel`` value with ``sort=True`` (:issue:`25409`).
- Fixed regression in :meth:`DataFrame.to_csv` writing duplicate line endings with gzip compress (:issue:`25311`)

.. _whatsnew_0242.enhancements:

Enhancements
^^^^^^^^^^^^

-
-

.. _whatsnew_0242.bug_fixes:

Bug Fixes
~~~~~~~~~

**Conversion**

-
-
-

**Indexing**

-
-
-

**I/O**

- Better handling of terminal printing when the terminal dimensions are not known (:issue:`25080`)
- Bug in reading a HDF5 table-format ``DataFrame`` created in Python 2, in Python 3 (:issue:`24925`)
- Bug in reading a JSON with ``orient='table'`` generated by :meth:`DataFrame.to_json` with ``index=False`` (:issue:`25170`)
- Bug where float indexes could have misaligned values when printing (:issue:`25061`)
-

**Categorical**

-
-
-

**Timezones**

-
-
-

**Timedelta**

-
-
-

**Reshaping**

- Bug in :meth:`~pandas.core.groupby.GroupBy.transform` where applying a function to a timezone aware column would return a timezone naive result (:issue:`24198`)
- Bug in :func:`DataFrame.join` when joining on a timezone aware :class:`DatetimeIndex` (:issue:`23931`)
-

**Visualization**

- Bug in :meth:`Series.plot` where a secondary y axis could not be set to log scale (:issue:`25545`)
-
-

**Other**

Expand Down Expand Up @@ -130,6 +91,7 @@ A total of 25 people contributed patches to this release. People with a "+" by t
* Joris Van den Bossche
* Josh
* Justin Zheng
* Kendall Masse
* Matthew Roeschke
* Max Bolingbroke +
* rbenes +
Expand Down
11 changes: 8 additions & 3 deletions doc/source/whatsnew/v0.25.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ Other Enhancements
- :meth:`DataFrame.set_index` now works for instances of ``abc.Iterator``, provided their output is of the same length as the calling frame (:issue:`22484`, :issue:`24984`)
- :meth:`DatetimeIndex.union` now supports the ``sort`` argument. The behaviour of the sort parameter matches that of :meth:`Index.union` (:issue:`24994`)
- :meth:`DataFrame.rename` now supports the ``errors`` argument to raise errors when attempting to rename nonexistent keys (:issue:`13473`)
- :class:`RangeIndex` has gained :attr:`~RangeIndex.start`, :attr:`~RangeIndex.stop`, and :attr:`~RangeIndex.step` attributes (:issue:`25710`)

.. _whatsnew_0250.api_breaking:

Expand Down Expand Up @@ -86,14 +87,15 @@ Other API Changes
- :class:`DatetimeTZDtype` will now standardize pytz timezones to a common timezone instance (:issue:`24713`)
- ``Timestamp`` and ``Timedelta`` scalars now implement the :meth:`to_numpy` method as aliases to :meth:`Timestamp.to_datetime64` and :meth:`Timedelta.to_timedelta64`, respectively. (:issue:`24653`)
- :meth:`Timestamp.strptime` will now rise a ``NotImplementedError`` (:issue:`25016`)
-
- Bug in :meth:`DatetimeIndex.snap` which didn't preserving the ``name`` of the input :class:`Index` (:issue:`25575`)

.. _whatsnew_0250.deprecations:

Deprecations
~~~~~~~~~~~~

- Deprecated the `M (months)` and `Y (year)` `units` parameter of :func: `pandas.to_timedelta`, :func: `pandas.Timedelta` and :func: `pandas.TimedeltaIndex` (:issue:`16344`)
- The functions :func:`pandas.to_datetime` and :func:`pandas.to_timedelta` have deprecated the ``box`` keyword. Instead, use :meth:`to_numpy` or :meth:`Timestamp.to_datetime64`/:meth:`Timedelta.to_timedelta64`. (:issue:`24416`)

.. _whatsnew_0250.prior_deprecations:

Expand Down Expand Up @@ -122,7 +124,7 @@ Bug Fixes
~~~~~~~~~
- Bug in :func:`to_datetime` which would raise an (incorrect) ``ValueError`` when called with a date far into the future and the ``format`` argument specified instead of raising ``OutOfBoundsDatetime`` (:issue:`23830`)
- Bug in an error message in :meth:`DataFrame.plot`. Improved the error message if non-numerics are passed to :meth:`DataFrame.plot` (:issue:`25481`)
- Fixed bug where :class:`api.extensions.ExtensionArray` could not be used in matplotlib plotting (:issue:`25587`)
- Bug in error messages in :meth:`DataFrame.corr` and :meth:`Series.corr`. Added the possibility of using a callable. (:issue:`25729`)

Categorical
^^^^^^^^^^^
Expand Down Expand Up @@ -214,14 +216,16 @@ I/O
- Bug in :func:`read_json` for ``orient='table'`` when it tries to infer dtypes by default, which is not applicable as dtypes are already defined in the JSON schema (:issue:`21345`)
- Bug in :func:`read_json` for ``orient='table'`` and float index, as it infers index dtype by default, which is not applicable because index dtype is already defined in the JSON schema (:issue:`25433`)
- Bug in :func:`read_json` for ``orient='table'`` and string of float column names, as it makes a column name type conversion to Timestamp, which is not applicable because column names are already defined in the JSON schema (:issue:`25435`)
- Bug in :func:`json_normalize` for ``errors='ignore'`` where missing values in the input data, were filled in resulting ``DataFrame`` with the string "nan" instead of ``numpy.nan`` (:issue:`25468`)
- :meth:`DataFrame.to_html` now raises ``TypeError`` when using an invalid type for the ``classes`` parameter instead of ``AsseertionError`` (:issue:`25608`)
-
- Bug in :meth:`DataFrame.to_string` and :meth:`DataFrame.to_latex` that would lead to incorrect output when the ``header`` keyword is used (:issue:`16718`)
-


Plotting
^^^^^^^^

- Fixed bug where :class:`api.extensions.ExtensionArray` could not be used in matplotlib plotting (:issue:`25587`)
-
-
-
Expand All @@ -241,6 +245,7 @@ Reshaping
- Bug in :func:`pandas.merge` adds a string of ``None`` if ``None`` is assigned in suffixes instead of remain the column name as-is (:issue:`24782`).
- Bug in :func:`merge` when merging by index name would sometimes result in an incorrectly numbered index (:issue:`24212`)
- :func:`to_records` now accepts dtypes to its `column_dtypes` parameter (:issue:`24895`)
- Bug in :func:`concat` where order of ``OrderedDict`` (and ``dict`` in Python 3.6+) is not respected, when passed in as ``objs`` argument (:issue:`21510`)


Sparse
Expand Down
1 change: 1 addition & 0 deletions environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ dependencies:
- hypothesis>=3.82
- isort
- moto
- pycodestyle=2.4
- pytest>=4.0.2
- pytest-mock
- sphinx
Expand Down
19 changes: 16 additions & 3 deletions pandas/_libs/tslibs/timedeltas.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -246,9 +246,11 @@ def array_to_timedelta64(object[:] values, unit='ns', errors='raise'):
return iresult.base # .base to access underlying np.ndarray


cdef inline int64_t cast_from_unit(object ts, object unit) except? -1:
""" return a casting of the unit represented to nanoseconds
round the fractional part of a float to our precision, p """
cpdef inline object precision_from_unit(object unit):
"""
Return a casting of the unit represented to nanoseconds + the precision
to round the fractional part.
"""
cdef:
int64_t m
int p
Expand Down Expand Up @@ -285,6 +287,17 @@ cdef inline int64_t cast_from_unit(object ts, object unit) except? -1:
p = 0
else:
raise ValueError("cannot cast unit {unit}".format(unit=unit))
return m, p


cdef inline int64_t cast_from_unit(object ts, object unit) except? -1:
""" return a casting of the unit represented to nanoseconds
round the fractional part of a float to our precision, p """
cdef:
int64_t m
int p

m, p = precision_from_unit(unit)

# just give me the unit back
if ts is None:
Expand Down
20 changes: 13 additions & 7 deletions pandas/core/algorithms.py
Original file line number Diff line number Diff line change
Expand Up @@ -619,13 +619,19 @@ def factorize(values, sort=False, order=None, na_sentinel=-1, size_hint=None):

if sort and len(uniques) > 0:
from pandas.core.sorting import safe_sort
try:
order = uniques.argsort()
order2 = order.argsort()
labels = take_1d(order2, labels, fill_value=na_sentinel)
uniques = uniques.take(order)
except TypeError:
# Mixed types, where uniques.argsort fails.
if na_sentinel == -1:
# GH-25409 take_1d only works for na_sentinels of -1
try:
order = uniques.argsort()
order2 = order.argsort()
labels = take_1d(order2, labels, fill_value=na_sentinel)
uniques = uniques.take(order)
except TypeError:
# Mixed types, where uniques.argsort fails.
uniques, labels = safe_sort(uniques, labels,
na_sentinel=na_sentinel,
assume_unique=True)
else:
uniques, labels = safe_sort(uniques, labels,
na_sentinel=na_sentinel,
assume_unique=True)
Expand Down
15 changes: 9 additions & 6 deletions pandas/core/arrays/timedeltas.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
from pandas._libs.tslibs import NaT, Timedelta, Timestamp, iNaT
from pandas._libs.tslibs.fields import get_timedelta_field
from pandas._libs.tslibs.timedeltas import (
array_to_timedelta64, parse_timedelta_unit)
array_to_timedelta64, parse_timedelta_unit, precision_from_unit)
import pandas.compat as compat
from pandas.util._decorators import Appender

Expand Down Expand Up @@ -918,12 +918,15 @@ def sequence_to_td64ns(data, copy=False, unit="ns", errors="raise"):
copy = copy and not copy_made

elif is_float_dtype(data.dtype):
# treat as multiples of the given unit. If after converting to nanos,
# there are fractional components left, these are truncated
# (i.e. NOT rounded)
# cast the unit, multiply base/frace separately
# to avoid precision issues from float -> int
mask = np.isnan(data)
coeff = np.timedelta64(1, unit) / np.timedelta64(1, 'ns')
data = (coeff * data).astype(np.int64).view('timedelta64[ns]')
m, p = precision_from_unit(unit)
base = data.astype(np.int64)
frac = data - base
if p:
frac = np.round(frac, p)
data = (base * m + (frac * m).astype(np.int64)).view('timedelta64[ns]')
data[mask] = iNaT
copy = False

Expand Down
9 changes: 0 additions & 9 deletions pandas/core/categorical.py

This file was deleted.

4 changes: 2 additions & 2 deletions pandas/core/dtypes/cast.py
Original file line number Diff line number Diff line change
Expand Up @@ -794,10 +794,10 @@ def soft_convert_objects(values, datetime=True, numeric=True, timedelta=True,
# Immediate return if coerce
if datetime:
from pandas import to_datetime
return to_datetime(values, errors='coerce', box=False)
return to_datetime(values, errors='coerce').to_numpy()
elif timedelta:
from pandas import to_timedelta
return to_timedelta(values, errors='coerce', box=False)
return to_timedelta(values, errors='coerce').to_numpy()
elif numeric:
from pandas import to_numeric
return to_numeric(values, errors='coerce')
Expand Down
4 changes: 2 additions & 2 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -7088,8 +7088,8 @@ def corr(self, method='pearson', min_periods=1):
correl[j, i] = c
else:
raise ValueError("method must be either 'pearson', "
"'spearman', or 'kendall', '{method}' "
"was supplied".format(method=method))
"'spearman', 'kendall', or a callable, "
"'{method}' was supplied".format(method=method))

return self._constructor(correl, index=idx, columns=cols)

Expand Down
2 changes: 1 addition & 1 deletion pandas/core/groupby/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -822,7 +822,7 @@ def _aggregate_multiple_funcs(self, arg, _level):
columns.append(com.get_callable_name(f))
arg = lzip(columns, arg)

results = {}
results = collections.OrderedDict()
for name, func in arg:
obj = self
if name in results:
Expand Down
3 changes: 2 additions & 1 deletion pandas/core/indexes/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -300,7 +300,8 @@ def asobject(self):
return self.astype(object)

def _convert_tolerance(self, tolerance, target):
tolerance = np.asarray(to_timedelta(tolerance, box=False))
tolerance = np.asarray(to_timedelta(tolerance).to_numpy())

if target.size != tolerance.size and tolerance.size > 1:
raise ValueError('list-like tolerance size must match '
'target index size')
Expand Down
4 changes: 2 additions & 2 deletions pandas/core/indexes/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -787,8 +787,8 @@ def snap(self, freq='S'):
snapped[i] = s

# we know it conforms; skip check
return DatetimeIndex._simple_new(snapped, freq=freq)
# TODO: what about self.name? tz? if so, use shallow_copy?
return DatetimeIndex._simple_new(snapped, name=self.name, tz=self.tz,
freq=freq)

def join(self, other, how='left', level=None, return_indexers=False,
sort=False):
Expand Down
27 changes: 26 additions & 1 deletion pandas/core/indexes/range.py
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,9 @@ class RangeIndex(Int64Index):
Attributes
----------
None
start
stop
step
Methods
-------
Expand Down Expand Up @@ -209,6 +211,29 @@ def _format_data(self, name=None):
return None

# --------------------------------------------------------------------
@property
def start(self):
"""
The value of the `start` parameter (or ``0`` if this was not supplied)
"""
# GH 25710
return self._start

@property
def stop(self):
"""
The value of the `stop` parameter
"""
# GH 25710
return self._stop

@property
def step(self):
"""
The value of the `step` parameter (or ``1`` if this was not supplied)
"""
# GH 25710
return self._step

@cache_readonly
def nbytes(self):
Expand Down
Loading

0 comments on commit 4095c8c

Please sign in to comment.