Revert set_index inspection/error handling for 0.24.1 #25085

Merged Feb 3, 2019. 34 commits; changes shown from 29 commits.

Commits
31dcbb7
DOC: Minor what's new fix (#24933)
rth Jan 26, 2019
84056c5
Backport PR #24916: BUG-24212 fix regression in #24897 (#24951)
meeseeksmachine Jan 26, 2019
e22a6c8
Revert "Backport PR #24916: BUG-24212 fix regression in #24897 (#24951)"
jorisvandenbossche Jan 28, 2019
638ac19
Backport PR #24965: Fixed itertuples usage in to_dict (#24978)
meeseeksmachine Jan 28, 2019
72dc33f
Backport PR #24989: DOC: Document breaking change to read_csv (#24996)
meeseeksmachine Jan 29, 2019
fd1c66c
Backport PR #24964: DEPR: Fixed warning for implicit registration (#2…
meeseeksmachine Jan 29, 2019
d54c3a5
Backport PR #24973: fix for BUG: grouping with tz-aware: Values falls…
TomAugspurger Jan 29, 2019
e3cc0b1
Backport PR #24967: REGR: Preserve order by default in Index.differen…
meeseeksmachine Jan 30, 2019
c228597
Backport PR #24961: fix+test to_timedelta('NaT', box=False) (#25025)
meeseeksmachine Jan 30, 2019
7956533
Backport PR #25033: BUG: Fixed merging on tz-aware (#25041)
meeseeksmachine Jan 30, 2019
722bb79
Backport PR #24993: Test nested PandasArray (#25042)
meeseeksmachine Jan 30, 2019
e3634b1
Backport PR #25039: BUG: avoid usage in_qtconsole for recent IPython …
meeseeksmachine Jan 31, 2019
4f865c5
Backport PR #25024: REGR: fix read_sql delegation for queries on MySQ…
meeseeksmachine Jan 31, 2019
c21d32f
Backport PR #25069: REGR: rename_axis with None should remove axis na…
meeseeksmachine Feb 1, 2019
5cb622a
DOC: 0.24.1 whatsnew (#25027)
TomAugspurger Feb 1, 2019
c397839
Revert "DOC: update DF.set_index (#24762)"
h-vetinari Feb 1, 2019
4a211e9
Revert "API: better error-handling for df.set_index (#22486)"
h-vetinari Feb 1, 2019
103a092
Replace deprecated assert_raises_regex
h-vetinari Feb 1, 2019
8086f39
Re-migrate 0.24.0 extension (.txt -> .rst)
h-vetinari Feb 1, 2019
999295e
Re-add docstring clarifications
h-vetinari Feb 1, 2019
c24df00
Backport PR #25063: API: change Index set ops sort=True -> sort=None …
meeseeksmachine Feb 1, 2019
627b17a
trigger azure
TomAugspurger Feb 1, 2019
bc405ce
Backport PR #25084: DOC: Cleanup 0.24.1 whatsnew (#25086)
meeseeksmachine Feb 2, 2019
02db6ec
Backport PR #25026: DOC: Start 0.24.2.rst (#25073)
meeseeksmachine Feb 2, 2019
ff34d2e
trigger azure
TomAugspurger Feb 2, 2019
2aa800c
Merge remote-tracking branch 'upstream/0.24.x' into revert_set_index
h-vetinari Feb 3, 2019
330b343
Keep all tests from #24984; xfail where necessary
h-vetinari Feb 3, 2019
24a4df4
Merge remote-tracking branch 'origin/revert_set_index' into revert_se…
h-vetinari Feb 3, 2019
963a813
Remove stray debugging line
h-vetinari Feb 3, 2019
4db4849
Add whatsnew
h-vetinari Feb 3, 2019
8c913c2
Merge remote-tracking branch 'upstream/master' into h-vetinari-revert…
jorisvandenbossche Feb 3, 2019
5a6cc73
Merge remote-tracking branch 'upstream/master' into revert_set_index
h-vetinari Feb 3, 2019
ff62753
Re-add reverted 0.24.0 whatsnew
h-vetinari Feb 3, 2019
65c7880
Re-add handling for duplicate drops
h-vetinari Feb 3, 2019
2 changes: 1 addition & 1 deletion doc/source/index.rst.template
@@ -39,7 +39,7 @@ See the :ref:`overview` for more detail about what's in the library.
{% endif %}

{% if not single_doc -%}
What's New in 0.24.0 <whatsnew/v0.24.0>
What's New in 0.24.1 <whatsnew/v0.24.1>
install
getting_started/index
user_guide/index
30 changes: 30 additions & 0 deletions doc/source/user_guide/io.rst
@@ -989,6 +989,36 @@ a single date rather than the entire array.

os.remove('tmp.csv')


.. _io.csv.mixed_timezones:

Parsing a CSV with mixed Timezones
++++++++++++++++++++++++++++++++++

Pandas cannot natively represent a column or index with mixed timezones. If your CSV
file contains columns with a mixture of timezones, the default result will be
an object-dtype column with strings, even with ``parse_dates``.


.. ipython:: python

   content = """\
   a
   2000-01-01T00:00:00+05:00
   2000-01-01T00:00:00+06:00"""
   df = pd.read_csv(StringIO(content), parse_dates=['a'])
   df['a']

To parse the mixed-timezone values as a datetime column, pass a partially-applied
:func:`to_datetime` with ``utc=True`` as the ``date_parser``.

.. ipython:: python

   df = pd.read_csv(StringIO(content), parse_dates=['a'],
                    date_parser=lambda col: pd.to_datetime(col, utc=True))
   df['a']
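
A minimal sketch of the same conversion done after loading, assuming only that ``pd.to_datetime`` accepts ``utc=True``. The names ``csv_text`` and ``frame`` are illustrative and do not come from the pandas docs. The column is read as plain strings and converted in a separate step, so the example does not depend on the ``date_parser`` argument.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data mirroring the two-row CSV used in the docs above.
csv_text = "a\n2000-01-01T00:00:00+05:00\n2000-01-01T00:00:00+06:00\n"

# Read the column as plain strings (no parse_dates), then convert explicitly.
frame = pd.read_csv(StringIO(csv_text))
frame["a"] = pd.to_datetime(frame["a"], utc=True)

# Both values are now tz-aware timestamps normalised to UTC.
print(frame["a"])
```

Converting after the read keeps the parsing decision visible at the call site, at the cost of holding the raw strings in memory first.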


.. _io.dayfirst:


53 changes: 49 additions & 4 deletions doc/source/whatsnew/v0.24.0.rst
@@ -6,7 +6,8 @@ What's New in 0.24.0 (January 25, 2019)
.. warning::

The 0.24.x series of releases will be the last to support Python 2. Future feature
releases will support Python 3 only. See :ref:`install.dropping-27` for more.
releases will support Python 3 only. See :ref:`install.dropping-27` for more
details.

{{ header }}

@@ -244,7 +245,7 @@ the new extension arrays that back interval and period data.
Joining with two multi-indexes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`DataFrame.merge` and :func:`DataFrame.join` can now be used to join multi-indexed ``Dataframe`` instances on the overlaping index levels (:issue:`6360`)
:func:`DataFrame.merge` and :func:`DataFrame.join` can now be used to join multi-indexed ``Dataframe`` instances on the overlapping index levels (:issue:`6360`)

See the :ref:`Merge, join, and concatenate
<merging.Join_with_two_multi_indexes>` documentation section.
@@ -647,6 +648,52 @@ that the dates have been converted to UTC
   pd.to_datetime(["2015-11-18 15:30:00+05:30",
                   "2015-11-18 16:30:00+06:30"], utc=True)


.. _whatsnew_0240.api_breaking.read_csv_mixed_tz:

Parsing mixed-timezones with :func:`read_csv`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`read_csv` no longer silently converts mixed-timezone columns to UTC (:issue:`24987`).

*Previous Behavior*

.. code-block:: python

   >>> import io
   >>> content = """\
   ... a
   ... 2000-01-01T00:00:00+05:00
   ... 2000-01-01T00:00:00+06:00"""
   >>> df = pd.read_csv(io.StringIO(content), parse_dates=['a'])
   >>> df.a
   0   1999-12-31 19:00:00
   1   1999-12-31 18:00:00
   Name: a, dtype: datetime64[ns]

*New Behavior*

.. ipython:: python

   import io
   content = """\
   a
   2000-01-01T00:00:00+05:00
   2000-01-01T00:00:00+06:00"""
   df = pd.read_csv(io.StringIO(content), parse_dates=['a'])
   df.a

As can be seen, the ``dtype`` is object; each value in the column is a string.
To convert the strings to an array of datetimes, use the ``date_parser`` argument:

.. ipython:: python

   df = pd.read_csv(io.StringIO(content), parse_dates=['a'],
                    date_parser=lambda col: pd.to_datetime(col, utc=True))
   df.a

See :ref:`whatsnew_0240.api.timezone_offset_parsing` for more.
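
The UTC values shown under *Previous Behavior* can be checked by hand. The sketch below uses only the standard library (``datetime.fromisoformat``, available from Python 3.7) rather than pandas; the variable names are illustrative.

```python
from datetime import datetime, timezone

raw_values = ["2000-01-01T00:00:00+05:00", "2000-01-01T00:00:00+06:00"]

# fromisoformat keeps each value's own UTC offset.
parsed = [datetime.fromisoformat(value) for value in raw_values]

# Normalising to UTC subtracts the offset from the wall-clock time.
as_utc = [moment.astimezone(timezone.utc) for moment in parsed]

for original, converted in zip(raw_values, as_utc):
    print(original, "->", converted.isoformat())
```

Midnight at ``+05:00`` is five hours ahead of UTC, so it corresponds to 19:00 UTC on the previous day; the ``+06:00`` value likewise lands on 18:00 UTC.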

.. _whatsnew_0240.api_breaking.period_end_time:

Time values in ``dt.end_time`` and ``to_timestamp(how='end')``
@@ -1148,8 +1195,6 @@ Other API Changes
- :class:`pandas.io.formats.style.Styler` supports a ``number-format`` property when using :meth:`~pandas.io.formats.style.Styler.to_excel` (:issue:`22015`)
- :meth:`DataFrame.corr` and :meth:`Series.corr` now raise a ``ValueError`` along with a helpful error message instead of a ``KeyError`` when supplied with an invalid method (:issue:`22298`)
- :meth:`shift` will now always return a copy, instead of the previous behaviour of returning self when shifting by 0 (:issue:`22397`)
- :meth:`DataFrame.set_index` now gives a better (and less frequent) KeyError, raises a ``ValueError`` for incorrect types,
and will not fail on duplicate column names with ``drop=True``. (:issue:`22484`)
- Slicing a single row of a DataFrame with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (:issue:`22784`)
- :class:`DateOffset` attribute `_cacheable` and method `_should_cache` have been removed (:issue:`23118`)
- :meth:`Series.searchsorted`, when supplied a scalar value to search for, now returns a scalar instead of an array (:issue:`23801`).
71 changes: 38 additions & 33 deletions doc/source/whatsnew/v0.24.1.rst
@@ -13,61 +13,66 @@ Whats New in 0.24.1 (February XX, 2019)
{{ header }}

These are the changes in pandas 0.24.1. See :ref:`release` for a full changelog
including other versions of pandas.
including other versions of pandas. See :ref:`whatsnew_0240` for the 0.24.0 changelog.

.. _whatsnew_0241.api:

.. _whatsnew_0241.enhancements:
API Changes
~~~~~~~~~~~

Enhancements
^^^^^^^^^^^^
Changing the ``sort`` parameter for :class:`Index` set operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The default ``sort`` value for :meth:`Index.union` has changed from ``True`` to ``None`` (:issue:`24959`).
The default *behavior*, however, remains the same: the result is sorted, unless

.. _whatsnew_0241.bug_fixes:
1. ``self`` and ``other`` are identical
2. ``self`` or ``other`` is empty
3. ``self`` or ``other`` contains values that cannot be compared (a ``RuntimeWarning`` is raised).

Bug Fixes
~~~~~~~~~
This change will allow ``sort=True`` to mean "always sort" in a future release.

**Conversion**
The same change applies to :meth:`Index.difference` and :meth:`Index.symmetric_difference`, which
would not sort the result when the values could not be compared.
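
A short sketch of the default ``Index.union`` behavior described above, covering the sorted case and the first two exceptions. The index values are made up for illustration, and the incomparable-values case is left out because its warning behavior may differ between pandas versions.

```python
import pandas as pd

left = pd.Index([3, 1, 2])
right = pd.Index([5, 4])

# Different, non-empty, comparable operands: the default result is sorted.
print(left.union(right))

# Identical operands: the result keeps the original order.
print(left.union(left))

# One operand empty: the result also keeps the original order.
print(left.union(pd.Index([], dtype=left.dtype)))
```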

-
-
-
The ``sort`` option for :meth:`Index.intersection` has changed in three ways.

**Indexing**
1. The default has changed from ``True`` to ``False``, to restore the
pandas 0.23.4 and earlier behavior of not sorting by default.
2. The behavior of ``sort=True`` can now be obtained with ``sort=None``.
This will sort the result only if the values in ``self`` and ``other``
are not identical.
3. The value ``sort=True`` is no longer allowed. A future version of pandas
will properly support ``sort=True`` meaning "always sort".
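
The first two points can be illustrated with a small sketch. The index values are invented for the example, and ``sort=True`` is deliberately not shown because the text above says it is rejected in this release.

```python
import pandas as pd

left = pd.Index([3, 1, 2])
right = pd.Index([2, 3, 5])

# Default (sort=False): common values come back in the order they appear in `left`.
unsorted_result = left.intersection(right)
print(unsorted_result)

# sort=None: the common values are sorted, since the operands are not identical.
sorted_result = left.intersection(right, sort=None)
print(sorted_result)
```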

-
-
-
.. _whatsnew_0241.regressions:

**I/O**
Fixed Regressions
~~~~~~~~~~~~~~~~~

-
-
-
- Bug in :meth:`DataFrame.to_dict` with ``records`` orient raising an ``AttributeError`` when the ``DataFrame`` contained more than 255 columns (:issue:`24939`)
- Bug in :meth:`DataFrame.to_dict` with ``records`` orient converting integer column names to strings prepended with an underscore (:issue:`24940`)
- Fixed regression in :func:`read_sql` when passing certain queries with MySQL/pymysql (:issue:`24988`).
- Fixed regression in :meth:`Index.intersection` incorrectly sorting the values by default (:issue:`24959`).
- Fixed regression in :func:`merge` when merging an empty ``DataFrame`` with multiple timezone-aware columns on one of the timezone-aware columns (:issue:`25014`).
- Fixed regression in :meth:`Series.rename_axis` and :meth:`DataFrame.rename_axis` where passing ``None`` failed to remove the axis name (:issue:`25034`)
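
As a quick check of the ``rename_axis`` entry above, the following sketch shows the intended behavior on a small, made-up frame: passing ``None`` should clear the index name.

```python
import pandas as pd

# Hypothetical frame whose index carries the name "key".
df = pd.DataFrame({"x": [1, 2]}, index=pd.Index(["a", "b"], name="key"))

# Passing None is expected to remove the axis name rather than be ignored.
renamed = df.rename_axis(None)
print(renamed.index.name)
```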

**Categorical**
**Timedelta**

-
-
-
- Bug in :func:`to_timedelta` with ``box=False`` incorrectly returning a ``datetime64`` object instead of a ``timedelta64`` object (:issue:`24961`)

**Timezones**
**Reshaping**

-
-
-
- Bug in :meth:`DataFrame.groupby` with :class:`Grouper` when there is a time change (DST) and grouping frequency is ``'1d'`` (:issue:`24972`)

**Timedelta**
**Visualization**

-
-
-
- Fixed the warning for implicitly registered matplotlib converters not showing. See :ref:`whatsnew_0211.converters` for more (:issue:`24963`).


**Other**

-
-
- Fixed AttributeError when printing a DataFrame's HTML repr after accessing the IPython config object (:issue:`25036`)

.. _whatsnew_0.241.contributors:

99 changes: 99 additions & 0 deletions doc/source/whatsnew/v0.24.2.rst
@@ -0,0 +1,99 @@
:orphan:

.. _whatsnew_0242:

Whats New in 0.24.2 (February XX, 2019)
---------------------------------------

.. warning::

The 0.24.x series of releases will be the last to support Python 2. Future feature
releases will support Python 3 only. See :ref:`install.dropping-27` for more.

{{ header }}

These are the changes in pandas 0.24.2. See :ref:`release` for a full changelog
including other versions of pandas.

.. _whatsnew_0242.regressions:

Fixed Regressions
^^^^^^^^^^^^^^^^^

-
-
-

.. _whatsnew_0242.enhancements:

Enhancements
^^^^^^^^^^^^

-
-

.. _whatsnew_0242.bug_fixes:

Bug Fixes
~~~~~~~~~

**Conversion**

-
-
-

**Indexing**

-
-
-

**I/O**

-
-
-

**Categorical**

-
-
-

**Timezones**

-
-
-

**Timedelta**

-
-
-

**Reshaping**

-
-
-

**Visualization**

-
-
-

**Other**

-
-
-

.. _whatsnew_0.242.contributors:

Contributors
~~~~~~~~~~~~

.. contributors:: v0.24.1..v0.24.2
3 changes: 2 additions & 1 deletion pandas/_libs/lib.pyx
@@ -231,10 +231,11 @@ def fast_unique_multiple(list arrays, sort: bool=True):
                 if val not in table:
                     table[val] = stub
                     uniques.append(val)
-    if sort:
+    if sort is None:
         try:
             uniques.sort()
         except Exception:
+            # TODO: RuntimeWarning?
             pass

     return uniques
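
For readers unfamiliar with the Cython helper, here is a pure-Python sketch of the logic after this change. ``unique_multiple`` is a hypothetical stand-in, not the pandas function: it uses a ``set`` instead of the hash table in ``lib.pyx`` and ``sorted`` instead of an in-place sort so that a failed sort leaves the first-seen order untouched.

```python
def unique_multiple(arrays, sort=None):
    """Collect unique values from several sequences in first-seen order."""
    seen = set()
    uniques = []
    for arr in arrays:
        for val in arr:
            if val not in seen:
                seen.add(val)
                uniques.append(val)

    # Only the sentinel None triggers a best-effort sort; False skips it.
    if sort is None:
        try:
            uniques = sorted(uniques)
        except TypeError:
            # Incomparable values (e.g. int and str): keep first-seen order.
            pass
    return uniques


print(unique_multiple([[3, 1], [2, 3]]))
print(unique_multiple([[3, 1], [2, 3]], sort=False))
print(unique_multiple([[1, "a"], [1]]))
```

Testing ``sort is None`` rather than the truthiness of ``sort`` is what lets ``sort=False`` mean "never sort" while reserving ``sort=True`` for a future "always sort".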
2 changes: 1 addition & 1 deletion pandas/core/arrays/numpy_.py
@@ -222,7 +222,7 @@ def __getitem__(self, item):
             item = item._ndarray

         result = self._ndarray[item]
-        if not lib.is_scalar(result):
+        if not lib.is_scalar(item):
             result = type(self)(result)
         return result
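
The one-line change above moves the wrapping decision from the result to the indexer. A minimal sketch of why that matters for nested data, using a hypothetical ``Wrapper`` class (not the actual ``PandasArray``) and the public ``pandas.api.types.is_scalar`` in place of the internal ``lib.is_scalar``:

```python
import numpy as np
from pandas.api.types import is_scalar


class Wrapper:
    """Hypothetical thin wrapper around an ndarray, for illustration only."""

    def __init__(self, values):
        self._ndarray = np.asarray(values)

    def __getitem__(self, item):
        result = self._ndarray[item]
        # Decide on the indexer: a scalar key always yields a single element,
        # even when that element is itself an array.
        if not is_scalar(item):
            result = type(self)(result)
        return result


# Object array whose elements are themselves arrays.
nested = np.empty(2, dtype=object)
nested[0] = np.array([1, 2])
nested[1] = np.array([3, 4])
wrapped = Wrapper(nested)

element = wrapped[0]   # scalar key: the stored element, left unwrapped
subset = wrapped[:1]   # slice key: a new Wrapper
print(type(element).__name__, type(subset).__name__)
```

Checking the result instead would wrap ``wrapped[0]`` again, because the stored element is an array and therefore not a scalar.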
