-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
/
v0.21.0.txt
393 lines (276 loc) · 18.8 KB
/
v0.21.0.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
.. _whatsnew_0210:
v0.21.0 (???)
-------------
This is a major release from 0.20.x and includes a number of API changes, deprecations, new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.
Highlights include:
- Integration with `Apache Parquet <https://parquet.apache.org/>`__, including a new top-level :func:`read_parquet` and :func:`DataFrame.to_parquet` method, see :ref:`here <io.parquet>`.
Check the :ref:`API Changes <whatsnew_0210.api_breaking>` and :ref:`deprecations <whatsnew_0210.deprecations>` before updating.
.. contents:: What's new in v0.21.0
:local:
:backlinks: none
.. _whatsnew_0210.enhancements:
New features
~~~~~~~~~~~~
- Support for `PEP 519 -- Adding a file system path protocol
<https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`)
- Added ``__fspath__`` method to :class:`~pandas.HDFStore`, :class:`~pandas.ExcelFile`,
and :class:`~pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`)
- Added ``skipna`` parameter to :func:`~pandas.api.types.infer_dtype` to
support type inference in the presence of missing values (:issue:`17059`).
.. _whatsnew_0210.enhancements.infer_objects:
``infer_objects`` type conversion
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The :meth:`DataFrame.infer_objects` and :meth:`Series.infer_objects`
methods have been added to perform dtype inference on object columns, replacing
some of the functionality of the deprecated ``convert_objects``
method. See the documentation :ref:`here <basics.object_conversion>`
for more details. (:issue:`11221`)
This method only performs soft conversions on object columns, converting Python objects
to native types, but not any coercive conversions. For example:
.. ipython:: python
df = pd.DataFrame({'A': [1, 2, 3],
'B': np.array([1, 2, 3], dtype='object'),
'C': ['1', '2', '3']})
df.dtypes
df.infer_objects().dtypes
Note that column ``'C'`` was not converted - only scalar numeric types
will be inferred to a new type. Other types of conversion should be accomplished
using the :func:`to_numeric` function (or :func:`to_datetime`, :func:`to_timedelta`).
.. ipython:: python
df = df.infer_objects()
df['C'] = pd.to_numeric(df['C'], errors='coerce')
df.dtypes
.. _whatsnew_0210.enhancements.attribute_access:
Improved warnings when attempting to create columns
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
New users are often flummoxed by the relationship between column operations and attribute
access on ``DataFrame`` instances (:issue:`5904` & :issue:`7175`). Two specific instances
of this confusion include attempting to create a new column by setting into an attribute:
.. code-block:: ipython
In[1]: df = pd.DataFrame({'one': [1., 2., 3.]})
In[2]: df.two = [4, 5, 6]
This does not raise any obvious exceptions, but also does not create a new column:
.. code-block:: ipython
In[3]: df
Out[3]:
one
0 1.0
1 2.0
2 3.0
The second source of confusion is creating a column whose name collides with a method or
attribute already in the instance namespace:
.. code-block:: ipython
In[4]: df['sum'] = [5., 7., 9.]
This does not permit that column to be accessed as an attribute:
.. code-block:: ipython
In[5]: df.sum
Out[5]:
<bound method DataFrame.sum of one sum
0 1.0 5.0
1 2.0 7.0
2 3.0 9.0>
Both of these now raise a ``UserWarning`` about the potential for unexpected behavior. See :ref:`Attribute Access <indexing.attribute_access>`.
.. _whatsnew_0210.enhancements.other:
Other Enhancements
^^^^^^^^^^^^^^^^^^
- The ``validate`` argument for :func:`merge` function now checks whether a merge is one-to-one, one-to-many, many-to-one, or many-to-many. If a merge is found to not be an example of specified merge type, an exception of type ``MergeError`` will be raised. For more, see :ref:`here <merging.validation>` (:issue:`16270`)
- :func:`Series.to_dict` and :func:`DataFrame.to_dict` now support an ``into`` keyword which allows you to specify the ``collections.Mapping`` subclass that you would like returned. The default is ``dict``, which is backwards compatible. (:issue:`16122`)
- :func:`RangeIndex.append` now returns a ``RangeIndex`` object when possible (:issue:`16212`)
- :func:`Series.rename_axis` and :func:`DataFrame.rename_axis` with ``inplace=True`` now return ``None`` while renaming the axis inplace. (:issue:`15704`)
- :func:`Series.set_axis` and :func:`DataFrame.set_axis` now support the ``inplace`` parameter. (:issue:`14636`)
- :func:`Series.to_pickle` and :func:`DataFrame.to_pickle` have gained a ``protocol`` parameter (:issue:`16252`). By default, this parameter is set to `HIGHEST_PROTOCOL <https://docs.python.org/3/library/pickle.html#data-stream-format>`__
- :func:`api.types.infer_dtype` now infers decimals. (:issue:`15690`)
- :func:`read_feather` has gained the ``nthreads`` parameter for multi-threaded operations (:issue:`16359`)
- :func:`DataFrame.clip()` and :func:`Series.clip()` have gained an ``inplace`` argument. (:issue:`15388`)
- :func:`crosstab` has gained a ``margins_name`` parameter to define the name of the row / column that will contain the totals when ``margins=True``. (:issue:`15972`)
- :func:`DataFrame.select_dtypes` now accepts scalar values for include/exclude as well as list-like. (:issue:`16855`)
- :func:`date_range` now accepts 'YS' in addition to 'AS' as an alias for start of year (:issue:`9313`)
- :func:`date_range` now accepts 'Y' in addition to 'A' as an alias for end of year (:issue:`9313`)
- Integration with `Apache Parquet <https://parquet.apache.org/>`__, including a new top-level :func:`read_parquet` and :func:`DataFrame.to_parquet` method, see :ref:`here <io.parquet>`.
- :func:`DataFrame.add_prefix` and :func:`DataFrame.add_suffix` now accept strings containing the '%' character. (:issue:`17151`)
- `read_*` methods can now infer compression from non-string paths, such as ``pathlib.Path`` objects (:issue:`17206`).
.. _whatsnew_0210.api_breaking:
Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. _whatsnew_0210.api_breaking.pandas_eval:
Improved error handling during item assignment in pd.eval
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:func:`eval` will now raise a ``ValueError`` when item assignment malfunctions, or
inplace operations are specified, but there is no item assignment in the expression (:issue:`16732`)
.. ipython:: python
arr = np.array([1, 2, 3])
Previously, if you attempted the following expression, you would get a not very helpful error message:
.. code-block:: ipython
In [3]: pd.eval("a = 1 + 2", target=arr, inplace=True)
...
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`)
and integer or boolean arrays are valid indices
This is a very long way of saying numpy arrays don't support string-item indexing. With this
change, the error message is now this:
.. code-block:: python
In [3]: pd.eval("a = 1 + 2", target=arr, inplace=True)
...
ValueError: Cannot assign expression output to target
It also used to be possible to evaluate expressions inplace, even if there was no item assignment:
.. code-block:: ipython
In [4]: pd.eval("1 + 2", target=arr, inplace=True)
Out[4]: 3
However, this input does not make much sense because the output is not being assigned to
the target. Now, a ``ValueError`` will be raised when such an input is passed in:
.. code-block:: ipython
In [4]: pd.eval("1 + 2", target=arr, inplace=True)
...
ValueError: Cannot operate inplace if there is no assignment
Dtype Conversions
^^^^^^^^^^^^^^^^^
- Previously assignments, ``.where()`` and ``.fillna()`` with a ``bool`` assignment, would coerce to
same the type (e.g. int / float), or raise for datetimelikes. These will now preseve the bools with ``object`` dtypes. (:issue:`16821`).
.. ipython:: python
s = Series([1, 2, 3])
.. code-block:: python
In [5]: s[1] = True
In [6]: s
Out[6]:
0 1
1 1
2 3
dtype: int64
New Behavior
.. ipython:: python
s[1] = True
s
- Previously, as assignment to a datetimelike with a non-datetimelike would coerce the
non-datetime-like item being assigned (:issue:`14145`).
.. ipython:: python
s = pd.Series([pd.Timestamp('2011-01-01'), pd.Timestamp('2012-01-01')])
.. code-block:: python
In [1]: s[1] = 1
In [2]: s
Out[2]:
0 2011-01-01 00:00:00.000000000
1 1970-01-01 00:00:00.000000001
dtype: datetime64[ns]
These now coerce to ``object`` dtype.
.. ipython:: python
s[1] = 1
s
- Inconsistent behavior in ``.where()`` with datetimelikes which would raise rather than coerce to ``object`` (:issue:`16402`)
- Bug in assignment against ``int64`` data with ``np.ndarray`` with ``float64`` dtype may keep ``int64`` dtype (:issue:`14001`)
.. _whatsnew_0210.api.na_changes:
NA naming Changes
^^^^^^^^^^^^^^^^^
In order to promote more consistency among the pandas API, we have added additional top-level
functions :func:`isna` and :func:`notna` that are aliases for :func:`isnull` and :func:`notnull`.
The naming scheme is now more consistent with methods like ``.dropna()`` and ``.fillna()``. Furthermore
in all cases where ``.isnull()`` and ``.notnull()`` methods are defined, these have additional methods
named ``.isna()`` and ``.notna()``, these are included for classes ``Categorical``,
``Index``, ``Series``, and ``DataFrame``. (:issue:`15001`).
The configuration option ``pd.options.mode.use_inf_as_null`` is deprecated, and ``pd.options.mode.use_inf_as_na`` is added as a replacement.
.. _whatsnew_0210.api:
Other API Changes
^^^^^^^^^^^^^^^^^
- Support has been dropped for Python 3.4 (:issue:`15251`)
- Support has been dropped for bottleneck < 1.0.0 (:issue:`15214`)
- The Categorical constructor no longer accepts a scalar for the ``categories`` keyword. (:issue:`16022`)
- Accessing a non-existent attribute on a closed :class:`~pandas.HDFStore` will now
raise an ``AttributeError`` rather than a ``ClosedFileError`` (:issue:`16301`)
- :func:`read_csv` now treats ``'null'`` strings as missing values by default (:issue:`16471`)
- :func:`read_csv` now treats ``'n/a'`` strings as missing values by default (:issue:`16078`)
- :class:`pandas.HDFStore`'s string representation is now faster and less detailed. For the previous behavior, use ``pandas.HDFStore.info()``. (:issue:`16503`).
- Compression defaults in HDF stores now follow pytable standards. Default is no compression and if ``complib`` is missing and ``complevel`` > 0 ``zlib`` is used (:issue:`15943`)
- ``Index.get_indexer_non_unique()`` now returns a ndarray indexer rather than an ``Index``; this is consistent with ``Index.get_indexer()`` (:issue:`16819`)
- Removed the ``@slow`` decorator from ``pandas.util.testing``, which caused issues for some downstream packages' test suites. Use ``@pytest.mark.slow`` instead, which achieves the same thing (:issue:`16850`)
- Moved definition of ``MergeError`` to the ``pandas.errors`` module.
- The signature of :func:`Series.set_axis` and :func:`DataFrame.set_axis` has been changed from ``set_axis(axis, labels)`` to ``set_axis(labels, axis=0)``, for consistency with the rest of the API. The old signature is deprecated and will show a ``FutureWarning`` (:issue:`14636`)
- :func:`Series.argmin` and :func:`Series.argmax` will now raise a ``TypeError`` when used with ``object`` dtypes, instead of a ``ValueError`` (:issue:`13595`)
.. _whatsnew_0210.deprecations:
Deprecations
~~~~~~~~~~~~
- :func:`read_excel()` has deprecated ``sheetname`` in favor of ``sheet_name`` for consistency with ``.to_excel()`` (:issue:`10559`).
- ``pd.options.html.border`` has been deprecated in favor of ``pd.options.display.html.border`` (:issue:`15793`).
.. _whatsnew_0210.prior_deprecations:
Removal of prior version deprecations/changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- :func:`read_excel()` has dropped the ``has_index_names`` parameter (:issue:`10967`)
- The ``pd.options.display.height`` configuration has been dropped (:issue:`3663`)
- The ``pd.options.display.line_width`` configuration has been dropped (:issue:`2881`)
- The ``pd.options.display.mpl_style`` configuration has been dropped (:issue:`12190`)
- ``Index`` has dropped the ``.sym_diff()`` method in favor of ``.symmetric_difference()`` (:issue:`12591`)
- ``Categorical`` has dropped the ``.order()`` and ``.sort()`` methods in favor of ``.sort_values()`` (:issue:`12882`)
- :func:`eval` and :func:`DataFrame.eval` have changed the default of ``inplace`` from ``None`` to ``False`` (:issue:`11149`)
- The function ``get_offset_name`` has been dropped in favor of the ``.freqstr`` attribute for an offset (:issue:`11834`)
.. _whatsnew_0210.performance:
Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Improved performance of instantiating :class:`SparseDataFrame` (:issue:`16773`)
- :attr:`Series.dt` no longer performs frequency inference, yielding a large speedup when accessing the attribute (:issue:`17210`)
.. _whatsnew_0210.bug_fixes:
Bug Fixes
~~~~~~~~~
Conversion
^^^^^^^^^^
- Bug in assignment against datetime-like data with ``int`` may incorrectly convert to datetime-like (:issue:`14145`)
- Bug in assignment against ``int64`` data with ``np.ndarray`` with ``float64`` dtype may keep ``int64`` dtype (:issue:`14001`)
- Fix :func:`DataFrame.memory_usage` to support PyPy. Objects on PyPy do not have a fixed size, so an approximation is used instead (:issue:`17228`)
- Fixed the return type of ``IntervalIndex.is_non_overlapping_monotonic`` to be a Python ``bool`` for consistency with similar attributes/methods. Previously returned a ``numpy.bool_``. (:issue:`17237`)
- Bug in ``IntervalIndex.is_non_overlapping_monotonic`` when intervals are closed on both sides and overlap at a point (:issue:`16560`)
Indexing
^^^^^^^^
- When called with a null slice (e.g. ``df.iloc[:]``), the ``.iloc`` and ``.loc`` indexers return a shallow copy of the original object. Previously they returned the original object. (:issue:`13873`).
- When called on an unsorted ``MultiIndex``, the ``loc`` indexer now will raise ``UnsortedIndexError`` only if proper slicing is used on non-sorted levels (:issue:`16734`).
- Fixes regression in 0.20.3 when indexing with a string on a ``TimedeltaIndex`` (:issue:`16896`).
- Fixed :func:`TimedeltaIndex.get_loc` handling of ``np.timedelta64`` inputs (:issue:`16909`).
- Fix :func:`MultiIndex.sort_index` ordering when ``ascending`` argument is a list, but not all levels are specified, or are in a different order (:issue:`16934`).
- Fixes bug where indexing with ``np.inf`` caused an ``OverflowError`` to be raised (:issue:`16957`)
- Bug in reindexing on an empty ``CategoricalIndex`` (:issue:`16770`)
- Fixes ``DataFrame.loc`` for setting with alignment and tz-aware ``DatetimeIndex`` (:issue:`16889`)
- Avoids ``IndexError`` when passing an Index or Series to ``.iloc`` with older numpy (:issue:`17193`)
- Allow unicode empty strings as placeholders in multilevel columns in Python 2 (:issue:`17099`)
I/O
^^^
- Bug in :func:`read_csv` in which columns were not being thoroughly de-duplicated (:issue:`17060`)
- Bug in :func:`read_csv` in which specified column names were not being thoroughly de-duplicated (:issue:`17095`)
- Bug in :func:`read_csv` in which non integer values for the header argument generated an unhelpful / unrelated error message (:issue:`16338`)
- Bug in :func:`read_csv` in which memory management issues in exception handling, under certain conditions, would cause the interpreter to segfault (:issue:`14696`, :issue:`16798`).
- Bug in :func:`read_csv` when called with ``low_memory=False`` in which a CSV with at least one column > 2GB in size would incorrectly raise a ``MemoryError`` (:issue:`16798`).
- Bug in :func:`read_csv` when called with a single-element list ``header`` would return a ``DataFrame`` of all NaN values (:issue:`7757`)
- Bug in :func:`read_stata` where value labels could not be read when using an iterator (:issue:`16923`)
- Bug in :func:`read_html` where import check fails when run in multiple threads (:issue:`16928`)
Plotting
^^^^^^^^
- Bug in plotting methods using ``secondary_y`` and ``fontsize`` not setting secondary axis font size (:issue:`12565`)
Groupby/Resample/Rolling
^^^^^^^^^^^^^^^^^^^^^^^^
- Bug in ``DataFrame.resample(...).size()`` where an empty ``DataFrame`` did not return a ``Series`` (:issue:`14962`)
- Bug in :func:`infer_freq` causing indices with 2-day gaps during the working week to be wrongly inferred as business daily (:issue:`16624`)
- Bug in ``.rolling(...).quantile()`` which incorrectly used different defaults than :func:`Series.quantile()` and :func:`DataFrame.quantile()` (:issue:`9413`, :issue:`16211`)
- Bug in ``groupby.transform()`` that would coerce boolean dtypes back to float (:issue:`16875`)
- Bug in ``Series.resample(...).apply()`` where an empty ``Series`` modified the source index and did not return the name of a ``Series`` (:issue:`14313`)
- Bug in ``.rolling(...).apply(...)`` with a ``DataFrame`` with a ``DatetimeIndex``, a ``window`` of a timedelta-convertible and ``min_periods >= 1` (:issue:`15305`)
Sparse
^^^^^^
- Bug in ``SparseSeries`` raises ``AttributeError`` when a dictionary is passed in as data (:issue:`16905`)
- Bug in :func:`SparseDataFrame.fillna` not filling all NaNs when frame was instantiated from SciPy sparse matrix (:issue:`16112`)
Reshaping
^^^^^^^^^
- Joining/Merging with a non unique ``PeriodIndex`` raised a ``TypeError`` (:issue:`16871`)
- Bug in :func:`crosstab` where non-aligned series of integers were casted to float (:issue:`17005`)
- Bug in merging with categorical dtypes with datetimelikes incorrectly raised a ``TypeError`` (:issue:`16900`)
- Bug when using :func:`isin` on a large object series and large comparison array (:issue:`16012`)
- Fixes regression from 0.20, :func:`Series.aggregate` and :func:`DataFrame.aggregate` allow dictionaries as return values again (:issue:`16741`)
- Fixes dtype of result with integer dtype input, from :func:`pivot_table` when called with ``margins=True`` (:issue:`17013`)
- Bug in :func:`crosstab` where passing two ``Series`` with the same name raised a ``KeyError`` (:issue:`13279`)
- :func:`Series.argmin`, :func:`Series.argmax`, and their counterparts on ``DataFrame`` and groupby objects work correctly with floating point data that contains infinite values (:issue:`13595`).
Numeric
^^^^^^^
- Bug in ``.clip()`` with ``axis=1`` and a list-like for ``threshold`` is passed; previously this raised ``ValueError`` (:issue:`15390`)
Categorical
^^^^^^^^^^^
- Bug in :func:`Series.isin` when called with a categorical (:issue`16639`)
Other
^^^^^
- Bug in :func:`eval` where the ``inplace`` parameter was being incorrectly handled (:issue:`16732`)
- Bug in ``.isin()`` in which checking membership in empty ``Series`` objects raised an error (:issue:`16991`)
- Bug in :func:`unique` where checking a tuple of strings raised a ``TypeError`` (:issue:`17108`)