DOC: clean-up recent doc errors/warnings (pandas-dev#23636)
jorisvandenbossche authored and Pingviinituutti committed Feb 28, 2019
1 parent 574f8e2 commit a3c1946
Showing 9 changed files with 91 additions and 79 deletions.
2 changes: 1 addition & 1 deletion doc/source/advanced.rst
@@ -702,7 +702,7 @@ Index Types

We have discussed ``MultiIndex`` in the previous sections pretty extensively.
Documentation about ``DatetimeIndex`` and ``PeriodIndex`` are shown :ref:`here <timeseries.overview>`,
and documentation about ``TimedeltaIndex`` is found :ref:`here <timedeltas.timedeltaindex>`.
and documentation about ``TimedeltaIndex`` is found :ref:`here <timedeltas.index>`.

In the following sub-sections we will highlight some other index types.

2 changes: 1 addition & 1 deletion doc/source/ecosystem.rst
@@ -140,7 +140,7 @@ which are utilized by Jupyter Notebook for displaying
(Note: HTML tables may or may not be
compatible with non-HTML Jupyter output formats.)

See :ref:`Options and Settings <options>` and :ref:`options.available <available>`
See :ref:`Options and Settings <options>` and :ref:`options.available`
for pandas ``display.`` settings.

`quantopian/qgrid <https://github.com/quantopian/qgrid>`__
4 changes: 3 additions & 1 deletion doc/source/timeseries.rst
@@ -2372,7 +2372,8 @@ can be controlled by the ``nonexistent`` argument. The following options are ava
* ``shift``: Shifts nonexistent times forward to the closest real time

.. ipython:: python
dti = date_range(start='2015-03-29 01:30:00', periods=3, freq='H')
dti = pd.date_range(start='2015-03-29 01:30:00', periods=3, freq='H')
# 2:30 is a nonexistent time
Localization of nonexistent times will raise an error by default.
@@ -2385,6 +2386,7 @@ Localization of nonexistent times will raise an error by default.
Transform nonexistent times to ``NaT`` or the closest real time forward in time.

.. ipython:: python
dti
dti.tz_localize('Europe/Warsaw', nonexistent='shift')
dti.tz_localize('Europe/Warsaw', nonexistent='NaT')
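
As a rough, self-contained illustration of the ``nonexistent`` options touched by this hunk, the sketch below localizes the same three timestamps. Two assumptions differ from the diff: the hourly frequency is passed as a ``pd.Timedelta`` (to avoid depending on the spelling of the hourly alias), and the forward-shifting option is written ``'shift_forward'``, the name used in released pandas versions, whereas the text above calls it ``shift``.

```python
import pandas as pd

# Three hourly timestamps spanning the 2015 spring-forward transition in Warsaw.
dti = pd.date_range(start="2015-03-29 01:30:00", periods=3,
                    freq=pd.Timedelta(hours=1))

# Default behaviour: localizing the nonexistent 02:30 raises an error.
# The exact exception class depends on the installed versions, so a broad
# except is used here purely for demonstration.
try:
    dti.tz_localize("Europe/Warsaw")
    raised = False
except Exception as exc:
    raised = True
    print("raised:", type(exc).__name__)

# Replace the nonexistent time with NaT.
as_nat = dti.tz_localize("Europe/Warsaw", nonexistent="NaT")

# Move the nonexistent time forward to the first valid wall-clock time.
shifted = dti.tz_localize("Europe/Warsaw", nonexistent="shift_forward")

print(as_nat)
print(shifted)
```

The middle element is the only one affected, since 02:30 local time does not exist on that date.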
114 changes: 60 additions & 54 deletions doc/source/whatsnew/v0.24.0.txt
@@ -14,11 +14,10 @@ New features
~~~~~~~~~~~~
- :func:`merge` now directly allows merge between objects of type ``DataFrame`` and named ``Series``, without the need to convert the ``Series`` object into a ``DataFrame`` beforehand (:issue:`21220`)
- ``ExcelWriter`` now accepts ``mode`` as a keyword argument, enabling append to existing workbooks when using the ``openpyxl`` engine (:issue:`3441`)
- ``FrozenList`` has gained the ``.union()`` and ``.difference()`` methods. This functionality greatly simplifies groupby's that rely on explicitly excluding certain columns. See :ref:`Splitting an object into groups
<groupby.split>` for more information (:issue:`15475`, :issue:`15506`)
- ``FrozenList`` has gained the ``.union()`` and ``.difference()`` methods. This functionality greatly simplifies groupby's that rely on explicitly excluding certain columns. See :ref:`Splitting an object into groups <groupby.split>` for more information (:issue:`15475`, :issue:`15506`).
- :func:`DataFrame.to_parquet` now accepts ``index`` as an argument, allowing
the user to override the engine's default behavior to include or omit the
dataframe's indexes from the resulting Parquet file. (:issue:`20768`)
the user to override the engine's default behavior to include or omit the
dataframe's indexes from the resulting Parquet file. (:issue:`20768`)
- :meth:`DataFrame.corr` and :meth:`Series.corr` now accept a callable for generic calculation methods of correlation, e.g. histogram intersection (:issue:`22684`)
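
Two of the features listed above can be sketched briefly: merging a ``DataFrame`` with a named ``Series``, and passing a callable to ``DataFrame.corr``. The frames, column names, and the ``histogram_intersection`` helper are illustrative assumptions, not taken from the commit.

```python
import numpy as np
import pandas as pd

# Merge a DataFrame directly with a *named* Series (no to_frame() needed).
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
prices = pd.Series([10, 20], index=["a", "b"], name="price")
merged = pd.merge(left, prices, left_on="key", right_index=True)
print(merged)


def histogram_intersection(a, b):
    """Hypothetical similarity measure: sum of element-wise minima."""
    return np.minimum(a, b).sum()


# Use the callable as the correlation method between columns.
hist = pd.DataFrame({"h1": [0.2, 0.0, 0.6, 0.2],
                     "h2": [0.3, 0.6, 0.0, 0.1]})
similarity = hist.corr(method=histogram_intersection)
print(similarity)
```

The merge uses the default inner join, so only keys present in both objects survive.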


@@ -227,7 +226,7 @@ Other Enhancements
- :class:`Series` and :class:`DataFrame` now support :class:`Iterable` in constructor (:issue:`2193`)
- :class:`DatetimeIndex` gained :attr:`DatetimeIndex.timetz` attribute. Returns local time with timezone information. (:issue:`21358`)
- :meth:`round`, :meth:`ceil`, and meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support an ``ambiguous`` argument for handling datetimes that are rounded to ambiguous times (:issue:`18946`)
- :meth:`round`, :meth:`ceil`, and meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support a ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times. See :ref:`timeseries.timezone_nonexsistent` (:issue:`22647`)
- :meth:`round`, :meth:`ceil`, and meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support a ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`22647`)
- :class:`Resampler` now is iterable like :class:`GroupBy` (:issue:`15314`).
- :meth:`Series.resample` and :meth:`DataFrame.resample` have gained the :meth:`Resampler.quantile` (:issue:`15023`).
- :meth:`pandas.core.dtypes.is_list_like` has gained a keyword ``allow_sets`` which is ``True`` by default; if ``False``,
@@ -237,7 +236,7 @@ Other Enhancements
- Compatibility with Matplotlib 3.0 (:issue:`22790`).
- Added :meth:`Interval.overlaps`, :meth:`IntervalArray.overlaps`, and :meth:`IntervalIndex.overlaps` for determining overlaps between interval-like objects (:issue:`21998`)
- :func:`~DataFrame.to_parquet` now supports writing a ``DataFrame`` as a directory of parquet files partitioned by a subset of the columns when ``engine = 'pyarrow'`` (:issue:`23283`)
- :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have gained the ``nonexistent`` argument for alternative handling of nonexistent times. See :ref:`timeseries.timezone_nonexsistent` (:issue:`8917`)
- :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have gained the ``nonexistent`` argument for alternative handling of nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`8917`)
- :meth:`read_excel()` now accepts ``usecols`` as a list of column names or callable (:issue:`18273`)
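
For the ``overlaps`` methods mentioned in the list above, a minimal sketch follows. The intervals are made-up values chosen to show how the ``closed`` side affects the result.

```python
import pandas as pd

a = pd.Interval(0, 2)  # closed on the right by default: (0, 2]
b = pd.Interval(1, 3)  # (1, 3]
c = pd.Interval(2, 4)  # (2, 4]

overlap_ab = a.overlaps(b)  # the two intervals share the range (1, 2]
overlap_ac = a.overlaps(c)  # 2 belongs to a but not to c, so no shared point

# With both endpoints closed, sharing a single endpoint counts as overlap.
overlap_closed = pd.Interval(0, 2, closed="both").overlaps(
    pd.Interval(2, 4, closed="both"))

# The IntervalIndex method returns one boolean per interval.
idx = pd.IntervalIndex.from_tuples([(0, 1), (1, 3), (2, 4)])
mask = idx.overlaps(pd.Interval(0.5, 1.5))

print(overlap_ab, overlap_ac, overlap_closed, mask)
```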

.. _whatsnew_0240.api_breaking:
@@ -283,37 +282,37 @@ and replaced it with references to `pyarrow` (:issue:`21639` and :issue:`23053`)
.. _whatsnew_0240.api_breaking.csv_line_terminator:

`os.linesep` is used for ``line_terminator`` of ``DataFrame.to_csv``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`DataFrame.to_csv` now uses :func:`os.linesep` rather than ``'\n'``
for the default line terminator (:issue:`20353`).
for the default line terminator (:issue:`20353`).
This change only affects when running on Windows, where ``'\r\n'`` was used for line terminator
even when ``'\n'`` was passed in ``line_terminator``.

Previous Behavior on Windows:

.. code-block:: ipython

In [1]: data = pd.DataFrame({
...: "string_with_lf": ["a\nbc"],
...: "string_with_crlf": ["a\r\nbc"]
...: })
In [1]: data = pd.DataFrame({
...: "string_with_lf": ["a\nbc"],
...: "string_with_crlf": ["a\r\nbc"]
...: })

In [2]: # When passing file PATH to to_csv, line_terminator does not work, and csv is saved with '\r\n'.
...: # Also, this converts all '\n's in the data to '\r\n'.
...: data.to_csv("test.csv", index=False, line_terminator='\n')
In [2]: # When passing file PATH to to_csv, line_terminator does not work, and csv is saved with '\r\n'.
...: # Also, this converts all '\n's in the data to '\r\n'.
...: data.to_csv("test.csv", index=False, line_terminator='\n')

In [3]: with open("test.csv", mode='rb') as f:
...: print(f.read())
b'string_with_lf,string_with_crlf\r\n"a\r\nbc","a\r\r\nbc"\r\n'
In [3]: with open("test.csv", mode='rb') as f:
...: print(f.read())
b'string_with_lf,string_with_crlf\r\n"a\r\nbc","a\r\r\nbc"\r\n'

In [4]: # When passing file OBJECT with newline option to to_csv, line_terminator works.
...: with open("test2.csv", mode='w', newline='\n') as f:
...: data.to_csv(f, index=False, line_terminator='\n')
In [4]: # When passing file OBJECT with newline option to to_csv, line_terminator works.
...: with open("test2.csv", mode='w', newline='\n') as f:
...: data.to_csv(f, index=False, line_terminator='\n')

In [5]: with open("test2.csv", mode='rb') as f:
...: print(f.read())
b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'
In [5]: with open("test2.csv", mode='rb') as f:
...: print(f.read())
b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'


New Behavior on Windows:
@@ -322,54 +321,54 @@ New Behavior on Windows:
- The value of ``line_terminator`` only affects the line terminator of CSV,
so it does not change the value inside the data.

.. code-block:: ipython
.. code-block:: ipython

In [1]: data = pd.DataFrame({
...: "string_with_lf": ["a\nbc"],
...: "string_with_crlf": ["a\r\nbc"]
...: })
In [1]: data = pd.DataFrame({
...: "string_with_lf": ["a\nbc"],
...: "string_with_crlf": ["a\r\nbc"]
...: })

In [2]: data.to_csv("test.csv", index=False, line_terminator='\n')
In [2]: data.to_csv("test.csv", index=False, line_terminator='\n')

In [3]: with open("test.csv", mode='rb') as f:
...: print(f.read())
b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'
In [3]: with open("test.csv", mode='rb') as f:
...: print(f.read())
b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'


- On Windows, the value of ``os.linesep`` is ``'\r\n'``,
so if ``line_terminator`` is not set, ``'\r\n'`` is used for line terminator.
- Again, it does not affect the value inside the data.

.. code-block:: ipython
.. code-block:: ipython

In [1]: data = pd.DataFrame({
...: "string_with_lf": ["a\nbc"],
...: "string_with_crlf": ["a\r\nbc"]
...: })
In [1]: data = pd.DataFrame({
...: "string_with_lf": ["a\nbc"],
...: "string_with_crlf": ["a\r\nbc"]
...: })

In [2]: data.to_csv("test.csv", index=False)
In [2]: data.to_csv("test.csv", index=False)

In [3]: with open("test.csv", mode='rb') as f:
...: print(f.read())
b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'
In [3]: with open("test.csv", mode='rb') as f:
...: print(f.read())
b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'


- For files objects, specifying ``newline`` is not sufficient to set the line terminator.
You must pass in the ``line_terminator`` explicitly, even in this case.

.. code-block:: ipython
.. code-block:: ipython

In [1]: data = pd.DataFrame({
...: "string_with_lf": ["a\nbc"],
...: "string_with_crlf": ["a\r\nbc"]
...: })
In [1]: data = pd.DataFrame({
...: "string_with_lf": ["a\nbc"],
...: "string_with_crlf": ["a\r\nbc"]
...: })

In [2]: with open("test2.csv", mode='w', newline='\n') as f:
...: data.to_csv(f, index=False)
In [2]: with open("test2.csv", mode='w', newline='\n') as f:
...: data.to_csv(f, index=False)

In [3]: with open("test2.csv", mode='rb') as f:
...: print(f.read())
b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'
In [3]: with open("test2.csv", mode='rb') as f:
...: print(f.read())
b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'
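
The default behaviour described above can be checked on any platform with a small sketch like the one below. It deliberately does not pass the line-terminator keyword, since that keyword is spelled differently in later pandas releases; the temporary directory and file name are arbitrary choices.

```python
import os
import tempfile

import pandas as pd

data = pd.DataFrame({
    "string_with_lf": ["a\nbc"],
    "string_with_crlf": ["a\r\nbc"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv")
    # No explicit terminator: rows should end with os.linesep.
    data.to_csv(path, index=False)
    # Read back in binary mode so no newline translation hides the result.
    with open(path, mode="rb") as f:
        raw = f.read()

sep = os.linesep.encode()
print(raw)
```

Reading in binary mode matters here: text mode would translate newlines on read and mask what was actually written.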

.. _whatsnew_0240.api_breaking.interval_values:

@@ -777,17 +776,20 @@ Previous Behavior:
df = pd.DataFrame(arr)

.. ipython:: python

# Comparison operations and arithmetic operations both broadcast.
df == arr[[0], :]
df + arr[[0], :]

.. ipython:: python

# Comparison operations and arithmetic operations both broadcast.
df == (1, 2)
df + (1, 2)

.. ipython:: python
:okexcept:

# Comparison operations and arithmetic opeartions both raise ValueError.
df == (1, 2, 3)
df + (1, 2, 3)
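
A standalone sketch of the tuple cases above. The definition of ``arr`` is not visible in this hunk, so a 3x2 ``np.arange`` array is assumed here.

```python
import numpy as np
import pandas as pd

# Assumed input: 3 rows, 2 columns -> [[0, 1], [2, 3], [4, 5]]
df = pd.DataFrame(np.arange(6).reshape(3, 2))

# A tuple whose length matches the number of columns is applied to every row.
eq = df == (2, 3)
total = df + (1, 2)

# A tuple of the wrong length raises ValueError for both kinds of operation.
errors = []
for op in (lambda: df == (1, 2, 3), lambda: df + (1, 2, 3)):
    try:
        op()
    except ValueError as exc:
        errors.append(str(exc))

print(eq)
print(total)
print(len(errors), "operations raised ValueError")
```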
@@ -797,8 +799,9 @@ Previous Behavior:

DataFrame Arithmetic Operations Broadcasting Changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`DataFrame` arithmetic operations when operating with 2-dimensional
``np.ndarray`` objects now broadcast in the same way as ``np.ndarray``s
``np.ndarray`` objects now broadcast in the same way as ``np.ndarray``
broadcast. (:issue:`23000`)

Previous Behavior:
@@ -817,11 +820,13 @@ Previous Behavior:
*Current Behavior*:

.. ipython:: python

arr = np.arange(6).reshape(3, 2)
df = pd.DataFrame(arr)
df

.. ipython:: python

df + arr[[0], :] # 1 row, 2 columns
df + arr[:, [1]] # 1 column, 3 rows
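
To make the "broadcast like ``np.ndarray``" claim concrete, the following sketch compares the ``DataFrame`` results with the equivalent pure-NumPy expressions, using the same ``arr`` as in the hunk above.

```python
import numpy as np
import pandas as pd

arr = np.arange(6).reshape(3, 2)  # [[0, 1], [2, 3], [4, 5]]
df = pd.DataFrame(arr)

row = arr[[0], :]  # shape (1, 2): broadcast down the rows
col = arr[:, [1]]  # shape (3, 1): broadcast across the columns

by_row = df + row
by_col = df + col

# The DataFrame results should match plain NumPy broadcasting.
print(np.array_equal(by_row.values, arr + row))
print(np.array_equal(by_col.values, arr + col))
```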

@@ -888,7 +893,7 @@ Current Behavior:
...
OverflowError: Trying to coerce negative values to unsigned integers

.. _whatsnew_0240.api.crosstab_dtypes
.. _whatsnew_0240.api.crosstab_dtypes:

Crosstab Preserves Dtypes
^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1008,6 +1013,7 @@ Current Behavior:

.. ipython:: python
:okwarning:

per = pd.Period('2016Q1')
per + 3

14 changes: 7 additions & 7 deletions pandas/_libs/tslibs/timedeltas.pyx
@@ -1111,14 +1111,14 @@ class Timedelta(_Timedelta):
Parameters
----------
value : Timedelta, timedelta, np.timedelta64, string, or integer
unit : string, {'Y', 'M', 'W', 'D', 'days', 'day',
'hours', hour', 'hr', 'h', 'm', 'minute', 'min', 'minutes',
'T', 'S', 'seconds', 'sec', 'second', 'ms',
'milliseconds', 'millisecond', 'milli', 'millis', 'L',
'us', 'microseconds', 'microsecond', 'micro', 'micros',
'U', 'ns', 'nanoseconds', 'nano', 'nanos', 'nanosecond'
'N'}, optional
unit : str, optional
Denote the unit of the input, if input is an integer. Default 'ns'.
Possible values:
{'Y', 'M', 'W', 'D', 'days', 'day', 'hours', hour', 'hr', 'h',
'm', 'minute', 'min', 'minutes', 'T', 'S', 'seconds', 'sec', 'second',
'ms', 'milliseconds', 'millisecond', 'milli', 'millis', 'L',
'us', 'microseconds', 'microsecond', 'micro', 'micros', 'U',
'ns', 'nanoseconds', 'nano', 'nanos', 'nanosecond', 'N'}
days, seconds, microseconds,
milliseconds, minutes, hours, weeks : numeric, optional
Values for construction in compat with datetime.timedelta.
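
A short sketch of how the ``unit`` parameter and the keyword arguments documented above behave. Only a few unit strings are used, chosen because they are expected to be accepted across pandas versions; some of the single-letter aliases in the list may not be.

```python
import datetime

import pandas as pd

# An integer without a unit is interpreted as nanoseconds (the default 'ns').
default_ns = pd.Timedelta(1000)

# An explicit unit changes how the integer is interpreted.
ninety_min = pd.Timedelta(90, unit="min")
millis = pd.Timedelta(1500, unit="ms")

# Keyword construction mirrors datetime.timedelta.
from_kwargs = pd.Timedelta(days=1, hours=2)

print(default_ns, ninety_min, millis, from_kwargs)
```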
3 changes: 3 additions & 0 deletions pandas/core/frame.py
@@ -3409,13 +3409,15 @@ def assign(self, **kwargs):
Berkeley 25.0
Where the value is a callable, evaluated on `df`:
>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
temp_c temp_f
Portland 17.0 62.6
Berkeley 25.0 77.0
Alternatively, the same behavior can be achieved by directly
referencing an existing Series or sequence:
>>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
temp_c temp_f
Portland 17.0 62.6
@@ -3424,6 +3426,7 @@ def assign(self, **kwargs):
In Python 3.6+, you can create multiple columns within the same assign
where one of the columns depends on another one defined within the same
assign:
>>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
... temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)
temp_c temp_f temp_k
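
The docstring examples above can be run end to end with the sketch below. The input frame is reconstructed from the output shown in the docstring, so treat it as an assumption. The dependent-column form relies on keyword-argument order being preserved, which is why the docstring restricts it to Python 3.6+.

```python
import pandas as pd

df = pd.DataFrame({"temp_c": [17.0, 25.0]}, index=["Portland", "Berkeley"])

# Callable form: the lambda receives the DataFrame being assigned to.
with_f = df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)

# Direct form: pass an already-computed Series.
with_f_direct = df.assign(temp_f=df["temp_c"] * 9 / 5 + 32)

# Dependent columns: temp_k refers to temp_f created in the same call.
with_k = df.assign(
    temp_f=lambda x: x["temp_c"] * 9 / 5 + 32,
    temp_k=lambda x: (x["temp_f"] + 459.67) * 5 / 9,
)
print(with_k)
```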
14 changes: 7 additions & 7 deletions pandas/core/generic.py
@@ -6508,16 +6508,16 @@ def interpolate(self, method='linear', axis=0, limit=None, inplace=False,

def asof(self, where, subset=None):
"""
Return the last row(s) without any `NaN`s before `where`.
Return the last row(s) without any NaNs before `where`.
The last row (for each element in `where`, if list) without any
`NaN` is taken.
In case of a :class:`~pandas.DataFrame`, the last row without `NaN`
NaN is taken.
In case of a :class:`~pandas.DataFrame`, the last row without NaN
considering only the subset of columns (if not `None`)
.. versionadded:: 0.19.0 For DataFrame
If there is no good value, `NaN` is returned for a Series or
If there is no good value, NaN is returned for a Series or
a Series of NaN values for a DataFrame
Parameters
@@ -6526,7 +6526,7 @@ def asof(self, where, subset=None):
Date(s) before which the last row(s) are returned.
subset : str or array-like of str, default `None`
For DataFrame, if not `None`, only use these columns to
check for `NaN`s.
check for NaNs.
Notes
-----
@@ -6562,7 +6562,7 @@ def asof(self, where, subset=None):
2.0
For a sequence `where`, a Series is returned. The first value is
``NaN``, because the first element of `where` is before the first
NaN, because the first element of `where` is before the first
index value.
>>> s.asof([5, 20])
@@ -6571,7 +6571,7 @@ def asof(self, where, subset=None):
dtype: float64
Missing values are not considered. The following is ``2.0``, not
``NaN``, even though ``NaN`` is at the index location for ``30``.
NaN, even though NaN is at the index location for ``30``.
>>> s.asof(30)
2.0
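
The ``asof`` examples in this docstring rely on a Series defined outside the visible hunks. The sketch below assumes a four-element Series with a NaN at index ``30``, which is consistent with the outputs quoted above.

```python
import numpy as np
import pandas as pd

# Assumed input: sorted integer index with one missing value at 30.
s = pd.Series([1, 2, np.nan, 4], index=[10, 20, 30, 40])

scalar_hit = s.asof(20)       # last non-NaN value at or before 20
sequence = s.asof([5, 20])    # 5 precedes the first index value
skips_nan = s.asof(30)        # the NaN at 30 is skipped

print(scalar_hit)
print(sequence)
print(skips_nan)
```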
3 changes: 2 additions & 1 deletion pandas/core/indexes/datetimelike.py
@@ -91,6 +91,8 @@ class TimelikeOps(object):
:ref:`frequency aliases <timeseries.offset_aliases>` for
a list of possible `freq` values.
ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
Only relevant for DatetimeIndex:
- 'infer' will attempt to infer fall dst-transition hours based on
order
- bool-ndarray where True signifies a DST time, False designates
@@ -99,7 +101,6 @@ class TimelikeOps(object):
- 'NaT' will return NaT where there are ambiguous times
- 'raise' will raise an AmbiguousTimeError if there are ambiguous
times
Only relevant for DatetimeIndex
.. versionadded:: 0.24.0
nonexistent : 'shift', 'NaT', default 'raise'