diff --git a/doc/source/advanced.rst b/doc/source/advanced.rst index 24c117a5342093..563c869eff54d6 100644 --- a/doc/source/advanced.rst +++ b/doc/source/advanced.rst @@ -702,7 +702,7 @@ Index Types We have discussed ``MultiIndex`` in the previous sections pretty extensively. Documentation about ``DatetimeIndex`` and ``PeriodIndex`` are shown :ref:`here `, -and documentation about ``TimedeltaIndex`` is found :ref:`here `. +and documentation about ``TimedeltaIndex`` is found :ref:`here `. In the following sub-sections we will highlight some other index types. diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index edbd6629a617d5..ad389bbe35b71f 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -140,7 +140,7 @@ which are utilized by Jupyter Notebook for displaying (Note: HTML tables may or may not be compatible with non-HTML Jupyter output formats.) -See :ref:`Options and Settings ` and :ref:`options.available ` +See :ref:`Options and Settings ` and :ref:`options.available` for pandas ``display.`` settings. `quantopian/qgrid `__ diff --git a/doc/source/timeseries.rst b/doc/source/timeseries.rst index 42fd356bbe65ae..cc377f45c4b8d0 100644 --- a/doc/source/timeseries.rst +++ b/doc/source/timeseries.rst @@ -2372,7 +2372,8 @@ can be controlled by the ``nonexistent`` argument. The following options are ava * ``shift``: Shifts nonexistent times forward to the closest real time .. ipython:: python - dti = date_range(start='2015-03-29 01:30:00', periods=3, freq='H') + + dti = pd.date_range(start='2015-03-29 01:30:00', periods=3, freq='H') # 2:30 is a nonexistent time Localization of nonexistent times will raise an error by default. @@ -2385,6 +2386,7 @@ Localization of nonexistent times will raise an error by default. Transform nonexistent times to ``NaT`` or the closest real time forward in time. .. ipython:: python + dti dti.tz_localize('Europe/Warsaw', nonexistent='shift') dti.tz_localize('Europe/Warsaw', nonexistent='NaT') diff --git a/doc/source/whatsnew/v0.24.0.txt b/doc/source/whatsnew/v0.24.0.txt index 20496c9fb3f31c..3938fba4648b16 100644 --- a/doc/source/whatsnew/v0.24.0.txt +++ b/doc/source/whatsnew/v0.24.0.txt @@ -14,11 +14,10 @@ New features ~~~~~~~~~~~~ - :func:`merge` now directly allows merge between objects of type ``DataFrame`` and named ``Series``, without the need to convert the ``Series`` object into a ``DataFrame`` beforehand (:issue:`21220`) - ``ExcelWriter`` now accepts ``mode`` as a keyword argument, enabling append to existing workbooks when using the ``openpyxl`` engine (:issue:`3441`) -- ``FrozenList`` has gained the ``.union()`` and ``.difference()`` methods. This functionality greatly simplifies groupby's that rely on explicitly excluding certain columns. See :ref:`Splitting an object into groups -` for more information (:issue:`15475`, :issue:`15506`) +- ``FrozenList`` has gained the ``.union()`` and ``.difference()`` methods. This functionality greatly simplifies groupby's that rely on explicitly excluding certain columns. See :ref:`Splitting an object into groups ` for more information (:issue:`15475`, :issue:`15506`). - :func:`DataFrame.to_parquet` now accepts ``index`` as an argument, allowing -the user to override the engine's default behavior to include or omit the -dataframe's indexes from the resulting Parquet file. (:issue:`20768`) + the user to override the engine's default behavior to include or omit the + dataframe's indexes from the resulting Parquet file. (:issue:`20768`) - :meth:`DataFrame.corr` and :meth:`Series.corr` now accept a callable for generic calculation methods of correlation, e.g. histogram intersection (:issue:`22684`) @@ -227,7 +226,7 @@ Other Enhancements - :class:`Series` and :class:`DataFrame` now support :class:`Iterable` in constructor (:issue:`2193`) - :class:`DatetimeIndex` gained :attr:`DatetimeIndex.timetz` attribute. Returns local time with timezone information. (:issue:`21358`) - :meth:`round`, :meth:`ceil`, and meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support an ``ambiguous`` argument for handling datetimes that are rounded to ambiguous times (:issue:`18946`) -- :meth:`round`, :meth:`ceil`, and meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support a ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times. See :ref:`timeseries.timezone_nonexsistent` (:issue:`22647`) +- :meth:`round`, :meth:`ceil`, and meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support a ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`22647`) - :class:`Resampler` now is iterable like :class:`GroupBy` (:issue:`15314`). - :meth:`Series.resample` and :meth:`DataFrame.resample` have gained the :meth:`Resampler.quantile` (:issue:`15023`). - :meth:`pandas.core.dtypes.is_list_like` has gained a keyword ``allow_sets`` which is ``True`` by default; if ``False``, @@ -237,7 +236,7 @@ Other Enhancements - Compatibility with Matplotlib 3.0 (:issue:`22790`). - Added :meth:`Interval.overlaps`, :meth:`IntervalArray.overlaps`, and :meth:`IntervalIndex.overlaps` for determining overlaps between interval-like objects (:issue:`21998`) - :func:`~DataFrame.to_parquet` now supports writing a ``DataFrame`` as a directory of parquet files partitioned by a subset of the columns when ``engine = 'pyarrow'`` (:issue:`23283`) -- :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have gained the ``nonexistent`` argument for alternative handling of nonexistent times. See :ref:`timeseries.timezone_nonexsistent` (:issue:`8917`) +- :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have gained the ``nonexistent`` argument for alternative handling of nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`8917`) - :meth:`read_excel()` now accepts ``usecols`` as a list of column names or callable (:issue:`18273`) .. _whatsnew_0240.api_breaking: @@ -283,10 +282,10 @@ and replaced it with references to `pyarrow` (:issue:`21639` and :issue:`23053`) .. _whatsnew_0240.api_breaking.csv_line_terminator: `os.linesep` is used for ``line_terminator`` of ``DataFrame.to_csv`` -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ :func:`DataFrame.to_csv` now uses :func:`os.linesep` rather than ``'\n'`` - for the default line terminator (:issue:`20353`). +for the default line terminator (:issue:`20353`). This change only affects when running on Windows, where ``'\r\n'`` was used for line terminator even when ``'\n'`` was passed in ``line_terminator``. @@ -294,26 +293,26 @@ Previous Behavior on Windows: .. code-block:: ipython -In [1]: data = pd.DataFrame({ - ...: "string_with_lf": ["a\nbc"], - ...: "string_with_crlf": ["a\r\nbc"] - ...: }) + In [1]: data = pd.DataFrame({ + ...: "string_with_lf": ["a\nbc"], + ...: "string_with_crlf": ["a\r\nbc"] + ...: }) -In [2]: # When passing file PATH to to_csv, line_terminator does not work, and csv is saved with '\r\n'. - ...: # Also, this converts all '\n's in the data to '\r\n'. - ...: data.to_csv("test.csv", index=False, line_terminator='\n') + In [2]: # When passing file PATH to to_csv, line_terminator does not work, and csv is saved with '\r\n'. + ...: # Also, this converts all '\n's in the data to '\r\n'. + ...: data.to_csv("test.csv", index=False, line_terminator='\n') -In [3]: with open("test.csv", mode='rb') as f: - ...: print(f.read()) -b'string_with_lf,string_with_crlf\r\n"a\r\nbc","a\r\r\nbc"\r\n' + In [3]: with open("test.csv", mode='rb') as f: + ...: print(f.read()) + b'string_with_lf,string_with_crlf\r\n"a\r\nbc","a\r\r\nbc"\r\n' -In [4]: # When passing file OBJECT with newline option to to_csv, line_terminator works. - ...: with open("test2.csv", mode='w', newline='\n') as f: - ...: data.to_csv(f, index=False, line_terminator='\n') + In [4]: # When passing file OBJECT with newline option to to_csv, line_terminator works. + ...: with open("test2.csv", mode='w', newline='\n') as f: + ...: data.to_csv(f, index=False, line_terminator='\n') -In [5]: with open("test2.csv", mode='rb') as f: - ...: print(f.read()) -b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n' + In [5]: with open("test2.csv", mode='rb') as f: + ...: print(f.read()) + b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n' New Behavior on Windows: @@ -322,54 +321,54 @@ New Behavior on Windows: - The value of ``line_terminator`` only affects the line terminator of CSV, so it does not change the value inside the data. -.. code-block:: ipython + .. code-block:: ipython -In [1]: data = pd.DataFrame({ - ...: "string_with_lf": ["a\nbc"], - ...: "string_with_crlf": ["a\r\nbc"] - ...: }) + In [1]: data = pd.DataFrame({ + ...: "string_with_lf": ["a\nbc"], + ...: "string_with_crlf": ["a\r\nbc"] + ...: }) -In [2]: data.to_csv("test.csv", index=False, line_terminator='\n') + In [2]: data.to_csv("test.csv", index=False, line_terminator='\n') -In [3]: with open("test.csv", mode='rb') as f: - ...: print(f.read()) -b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n' + In [3]: with open("test.csv", mode='rb') as f: + ...: print(f.read()) + b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n' - On Windows, the value of ``os.linesep`` is ``'\r\n'``, so if ``line_terminator`` is not set, ``'\r\n'`` is used for line terminator. - Again, it does not affect the value inside the data. -.. code-block:: ipython + .. code-block:: ipython -In [1]: data = pd.DataFrame({ - ...: "string_with_lf": ["a\nbc"], - ...: "string_with_crlf": ["a\r\nbc"] - ...: }) + In [1]: data = pd.DataFrame({ + ...: "string_with_lf": ["a\nbc"], + ...: "string_with_crlf": ["a\r\nbc"] + ...: }) -In [2]: data.to_csv("test.csv", index=False) + In [2]: data.to_csv("test.csv", index=False) -In [3]: with open("test.csv", mode='rb') as f: - ...: print(f.read()) -b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n' + In [3]: with open("test.csv", mode='rb') as f: + ...: print(f.read()) + b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n' - For files objects, specifying ``newline`` is not sufficient to set the line terminator. You must pass in the ``line_terminator`` explicitly, even in this case. -.. code-block:: ipython + .. code-block:: ipython -In [1]: data = pd.DataFrame({ - ...: "string_with_lf": ["a\nbc"], - ...: "string_with_crlf": ["a\r\nbc"] - ...: }) + In [1]: data = pd.DataFrame({ + ...: "string_with_lf": ["a\nbc"], + ...: "string_with_crlf": ["a\r\nbc"] + ...: }) -In [2]: with open("test2.csv", mode='w', newline='\n') as f: - ...: data.to_csv(f, index=False) + In [2]: with open("test2.csv", mode='w', newline='\n') as f: + ...: data.to_csv(f, index=False) -In [3]: with open("test2.csv", mode='rb') as f: - ...: print(f.read()) -b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n' + In [3]: with open("test2.csv", mode='rb') as f: + ...: print(f.read()) + b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n' .. _whatsnew_0240.api_breaking.interval_values: @@ -777,17 +776,20 @@ Previous Behavior: df = pd.DataFrame(arr) .. ipython:: python + # Comparison operations and arithmetic operations both broadcast. df == arr[[0], :] df + arr[[0], :] .. ipython:: python + # Comparison operations and arithmetic operations both broadcast. df == (1, 2) df + (1, 2) .. ipython:: python :okexcept: + # Comparison operations and arithmetic opeartions both raise ValueError. df == (1, 2, 3) df + (1, 2, 3) @@ -797,8 +799,9 @@ Previous Behavior: DataFrame Arithmetic Operations Broadcasting Changes ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + :class:`DataFrame` arithmetic operations when operating with 2-dimensional -``np.ndarray`` objects now broadcast in the same way as ``np.ndarray``s +``np.ndarray`` objects now broadcast in the same way as ``np.ndarray`` broadcast. (:issue:`23000`) Previous Behavior: @@ -817,11 +820,13 @@ Previous Behavior: *Current Behavior*: .. ipython:: python + arr = np.arange(6).reshape(3, 2) df = pd.DataFrame(arr) df .. ipython:: python + df + arr[[0], :] # 1 row, 2 columns df + arr[:, [1]] # 1 column, 3 rows @@ -888,7 +893,7 @@ Current Behavior: ... OverflowError: Trying to coerce negative values to unsigned integers -.. _whatsnew_0240.api.crosstab_dtypes +.. _whatsnew_0240.api.crosstab_dtypes: Crosstab Preserves Dtypes ^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -1008,6 +1013,7 @@ Current Behavior: .. ipython:: python :okwarning: + per = pd.Period('2016Q1') per + 3 diff --git a/pandas/_libs/tslibs/timedeltas.pyx b/pandas/_libs/tslibs/timedeltas.pyx index c09a8e5b395ee1..ca8491726a5f73 100644 --- a/pandas/_libs/tslibs/timedeltas.pyx +++ b/pandas/_libs/tslibs/timedeltas.pyx @@ -1111,14 +1111,14 @@ class Timedelta(_Timedelta): Parameters ---------- value : Timedelta, timedelta, np.timedelta64, string, or integer - unit : string, {'Y', 'M', 'W', 'D', 'days', 'day', - 'hours', hour', 'hr', 'h', 'm', 'minute', 'min', 'minutes', - 'T', 'S', 'seconds', 'sec', 'second', 'ms', - 'milliseconds', 'millisecond', 'milli', 'millis', 'L', - 'us', 'microseconds', 'microsecond', 'micro', 'micros', - 'U', 'ns', 'nanoseconds', 'nano', 'nanos', 'nanosecond' - 'N'}, optional + unit : str, optional Denote the unit of the input, if input is an integer. Default 'ns'. + Possible values: + {'Y', 'M', 'W', 'D', 'days', 'day', 'hours', hour', 'hr', 'h', + 'm', 'minute', 'min', 'minutes', 'T', 'S', 'seconds', 'sec', 'second', + 'ms', 'milliseconds', 'millisecond', 'milli', 'millis', 'L', + 'us', 'microseconds', 'microsecond', 'micro', 'micros', 'U', + 'ns', 'nanoseconds', 'nano', 'nanos', 'nanosecond', 'N'} days, seconds, microseconds, milliseconds, minutes, hours, weeks : numeric, optional Values for construction in compat with datetime.timedelta. diff --git a/pandas/core/frame.py b/pandas/core/frame.py index f8d153327f1358..f6f91ff5081f13 100644 --- a/pandas/core/frame.py +++ b/pandas/core/frame.py @@ -3409,6 +3409,7 @@ def assign(self, **kwargs): Berkeley 25.0 Where the value is a callable, evaluated on `df`: + >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32) temp_c temp_f Portland 17.0 62.6 @@ -3416,6 +3417,7 @@ def assign(self, **kwargs): Alternatively, the same behavior can be achieved by directly referencing an existing Series or sequence: + >>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32) temp_c temp_f Portland 17.0 62.6 @@ -3424,6 +3426,7 @@ def assign(self, **kwargs): In Python 3.6+, you can create multiple columns within the same assign where one of the columns depends on another one defined within the same assign: + >>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32, ... temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9) temp_c temp_f temp_k diff --git a/pandas/core/generic.py b/pandas/core/generic.py index 2c7f6ae8e35330..2cb3df421325e0 100644 --- a/pandas/core/generic.py +++ b/pandas/core/generic.py @@ -6508,16 +6508,16 @@ def interpolate(self, method='linear', axis=0, limit=None, inplace=False, def asof(self, where, subset=None): """ - Return the last row(s) without any `NaN`s before `where`. + Return the last row(s) without any NaNs before `where`. The last row (for each element in `where`, if list) without any - `NaN` is taken. - In case of a :class:`~pandas.DataFrame`, the last row without `NaN` + NaN is taken. + In case of a :class:`~pandas.DataFrame`, the last row without NaN considering only the subset of columns (if not `None`) .. versionadded:: 0.19.0 For DataFrame - If there is no good value, `NaN` is returned for a Series or + If there is no good value, NaN is returned for a Series or a Series of NaN values for a DataFrame Parameters @@ -6526,7 +6526,7 @@ def asof(self, where, subset=None): Date(s) before which the last row(s) are returned. subset : str or array-like of str, default `None` For DataFrame, if not `None`, only use these columns to - check for `NaN`s. + check for NaNs. Notes ----- @@ -6562,7 +6562,7 @@ def asof(self, where, subset=None): 2.0 For a sequence `where`, a Series is returned. The first value is - ``NaN``, because the first element of `where` is before the first + NaN, because the first element of `where` is before the first index value. >>> s.asof([5, 20]) @@ -6571,7 +6571,7 @@ def asof(self, where, subset=None): dtype: float64 Missing values are not considered. The following is ``2.0``, not - ``NaN``, even though ``NaN`` is at the index location for ``30``. + NaN, even though NaN is at the index location for ``30``. >>> s.asof(30) 2.0 diff --git a/pandas/core/indexes/datetimelike.py b/pandas/core/indexes/datetimelike.py index 4547f47314bada..2ce6a0ec2a7a46 100644 --- a/pandas/core/indexes/datetimelike.py +++ b/pandas/core/indexes/datetimelike.py @@ -91,6 +91,8 @@ class TimelikeOps(object): :ref:`frequency aliases ` for a list of possible `freq` values. ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise' + Only relevant for DatetimeIndex: + - 'infer' will attempt to infer fall dst-transition hours based on order - bool-ndarray where True signifies a DST time, False designates @@ -99,7 +101,6 @@ class TimelikeOps(object): - 'NaT' will return NaT where there are ambiguous times - 'raise' will raise an AmbiguousTimeError if there are ambiguous times - Only relevant for DatetimeIndex .. versionadded:: 0.24.0 nonexistent : 'shift', 'NaT', default 'raise' diff --git a/pandas/core/tools/timedeltas.py b/pandas/core/tools/timedeltas.py index fad136b3b5a458..58673c80c0c552 100644 --- a/pandas/core/tools/timedeltas.py +++ b/pandas/core/tools/timedeltas.py @@ -21,14 +21,14 @@ def to_timedelta(arg, unit='ns', box=True, errors='raise'): Parameters ---------- arg : string, timedelta, list, tuple, 1-d array, or Series - unit : string, {'Y', 'M', 'W', 'D', 'days', 'day', - 'hours', hour', 'hr', 'h', 'm', 'minute', 'min', 'minutes', - 'T', 'S', 'seconds', 'sec', 'second', 'ms', - 'milliseconds', 'millisecond', 'milli', 'millis', 'L', - 'us', 'microseconds', 'microsecond', 'micro', 'micros', - 'U', 'ns', 'nanoseconds', 'nano', 'nanos', 'nanosecond' - 'N'}, optional + unit : str, optional Denote the unit of the input, if input is an integer. Default 'ns'. + Possible values: + {'Y', 'M', 'W', 'D', 'days', 'day', 'hours', hour', 'hr', 'h', + 'm', 'minute', 'min', 'minutes', 'T', 'S', 'seconds', 'sec', 'second', + 'ms', 'milliseconds', 'millisecond', 'milli', 'millis', 'L', + 'us', 'microseconds', 'microsecond', 'micro', 'micros', 'U', + 'ns', 'nanoseconds', 'nano', 'nanos', 'nanosecond', 'N'} box : boolean, default True - If True returns a Timedelta/TimedeltaIndex of the results - if False returns a np.timedelta64 or ndarray of values of dtype