Merge remote-tracking branch 'upstream/master' into timestamp_tz_constructor_depr
Matt Roeschke committed Nov 15, 2018
2 parents 656beff + e98032d commit 96da473
Showing 32 changed files with 1,227 additions and 727 deletions.
19 changes: 6 additions & 13 deletions doc/source/contributing.rst
@@ -591,21 +591,14 @@ run this slightly modified command::

git diff master --name-only -- "*.py" | grep "pandas/" | xargs flake8

Note that on Windows, these commands are unfortunately not possible because
commands like ``grep`` and ``xargs`` are not available natively. To imitate the
behavior with the commands above, you should run::
Windows does not support the ``grep`` and ``xargs`` commands (unless installed
for example via the `MinGW <http://www.mingw.org/>`__ toolchain), but one can
imitate the behaviour as follows::

git diff master --name-only -- "*.py"
for /f %i in ('git diff upstream/master --name-only ^| findstr pandas/') do flake8 %i

This will list all of the Python files that have been modified. The only ones
that matter during linting are any whose directory filepath begins with "pandas."
For each filepath, copy and paste it after the ``flake8`` command as shown below:

flake8 <python-filepath>

Alternatively, you can install the ``grep`` and ``xargs`` commands via the
`MinGW <http://www.mingw.org/>`__ toolchain, and it will allow you to run the
commands above.
This will also get all the files being changed by the PR (and within the
``pandas/`` folder), and run ``flake8`` on them one after the other.
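
As a rough cross-platform alternative to the shell one-liners above, the same idea can be sketched in Python. This is only an illustration, not part of the pandas tooling: the helper names ``select_lint_targets`` and ``lint_changed_files`` are hypothetical, and the sketch assumes that ``git`` and ``flake8`` are available on ``PATH`` and that an ``upstream`` remote exists.

```python
import subprocess


def select_lint_targets(changed_paths):
    """Keep only changed Python files that live under the ``pandas/`` folder.

    Note: this matches on the path prefix, whereas ``grep "pandas/"`` and
    ``findstr pandas/`` match the substring anywhere in the path.
    """
    return [path for path in changed_paths
            if path.startswith("pandas/") and path.endswith(".py")]


def lint_changed_files(base="upstream/master"):
    """Run flake8 on each changed file, one after the other (hypothetical helper)."""
    completed = subprocess.run(
        ["git", "diff", base, "--name-only", "--", "*.py"],
        check=True, stdout=subprocess.PIPE, universal_newlines=True)
    for path in select_lint_targets(completed.stdout.splitlines()):
        subprocess.run(["flake8", path])
```

Calling ``lint_changed_files()`` from the repository root would then behave roughly like the ``for /f`` loop, without depending on ``grep``, ``xargs`` or ``findstr``.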

.. _contributing.import-formatting:

51 changes: 49 additions & 2 deletions doc/source/whatsnew/v0.24.0.rst
@@ -24,7 +24,8 @@ New features
the user to override the engine's default behavior to include or omit the
dataframe's indexes from the resulting Parquet file. (:issue:`20768`)
- :meth:`DataFrame.corr` and :meth:`Series.corr` now accept a callable for generic calculation methods of correlation, e.g. histogram intersection (:issue:`22684`)

- :func:`DataFrame.to_string` now accepts ``decimal`` as an argument, allowing
the user to specify which decimal separator should be used in the output. (:issue:`23614`)

.. _whatsnew_0240.enhancements.extension_array_operators:

@@ -183,6 +184,47 @@ array, but rather an ``ExtensionArray``:
This is the same behavior as ``Series.values`` for categorical data. See
:ref:`whatsnew_0240.api_breaking.interval_values` for more.

.. _whatsnew_0240.enhancements.join_with_two_multiindexes:

Joining with two multi-indexes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`DataFrame.merge` and :func:`DataFrame.join` can now be used to join multi-indexed ``DataFrame`` instances on the overlapping index levels (:issue:`6360`)

See the :ref:`Merge, join, and concatenate
<merging.Join_with_two_multi_indexes>` documentation section.

.. ipython:: python

   index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
                                           ('K1', 'X2')],
                                          names=['key', 'X'])
   left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                        'B': ['B0', 'B1', 'B2']},
                       index=index_left)
   index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
                                            ('K2', 'Y2'), ('K2', 'Y3')],
                                           names=['key', 'Y'])
   right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                         'D': ['D0', 'D1', 'D2', 'D3']},
                        index=index_right)
   left.join(right)

For earlier versions this can be done using the following.

.. ipython:: python

   pd.merge(left.reset_index(), right.reset_index(),
            on=['key'], how='inner').set_index(['key', 'X', 'Y'])

.. _whatsnew_0240.enhancements.rename_axis:

Renaming names in a MultiIndex
@@ -961,6 +1003,7 @@ Other API Changes
- :class:`DateOffset` attribute `_cacheable` and method `_should_cache` have been removed (:issue:`23118`)
- Comparing :class:`Timedelta` to be less or greater than unknown types now raises a ``TypeError`` instead of returning ``False`` (:issue:`20829`)
- :meth:`Index.hasnans` and :meth:`Series.hasnans` now always return a python boolean. Previously, a python or a numpy boolean could be returned, depending on circumstances (:issue:`23294`).
- The order of the arguments of :func:`DataFrame.to_html` and :func:`DataFrame.to_string` is rearranged to be consistent with each other. (:issue:`23614`)

.. _whatsnew_0240.deprecations:

@@ -981,6 +1024,7 @@ Deprecations
- The ``fastpath`` keyword of the different Index constructors is deprecated (:issue:`23110`).
- :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have deprecated the ``errors`` argument in favor of the ``nonexistent`` argument (:issue:`8917`)
- The class ``FrozenNDArray`` has been deprecated. When unpickling, ``FrozenNDArray`` will be unpickled to ``np.ndarray`` once this class is removed (:issue:`9031`)
- The methods :meth:`DataFrame.update` and :meth:`Panel.update` have deprecated the ``raise_conflict=False|True`` keyword in favor of ``errors='ignore'|'raise'`` (:issue:`23585`)
- Deprecated the `nthreads` keyword of :func:`pandas.read_feather` in favor of
`use_threads` to reflect the changes in pyarrow 0.11.0. (:issue:`23053`)
- :func:`pandas.read_excel` has deprecated accepting ``usecols`` as an integer. Please pass in a list of ints from 0 to ``usecols`` inclusive instead (:issue:`23527`)
@@ -1320,7 +1364,9 @@ Notice how we now instead output ``np.nan`` itself instead of a stringified form
- :func:`read_sas()` will correctly parse sas7bdat files with many columns (:issue:`22628`)
- :func:`read_sas()` will correctly parse sas7bdat files with data page types having also bit 7 set (so page type is 128 + 256 = 384) (:issue:`16615`)
- Bug in :meth:`detect_client_encoding` where potential ``IOError`` goes unhandled when importing in a mod_wsgi process due to restricted access to stdout. (:issue:`21552`)
- Bug in :func:`to_string()` that broke column alignment when ``index=False`` and width of first column's values is greater than the width of first column's header (:issue:`16839`, :issue:`13032`)
- Bug in :func:`to_html()` with ``index=False`` misses truncation indicators (...) on truncated DataFrame (:issue:`15019`, :issue:`22783`)
- Bug in :func:`DataFrame.to_string()` that broke column alignment when ``index=False`` and width of first column's values is greater than the width of first column's header (:issue:`16839`, :issue:`13032`)
- Bug in :func:`DataFrame.to_string()` that caused representations of :class:`DataFrame` to not take up the whole window (:issue:`22984`)
- Bug in :func:`DataFrame.to_csv` where a single level MultiIndex incorrectly wrote a tuple. Now just the value of the index is written (:issue:`19589`).
- Bug in :meth:`HDFStore.append` when appending a :class:`DataFrame` with an empty string column and ``min_itemsize`` < 8 (:issue:`12242`)
- Bug in :meth:`read_csv()` in which :class:`MultiIndex` index names were being improperly handled in the cases when they were not provided (:issue:`23484`)
@@ -1373,6 +1419,7 @@ Reshaping
- Bug in :func:`pandas.concat` when concatenating a multicolumn DataFrame with tz-aware data against a DataFrame with a different number of columns (:issue:`22796`)
- Bug in :func:`merge_asof` where confusing error message raised when attempting to merge with missing values (:issue:`23189`)
- Bug in :meth:`DataFrame.nsmallest` and :meth:`DataFrame.nlargest` for dataframes that have a :class:`MultiIndex` for columns (:issue:`23033`).
- Bug in :meth:`DataFrame.append` with a :class:`Series` with a dateutil timezone would raise a ``TypeError`` (:issue:`23682`)

.. _whatsnew_0240.bug_fixes.sparse:

31 changes: 20 additions & 11 deletions doc/sphinxext/contributors.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
"""
from docutils import nodes
from docutils.parsers.rst import Directive
import git

from announce import build_components

@@ -19,17 +20,25 @@ class ContributorsDirective(Directive):
name = 'contributors'

def run(self):
components = build_components(self.arguments[0])

message = nodes.paragraph()
message += nodes.Text(components['author_message'])

listnode = nodes.bullet_list()

for author in components['authors']:
para = nodes.paragraph()
para += nodes.Text(author)
listnode += nodes.list_item('', para)
range_ = self.arguments[0]
try:
components = build_components(range_)
except git.GitCommandError:
return [
self.state.document.reporter.warning(
"Cannot find contributors for range '{}'".format(range_),
line=self.lineno)
]
else:
message = nodes.paragraph()
message += nodes.Text(components['author_message'])

listnode = nodes.bullet_list()

for author in components['authors']:
para = nodes.paragraph()
para += nodes.Text(author)
listnode += nodes.list_item('', para)

return [message, listnode]

23 changes: 12 additions & 11 deletions pandas/_libs/lib.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -48,8 +48,7 @@ cdef extern from "src/parse_helper.h":
int floatify(object, float64_t *result, int *maybe_int) except -1

cimport util
from util cimport (is_nan,
UINT8_MAX, UINT64_MAX, INT64_MAX, INT64_MIN)
from util cimport is_nan, UINT64_MAX, INT64_MAX, INT64_MIN

from tslib import array_to_datetime
from tslibs.nattype cimport NPY_NAT
@@ -1642,20 +1641,22 @@ def is_datetime_with_singletz_array(values: ndarray) -> bool:

if n == 0:
return False

# Get a reference timezone to compare with the rest of the tzs in the array
for i in range(n):
base_val = values[i]
if base_val is not NaT:
base_tz = get_timezone(getattr(base_val, 'tzinfo', None))

for j in range(i, n):
val = values[j]
if val is not NaT:
tz = getattr(val, 'tzinfo', None)
if not tz_compare(base_tz, tz):
return False
break

for j in range(i, n):
# Compare val's timezone with the reference timezone
# NaT can coexist with tz-aware datetimes, so skip if encountered
val = values[j]
if val is not NaT:
tz = getattr(val, 'tzinfo', None)
if not tz_compare(base_tz, tz):
return False

return True
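
The two-pass logic in this hunk (find a reference timezone, then compare every remaining non-NaT value against it) can be sketched in plain Python. This is a simplified illustration rather than the Cython implementation: ``NAT`` is a hypothetical stand-in for pandas' ``NaT``, ``has_single_tz`` is a made-up name, and ``tzinfo`` objects are compared with ``!=`` instead of the ``get_timezone`` and ``tz_compare`` helpers used in ``lib.pyx``.

```python
from datetime import datetime, timedelta, timezone

NAT = None  # hypothetical stand-in for pandas' NaT sentinel


def has_single_tz(values):
    """Return True if all non-missing datetimes in ``values`` share one tzinfo."""
    if len(values) == 0:
        return False

    # First pass: take the timezone of the first non-missing value as reference.
    base_tz = None
    start = len(values)
    for i, base_val in enumerate(values):
        if base_val is not NAT:
            base_tz = base_val.tzinfo
            start = i
            break

    # Second pass: compare the remaining values with the reference timezone.
    # Missing values can coexist with tz-aware datetimes, so they are skipped.
    for val in values[start:]:
        if val is not NAT and val.tzinfo != base_tz:
            return False
    return True
```

Unlike a set-based check over ``tzinfo`` attributes, this approach does not count a missing value as a distinct timezone, which appears to be the reason ``maybe_convert_objects`` is switched to ``is_datetime_with_singletz_array`` in the next hunk.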


@@ -2045,7 +2046,7 @@ def maybe_convert_objects(ndarray[object] objects, bint try_float=0,

# we try to coerce datetime w/tz but must all have the same tz
if seen.datetimetz_:
if len({getattr(val, 'tzinfo', None) for val in objects}) == 1:
if is_datetime_with_singletz_array(objects):
from pandas import DatetimeIndex
return DatetimeIndex(objects)
seen.object_ = 1
1 change: 0 additions & 1 deletion pandas/core/arrays/categorical.py
Original file line number Diff line number Diff line change
Expand Up @@ -2435,7 +2435,6 @@ class CategoricalAccessor(PandasDelegate, PandasObject, NoNewAttributesMixin):
>>> s.cat.set_categories(list('abcde'))
>>> s.cat.as_ordered()
>>> s.cat.as_unordered()
"""

def __init__(self, data):
1 change: 0 additions & 1 deletion pandas/core/arrays/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -764,7 +764,6 @@ def tz_localize(self, tz, ambiguous='raise', nonexistent='raise',
1 2018-10-28 02:36:00+02:00
2 2018-10-28 03:46:00+01:00
dtype: datetime64[ns, CET]
"""
if errors is not None:
warnings.warn("The errors argument is deprecated and will be "
68 changes: 37 additions & 31 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -2035,24 +2035,21 @@ def to_parquet(self, fname, engine='auto', compression='snappy',
def to_string(self, buf=None, columns=None, col_space=None, header=True,
index=True, na_rep='NaN', formatters=None, float_format=None,
sparsify=None, index_names=True, justify=None,
line_width=None, max_rows=None, max_cols=None,
show_dimensions=False):
max_rows=None, max_cols=None, show_dimensions=False,
decimal='.', line_width=None):
"""
Render a DataFrame to a console-friendly tabular output.
%(shared_params)s
line_width : int, optional
Width to wrap a line in characters.
%(returns)s
See Also
--------
to_html : Convert DataFrame to HTML.
Examples
--------
>>> d = {'col1' : [1, 2, 3], 'col2' : [4, 5, 6]}
>>> d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
>>> df = pd.DataFrame(d)
>>> print(df.to_string())
col1 col2
@@ -2068,42 +2065,37 @@ def to_string(self, buf=None, columns=None, col_space=None, header=True,
sparsify=sparsify, justify=justify,
index_names=index_names,
header=header, index=index,
line_width=line_width,
max_rows=max_rows,
max_cols=max_cols,
show_dimensions=show_dimensions)
show_dimensions=show_dimensions,
decimal=decimal,
line_width=line_width)
formatter.to_string()

if buf is None:
result = formatter.buf.getvalue()
return result

@Substitution(header='whether to print column labels, default True')
@Substitution(header='Whether to print column labels, default True')
@Substitution(shared_params=fmt.common_docstring,
returns=fmt.return_docstring)
def to_html(self, buf=None, columns=None, col_space=None, header=True,
index=True, na_rep='NaN', formatters=None, float_format=None,
sparsify=None, index_names=True, justify=None, bold_rows=True,
classes=None, escape=True, max_rows=None, max_cols=None,
show_dimensions=False, notebook=False, decimal='.',
border=None, table_id=None):
sparsify=None, index_names=True, justify=None, max_rows=None,
max_cols=None, show_dimensions=False, decimal='.',
bold_rows=True, classes=None, escape=True,
notebook=False, border=None, table_id=None):
"""
Render a DataFrame as an HTML table.
%(shared_params)s
bold_rows : boolean, default True
Make the row labels bold in the output
bold_rows : bool, default True
Make the row labels bold in the output.
classes : str or list or tuple, default None
CSS class(es) to apply to the resulting html table
escape : boolean, default True
CSS class(es) to apply to the resulting html table.
escape : bool, default True
Convert the characters <, >, and & to HTML-safe sequences.
notebook : {True, False}, default False
Whether the generated HTML is for IPython Notebook.
decimal : string, default '.'
Character recognized as decimal separator, e.g. ',' in Europe
.. versionadded:: 0.18.0
border : int
A ``border=border`` attribute is included in the opening
`<table>` tag. Default ``pd.options.html.border``.
@@ -2114,9 +2106,7 @@ def to_html(self, buf=None, columns=None, col_space=None, header=True,
A css id is included in the opening `<table>` tag if specified.
.. versionadded:: 0.23.0
%(returns)s
See Also
--------
to_string : Convert DataFrame to a string.
@@ -5213,8 +5203,10 @@ def combiner(x, y):

return self.combine(other, combiner, overwrite=False)

@deprecate_kwarg(old_arg_name='raise_conflict', new_arg_name='errors',
mapping={False: 'ignore', True: 'raise'})
def update(self, other, join='left', overwrite=True, filter_func=None,
raise_conflict=False):
errors='ignore'):
"""
Modify in place using non-NA values from another DataFrame.
@@ -5238,17 +5230,28 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
* False: only update values that are NA in
the original DataFrame.
filter_func : callable(1d-array) -> boolean 1d-array, optional
filter_func : callable(1d-array) -> bool 1d-array, optional
Can choose to replace values other than NA. Return True for values
that should be updated.
raise_conflict : bool, default False
If True, will raise a ValueError if the DataFrame and `other`
errors : {'raise', 'ignore'}, default 'ignore'
If 'raise', will raise a ValueError if the DataFrame and `other`
both contain non-NA data in the same place.
.. versionchanged:: 0.24.0
Changed from `raise_conflict=False|True`
to `errors='ignore'|'raise'`.
Returns
-------
None : method directly changes calling object
Raises
------
ValueError
When `raise_conflict` is True and there's overlapping non-NA data.
* When `errors='raise'` and there's overlapping non-NA data.
* When `errors` is not either `'ignore'` or `'raise'`
NotImplementedError
* If `join != 'left'`
See Also
--------
@@ -5319,6 +5322,9 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
# TODO: Support other joins
if join != 'left': # pragma: no cover
raise NotImplementedError("Only left join is supported")
if errors not in ['ignore', 'raise']:
raise ValueError("The parameter errors must be either "
"'ignore' or 'raise'")

if not isinstance(other, DataFrame):
other = DataFrame(other)
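
To illustrate how the ``raise_conflict`` to ``errors`` deprecation fits together with the new validation, here is a minimal standard-library sketch. It is not pandas' actual ``deprecate_kwarg`` implementation: ``deprecate_kwarg_sketch`` is a hypothetical, simplified stand-in, and the toy ``update`` function only returns the resolved ``errors`` value.

```python
import functools
import warnings


def deprecate_kwarg_sketch(old_arg_name, new_arg_name, mapping=None):
    """Hypothetical, simplified stand-in for a keyword-deprecation decorator."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_arg_name in kwargs:
                if new_arg_name in kwargs:
                    raise TypeError("Can only specify '{}' or '{}', not both"
                                    .format(old_arg_name, new_arg_name))
                old_value = kwargs.pop(old_arg_name)
                # Translate the old value (e.g. True) to the new one (e.g. 'raise').
                new_value = (old_value if mapping is None
                             else mapping.get(old_value, old_value))
                warnings.warn("the '{}' keyword is deprecated, use '{}' instead"
                              .format(old_arg_name, new_arg_name),
                              FutureWarning, stacklevel=2)
                kwargs[new_arg_name] = new_value
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecate_kwarg_sketch('raise_conflict', 'errors',
                        mapping={False: 'ignore', True: 'raise'})
def update(errors='ignore'):
    """Toy function mirroring only the argument validation shown in the diff."""
    if errors not in ['ignore', 'raise']:
        raise ValueError("The parameter errors must be either "
                         "'ignore' or 'raise'")
    return errors
```

With this sketch, ``update(raise_conflict=True)`` emits a ``FutureWarning`` and behaves like ``update(errors='raise')``, while any other value for ``errors`` is rejected with a ``ValueError``.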
@@ -5332,7 +5338,7 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
with np.errstate(all='ignore'):
mask = ~filter_func(this) | isna(that)
else:
if raise_conflict:
if errors == 'raise':
mask_this = notna(that)
mask_that = notna(this)
if any(mask_this & mask_that):