ExtensionBlock.take_nd crashes in 1.1.0 #35768

vmarkovtsev · 2020-08-17T11:57:26Z

Line 1718 in 934e9f8

new_values = self.values.take(indexer, fill_value=fill_value, allow_fill=True)

In my case self.values is a pd.Series of pd.Timestamp values, and Series.take does not accept fill_value or allow_fill, so the kwargs check fails.

athenian/api/controllers/miners/github/branches.py:34: in extract_branches
    for repo, repo_branches in branches.groupby(Branch.repository_full_name.key, sort=False):
/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/ops.py:133: in get_iterator
    for key, (i, group) in zip(keys, splitter):
/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/ops.py:935: in __iter__
    sdata = self._get_sorted_data()
/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/ops.py:948: in _get_sorted_data
    return self.data.take(self.sort_idx, axis=self.axis)
/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:3341: in take
    new_data = self._mgr.take(
/usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py:1414: in take
    return self.reindex_indexer(
/usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py:1251: in reindex_indexer
    new_blocks = [
/usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py:1252: in <listcomp>
    blk.take_nd(
/usr/local/lib/python3.8/dist-packages/pandas/core/internals/blocks.py:1720: in take_nd
    new_values = self.values.take(indexer, fill_value=fill_value, allow_fill=True)
/usr/local/lib/python3.8/dist-packages/pandas/core/series.py:829: in take
    nv.validate_take(tuple(), kwargs)
/usr/local/lib/python3.8/dist-packages/pandas/compat/numpy/function.py:68: in __call__
    validate_kwargs(fname, kwargs, self.defaults)
/usr/local/lib/python3.8/dist-packages/pandas/util/_validators.py:148: in validate_kwargs
    _check_for_invalid_keys(fname, kwargs, compat_args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

fname = 'take', kwargs = {'allow_fill': True, 'fill_value': numpy.datetime64('NaT')}, compat_args = OrderedDict([('out', None), ('mode', 'raise')])

    def _check_for_invalid_keys(fname, kwargs, compat_args):
        """
        Checks whether 'kwargs' contains any keys that are not
        in 'compat_args' and raises a TypeError if there is one.
        """
        # set(dict) --> set of the dictionary's keys
        diff = set(kwargs) - set(compat_args)
    
        if diff:
            bad_arg = list(diff)[0]
>           raise TypeError(f"{fname}() got an unexpected keyword argument '{bad_arg}'")
E           TypeError: take() got an unexpected keyword argument 'allow_fill'

/usr/local/lib/python3.8/dist-packages/pandas/util/_validators.py:122: TypeError
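
For readers who want to see the failing check in isolation, here is a minimal pure-Python sketch of the validation step from the traceback above. It mirrors the `_check_for_invalid_keys` logic shown there, using the `fname`, `kwargs` and `compat_args` values printed by pytest. Two things are assumptions for illustration: `None` stands in for `numpy.datetime64('NaT')` so the snippet needs no third-party imports, and the offending key is picked with `sorted()` rather than `list(diff)[0]` so the message is deterministic.

```python
from collections import OrderedDict


def check_for_invalid_keys(fname, kwargs, compat_args):
    """Raise TypeError if kwargs has a key that compat_args does not allow."""
    # set(dict) gives the set of the dictionary's keys
    diff = set(kwargs) - set(compat_args)
    if diff:
        # Assumption: sorted() instead of list(diff)[0], for a stable message.
        bad_arg = sorted(diff)[0]
        raise TypeError(f"{fname}() got an unexpected keyword argument '{bad_arg}'")


# Keyword arguments that the numpy-compatible Series.take signature accepts,
# as printed in the traceback.
compat_args = OrderedDict([("out", None), ("mode", "raise")])

# What ExtensionBlock.take_nd passed along; None is a stand-in for NaT.
kwargs = {"allow_fill": True, "fill_value": None}

try:
    check_for_invalid_keys("take", kwargs, compat_args)
except TypeError as exc:
    message = str(exc)
else:
    message = None

print(message)
```

The point of the sketch is only that the check rejects any keyword outside `out` and `mode`, so the call fails before any data is touched.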

(Pdb++) self.values
0   2019-11-01 09:08:16+00:00
1   2017-01-30 18:04:00+00:00
2   2016-12-05 10:59:00+00:00
3   2019-05-16 11:16:00+00:00
Name: commit_date, dtype: datetime64[ns, UTC]

The Series ended up in self.values through this call chain:

-> fc.replace(0, pd.NaT, inplace=True)
[50]   /usr/local/lib/python3.8/dist-packages/pandas/core/series.py(4563)replace()
-> return super().replace(
[51]   /usr/local/lib/python3.8/dist-packages/pandas/core/generic.py(6583)replace()
-> return self._update_inplace(result)
[52]   /usr/local/lib/python3.8/dist-packages/pandas/core/generic.py(3955)_update_inplace()
-> self._maybe_update_cacher(verify_is_copy=verify_is_copy)
[53]   /usr/local/lib/python3.8/dist-packages/pandas/core/generic.py(3235)_maybe_update_cacher()
-> ref._maybe_cache_changed(cacher[0], self)
[54]   /usr/local/lib/python3.8/dist-packages/pandas/core/generic.py(3196)_maybe_cache_changed()
-> self._mgr.iset(loc, value)
[55]   /usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py(1066)iset()
-> blk.set(blk_locs, value_getitem(val_locs))
[56] > /usr/local/lib/python3.8/dist-packages/pandas/core/internals/blocks.py(1593)set()
-> self.values = values


jreback · 2020-08-17T12:13:01Z

pls show a user facing example
this is an internal function

vmarkovtsev · 2020-08-17T14:08:41Z

Sure.

Grab branches.pickle.gz, decompress it, then run:

import pickle
import pandas as pd

with open("branches.pickle", "rb") as fin:
    branches, dt_cols = pickle.load(fin)

for col in dt_cols:
    fc = branches[col]
    if 0 in fc:
        fc.replace(0, pd.NaT, inplace=True)

for repo, repo_branches in branches.groupby("repository_full_name", sort=False):
    print(repo, repo_branches)

jbrockmendel · 2020-08-17T16:01:35Z

> Grab branches.pickle.gz,

@vmarkovtsev can you give an example that we can just copy/paste?

vmarkovtsev · 2020-08-17T16:20:27Z

I can inline that 2KB pickle file as a bytes literal @jbrockmendel. It comes directly from a database. If you are afraid of loading foreign pickles and are not familiar with docker/VMs, I can dump it as CSV.

jbrockmendel · 2020-08-17T16:31:11Z

Pickle safety is a concern, but mainly it's the fact that we have 3500 issues to deal with, so making your issue simple to reproduce increases the odds of it getting looked at in a timely manner. https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

vmarkovtsev · 2020-08-17T17:09:53Z

Pfff, I have a simple workaround, so whatever.

bast0006 · 2021-01-11T13:21:40Z

I've managed to reproduce this. This is related to pull request #37023, and issues #36953 and #35509. The issue was fixed in release 1.1.2, and is present in 1.1.1.

import pandas as pd
from io import StringIO

csv = """a,b
a,2021-01-01 08:00:00+00:00
a,2021-01-01 08:00:00+00:00
a,2021-01-01 08:00:00+00:00
a,2021-01-01 08:00:00+00:00
a,2021-01-01 08:00:00+00:00"""

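df = pd.read_csv(StringIO(csv))

# Parse column b as timezone-aware datetimes.
df['b'] = pd.to_datetime(df['b'])

# In-place replace on the column; after this the block holds a Series as its values.
df['b'].replace(0, pd.NaT, inplace=True)

# Raises TypeError: take() got an unexpected keyword argument 'allow_fill'
print(*df.groupby("a"))
# With the groupby line commented out, .info() raises AttributeError instead.
print(df, df.info(verbose=True))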

This code reproduces the above exception. If the groupby() call is commented out, another exception is raised in .info():
AttributeError: 'Series' object has no attribute 'reshape'

The timezone piece of the datetime seems required to trigger the bug, strangely enough.
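
Based only on the versions named in this thread (reported against 1.1.0, still present in 1.1.1, fixed in 1.1.2), a rough version guard could look like the sketch below. The helper names `parse_version` and `is_reported_affected` are hypothetical, and the lower bound is an assumption: nothing here says whether releases before 1.1.0 are affected, so they are treated as unaffected.

```python
def parse_version(version):
    """Turn a string like '1.1.1' into (1, 1, 1), ignoring suffixes such as 'rc0'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '1.1' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_reported_affected(version):
    """True for versions this thread reports as affected: 1.1.0 up to, not including, 1.1.2."""
    return (1, 1, 0) <= parse_version(version) < (1, 1, 2)


for v in ("1.0.5", "1.1.0", "1.1.1", "1.1.2"):
    print(v, is_reported_affected(v))
```

In practice the string to check would come from `pd.__version__`; it is passed in explicitly here so the sketch runs without pandas installed.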

simonjayhawkins added the "Needs Info" label (clarification about behavior needed to assess issue) Aug 17, 2020

vmarkovtsev closed this as completed Aug 17, 2020
