[#19431] Regression in make_block_same_class (tests failing for new fastparquet release) #19434

Merged
merged 23 commits into from
Jan 29, 2018
Changes from 20 commits
Commits
eb55928
update test_parquet since fastparquet now handles tz
minggli Jan 28, 2018
2f4fc07
bring back dtype kwarg because it is needed for DatetimeTZBlock
minggli Jan 28, 2018
e18f8e6
version dependence test_datetime_tz
minggli Jan 28, 2018
9a07d97
separate test cases for new and old behaviour of fastparquet
minggli Jan 28, 2018
3d9810d
tidy test_datetime_tz to test old behaviour of fastparquet<0.1.4
minggli Jan 28, 2018
ee75fdf
rephrase reason to skip test case for older fp
minggli Jan 28, 2018
68d6324
follow pytest fixture pattern as in pyarrow
minggli Jan 28, 2018
985081d
follow pyarrow test_basic style for fastparquet new behaviour>=0.1.4
minggli Jan 28, 2018
0cfcd37
fastparquet=0.1.3
minggli Jan 28, 2018
0c4a6d7
other api change
minggli Jan 28, 2018
6ce68cf
fix typo
minggli Jan 28, 2018
f414743
deprecation warning for dtype in make_block_same_class.
minggli Jan 28, 2018
0d76fe7
Future warning for dtype in make_block_same_class.
minggli Jan 28, 2018
bb95dc6
update notes as fastparquet now supports timezone
minggli Jan 28, 2018
d9a2e2a
remove fastparquet pin
minggli Jan 29, 2018
97b17e9
remove other api change as it is internal
minggli Jan 29, 2018
800b741
remove version
minggli Jan 29, 2018
52220b9
remove dtype for make_block and DeprecationWarning on make_block_same…
minggli Jan 29, 2018
c602a76
FutureWarning on make_block_same_class
minggli Jan 29, 2018
ddbbde3
test case for dtype and warning generation
minggli Jan 29, 2018
6e6b5f0
issue number and simplify test
minggli Jan 29, 2018
326394f
misc doc
minggli Jan 29, 2018
77422ba
use pandas warning assert
minggli Jan 29, 2018
2 changes: 1 addition & 1 deletion doc/source/io.rst
@@ -4537,7 +4537,7 @@ See the documentation for `pyarrow <http://arrow.apache.org/docs/python/>`__ and
 .. note::

    These engines are very similar and should read/write nearly identical parquet format files.
-   Currently ``pyarrow`` does not support timedelta data, and ``fastparquet`` does not support timezone aware datetimes (they are coerced to UTC).
+   Currently ``pyarrow`` does not support timedelta data, ``fastparquet`` now supports timezone aware datetimes.
Contributor

This needs to change. You can't say 'now supports'; there is no context here. Simply remove the fp statement (or qualify it with fp >= 0.1.4).

Contributor Author

done

    These libraries differ by having different underlying dependencies (``fastparquet`` by using ``numba``, while ``pyarrow`` uses a c-library).

 .. ipython:: python
9 changes: 7 additions & 2 deletions pandas/core/internals.py
@@ -224,12 +224,17 @@ def make_block_scalar(self, values):
         """
         return ScalarBlock(values)

-    def make_block_same_class(self, values, placement=None, ndim=None):
+    def make_block_same_class(self, values, placement=None, ndim=None,
+                              dtype=None):
Contributor

deprecate dtype here (add as a FutureWarning)

Contributor Author

done.

         """ Wrap given values in a block of same type as self. """
+        if dtype is not None:
+            # issue 19431 fastparquet is passing this
+            warnings.warn("dtype argument is deprecated, will be removed "
+                          "in a future release.", FutureWarning)
Member

Let's make this a DeprecationWarning instead of FutureWarning, as it is typically only something developers need to see, not users.

Contributor

post an issue on fastparquet to fix this as well

Contributor

let’s leave this as a FutureWarning
it will encourage fp to fix this as it will be visible

Member

We can use the issue I opened yesterday: dask/fastparquet#297

FutureWarning will just be annoying for users in this case, and I am confident fastparquet will change that

Contributor

if it’s changed then we can simply move the bar on fp min version and this is no problem

Contributor

we are not keeping private API around for external packages
this is way too much work

Contributor Author

gents, which is it? FutureWarning or DeprecationWarning? it seems to me that DeprecationWarning is meant to be used here.

Contributor

pls leave it as FutureWarning

Contributor Author

ok

         if placement is None:
             placement = self.mgr_locs
         return make_block(values, placement=placement, ndim=ndim,
-                          klass=self.__class__)
+                          klass=self.__class__, dtype=dtype)

     def __unicode__(self):
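
The pattern debated in this thread, keeping a keyword alive for an external caller while warning about it, can be sketched with the standard library alone. The names `wrap_values` and `_build` below are hypothetical stand-ins, not pandas internals; the sketch only shows that passing the deprecated keyword emits a `FutureWarning` while the call still succeeds.

```python
import warnings


def _build(values, dtype=None):
    # Hypothetical stand-in for the real constructor; it just records inputs.
    return {"values": list(values), "dtype": dtype}


def wrap_values(values, dtype=None):
    """Hypothetical wrapper that still accepts a deprecated ``dtype``."""
    if dtype is not None:
        # stacklevel=2 attributes the warning to the caller's line.
        warnings.warn(
            "dtype argument is deprecated, will be removed "
            "in a future release.",
            FutureWarning,
            stacklevel=2,
        )
    return _build(values, dtype=dtype)


# Record warnings instead of printing them so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = wrap_values([1, 2, 3], dtype="int64")  # warns
    quiet = wrap_values([1, 2, 3])                  # does not warn

categories = [w.category for w in caught]
```

On the choice of category: Python's default filters hide `DeprecationWarning` unless it is triggered by code in `__main__`, whereas `FutureWarning` is shown to end users. That visibility difference is the trade-off the reviewers are weighing.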

8 changes: 8 additions & 0 deletions pandas/tests/internals/test_internals.py
@@ -285,6 +285,14 @@ def test_delete(self):
         with pytest.raises(Exception):
             newb.delete(3)

+    def test_make_block_same_class(self):
+        block = create_block('M8[ns, US/Eastern]', [3])
Contributor

add the issue number as a comment

Contributor Author

done

+        with pytest.warns(FutureWarning):
Contributor

Use tm.assert_produces_warning instead. We catch this usage of pytest.warns in the linter, and it is not standard for the pandas codebase.

Contributor Author

good point.

Contributor Author

done

+            copy = block.make_block_same_class(block.values,
+                                               dtype=block.values.dtype)
+        assert block.dtype == copy.dtype
Contributor

you don't need these asserts, just running this is enough

Contributor Author

done

+        assert block.__class__ == copy.__class__
+

 class TestDatetimeBlock(object):
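
For readers unfamiliar with the kind of helper requested in review, the sketch below is a rough standard-library approximation of a "this block must emit a warning of category X" check. `expect_warning` and `legacy_call` are hypothetical names; this is not the implementation of `tm.assert_produces_warning`, only an illustration of the idea.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def expect_warning(category):
    """Hypothetical helper: fail unless the block emits ``category``."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let filters swallow anything
        yield caught
    if not any(issubclass(w.category, category) for w in caught):
        raise AssertionError(
            "expected {} but none was emitted".format(category.__name__))


def legacy_call(dtype=None):
    # Hypothetical function under test; warns only when dtype is passed.
    if dtype is not None:
        warnings.warn("dtype argument is deprecated", FutureWarning)
    return dtype


# Passing the deprecated argument satisfies the check.
with expect_warning(FutureWarning) as seen:
    legacy_call(dtype="int64")

# The same helper fails when no warning is produced.
try:
    with expect_warning(FutureWarning):
        legacy_call()
    missing_detected = False
except AssertionError:
    missing_detected = True
```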

24 changes: 18 additions & 6 deletions pandas/tests/io/test_parquet.py
@@ -71,6 +71,15 @@ def fp():
     return 'fastparquet'


+@pytest.fixture
+def fp_lt_014():
+    if not _HAVE_FASTPARQUET:
+        pytest.skip("fastparquet is not installed")
+    if LooseVersion(fastparquet.__version__) >= LooseVersion('0.1.4'):
+        pytest.skip("fastparquet is >= 0.1.4")
+    return 'fastparquet'
+
+
 @pytest.fixture
 def df_compat():
     return pd.DataFrame({'A': [1, 2, 3], 'B': 'foo'})
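
The `fp_lt_014` fixture in this hunk gates on a parsed version rather than on the raw version string. A minimal sketch of why that matters follows, using a hypothetical `parse_version` helper in place of `LooseVersion` (which lives in `distutils`, removed from the standard library in Python 3.12) and assuming purely numeric dotted versions. `supports_tz` is likewise a hypothetical name; only the `0.1.4` threshold comes from the diff.

```python
def parse_version(text):
    """Hypothetical helper: turn '0.1.4' into (0, 1, 4).

    Assumes purely numeric dotted versions; suffixes like 'rc1'
    are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def supports_tz(installed, minimum="0.1.4"):
    # True when the installed version is at least the minimum.
    return parse_version(installed) >= parse_version(minimum)


# Plain string comparison works character by character,
# so it misjudges multi-digit components.
string_says_older = "0.1.10" < "0.1.4"

# Comparing parsed tuples orders the same versions numerically.
parsed_says_newer = supports_tz("0.1.10")
old_release = supports_tz("0.1.3")
```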
@@ -448,9 +457,11 @@ class TestParquetFastParquet(Base):
     def test_basic(self, fp, df_full):
Contributor

make 2 tests here, one for < 0.1.4 and one for >= 0.1.4

         df = df_full

-        # additional supported types for fastparquet
+        # additional supported types for fastparquet>=0.1.4
Member

you can leave out the ">=0.1.4" as the timedelta is not specific for the newer versions

+        if LooseVersion(fastparquet.__version__) >= LooseVersion('0.1.4'):
+            df['datetime_tz'] = pd.date_range('20130101', periods=3,
+                                              tz='US/Eastern')
         df['timedelta'] = pd.timedelta_range('1 day', periods=3)

         check_round_trip(df, fp)

     @pytest.mark.skip(reason="not supported")
@@ -482,14 +493,15 @@ def test_categorical(self, fp):
         df = pd.DataFrame({'a': pd.Categorical(list('abc'))})
         check_round_trip(df, fp)

-    def test_datetime_tz(self, fp):
-        # doesn't preserve tz
Contributor

same need 2 tests (1 with each fixture)

Contributor Author (minggli, Jan 28, 2018)

pyarrow is using similar style. there are already two tests for datetime_tz, one < 0.1.4 one >.

+    def test_datetime_tz(self, fp_lt_014):
+
+        # fastparquet<0.1.4 doesn't preserve tz
         df = pd.DataFrame({'a': pd.date_range('20130101', periods=3,
                                               tz='US/Eastern')})

         # warns on the coercion
         with catch_warnings(record=True):
-            check_round_trip(df, fp, expected=df.astype('datetime64[ns]'))
+            check_round_trip(df, fp_lt_014,
+                             expected=df.astype('datetime64[ns]'))

     def test_filter_row_groups(self, fp):
         d = {'a': list(range(0, 3))}
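
The behaviour that the older-version test expects, timezone information dropped on the round trip, can be illustrated without either parquet engine. The sketch below uses only the standard library and a fixed UTC-5 offset as a stand-in for `US/Eastern` in January (an assumption made to avoid depending on a timezone database). It is an analogy for the coercion described in the docs note, not a description of what fastparquet does internally.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-in for US/Eastern in winter (assumption: no DST handling).
eastern = timezone(timedelta(hours=-5), "EST")

aware = datetime(2013, 1, 1, 0, 0, tzinfo=eastern)

# "Coerced to UTC": same instant, expressed in UTC, zone label discarded.
coerced = aware.astimezone(timezone.utc).replace(tzinfo=None)

# Re-attaching UTC recovers the instant, but not the original zone.
restored = coerced.replace(tzinfo=timezone.utc)
same_instant = restored == aware
zone_preserved = restored.utcoffset() == aware.utcoffset()
```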