-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#19431] Regression in make_block_same_class (tests failing for new fastparquet release) #19434
Changes from 5 commits
eb55928
2f4fc07
e18f8e6
9a07d97
3d9810d
ee75fdf
68d6324
985081d
0cfcd37
0c4a6d7
6ce68cf
f414743
0d76fe7
bb95dc6
d9a2e2a
97b17e9
800b741
52220b9
c602a76
ddbbde3
6e6b5f0
326394f
77422ba
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -206,7 +206,7 @@ def array_dtype(self): | |
""" | ||
return self.dtype | ||
|
||
def make_block(self, values, placement=None, ndim=None): | ||
def make_block(self, values, placement=None, ndim=None, dtype=None): | ||
""" | ||
Create a new block, with type inference propagate any values that are | ||
not specified | ||
|
@@ -216,20 +216,21 @@ def make_block(self, values, placement=None, ndim=None): | |
if ndim is None: | ||
ndim = self.ndim | ||
|
||
return make_block(values, placement=placement, ndim=ndim) | ||
return make_block(values, placement=placement, ndim=ndim, dtype=dtype) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe remove dtype here? As fastparquet was not using it (and I don't think anybody else will) |
||
|
||
def make_block_scalar(self, values): | ||
""" | ||
Create a ScalarBlock | ||
""" | ||
return ScalarBlock(values) | ||
|
||
def make_block_same_class(self, values, placement=None, ndim=None): | ||
def make_block_same_class(self, values, placement=None, ndim=None, | ||
dtype=None): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. deprecate dtype here (add as a FutureWarning) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. |
||
""" Wrap given values in a block of same type as self. """ | ||
if placement is None: | ||
placement = self.mgr_locs | ||
return make_block(values, placement=placement, ndim=ndim, | ||
klass=self.__class__) | ||
klass=self.__class__, dtype=dtype) | ||
|
||
def __unicode__(self): | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -448,6 +448,11 @@ class TestParquetFastParquet(Base): | |
def test_basic(self, fp, df_full): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. make 2 tests here, one for < 0.1.4 and 1 for > |
||
df = df_full | ||
|
||
# additional supported types for fastparquet >= 0.1.4 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. pls follow the existing style. see how we have a fixture for pa_ge_070. you need to make similar for fp There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is the existing style (see how it is for pyarrow) and what I asked for, so please leave it like this (in this PR). For the old test we can use a fixture like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. good point! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. little confused. But I agree it's better to match existing style in test_basic to avoid unnecessary style variation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is a totally incorrect style ATM. So please see what you can fix. skipping INSIDE a test function is not correct, rather a fixture should be used. This must have slipped in. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Jeff, please leave this. I asked @minggli to do this, and it is the current style. You can do a follow-up PR if you want to change this in all places. |
||
if LooseVersion(fastparquet.__version__) > LooseVersion('0.1.3'): | ||
df['datetime_tz'] = pd.date_range('20130101', periods=3, | ||
tz='US/Eastern') | ||
|
||
# additional supported types for fastparquet | ||
df['timedelta'] = pd.timedelta_range('1 day', periods=3) | ||
|
||
|
@@ -483,10 +488,12 @@ def test_categorical(self, fp): | |
check_round_trip(df, fp) | ||
|
||
def test_datetime_tz(self, fp): | ||
# doesn't preserve tz | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same need 2 tests (1 with each fixture) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. pyarrow is using similar style. there are already two tests for datetime_tz, one < 0.1.4 one >. |
||
if LooseVersion(fastparquet.__version__) > LooseVersion('0.1.3'): | ||
pytest.skip("timezone not supported for older fp") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe rather "timezones supported in newer versions of fp" |
||
|
||
# fastparquet<0.1.4 doesn't preserve tz | ||
df = pd.DataFrame({'a': pd.date_range('20130101', periods=3, | ||
tz='US/Eastern')}) | ||
|
||
# warns on the coercion | ||
with catch_warnings(record=True): | ||
check_round_trip(df, fp, expected=df.astype('datetime64[ns]')) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can add this back, but it needs to be deprecated anyhow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then please answer in the original issue how to achieve the same result (see the code example I posted there, to create a new DatetimeTZ block from a numpy array).
But I don't see why this needs to be deprecated. The
dtype
argument was actually doing something, in contrast tofastpath
(which we therefore deprecated)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fp should be using
make_block
and notmake_block_same_class