Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support pandas >=1.0.0 #1197

Merged
merged 70 commits into from
Feb 21, 2020
Merged

Conversation

itholic
Copy link
Contributor

@itholic itholic commented Jan 16, 2020

Related with #1194

This PR is for updating newly added & deprecated & deleted APIs on pandas 1.0.0 to keep compatible with pandas 1.0.0 to be released soon, based on What’s new in 1.0.0

@itholic
Copy link
Contributor Author

itholic commented Jan 16, 2020

While i'm here, i'll keep checking the pandas release note and updating this PR in order to be compatible with Pandas 1.0.0 as soon as possible after it is released.

@codecov-io
Copy link

codecov-io commented Jan 16, 2020

Codecov Report

Merging #1197 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1197      +/-   ##
==========================================
+ Coverage   95.02%   95.03%   +<.01%     
==========================================
  Files          34       34              
  Lines        7220     7232      +12     
==========================================
+ Hits         6861     6873      +12     
  Misses        359      359
Impacted Files Coverage Δ
databricks/koalas/missing/frame.py 100% <ø> (ø) ⬆️
databricks/koalas/generic.py 96.95% <ø> (ø) ⬆️
databricks/koalas/window.py 94.64% <ø> (ø) ⬆️
databricks/koalas/missing/series.py 100% <ø> (ø) ⬆️
databricks/koalas/series.py 96.4% <100%> (ø) ⬆️
databricks/koalas/indexing.py 95.96% <100%> (ø) ⬆️
databricks/koalas/groupby.py 91.43% <100%> (ø) ⬆️
databricks/koalas/testing/utils.py 78.98% <100%> (+0.46%) ⬆️
databricks/koalas/plot.py 94.28% <100%> (ø) ⬆️
databricks/koalas/indexes.py 95.9% <100%> (ø) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b1a491a...bad607d. Read the comment docs.

@itholic
Copy link
Contributor Author

itholic commented Feb 14, 2020

@HyukjinKwon @ueshin

Thanks for the conclusion :D

Okay, so I'll check again whole changes one last time based on that concept.

@databricks databricks deleted a comment from itholic Feb 14, 2020
@databricks databricks deleted a comment from itholic Feb 14, 2020
@databricks databricks deleted a comment from itholic Feb 14, 2020
@databricks databricks deleted a comment from itholic Feb 14, 2020
@HyukjinKwon
Copy link
Member

HyukjinKwon commented Feb 14, 2020

Sorry, @itholic, I removed my last comments for nits but it removed your replies together. Ignore my last comments.

@itholic
Copy link
Contributor Author

itholic commented Feb 17, 2020

@HyukjinKwon ah, it's okay. 👍

@@ -777,39 +777,40 @@ def test_monotonic(self):
datas.append([(-5, 'e'), (-4, 'c'), (-3, 'b'), (-2, 'd'), (-1, 'a')])

# None type tests (None type is treated as the largets value)
datas.append([(None, 100), (2, 200), (3, 300), (4, 400), (5, 500)])
# TODO: the commented tests below should be uncommented after fixing for pandas >= 1.0.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it difficult to patch to pandas' 1.0.0 behaviours?

Copy link
Contributor Author

@itholic itholic Feb 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah its changing in 1.0.0 looks a bit complicated, let me try to fix this follow-up PR

@HyukjinKwon
Copy link
Member

Okay, let's merge this and address my comments in a followup @itholic. Can you just resolve conflicts?

@itholic
Copy link
Contributor Author

itholic commented Feb 20, 2020

Okay, let's merge this and address my comments in a followup @itholic. Can you just resolve conflicts?

yeah, i've resolved nit comments and will address rest ones in follow-up.

thanks for the comments !

@@ -51,6 +54,16 @@ def _apply_as_series_or_frame(self, func):
def count(self):
def count(scol):
return F.count(scol).over(self._window)

if LooseVersion(pd.__version__) >= LooseVersion('1.0.0'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@itholic, given our discussion, we should match it to the latest behaviour. I think we don't have to check the pandas version.

@HyukjinKwon
Copy link
Member

Okay, I am going to merge this to make it rolling. @itholic, can you do all todos in the next followup?

@HyukjinKwon HyukjinKwon merged commit 24726bd into databricks:master Feb 21, 2020
@itholic
Copy link
Contributor Author

itholic commented Feb 21, 2020

@HyukjinKwon yup. i'm working on that.

@itholic itholic deleted the support_pandas_1.0.0 branch February 21, 2020 02:39
HyukjinKwon pushed a commit that referenced this pull request Feb 26, 2020
Missing/deprecated functions/properties removed in pandas 1.0 were also removed from Koalas (#1197), but we should still show error messages at least when a user using pandas<1.0 tries to use such functions/properties.
HyukjinKwon pushed a commit that referenced this pull request Feb 27, 2020
Follow-up for #1197

Since we're following latest version of pandas, should fix several TODOs with matching pandas>=1.0.0 for now.

## For example.

the behaviour of `Expanding.count()` and `ExpandingGroupby.count()` are different depending on what pandas version has been installed.

- pandas < 1.0.0
```python
>>> s = pd.Series([2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5])
>>> s.groupby(s).expanding(3).count().sort_index()
2  0     1.0
   1     2.0
3  2     1.0
   3     2.0
   4     3.0
4  5     1.0
   6     2.0
   7     3.0
   8     4.0
5  9     1.0
   10    2.0
dtype: float64
```

- pandas >= 1.0.0
```python
>>> s = pd.Series([2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5])
>>> s.groupby(s).expanding(3).count().sort_index()
2  0     NaN
   1     NaN
3  2     NaN
   3     NaN
   4     3.0
4  5     NaN
   6     NaN
   7     3.0
   8     4.0
5  9     NaN
   10    NaN
dtype: float64
```

Since we're following latest version of pandas, need to fix this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants