Support pandas >=1.0.0 #1197

itholic · 2020-01-16T15:48:24Z

Related to #1194

This PR updates Koalas for the APIs that were newly added, deprecated, or removed in pandas 1.0.0, so that it stays compatible with pandas 1.0.0, which will be released soon. It is based on What’s new in 1.0.0.

itholic · 2020-01-16T15:52:25Z

While I'm here, I'll keep checking the pandas release notes and updating this PR so that Koalas is compatible with pandas 1.0.0 as soon as possible after it is released.

codecov-io · 2020-01-16T16:34:17Z

Codecov Report

Merging #1197 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1197      +/-   ##
==========================================
+ Coverage   95.02%   95.03%   +<.01%     
==========================================
  Files          34       34              
  Lines        7220     7232      +12     
==========================================
+ Hits         6861     6873      +12     
  Misses        359      359

| Impacted Files | Coverage Δ | |
|---|---|---|
| databricks/koalas/missing/frame.py | `100% <ø> (ø)` | ⬆️ |
| databricks/koalas/generic.py | `96.95% <ø> (ø)` | ⬆️ |
| databricks/koalas/window.py | `94.64% <ø> (ø)` | ⬆️ |
| databricks/koalas/missing/series.py | `100% <ø> (ø)` | ⬆️ |
| databricks/koalas/series.py | `96.4% <100%> (ø)` | ⬆️ |
| databricks/koalas/indexing.py | `95.96% <100%> (ø)` | ⬆️ |
| databricks/koalas/groupby.py | `91.43% <100%> (ø)` | ⬆️ |
| databricks/koalas/testing/utils.py | `78.98% <100%> (+0.46%)` | ⬆️ |
| databricks/koalas/plot.py | `94.28% <100%> (ø)` | ⬆️ |
| databricks/koalas/indexes.py | `95.9% <100%> (ø)` | ⬆️ |
| ... and 2 more | | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b1a491a...bad607d.

itholic · 2020-02-14T01:43:24Z

@HyukjinKwon @ueshin

Thanks for the conclusion :D

Okay, so I'll go over all the changes one last time with that in mind.

HyukjinKwon · 2020-02-14T06:23:46Z

Sorry, @itholic, I deleted my last nit comments, but that deleted your replies along with them. Please ignore my last comments.

itholic · 2020-02-17T04:38:37Z

@HyukjinKwon ah, it's okay. 👍

.github/workflows/master.yml

databricks/koalas/generic.py

databricks/koalas/tests/test_dataframe_conversion.py

databricks/koalas/tests/test_groupby.py

HyukjinKwon · 2020-02-20T07:48:41Z

databricks/koalas/tests/test_indexes.py

@@ -777,39 +777,40 @@ def test_monotonic(self):
        datas.append([(-5, 'e'), (-4, 'c'), (-3, 'b'), (-2, 'd'), (-1, 'a')])

        # None type tests (None type is treated as the largets value)
-        datas.append([(None, 100), (2, 200), (3, 300), (4, 400), (5, 500)])
+        # TODO: the commented tests below should be uncommented after fixing for pandas >= 1.0.0


Is it difficult to patch to pandas' 1.0.0 behaviours?

Yeah, the change in 1.0.0 looks a bit complicated. Let me try to fix this in a follow-up PR.

databricks/koalas/window.py

HyukjinKwon · 2020-02-20T07:51:51Z

Okay, let's merge this and address my comments in a followup @itholic. Can you just resolve conflicts?

itholic · 2020-02-20T08:29:41Z

> Okay, let's merge this and address my comments in a followup @itholic. Can you just resolve conflicts?

Yeah, I've resolved the nit comments and will address the rest in a follow-up.

Thanks for the comments!

HyukjinKwon · 2020-02-21T01:02:32Z

databricks/koalas/window.py

@@ -51,6 +54,16 @@ def _apply_as_series_or_frame(self, func):
    def count(self):
        def count(scol):
            return F.count(scol).over(self._window)
+
+        if LooseVersion(pd.__version__) >= LooseVersion('1.0.0'):


@itholic, given our discussion, we should match it to the latest behaviour. I think we don't have to check the pandas version.

HyukjinKwon · 2020-02-21T01:03:18Z

Okay, I am going to merge this to make it rolling. @itholic, can you do all todos in the next followup?

itholic · 2020-02-21T02:05:46Z

@HyukjinKwon Yup, I'm working on that.

The missing/deprecated functions and properties that were removed in pandas 1.0 were also removed from Koalas (#1197), but we should still show an error message, at least when a user on pandas<1.0 tries to use them.
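
One way to keep such error messages is to leave a stub behind that raises a descriptive error when it is called. The following is only a minimal sketch of that idea, not how Koalas implements it: `RemovedAPIError`, `removed_function`, and `FrameStub` are hypothetical names made up for illustration, and `get_dtype_counts` is used as the example only because it appears in this PR's commit list.

```python
class RemovedAPIError(NotImplementedError):
    """Hypothetical error type for pandas APIs that are no longer provided."""


def removed_function(class_name, method_name, removed_in):
    """Build a stub method that raises a descriptive error when called.

    All three arguments are plain strings that are only used in the message.
    """

    def stub(*args, **kwargs):
        raise RemovedAPIError(
            "The method `{0}.{1}()` was removed in pandas {2} "
            "and is not available here.".format(class_name, method_name, removed_in)
        )

    stub.__name__ = method_name
    return stub


class FrameStub:
    """Hypothetical stand-in for a DataFrame-like class, for illustration only."""

    # Assumption: the method name and version string are example values.
    get_dtype_counts = removed_function("DataFrame", "get_dtype_counts", "1.0.0")


try:
    FrameStub().get_dtype_counts()
except RemovedAPIError as error:
    message = str(error)
```

With a stub like this, a user on an older pandas gets an explicit explanation instead of a bare `AttributeError`.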

Follow-up for #1197

Since we're following the latest version of pandas, we should fix several TODOs so that they match pandas>=1.0.0 for now.

## For example

The behaviour of `Expanding.count()` and `ExpandingGroupby.count()` differs depending on which pandas version is installed.

- pandas < 1.0.0

```python
>>> s = pd.Series([2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5])
>>> s.groupby(s).expanding(3).count().sort_index()
2  0     1.0
   1     2.0
3  2     1.0
   3     2.0
   4     3.0
4  5     1.0
   6     2.0
   7     3.0
   8     4.0
5  9     1.0
   10    2.0
dtype: float64
```

- pandas >= 1.0.0

```python
>>> s = pd.Series([2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5])
>>> s.groupby(s).expanding(3).count().sort_index()
2  0     NaN
   1     NaN
3  2     NaN
   3     NaN
   4     3.0
4  5     NaN
   6     NaN
   7     3.0
   8     4.0
5  9     NaN
   10    NaN
dtype: float64
```

Since we're following the latest version of pandas, we need to fix this.
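
The difference between the two outputs above appears to come down to whether the count honours `min_periods`. The sketch below emulates both behaviours in plain Python, without pandas or Spark, so the rule can be checked by hand. It is an illustration under stated assumptions, not the Koalas or pandas implementation: `expanding_group_count` is a hypothetical helper, each value is treated as its own group key (as in `s.groupby(s)`), the input contains no missing values, and `None` stands in for `NaN`.

```python
from collections import defaultdict


def expanding_group_count(values, min_periods, honor_min_periods):
    """Emulate a grouped expanding count where each value is its own group key.

    Returns a dict mapping (group key, position) to the running count,
    or to None (standing in for NaN) when the count is masked.
    """
    seen = defaultdict(int)  # number of observations seen so far per group
    result = {}
    for position, key in enumerate(values):
        seen[key] += 1
        if honor_min_periods and seen[key] < min_periods:
            # Newer behaviour: too few observations in the window so far.
            result[(key, position)] = None
        else:
            # Older behaviour, or enough observations: report the running count.
            result[(key, position)] = float(seen[key])
    return result


data = [2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5]
old_style = expanding_group_count(data, 3, honor_min_periods=False)
new_style = expanding_group_count(data, 3, honor_min_periods=True)
```

Here `old_style` reproduces the pandas < 1.0.0 output above and `new_style` the pandas >= 1.0.0 output, with `None` in place of each `NaN`.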

update removed & changed APIs

c581e9e

labels -> codes

e891ae7

itholic added 26 commits January 28, 2020 14:38

Merge branch 'master' of https://github.com/databricks/koalas into su…

cb484ab

…pport_pandas_1.0.0

Merge branch 'master' of https://github.com/databricks/koalas into su…

e69d0d7

…pport_pandas_1.0.0

manage for supporting pandas 1.0.0

e93a8d6

using pandas 1.0.0rc0 in .travis.yml

6729c5e

fix conda forge path

3c51348

revert pandas1.0.0 for python3.5

c77d11c

resolve lint

c7c7034

Merge branch 'master' of https://github.com/databricks/koalas into su…

ab8c1b9

…pport_pandas_1.0.0

recover get_dtype_counts temporarily

61c03d0

[fix] RollingGroupBy

6be375b

[Common] get_dummies

75351c5

[fix] ValueError at several functions for DataFrameGroupBy

4d68cf6

[fix] ExpandingGroupBy

71b78d7

Fix doctest for DataFrame.info

3c867dd

Comment to_latex test raising ValueError in pandas >= 1.0.0

1bc76b3

Resolve conflicts

994247c

fix conda-forge

371d5ea

skip doctest of DataFrame.info since inconsistency depends on python …

a3f252b

…version

add new function for DataFrame to_markdown to missing list

7ab1386

pandas 1.0.0rc0 -> pandas 1.0.0

57e33d7

Merge branch 'master' of https://github.com/databricks/koalas into su…

0ba7378

…pport_pandas_1.0.0

[fix] Expanding.count to support pandas 1.0.0

079721d

[fix] Expanding.count rearrange

1343204

[fix] ExpandingGroupby.count

957a4f9

[fix] doctest for ExpandingGroupby.count

fe6fd57

[requirements-dev] pandas>=0.23.2,<1.0 -> pandas>=1.0.0

c4d6550

itholic added 2 commits February 14, 2020 10:54

Match the doctest of DataFrame.info to 1.0.1

eb820d2

Match missing list to 1.0.1

7a063a5

databricks deleted a comment from itholic Feb 14, 2020

Merge branch 'master' of https://github.com/databricks/koalas into su…

b1a491a

…pport_pandas_1.0.0