RF: python api behaviour refactor #4207

venkywonka · 2021-09-14T11:12:03Z

This PR ⬇️

- Fixes "[BUG] Random forest is not compatible with dask-ml GridsearchCV" #4193 and "Am i using randomForest Classifier with gridsearch wrong & is xgboost supported?" #4194, both of which relate to API incompatibility with dask-ml GridSearchCV.
- Changes the behaviour of cuml RF in the following cases:
  - In the not-so-uncommon case when `n_bins` > number of rows in the training sample, instead of throwing an error and exiting, the estimator now prints a warning and sets `n_bins` to the number of training samples.
  - When `.predict()` is called using `float64` data, instead of throwing an error asking the user to explicitly specify `predict_model="CPU"` and rerun, a warning is displayed and prediction implicitly falls back from the default GPU-based prediction to CPU-based prediction.
  - Corresponding tests that capture the warnings above have been added.
- The estimators now accept both numbers and strings as input for the `split_criterion` parameter, bringing them to parity with sklearn's API, which takes strings as criterion.
- The `split_algo` and `use_experimental_backend` parameters of the estimator class, deprecated in previous releases, have now been completely removed, including from the documentation and warnings (for both single-GPU and Dask RF).
- The `num_classes` parameter of the `predict` and `score` methods has similarly been removed.
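
The behaviour changes described above can be sketched in plain Python without cuML. This is a minimal illustration, not the code in this PR: the helper names (`normalize_split_criterion`, `clamp_n_bins`, `choose_predict_model`), the criterion table, and the warning texts are all hypothetical stand-ins chosen for the example.

```python
import warnings

# Hypothetical name-to-code table; the real cuML criteria and codes may differ.
_CRITERION_CODES = {"gini": 0, "entropy": 1, "mse": 2}


def normalize_split_criterion(criterion):
    """Return an integer code for either a string name or an integer code."""
    if isinstance(criterion, str):
        key = criterion.lower()
        if key not in _CRITERION_CODES:
            raise ValueError(f"unknown split_criterion: {criterion!r}")
        return _CRITERION_CODES[key]
    if isinstance(criterion, int) and criterion in _CRITERION_CODES.values():
        return criterion
    raise ValueError(f"unknown split_criterion: {criterion!r}")


def clamp_n_bins(n_bins, n_rows):
    """Warn and shrink n_bins instead of raising when it exceeds n_rows."""
    if n_bins > n_rows:
        # Illustrative wording only, not the library's actual message.
        warnings.warn(
            "n_bins exceeds the number of training samples; "
            "changing `n_bins` to number of training samples."
        )
        return n_rows
    return n_bins


def choose_predict_model(dtype_name, predict_model="GPU"):
    """Warn and fall back to CPU prediction for float64 input."""
    if predict_model == "GPU" and dtype_name == "float64":
        warnings.warn("float64 input: defaulting to CPU-based prediction.")
        return "CPU"
    return predict_model
```

With these helpers, `clamp_n_bins(128, 50)` returns `50` after emitting a warning, and `normalize_split_criterion("gini")` and `normalize_split_criterion(0)` return the same code, which is the kind of tolerance a hyperparameter search such as GridSearchCV tends to rely on.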

venkywonka · 2021-09-18T05:59:12Z

rerun tests

dantegd · 2021-09-18T17:26:40Z

@venkywonka I just reproduced the issue of CI in plain branch-21.10 locally, so on Monday we'll work on unblocking CI

venkywonka · 2021-09-18T19:12:44Z

that's great @dantegd, thank you 🙏

dantegd · 2021-09-19T17:07:30Z

rerun tests

dantegd · 2021-09-19T17:07:51Z

The latest libcumlprims package should solve all issues

dantegd

Pre-approving, just had one comment, though I could deal with it in #4196 after merging this

dantegd · 2021-09-20T15:35:01Z

python/cuml/test/test_random_forest.py

+               "the number of samples used for training. "
+               "Changing `n_bins` to number of training samples."
+               in str(w[-1].message))
+        print(str(w[-1].message))


Suggested change

print(str(w[-1].message))

I don't think it is necessary to print the message, maybe only if it is wrong?

oh yea, that's on me will get rid of it, dante
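
One way to surface the warning text only when the check fails, as suggested in the review comment, is to put it in the assertion message instead of printing it. The sketch below assumes a hypothetical stand-in `fit_with_small_sample` in place of the real estimator's `fit`, and its warning wording is illustrative; only the `warnings.catch_warnings(record=True)` pattern and the trailing expected substring mirror the test under review.

```python
import warnings


def fit_with_small_sample(n_bins, n_rows):
    """Hypothetical stand-in for fit(): warns and clamps n_bins."""
    if n_bins > n_rows:
        warnings.warn(
            "n_bins exceeds the number of samples used for training. "
            "Changing `n_bins` to number of training samples."
        )
        return n_rows
    return n_bins


def test_n_bins_warning():
    with warnings.catch_warnings(record=True) as w:
        # Make sure the warning is recorded even if it was raised before.
        warnings.simplefilter("always")
        used = fit_with_small_sample(n_bins=128, n_rows=10)

    assert used == 10
    assert len(w) >= 1, "expected at least one warning"
    message = str(w[-1].message)
    expected = "Changing `n_bins` to number of training samples."
    # The message is reported only if the assertion fails.
    assert expected in message, f"unexpected warning text: {message!r}"


test_n_bins_warning()
```

Under pytest, `pytest.warns(UserWarning, match=...)` is an alternative that gives similar failure output without manual bookkeeping.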

dantegd · 2021-09-20T19:39:28Z

@gpucibot merge

dantegd · 2021-09-21T03:35:25Z

rerun tests

codecov-commenter · 2021-09-21T06:07:23Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@36b3746).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.10    #4207   +/-   ##
===============================================
  Coverage                ?   86.07%           
===============================================
  Files                   ?      231           
  Lines                   ?    18633           
  Branches                ?        0           
===============================================
  Hits                    ?    16039           
  Misses                  ?     2594           
  Partials                ?        0

| Flag | Coverage Δ |
|---|---|
| dask | `47.05% <0.00%> (?)` |
| non-dask | `78.74% <0.00%> (?)` |

Flags with carried forward coverage won't be shown.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 36b3746...0b4e7f0.

This PR ⬇️

* fixes rapidsai#4193 and fixes rapidsai#4194 that relates to API incompatibility with dask-ml GridSearchCV
* changes the behaviour of cuml RF in the following cases:
  * In the not-so-uncommon case when `n_bins` > number of rows in training sample, instead of throwing an error and exiting, the estimator is made to print a warning and set `n_bins` to the number of training samples.
  * When `.predict()` is called using `float64` data, instead of throwing an error asking the user to explicitly specify `predict_model="CPU"` and rerun, a warning is displayed and prediction implicitly defaults to CPU-based prediction from the default GPU-based prediction.
  * Corresponding tests to capture the warnings from above added
* the estimators now accept both numbers and strings as input for the `split_criterion` parameter, thus in parity with sklearn's API that takes in strings as criterion.
* `split_algo` and `use_experimental_backend` parameters of the estimator class have now been completely removed from both documentation and warnings after deprecation in previous releases (from both single-gpu and dask RF).
* `num_classes` parameter of predict and score methods has also been similarly removed

Authors:
- Venkat (https://github.com/venkywonka)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)
- Rory Mitchell (https://github.com/RAMitchell)

URL: rapidsai#4207

python api behaviour refactor

207fa3a

venkywonka requested a review from a team as a code owner September 14, 2021 11:12

github-actions bot added the Cython / Python Cython or Python issue label Sep 14, 2021

flake8 fix

baaa425

github-actions bot added the Cython / Python Cython or Python issue label Sep 14, 2021

fix a failing test

0ae902d

venkywonka removed doc Documentation Cython / Python Cython or Python issue Dask / cuml.dask Issue/PR related to Python level dask or cuml.dask features. labels Sep 14, 2021

github-actions bot added the Cython / Python Cython or Python issue label Sep 14, 2021

venkywonka added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currenty a work in progress labels Sep 14, 2021

dantegd approved these changes Sep 20, 2021

prune print

0b4e7f0

RAMitchell approved these changes Sep 21, 2021

rapids-bot bot merged commit b375320 into rapidsai:branch-21.10 Sep 21, 2021
