
[BUG] Negative R-Squared with Nightly (0.17) #3305

Closed
Tracked by #4139
ibrisa opened this issue Dec 15, 2020 · 12 comments
Assignees
Labels
4 - Waiting on Author Waiting for author to respond to review bug Something isn't working

Comments

@ibrisa

ibrisa commented Dec 15, 2020

Describe the bug
I am using nightly v0.17 RAPIDS on x86 to run the random forest regression algorithm and am seeing negative cuML r-squared values. Running sklearn on the same data gives positive r-squared values. The MSE values do look correct (low and close to the sklearn MSE). The smallest example dataset with which I consistently see this is linked below.

This occurs with the MNMG (dask) version, which I've run with both 8 workers and a single worker. It is worth noting that I started with the single-node version of random forest regression (without dask), and that version never produced a negative r-squared value. Any help fixing the negative r2 would be much appreciated.

Steps/Code to reproduce bug
Python Code:

import cudf
import cuml
import cupy
import time
import numpy as np
import sklearn

from sklearn.metrics import accuracy_score
from sklearn import model_selection, datasets

from cuml.dask.common import utils as dask_utils
from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster
import dask_cudf

from cuml.dask.ensemble import RandomForestClassifier as cumlDaskRFC
from cuml.dask.ensemble import RandomForestRegressor as cumlDaskRFR
from sklearn.ensemble import RandomForestClassifier as sklRFC
from sklearn.ensemble import RandomForestRegressor as sklRFR


def main():
    ## Load data
    data_type = np.float32
    test_data = cudf.read_csv("/mnt/DGX01/Personal/ibrunkan/rapids/data/mid_formatted_wHeader.tsv", sep='\t')

    ## Split data into X and y
    test_data_f32 = test_data.astype('float32')
    X = test_data_f32.iloc[0:, 0:-1]
    y = test_data_f32.pheno0

    # Random Forest building parameters
    max_depth = 20
    n_bins = 8
    n_trees = 1000

    ## Split train-test
    X_train, X_test, y_train, y_test = model_selection.train_test_split(X,
                                                            y, test_size=0.2)
    X_train_pd = X_train.to_pandas()
    X_test_pd  = X_test.to_pandas()
    y_train_pd = y_train.to_pandas()
    y_test_pd  = y_test.to_pandas()

    skl_model = sklRFR(max_depth=max_depth, n_estimators=n_trees, n_jobs=-1)
    skl_model.fit(X_train_pd, y_train_pd)

    # Predict
    skl_y_pred = skl_model.predict(X_test_pd)

    # Partition with Dask
    n_partitions = n_workers
    # In this case, each worker will train on 1/n_partitions fraction of the data
    X_train_dask = dask_cudf.from_cudf(X_train, npartitions=n_partitions)
    y_train_dask = dask_cudf.from_cudf(y_train, npartitions=n_partitions)
    X_test_dask  = dask_cudf.from_cudf(X_test, npartitions=n_partitions)
    # Persist to cache the data in active memory
    X_train_dask, y_train_dask = dask_utils.persist_across_workers(client,
                                        [X_train_dask, y_train_dask], workers=workers)

    # Build model
    cuml_model = cumlDaskRFR(max_depth=max_depth, n_estimators=n_trees,
                             n_streams=n_streams, n_bins=n_bins)
    cuml_model.fit(X_train_dask, y_train_dask)
    wait(cuml_model.rfs)  # Allow asynchronous training tasks to finish

    # Predict
    cuml_y_pred = cuml_model.predict(X_test_dask)

    print("===== Accuracy Metrics =====",file=f,flush=True)
    # Due to randomness in the algorithm, you may see slight variation in accuracies
    print("-----SKLearn", file=f)
    print("SKLearn MSE:  ", sklearn.metrics.mean_squared_error(y_test_pd, skl_y_pred),file=f)
    print("SKLearn r2:  ", sklearn.metrics.r2_score(y_test_pd, skl_y_pred),file=f)
    print("-----CuML",file=f)
    print("CuML MSE:     ", cuml.metrics.regression.mean_squared_error(y_test, cuml_y_pred.compute()), file=f)
    print("CuML r2:   ", cuml.metrics.regression.r2_score(y_test, cuml_y_pred.compute()), file=f)
    print("DONE",file=f,flush=True)

if __name__ == '__main__':
    # This will use all GPUs on the local host by default
    f = open("results/rfr-mnmg-nightly-mid-sampletest4", 'w')
    cluster = LocalCUDACluster(threads_per_worker=1)
    client = Client(cluster)
    print("Sleeping... 20 s",flush=True)
    time.sleep(20)
    print("Awake!",flush=True)

    # Query the client for all connected workers
    workers = client.has_what().keys()
    print(f'workers: {workers}',file=f)
    n_workers = len(workers)
    print(f'n_workers: {n_workers}',file=f)
    n_streams = 8 # Performance optimization

    main()

    client.shutdown()

Input and Sample Output
Input Link: https://drive.google.com/file/d/1zD533JPbTAmXU6zu9fr4Tp4Ef4k3VEbA/view?usp=sharing

Sample output (Nightly):
Test 1:
Sklearn MSE: 0.005 Sklearn R2: 0.7988 Cuml MSE: 0.0284 Cuml R2: -0.13

Test 2:
Sklearn MSE: 0.0045 Sklearn R2: 0.8086 Cuml MSE: 0.0276 Cuml R2: -0.152

etc

Environment details (please complete the following information):
x86:
Eight NVIDIA A100 40GB “Ampere” GPU accelerators
NVSwitch provides full 600GB/s NVLink connectivity between all GPUs
320GB total GPU Memory
Two AMD EPYC 7002-Series 64-core CPUs
1TB DDR4 System Memory
(4) 3.84TB NVMe SSDs for 15TB Local Storage
200GbE QSFP28 Ethernet adapter
Eight Mellanox ConnectX-6 QSFP28 200Gbps HDR InfiniBand/200GigE adapters
One Gigabit Ethernet RJ45 management port and one BMC RJ45 port

Method of cuDF & cuML install: with conda.

`conda list` output:
Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
_pytorch_select           0.1                       cpu_0
abseil-cpp                20200225.2           he1b5a44_2    conda-forge
arrow                     0.17.0           py37hc8dfbb8_1    conda-forge
arrow-cpp                 1.0.1           py37h9631afc_16_cuda    conda-forge
arrow-cpp-proc            2.0.0                      cuda    conda-forge
aws-c-common              0.4.59               h36c2ea0_1    conda-forge
aws-c-event-stream        0.1.6                had2084c_6    conda-forge
aws-checksums             0.1.10               h4e93380_0    conda-forge
aws-sdk-cpp               1.8.70               h57dc084_1    conda-forge
binaryornot               0.4.4                      py_1    conda-forge
blas                      1.0                         mkl    conda-forge
bokeh                     2.2.3            py37h89c1867_0    conda-forge
boost-cpp                 1.72.0               h9359b55_3    conda-forge
brotli                    1.0.9                he1b5a44_3    conda-forge
brotlipy                  0.7.0           py37hb5d75c8_1001    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.17.1               h36c2ea0_0    conda-forge
ca-certificates           2020.12.5            ha878542_0    conda-forge
certifi                   2020.12.5        py37h89c1867_0    conda-forge
cffi                      1.14.4           py37h11fe52a_0    conda-forge
chardet                   3.0.4           py37he5f6b98_1008    conda-forge
click                     7.1.2              pyh9f0ad1d_0    conda-forge
cloudpickle               1.6.0                      py_0    conda-forge
clx                       0.17.0a201110   py37_g68eaf3e_47    rapidsai-nightly
cookiecutter              1.7.2              pyh9f0ad1d_0    conda-forge
cryptography              3.3.1            py37h7f0c10b_0    conda-forge
cudatoolkit               11.0.221             h6bb024c_0    nvidia
cudf                      0.17.0a201211   cuda_11.0_py37_g00ca24625e_383    rapidsai-nightly
cudnn                     8.0.0                cuda11.0_0    nvidia
cugraph                   0.17.0a201211   py37_ge205fd07_289    rapidsai-nightly
cuml                      0.17.0a201210   cuda11.0_py37_g2c0aacf44_173    rapidsai-nightly
cupy                      8.0.0            py37h0ce7dbb_0    rapidsai-nightly
cytoolz                   0.11.0           py37h4abf009_1    conda-forge
dask                      2020.12.0          pyhd8ed1ab_0    conda-forge
dask-core                 2020.12.0          pyhd8ed1ab_0    conda-forge
dask-cuda                 0.17.0a201211           py37_71    rapidsai-nightly
dask-cudf                 0.17.0a201211   py37_g00ca24625e_383    rapidsai-nightly
distributed               2020.12.0        py37h89c1867_0    conda-forge
dlpack                    0.3                  he1b5a44_1    conda-forge
faiss-proc                1.0.0                      cuda    rapidsai-nightly
fastavro                  1.2.1            py37h5e8e339_0    conda-forge
fastrlock                 0.5              py37h3340039_1    conda-forge
filelock                  3.0.12             pyh9f0ad1d_0    conda-forge
freetype                  2.10.4               h7ca028e_0    conda-forge
fsspec                    0.8.4                      py_0    conda-forge
gflags                    2.2.2             he1b5a44_1004    conda-forge
glog                      0.4.0                h49b9bf7_3    conda-forge
gperftools                2.7                  h767d802_2    conda-forge
grpc-cpp                  1.33.2               h1870a98_1    conda-forge
heapdict                  1.0.1                      py_0    conda-forge
icu                       67.1                 he1b5a44_0    conda-forge
idna                      2.10               pyh9f0ad1d_0    conda-forge
intel-openmp              2019.4                      243
jinja2                    2.11.2             pyh9f0ad1d_0    conda-forge
jinja2-time               0.2.0                      py_2    conda-forge
joblib                    0.17.0                     py_0    conda-forge
jpeg                      9d                   h36c2ea0_0    conda-forge
krb5                      1.17.2               h926e7f8_0    conda-forge
lcms2                     2.11                 hcbb858e_1    conda-forge
ld_impl_linux-64          2.35.1               hed1e6ac_0    conda-forge
libblas                   3.8.0                    14_mkl    conda-forge
libcblas                  3.8.0                    14_mkl    conda-forge
libcudf                   0.17.0a201211   cuda11.0_g00ca24625e_383    rapidsai-nightly
libcugraph                0.17.0a201211   cuda11.0_ge205fd07_289    rapidsai-nightly
libcuml                   0.17.0a201210   cuda11.0_g2c0aacf44_173    rapidsai-nightly
libcumlprims              0.17.0a201030   cuda11.0_g1fa28a5_8    rapidsai-nightly
libcurl                   7.71.1               hcdd3856_8    conda-forge
libedit                   3.1.20191231         h46ee950_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               hcdb4288_3    conda-forge
libfaiss                  1.6.3           h328c4c8_1_cuda    rapidsai-nightly
libffi                    3.2.1             he1b5a44_1007    conda-forge
libgcc-ng                 9.3.0               h5dbcf3e_17    conda-forge
libgfortran-ng            9.3.0               he4bcb1c_17    conda-forge
libgfortran5              9.3.0               he4bcb1c_17    conda-forge
libgomp                   9.3.0               h5dbcf3e_17    conda-forge
libhwloc                  2.3.0                h5e5b7d1_1    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
liblapack                 3.8.0                    14_mkl    conda-forge
libllvm10                 10.0.1               he513fc3_3    conda-forge
libnghttp2                1.41.0               h8cfc5f6_2    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libprotobuf               3.13.0.1             h8b12597_0    conda-forge
librmm                    0.17.0a201211   cuda11.0_gb8c8310_60    rapidsai-nightly
libssh2                   1.9.0                hab1572f_5    conda-forge
libstdcxx-ng              9.3.0               h2ae2ef3_17    conda-forge
libthrift                 0.13.0               h5aa387f_6    conda-forge
libtiff                   4.1.0                h4f3a223_6    conda-forge
libutf8proc               2.6.0                h36c2ea0_0    conda-forge
libwebp-base              1.1.0                h36c2ea0_3    conda-forge
libxml2                   2.9.10               h68273f3_2    conda-forge
llvmlite                  0.35.0           py37h9d7f4d0_0    conda-forge
locket                    0.2.0                      py_2    conda-forge
lz4-c                     1.9.2                he1b5a44_3    conda-forge
markupsafe                1.1.1            py37hb5d75c8_2    conda-forge
mkl                       2019.4                      243
mkl-service               2.3.0            py37h516909a_0    conda-forge
msgpack-python            1.0.1            py37h2527ec5_0    conda-forge
nccl                      2.7.8.1            h4962215_100    nvidia
ncurses                   6.1               hf484d3e_1002    conda-forge
ninja                     1.10.2               h4bd325d_0    conda-forge
numba                     0.52.0           py37hdc94413_0    conda-forge
numpy                     1.19.4           py37h7e9df27_1    conda-forge
nvtx                      0.2.1            py37h8f50634_2    conda-forge
olefile                   0.46               pyh9f0ad1d_1    conda-forge
openssl                   1.1.1h               h516909a_0    conda-forge
orc                       1.6.5                hd3605a7_0    conda-forge
packaging                 20.7               pyhd3deb0d_0    conda-forge
pandas                    1.1.5            py37hdc94413_0    conda-forge
parquet-cpp               1.5.1                         2    conda-forge
partd                     1.1.0                      py_0    conda-forge
perl                      5.32.0               h36c2ea0_0    conda-forge
pillow                    8.0.1            py37h63a5d19_0    conda-forge
pip                       20.1.1             pyh9f0ad1d_0    conda-forge
poyo                      0.5.0                      py_0    conda-forge
protobuf                  3.13.0.1         py37h745909e_1    conda-forge
psutil                    5.7.3            py37hb5d75c8_0    conda-forge
pyarrow                   1.0.1           py37hbeecfa9_16_cuda    conda-forge
pycparser                 2.20               pyh9f0ad1d_2    conda-forge
pynvml                    8.0.4                      py_1    conda-forge
pyopenssl                 20.0.0             pyhd8ed1ab_0    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pysocks                   1.7.1            py37he5f6b98_2    conda-forge
python                    3.7.8           h6f2ec95_1_cpython    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python-slugify            4.0.1              pyh9f0ad1d_0    conda-forge
python_abi                3.7                     1_cp37m    conda-forge
pytorch                   1.5.0           cpu_py37hd91cbb3_0
pytz                      2020.4             pyhd8ed1ab_0    conda-forge
pyyaml                    5.3.1            py37hb5d75c8_1    conda-forge
re2                       2020.11.01           h58526e2_0    conda-forge
readline                  8.0                  h46ee950_1    conda-forge
regex                     2020.11.13       py37h4abf009_0    conda-forge
requests                  2.25.0             pyhd3deb0d_0    conda-forge
rmm                       0.17.0a201211   cuda_11.0_py37_gb8c8310_60    rapidsai-nightly
sacremoses                0.0.43             pyh9f0ad1d_0    conda-forge
scikit-learn              0.23.2           py37hddcf8d6_3    conda-forge
scipy                     1.5.3            py37h14a347d_0    conda-forge
sentencepiece             0.1.92           py37h99015e2_0    conda-forge
setuptools                49.6.0           py37he5f6b98_2    conda-forge
six                       1.15.0             pyh9f0ad1d_0    conda-forge
snappy                    1.1.8                he1b5a44_3    conda-forge
sortedcontainers          2.3.0              pyhd8ed1ab_0    conda-forge
spdlog                    1.7.0                hc9558a2_2    conda-forge
sqlite                    3.32.3               hcee41ef_1    conda-forge
tblib                     1.6.0                      py_0    conda-forge
text-unidecode            1.3                        py_0    conda-forge
threadpoolctl             2.1.0              pyh5ca1d4c_0    conda-forge
tk                        8.6.10               hed695b0_1    conda-forge
tokenizers                0.9.4            py37h17e0dd7_1    conda-forge
toolz                     0.11.1                     py_0    conda-forge
torchvision               0.2.1                    py37_0
tornado                   6.1              py37h4abf009_0    conda-forge
tqdm                      4.54.1             pyhd8ed1ab_0    conda-forge
transformers              4.0.1              pyhd8ed1ab_0    conda-forge
treelite                  0.93             py37h745909e_3    conda-forge
treelite-runtime          0.93                     pypi_0    pypi
typing_extensions         3.7.4.3                    py_0    conda-forge
ucx                       1.8.1+g6b29558       cuda11.0_0    rapidsai-nightly
ucx-proc                  1.0.0                       gpu    rapidsai-nightly
ucx-py                    0.17.0a201211   py37_g6b29558_27    rapidsai-nightly
unidecode                 1.1.1                      py_0    conda-forge
urllib3                   1.25.11                    py_0    conda-forge
wheel                     0.36.1             pyhd3deb0d_0    conda-forge
whichcraft                0.6.1                      py_0    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
yaml                      0.2.5                h516909a_0    conda-forge
zict                      2.0.0                      py_0    conda-forge
zlib                      1.2.11            h516909a_1010    conda-forge
zstd                      1.4.5                h6597ccf_2    conda-forge
@ibrisa ibrisa added ? - Needs Triage Need team to review and classify bug Something isn't working labels Dec 15, 2020
@ibrisa ibrisa changed the title [BUG] Negative R-Squared with Nightly (0.17) and v0.15 [BUG] Negative R-Squared with Nightly (0.17) Dec 17, 2020
@Nanthini10
Contributor

The R2 score can be negative depending on how the model fits. For a more detailed explanation, see: https://stats.stackexchange.com/questions/12900/when-is-r-squared-negative

Below is a small example showing that sklearn and cuml can both produce negative R2 scores. In your case the values still differ because you're comparing sklearn's RFR with cuml.dask's RFR, and these may produce different results.

from cuml.datasets import make_blobs

from cuml.metrics.regression import r2_score
from sklearn.metrics import r2_score as sk_r2

import cupy

_, y = make_blobs(n_samples=1000)
_, y2 = make_blobs(n_samples=1000)

sk_r2(cupy.asnumpy(y), cupy.asnumpy(y2))  # -0.9925956959632654

r2_score(y, y2)                           # -0.9925957918167114
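For intuition on the sign: R2 = 1 − SS_res/SS_tot, so it goes negative exactly when the squared prediction error exceeds the variance of the targets, i.e. when the model predicts worse than a constant equal to the mean. A minimal pure-Python sketch (the `r2_score` helper here is our own illustration, not cuml's implementation):

```python
def r2_score(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]

print(r2_score(y_true, [1.1, 2.1, 2.9, 3.9]))  # ~0.992: better than predicting the mean
print(r2_score(y_true, [4.0, 3.0, 2.0, 1.0]))  # -3.0: worse than predicting the mean
```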

@Nanthini10 Nanthini10 added 4 - Waiting on Author Waiting for author to respond to review and removed ? - Needs Triage Need team to review and classify 4 - Waiting on Author Waiting for author to respond to review labels Dec 29, 2020
@ibrisa
Author

ibrisa commented Dec 31, 2020

Thank you so much for the response and the link; it was helpful to read. It sounds like the R2 is poor because of the model fit, then. Our main concern is the large gap between sklearn and cuML: we always get a negative value from cuML and always a positive one from sklearn. Can you help us identify the reason for that?

Here are the MSE and R2 results for 5 different runs from the data I'd attached previously.

| SKLearn MSE | cuML MSE | SKLearn R2 | cuML R2 |
|-------------|----------|------------|---------|
| 0.005 | 0.028 | 0.80 | -0.13 |
| 0.005 | 0.028 | 0.81 | -0.15 |
| 0.004 | 0.025 | 0.81 | -0.12 |
| 0.004 | 0.024 | 0.80 | -0.15 |
| 0.004 | 0.027 | 0.82 | -0.07 |

@Nanthini10
Contributor

@ibrisa Sorry for the late response. Can you try running the workflow on a single GPU instead of multiple GPUs? Also set n_streams=1 for better reproducibility.

@ibrisa
Author

ibrisa commented Jan 15, 2021

Hi @Nanthini10, thanks for the response.

I have tried it on a single GPU before (on a Power9 system with v0.14), and the MSE and R2 results from cuML and sklearn matched, leading me to think this issue is specific to the multi-GPU version.

I reran the single-GPU code on the x86 system with v0.17, the previously linked data, and n_streams=1 for reproducibility, and it matched my earlier results: the MSE and R2 essentially agree.

Here are the MSE/R2 results run on a single GPU (x86 system, v0.17, n_streams=1):

| SKLearn MSE | cuML MSE | SKLearn R2 | cuML R2 |
|-------------|----------|------------|---------|
| 0.004 | 0.005 | 0.82 | 0.81 |
| 0.004 | 0.004 | 0.81 | 0.81 |
| 0.004 | 0.005 | 0.81 | 0.81 |

Are there any other tests that would be helpful?

@Nanthini10
Contributor

Thanks for the details @ibrisa! This is helpful, let me debug this and get back to you.

@ibrisa
Author

ibrisa commented Feb 17, 2021

Hi @Nanthini10,

An update on this ticket: I tried rerunning the dask version of the code while specifying only 1 GPU, and it gave similarly poor performance metrics.

| SKLearn MSE | cuML MSE | SKLearn R2 | cuML R2 |
|-------------|----------|------------|---------|
| 0.004 | 0.042 | 0.81 | -0.78 |
| 0.004 | 0.036 | 0.82 | -0.55 |
| 0.004 | 0.032 | 0.82 | -0.46 |

The only change in the Python code is specifying n_workers=1 in the following line:

cluster = LocalCUDACluster(threads_per_worker=1, n_workers=1)

@Nanthini10
Contributor

@ibrisa Thanks for the update! I was able to reproduce similar behavior, and it looks like the predictions are not lining up. There is an update to the RandomForest estimators coming soon (https://github.com/rapidsai/roadmap/issues/299) which might fix the issue. I'll try to get you an update soon.

@ibrisa
Author

ibrisa commented Feb 19, 2021

@Nanthini10 Great news that you were able to reproduce it! Unfortunately, the link to the roadmap is broken. Is there a timeline for when the RF update will come out?

@Nanthini10
Contributor

@ibrisa Oh, sorry! It should come out in about 6 weeks with 0.19.

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@ibrisa
Author

ibrisa commented Mar 29, 2021

This is still an active issue. Waiting to test with the 0.19 fixes.

@Nanthini10
Contributor

@ibrisa

This is an issue with how the partitions are made by dask. When you call r2_score with the unpartitioned y_test and the partitioned object cuml_y_pred, the rows are not aligned, which produces an incorrect score.

Use a partitioned y_test_dask instead: y_test_dask = dask_cudf.from_cudf(y_test, npartitions=n_partitions)

print("-----SKLearn")
print("SKLearn MSE:  ", sklearn.metrics.mean_squared_error(y_test_pd, skl_y_pred))
print("SKLearn r2:  ", sklearn.metrics.r2_score(y_test_pd, skl_y_pred))

# cuml_y_pred_ holds the predictions from the single-GPU cuML model
print("-----CuML Single-GPU")
print("CuML MSE:     ", cuml.metrics.regression.mean_squared_error(y_test, cuml_y_pred_))
print("CuML r2:   ", cuml.metrics.regression.r2_score(y_test, cuml_y_pred_))

print("-----CuML Multi-GPU")
print("CuML MSE (y_test_dask, pred):     ",
      cuml.metrics.regression.mean_squared_error(y_test_dask.compute(),
                                                 cuml_y_pred.compute()))
print("CuML MSE (y_test, pred):  ",
      cuml.metrics.regression.mean_squared_error(y_test,
                                                 cuml_y_pred.compute()))
print("CuML r2 (y_test_dask, pred):   ",
      cuml.metrics.regression.r2_score(y_test_dask.compute(),
                                       cuml_y_pred.compute()))
print("CuML r2 (y_test, pred):  ",
      cuml.metrics.regression.r2_score(y_test,
                                       cuml_y_pred.compute()))

Outputs:

-----SKLearn
SKLearn MSE:   0.004290874689188821
SKLearn r2:   0.7956350772263211
-----CuML Single-GPU
CuML MSE:      0.0056744325
CuML r2:    0.729739248752594
-----CuML Multi-GPU
CuML MSE (y_test_dask, pred):      0.0068411767
CuML MSE (y_test, pred):   0.031057306
CuML r2 (y_test_dask, pred):    0.6741697788238525
CuML r2 (y_test, pred):   -0.47919130325317383

As you can see, the scores computed against y_test_dask are much closer to the sklearn values (even the MSE) than those computed against the unpartitioned object.
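The misalignment effect can be reproduced without GPUs at all: scoring good predictions against a row-shuffled copy of the truth (which is effectively what comparing an unpartitioned y_test with partition-ordered predictions does) inflates the MSE and drives R2 negative. A small pure-Python sketch, with our own `r2` helper standing in for cuml.metrics:

```python
import random

def r2(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

random.seed(0)
y_true = [random.gauss(0.0, 1.0) for _ in range(1000)]
y_pred = [t + random.gauss(0.0, 0.1) for t in y_true]  # good, row-aligned predictions

aligned = r2(y_true, y_pred)  # close to 1

# Shuffle the truth relative to the predictions, mimicking the
# partition-order mismatch between y_test and cuml_y_pred:
shuffled = y_true[:]
random.shuffle(shuffled)
misaligned = r2(shuffled, y_pred)  # drops below zero
```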

@Nanthini10 Nanthini10 self-assigned this Aug 11, 2021
@Nanthini10 Nanthini10 added the 4 - Waiting on Author Waiting for author to respond to review label Aug 11, 2021