Fixing CI #929
This reverts commit 54771f3.
cc @jrbourbeau
Need to add checks to switch between scikit-learn 1.0.* and 1.1.
Minor quibble
Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>
@@ -420,6 +420,7 @@ This section uses :class:`~dask_ml.model_selection.HyperbandSearchCV`, but it ca
also be applied to to :class:`~dask_ml.model_selection.IncrementalSearchCV` too.

.. ipython:: python
   :okwarning:
We should open issues corresponding to the problems that we are ignoring by this. Essentially to ensure that we have it in the backlog.
informative_idx, beta = dask.compute(
informative_idx, beta, scheduler="single-threaded"
)
💯 agreed. @VibhuJawa - Do you have a reproducer? I wasn't able to get one outside of sphinx ipython directives.
import dask.distributed # sets dask.config["scheduler"] == "dask.distributed"
import dask.array as da
import dask
a = dask.compute(da.arange(10), scheduler="single-threaded")
make_classification explicitly calls (as @VibhuJawa indicates) dask.compute, setting the scheduler to "single-threaded", which then mismatches with the scheduler set up by importing dask.distributed.
Plausibly this is the correct fix
diff --git a/dask_ml/datasets.py b/dask_ml/datasets.py
index a561ee0d..3a5f5e99 100644
--- a/dask_ml/datasets.py
+++ b/dask_ml/datasets.py
@@ -370,9 +370,7 @@ def make_classification(
informative_idx = rng.choice(n_features, n_informative, chunks=n_informative)
beta = (rng.random(n_features, chunks=n_features) - 1) * scale
- informative_idx, beta = dask.compute(
- informative_idx, beta, scheduler="single-threaded"
- )
+ informative_idx, beta = dask.compute(informative_idx, beta)
z0 = X[:, informative_idx].dot(beta[informative_idx])
y = rng.random(z0.shape, chunks=chunks[0]) < 1 / (1 + da.exp(-z0))
Although that backs out cb5c2ee which looks like it was a deliberate choice at the time.
@mmccarty, Minimal Example
from dask.distributed import Client
import dask_ml.datasets
client = Client()
X,y = dask_ml.datasets.make_classification_df(chunks=100)
/datasets/vjawa/miniconda3/envs/dask-ml-dev/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(
Please note that there is a call in k-means that will eventually need to be fixed too.
dask-ml/dask_ml/cluster/k_means.py
Lines 472 to 475 in 30b3dea
rng2 = (
    random_state.randint(0, np.iinfo("i4").max - 1, chunks=())
    .compute(scheduler="single-threaded")
    .item()
@VibhuJawa - Ah, of course, the warning is reproducible. Sorry, I was thinking about the key error. Opened an issue #933
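For tracking silenced warnings like the one reported above, a regression check can record warnings instead of letting them scroll past. The following is only a sketch using the standard library's warnings module: compute_single_threaded is a hypothetical stand-in that emits the same message text, not a dask call, and collect_user_warnings is an illustrative helper name.

```python
import warnings

# Message text copied from the warning reported in this thread.
MESSAGE = (
    "Running on a single-machine scheduler when a distributed client "
    "is active might lead to unexpected results."
)


def compute_single_threaded():
    """Hypothetical stand-in for a call that triggers the warning."""
    warnings.warn(MESSAGE, UserWarning)
    return 42


def collect_user_warnings(func):
    """Run func and return (result, list of UserWarning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones already shown once.
        warnings.simplefilter("always")
        result = func()
    messages = [
        str(w.message) for w in caught if issubclass(w.category, UserWarning)
    ]
    return result, messages


result, messages = collect_user_warnings(compute_single_threaded)
print(len(messages), "UserWarning(s) recorded")
```

In a real test the stand-in would be replaced by the call under investigation, and the assertion on the recorded messages would fail once the underlying issue is fixed or reappears.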
@@ -56,6 +56,7 @@ This class is useful for predicting for or transforming large datasets.
We'll make a larger dask array ``X_big`` with 10,000 samples per block.

.. ipython:: python
   :okwarning:
Same for this.
@@ -29,6 +29,14 @@
PANDAS_1_2_0 = PANDAS_VERSION > packaging.version.parse("1.2.0")
WINDOWS = os.name == "nt"

SKLEARN_1_1_X = SK_VERSION >= packaging.version.parse("1.1")
This name is misleading, since it is True for 1.2 (say). I realise this mimics the naming above, so I guess fine.
This is the convention used by packages like six.
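To make the naming point concrete, here is a minimal sketch of how a ">=" flag behaves across releases. It is an assumption-laden illustration: a simplified, stdlib-only tuple comparison stands in for packaging.version.parse (which the diff above uses), parse_release and at_least are hypothetical helper names, and pre-release suffixes such as "dev0" are ignored rather than ordered the way packaging would order them.

```python
import re


def parse_release(version):
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release found in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def at_least(installed, minimum):
    """True if installed >= minimum, padding the shorter tuple with zeros."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# A flag defined with ">=" stays True for every later release,
# which is why a name ending in "_1_1_X" can read as misleading.
for installed in ("1.0.2", "1.1.3", "1.2.0"):
    print(installed, at_least(installed, "1.1"))
# 1.0.2 False
# 1.1.3 True
# 1.2.0 True
```

The flag therefore reads as "1.1 or newer", not "the 1.1.x series", which matches both the reviewer's quibble and the convention described in the reply.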
Modulo minor comments, I think this (from my inexpert eye) looks fine.
Thanks! Just to confirm, we have issues for each of the distinct warnings we're now silencing?
Thanks @TomAugspurger! We have one issue #933 to follow up, which I'll look at soon. It may lead to more issues but we can track them there.
This PR follows on from #909 and fixes the CI build with the following changes:
- Add :okwarning: and :okexcept: options to sphinx ipython blocks
- Pin versions of sklearn and scipy that are currently compatible with dask-ml
- Remove xgboost from ci/environment-docs.yml
If this PR is acceptable, #924 and #931 can be closed.