
Fixing CI #929

Merged: 10 commits merged into dask:main on May 27, 2022
Conversation

@mmccarty (Member) commented on May 20, 2022:

This PR follows on from #909 and fixes the CI build with the following changes:

  • adds :okwarning: and :okexcept: options to Sphinx ipython blocks (see the sketch below)
  • pins scikit-learn and scipy to versions that are currently compatible with dask-ml
  • upgrades tests and associated code to support both scikit-learn 1.0 and 1.1
  • removes xgboost from ci/environment-docs.yml

If this PR is acceptable, #924 and #931 can be closed.
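For context, the first change amounts to adding directive options of roughly this form to the affected docs pages (a sketch only; the body shown here is illustrative, and the real hunks appear in the review threads below). :okwarning: lets the embedded IPython block emit warnings, and :okexcept: lets it raise exceptions, without failing the docs build:

    .. ipython:: python
        :okwarning:
        :okexcept:

        import dask_ml.datasets

        # Warnings (e.g. the single-machine-scheduler UserWarning discussed
        # below) or exceptions raised while this block runs no longer abort
        # the Sphinx build.
        X, y = dask_ml.datasets.make_classification(chunks=50)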

stsievert added a commit to kchare/dask-ml that referenced this pull request May 21, 2022
stsievert added a commit to kchare/dask-ml that referenced this pull request May 21, 2022
@mmccarty marked this pull request as ready for review on May 23, 2022 at 14:43
@mmccarty mentioned this pull request on May 23, 2022
@jakirkham requested a review from TomAugspurger on May 23, 2022 at 19:31
@mmccarty (Member, Author):

cc @jrbourbeau

@mmccarty (Member, Author):

Need to add checks to switch between scikit-learn 1.0.* and 1.1.
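A minimal sketch of what such a check can look like; it mirrors the SKLEARN_1_1_X flag this PR adds to dask_ml/_compat.py (shown in a diff hunk below). The definition of SK_VERSION from sklearn.__version__ and the placeholder branches are assumptions for illustration:

    import packaging.version
    import sklearn

    # Assumed definition, following the existing pattern in dask_ml/_compat.py.
    SK_VERSION = packaging.version.parse(sklearn.__version__)
    SKLEARN_1_1_X = SK_VERSION >= packaging.version.parse("1.1")

    if SKLEARN_1_1_X:
        ...  # follow scikit-learn 1.1 behaviour in tests/compat code
    else:
        ...  # follow scikit-learn 1.0.* behaviour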

@mmccarty closed this on May 25, 2022
@mmccarty reopened this on May 25, 2022
Resolved review threads: dask_ml/metrics/regression.py, dask_ml/_compat.py (outdated)
@wence- left a comment:

Minor quibble.

Resolved review threads (outdated): ci/environment-3.8.yaml, docs/source/conf.py
@@ -420,6 +420,7 @@ This section uses :class:`~dask_ml.model_selection.HyperbandSearchCV`, but it ca
also be applied to to :class:`~dask_ml.model_selection.IncrementalSearchCV` too.

.. ipython:: python
:okwarning:
Collaborator (review comment):

We should open issues corresponding to the problems that we are ignoring by this, essentially to ensure that we have them in the backlog.

Referenced code (dask_ml/datasets.py):

    informative_idx, beta = dask.compute(
        informative_idx, beta, scheduler="single-threaded"
    )

@mmccarty (Member, Author):

💯 agreed. @VibhuJawa - Do you have a reproducer? I wasn't able to get one outside of sphinx ipython directives.

@wence-:

import dask.distributed # sets dask.config["scheduler"] == "dask.distributed"
import dask.array as da
import dask

a = dask.compute(da.arange(10), scheduler="single-threaded")

make_classification explicitly calls dask.compute with scheduler="single-threaded" (as @VibhuJawa indicates), which then mismatches the scheduler set up by importing dask.distributed.

Plausibly this is the correct fix:

diff --git a/dask_ml/datasets.py b/dask_ml/datasets.py
index a561ee0d..3a5f5e99 100644
--- a/dask_ml/datasets.py
+++ b/dask_ml/datasets.py
@@ -370,9 +370,7 @@ def make_classification(
     informative_idx = rng.choice(n_features, n_informative, chunks=n_informative)
     beta = (rng.random(n_features, chunks=n_features) - 1) * scale
 
-    informative_idx, beta = dask.compute(
-        informative_idx, beta, scheduler="single-threaded"
-    )
+    informative_idx, beta = dask.compute(informative_idx, beta)
 
     z0 = X[:, informative_idx].dot(beta[informative_idx])
     y = rng.random(z0.shape, chunks=chunks[0]) < 1 / (1 + da.exp(-z0))

Although that backs out cb5c2ee, which looks like it was a deliberate choice at the time.

@VibhuJawa (Collaborator):

@mmccarty, minimal example:

    from dask.distributed import Client
    import dask_ml.datasets

    client = Client()

    X, y = dask_ml.datasets.make_classification_df(chunks=100)

    /datasets/vjawa/miniconda3/envs/dask-ml-dev/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
      warnings.warn(

Please note that there is a call in k-means that will eventually need to be fixed too:

    rng2 = (
        random_state.randint(0, np.iinfo("i4").max - 1, chunks=())
        .compute(scheduler="single-threaded")
        .item()
    )

@VibhuJawa (Collaborator):

@wence-, I tried fixing it by removing the explicit calls in #924, but that causes issues with key names downstream. I suggest going with @mmccarty's fix to unblock the CI, and then we can tackle this issue from the backlog.

@mmccarty (Member, Author):

@VibhuJawa - Ah, of course, the warning is reproducible. Sorry, I was thinking about the key error. Opened issue #933.


@@ -56,6 +56,7 @@ This class is useful for predicting for or transforming large datasets.
We'll make a larger dask array ``X_big`` with 10,000 samples per block.

.. ipython:: python
:okwarning:
Collaborator (review comment):

Same for this.


@@ -29,6 +29,14 @@
PANDAS_1_2_0 = PANDAS_VERSION > packaging.version.parse("1.2.0")
WINDOWS = os.name == "nt"

SKLEARN_1_1_X = SK_VERSION >= packaging.version.parse("1.1")
@wence- (review comment):

This name is misleading, since it is True for 1.2 (say). I realise this mimics the naming above, so I guess fine.

Reply (Member):

This is the convention used by packages like six.
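An illustration of the naming point above (an editorial sketch, not part of the thread): because the flag is a >= comparison, it is indeed True on 1.2 and later as well, not only on 1.1.x. A quick check:

    import packaging.version

    for v in ("1.0.2", "1.1.1", "1.2.0"):
        flag = packaging.version.parse(v) >= packaging.version.parse("1.1")
        print(v, flag)  # prints: 1.0.2 False, 1.1.1 True, 1.2.0 True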

@wence- left a comment:

Modulo minor comments, I think this (from my inexpert eye) looks fine.


@TomAugspurger merged commit 7b8c8e0 into dask:main on May 27, 2022
@TomAugspurger (Member) commented on May 27, 2022:

Thanks! Just to confirm, we have issues for each of the distinct warnings we're now silencing?

@mmccarty (Member, Author):

Thanks @TomAugspurger! We have one issue, #933, to follow up on, which I'll look at soon. It may lead to more issues, but we can track them there.

@mmccarty deleted the ci-fixes branch on May 27, 2022 at 14:13