Update pydoc and promote ColumnTransformer out of experimental (#4509)
- Promote `ColumnTransformer` out of experimental
- Add `ColumnTransformer` to the documentation (addresses #4418)
- Update preprocessing documentation

Authors:
  - Victor Lafargue (https://github.com/viclafargue)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Micka (https://github.com/lowener)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #4509
viclafargue authored May 23, 2022
1 parent bc43c0e commit d9dd8ca
Showing 8 changed files with 90 additions and 83 deletions.
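
For readers updating their own imports, the module layout this commit introduces can be summarized in a small lookup. This is only a sketch: `new_module_for` and the two name sets are hypothetical helpers written for illustration (they are not part of cuML), and the sets list only names that appear in the `__all__` changes and import statements of this diff.

```python
# Hypothetical helper, not part of cuML: reports which module exports a
# given name after this commit, based only on the names visible in the diff.

# Names exported by the new python/cuml/compose/__init__.py
COMPOSE_NAMES = {
    "ColumnTransformer",
    "make_column_transformer",
    "make_column_selector",
}

# Names imported from cuml._thirdparty.sklearn.preprocessing in the updated
# python/cuml/preprocessing/__init__.py
PREPROCESSING_NAMES = {
    "Binarizer", "FunctionTransformer", "KBinsDiscretizer", "MaxAbsScaler",
    "MinMaxScaler", "MissingIndicator", "Normalizer", "PolynomialFeatures",
    "RobustScaler", "SimpleImputer", "StandardScaler",
    "add_dummy_feature", "binarize", "maxabs_scale", "minmax_scale",
    "normalize", "robust_scale", "scale",
}


def new_module_for(name):
    """Return the module path that exports `name` after this commit."""
    if name in COMPOSE_NAMES:
        return "cuml.compose"
    if name in PREPROCESSING_NAMES:
        return "cuml.preprocessing"
    raise KeyError(f"{name!r} is not one of the names touched by this commit")


if __name__ == "__main__":
    for symbol in ("ColumnTransformer", "FunctionTransformer", "scale"):
        print(f"from {new_module_for(symbol)} import {symbol}")
```

The lookup is deliberately conservative: any name not shown in the diff raises `KeyError` instead of guessing a location.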
47 changes: 38 additions & 9 deletions docs/source/api.rst
@@ -82,9 +82,46 @@ Feature and Label Encoding (Single-GPU)
.. autoclass:: cuml.preprocessing.TargetEncoder.TargetEncoder
    :members:

Feature Scaling and Normalization (Single-GPU)
----------------------------------------------
.. autoclass:: cuml.preprocessing.MaxAbsScaler
    :members:
.. autoclass:: cuml.preprocessing.MinMaxScaler
    :members:
.. autoclass:: cuml.preprocessing.Normalizer
    :members:
.. autoclass:: cuml.preprocessing.RobustScaler
    :members:
.. autoclass:: cuml.preprocessing.StandardScaler
    :members:
.. autofunction:: cuml.preprocessing.maxabs_scale
.. autofunction:: cuml.preprocessing.minmax_scale
.. autofunction:: cuml.preprocessing.normalize
.. autofunction:: cuml.preprocessing.robust_scale
.. autofunction:: cuml.preprocessing.scale

Other preprocessing methods (Single-GPU)
----------------------------------------
.. autoclass:: cuml.preprocessing.Binarizer
    :members:
.. autoclass:: cuml.preprocessing.FunctionTransformer
    :members:
.. autoclass:: cuml.preprocessing.KBinsDiscretizer
    :members:
.. autoclass:: cuml.preprocessing.MissingIndicator
    :members:
.. autoclass:: cuml.preprocessing.PolynomialFeatures
    :members:
.. autoclass:: cuml.preprocessing.SimpleImputer
    :members:
.. autofunction:: cuml.preprocessing.add_dummy_feature
.. autofunction:: cuml.preprocessing.binarize

.. automodule:: cuml.compose
    :members: ColumnTransformer, make_column_transformer, make_column_selector

Text Preprocessing (Single-GPU)
---------------------------------------
-------------------------------
.. autoclass:: cuml.preprocessing.text.stem.PorterStemmer
    :members:

@@ -589,14 +626,6 @@ Experimental
the root `cuml` package. Each `experimental` submodule must be imported
separately.

Preprocessing
-------------
.. automodule:: cuml.experimental.preprocessing
    :members: Binarizer, KBinsDiscretizer, MaxAbsScaler, MinMaxScaler,
        Normalizer, RobustScaler, SimpleImputer, StandardScaler,
        add_dummy_feature, binarize, minmax_scale, normalize,
        PolynomialFeatures, robust_scale, scale

Linear Models
-------------
.. autoclass:: cuml.experimental.linear_model.Lars
@@ -439,7 +439,7 @@ class ColumnTransformer(TransformerMixin, BaseComposition, BaseEstimator):
its parameters to be set using ``set_params`` and searched in grid
search.
transformer : {'drop', 'passthrough'} or estimator
Estimator must support :term:`fit` and :term:`transform`.
Estimator must support `fit` and `transform`.
Special-cased strings 'drop' and 'passthrough' are accepted as
well, to indicate to drop the columns or to pass them through
untransformed, respectively.
@@ -464,9 +464,9 @@ class ColumnTransformer(TransformerMixin, BaseComposition, BaseEstimator):
the transformers.
By setting ``remainder`` to be an estimator, the remaining
non-specified columns will use the ``remainder`` estimator. The
estimator must support :term:`fit` and :term:`transform`.
estimator must support `fit` and `transform`.
Note that using this feature requires that the DataFrame columns
input at :term:`fit` and :term:`transform` have identical order.
input at `fit` and `transform` have identical order.
sparse_threshold : float, default=0.3
If the output of the different transformers contains sparse matrices,
@@ -1031,7 +1031,7 @@ def make_column_transformer(*transformers,
transformer objects to be applied to subsets of the data.
transformer : {'drop', 'passthrough'} or estimator
Estimator must support :term:`fit` and :term:`transform`.
Estimator must support `fit` and `transform`.
Special-cased strings 'drop' and 'passthrough' are accepted as
well, to indicate to drop the columns or to pass them through
untransformed, respectively.
@@ -1056,7 +1056,7 @@ def make_column_transformer(*transformers,
the transformers.
By setting ``remainder`` to be an estimator, the remaining
non-specified columns will use the ``remainder`` estimator. The
estimator must support :term:`fit` and :term:`transform`.
estimator must support `fit` and `transform`.
sparse_threshold : float, default=0.3
If the transformed output consists of a mix of sparse and dense data,
@@ -1069,7 +1069,7 @@ def make_column_transformer(*transformers,
n_jobs : int, default=None
Number of jobs to run in parallel.
``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
``-1`` means using all processors. See :term:`Glossary <n_jobs>`
``-1`` means using all processors. See `Glossary <n_jobs>`
for more details.
verbose : bool, default=False
@@ -93,7 +93,7 @@ class KBinsDiscretizer(TransformerMixin,
np.concatenate([-np.inf, bin_edges_[i][1:-1], np.inf])
You can combine ``KBinsDiscretizer`` with
:class:`sklearn.compose.ColumnTransformer` if you only want to preprocess
:class:`cuml.compose.ColumnTransformer` if you only want to preprocess
part of the features.
``KBinsDiscretizer`` might produce constant features (e.g., when
27 changes: 27 additions & 0 deletions python/cuml/compose/__init__.py
@@ -0,0 +1,27 @@
#
# Copyright (c) 2020-2022, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from cuml._thirdparty.sklearn.preprocessing import ColumnTransformer, \
    make_column_transformer, make_column_selector


__all__ = [
    # Classes
    'ColumnTransformer',
    # Functions
    'make_column_transformer',
    'make_column_selector'
]
53 changes: 0 additions & 53 deletions python/cuml/experimental/preprocessing/__init__.py

This file was deleted.

28 changes: 16 additions & 12 deletions python/cuml/preprocessing/__init__.py
@@ -20,38 +20,42 @@
from cuml.preprocessing.TargetEncoder import TargetEncoder
from cuml.preprocessing import text

from cuml._thirdparty.sklearn.preprocessing import StandardScaler, \
MinMaxScaler, MaxAbsScaler, Normalizer, Binarizer, PolynomialFeatures, \
SimpleImputer, RobustScaler, KBinsDiscretizer, MissingIndicator
from cuml._thirdparty.sklearn.preprocessing import scale, minmax_scale, \
maxabs_scale, normalize, add_dummy_feature, binarize, robust_scale
from cuml._thirdparty.sklearn.preprocessing import Binarizer, \
FunctionTransformer, KBinsDiscretizer, MaxAbsScaler, MinMaxScaler, \
MissingIndicator, Normalizer, PolynomialFeatures, RobustScaler, \
SimpleImputer, StandardScaler

from cuml._thirdparty.sklearn.preprocessing import add_dummy_feature, \
binarize, maxabs_scale, minmax_scale, normalize, robust_scale, scale


__all__ = [
# Classes
'Binarizer',
'FunctionTransformer',
'KBinsDiscretizer',
'LabelBinarizer',
'LabelEncoder',
'MaxAbsScaler',
'MinMaxScaler',
'MissingIndicator',
'Normalizer',
'OneHotEncoder',
'PolynomialFeatures',
'RobustScaler',
'SimpleImputer',
'MissingIndicator',
'StandardScaler',
'LabelEncoder',
'LabelBinarizer',
'OneHotEncoder',
'TargetEncoder',
# Functions
'add_dummy_feature',
'binarize',
'minmax_scale',
'label_binarize',
'maxabs_scale',
'minmax_scale',
'normalize',
'robust_scale',
'scale',
'label_binarize',
'train_test_split',
# Modules
'text',
'text'
]
2 changes: 1 addition & 1 deletion python/cuml/tests/test_compose.py
@@ -20,7 +20,7 @@
from pandas import DataFrame as pdDataFrame
from cudf import DataFrame as cuDataFrame

from cuml.experimental.preprocessing import \
from cuml.compose import \
ColumnTransformer as cuColumnTransformer, \
make_column_transformer as cu_make_column_transformer, \
make_column_selector as cu_make_column_selector
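
Code that has to run against cuML builds from both before and after this commit could resolve the import location at runtime. The following is a minimal sketch under that assumption: `import_first` is a hypothetical helper (not part of cuML), and the demonstration uses standard-library modules so that it runs without a GPU.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first importable module that has it.

    `module_paths` is tried in order; modules that cannot be imported, or
    that lack the attribute, are skipped.
    """
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            # Covers ModuleNotFoundError too, e.g. a module removed by a later release
            continue
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"{name!r} not found in any of {list(module_paths)}")


# Demonstration with the standard library: the first module does not exist,
# so the helper falls through to `math`.
sqrt = import_first("sqrt", ["no_such_module_for_demo", "math"])

# With cuML installed, the same pattern would prefer the new location and
# fall back to the pre-commit one (left commented out; requires cuML):
# ColumnTransformer = import_first(
#     "ColumnTransformer",
#     ["cuml.compose", "cuml.experimental.preprocessing"],
# )
```

Listing `cuml.compose` first means newer installations never touch the deleted experimental module, while older ones still resolve the name.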
2 changes: 1 addition & 1 deletion python/cuml/tests/test_preprocessing.py
@@ -26,7 +26,7 @@
RobustScaler as cuRobustScaler, \
KBinsDiscretizer as cuKBinsDiscretizer, \
MissingIndicator as cuMissingIndicator
from cuml.experimental.preprocessing import \
from cuml.preprocessing import \
FunctionTransformer as cuFunctionTransformer
from cuml.preprocessing import scale as cu_scale, \
minmax_scale as cu_minmax_scale, \