TableVectoriser's "numerical_transformer" does not accept Pipelines #886

Closed
DSoudis opened this issue Jan 29, 2024 · 3 comments
Labels
bug Something isn't working

DSoudis commented Jan 29, 2024

### Describe the bug

According to the `TableVectorizer` documentation, `numerical_transformer` is:

> Transformer used on numerical features. Can either be a transformer object instance (e.g. StandardScaler), a Pipeline containing the preprocessing steps, ‘drop’ for dropping the columns, ‘remainder’ for applying remainder, or ‘passthrough’ to return the unencoded columns (default).

So I would expect to be able to pass a `Pipeline`.

### Steps/Code to Reproduce

```python
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

from skrub import TableVectorizer

# Get the data as a (DataFrame, Series) pair
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Numerical transformer. There are no NaNs in the data, but it could be any pipeline
num_prep = make_pipeline(SimpleImputer(add_indicator=True), StandardScaler())

# TableVectorizer with a Pipeline as the numerical transformer
encoder = TableVectorizer(numerical_transformer=num_prep)

# Model
clf = make_pipeline(encoder, LogisticRegression())
clf.fit(X, y)
```

### Expected Results

The pipeline should fit the data without raising an error.

### Actual Results

```
ValueError: 'transformer' must be an instance of sklearn.base.TransformerMixin, 'remainder' or 'passthrough'. Got transformer=Pipeline(steps=[('simpleimputer', SimpleImputer(add_indicator=True)),
                ('standardscaler', StandardScaler())]).
```
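
The error message suggests the validation is an `isinstance` check against `sklearn.base.TransformerMixin`. A minimal sketch of why a `Pipeline` would fail such a check, using only scikit-learn (this is an illustration of the likely cause, not skrub's actual validation code):

```python
from sklearn.base import TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

num_prep = make_pipeline(SimpleImputer(add_indicator=True), StandardScaler())

# A single transformer inherits from TransformerMixin ...
scaler_is_mixin = isinstance(StandardScaler(), TransformerMixin)

# ... but Pipeline does not, even though it exposes fit and transform
# when its last step is a transformer.
pipeline_is_mixin = isinstance(num_prep, TransformerMixin)
pipeline_has_api = hasattr(num_prep, "fit") and hasattr(num_prep, "transform")

print(scaler_is_mixin, pipeline_is_mixin, pipeline_has_api)
```

So a `Pipeline` behaves like a transformer but is rejected by any check that relies on the mixin class alone.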

### Versions

```shell
System:
    python: 3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 08:01:35) [Clang 16.0.6 ]
executable: /opt/homebrew/Caskroom/miniforge/base/envs/test_skrub/bin/python
   machine: macOS-14.3-arm64-arm-64bit

Python dependencies:
      sklearn: 1.4.0
          pip: 23.3.2
   setuptools: 69.0.3
        numpy: 1.26.3
        scipy: 1.12.0
       Cython: None
       pandas: 2.2.0
   matplotlib: None
       joblib: 1.3.2
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /opt/homebrew/Caskroom/miniforge/base/envs/test_skrub/lib/libopenblas.0.dylib
        version: 0.3.26
threading_layer: openmp
   architecture: VORTEX

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libomp
       filepath: /opt/homebrew/Caskroom/miniforge/base/envs/test_skrub/lib/libomp.dylib
        version: None
0.1.0
```

DSoudis added the bug label on Jan 29, 2024

jeromedockes (Member) commented

Thanks a lot for reporting this! We'll make sure to address it in #877.

jeromedockes (Member) commented

Here is a reproducer, to be added to our test suite:

```python
import pandas as pd
from skrub import TableVectorizer
from sklearn.pipeline import make_pipeline

df = pd.DataFrame(dict(a=[1.1, 2.2]))
tv = TableVectorizer(numerical_transformer=make_pipeline('passthrough'))
tv.fit(df)
```
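
For illustration only, one way a more permissive check could be written is to test for the `fit` and `transform` methods instead of a base class. `looks_like_transformer` below is a hypothetical helper, not part of skrub, and not necessarily how the fix is implemented:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def looks_like_transformer(obj):
    """Hypothetical check: accept the special strings named in the
    documentation, or any object exposing both fit and transform."""
    if isinstance(obj, str):
        return obj in ("passthrough", "drop", "remainder")
    return hasattr(obj, "fit") and hasattr(obj, "transform")


# The minimal pipeline from the reproducer passes a duck-typed check.
accepts_pipeline = looks_like_transformer(make_pipeline("passthrough"))
accepts_scaler = looks_like_transformer(StandardScaler())
rejects_plain_object = not looks_like_transformer(object())
```

A duck-typed check like this accepts both plain transformers and pipelines whose final step can transform, at the cost of also accepting any unrelated object that happens to define those two methods.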

jeromedockes (Member) commented

Fixed by #902.
