Regressor doesn't work properly #58
Hi @SkylakeXx, from the error it sounds like you have 9 features in your feature definitions but have passed 10 columns in the training data. Can you double-check your feature definitions and the training data being passed to the SSE? If you can paste your Qlik load script here, I can take a look. Also, this error might be misleading. Can you paste the full stack trace of the error from the SSE logs? There might be multiple exceptions, and this may not be the relevant one. The actual error may have occurred before the last SSE function call, so make sure you scroll up to the first exception message.
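The error message itself comes from pandas: building a DataFrame from rows that carry more values than there are column names fails with this message. Below is a minimal reproduction with hypothetical feature names and dummy rows (the real feature definitions are not shown in this thread), plus a simple pre-check that compares the counts before the frame is built. `column_mismatch` is an illustrative helper, not part of the SSE.

```python
import pandas as pd

# Hypothetical feature names: 9 definitions, matching the count in the error.
feature_names = [f"feature_{i}" for i in range(9)]

# Dummy training rows, each carrying 10 values (one more than the definitions).
rows = [[float(i + j) for j in range(10)] for i in range(5)]


def column_mismatch(rows, names):
    """Return (number of names, set of row widths) if they disagree, else None."""
    widths = {len(r) for r in rows}
    if widths != {len(names)}:
        return len(names), widths
    return None


# Pre-check before handing the data to pandas.
mismatch = column_mismatch(rows, feature_names)
if mismatch:
    print(f"{mismatch[0]} names defined, but rows have {sorted(mismatch[1])} values")

# Building the frame anyway reproduces the pandas ValueError from the SSE log.
try:
    pd.DataFrame(rows, columns=feature_names)
except ValueError as exc:
    print(exc)
```

The same count check can be run against whatever the load script sends, for example by comparing the number of rows in the feature definitions table with the number of fields concatenated into each training record.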
@priyanka181088, can you please open a new issue, since this is unrelated to SkylakeXx's problem? Also, I'll need to see the expression you used, together with the logs from the SSE terminal. You can add
Here is the log in debug mode; it includes the definitions and scripts I'm using. `SET ThousandSep=','; [Estimators.Model Name] AS [Model Name], Scaler, [Feature Definitions]: [Train-Test]: [Optimization.Model Name] AS [Model Name], // Set up a variable for the scaler parameters // Set up a variable for execution parameters // Set up a variable for grid search parameters LET i = 0; // Create a model for each estimator
Next vModel; // Set up a temporary table for the training and test dataset Fecha & '|' & LET i = 0; // Train and Test each model
Next vModel Drop table TEMP_TRAIN_TEST; `
Hi @SkylakeXx, did you forget to attach the SSE log? I'm referring to the one found under the I did notice one problem in your script; Cheers,
Thanks Nabeel. I changed the order, and this time I've attached all the logs generated by the LR, including the SSE log. Also, I don't really need this particular exercise; if you have posted any example using a regressor, I can manage from there. LOG 1: `SKLearnForQlik Log: Thu Oct 31 09:23:28 2019 Model Name: HR-Attrition-LR-GSCV Execution arguments: {'overwrite': True, 'test_size': 0.33, 'random_state': 42, 'compress': 3, 'retain_data': False, 'debug': True} Scaler: StandardScaler, missing: zeros, scale_hashed: True, scale_vectors: True Estimator: LinearRegression Cache updated. Models in cache: TABLE DESCRIPTION SENT TO QLIK: fields { RESPONSE: (1, 3) rows x cols
0 HR-Attrition-LR-GSCV Model successfully saved to disk 09:23:28 10/31/19 Hora estándar romance LOG 2: `Model HR-Attrition-LR-GSCV loaded from cache. REQUEST: (1, 3) rows x cols
0 HR-Attrition-LR-GSCV n_jobs=1|list|int scoring=f1_micro|str Model Name: HR-Attrition-LR-GSCV, Estimator: LinearRegression Grid Search Arguments: {'scoring': 'f1_micro', 'refit': True} Parameter Grid: [{'n_jobs': [1]}] Cache updated. Models in cache: TABLE DESCRIPTION SENT TO QLIK: fields { RESPONSE: (1, 3) rows x cols
0 HR-Attrition-LR-GSCV Hyperparameter grid successfully saved to disk 09:23:28 10/31/19 Hora estándar romance `Model HR-Attrition-LR-GSCV loaded from cache. REQUEST: (10, 6) rows x cols
0 HR-Attrition-LR-GSCV Fecha feature str hashing 4 Cache updated. Models in cache: TABLE DESCRIPTION SENT TO QLIK: fields { RESPONSE: (1, 3) rows x cols
0 HR-Attrition-LR-GSCV Feature definitions successfully saved to model 09:23:29 10/31/19 Hora estándar romance LOG 4: `Model HR-Attrition-LR-GSCV loaded from cache. REQUEST: (20, 2) rows x cols
0 HR-Attrition-LR-GSCV 42737|Asadores|Orense|08103-Fernandez Lopez El... Features for hashing: `Model HR-Attrition-LR-GSCV loaded from cache. REQUEST: (20, 2) rows x cols
0 HR-Attrition-LR-GSCV 42737|Asadores|Orense|08103-Fernandez Lopez El... SSE LOG: `2019-10-31 09:07:49,130 - INFO - main : 61 - Logging enabled The above exception was the direct cause of the following exception: Traceback (most recent call last): The above exception was the direct cause of the following exception: Traceback (most recent call last):
Hi @SkylakeXx, you've uncovered a bug in the code. The bug has been fixed in the latest release, so one option is to update to v6.0. There is a sample app for regression with this release as well, but it uses a Keras deep learning model instead of a standard sklearn one. You'll find that app in the Usage section, but it won't work with v5.1. The other option is to use the fix I've added to release 5.1. You'll need to download release 5.1 again, copy Thanks for finding the bug!
You're welcome. Now I'm facing another error with the same regressor: the error implies that the inputs and features are continuous, which they should be, yet it fails with exactly that error. As always, I've attached the error logs: LOG 1: Model Name: HR-Attrition-LR-GSCV Execution arguments: {'overwrite': True, 'test_size': 0.33, 'cv': 3, 'time_series_split': 0, 'max_train_size': None, 'lags': None, 'lag_target': False, 'scale_target': False, 'make_stationary': None, 'random_state': 42, 'compress': 3, 'retain_data': False, 'calculate_importances': True, 'debug': True} Scaler: StandardScaler, missing: zeros, scale_hashed: True, scale_vectors: True Estimator: LinearRegression Cache updated. Models in cache: TABLE DESCRIPTION SENT TO QLIK: fields { RESPONSE: (1, 3) rows x cols
0 HR-Attrition-LR-GSCV Model successfully saved to disk 12:49:53 11/04/19 Hora estándar romance LOG 2: REQUEST: (1, 3) rows x cols
0 HR-Attrition-LR-GSCV n_jobs=1;2|list|int scoring=f1_micro|str Model Name: HR-Attrition-LR-GSCV, Estimator: LinearRegression Grid Search Arguments: {'scoring': 'f1_micro', 'refit': True} Parameter Grid: [{'n_jobs': [1, 2]}] Cache updated. Models in cache: TABLE DESCRIPTION SENT TO QLIK: fields { RESPONSE: (1, 3) rows x cols
0 HR-Attrition-LR-GSCV Hyperparameter grid successfully saved to disk 12:49:53 11/04/19 Hora estándar romance REQUEST: (10, 6) rows x cols
0 HR-Attrition-LR-GSCV Fecha feature str hashing 4 Cache updated. Models in cache: TABLE DESCRIPTION SENT TO QLIK: fields { RESPONSE: (1, 3) rows x cols
0 HR-Attrition-LR-GSCV Feature definitions successfully saved to model 12:49:53 11/04/19 Hora estándar romance LOG 4: REQUEST: (20, 2) rows x cols
0 HR-Attrition-LR-GSCV 42737|Asadores|Orense|08103-Fernandez Lopez El... Features for hashing: Fit hash_df shape:(13, 24) [5 rows x 24 columns] Fit scale_df shape:(13, 24) [5 rows x 24 columns] Transform hash_df shape:(13, 24) [5 rows x 24 columns] Transform scale_df shape:(13, 24) [5 rows x 24 columns] X_transform shape:(13, 24) [5 rows x 24 columns] LOG 5: REQUEST: (20, 2) rows x cols
0 HR-Attrition-LR-GSCV 42737|Asadores|Orense|08103-Fernandez Lopez El... Features for hashing: Fit hash_df shape:(13, 24) [5 rows x 24 columns] Fit scale_df shape:(13, 24) [5 rows x 24 columns] Transform hash_df shape:(13, 24) [5 rows x 24 columns] Transform scale_df shape:(13, 24) [5 rows x 24 columns] X_transform shape:(13, 24) [5 rows x 24 columns] LOG SSE:
I believe the problem is your grid search arguments for the model. You have set the scoring to f1_micro, which is only valid for classification. You should use one of the regression scoring parameters defined here. By the way, the SSE log you posted doesn't have line breaks, which makes it very hard to read! For further issues, please attach the log found under
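To illustrate the scoring point in plain scikit-learn (outside the SSE), here is a small grid search over a `LinearRegression` on synthetic data: scoring with `f1_micro` fails because the target and predictions are continuous, while a regression scorer such as `neg_mean_squared_error` works. The data and parameter grid are made up for the example. In the SSE argument syntax seen in the logs above (`scoring=f1_micro|str`), the equivalent change would presumably be `scoring=neg_mean_squared_error|str`, but that mapping is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

# Synthetic regression data with a continuous target.
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 4))
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(scale=0.2, size=40)

# Illustrative grid; the grid in this thread only varied n_jobs.
param_grid = {"fit_intercept": [True, False]}


def run_search(scoring):
    """Fit a grid search, re-raising immediately if scoring fails."""
    search = GridSearchCV(
        LinearRegression(), param_grid, scoring=scoring, cv=3, error_score="raise"
    )
    return search.fit(X, y)


# f1_micro is a classification metric, so it rejects continuous targets.
try:
    run_search("f1_micro")
    f1_micro_failed = False
except ValueError as exc:
    f1_micro_failed = True
    print("f1_micro:", exc)

# A regression scorer works; neg_mean_squared_error is <= 0 (higher is better).
search = run_search("neg_mean_squared_error")
print(search.best_params_, search.best_score_)
```

Scikit-learn negates error metrics such as MSE so that every scorer follows the "higher is better" convention that grid search maximizes; `r2` is another common regression choice.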
Fix for the final estimator being added to the wrong step in the pipeline when using grid search
Ah, this was a bit tricky to comprehend. After a grid search, the final estimator needs to be inserted at the end of the pipeline, but there is an issue in the code causing it to be inserted before the final step of the pipeline. Please extract
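The SSE's pipeline code is not shown in this thread, so the following is only a hypothetical sketch of how this kind of bug can arise. In a scikit-learn `Pipeline`, `steps` is a list of `(name, step)` pairs and the fitted estimator from a grid search must be the last entry. Python's `list.insert(-1, item)` places the item *before* the current last element, whereas `append` places it at the end. The step names and the single `StandardScaler` preprocessing step are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=30)

# Hypothetical preprocessing steps that should precede the estimator.
preprocessing = [("scaler", StandardScaler())]

# Grid search over the estimator alone, on already-scaled data.
search = GridSearchCV(
    LinearRegression(), {"fit_intercept": [True, False]},
    scoring="neg_mean_squared_error", cv=3,
)
search.fit(StandardScaler().fit_transform(X), y)
best = search.best_estimator_

# Buggy: insert(-1, ...) lands BEFORE the current last step (the scaler).
buggy_steps = list(preprocessing)
buggy_steps.insert(-1, ("estimator", best))

# Fixed: append makes the estimator the final step.
fixed_steps = list(preprocessing)
fixed_steps.append(("estimator", best))

print([name for name, _ in buggy_steps])  # ['estimator', 'scaler']
print([name for name, _ in fixed_steps])  # ['scaler', 'estimator']

# Only the fixed order is valid: intermediate steps must be transformers,
# and LinearRegression has no transform method.
pipe = Pipeline(fixed_steps).fit(X, y)
try:
    Pipeline(buggy_steps).fit(X, y)
    buggy_rejected = False
except TypeError:
    buggy_rejected = True
```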
That's done. Here is the new error: SKLearn Log 1.txt
@SkylakeXx, this looks like a bug in the Skater package, which the SSE uses to calculate feature importances. Looking at Skater's code, I see that it always looks at the first 10 rows of data for each column and decides whether the column is numeric based on those samples. In your case, I think you have a column where the first 10 rows look numeric but the column actually contains strings. Note that the data is passed to Skater in its original form, without transformations like one-hot encoding and hashing. You have two options:
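The first-10-rows heuristic described above can be illustrated with pandas to help locate the offending column in the training data. This is not Skater's code: `looks_numeric_from_head` is a hypothetical stand-in for that check, and the data is made up. Comparing it against a check over the whole column flags columns that would be misclassified.

```python
import pandas as pd


def looks_numeric_from_head(series, n=10):
    """Hypothetical heuristic: judge a column from its first n values only."""
    return bool(pd.to_numeric(series.head(n), errors="coerce").notna().all())


def is_fully_numeric(series):
    """Check every non-missing value in the column."""
    values = series.dropna()
    return bool(pd.to_numeric(values, errors="coerce").notna().all())


def misleading_columns(df, n=10):
    """Columns that pass the head-only check but contain non-numeric strings."""
    return [
        col for col in df.columns
        if looks_numeric_from_head(df[col], n) and not is_fully_numeric(df[col])
    ]


# Hypothetical data: 'code' looks numeric for 10 rows, then turns into strings.
df = pd.DataFrame({
    "code": [str(i) for i in range(10)] + ["A-12", "B-7"],
    "amount": [float(i) for i in range(12)],
})
print(misleading_columns(df))  # ['code']
```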
All working again. I'm closing the issue; thanks for everything.
Hi Nabeel, I have been having difficulties using the regressor. I have used my own data with the classifiers, but I have no clue how to proceed from here because I can't identify the error (I understand the message, but the same setup works well with any classifier, which doesn't make sense). Here are the details:
I'm using 9 features + 1 target. The regressor is LR, but it crashes at the same point with any of the other choices.
ERROR - Exception iterating responses: 9 columns passed, passed data had 10 columns
Traceback (most recent call last):
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\pandas\core\internals\construction.py", line 500, in _list_to_arrays
content, columns, dtype=dtype, coerce_float=coerce_float
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\pandas\core\internals\construction.py", line 583, in _convert_object_array
"{con} columns".format(col=len(columns), con=len(content))
AssertionError: 9 columns passed, passed data had 10 columns