
Regressor doesn't work properly #58

Closed
SkylakeXx opened this issue Oct 24, 2019 · 14 comments

@SkylakeXx

Hi Nabeel, I have been experiencing difficulties using the regressor. I have used my own data with the classifiers, but I have no clue from this point since I'm not able to work out what the error is here (I understand the message, but the same data works well with any classifier, which doesn't make any sense). I have left the code below:

I'm using 9 features + 1 target; the regressor is LR, but it crashes at the same point with any of the other choices.

ERROR - Exception iterating responses: 9 columns passed, passed data had 10 columns
Traceback (most recent call last):
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\pandas\core\internals\construction.py", line 500, in _list_to_arrays
content, columns, dtype=dtype, coerce_float=coerce_float
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\pandas\core\internals\construction.py", line 583, in _convert_object_array
"{con} columns".format(col=len(columns), con=len(content))
AssertionError: 9 columns passed, passed data had 10 columns

@nabeel-oz
Owner

Hi @SkylakeXx , from the error it sounds like you have 9 features in your feature definitions, but have passed 10 columns in the training data. Can you double check your feature definitions and the training data being passed to the SSE? If you can paste your Qlik load script here I can take a look.

Also, this error might be misleading. Can you paste the full stack trace of the error here from the SSE logs? There might be multiple exceptions and this may not be the relevant one. The actual error may have occurred before the last SSE function call so make sure you scroll up to the first exception message.

@priyanka181088

priyanka181088 commented Oct 25, 2019

Hi Nabeel,
Thank you so much for the wonderful SSE.
We are trying to implement the PyTools functions in our application, but when we apply the forecasting formula to our own data the chart just keeps calculating and leads us nowhere, and sometimes it gives us an unknown error.

[screenshot: pytools error]

@nabeel-oz
Owner

@priyanka181088 , can you open a new issue please, since this is unrelated to SkylakeXx's problem.

Also, I'll need to see the expression that you have used together with the logs from the SSE terminal. You can add debug=true to your expression's arguments to get a detailed log.
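
For example, a basic forecasting expression with debug logging enabled could look like the following (hypothetical field names; the argument order follows the Prophet signature listed in the SSE capabilities log later in this thread):

`PyTools.Prophet([Month], Sum(Sales), 'debug=true')`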

@SkylakeXx
Author

Here is the log in debug mode; it includes the definitions and scripts I'm using.

`SET ThousandSep=',';
SET DecimalSep='.';
SET MoneyThousandSep=',';
SET MoneyDecimalSep='.';
SET MoneyFormat='$#,##0.00;-$#,##0.00';
SET TimeFormat='h:mm:ss TT';
SET DateFormat='M/D/YYYY';
SET TimestampFormat='M/D/YYYY h:mm:ss[.fff] TT';
SET FirstWeekDay=6;
SET BrokenWeeks=1;
SET ReferenceDay=0;
SET FirstMonthOfYear=1;
SET CollationLocale='en-US';
SET CreateSearchIndexOnReload=1;
SET MonthNames='Jan;Feb;Mar;Apr;May;Jun;Jul;Aug;Sep;Oct;Nov;Dec';
SET LongMonthNames='January;February;March;April;May;June;July;August;September;October;November;December';
SET DayNames='Mon;Tue;Wed;Thu;Fri;Sat;Sun';
SET LongDayNames='Monday;Tuesday;Wednesday;Thursday;Friday;Saturday;Sunday';
SET NumericalAbbreviation='3:k;6:M;9:G;12:T;15:P;18:E;21:Z;24:Y;-3:m;-6:µ;-9:n;-12:p;-15:f;-18:a;-21:z;-24:y';

[Estimators]:
LOAD
[Estimators.Model Name] AS [Model Name],
[Estimator];
LOAD * INLINE
[
Estimators.Model Name,Estimator
HR-Attrition-LR-GSCV,LinearRegression
](delimiter is ',');

[Scaler]:
LOAD
Scaler,
[Null Value Handling],
[Scale Hashed Features],
[Hyperparameters] AS [Scaler.Hyperparameters];
LOAD * INLINE
[
Scaler;Null Value Handling;Scale Hashed Features;Hyperparameters
StandardScaler;zeros;true;with_mean=True|bool, with_std=True|bool
](delimiter is ';');

[Feature Definitions]:
load * Inline [
Name, Variable_Type, Data_Type, Feature_Strategy, Hash_Features
"Ventas", target, float, "scaling"
"Fecha", feature, str, "hashing", 4
"Tipo Cliente", feature, str, "one hot enconding"
"Zona", feature, str, "one hot enconding"
"Cliente", feature, str, "hashing", 4
"R.Produccion", feature, str, "one hot enconding"
"Fam_desc", feature, str, "hashing", 4
"Articulo", feature, str, "hashing", 4
"Agente_Pedido", feature, str, "hashing", 4
"Agente Cliente", feature, str, "hashing", 4
];

[Train-Test]:
//sample 0.03
First 20
LOAD
Fecha,
"Tipo Cliente",
Zona,
Cliente,
R.Produccion,
Fam_desc,
Articulo,
Agente_Pedido,
"Agente Cliente",
Ventas
FROM [lib://DATOS/Datos.xlsx]
(ooxml, embedded labels, table is Sheet1)
;

[Optimization]:
LOAD
[Optimization.Model Name] AS [Model Name],
[Hyperparameters];
LOAD * INLINE
[
Optimization.Model Name:Hyperparameters
HR-Attrition-LR-GSCV:n_jobs=1|list|int
](delimiter is ':');

// Set up a variable for the scaler parameters
LET vScalerArgs = 'scaler=' & peek('Scaler', 0, 'Scaler') & ',' &
'missing=' & peek('Null Value Handling', 0, 'Scaler') & ',' &
'scale_hashed=' & peek('Scale Hashed Features', 0, 'Scaler') & ',' &
peek('Scaler.Hyperparameters', 0, 'Scaler');

// Set up a variable for execution parameters
LET vExecutionArgs = 'overwrite=true,cv=5,calculate_importances=true, debug=true';
//LET vExecutionArgs = 'overwrite=true,cv=3,calculate_importances=true';

// Set up a variable for grid search parameters
LET vGridSearch = 'scoring=f1_micro|str';

LET i = 0;

// Create a model for each estimator
For Each vModel in FieldValueList('Model Name')

// Set up a variable for this estimator's parameters
LET vEstimatorArgs = 'estimator=' & peek('Estimator', $(i), 'Estimators') & ',' & peek('Hyperparameters', $(i), 'Estimators');

// Set up a temporary table for the model parameters
[MODEL_INIT]:
LOAD
    [Model Name] as Model_Name,
    '$(vEstimatorArgs)' as EstimatorArgs,
    '$(vScalerArgs)' as ScalerArgs,
    '$(vExecutionArgs)' as ExecutionArgs    
RESIDENT Estimators
WHERE [Model Name] = '$(vModel)';

// Use the LOAD...EXTENSION syntax to call the Setup function
[Result-Setup]:
LOAD
    model_name,
    result,
    timestamp
EXTENSION PyTools.sklearn_Setup(MODEL_INIT{Model_Name, EstimatorArgs, ScalerArgs, ExecutionArgs});

[PARAM_GRID]:
LOAD
	[Model Name] as Model_Name,
    [Hyperparameters],
    '$(vGridSearch)' as GridSearchArgs  
RESIDENT Optimization
WHERE [Model Name] = '$(vModel)';

// Use the LOAD...EXTENSION syntax to call the Setup function
[Result-Setup]:
LOAD
    model_name,
    result,
    timestamp
EXTENSION PyTools.sklearn_Set_Param_Grid(PARAM_GRID{Model_Name, Hyperparameters, GridSearchArgs});

[FEATURES]:
LOAD
	'$(vModel)' as Model_Name,
    Name,
    Variable_Type,
    Data_Type,
    Feature_Strategy,
    Hash_Features
RESIDENT [Feature Definitions];

// Use the LOAD...EXTENSION syntax to call the Set_Features function
[Result-Setup]:
LOAD
    model_name,
    result,
    timestamp
EXTENSION PyTools.sklearn_Set_Features(FEATURES{Model_Name, Name, Variable_Type, Data_Type, Feature_Strategy, Hash_Features});

Drop table MODEL_INIT, [PARAM_GRID], FEATURES;   

LET i = $(i) + 1;

Next vModel;

// Set up a temporary table for the training and test dataset

[TEMP_TRAIN_TEST]:
LOAD
Fecha & '|' &
"Tipo Cliente" & '|' &
Zona & '|' &
Cliente & '|' &
R.Produccion & '|' &
Fam_desc & '|' &
Articulo & '|' &
Agente_Pedido & '|' &
"Agente Cliente" & '|' &
Ventas AS N_Features
RESIDENT [Train-Test];

LET i = 0;

// Train and Test each model
For Each vModel in FieldValueList('Model Name')

[TEMP_SAMPLES]:
LOAD
	'$(vModel)' as Model_Name,
    N_Features
RESIDENT [TEMP_TRAIN_TEST];

// Use the LOAD...EXTENSION syntax to call the Fit function
[Result-Fit]:
LOAD
    model_name,
    result as fit_result, 
    time_stamp as fit_timestamp, 
    score_result, 
    score
EXTENSION PyTools.sklearn_Fit(TEMP_SAMPLES{Model_Name, N_Features});

// Use the LOAD...EXTENSION syntax to call the Get_Metrics function
[Result-Metrics]:
LOAD
    model_name,
    class,
    accuracy,
    precision,
    precision_std,
    recall,
    recall_std,
    fscore,
    fscore_std
EXTENSION PyTools.sklearn_Get_Metrics(TEMP_SAMPLES{Model_Name});

// Use the LOAD...EXTENSION syntax to call the Get_Confusion_Matrix function
[Result-ConfusionMatrix]:
LOAD
    model_name,
    true_label,
    pred_label,
    count
EXTENSION PyTools.sklearn_Get_Confusion_Matrix(TEMP_SAMPLES{Model_Name});

// Use the LOAD...EXTENSION syntax to call the Get_Best_Params function
[Best-Params]:
LOAD
	model_name as [Model Name],
    best_params
EXTENSION PyTools.sklearn_Get_Best_Params(TEMP_SAMPLES{Model_Name});



// Use the LOAD...EXTENSION syntax to call the Explain_Importances function
[Result-Importances]:
LOAD
    model_name,
    feature_name,
    importance
EXTENSION PyTools.sklearn_Explain_Importances(TEMP_SAMPLES{Model_Name});

Drop table TEMP_SAMPLES;   

LET i = $(i) + 1;

Next vModel

Drop table TEMP_TRAIN_TEST; `

@nabeel-oz
Owner

Hi @SkylakeXx ,

Did you forget to attach the SSE log? I'm referring to the one found under the ../qlik-py-tools/qlik-py-env/core/logs directory.

I did notice one problem in your script; Ventas is the first field in your feature definitions but the last field in your training data. The order of the feature definitions should match what you're sending in the training data as the SSE uses these definitions to interpret the data correctly.
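
For instance, the same feature definitions table reordered so that Ventas comes last, matching the field order of the Train-Test load (only the row order changes from the script above):

`[Feature Definitions]:
load * Inline [
Name, Variable_Type, Data_Type, Feature_Strategy, Hash_Features
"Fecha", feature, str, "hashing", 4
"Tipo Cliente", feature, str, "one hot enconding"
"Zona", feature, str, "one hot enconding"
"Cliente", feature, str, "hashing", 4
"R.Produccion", feature, str, "one hot enconding"
"Fam_desc", feature, str, "hashing", 4
"Articulo", feature, str, "hashing", 4
"Agente_Pedido", feature, str, "hashing", 4
"Agente Cliente", feature, str, "hashing", 4
"Ventas", target, float, "scaling"
]; `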

Cheers,
Nabeel

@SkylakeXx
Author

Thanks Nabeel, I changed the order. This time I also attach all the logs generated by the LR run, including the SSE log.

Also, I don't really need this particular exercise; if you have posted any example using a regressor I can manage from there.

LOG1:

`SKLearnForQlik Log: Thu Oct 31 09:23:28 2019

Model Name: HR-Attrition-LR-GSCV

Execution arguments: {'overwrite': True, 'test_size': 0.33, 'random_state': 42, 'compress': 3, 'retain_data': False, 'debug': True}

Scaler: StandardScaler, missing: zeros, scale_hashed: True, scale_vectors: True
Scaler kwargs: {'with_mean': True, 'with_std': True}

Estimator: LinearRegression
Estimator kwargs: {}

Cache updated. Models in cache:
['HR-Attrition-LR-GSCV']

TABLE DESCRIPTION SENT TO QLIK:

fields {
name: "model_name"
}
fields {
name: "result"
}
fields {
name: "timestamp"
}
name: "SSE-Response"
numberOfRows: 1

RESPONSE: (1, 3) rows x cols
Sample Data:

         model_name                            result                               time_stamp

0 HR-Attrition-LR-GSCV Model successfully saved to disk 09:23:28 10/31/19 Hora estándar romance
...
model_name result time_stamp
0 HR-Attrition-LR-GSCV Model successfully saved to disk 09:23:28 10/31/19 Hora estándar romance`

LOG 2:

`Model HR-Attrition-LR-GSCV loaded from cache.

REQUEST: (1, 3) rows x cols
Sample Data:

         model_name     estimator_args      grid_search_args

0 HR-Attrition-LR-GSCV n_jobs=1|list|int scoring=f1_micro|str
...
model_name estimator_args grid_search_args
0 HR-Attrition-LR-GSCV n_jobs=1|list|int scoring=f1_micro|str

Model Name: HR-Attrition-LR-GSCV, Estimator: LinearRegression

Grid Search Arguments: {'scoring': 'f1_micro', 'refit': True}

Parameter Grid: [{'n_jobs': [1]}]

Cache updated. Models in cache:
['HR-Attrition-LR-GSCV']

TABLE DESCRIPTION SENT TO QLIK:

fields {
name: "model_name"
}
fields {
name: "result"
}
fields {
name: "timestamp"
}
name: "SSE-Response"
numberOfRows: 1

RESPONSE: (1, 3) rows x cols
Sample Data:

         model_name                                          result                               time_stamp

0 HR-Attrition-LR-GSCV Hyperparameter grid successfully saved to disk 09:23:28 10/31/19 Hora estándar romance
...
model_name result time_stamp
0 HR-Attrition-LR-GSCV Hyperparameter grid successfully saved to disk 09:23:28 10/31/19 Hora estándar romance
`
LOG 3:

`Model HR-Attrition-LR-GSCV loaded from cache.

REQUEST: (10, 6) rows x cols
Sample Data:

         model_name          name variable_type data_type   feature_strategy strategy_args

0 HR-Attrition-LR-GSCV Fecha feature str hashing 4
1 HR-Attrition-LR-GSCV Tipo Cliente feature str one hot enconding
2 HR-Attrition-LR-GSCV Zona feature str one hot enconding
3 HR-Attrition-LR-GSCV Cliente feature str hashing 4
4 HR-Attrition-LR-GSCV R.Produccion feature str one hot enconding
...
model_name name variable_type data_type feature_strategy strategy_args
5 HR-Attrition-LR-GSCV Fam_desc feature str hashing 4
6 HR-Attrition-LR-GSCV Articulo feature str hashing 4
7 HR-Attrition-LR-GSCV Agente_Pedido feature str hashing 4
8 HR-Attrition-LR-GSCV Agente Cliente feature str hashing 4
9 HR-Attrition-LR-GSCV Ventas target float scaling

Cache updated. Models in cache:
['HR-Attrition-LR-GSCV']

TABLE DESCRIPTION SENT TO QLIK:

fields {
name: "model_name"
}
fields {
name: "result"
}
fields {
name: "timestamp"
}
name: "SSE-Response"
numberOfRows: 1

RESPONSE: (1, 3) rows x cols
Sample Data:

         model_name                                           result                               time_stamp

0 HR-Attrition-LR-GSCV Feature definitions successfully saved to model 09:23:29 10/31/19 Hora estándar romance
...
model_name result time_stamp
0 HR-Attrition-LR-GSCV Feature definitions successfully saved to model 09:23:29 10/31/19 Hora estándar romance
`

LOG 4:

`Model HR-Attrition-LR-GSCV loaded from cache.

REQUEST: (20, 2) rows x cols
Sample Data:

         model_name                                         n_features

0 HR-Attrition-LR-GSCV 42737|Asadores|Orense|08103-Fernandez Lopez El...
1 HR-Attrition-LR-GSCV 42737|Asadores|Orense|08103-Fernandez Lopez El...
2 HR-Attrition-LR-GSCV 42737|Asadores|Orense|41859-Xantar Com Para Ll...
3 HR-Attrition-LR-GSCV 42737|Asadores|Orense|41859-Xantar Com Para Ll...
4 HR-Attrition-LR-GSCV 42737|Autoservicio Tradicional|Asturias|02333-...
...
model_name n_features
15 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06170-...
16 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06170-...
17 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06170-...
18 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06461-...
19 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06937-...

Features for hashing:
model_name name variable_type data_type feature_strategy strategy_args
name
Fecha HR-Attrition-LR-GSCV Fecha feature str hashing 4
Cliente HR-Attrition-LR-GSCV Cliente feature str hashing 4
Fam_desc HR-Attrition-LR-GSCV Fam_desc feature str hashing 4
Articulo HR-Attrition-LR-GSCV Articulo feature str hashing 4
Agente_Pedido HR-Attrition-LR-GSCV Agente_Pedido feature str hashing 4
Agente Cliente HR-Attrition-LR-GSCV Agente Cliente feature str hashing 4
`
LOG 5:

`Model HR-Attrition-LR-GSCV loaded from cache.

REQUEST: (20, 2) rows x cols
Sample Data:

         model_name                                         n_features

0 HR-Attrition-LR-GSCV 42737|Asadores|Orense|08103-Fernandez Lopez El...
1 HR-Attrition-LR-GSCV 42737|Asadores|Orense|08103-Fernandez Lopez El...
2 HR-Attrition-LR-GSCV 42737|Asadores|Orense|41859-Xantar Com Para Ll...
3 HR-Attrition-LR-GSCV 42737|Asadores|Orense|41859-Xantar Com Para Ll...
4 HR-Attrition-LR-GSCV 42737|Autoservicio Tradicional|Asturias|02333-...
...
model_name n_features
15 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06170-...
16 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06170-...
17 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06170-...
18 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06461-...
19 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06937-...
`

SSE LOG:

`2019-10-31 09:07:49,130 - INFO - main : 61 - Logging enabled
2019-10-31 09:07:49,258 - INFO - main : 787 - *** Running server in insecure mode on port: 50055 ***
2019-10-31 09:12:47,079 - INFO - main : 716 - GetCapabilities
2019-10-31 09:12:47,118 - INFO - main : 740 - Adding to capabilities: Cluster(['a_dimension', 'b_features', 'c_other_args'])
2019-10-31 09:12:47,119 - INFO - main : 740 - Adding to capabilities: Cluster_by_Dim(['a_dimension1', 'b_dimension2', 'c_measure', 'd_other_args'])
2019-10-31 09:12:47,119 - INFO - main : 740 - Adding to capabilities: Cluster_Geo(['a_dimension1', 'b_latitude', 'c_longitude', 'd_other_args'])
2019-10-31 09:12:47,120 - INFO - main : 740 - Adding to capabilities: Correlation(['a_series1', 'b_series2', 'c_corr_type'])
2019-10-31 09:12:47,121 - INFO - main : 740 - Adding to capabilities: Pearson(['a_series1', 'b_series2'])
2019-10-31 09:12:47,121 - INFO - main : 740 - Adding to capabilities: Prophet(['a_date', 'b_value', 'c_other_args'])
2019-10-31 09:12:47,121 - INFO - main : 740 - Adding to capabilities: Prophet_Basic(['a_date', 'b_value'])
2019-10-31 09:12:47,122 - INFO - main : 740 - Adding to capabilities: Prophet_Holidays(['a_date', 'b_value', 'c_holidays', 'd_other_args'])
2019-10-31 09:12:47,122 - INFO - main : 740 - Adding to capabilities: Prophet_Seasonality(['a_season', 'b_time_series', 'c_holidays', 'd_other_args'])
2019-10-31 09:12:47,123 - INFO - main : 740 - Adding to capabilities: sklearn_Setup(['a_model_name', 'b_estimator_args', 'c_scaler_args', 'd_execution_args'])
2019-10-31 09:12:47,124 - INFO - main : 740 - Adding to capabilities: sklearn_Set_Features(['a_model_name', 'b_feature_name', 'c_var_type', 'd_data_type', 'e_strategy', 'f_strategy_args'])
2019-10-31 09:12:47,126 - INFO - main : 740 - Adding to capabilities: sklearn_Get_Features(['a_model_name'])
2019-10-31 09:12:47,126 - INFO - main : 740 - Adding to capabilities: sklearn_Fit(['a_model_name', 'n_features'])
2019-10-31 09:12:47,126 - INFO - main : 740 - Adding to capabilities: sklearn_Partial_Fit(['a_model_name', 'n_features'])
2019-10-31 09:12:47,127 - INFO - main : 740 - Adding to capabilities: sklearn_Predict(['a_model_name', 'n_features'])
2019-10-31 09:12:47,127 - INFO - main : 740 - Adding to capabilities: sklearn_Bulk_Predict(['a_model_name', 'b_key', 'n_features'])
2019-10-31 09:12:47,127 - INFO - main : 740 - Adding to capabilities: sklearn_Predict_Proba(['a_model_name', 'n_features'])
2019-10-31 09:12:47,130 - INFO - main : 740 - Adding to capabilities: sklearn_Bulk_Predict_Proba(['a_model_name', 'b_key', 'n_features'])
2019-10-31 09:12:47,131 - INFO - main : 740 - Adding to capabilities: sklearn_Get_Metrics(['a_model_name'])
2019-10-31 09:12:47,131 - INFO - main : 740 - Adding to capabilities: sklearn_List_Models(['a_search_pattern'])
2019-10-31 09:12:47,131 - INFO - main : 740 - Adding to capabilities: sklearn_Get_Features_Expression(['a_model_name'])
2019-10-31 09:12:47,132 - INFO - main : 740 - Adding to capabilities: sklearn_Setup_Adv(['a_model_name', 'b_estimator_args', 'c_scaler_args', 'd_metric_args', 'e_dim_reduction_args', 'f_execution_args'])
2019-10-31 09:12:47,133 - INFO - main : 740 - Adding to capabilities: sklearn_Calculate_Metrics(['a_model_name', 'n_features'])
2019-10-31 09:12:47,133 - INFO - main : 740 - Adding to capabilities: sklearn_Get_Confusion_Matrix(['a_model_name'])
2019-10-31 09:12:47,134 - INFO - main : 740 - Adding to capabilities: sklearn_Set_Param_Grid(['a_model_name', 'b_estimator_args', 'c_grid_search_args'])
2019-10-31 09:12:47,134 - INFO - main : 740 - Adding to capabilities: sklearn_Get_Best_Params(['a_model_name'])
2019-10-31 09:12:47,135 - INFO - main : 740 - Adding to capabilities: sklearn_Fit_Transform(['a_model_name', 'b_key', 'n_features'])
2019-10-31 09:12:47,136 - INFO - main : 740 - Adding to capabilities: sklearn_Fit_Predict(['a_model_name', 'n_features'])
2019-10-31 09:12:47,136 - INFO - main : 740 - Adding to capabilities: sklearn_Bulk_Fit_Predict(['a_model_name', 'b_key', 'n_features'])
2019-10-31 09:12:47,137 - INFO - main : 740 - Adding to capabilities: sklearn_Explain_Importances(['a_model_name'])
2019-10-31 09:12:47,137 - INFO - main : 740 - Adding to capabilities: spaCy_Get_Entities(['a_key', 'b_text', 'c_other_args'])
2019-10-31 09:12:47,138 - INFO - main : 740 - Adding to capabilities: spaCy_Get_Entities_From_Model(['a_key', 'b_text', 'c_model_name', 'd_other_args'])
2019-10-31 09:12:47,139 - INFO - main : 740 - Adding to capabilities: spaCy_Retrain(['a_text', 'b_entity', 'c_entity_type', 'd_model_name', 'e_other_args'])
2019-10-31 09:12:47,143 - INFO - main : 740 - Adding to capabilities: Association_Rules(['a_group', 'b_item', 'c_other_args'])
2019-10-31 09:16:58,345 - INFO - main : 753 - ExecuteFunction (functionId: 9)
2019-10-31 09:16:58,513 - INFO - main : 753 - ExecuteFunction (functionId: 24)
2019-10-31 09:16:58,551 - INFO - main : 753 - ExecuteFunction (functionId: 10)
2019-10-31 09:16:58,597 - INFO - main : 753 - ExecuteFunction (functionId: 12)
2019-10-31 09:16:58,713 - ERROR - _server : 463 - Exception iterating responses: local variable 'y_train' referenced before assignment
Traceback (most recent call last):
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\grpc_server.py", line 453, in _take_response_from_response_iterator
return next(response_iterator), True
File "main.py", line 482, in _sklearn
response = model.fit()
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\core_sklearn.py", line 456, in fit
self._cross_validate()
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\core_sklearn.py", line 1330, in _cross_validate
scores = cross_validate(self.model.pipe, self.X_train, y_train, scoring=scoring, cv=self.model.cv, fit_params=fit_params, return_train_score=False)
UnboundLocalError: local variable 'y_train' referenced before assignment
2019-10-31 09:16:58,745 - INFO - main : 753 - ExecuteFunction (functionId: 12)
2019-10-31 09:16:58,756 - ERROR - _server : 463 - Exception iterating responses: 9 columns passed, passed data had 10 columns
Traceback (most recent call last):
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\pandas\core\internals\construction.py", line 500, in _list_to_arrays
content, columns, dtype=dtype, coerce_float=coerce_float
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\pandas\core\internals\construction.py", line 583, in _convert_object_array
"{con} columns".format(col=len(columns), con=len(content))
AssertionError: 9 columns passed, passed data had 10 columns

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\grpc_server.py", line 453, in _take_response_from_response_iterator
return next(response_iterator), True
File "main.py", line 482, in _sklearn
response = model.fit()
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\core_sklearn.py", line 391, in fit
train_test_df, target_df = self._get_model_and_data()
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\core_sklearn.py", line 1262, in _get_model_and_data
index=self.request_df.index)
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\pandas\core\frame.py", line 450, in init
arrays, columns = to_arrays(data, columns, dtype=dtype)
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\pandas\core\internals\construction.py", line 464, in to_arrays
return _list_to_arrays(data, columns, coerce_float=coerce_float, dtype=dtype)
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\pandas\core\internals\construction.py", line 503, in _list_to_arrays
raise ValueError(e) from e
ValueError: 9 columns passed, passed data had 10 columns
2019-10-31 09:23:28,899 - INFO - main : 753 - ExecuteFunction (functionId: 9)
2019-10-31 09:23:28,934 - INFO - main : 753 - ExecuteFunction (functionId: 24)
2019-10-31 09:23:29,085 - INFO - main : 753 - ExecuteFunction (functionId: 10)
2019-10-31 09:23:29,736 - INFO - main : 753 - ExecuteFunction (functionId: 12)
2019-10-31 09:23:29,814 - ERROR - _server : 463 - Exception iterating responses: local variable 'y_train' referenced before assignment
Traceback (most recent call last):
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\grpc_server.py", line 453, in _take_response_from_response_iterator
return next(response_iterator), True
File "main.py", line 482, in _sklearn
response = model.fit()
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\core_sklearn.py", line 456, in fit
self._cross_validate()
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\core_sklearn.py", line 1330, in _cross_validate
scores = cross_validate(self.model.pipe, self.X_train, y_train, scoring=scoring, cv=self.model.cv, fit_params=fit_params, return_train_score=False)
UnboundLocalError: local variable 'y_train' referenced before assignment
2019-10-31 09:23:29,817 - INFO - main : 753 - ExecuteFunction (functionId: 12)
2019-10-31 09:23:29,835 - ERROR - _server : 463 - Exception iterating responses: 9 columns passed, passed data had 10 columns
Traceback (most recent call last):
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\pandas\core\internals\construction.py", line 500, in _list_to_arrays
content, columns, dtype=dtype, coerce_float=coerce_float
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\pandas\core\internals\construction.py", line 583, in _convert_object_array
"{con} columns".format(col=len(columns), con=len(content))
AssertionError: 9 columns passed, passed data had 10 columns

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\grpc_server.py", line 453, in _take_response_from_response_iterator
return next(response_iterator), True
File "main.py", line 482, in _sklearn
response = model.fit()
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\core_sklearn.py", line 391, in fit
train_test_df, target_df = self._get_model_and_data()
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\core_sklearn.py", line 1262, in _get_model_and_data
index=self.request_df.index)
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\pandas\core\frame.py", line 450, in init
arrays, columns = to_arrays(data, columns, dtype=dtype)
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\pandas\core\internals\construction.py", line 464, in to_arrays
return _list_to_arrays(data, columns, coerce_float=coerce_float, dtype=dtype)
File "D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\lib\site-packages\pandas\core\internals\construction.py", line 503, in _list_to_arrays
raise ValueError(e) from e
ValueError: 9 columns passed, passed data had 10 columns`

@nabeel-oz
Owner

Hi @SkylakeXx , you've uncovered a bug in the code.

The bug has been fixed in the latest release. So one option is to update to v.6.0. There is a sample app for regression with this release as well, but that uses a Keras deep learning model instead of a standard sklearn one. You'll find that app in the Usage section but it won't work with v.5.1.

The other option is to use the fix I've added to release 5.1. You'll need to download release 5.1 again, copy qlik-py-tools-5.1\core\_sklearn.py, and overwrite the file in the installed SSE under D:\EDISA\Python\qlik-py-tools-5.1\qlik-py-env\core\_sklearn.py.

Thanks for finding the bug!

@SkylakeXx SkylakeXx reopened this Nov 4, 2019
@SkylakeXx
Author

SkylakeXx commented Nov 4, 2019

You're welcome. Now I'm facing another error working with the same regressor: it should be implied that the inputs and features are continuous, yet it outputs exactly that error. I attach the error logs as always:

LOG 1:
`SKLearnForQlik Log: Mon Nov 4 12:49:53 2019

Model Name: HR-Attrition-LR-GSCV

Execution arguments: {'overwrite': True, 'test_size': 0.33, 'cv': 3, 'time_series_split': 0, 'max_train_size': None, 'lags': None, 'lag_target': False, 'scale_target': False, 'make_stationary': None, 'random_state': 42, 'compress': 3, 'retain_data': False, 'calculate_importances': True, 'debug': True}

Scaler: StandardScaler, missing: zeros, scale_hashed: True, scale_vectors: True
Scaler kwargs: {'with_mean': True, 'with_std': True}

Estimator: LinearRegression
Estimator kwargs: {}

Cache updated. Models in cache:
['HR-Attrition-LR-GSCV']

TABLE DESCRIPTION SENT TO QLIK:

fields {
name: "model_name"
}
fields {
name: "result"
}
fields {
name: "timestamp"
}
name: "SSE-Response"
numberOfRows: 1

RESPONSE: (1, 3) rows x cols
Sample Data:

         model_name                            result                               time_stamp

0 HR-Attrition-LR-GSCV Model successfully saved to disk 12:49:53 11/04/19 Hora estándar romance
...
model_name result time_stamp
0 HR-Attrition-LR-GSCV Model successfully saved to disk 12:49:53 11/04/19 Hora estándar romance
`

LOG 2:
`Model HR-Attrition-LR-GSCV loaded from cache.

REQUEST: (1, 3) rows x cols
Sample Data:

         model_name       estimator_args      grid_search_args

0 HR-Attrition-LR-GSCV n_jobs=1;2|list|int scoring=f1_micro|str
...
model_name estimator_args grid_search_args
0 HR-Attrition-LR-GSCV n_jobs=1;2|list|int scoring=f1_micro|str

Model Name: HR-Attrition-LR-GSCV, Estimator: LinearRegression

Grid Search Arguments: {'scoring': 'f1_micro', 'refit': True}

Parameter Grid: [{'n_jobs': [1, 2]}]

Cache updated. Models in cache:
['HR-Attrition-LR-GSCV']

TABLE DESCRIPTION SENT TO QLIK:

fields {
name: "model_name"
}
fields {
name: "result"
}
fields {
name: "timestamp"
}
name: "SSE-Response"
numberOfRows: 1

RESPONSE: (1, 3) rows x cols
Sample Data:

         model_name                                          result                               time_stamp

0 HR-Attrition-LR-GSCV Hyperparameter grid successfully saved to disk 12:49:53 11/04/19 Hora estándar romance
...
model_name result time_stamp
0 HR-Attrition-LR-GSCV Hyperparameter grid successfully saved to disk 12:49:53 11/04/19 Hora estándar romance`

LOG 3:

`Model HR-Attrition-LR-GSCV loaded from cache.

REQUEST: (10, 6) rows x cols
Sample Data:

         model_name          name variable_type data_type   feature_strategy strategy_args

0 HR-Attrition-LR-GSCV Fecha feature str hashing 4
1 HR-Attrition-LR-GSCV Tipo Cliente feature str one hot enconding
2 HR-Attrition-LR-GSCV Zona feature str one hot enconding
3 HR-Attrition-LR-GSCV Cliente feature str hashing 4
4 HR-Attrition-LR-GSCV R.Produccion feature str one hot enconding
...
model_name name variable_type data_type feature_strategy strategy_args
5 HR-Attrition-LR-GSCV Fam_desc feature str hashing 4
6 HR-Attrition-LR-GSCV Articulo feature str hashing 4
7 HR-Attrition-LR-GSCV Agente_Pedido feature str hashing 4
8 HR-Attrition-LR-GSCV Agente Cliente feature str hashing 4
9 HR-Attrition-LR-GSCV Ventas target float scaling

Cache updated. Models in cache:
['HR-Attrition-LR-GSCV']

TABLE DESCRIPTION SENT TO QLIK:

fields {
name: "model_name"
}
fields {
name: "result"
}
fields {
name: "timestamp"
}
name: "SSE-Response"
numberOfRows: 1

RESPONSE: (1, 3) rows x cols
Sample Data:

         model_name                                           result                               time_stamp

0 HR-Attrition-LR-GSCV Feature definitions successfully saved to model 12:49:53 11/04/19 Hora estándar romance
...
model_name result time_stamp
0 HR-Attrition-LR-GSCV Feature definitions successfully saved to model 12:49:53 11/04/19 Hora estándar romance
`

LOG 4:
`Model HR-Attrition-LR-GSCV loaded from cache.

REQUEST: (20, 2) rows x cols
Sample Data:

         model_name                                         n_features

0 HR-Attrition-LR-GSCV 42737|Asadores|Orense|08103-Fernandez Lopez El...
1 HR-Attrition-LR-GSCV 42737|Asadores|Orense|08103-Fernandez Lopez El...
2 HR-Attrition-LR-GSCV 42737|Asadores|Orense|41859-Xantar Com Para Ll...
3 HR-Attrition-LR-GSCV 42737|Asadores|Orense|41859-Xantar Com Para Ll...
4 HR-Attrition-LR-GSCV 42737|Autoservicio Tradicional|Asturias|02333-...
...
model_name n_features
15 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06170-...
16 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06170-...
17 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06170-...
18 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06461-...
19 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06937-...

Features for hashing:
model_name name variable_type data_type feature_strategy strategy_args
name
Fecha HR-Attrition-LR-GSCV Fecha feature str hashing 4
Cliente HR-Attrition-LR-GSCV Cliente feature str hashing 4
Fam_desc HR-Attrition-LR-GSCV Fam_desc feature str hashing 4
Articulo HR-Attrition-LR-GSCV Articulo feature str hashing 4
Agente_Pedido HR-Attrition-LR-GSCV Agente_Pedido feature str hashing 4
Agente Cliente HR-Attrition-LR-GSCV Agente Cliente feature str hashing 4

Fit hash_df shape:(13, 24)
Sample Data:
Fecha0 Fecha1 Fecha2 Fecha3 ... Agente Cliente0 Agente Cliente1 Agente Cliente2 Agente Cliente3
7 2.0 0.0 0.0 1.0 ... -1.0 -3.0 3.0 3.0
8 2.0 0.0 0.0 1.0 ... -1.0 -3.0 3.0 3.0
9 2.0 0.0 0.0 1.0 ... 4.0 -3.0 4.0 4.0
10 2.0 0.0 0.0 1.0 ... 7.0 -2.0 3.0 6.0
11 2.0 0.0 0.0 1.0 ... 7.0 -2.0 3.0 6.0

[5 rows x 24 columns]

Fit scale_df shape:(13, 24)
Sample Data:
Fecha0 Fecha1 Fecha2 Fecha3 ... Agente Cliente0 Agente Cliente1 Agente Cliente2 Agente Cliente3
7 2.0 0.0 0.0 1.0 ... -1.0 -3.0 3.0 3.0
8 2.0 0.0 0.0 1.0 ... -1.0 -3.0 3.0 3.0
9 2.0 0.0 0.0 1.0 ... 4.0 -3.0 4.0 4.0
10 2.0 0.0 0.0 1.0 ... 7.0 -2.0 3.0 6.0
11 2.0 0.0 0.0 1.0 ... 7.0 -2.0 3.0 6.0

[5 rows x 24 columns]

Transform hash_df shape:(13, 24)
Sample Data:
Fecha0 Fecha1 Fecha2 Fecha3 ... Agente Cliente0 Agente Cliente1 Agente Cliente2 Agente Cliente3
7 2.0 0.0 0.0 1.0 ... -1.0 -3.0 3.0 3.0
8 2.0 0.0 0.0 1.0 ... -1.0 -3.0 3.0 3.0
9 2.0 0.0 0.0 1.0 ... 4.0 -3.0 4.0 4.0
10 2.0 0.0 0.0 1.0 ... 7.0 -2.0 3.0 6.0
11 2.0 0.0 0.0 1.0 ... 7.0 -2.0 3.0 6.0

[5 rows x 24 columns]

Transform scale_df shape:(13, 24)
Sample Data:
Fecha0 Fecha1 Fecha2 Fecha3 ... Agente Cliente0 Agente Cliente1 Agente Cliente2 Agente Cliente3
7 0.0 0.0 0.0 0.0 ... -2.118945 -1.500000 0.00000 -1.721457
8 0.0 0.0 0.0 0.0 ... -2.118945 -1.500000 0.00000 -1.721457
9 0.0 0.0 0.0 0.0 ... -0.375509 -1.500000 1.47196 -0.892607
10 0.0 0.0 0.0 0.0 ... 0.670552 0.666667 0.00000 0.765092
11 0.0 0.0 0.0 0.0 ... 0.670552 0.666667 0.00000 0.765092

[5 rows x 24 columns]

X_transform shape:(13, 24)
Sample Data:
Fecha0 Fecha1 Fecha2 Fecha3 ... Agente Cliente0 Agente Cliente1 Agente Cliente2 Agente Cliente3
7 0.0 0.0 0.0 0.0 ... -2.118945 -1.500000 0.00000 -1.721457
8 0.0 0.0 0.0 0.0 ... -2.118945 -1.500000 0.00000 -1.721457
9 0.0 0.0 0.0 0.0 ... -0.375509 -1.500000 1.47196 -0.892607
10 0.0 0.0 0.0 0.0 ... 0.670552 0.666667 0.00000 0.765092
11 0.0 0.0 0.0 0.0 ... 0.670552 0.666667 0.00000 0.765092

[5 rows x 24 columns]
`

LOG 5:
`Model HR-Attrition-LR-GSCV loaded from cache.

REQUEST: (20, 2) rows x cols
Sample Data:

         model_name                                         n_features

0 HR-Attrition-LR-GSCV 42737|Asadores|Orense|08103-Fernandez Lopez El...
1 HR-Attrition-LR-GSCV 42737|Asadores|Orense|08103-Fernandez Lopez El...
2 HR-Attrition-LR-GSCV 42737|Asadores|Orense|41859-Xantar Com Para Ll...
3 HR-Attrition-LR-GSCV 42737|Asadores|Orense|41859-Xantar Com Para Ll...
4 HR-Attrition-LR-GSCV 42737|Autoservicio Tradicional|Asturias|02333-...
...
model_name n_features
15 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06170-...
16 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06170-...
17 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06170-...
18 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06461-...
19 HR-Attrition-LR-GSCV 42737|Bar-Cerveceria-Cafeteria|Asturias|06937-...

Features for hashing:
model_name name variable_type data_type feature_strategy strategy_args
name
Fecha HR-Attrition-LR-GSCV Fecha feature str hashing 4
Cliente HR-Attrition-LR-GSCV Cliente feature str hashing 4
Fam_desc HR-Attrition-LR-GSCV Fam_desc feature str hashing 4
Articulo HR-Attrition-LR-GSCV Articulo feature str hashing 4
Agente_Pedido HR-Attrition-LR-GSCV Agente_Pedido feature str hashing 4
Agente Cliente HR-Attrition-LR-GSCV Agente Cliente feature str hashing 4

Fit hash_df shape:(13, 24)
Sample Data:
Fecha0 Fecha1 Fecha2 Fecha3 ... Agente Cliente0 Agente Cliente1 Agente Cliente2 Agente Cliente3
7 2.0 0.0 0.0 1.0 ... -1.0 -3.0 3.0 3.0
8 2.0 0.0 0.0 1.0 ... -1.0 -3.0 3.0 3.0
9 2.0 0.0 0.0 1.0 ... 4.0 -3.0 4.0 4.0
10 2.0 0.0 0.0 1.0 ... 7.0 -2.0 3.0 6.0
11 2.0 0.0 0.0 1.0 ... 7.0 -2.0 3.0 6.0

[5 rows x 24 columns]

Fit scale_df shape:(13, 24)
Sample Data:
Fecha0 Fecha1 Fecha2 Fecha3 ... Agente Cliente0 Agente Cliente1 Agente Cliente2 Agente Cliente3
7 2.0 0.0 0.0 1.0 ... -1.0 -3.0 3.0 3.0
8 2.0 0.0 0.0 1.0 ... -1.0 -3.0 3.0 3.0
9 2.0 0.0 0.0 1.0 ... 4.0 -3.0 4.0 4.0
10 2.0 0.0 0.0 1.0 ... 7.0 -2.0 3.0 6.0
11 2.0 0.0 0.0 1.0 ... 7.0 -2.0 3.0 6.0

[5 rows x 24 columns]

Transform hash_df shape:(13, 24)
Sample Data:
Fecha0 Fecha1 Fecha2 Fecha3 ... Agente Cliente0 Agente Cliente1 Agente Cliente2 Agente Cliente3
7 2.0 0.0 0.0 1.0 ... -1.0 -3.0 3.0 3.0
8 2.0 0.0 0.0 1.0 ... -1.0 -3.0 3.0 3.0
9 2.0 0.0 0.0 1.0 ... 4.0 -3.0 4.0 4.0
10 2.0 0.0 0.0 1.0 ... 7.0 -2.0 3.0 6.0
11 2.0 0.0 0.0 1.0 ... 7.0 -2.0 3.0 6.0

[5 rows x 24 columns]

Transform scale_df shape:(13, 24)
Sample Data:
Fecha0 Fecha1 Fecha2 Fecha3 ... Agente Cliente0 Agente Cliente1 Agente Cliente2 Agente Cliente3
7 0.0 0.0 0.0 0.0 ... -2.118945 -1.500000 0.00000 -1.721457
8 0.0 0.0 0.0 0.0 ... -2.118945 -1.500000 0.00000 -1.721457
9 0.0 0.0 0.0 0.0 ... -0.375509 -1.500000 1.47196 -0.892607
10 0.0 0.0 0.0 0.0 ... 0.670552 0.666667 0.00000 0.765092
11 0.0 0.0 0.0 0.0 ... 0.670552 0.666667 0.00000 0.765092

[5 rows x 24 columns]

X_transform shape:(13, 24)
Sample Data:
Fecha0 Fecha1 Fecha2 Fecha3 ... Agente Cliente0 Agente Cliente1 Agente Cliente2 Agente Cliente3
7 0.0 0.0 0.0 0.0 ... -2.118945 -1.500000 0.00000 -1.721457
8 0.0 0.0 0.0 0.0 ... -2.118945 -1.500000 0.00000 -1.721457
9 0.0 0.0 0.0 0.0 ... -0.375509 -1.500000 1.47196 -0.892607
10 0.0 0.0 0.0 0.0 ... 0.670552 0.666667 0.00000 0.765092
11 0.0 0.0 0.0 0.0 ... 0.670552 0.666667 0.00000 0.765092

[5 rows x 24 columns]
`

LOG SSE:

2019-11-04 12:48:41,870 - INFO - __main__ : 62 - Logging enabled 2019-11-04 12:48:41,877 - INFO - __main__ : 867 - *** Running server in insecure mode on port: 50055 *** 2019-11-04 12:48:58,214 - INFO - __main__ : 794 - GetCapabilities 2019-11-04 12:48:58,215 - INFO - __main__ : 819 - Adding to capabilities: Cluster(['a_dimension', 'b_features', 'c_other_args']) 2019-11-04 12:48:58,216 - INFO - __main__ : 819 - Adding to capabilities: Cluster_by_Dim(['a_dimension1', 'b_dimension2', 'c_measure', 'd_other_args']) 2019-11-04 12:48:58,217 - INFO - __main__ : 819 - Adding to capabilities: Cluster_Geo(['a_dimension1', 'b_latitude', 'c_longitude', 'd_other_args']) 2019-11-04 12:48:58,217 - INFO - __main__ : 819 - Adding to capabilities: Correlation(['a_series1', 'b_series2', 'c_corr_type']) 2019-11-04 12:48:58,218 - INFO - __main__ : 819 - Adding to capabilities: Pearson(['a_series1', 'b_series2']) 2019-11-04 12:48:58,218 - INFO - __main__ : 819 - Adding to capabilities: Prophet(['a_date', 'b_value', 'c_other_args']) 2019-11-04 12:48:58,219 - INFO - __main__ : 819 - Adding to capabilities: Prophet_Basic(['a_date', 'b_value']) 2019-11-04 12:48:58,219 - INFO - __main__ : 819 - Adding to capabilities: Prophet_Holidays(['a_date', 'b_value', 'c_holidays', 'd_other_args']) 2019-11-04 12:48:58,219 - INFO - __main__ : 819 - Adding to capabilities: Prophet_Seasonality(['a_season', 'b_time_series', 'c_holidays', 'd_other_args']) 2019-11-04 12:48:58,220 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Setup(['a_model_name', 'b_estimator_args', 'c_scaler_args', 'd_execution_args']) 2019-11-04 12:48:58,220 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Set_Features(['a_model_name', 'b_feature_name', 'c_var_type', 'd_data_type', 'e_strategy', 'f_strategy_args']) 2019-11-04 12:48:58,221 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Get_Features(['a_model_name']) 2019-11-04 12:48:58,221 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Fit(['a_model_name', 'n_features']) 2019-11-04 12:48:58,221 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Partial_Fit(['a_model_name', 'n_features']) 2019-11-04 12:48:58,221 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Predict(['a_model_name', 'n_features']) 2019-11-04 12:48:58,222 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Bulk_Predict(['a_model_name', 'b_key', 'n_features']) 2019-11-04 12:48:58,222 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Predict_Proba(['a_model_name', 'n_features']) 2019-11-04 12:48:58,222 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Bulk_Predict_Proba(['a_model_name', 'b_key', 'n_features']) 2019-11-04 12:48:58,223 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Get_Metrics(['a_model_name']) 2019-11-04 12:48:58,225 - INFO - __main__ : 819 - Adding to capabilities: sklearn_List_Models(['a_search_pattern']) 2019-11-04 12:48:58,226 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Get_Features_Expression(['a_model_name']) 2019-11-04 12:48:58,227 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Setup_Adv(['a_model_name', 'b_estimator_args', 'c_scaler_args', 'd_metric_args', 'e_dim_reduction_args', 'f_execution_args']) 2019-11-04 12:48:58,228 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Calculate_Metrics(['a_model_name', 'n_features']) 2019-11-04 12:48:58,228 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Get_Confusion_Matrix(['a_model_name']) 2019-11-04 12:48:58,229 - INFO - __main__ : 819 - Adding to 
capabilities: sklearn_Set_Param_Grid(['a_model_name', 'b_estimator_args', 'c_grid_search_args']) 2019-11-04 12:48:58,230 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Get_Best_Params(['a_model_name']) 2019-11-04 12:48:58,231 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Fit_Transform(['a_model_name', 'b_key', 'n_features']) 2019-11-04 12:48:58,231 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Fit_Predict(['a_model_name', 'n_features']) 2019-11-04 12:48:58,232 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Bulk_Fit_Predict(['a_model_name', 'b_key', 'n_features']) 2019-11-04 12:48:58,232 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Explain_Importances(['a_model_name']) 2019-11-04 12:48:58,233 - INFO - __main__ : 819 - Adding to capabilities: spaCy_Get_Entities(['a_key', 'b_text', 'c_other_args']) 2019-11-04 12:48:58,234 - INFO - __main__ : 819 - Adding to capabilities: spaCy_Get_Entities_From_Model(['a_key', 'b_text', 'c_model_name', 'd_other_args']) 2019-11-04 12:48:58,237 - INFO - __main__ : 819 - Adding to capabilities: spaCy_Retrain(['a_text', 'b_entity', 'c_entity_type', 'd_model_name', 'e_other_args']) 2019-11-04 12:48:58,237 - INFO - __main__ : 819 - Adding to capabilities: Association_Rules(['a_group', 'b_item', 'c_other_args']) 2019-11-04 12:48:58,238 - INFO - __main__ : 819 - Adding to capabilities: Keras_Set_Layers(['a_model_name', 'b_sort_order', 'c_layer_type', 'd_args', 'e_kwargs']) 2019-11-04 12:48:58,238 - INFO - __main__ : 819 - Adding to capabilities: Keras_Get_History(['a_model_name']) 2019-11-04 12:48:58,239 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Predict_Sequence(['a_model_name', 'b_key', 'n_features']) 2019-11-04 12:48:58,239 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Bulk_Predict_Sequence(['a_model_name', 'b_key', 'n_features']) 2019-11-04 12:48:58,240 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Predict_Proba_Sequence(['a_model_name', 'b_key', 'n_features']) 2019-11-04 12:48:58,241 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Bulk_Predict_Proba_Sequence(['a_model_name', 'b_key', 'n_features']) 2019-11-04 12:48:58,241 - INFO - __main__ : 819 - Adding to capabilities: Prophet_Multivariate(['a_date', 'b_value', 'c_holidays', 'd_added_regressors', 'e_regressor_args', 'f_other_args']) 2019-11-04 12:48:58,242 - INFO - __main__ : 819 - Adding to capabilities: Prophet_Seasonality_Multivariate(['a_season', 'b_time_series', 'c_holidays', 'd_added_regressors', 'e_regressor_args', 'f_other_args']) 2019-11-04 12:48:58,242 - INFO - __main__ : 819 - Adding to capabilities: sklearn_Calculate_Metrics_Sequence(['a_model_name', 'b_key', 'n_features']) 2019-11-04 12:49:53,844 - INFO - __main__ : 832 - ipv6:[::1]:63550 - Capability 'sklearn_Setup' called by user Personal\Me from app D:\EDISA\Curso Data Science\QlikSense+ML\Regression\Sample-App-scikit-learn-PruebaRegression.qvf 2019-11-04 12:49:53,844 - INFO - __main__ : 833 - ExecuteFunction (functionId: 9, _sklearn) 2019-11-04 12:49:53,907 - INFO - __main__ : 832 - ipv6:[::1]:63550 - Capability 'sklearn_Set_Param_Grid' called by user Personal\Me from app D:\EDISA\Curso Data Science\QlikSense+ML\Regression\Sample-App-scikit-learn-PruebaRegression.qvf 2019-11-04 12:49:53,908 - INFO - __main__ : 833 - ExecuteFunction (functionId: 24, _sklearn) 2019-11-04 12:49:53,944 - INFO - __main__ : 832 - ipv6:[::1]:63550 - Capability 'sklearn_Set_Features' called by user Personal\Me from app D:\EDISA\Curso Data 
Science\QlikSense+ML\Regression\Sample-App-scikit-learn-PruebaRegression.qvf 2019-11-04 12:49:53,945 - INFO - __main__ : 833 - ExecuteFunction (functionId: 10, _sklearn) 2019-11-04 12:49:53,983 - INFO - __main__ : 832 - ipv6:[::1]:63550 - Capability 'sklearn_Fit' called by user Personal\Me from app D:\EDISA\Curso Data Science\QlikSense+ML\Regression\Sample-App-scikit-learn-PruebaRegression.qvf 2019-11-04 12:49:53,984 - INFO - __main__ : 833 - ExecuteFunction (functionId: 12, _sklearn) 2019-11-04 12:49:54,575 - ERROR - _server : 463 - Exception iterating responses: continuous is not supported Traceback (most recent call last): File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\grpc\_server.py", line 453, in _take_response_from_response_iterator return next(response_iterator), True File "__main__.py", line 494, in _sklearn response = model.fit() File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\core\_sklearn.py", line 660, in fit self._cross_validate() File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\core\_sklearn.py", line 2136, in _cross_validate scores = cross_validate(self.model.pipe, self.X_train, y_train, scoring=scoring, cv=self.model.cv, fit_params=fit_params, return_train_score=False) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\model_selection\_validation.py", line 232, in cross_validate for train, test in cv.split(X, y, groups)) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 779, in __call__ while self.dispatch_one_batch(iterator): File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 625, in dispatch_one_batch self._dispatch(tasks) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 588, in _dispatch job = self._backend.apply_async(batch, callback=cb) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\_parallel_backends.py", line 111, in apply_async result = ImmediateResult(func) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\_parallel_backends.py", line 332, in __init__ self.results = batch() File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 131, in __call__ return [func(*args, **kwargs) for func, args, kwargs in self.items] File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 131, in <listcomp> return [func(*args, **kwargs) for func, args, kwargs in self.items] File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\model_selection\_validation.py", line 516, in _fit_and_score estimator.fit(X_train, y_train, **fit_params) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\pipeline.py", line 356, in fit self._final_estimator.fit(Xt, y, **fit_params) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\model_selection\_search.py", line 688, in fit self._run_search(evaluate_candidates) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\model_selection\_search.py", line 1149, in _run_search evaluate_candidates(ParameterGrid(self.param_grid)) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\model_selection\_search.py", line 667, in evaluate_candidates cv.split(X, y, groups))) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 779, in __call__ while 
self.dispatch_one_batch(iterator): File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 625, in dispatch_one_batch self._dispatch(tasks) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 588, in _dispatch job = self._backend.apply_async(batch, callback=cb) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\_parallel_backends.py", line 111, in apply_async result = ImmediateResult(func) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\_parallel_backends.py", line 332, in __init__ self.results = batch() File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 131, in __call__ return [func(*args, **kwargs) for func, args, kwargs in self.items] File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 131, in <listcomp> return [func(*args, **kwargs) for func, args, kwargs in self.items] File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\model_selection\_validation.py", line 556, in _fit_and_score test_scores = _score(estimator, X_test, y_test, scorer, is_multimetric) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\model_selection\_validation.py", line 599, in _score return _multimetric_score(estimator, X_test, y_test, scorer) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\model_selection\_validation.py", line 629, in _multimetric_score score = scorer(estimator, X_test, y_test) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\metrics\scorer.py", line 97, in __call__ **self._kwargs) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\metrics\classification.py", line 1059, in f1_score sample_weight=sample_weight) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\metrics\classification.py", line 1182, in fbeta_score sample_weight=sample_weight) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\metrics\classification.py", line 1415, in precision_recall_fscore_support pos_label) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\metrics\classification.py", line 1239, in _check_set_wise_labels y_type, y_true, y_pred = _check_targets(y_true, y_pred) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\metrics\classification.py", line 88, in _check_targets raise ValueError("{0} is not supported".format(y_type)) ValueError: continuous is not supported 2019-11-04 12:49:54,590 - INFO - __main__ : 832 - ipv6:[::1]:63550 - Capability 'sklearn_Fit' called by user Personal\Me from app D:\EDISA\Curso Data Science\QlikSense+ML\Regression\Sample-App-scikit-learn-PruebaRegression.qvf 2019-11-04 12:49:54,591 - INFO - __main__ : 833 - ExecuteFunction (functionId: 12, _sklearn) 2019-11-04 12:49:55,101 - ERROR - _server : 463 - Exception iterating responses: continuous is not supported Traceback (most recent call last): File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\grpc\_server.py", line 453, in _take_response_from_response_iterator return next(response_iterator), True File "__main__.py", line 494, in _sklearn response = model.fit() File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\core\_sklearn.py", line 660, in fit self._cross_validate() File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\core\_sklearn.py", line 2136, in 
_cross_validate scores = cross_validate(self.model.pipe, self.X_train, y_train, scoring=scoring, cv=self.model.cv, fit_params=fit_params, return_train_score=False) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\model_selection\_validation.py", line 232, in cross_validate for train, test in cv.split(X, y, groups)) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 779, in __call__ while self.dispatch_one_batch(iterator): File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 625, in dispatch_one_batch self._dispatch(tasks) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 588, in _dispatch job = self._backend.apply_async(batch, callback=cb) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\_parallel_backends.py", line 111, in apply_async result = ImmediateResult(func) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\_parallel_backends.py", line 332, in __init__ self.results = batch() File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 131, in __call__ return [func(*args, **kwargs) for func, args, kwargs in self.items] File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 131, in <listcomp> return [func(*args, **kwargs) for func, args, kwargs in self.items] File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\model_selection\_validation.py", line 516, in _fit_and_score estimator.fit(X_train, y_train, **fit_params) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\pipeline.py", line 356, in fit self._final_estimator.fit(Xt, y, **fit_params) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\model_selection\_search.py", line 688, in fit self._run_search(evaluate_candidates) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\model_selection\_search.py", line 1149, in _run_search evaluate_candidates(ParameterGrid(self.param_grid)) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\model_selection\_search.py", line 667, in evaluate_candidates cv.split(X, y, groups))) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 779, in __call__ while self.dispatch_one_batch(iterator): File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 625, in dispatch_one_batch self._dispatch(tasks) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 588, in _dispatch job = self._backend.apply_async(batch, callback=cb) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\_parallel_backends.py", line 111, in apply_async result = ImmediateResult(func) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\_parallel_backends.py", line 332, in __init__ self.results = batch() File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 131, in __call__ return [func(*args, **kwargs) for func, args, kwargs in self.items] File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\joblib\parallel.py", line 131, in <listcomp> return [func(*args, **kwargs) for func, args, kwargs in self.items] File 
"D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\model_selection\_validation.py", line 556, in _fit_and_score test_scores = _score(estimator, X_test, y_test, scorer, is_multimetric) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\model_selection\_validation.py", line 599, in _score return _multimetric_score(estimator, X_test, y_test, scorer) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\model_selection\_validation.py", line 629, in _multimetric_score score = scorer(estimator, X_test, y_test) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\metrics\scorer.py", line 97, in __call__ **self._kwargs) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\metrics\classification.py", line 1059, in f1_score sample_weight=sample_weight) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\metrics\classification.py", line 1182, in fbeta_score sample_weight=sample_weight) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\metrics\classification.py", line 1415, in precision_recall_fscore_support pos_label) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\metrics\classification.py", line 1239, in _check_set_wise_labels y_type, y_true, y_pred = _check_targets(y_true, y_pred) File "D:\EDISA\Python\qlik-py-tools-6.0\qlik-py-env\lib\site-packages\sklearn\metrics\classification.py", line 88, in _check_targets raise ValueError("{0} is not supported".format(y_type)) ValueError: continuous is not supported

@nabeel-oz
Owner

I believe the problem is your grid search arguments for the model. You have set the scoring to f1_micro which is only valid for classification. You should use one of the regression scoring parameters defined here.
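
For example, in the load script above the grid search variable could be switched to a standard scikit-learn regression scoring string such as r2 or neg_mean_squared_error (shown here with r2):

`LET vGridSearch = 'scoring=r2|str';`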

Btw, the SSE log you posted doesn't have line breaks, making it very hard to read! For further issues please attach the log found under ../qlik-py-tools/qlik-py-env/core/logs or just use screenshots of the terminal.

@SkylakeXx
Author

My mistake. Now the code breaks on the feature importances, with LR and Random Forest too.

[screenshots attached: image, image (1)]

nabeel-oz pushed a commit that referenced this issue Nov 5, 2019
Fix for the final estimator being added to the wrong step in the pipeline when using grid search
@nabeel-oz
Owner

Ah this was a bit tricky to comprehend. After a grid search the final estimator needs to be inserted at the end of the pipeline. But there is an issue in the code causing it to be inserted before the final step of the pipeline.

Please extract _sklearn.py from the attached file and replace the one in ../qlik-py-tools/qlik-py-env/core/. I'll add the fix to the next release as well.

_sklearn.zip

@SkylakeXx
Author

@nabeel-oz
Owner

@SkylakeXx , this looks like a bug in the Skater package which is used by the SSE for calculating feature importances. Looking at Skater's code I see that it always looks at the first 10 rows of data for each column, and decides whether the column is numeric based on those samples. In your case I think you have a column where the first 10 rows look numeric, but the column actually contains strings.

Note that the data is passed to Skater in its original form without transformations like one hot encoding and hashing.

You have two options:

  • See if you can adjust the input data to avoid the first 10 rows looking like numerical data when the column is a string.
  • Turn off the feature importances for this model by passing calculate_importances=false in your execution arguments (see the snippet below).
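
For example, the second option only requires changing the execution arguments variable in the load script above:

`LET vExecutionArgs = 'overwrite=true,cv=5,calculate_importances=false,debug=true';`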

@SkylakeXx
Author

All working again. I'm closing the issue, thanks for everything.
