
Prediction of wind turbine power generation from real-time SCADA data

The models built for this problem predict the power values that a wind turbine is expected to produce. By comparing a turbine's actual production with these predictions, the investor can see to what extent the turbine produces less than it should, recognize that there is a performance problem with the turbine, and initiate root cause analysis.

The data set consists of real-time SCADA data. Each value belongs only to its own time period, and the input variables supplied for a given time period are used to predict the power generation result of that same time period.

The shared data set contains the real-time power generation (Power(kW)) of a wind turbine owned by Enerjisa Üretim, given on a 10-minute basis between 01.01.2019 and 14.08.2021.
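For a quick first look at the raw files, the sketch below (illustrative only) assumes the archive has already been extracted as in the preprocessing step further down, and that features.csv and power.csv share the same 10-minute Timestamp axis.

import pandas as pd

# assumption: the archive has been extracted to the enerjisa-uretim-hackathon folder
features = pd.read_csv("enerjisa-uretim-hackathon/features.csv")
power = pd.read_csv("enerjisa-uretim-hackathon/power.csv")

print(features.shape, power.shape)      # number of 10-minute records and columns
print(features["Timestamp"].head())     # first timestamps of the period

# confirm the 10-minute sampling interval of the SCADA stream
ts = pd.to_datetime(features["Timestamp"])
print(ts.diff().value_counts().head())  # expected to be dominated by 0 days 00:10:00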

Information presented in the dataset and its units (a sketch for reading these units from feature_units.csv follows the list)

Column (Unit)
Timestamp ()
Gearbox_T1_High_Speed_Shaft_Temperature (°C)
Gearbox_T3_High_Speed_Shaft_Temperature (°C)
Gearbox_T1_Intermediate_Speed_Shaft_Temperature (°C)
Temperature Gearbox Bearing Hollow Shaft (°C)
Tower Acceleration Normal (mm/s²)
Gearbox_Oil-2_Temperature (°C)
Tower Acceleration Lateral (mm/s²)
Temperature Bearing_A (°C)
Temperature Trafo-3 (°C)
Gearbox_T3_Intermediate_Speed_Shaft_Temperature (°C)
Gearbox_Oil-1_Temperature (°C)
Gearbox_Oil_Temperature (°C)
Torque (%)
Converter Control Unit Reactive Power (kVAr)
Temperature Trafo-2 (°C)
Reactive Power (kVAr)
Temperature Shaft Bearing-1 (°C)
Gearbox_Distributor_Temperature (°C)
Moment D Filtered (kNm)
Moment D Direction (kNm)
N-set 1 (rpm)
Operating State ( )
Power Factor ( )
Temperature Shaft Bearing-2 (°C)
Temperature_Nacelle (°C)
Voltage A-N (V)
Temperature Axis Box-3 (°C)
Voltage C-N (V)
Temperature Axis Box-2 (°C)
Temperature Axis Box-1 (°C)
Voltage B-N (V)
Nacelle Position_Degree (°)
Converter Control Unit Voltage (V)
Temperature Battery Box-3 (°C)
Temperature Battery Box-2 (°C)
Temperature Battery Box-1 (°C)
Hydraulic Prepressure (bar)
Angle Rotor Position (°)
Temperature Tower Base (°C)
Pitch Offset-2 Asymmetric Load Controller (°)
Pitch Offset Tower Feedback (°)
Line Frequency (Hz)
Internal Power Limit (kW)
Circuit Breaker cut-ins ( )
Particle Counter ( )
Tower Accelaration Normal Raw (mm/s²)
Torque Offset Tower Feedback (Nm)
External Power Limit (kW)
Blade-2 Actual Value_Angle-B (°)
Blade-1 Actual Value_Angle-B (°)
Blade-3 Actual Value_Angle-B (°)
Temperature Heat Exchanger Converter Control Unit (°C)
Tower Accelaration Lateral Raw (mm/s²)
Temperature Ambient (°C)
Nacelle Revolution ( )
Pitch Offset-1 Asymmetric Load Controller (°)
Tower Deflection (ms)
Pitch Offset-3 Asymmetric Load Controller (°)
Wind Deviation 1 seconds (°)
Wind Deviation 10 seconds (°)
Proxy Sensor_Degree-135 (mm)
State and Fault ( )
Proxy Sensor_Degree-225 (mm)
Blade-3 Actual Value_Angle-A (°)
Scope CH 4 ( )
Blade-2 Actual Value_Angle-A (°)
Blade-1 Actual Value_Angle-A (°)
Blade-2 Set Value_Degree (°)
Pitch Demand Baseline_Degree (°)
Blade-1 Set Value_Degree (°)
Blade-3 Set Value_Degree (°)
Moment Q Direction (kNm)
Moment Q Filltered (kNm)
Proxy Sensor_Degree-45 (mm)
Turbine State ( )
Proxy Sensor_Degree-315 (mm)
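The unit information above is also distributed with the data as feature_units.csv (see the archive listing below). A minimal sketch for printing it, under the assumption (not verified here) that it is a small two-column table mapping each column name to its unit:

import pandas as pd

# assumption: feature_units.csv maps each SCADA column name to its unit
units = pd.read_csv("enerjisa-uretim-hackathon/feature_units.csv")
print(units.shape)
print(units.head())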

Preprocessing

#!/usr/bin/env python
# coding: utf-8
import csv
import os
import pickle
import time
import warnings

import numpy as np
import pandas as pd

warnings.filterwarnings('ignore')

unzip dataset

import zipfile
path_to_zip_file="enerjisa-uretim-hackathon.zip"
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall(path_to_zip_file[:-4])

collect the dataset files as a list in "csvs"

def find_the_way(path,file_format):
    # walk the directory tree under "path" and collect every file whose name
    # contains "file_format" (e.g. '.csv')
    files_add = []
    for r, d, f in os.walk(path):
        for file in f:
            if file_format in file:
                files_add.append(os.path.join(r, file))
    return files_add

path=path_to_zip_file[:-4]
csvs=find_the_way(path,'.csv')
csvs
['enerjisa-uretim-hackathon\\features.csv',
 'enerjisa-uretim-hackathon\\feature_units.csv',
 'enerjisa-uretim-hackathon\\power.csv',
 'enerjisa-uretim-hackathon\\sample_submission.csv']

replace NaN and inf values with 0

features=pd.read_csv(csvs[0])   # features.csv
labels=pd.read_csv(csvs[2])     # power.csv (the Power(kW) labels)

features.replace([np.inf, -np.inf], np.nan, inplace=True)
features=features.fillna(0)

create and add a new time-based feature derived from the timestamp

# "ay_ve_gun" (Turkish for "month and day") encodes month*100 plus the 10-day
# part of the month (days 1-9 -> 1, 10-19 -> 2, 20-31 -> 3),
# e.g. 2021-08-15 -> 8*100 + 2 = 802
ay_ve_gun=[]
for i in features["Timestamp"]:
    month=int(i[5:7])*100
    day=(int(i[8:10])//10+1)
    if day==4:          # merge days 30-31 into the third period
        day=3
    ay_ve_gun.append(month+day)
features["ay_ve_gun"]=ay_ve_gun
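The same feature can also be built in a vectorized way with pandas datetime accessors; the following is just an equivalent sketch, not the code used above.

# equivalent vectorized construction of ay_ve_gun (illustrative only)
ts = pd.to_datetime(features["Timestamp"])
period = (ts.dt.day // 10 + 1).clip(upper=3)   # 10-day part of the month, capped at 3
features["ay_ve_gun"] = ts.dt.month * 100 + period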

split labelled and unlabelled data

train_size=len(labels)
main=features[0:train_size]
submission=features[train_size:]

add labels to dataframe

main["Power(kW)"]=labels["Power(kW)"]

show the unlabelled data, which is not used for training

submission
Timestamp Gearbox_T1_High_Speed_Shaft_Temperature Gearbox_T3_High_Speed_Shaft_Temperature Gearbox_T1_Intermediate_Speed_Shaft_Temperature Temperature Gearbox Bearing Hollow Shaft Tower Acceleration Normal Gearbox_Oil-2_Temperature Tower Acceleration Lateral Temperature Bearing_A Temperature Trafo-3 ... Blade-2 Set Value_Degree Pitch Demand Baseline_Degree Blade-1 Set Value_Degree Blade-3 Set Value_Degree Moment Q Direction Moment Q Filltered Proxy Sensor_Degree-45 Turbine State Proxy Sensor_Degree-315 ay_ve_gun
136730 2021-08-15 00:00:00 60.068333 62.0 56.000000 58.000000 125.218666 60.000000 64.707336 54.348331 121.000000 ... 9.493241 8.925109 9.014512 8.266594 -41.861877 -37.917656 5.739297 1.0 5.734730 802
136731 2021-08-15 00:10:00 60.000000 62.0 56.000000 57.036667 145.160309 59.279999 64.127480 58.098331 120.971664 ... 7.507399 6.937748 7.022389 6.287027 -19.210815 -19.602339 5.720869 1.0 5.726634 802
136732 2021-08-15 00:20:00 60.000000 62.0 55.853333 57.000000 129.239914 59.000000 54.563091 60.360001 120.028336 ... 8.065812 7.497398 7.581376 6.844808 -28.144068 -34.329105 5.727475 1.0 5.728649 802
136733 2021-08-15 00:30:00 60.000000 62.0 55.000000 57.000000 140.151611 59.000000 61.899250 61.715000 120.000000 ... 8.132490 7.565773 7.654368 6.909220 -7.592476 -11.718444 5.728980 1.0 5.739824 802
136734 2021-08-15 00:40:00 60.000000 62.0 55.000000 57.000000 126.124702 59.000000 56.804501 62.698334 120.000000 ... 9.546413 8.974770 9.064083 8.313858 -7.760864 -9.863355 5.736651 1.0 5.747692 802
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
154257 2021-12-14 23:10:00 65.811668 0.0 59.945000 62.808334 225.038239 65.300003 109.889709 61.000000 97.000000 ... 15.820095 15.199166 15.235223 14.540556 -29.340843 -27.513502 5.746916 1.0 5.756082 1202
154258 2021-12-14 23:20:00 68.586670 0.0 62.084999 65.413330 229.905838 67.871666 106.016670 61.116665 97.000000 ... 16.504293 15.876278 15.917643 15.207320 -31.925669 -30.197918 5.749150 1.0 5.755406 1202
154259 2021-12-14 23:30:00 63.746666 0.0 59.965000 64.051666 223.352631 64.461670 111.690208 61.293335 97.000000 ... 15.331903 14.720088 14.768394 14.064686 -53.071564 -48.306511 5.751807 1.0 5.747936 1202
154260 2021-12-14 23:40:00 66.643333 0.0 60.678333 63.421665 227.704514 66.081665 119.716499 60.786667 97.000000 ... 16.481724 15.887610 15.945046 15.230121 -28.747763 -23.844364 5.747686 1.0 5.757787 1202
154261 2021-12-14 23:50:00 65.593330 0.0 60.738335 64.731667 223.235413 65.891670 103.372475 60.395000 97.000000 ... 16.198933 15.591414 15.635881 14.941538 -28.904552 -30.457935 5.753047 1.0 5.761520 1202

17532 rows Ă— 78 columns

split the labelled data into training (67%) and testing (33%) parts, preserving chronological order

train_size = int(len(main) * 0.67)
test_size = len(main) - train_size
train, test = main[0:train_size], main[train_size:]
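Because the records are time-ordered, this keeps the first 67% of the period for training and the most recent 33% for testing. An essentially equivalent formulation with scikit-learn, shown only for illustration:

from sklearn.model_selection import train_test_split

# chronological split: shuffle=False preserves the time order, so the test set
# is the most recent third of the labelled period
train_alt, test_alt = train_test_split(main, test_size=0.33, shuffle=False)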

save the training and testing datasets as CSV files

submission.to_csv("submission.csv",index=False)
train.to_csv("TT.csv",index=False)
test.to_csv("t.csv",index=False)

Machine Learning Step

from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold
import sklearn
from sklearn.ensemble import BaggingRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import VotingRegressor
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import BayesianRidge
from sklearn.linear_model import ElasticNet
from sklearn.linear_model import Lasso
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.linear_model import TweedieRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import LinearSVR
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.isotonic import IsotonicRegression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel
from xgboost import XGBRegressor

kernel = DotProduct() + WhiteKernel()  # defined for GaussianProcessRegressor, which is not used below


List of machine learning algorithms

# base estimators and meta-estimator for the stacking regressor
estimators = [('ridge', RidgeCV()),
              ('lasso', LassoCV(random_state=42)),
              ('knr', KNeighborsRegressor(n_neighbors=20,
                                          metric='euclidean'))]
final_estimator = GradientBoostingRegressor(
    n_estimators=25, subsample=0.5, min_samples_leaf=25, max_features=1,
    random_state=42)
reg = StackingRegressor(
    estimators=estimators,
    final_estimator=final_estimator)

# members of the voting regressor
reg1 = GradientBoostingRegressor(random_state=1)
reg2 = RandomForestRegressor(random_state=1)
reg3 = LinearRegression()

# dictionary of the regressors to be compared
ml_list={'LR':LinearRegression(),
'DT':DecisionTreeRegressor(),
'BR':BayesianRidge(),
'EL':ElasticNet(),
'twd':TweedieRegressor(),
'LAS':Lasso(),
'rcv':RidgeCV(),
'lcv':LassoCV(),
'BAG':BaggingRegressor(),
'GBR':GradientBoostingRegressor(),
'RF':RandomForestRegressor(),
'KNN':KNeighborsRegressor(),
#'LRVR':LinearSVR(),'SVR':SVR(),
#'iso':IsotonicRegression(),
'vot':VotingRegressor(estimators=[('gb', reg1), ('rf', reg2), ('lr', reg3)]),
'stc':StackingRegressor(
    estimators=estimators,
    final_estimator=final_estimator),
'XGB':XGBRegressor()}

split a dataframe into data (X) and labels (y)

def data_and_label(name):
    # read a CSV file, replace NaN/inf values with 0, drop the Timestamp column
    # and return the feature matrix X and the target vector y (the last column,
    # Power(kW) in the training and test files)
    df = pd.read_csv(name)
    df.replace([np.inf, -np.inf], np.nan, inplace=True)
    df=df.fillna(0)
    del df["Timestamp"]
    X=np.array(df[df.columns[:-1]])
    y=np.array(df[df.columns[-1]])
    return X,y
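As a quick illustration of how this helper and a single entry of ml_list fit together (the systematic comparison is done by the ML function below; the model choice here is arbitrary):

# illustrative only: fit one model from ml_list on the training file and
# evaluate it on the held-out test file
X_train, y_train = data_and_label("TT.csv")
X_test, y_test = data_and_label("t.csv")

model = ml_list['RF']                     # e.g. the random forest regressor
model.fit(X_train, y_train)
predicted = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, predicted))
print("R2 :", r2_score(y_test, predicted))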

Evaluation - Calculate error score

def score_erros(altime,train_time,test_time,expected,predicted,class_based_results,i,cv,dname,ii):
    # regression error metrics
    mse = mean_squared_error(expected, predicted)
    mae = mean_absolute_error(expected, predicted)
    rmse = mean_squared_error(expected, predicted, squared=False)
    r2 = r2_score(expected, predicted)
    # placeholders kept so the output format matches the classification template
    precision,recall,f_score=0,0,0
    print ('%-10s %-3s %-3s %-10s  %-8s %-8s %-8s %-11s %-8s %-8s %-8s %-6s %-6s %-16s' % (dname,i,cv,ii[0:6],str(round((precision),2)),str(round((recall),2)),str(round((f_score),2)),str(round((mse),2)), str(round((mae),2)),
        str(round((rmse),2)), str(round((r2),2)),str(round((train_time),2)),str(round((test_time),2)),altime))
    # CSV row matching the 14-column header written in ML(); altime fills the last ("total") column
    lines=str(dname)+","+str(i)+","+str(cv)+","+str(ii)+","+str(round((precision),15))+","+str(round((recall),15))+","+str(round((f_score),15))+","+str(round((mse),15))+","+str(round((mae),15))+","+str(round((rmse),15))+","+str(round((r2),15))+","+str(round((train_time),15))+","+str(round((test_time),15))+","+str(altime)+"\n"
    return lines,class_based_results,mae

ML Function

def ML(output,file,test_file,i):
    ths = open(output, "a")
    X_test,y_test=data_and_label(test_file)
    ths.write ("Dataset,T,CV,ML_alg,precision,recall,f_scor,mse,mae,rmse, r2  ,tra-T,test-T,total\n")

    class_based_results= pd.DataFrame()

    for ii in ml_list:
        cv=0
        dataset=file[-20:-4]
        clf = ml_list[ii]

        # train the model and measure the training time
        second=time.time()
        X_train,y_train=data_and_label(file)
        clf.fit(X_train, y_train)
        train_time=(float((time.time()-second)))

        # predict on the test set and measure the prediction time
        second=time.time()
        predicted=clf.predict(X_test)
        test_time=(float((time.time()-second)))
        expected = y_test

        # square-rooted 68th and 95th percentiles of the absolute error
        # (reported as the CDF68 / CDF95 columns)
        error=[]
        for j in range(len(y_test)):
            error.append(abs(float(y_test[j])-float(predicted[j])))
        error.sort()
        cep68 = round((error[round(68 * len(error) / 100)])**(1/2),2)
        cep95 = round((error[round(95 * len(error) / 100)])**(1/2),2)
        cep=str(cep68)+'   '+str(cep95)

        line,cb,mae=score_erros(cep,train_time,test_time,expected,predicted,class_based_results,i,cv,dataset,ii)

        # save the fitted model to disk, one file per dataset/algorithm pair
        filename=f"{dataset}_{ii}.sav".replace('\\','_').replace('/','_')
        pickle.dump(clf, open(filename, 'wb'))

        ths.write (line)
    ths.close()

training file

csvs=find_the_way("./",'TT')
csvs
['./TT.csv']

Results

print ('%-10s %-3s %-3s %-10s  %-8s %-8s %-8s %-11s %-8s %-8s %-8s %-6s %-6s %-16s' %
                   ("Dataset","T","CV","ML_alg",'prec','rec','f1',"mse","mae","rmse", "r2"  ,"T","t","CDF68    CDF95"))

for num,csv in enumerate(csvs):
    output="./results.csv"            # output file for the error scores
    test_file=csv.replace('TT','t')   # matching test data file
    ML(output,csv,test_file,num)
Dataset    T   CV  ML_alg      prec     rec      f1       mse         mae      rmse     r2       T      t      CDF68    CDF95  
./TT       0   0   LR          0        0        0        1167041.09  989.28   1080.3   -0.01    2.23   0.01   34.42   40.37   
./TT       0   0   DT          0        0        0        24983.24    21.8     158.06   0.98     13.39  0.03   2.47   4.86     
./TT       0   0   BR          0        0        0        1168839.83  991.49   1081.13  -0.01    2.95   0.01   34.34   39.97   
./TT       0   0   EL          0        0        0        1167041.13  989.28   1080.3   -0.01    2.49   0.01   34.42   40.37   
./TT       0   0   twd         0        0        0        1167041.1   989.28   1080.3   -0.01    2.53   0.01   34.42   40.37   
./TT       0   0   LAS         0        0        0        1167041.15  989.28   1080.3   -0.01    2.12   0.01   34.42   40.37   
./TT       0   0   rcv         0        0        0        1167054.07  989.28   1080.3   -0.01    2.87   0.01   34.42   40.37   
./TT       0   0   lcv         0        0        0        1166893.56  991.61   1080.23  -0.01    4.54   0.01   34.37   39.94   
./TT       0   0   BAG         0        0        0        8307.35     14.41    91.14    0.99     57.68  0.34   2.2   4.96      
./TT       0   0   GBR         0        0        0        9996.79     52.03    99.98    0.99     149.85 0.12   6.76   13.03    
./TT       0   0   RF          0        0        0        5502.6      12.68    74.18    1.0      501.61 1.39   2.15   5.24     
./TT       0   0   KNN         0        0        0        557629.39   463.57   746.75   0.52     5.2    267.91 22.23   42.1    
./TT       0   0   vot         0        0        0        139551.09   341.29   373.57   0.88     761.01 2.17   20.05   23.42   
./TT       0   0   stc         0        0        0        519621.29   610.33   720.85   0.55     649.77 333.07 27.57   36.77   
./TT       0   0   XGB         0        0        0        9760.45     51.78    98.79    0.99     41.54  0.19   6.81   12.99    
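In this run the tree-based ensembles (RF, BAG, XGB, GBR, DT) clearly outperform the linear models; the random forest gives the lowest MAE (about 12.7 kW) with an R² rounding to 1.0. A small sketch for ranking the models from the results.csv file written by ML(), using the column names from the header written inside that function:

# rank the recorded models by mean absolute error (lower is better)
results = pd.read_csv("results.csv")
print(results.sort_values("mae")[["ML_alg", "mae", "rmse"]].to_string(index=False))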
