Having recently changed careers from the Oil & Gas industry, I had never coded before 04/2021. The last time I wrote some lines of code was with TurboPascal 👴 (I know, I'm old) in my final year of high school in 2002, and with C & Matlab during my preparatory classes for engineering school in 2003.

I started this GitHub to host and share the academic mini-projects related to my master's degree.
My very first project was mainly a way to get familiar with Python coding: a VWAP (volume-weighted average price) calculation across 10 Bitcoin exchanges.
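For context, a VWAP boils down to the volume-weighted mean of trade prices. A minimal pandas sketch, where the column names and figures are purely illustrative and not the project's actual data:

```python
import pandas as pd

# Toy trades standing in for data pulled from several exchanges.
trades = pd.DataFrame({
    "exchange": ["ExchangeA", "ExchangeA", "ExchangeB", "ExchangeB"],
    "price":    [42150.0, 42180.0, 42140.0, 42200.0],
    "volume":   [0.5, 1.2, 0.8, 0.3],
})

# VWAP = sum(price * volume) / sum(volume), computed per exchange.
pv = (trades["price"] * trades["volume"]).groupby(trades["exchange"]).sum()
vol = trades["volume"].groupby(trades["exchange"]).sum()
vwap = pv / vol
print(vwap)
```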
The second project covers data management, data processing & cleaning, data viz, and an optional prediction section using logistic regression.
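As an illustration of that optional prediction step, here is a minimal scikit-learn sketch on synthetic data (not the project's dataset):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the cleaned project data.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit a plain logistic regression and check held-out accuracy.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```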
Then came the statistics project at the end of the Statistics-1 module: a descriptive statistical analysis plus an application of CAPM and the Fama-French 3- and 5-factor models on 10 stocks from the Dow Jones.
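The CAPM part is essentially an OLS regression of a stock's excess returns on the market's excess returns (the Fama-French versions add the SMB/HML and RMW/CMA factor columns). A minimal statsmodels sketch on synthetic returns, not the project's actual data:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic daily excess returns standing in for one Dow Jones stock and the market.
rng = np.random.default_rng(0)
mkt_excess = rng.normal(0.0005, 0.01, 250)
stock_excess = 0.0002 + 1.1 * mkt_excess + rng.normal(0, 0.005, 250)

# CAPM regression: R_i - R_f = alpha + beta * (R_m - R_f) + eps
X = sm.add_constant(pd.Series(mkt_excess, name="mkt_excess"))
capm = sm.OLS(stock_excess, X).fit()
print(capm.params)  # "const" -> alpha, "mkt_excess" -> beta
```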
The text mining project was also a cool one: scraping tweets with the keywords "European Super League", then text processing, data viz and text classification using unsupervised algorithms. I used K-Means and topic modeling with Latent Dirichlet Allocation (LDA) & NMF.
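To give a flavour of the topic-modeling part, here is a minimal scikit-learn sketch on a few made-up tweets (the real project works on the scraped corpus after text processing):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation, NMF

# Toy tweets standing in for the scraped "European Super League" corpus.
tweets = [
    "european super league is killing football",
    "fans protest against the super league clubs",
    "uefa reacts to the european super league announcement",
    "club owners defend the breakaway league project",
]

# LDA works on raw term counts, NMF is usually fed TF-IDF weights.
counts = CountVectorizer(stop_words="english").fit_transform(tweets)
tfidf = TfidfVectorizer(stop_words="english").fit_transform(tweets)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
nmf = NMF(n_components=2, random_state=0).fit(tfidf)
print(lda.transform(counts).shape, nmf.transform(tfidf).shape)  # document-topic matrices
```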
The time series project tackled COVID-19 case and confirmed-death statistics for European countries. The main parts of the project were EDA and outlier treatment (the outliers were mainly due to double PCR/antigenic tests inflating case counts and to misclassified causes of death), a data viz part highlighting a couple of statistical rates that explain different aspects of each country's reaction to the pandemic, and then a modeling part with a random walk, ARMA, SARIMAX and an XGBoost regressor applied to the time series.
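For the modeling part, a minimal SARIMAX sketch with statsmodels on a synthetic series; the order parameters and data below are placeholders, not the project's fitted model:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic daily series standing in for a country's cumulative case counts.
rng = np.random.default_rng(0)
idx = pd.date_range("2020-03-01", periods=200, freq="D")
cases = pd.Series(np.abs(rng.normal(1000, 200, 200)).cumsum(), index=idx)

# Illustrative non-seasonal (1, 1, 1) and weekly seasonal (1, 0, 1, 7) orders.
model = SARIMAX(cases, order=(1, 1, 1), seasonal_order=(1, 0, 1, 7))
fit = model.fit(disp=False)
print(fit.forecast(steps=14))  # two-week ahead forecast
```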
The machine learning project used an open-source dataset from Kaggle, "Are you Gonna be Hired?", which was shuffled and modified before being handed out for the project. The main objective was binary classification of the target: Hired = 1 / Not Hired = 0. One of the conditions of the project was to keep the test dataset complete, without dropping NaNs or reshuffling. The approach I followed was to focus on EDA and dealing with NaNs & outliers, then, after some feature engineering, to train several classification algorithms and select the best model based on the ROC/AUC score. Models used: logistic regression, KNN, Random Forest, Gradient Boosting, XGBoost and SVC. Feature selection was also applied after tuning hyperparameters.
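The model-selection step boils down to comparing cross-validated ROC AUC scores across candidates. A minimal sketch on synthetic data (not the Kaggle dataset), with XGBoost left out to keep it dependency-free:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the data after cleaning and feature engineering.
X, y = make_classification(n_samples=1000, n_features=15, random_state=0)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "svc": SVC(probability=True, random_state=0),
}

# Rank the candidates by cross-validated ROC AUC.
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: {score:.3f}")
```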
The last project of this degree is a computer vision multiclass classification built on TensorFlow/Keras, using the open-source "Fashion MNIST" dataset from Zalando. In the project, I used DNN/CNN and transfer learning models (Inception V3 & VGG19), tuned hyperparameters with Keras Tuner & Hyperband, and added an optional XAI part with the SamplingExplainer from the SHAP package (the methodology can still be improved).
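For context, a minimal Keras CNN sketch on Fashion MNIST; the project's actual architectures, transfer learning and tuning go well beyond this:

```python
from tensorflow import keras

# Load Fashion MNIST and scale pixels to [0, 1], adding a channel dimension.
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# Small illustrative CNN: one conv block, then a dense head over 10 classes.
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(32, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, validation_split=0.1)
print(model.evaluate(x_test, y_test))
```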