Machine Learning in Python

This repository contains Machine Learning Projects in Python programming language. All the projects are done on Jupyter Notebooks.

Libraries Required

The following libraries are required to successfully implement the projects.

Python 3.6+
NumPy (for Linear Algebra)
Pandas (for Data Preprocesssing)
Scikit-learn (for ML models)
Matplotlib (for Data Visualization)
Seaborn (for statistical data visualization)

The projects are divided into various categories listed below -

Supervised Learning

Linear Regression
- Linear Regression Single Variables. : A Simple Linear Regression Model to model the linear relationship between Population and Profit for plot sales.
- Linear Regression Multiple Variables. : In this project, I build a Linear Regression Model for multiple variables for predicting the House price based on acres and number of rooms.
Logistic Regression : In this project, I train a binary Logistic Regression classifier to predict whether a student will get selected on the basis of mid semester and end semester marks.
Support Vector Machine : In this project, I build a Support Vector Machines classifier for predicting Social Network Ads . It predicts whether a user with age and estimated salary will buy the product after watching the ads or not. It uses the Radial Basic Function Kernal of SVM.
K Nearest Neighbours : K Nearest Neighbours or KNN is the simplest of all machine learning algorithms. In this project, I build a kNN classifier on the Iris Species Dataset which predict the three species of Iris with four features sepal_length, sepal_width, petal_length and petal_width.
Naive Bayes : In this project, I build a Naïve Bayes Classifier to classify the different class of a message from sklearn dataset called fetch_20newsgroups.
Decision Tree Classification : In this project, I used the Iris Dataset and tried a Decision Tree Classifier which give an accuracy of 96.7% which is less than KNN.
Random Forest Classification : In this project I used Random Forest Classifier and Random Forest Regressor on the Social Network Ads dataset.

Unsupervised Learning

K Means Clustering : K-Means clustering is used to find intrinsic groups within the unlabelled dataset and draw inferences.It is one of the most detailed projects, In this project, I implement K-Means Clustering on Credit Card Dataset to cluster different credit card users based on the features.I scaled the data using StandardScaler because normalizing(scale in range 0 to 1) will improves the convergence.I also implemented the Elbow Method to search for the best numbers of clusters.For visualizing the dataset I used PCA(Principal Component Analysis) for dimensionality reduction as the dataset features were large in number.In the end I used Silhouette Score which is used to calculate the performance of clustering . It ranges from -1 to 1 and I got a score of 0.203.

NLP( Natural Language Processing )

Text Analytics : It is a project for Introduction to Text Analytics in NLP.I performed the important steps -
- Tokenization
- Removal of Special Characters
- Lower Case
- Removing StopWords
- Stemming
- Count Vectorizer ( which generally performs all the steps mentioned above except Stemming)
- DTM (Document Term Matrix)
- TF-IDF (Text Frequency Inverse Document Frequency)
Sentiment Analysis : I applied Sentiment analysis in MovieReview (Dataset from nltk library) and RestaurentReview Datasets to predict the positive and negative review . I used Naive Bayes Classifier (78.8%) and Logistic Regression (84.3%) to build the models and for prediction.

Data Cleaning and Preprocessing

Data Preprocessing : I perform various data preprocessin and cleaning methods which are mentioned below -
- Label Encoding : It converts each category into a unique numeric value ranging from 0 to n(size of dataset).
- Ordinal Encoding : Categories to ordered numerical values.
- One Hot Encoding : It creates a dummy variable with value 0 to n(unique value count in the column) for each category value.Extra columns are created.

Some Comparisons on Datasets

Social Network Ads	Accuracy
Support Vector Machine	90.83%
Random Forest Classifier	90.0%
Random Forest Regressor	61.8%

Iris Dataset	Accuracy
KNN	98.3%
Decision Tree	96.7%

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
Data Preprocessing		Data Preprocessing
Decision Tree		Decision Tree
K-NN		K-NN
K-means		K-means
Linear Regression		Linear Regression
Logistic Regression/Logistic		Logistic Regression/Logistic
RandomForest		RandomForest
SVM		SVM
Sentiment Analysis		Sentiment Analysis
TextAnalytics		TextAnalytics
TextClassification		TextClassification
LICENSE		LICENSE
README.md		README.md
_config.yml		_config.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Machine Learning in Python

Libraries Required

Supervised Learning

Unsupervised Learning

NLP( Natural Language Processing )

Data Cleaning and Preprocessing

Some Comparisons on Datasets

Kaggle

About

Packages

Languages

License

suubh/Machine-Learning-in-Python

Folders and files

Latest commit

History

Repository files navigation

Machine Learning in Python

Libraries Required

Supervised Learning

Unsupervised Learning

NLP( Natural Language Processing )

Data Cleaning and Preprocessing

Some Comparisons on Datasets

Kaggle

About

Topics

Resources

License

Stars

Watchers

Forks

Packages 0

Languages

Packages