
Titanic Dataset Analysis using PCA and Groceries Dataset Analysis using the Apriori Algorithm

Principal Component Analysis

This analysis utilizes Principal Component Analysis (PCA) to reduce the dimensionality of the Titanic dataset. PCA is a dimensionality reduction technique that extracts the most informative features while removing redundancy, which lets us visualize the dataset in 2D space while preserving as much variance as possible. Here, we visualize how the PCA-transformed data relates to the survival status of the passengers, treating the 'Survived' attribute as the target variable.
This analysis uses the following libraries:

  • scikit-learn - To perform Principal Component Analysis
  • pandas - To read the Titanic dataset, and
  • matplotlib.pyplot - To visualize the 2D PCA plot.
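
A minimal sketch of this workflow is shown below. The file name 'titanic.csv', the chosen feature columns, and the simple preprocessing are illustrative assumptions, not necessarily the exact steps used in this repository's notebook:

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the dataset; 'titanic.csv' is a placeholder path.
df = pd.read_csv("titanic.csv")

# Keep a few numeric features plus the target; fill missing values with the median.
features = ["Pclass", "Age", "SibSp", "Parch", "Fare"]  # assumed Kaggle-style columns
X = df[features].fillna(df[features].median())
y = df["Survived"]

# Standardize the features so each contributes equally to the principal components.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# 2D scatter plot of the projection, colored by survival status.
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap="coolwarm", alpha=0.6)
plt.xlabel("First principal component")
plt.ylabel("Second principal component")
plt.title("Titanic passengers in PCA space, colored by 'Survived'")
plt.colorbar(label="Survived")
plt.show()
```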

Apriori Algorithm

The Apriori Algorithm is used to discover frequent itemsets and association rules in large datasets. It is particularly useful in market basket analysis, where the goal is to identify patterns in customer purchasing behavior.

1. Implementation of the Apriori Algorithm:

The Apriori algorithm works by identifying frequent itemsets (groups of items) based on a minimum support threshold. It starts with 1-itemsets and builds larger itemsets in an iterative manner by joining previously found frequent itemsets. The process continues until no more frequent itemsets can be found.

How It Works

Step 1: Generate Frequent 1-Itemsets

The algorithm first calculates the support of each individual item and filters out those that do not meet the minimum support threshold.
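
As a rough sketch of this first step, assuming transactions are stored as Python sets of item names and using a toy relative support threshold (both illustrative assumptions):

```python
from collections import Counter

# Toy transactions and threshold, purely for illustration.
transactions = [
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"milk", "butter"},
]
min_support = 0.5  # an itemset must appear in at least 50% of transactions

# Step 1: count each individual item and keep those meeting the threshold.
item_counts = Counter(item for t in transactions for item in t)
n = len(transactions)
frequent_1_itemsets = {
    frozenset([item]): count / n
    for item, count in item_counts.items()
    if count / n >= min_support
}
print(frequent_1_itemsets)
```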

Step 2: Generate Larger Frequent Itemsets

Starting from the frequent 1-itemsets, the algorithm generates candidate itemsets of size 2, 3, and so on. For each iteration (k-itemsets):

  • Combine frequent (k-1)-itemsets to generate candidate k-itemsets.
  • Calculate the support for each candidate.
  • Retain only those itemsets whose support is greater than or equal to the minimum support.

Step 3: Termination

The process stops when no more frequent itemsets can be generated.
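
Putting the three steps together, a deliberately bare-bones and unoptimized sketch of the full loop could look as follows; the toy transactions and the 50% support threshold are illustrative assumptions, not values taken from this repository:

```python
def apriori_from_scratch(transactions, min_support):
    """Return {frequent itemset: support} using the basic Apriori procedure."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Step 1: frequent 1-itemsets.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Step 2: join frequent (k-1)-itemsets into candidate k-itemsets ...
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # ... and retain only the candidates that meet the minimum support.
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1  # Step 3: the loop terminates when no frequent k-itemsets remain.

    return frequent

# Reusing the toy transactions from the Step 1 sketch above.
transactions = [
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"milk", "butter"},
]
print(apriori_from_scratch(transactions, min_support=0.5))
```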

2. Finding Frequent Itemsets through Apriori using Libraries:

To identify the frequent itemsets from a list of transactions, we can implement the Apriori algorithm using the 'mlxtend' library. The transactions are first transformed into a one-hot encoded DataFrame, which is then used as input to the Apriori algorithm, as shown in the sketch below.
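
A short sketch of that pipeline with mlxtend's TransactionEncoder and apriori functions; the transaction lists below are placeholders rather than the actual Groceries data:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# Placeholder transactions; in practice these come from the Groceries file.
transactions = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["milk", "butter"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

# Mine frequent itemsets with a minimum support of 50%.
frequent_itemsets = apriori(onehot, min_support=0.5, use_colnames=True)
print(frequent_itemsets)
```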

3. Finding the Association Rules of the Groceries Dataset

Here, we apply the Apriori algorithm for association rule mining to the Groceries dataset, where each row represents a transaction and each column corresponds to an item purchased. The goal is to discover interesting patterns, associations, and relationships between items bought together in transactions.
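
A sketch of this last step with mlxtend's association_rules function; the item names, metric, and thresholds below are illustrative placeholders, not the repository's actual settings:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Placeholder transactions standing in for rows of the Groceries dataset.
transactions = [
    ["whole milk", "rolls/buns", "yogurt"],
    ["whole milk", "yogurt"],
    ["rolls/buns", "soda"],
    ["whole milk", "rolls/buns"],
]

# One-hot encode and mine frequent itemsets, as in the previous sketch.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)
frequent_itemsets = apriori(onehot, min_support=0.5, use_colnames=True)

# Derive association rules; the metric and threshold are illustrative choices.
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]]
      .sort_values("lift", ascending=False))
```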
