Skip to content

anuraganand92/AI42001-Agglomerative-Clustering-soil-components

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

Hierarchical Agglomerative Clustering on Soil Component Data

Project Description

This project focuses on performing Hierarchical Agglomerative Clustering on a dataset containing district names and their corresponding component percentages for soil. The goal is to group districts based on the similarity of their soil component percentages using different linkage methods. The project also includes plotting a dendrogram to visualize the clustering results and identifying the break line for clustering.

Dataset

The dataset consists of district names and their corresponding component percentages for soil. It is assumed that the dataset has been preprocessed and contains no missing data.

Project Steps

  1. Data Cleaning: Remove any rows with missing data from the dataset to ensure that only complete data points are considered for clustering.

  2. Data Sampling: Randomly choose 155 data points from the dataset for the clustering task. This sampling step ensures that the clustering analysis is performed on a representative subset of the data.

  3. Hierarchical Agglomerative Clustering: Apply Hierarchical Agglomerative Clustering to the sampled data using different linkage methods. Some commonly used linkage methods include:

    • Single linkage
    • Complete linkage
    • Average linkage
    • Ward linkage
  4. Clustering Comparison: Compare the outputs of the clustering analysis using the different linkage methods. Evaluate the resulting clusters based on their similarity and dissimilarity in terms of the soil component percentages.

  5. Dendrogram Visualization: Plot a dendrogram to visualize the hierarchical clustering results. The dendrogram will illustrate the hierarchical structure of the clusters and the distances between data points. Include a break line in the dendrogram to indicate the optimal number of clusters or the desired level of clustering.

Releases

No releases published

Packages

No packages published