Divides the given set of Full Names into groups using accessible Digital footprints.

suhasgumma/Grouper-IR

Grouper-IR

Divides a given set of names into groups using their accessible digital footprints.

Libraries Used

  1. googlesearch
  2. BeautifulSoup
  3. requests
  4. NLTK
  5. numpy
  6. Lsa
  7. sklearn

1. Obtaining Data of Given Names

  • Get the top 10 links related to the given name using the googlesearch library.

  • Fetch each page with requests and extract its text using the BeautifulSoup library.

  • Preprocess all the text (tokenize, remove stopwords, apply stemming) using the NLTK library.
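The extraction and preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `html_to_text` and `preprocess` are hypothetical, the short `STOPWORDS` set is an assumption standing in for the fuller NLTK stopwords corpus (which needs a one-time download), and fetching the links is left out because it needs network access.

```python
from bs4 import BeautifulSoup
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Assumption: a tiny illustrative stopword list. nltk.corpus.stopwords is the
# fuller alternative, but it requires downloading the corpus first.
STOPWORDS = {"a", "an", "and", "are", "as", "at", "by", "for", "in", "is",
             "of", "on", "or", "the", "to", "with"}


def html_to_text(html):
    """Return the visible text of an HTML page (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Script and style contents are not part of the readable text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ")


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, and stem (hypothetical helper)."""
    tokens = RegexpTokenizer(r"[a-z]+").tokenize(text.lower())
    stemmer = PorterStemmer()
    return [stemmer.stem(tok) for tok in tokens if tok not in STOPWORDS]


sample_html = (
    "<html><body><script>var x = 1;</script>"
    "<p>The researchers are running experiments</p></body></html>"
)
sample_tokens = preprocess(html_to_text(sample_html))
print(sample_tokens)
```

In a full pipeline, each page's token list would be collected per name before building the term weights.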

2. Forming a TF-IDF Matrix for Each Name

  • Build a TF-IDF matrix for each name from the preprocessed text.
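One way to build the TF-IDF weights is with sklearn's `TfidfVectorizer`. The sketch below assumes that all preprocessed text for a name is joined into a single document, so the result is one matrix with a row per name over a shared vocabulary. The README does not say whether this matches the project's layout, and the names and token strings are made-up placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical input: one string of preprocessed (stemmed) tokens per name.
docs_by_name = {
    "Name A": "cricket batsman match run wicket",
    "Name B": "cricket bowler match wicket over",
    "Name C": "professor research paper univers lectur",
}

names = list(docs_by_name)
vectorizer = TfidfVectorizer()
# Sparse matrix of shape (number of names, vocabulary size).
tfidf = vectorizer.fit_transform([docs_by_name[name] for name in names])

print(tfidf.shape)
```

Fitting a single vectorizer over all names keeps every row in the same term space, which the later clustering step needs in order to compare names.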

3. Creating the LSI Corpus and Clustering

  • Create an LSI corpus from the TF-IDF matrix and cluster the names using the K-means algorithm.
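As a rough sketch of this step, the example below uses sklearn's `TruncatedSVD` as a stand-in for the LSI projection (the README lists an "Lsa" library, whose API is not shown here) and then runs `KMeans` on the reduced vectors. The toy documents, the number of components, and the number of clusters are all assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

# Hypothetical preprocessed documents, one per name: two about cricket,
# two about academia.
docs = [
    "cricket batsman match run wicket",
    "cricket bowler match wicket over",
    "physic professor research paper univers",
    "professor research paper lectur univers",
]

tfidf = TfidfVectorizer().fit_transform(docs)

# LSI/LSA step: project the TF-IDF rows onto a few latent dimensions.
svd = TruncatedSVD(n_components=2, random_state=0)
# Unit-length rows make Euclidean K-means behave like cosine similarity.
lsi = normalize(svd.fit_transform(tfidf))

# Cluster the names in the latent space.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(lsi)

print(labels)
```

The cluster ids themselves are arbitrary; what matters is which names share a label. In practice the number of clusters would have to be chosen for the data at hand.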

DEMO
