Grouper-IR

Divides a given set of names into groups using their accessible digital footprints.

Used Libraries

1. Obtaining Data of Given Names

Get the top 10 links related to the given name using the googlesearch library.
Get the text of each page using the BeautifulSoup library.
Pre-process all the text (tokenize, remove stopwords, apply stemming) using the NLTK library.
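
The text extraction and pre-processing steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names `html_to_text` and `preprocess`, the sample HTML, and the small inline stop-word set are all assumptions made for the example (NLTK's own stop-word corpus would normally be used but requires a separate download). The link-fetching step is left out because it needs network access.

```python
from bs4 import BeautifulSoup
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Assumption: a tiny inline stop-word set so the sketch runs without
# downloading NLTK corpora; nltk.corpus.stopwords offers a fuller list.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "in", "on",
             "at", "to", "for", "with", "was", "were"}


def html_to_text(html):
    """Return the visible text of an HTML page (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop script and style elements so their contents are not indexed.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ")


def preprocess(text):
    """Tokenize, remove stop words, and stem (hypothetical helper)."""
    tokenizer = RegexpTokenizer(r"[a-z]+")  # keep alphabetic tokens only
    stemmer = PorterStemmer()
    tokens = tokenizer.tokenize(text.lower())
    return [stemmer.stem(t) for t in tokens if t not in STOPWORDS]


# Made-up page content standing in for a fetched search result.
sample_html = (
    "<html><body><h1>Jane Doe</h1>"
    "<p>Jane is teaching courses at the university.</p>"
    "<script>var x = 1;</script></body></html>"
)
tokens = preprocess(html_to_text(sample_html))
print(tokens)
```

Each fetched page would go through `html_to_text` and then `preprocess`, giving one token list per page.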

2. Forming the TF-IDF Matrix for Each Name
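
One way to build such a matrix from pre-processed token lists is sketched below. It assumes each row corresponds to one document (for example, one fetched page for a name) and uses the plain weighting tf × log(N / df); the project may use a different variant. The function name `tfidf_matrix` and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build a TF-IDF matrix from a list of token lists (hypothetical helper).

    Returns (vocab, matrix): one row per document, one column per
    vocabulary term.
    """
    vocab = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}

    matrix = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        matrix.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, matrix


# Made-up token lists standing in for pre-processed pages.
docs = [
    ["jane", "teach", "univers"],
    ["jane", "paint", "galleri"],
    ["jane", "teach", "school"],
]
vocab, matrix = tfidf_matrix(docs)
for row in matrix:
    print([round(w, 3) for w in row])
```

With this weighting, a term that occurs in every document (here `jane`) gets weight 0, so only the terms that distinguish the documents contribute.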

3. Create the LSI Corpus and Cluster the Names
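
A rough sketch of this step follows, assuming LSI is done with a truncated SVD of the document-term matrix. The README does not say which clustering algorithm is used, so the greedy cosine-similarity grouping here is only a stand-in; `lsi_project`, `group_by_cosine`, the threshold, and the toy weights are all invented for the example.

```python
import numpy as np


def lsi_project(matrix, k):
    """Project document rows onto the top-k latent dimensions (hypothetical helper)."""
    A = np.asarray(matrix, dtype=float)
    # A = U @ diag(s) @ Vt, with singular values in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k]  # document coordinates in the latent space


def group_by_cosine(vectors, threshold=0.8):
    """Greedy grouping: a row joins the first group whose first member is
    at least `threshold` cosine-similar to it (stand-in clustering)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    groups = []
    for i, v in enumerate(unit):
        for g in groups:
            if float(unit[g[0]] @ v) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups


# Toy weights invented for illustration: rows are names, columns are terms.
# Rows 0-1 share one set of terms, rows 2-3 share another.
weights = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 3],
    [0, 0, 3, 1],
]
groups = group_by_cosine(lsi_project(weights, k=2))
print(groups)  # [[0, 1], [2, 3]]
```

Reducing to `k` latent dimensions before comparing rows lets names with related but not identical vocabulary end up close together.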

DEMO
