In this project, we explore NetworkX, a Python library for graph algorithms and visualization. We scrape the Wikipedia page for an arbitrary search word and take the first link referenced on that page. We then visit the linked page and again take its first link, repeating this recursively and storing each link in a graph. If a link has already been visited, we stop. The links (search words) are stored as nodes in a directed graph, and the graph is saved to disk using Python's Pickle library. The final visualization of the graph in an HTML page is done using PyVis, which takes the NetworkX graph and renders an HTML page using vis.js.
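The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `follow_first_links`, `render_html`, and the `FAKE_FIRST_LINKS` table are hypothetical names, the lookup table stands in for real Wikipedia scraping so the example runs offline, and keeping the edge that closes a loop is an assumption about how "already visited" is handled.

```python
import pickle

import networkx as nx


def follow_first_links(start, first_link_of):
    """Follow first links from `start` until a page repeats or has no link."""
    graph = nx.DiGraph()
    graph.add_node(start)
    current = start
    while True:
        nxt = first_link_of(current)  # hypothetical lookup: page -> first link
        if nxt is None:
            break
        already_visited = nxt in graph
        graph.add_edge(current, nxt)  # assumption: keep the edge that closes the loop
        if already_visited:
            break
        current = nxt
    return graph


def render_html(graph, path="graph.html"):
    """Write an interactive vis.js page for the graph using PyVis."""
    from pyvis.network import Network  # imported lazily; only needed for rendering

    net = Network(directed=True)
    net.from_nx(graph)
    net.save_graph(path)


# Made-up stand-in for real Wikipedia lookups, so the sketch runs offline.
FAKE_FIRST_LINKS = {
    "Python": "Programming language",
    "Programming language": "Language",
    "Language": "Communication",
    "Communication": "Language",  # points back to a visited page, so we stop
}

graph = follow_first_links("Python", FAKE_FIRST_LINKS.get)

# Round-trip through pickle, standing in for saving to and loading from disk.
restored = pickle.loads(pickle.dumps(graph))
print(sorted(restored.edges()))
```

Calling `render_html(restored)` would then write `graph.html`, assuming PyVis is installed. Passing the lookup as a function keeps the graph-building logic independent of how pages are fetched.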
- Web scraping using Requests and Beautiful Soup.
- Cleaning links to get words to be used as graph nodes.
- Building the graph from the obtained nodes using NetworkX.
- Visualizing the graph using PyVis.
- Recursively visit and store all links from the first non-empty <p> tag in each wiki page.
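As a rough illustration of the scraping and link-cleaning steps in the list above, the sketch below pulls the links out of the first non-empty `<p>` tag and turns each one into a word usable as a graph node. The function names, the sample HTML, and the filtering rules (skipping non-`/wiki/` links and namespaced pages such as `File:`) are assumptions made for this example, not the project's confirmed behavior. In practice the HTML would come from a call such as `requests.get(url).text`.

```python
from urllib.parse import unquote, urlparse

from bs4 import BeautifulSoup


def link_to_word(href):
    """Turn a wiki href into a readable node label, or None if it is not an article."""
    path = urlparse(href).path  # drops any #fragment or ?query
    if not path.startswith("/wiki/"):
        return None
    title = unquote(path[len("/wiki/"):])
    if not title or ":" in title:  # assumption: skip File:, Help:, and similar pages
        return None
    return title.replace("_", " ")


def words_in_first_paragraph(html):
    """Return cleaned link words from the first <p> that contains any text."""
    soup = BeautifulSoup(html, "html.parser")
    for paragraph in soup.find_all("p"):
        if not paragraph.get_text(strip=True):
            continue  # empty placeholder paragraph, keep looking
        words = []
        for anchor in paragraph.find_all("a", href=True):
            word = link_to_word(anchor["href"])
            if word is not None and word not in words:
                words.append(word)
        return words
    return []


# Made-up HTML fragment that imitates the shape of a wiki article body.
SAMPLE_HTML = """
<p class="mw-empty-elt"> </p>
<p>A <a href="/wiki/Graph_(discrete_mathematics)">graph</a> has
<a href="/wiki/Vertex_(graph_theory)#Types">vertices</a><a href="#cite_note-1">[1]</a>
and <a href="/wiki/File:Graph.png">a picture</a>.</p>
<p><a href="/wiki/Edge">a later paragraph</a></p>
"""

print(words_in_first_paragraph(SAMPLE_HTML))
```

Taking only the first element of the returned list would correspond to the "first link" behavior; keeping the whole list corresponds to storing all links from the paragraph.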
Try:
- Breadth First Search (BFS)
- Depth First Search (DFS)
- NetworkX
- PyVis
- Beautiful Soup
- Requests
- Pickle
- Jupyter Notebook
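Breadth-first and depth-first search, listed above as things to try, are both available in NetworkX. The sketch below compares the order in which the two visit nodes, on the assumption that each page contributes more than one link so the graph branches. The graph and its node names are made up for illustration, and `bfs_order` and `dfs_order` are hypothetical helper names.

```python
import networkx as nx

# Made-up link graph: each edge means "page links to page".
graph = nx.DiGraph()
graph.add_edges_from([
    ("A", "B"),
    ("A", "C"),
    ("B", "D"),
    ("C", "E"),
])


def bfs_order(graph, source):
    """Nodes in breadth-first order: all pages one click away, then two, and so on."""
    return [source] + [child for _, child in nx.bfs_edges(graph, source)]


def dfs_order(graph, source):
    """Nodes in depth-first order: follow one chain of links to its end, then backtrack."""
    return list(nx.dfs_preorder_nodes(graph, source))


print(bfs_order(graph, "A"))  # ['A', 'B', 'C', 'D', 'E']
print(dfs_order(graph, "A"))  # ['A', 'B', 'D', 'C', 'E']
```

Both `nx.bfs_edges` and `nx.dfs_preorder_nodes` accept a `depth_limit` argument, which may be useful for bounding how far a crawl-like traversal goes from the start page.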
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project (click on Fork in the top-right corner)
- Create your Feature Branch (`git checkout -b feature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature`)
- Open a Pull Request
Sinjoy Saha