This project was built in collaboration with Rapid7, as part of the Cyber4s program by Scale-Up Velocity.
The goal of the project is to scrape data from a paste site (like Pastebin) on the dark web.
The data is analyzed and saved in a database, then presented in a dashboard.
The code is split into a frontend application and two main services:
- Frontend: built with React and TypeScript.
- Backend: built with Node.js, Express, TypeScript and MongoDB.
- Scraper: built with Python and MongoDB.
The paste site is scraped through the Tor browser.
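As a rough illustration of what routing a scrape through Tor can look like in Python, here is a minimal sketch. It is not the project's actual scraper: the helper names `tor_proxies` and `fetch` are hypothetical, and it assumes a Tor SOCKS proxy is listening on `127.0.0.1:9050` (the default for a standalone Tor daemon; Tor Browser's bundled proxy listens on 9150 by default) and that the third-party `requests` package is installed with SOCKS support.

```python
def tor_proxies(host: str = "127.0.0.1", port: int = 9050) -> dict:
    """Build a requests-style proxy mapping that points at a Tor SOCKS proxy.

    The socks5h scheme asks the proxy to resolve hostnames, which is
    needed for .onion addresses. Host and port are assumptions.
    """
    proxy = f"socks5h://{host}:{port}"
    return {"http": proxy, "https": proxy}


def fetch(url: str, timeout: float = 60.0) -> str:
    """Download a page through Tor and return its HTML (hypothetical helper)."""
    # Imported lazily so the module loads without the dependency.
    # Third-party; install with: pip install "requests[socks]"
    import requests

    response = requests.get(url, proxies=tor_proxies(), timeout=timeout)
    response.raise_for_status()
    return response.text
```

Calling `fetch` with the paste site's address would return the page HTML for parsing, provided the Tor proxy is running.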
The API server and the Python scraper act as microservices and communicate via RabbitMQ.
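To give a feel for this kind of messaging, the sketch below shows how a Python service might publish a message to RabbitMQ with the third-party `pika` client. Everything specific here is an assumption made for illustration: the queue name `scrape_events`, the message fields, the `rabbitmq` hostname, and the direction of the message (scraper to API server). The project's real queues and payloads may differ.

```python
import json
from datetime import datetime, timezone

QUEUE_NAME = "scrape_events"  # hypothetical queue name


def build_message(new_pastes: int) -> bytes:
    """Serialize a hypothetical 'scrape completed' event as UTF-8 JSON."""
    payload = {
        "event": "scrape_completed",
        "new_pastes": new_pastes,
        "finished_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(payload).encode("utf-8")


def publish(body: bytes, host: str = "rabbitmq") -> None:
    """Send one message to the queue (the host name is an assumption)."""
    # Imported lazily so the module loads without the dependency.
    import pika  # third-party RabbitMQ client: pip install pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    try:
        channel = connection.channel()
        # Declaring the queue is idempotent; durable keeps it across broker restarts.
        channel.queue_declare(queue=QUEUE_NAME, durable=True)
        channel.basic_publish(
            exchange="",  # default exchange routes by queue name
            routing_key=QUEUE_NAME,
            body=body,
            properties=pika.BasicProperties(delivery_mode=2),  # persistent message
        )
    finally:
        connection.close()
```

A consumer on the other side would declare the same queue and decode the JSON body before acting on it.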
The entire application is dockerized using docker-compose.
The application is deployed on an AWS EC2 instance.
Beware: the pastes on the website are real, and some content may be inappropriate.
darkweb-scraper.tk
(Alternative Link)
In the root folder, run:
docker-compose up
Note: it might take some time for all the containers to build.
When all the containers are running, go to localhost:3000.
You will then have to wait for the first scrape to complete before any data is presented in the dashboard.