SpiderSel provides the following features:
- Crawling of HTTP and HTTPS websites for keywords via Selenium (native JS support)
- Spidering of new URLs found within the source code (adjustable depth, stays same-site)
- Filtering keywords by length and removing nonsense strings (paths, emails, protocol handlers, etc.)
- Storing keywords and ignored strings into a separate results directory (txt files)
Essentially similar to CeWL or CeWLeR, but with support for websites that require JavaScript.
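The core loop is simple: render each page in a real browser so JavaScript executes, harvest words from the rendered source, and queue same-site links up to the requested depth. Below is a minimal sketch of that flow, assuming headless Chrome via Selenium; the function names and regexes are illustrative, not SpiderSel's actual internals.

```python
import re
from urllib.parse import urljoin, urlparse

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def make_driver():
    opts = Options()
    opts.add_argument("--headless=new")  # render JS without a visible window
    return webdriver.Chrome(options=opts)

def crawl(url, depth=1, min_length=4):
    # Breadth-first crawl up to `depth` levels, staying on the same host.
    driver = make_driver()
    site = urlparse(url).netloc
    seen, queue, keywords = set(), [(url, 0)], set()
    while queue:
        current, level = queue.pop(0)
        if current in seen or level > depth:
            continue
        seen.add(current)
        driver.get(current)        # the browser executes any JavaScript
        html = driver.page_source  # post-render source, unlike a plain HTTP fetch
        # collect candidate keywords of at least `min_length` letters
        keywords.update(re.findall(r"[A-Za-z]{%d,}" % min_length, html))
        # queue same-site links found in the rendered source
        for href in re.findall(r'href="([^"]+)"', html):
            link = urljoin(current, href)
            if urlparse(link).netloc == site:
                queue.append((link, level + 1))
    driver.quit()
    return keywords
```

Error handling and the nonsense filter are omitted here for brevity.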
usage: spidersel.py [-h] --url URL [--depth DEPTH] [--min-length MIN_LENGTH] [--lowercase] [--include-emails]
Web Crawler and Keyword Extractor
options:
  -h, --help            show this help message and exit
  --url URL             URL of the website to crawl
  --depth DEPTH         Depth of subpage spidering (default: 1)
  --min-length MIN_LENGTH
                        Minimum keyword length (default: 4)
  --lowercase           Convert all keywords to lowercase
  --include-emails      Include emails as keywords
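For reference, a CLI like the one above maps naturally onto argparse. The snippet below is a hedged reconstruction from the help output; flag names and defaults follow it, but the actual wiring in spidersel.py may differ.

```python
import argparse

parser = argparse.ArgumentParser(description="Web Crawler and Keyword Extractor")
parser.add_argument("--url", required=True, help="URL of the website to crawl")
parser.add_argument("--depth", type=int, default=1,
                    help="Depth of subpage spidering (default: 1)")
parser.add_argument("--min-length", type=int, default=4,
                    help="Minimum keyword length (default: 4)")
parser.add_argument("--lowercase", action="store_true",
                    help="Convert all keywords to lowercase")
parser.add_argument("--include-emails", action="store_true",
                    help="Include emails as keywords")
args = parser.parse_args()
```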
docker run -v ${PWD}:/app/results --rm l4rm4nd/spidersel:latest --url https://www.apple.com --lowercase --include-emails
You will find your scan results in the current directory.
If you don't trust my image on Docker Hub, you can build it yourself:
git clone https://github.com/Haxxnet/SpiderSel && cd SpiderSel
docker build -t spidersel .
docker run -v ${PWD}:/app/results --rm spidersel --url https://www.apple.com --lowercase --include-emails
# clone repository and change directory
git clone https://github.com/Haxxnet/SpiderSel && cd SpiderSel
# optionally install Google Chrome if it is not already available
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
# install python dependencies; optionally use a virtual environment (e.g. virtualenv, pipenv, etc.)
pip3 install -r requirements.txt
python3 spidersel.py --url https://www.apple.com/ --lowercase --include-emails
The extracted keywords will be stored in an output file within the results folder.
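Conceptually, the storage step just writes the two result sets to text files under ./results. The sketch below shows one plausible way to do that; the filename scheme is an assumption, not SpiderSel's actual naming.

```python
from pathlib import Path
from urllib.parse import urlparse

def store_results(url, keywords, ignored):
    # Write keywords and ignored strings to separate txt files under ./results.
    # The <host>_*.txt naming is hypothetical; SpiderSel may name files differently.
    out_dir = Path("results")
    out_dir.mkdir(exist_ok=True)
    host = urlparse(url).netloc
    (out_dir / f"{host}_keywords.txt").write_text("\n".join(sorted(keywords)))
    (out_dir / f"{host}_ignored.txt").write_text("\n".join(sorted(ignored)))
```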