Async Web Crawler

An async Python web crawler.

Local environment

Setup

In a virtual env:

pip install -r requirements.txt

(or pip-sync if you have pip-tools)

Run crawler

cd to the root directory and:

./run_crawler.py https://github.com

Run tests

cd to the root directory and:

./run_tests.sh

Docker

Unfortunately, because of an issue with aiohttp/python, as mentioned here, the code that is reponsible for making the get requests might fail on machines that do not have a Python version of 3.7.4 and higher. For taht case, I have included a docker file and some commands where that you can use to run the crawler and the tests inside docker, using a Python 3.8 image.

Setup

Install Docker.
From the root directory, run:

docker build -t crawler .

Run crawler

docker run crawler python run_crawler.py https://github.com

Run tests

docker run crawler python -m pytest tests/

ssh into the container

If you need to ssh into the container to check what is there, like the crawler resuls file, you can do:

docker run -it crawler /bin/sh

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
tests		tests
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
crawler.py		crawler.py
parser.py		parser.py
requirements.in		requirements.in
requirements.txt		requirements.txt
run_crawler.py		run_crawler.py
run_tests.sh		run_tests.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Async Web Crawler

Local environment

Setup

Run crawler

Run tests

Docker

Setup

Run crawler

Run tests

ssh into the container

About

Releases

Packages

Languages

stathismor/async-crawler

Folders and files

Latest commit

History

Repository files navigation

Async Web Crawler

Local environment

Setup

Run crawler

Run tests

Docker

Setup

Run crawler

Run tests

ssh into the container

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages