A valid GCS key file must be placed at the following path in the repository:

    /creds/WebCrawler-feb11a08e450.json

To generate this JSON file, see the instructions.
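If you need to exercise the GCS client outside of the containers, the usual way to point Google client libraries at a key file is the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. This is an assumption about how the services load the key; they may instead read the file from the path above directly, in which case no variable is needed:

    # Assumption: the code relies on the standard Google credentials variable.
    export GOOGLE_APPLICATION_CREDENTIALS=creds/WebCrawler-feb11a08e450.json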
Unit tests can be run locally with the following command:
    ./run_unit_tests.sh
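For orientation, this is a minimal sketch of what such a wrapper script often looks like, assuming the three Python services use pytest; the actual `run_unit_tests.sh` in the repository is authoritative:

    #!/bin/bash
    # Hypothetical sketch only: run each service's unit tests in turn.
    set -e
    for service in main crawler-manager crawler; do
        (cd "$service" && python -m pytest)
    done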
The application requires the following containers:
- Main Application
- Crawler Manager
- Crawler
- MySQL
- Redis
In production, MySQL and Redis are not run in containers.
The Crawler Manager container also runs an internal Redis service.
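Once the containers have been started (see step 1 below), you can confirm that all of them are up:

    # List the compose-managed containers and their current state.
    docker-compose -f docker-compose.yml ps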
To run the application locally you will need:

- Docker Desktop (2.x)
1. Build and start the containers from the root of the repository:
    touch main/kubeconfig
    touch crawler-manager/kubeconfig
    docker-compose -f docker-compose.yml up --build --no-start
    docker-compose -f docker-compose.yml start
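To watch the startup output of a particular service while the containers come up, follow its logs:

    # Follow the logs of the main application service.
    docker-compose -f docker-compose.yml logs -f main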
2. All three application containers will be started, but only the main container runs a permanent web server. It is reachable at:
Main: `http://localhost:8001/`
If the main application does not load, the database was most likely not created properly. Open a shell in the MySQL container and create the test database:
    docker exec -it webcrawler_crawler-mysql5_1 bash
    $ mysql -u root -proot
    mysql> CREATE DATABASE test;
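The same thing can be done with a single non-interactive command, reusing the container name and credentials shown above:

    # Create the test database without opening an interactive shell.
    docker exec -i webcrawler_crawler-mysql5_1 mysql -u root -proot -e "CREATE DATABASE IF NOT EXISTS test;"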
3. Running the crawler-manager (no longer required; the crawler manager is also started in step 1):

This does not need to be built separately, as it is already done by the `docker-compose` command, but to simulate a new crawler manager, run the following:

    docker exec -i crawler-manager /bin/bash -c "export JOB_ID=123; export ENVIRONMENT=prod && service redis-server start && python app.py"
That starts the web server for 60 seconds, as is done in Kubernetes.
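As an optional check, you can confirm that the internal Redis service inside the crawler-manager container is running. This assumes the stock `redis-server` init script used by the `service` command above:

    # Query the status of the Redis service started inside the container.
    docker exec -i crawler-manager /bin/bash -c "service redis-server status"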
4. Running the crawler (no longer required; the crawler is also started in step 1):

Same as above, it does not need to be built.
To process URLs:

    docker exec -i crawler /bin/bash -c "export URL='http://google.com,https://cnn.com'; export ENVIRONMENT=prod && python app.py"
To clear the Redis cache:

    docker exec -it webcrawler_crawler-redis_1 redis-cli FLUSHALL
The crawler runs for 15 seconds.
Note: change the value in the `export` command to the URLs you want to process.
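For example, to process a single hypothetical URL and nothing else:

    # Same command as above, with a different URL list in the export.
    docker exec -i crawler /bin/bash -c "export URL='https://example.com'; export ENVIRONMENT=prod && python app.py"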
5. To reload code changes into the containers, run these commands:
Main:
    docker cp main/. main-app:/srv/www/web-crawler/
    docker-compose -f docker-compose.yml restart main
Crawler Manager:
    docker cp crawler-manager/. crawler-manager:/srv/www/web-crawler/
    docker-compose -f docker-compose.yml restart crawler-manager
Crawler:
    docker cp crawler/. crawler:/srv/www/web-crawler/
    docker-compose -f docker-compose.yml restart crawler
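If you have changed code in all three components, the copies can be followed by a single restart of all three services (the container and service names are the ones used above):

    # Copy the latest code into each container, then restart the three services at once.
    docker cp main/. main-app:/srv/www/web-crawler/
    docker cp crawler-manager/. crawler-manager:/srv/www/web-crawler/
    docker cp crawler/. crawler:/srv/www/web-crawler/
    docker-compose -f docker-compose.yml restart main crawler-manager crawler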
6. After all containers are running, run the `initialize-django.sh` script to initialize the DB and set up a superuser, so you can use the admin UI to create more users:

    docker exec -it main-app bash -c "./initialize-django.sh"
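For orientation, the script presumably wraps the standard Django management commands for these two tasks. This is an assumption about the project layout; the script in the repository is authoritative:

    # Hypothetical equivalent of initialize-django.sh, assuming manage.py sits
    # in the container's working directory: apply migrations, then create an admin user.
    docker exec -it main-app bash -c "python manage.py migrate"
    docker exec -it main-app bash -c "python manage.py createsuperuser"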
Database Migration

If a new field is added to the model, the easiest way to update the database is to drop the database and create it again. From a `mysql` shell in the MySQL container (see step 2):

    drop database test;
    create database test;

Then run the initialize script again:

    docker exec -it main-app bash -c "./initialize-django.sh"
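Alternatively, if the project uses standard Django migrations (an assumption; dropping and recreating the database as described above is the documented route), the schema can usually be updated in place without losing existing data:

    # Hypothetical alternative: generate and apply Django migrations inside the main container.
    docker exec -it main-app bash -c "python manage.py makemigrations && python manage.py migrate"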