This project contains two modules: a fetcher, which downloads the entire content of a specified website, and a server, which provides search results based on keywords extracted from the downloaded website.
This project uses a CMake-based build system, so building it requires cmake >= 3.6. It also depends on several external libraries. The required libraries are as follows:
openssl - To fetch websites from https hosts
mysql - To store fetched and processed links
threads - For multi-threading during website download
On Debian or Ubuntu, install the dependencies with:
sudo apt-get update
sudo apt-get install cmake libssl-dev pkgconf libmysqlclient-dev mysql-client mysql-server
On macOS with Homebrew:
brew update
brew install cmake openssl mysql
To build this project, execute:
cmake CMakeLists.txt
make
Create the database from the MySQL client (the -p option prompts for the password):
mysql -u <username> -p
mysql> create database np_select;
mysql> use np_select;
mysql> SELECT * FROM Links WHERE id > 0 ORDER BY id ASC LIMIT 1000;
The configuration is provided in config.ini. Different configuration files can be used at the same time, as long as each one is passed to the program as an argument. If no file is specified, the program uses config.ini as the default configuration.
save_location: location in which to save the downloaded website
invalid_save_location: location in which to save invalid URLs

host: hostname of the website to be downloaded
protocol: http or https
start_page: entry point
begin_at: beginning id of the database index (can be used to resume a download)
cert: location of the certificate chain
timeout: timeout in seconds

host: host of the database, e.g. localhost
username: username for the database connection
password: password for the database connection
name: name of the database

root_path: location of the template files for the search engine
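
Because host is listed both for the website and for the database, the keys are presumably grouped into sections. The following Python sketch illustrates one possible layout and how such a file could be read. The section names (download, website, database, server) and every value are assumptions made for illustration only; this README does not name the sections, and the project's own parser is not shown here.

```python
import configparser

# Hypothetical example: section names and all values are assumptions,
# only the key names come from the list above.
SAMPLE_CONFIG = """
[download]
save_location = ./downloads
invalid_save_location = ./invalid

[website]
host = www.example.com
protocol = https
start_page = /index.html
begin_at = 0
cert = /etc/ssl/certs/ca-certificates.crt
timeout = 10

[database]
host = localhost
username = np_user
password = change-me
name = np_select

[server]
root_path = ./templates
"""


def load_config(text):
    """Parse INI text into a plain dict of {section: {key: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


config = load_config(SAMPLE_CONFIG)

# The same key name can live in two sections without clashing.
website_host = config["website"]["host"]
database_host = config["database"]["host"]

# Values are read as strings, so numeric settings need explicit conversion.
timeout_seconds = int(config["website"]["timeout"])
```

Keeping the two host keys in separate sections is what lets the website hostname and the database host coexist in a single file.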
Both executables can be called as follows:
./fetcher [config_file]
./server [config_file]
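
The optional config_file argument and the config.ini fallback can be summarized in a few lines. This is a minimal Python sketch of that selection rule only; the function name resolve_config_path is hypothetical, and the executables' actual argument handling may differ.

```python
import sys

DEFAULT_CONFIG = "config.ini"  # default file name stated in this README


def resolve_config_path(argv):
    """Return the config file named on the command line, or the default.

    argv[0] is the program name; argv[1], if present, is the config file.
    """
    if len(argv) > 1:
        return argv[1]
    return DEFAULT_CONFIG


if __name__ == "__main__":
    print(resolve_config_path(sys.argv))
```

Under this rule, ./fetcher alone would read config.ini, while ./fetcher other.ini would read other.ini (a hypothetical file name).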
This repository was developed by Kunal Pal and Thorsten Born as part of their Network Programming lab project.