This application scrapes data from various sources (mostly healthcare facility websites in the US) and stores it in a database. It provides a searchable interface where users can find facilities by state, city, or facility name.
If a queried listing is not found, the user can submit a request and the system will fetch the requested information.
Users can also view facility details and upload images of facilities for the benefit of other users.
- Application
- Technologies Used:
- Architecture Overview
- Api Specifications
- Deployment
  - SSL Configuration
  - Database Configuration
  - Proxy
  - Storing Images in S3
Live Demo: Link
The backend server is currently turned off. To preview the application, clone the repo and deploy the server yourself.
Technologies Used:
- Vite.js
- React
- Material-UI
- Node.js
- Express
- Sequelize
- MySQL
- Python 3
- Selenium
Architecture Overview:
- The frontend is deployed on Vercel
- AWS RDS is used for the MySQL database
- AWS S3 is used for image storage
- AWS Lightsail is used for server deployment
Full Architecture
The backend APIs are divided into services, each serving a distinct purpose. The following services are used by the frontend:
Cities:
We use a third-party API: POST https://countriesnow.space/api/v0.1/countries/state/cities
- This returns the list of cities in a given state.
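A minimal sketch of how the frontend might call this endpoint. The request body shape (`country`, `state`) and the `data` array in the response are assumptions that should be checked against the API's current documentation; `buildCitiesRequest` and `fetchCities` are hypothetical names.

```javascript
// Third-party endpoint named in this README.
const CITIES_URL = "https://countriesnow.space/api/v0.1/countries/state/cities";

// Build the fetch options for a given US state.
// Assumption: the API expects a JSON body with `country` and `state`.
function buildCitiesRequest(state) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ country: "United States", state }),
  };
}

// Fetch the city list. `fetchImpl` is injectable so the function can be
// exercised without network access; it defaults to the global fetch
// available in browsers and Node 18+.
async function fetchCities(state, fetchImpl = fetch) {
  const res = await fetchImpl(CITIES_URL, buildCitiesRequest(state));
  if (!res.ok) throw new Error(`Cities request failed: ${res.status}`);
  const payload = await res.json();
  // Assumption: cities are returned as an array under `data`.
  return Array.isArray(payload.data) ? payload.data : [];
}
```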
Requests:
POST /v1/requests
- adds a request to the db for scraping
GET /v1/requests
- gets all requests
Listings:
GET /v1/listings
- gets all listings
GET /v1/listings/:slug
- gets a listing by slug
POST /v1/listings/images
- uploads images
POST /v1/listings/search
- returns the partial names, states, or cities that match the user input
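To illustrate the partial matching described for the search endpoint, here is a small in-memory sketch. It is not the server's actual query (which presumably runs against MySQL through Sequelize); the `searchSuggestions` name and the `name`/`state`/`city` field names are assumptions made for illustration.

```javascript
// Collect every name, state, or city that contains the user's input,
// ignoring case. An empty or whitespace-only query matches nothing.
function searchSuggestions(listings, query) {
  const needle = String(query ?? "").trim().toLowerCase();
  if (needle === "") return [];
  const matches = new Set(); // a Set removes duplicate suggestions
  for (const listing of listings) {
    for (const value of [listing.name, listing.state, listing.city]) {
      if (typeof value === "string" && value.toLowerCase().includes(needle)) {
        matches.add(value);
      }
    }
  }
  return [...matches];
}

// Example data (made up for the sketch).
const sampleListings = [
  { name: "Lakeside Care Center", state: "Texas", city: "Austin" },
  { name: "Pine Valley Clinic", state: "Ohio", city: "Columbus" },
];
```

With this sample data, `searchSuggestions(sampleListings, "tex")` yields only the state `"Texas"`, and `searchSuggestions(sampleListings, "lake")` yields only the facility name.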
Private Routes (only accessible inside the network/VPC):
GET /private/requests/uncrawled
- gets one uncrawled request
PUT /private/requests/:id
- updates a request by id
GET /private/requests/:id
- gets a request by id
POST /private/listings/
- inserts a listing
POST /private/listings/multiple
- inserts multiple listings
PUT /private/listings/:id
- updates a listing by govSiteId
- Versioning: We version our APIs from the start so that a breaking change introduced in production does not break the application.
- Private Routes: We have introduced a proxy which ensures that the application can only be accessed from outside the instance through two ports, 80 and 443. The proxy also automatically redirects http to https. Because the private routes are reachable only from inside the network, other services such as the crawler can call them without authentication and write securely to the db.
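The private routes suggest a simple work loop for an in-network client such as the crawler. The sketch below is written in JavaScript for consistency with the server code (the actual crawler uses Python and Selenium). The default base URL, the injected `http` client, the `scrape` callback, the `{ crawled: true }` update payload, and the response shapes are all hypothetical.

```javascript
// One iteration of a crawl loop against the private routes.
// `http` is a small injected client ({ get, post, put }) that returns parsed
// JSON, and `scrape` turns a request into a listing object; both are
// hypothetical stand-ins.
async function processNextRequest(http, scrape, baseUrl = "http://localhost:3000") {
  // 1. Ask the server for one request that has not been crawled yet.
  const request = await http.get(`${baseUrl}/private/requests/uncrawled`);
  if (!request) return null; // nothing to do

  // 2. Scrape the facility described by the request.
  const listing = await scrape(request);

  // 3. Store the scraped listing.
  await http.post(`${baseUrl}/private/listings/`, listing);

  // 4. Mark the request as handled (payload shape is an assumption).
  await http.put(`${baseUrl}/private/requests/${request.id}`, { crawled: true });

  return listing;
}
```

No authentication header is sent, which matches the note above: the routes are protected by network placement rather than by credentials.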
We use Docker to deploy the whole application, since it makes it easy to keep dependencies consistent between local and production environments. The entire application is hosted on a single Lightsail instance.
SSL was configured using certbot. Further details are provided here
We use a MySQL database for this project.
- Hosted in AWS RDS (Free Tier)
- For development, we use a local environment
More details here
We use HAProxy as a reverse proxy to route requests from the client to our server. HAProxy also makes it easy to configure SSL for the application and to enable https redirection.
First, we create a bucket in S3 for the project. On the server, we use the aws-sdk to upload images to S3. To do that, we first create a new IAM user, which allows the server to interact with AWS programmatically. We grant this IAM user very limited access (S3 read-write only). All the credentials are stored in the server's .env file.
BUCKET_NAME=
S3_ACCESS_KEY=
S3_SECRET=
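A small sketch of how the server might read and validate these variables at startup. The `loadS3Config` name is hypothetical; it takes the environment object as a parameter (defaulting to `process.env`) and assumes the .env file has already been loaded into it, for example by a package such as dotenv.

```javascript
// Names of the variables listed in the .env file above.
const REQUIRED_VARS = ["BUCKET_NAME", "S3_ACCESS_KEY", "S3_SECRET"];

// Read the S3 settings from the environment and fail fast if any is missing,
// so a misconfigured server stops at startup instead of on the first upload.
function loadS3Config(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    bucket: env.BUCKET_NAME,
    accessKeyId: env.S3_ACCESS_KEY,
    secretAccessKey: env.S3_SECRET,
  };
}
```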
From the front end, we use multipart/form-data to make an API call that carries the binary files of all the images. On the server, a custom middleware uses multer and aws-sdk to upload the images to S3.
Once the images are uploaded, the middleware returns a list of urls (string[]), which are then stored in the database.
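The upload flow above can be sketched as an Express-style middleware. To keep the example self-contained, the S3 call is injected as a hypothetical `uploadOne(file)` function that resolves to the uploaded object's URL; in the real server this would wrap an aws-sdk upload call. The sketch also assumes multer has already run, so that `req.files` is an array of parsed files. The `req.imageUrls` property is an illustrative name, not the project's own.

```javascript
// Build an Express-style middleware that uploads every file in req.files
// and exposes the resulting URLs (string[]) to the next handler.
function makeImageUploadMiddleware(uploadOne) {
  return async function imageUploadMiddleware(req, res, next) {
    const files = Array.isArray(req.files) ? req.files : [];
    let urls;
    try {
      // Upload all images in parallel; each call resolves to one URL.
      urls = await Promise.all(files.map((file) => uploadOne(file)));
    } catch (err) {
      // Hand upload failures to the framework's error handling.
      return next(err);
    }
    req.imageUrls = urls;
    next();
  };
}
```

In a route, this would sit after multer, along the lines of `router.post("/v1/listings/images", upload.array("images"), makeImageUploadMiddleware(uploadOne), handler)`, where the `images` field name is also an assumption. The final handler can then store `req.imageUrls` in the database.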