Get Github archive data, search by keyword and visualize the results

hwaseonchoi/gharchive

Challenge GH Archive Keyword for Yousign

This is a technical test given by Yousign. It is part of their recruiting process, but the project itself is really interesting and challenging!

I thank the Yousign team for giving me the chance to take on this challenge.

Install & Requirements

Prerequisites

Install

  • In a terminal:
docker-compose build
docker-compose up
  • In another terminal (inside the container):
docker exec -it -w /var/www php-fpm /bin/bash
cd /var/www/
composer install

Import data

  • Exit the container and import data from gharchive.org by running this script from the checked-out branch:
./import-gh-archive.sh
  • If you want to open a shell in the mongo container to query MongoDB directly:
docker exec -it mongo /bin/sh
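
For readers curious about what such an import involves, here is a minimal sketch. It is not the repository's import-gh-archive.sh: the function names, the `gharchive` database, and the `events` collection are hypothetical, and it assumes `mongoimport` is available inside the `mongo` container. GH Archive publishes one gzipped, newline-delimited JSON file per hour, with the hour not zero-padded.

```shell
# Hypothetical sketch of a GH Archive import; names below are assumptions.

# Print one GH Archive URL per hour of a day (defaults to hours 0..23).
gharchive_urls() (
  day="$1"
  hour="${2:-0}"
  last="${3:-23}"
  while [ "$hour" -le "$last" ]; do
    echo "https://data.gharchive.org/${day}-${hour}.json.gz"
    hour=$((hour + 1))
  done
)

# Download the first 6 hourly archives of a day and load them into MongoDB.
# "gharchive" and "events" are placeholder database/collection names.
import_day() (
  set -e
  tmp="$(mktemp)"
  trap 'rm -f "$tmp"' EXIT
  for url in $(gharchive_urls "$1" 0 5); do
    curl -fsSL -o "$tmp" "$url"
    # mongoimport reads newline-delimited JSON from stdin when no --file is given.
    gunzip -c "$tmp" |
      docker exec -i mongo mongoimport --db gharchive --collection events
  done
)

# Example: import_day 2020-01-01
```

Downloading to a temporary file first means a failed download stops the loop before anything is piped into MongoDB.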

Test

Go to http://localhost:81/search and try any keyword.

Architecture

[Architecture diagram]

  • MongoDB to store, process and query the GitHub archive JSON data
  • Docker to run the application platform, combining nginx, php-fpm and mongo images
  • Data import via a shell script and CLI commands
  • Front-end app built with PHP & Symfony 5

Documentation

Apart from the many Stack Overflow questions and answers that helped me, here is some documentation I read and found interesting:

Missing parts or Improvements

  • Protection against MongoDB query injection
  • Write a Makefile to make things easier
  • Add a coding standard verification command
  • Highlight the search keyword in the commit list of the event selection
  • Error view handling
  • Make the commit list in the event selection responsive
  • Unit tests
  • Gather more data across different dates so that search by date works correctly
  • Reduce stored data (currently 330MB for 6 hours of data, 1.2GB per day)
  • Refactor MongoDBQueryService into a repository layer, applying the Dependency Inversion principle
  • Try Elasticsearch with an RDBMS instead of MongoDB
  • Make more use of the ODM for mapping and querying, to compare performance
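
One possible direction for the "reduce stored data" item is to drop event types the search never uses before they reach MongoDB. The sketch below is only an illustration: the `filter_events` name and the default list of event types are assumptions, not something the project defines, and the pattern relies on each event being a single JSON line with a top-level `"type"` field, as in the GH Archive dumps.

```shell
# Hypothetical pre-import filter; the default event types are an assumption.
# Reads newline-delimited JSON on stdin and prints only the matching events.
filter_events() (
  types="${1:-PushEvent|PullRequestEvent|CommitCommentEvent}"
  grep -E "\"type\"[[:space:]]*:[[:space:]]*\"(${types})\""
)

# Example (file name is a placeholder):
#   gunzip -c 2020-01-01-0.json.gz | filter_events > filtered.json
```

A text match like this is cheap but crude; a JSON-aware tool would be more robust if the set of kept fields also needs trimming.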
