Home
Welcome to the home of the Synomnic and Journal search project. Here you will find in-depth resources related to the internals of this project.
- Flask - The web framework used (Python 3.7)
- Jinja2 - Template engine
- Bootstrap - Front-end component library
- D3 - JS visualization library
- MySQL - Database back-end
- Docker - Container / Dependency management
All development was done using [Visual Studio Code](https://code.visualstudio.com/), so the /.vscode files are provided for easy debugging of the code. Install the IDE along with its Python extension, set the debug configuration to Flask (note that this is not the same as Flask (Old)), and press play.
The project should be available at the URL:
http://localhost:5000/
This project is being developed using an iterative approach. Therefore, no releases have been made yet and the project is subject to drastic changes. No versioning practices will be followed until release. For a history of changes made to this project, see the commit history.
- Search Box - Begin searching the Erudit corpus from here (redirects to Analyzer)
- Recent Searches
- Upload file to analyze
The main interaction page. From here the user may interactively build queries with a navigable visual thesaurus.
- Search results (articles ranked by relevance)
- Active search terms
- Query results topics (from topic modelling)
Like search results, but amalgamated by journal instead. Upload a document to search for relevant results grouped by journal.
- Readme.md - project readme, getting started
- file_upload.py - Main entrypoint for the Flask app; set export FLASK_APP=file_upload.py to run
- run.py - alternate entrypoint that redirects to file_upload.py
- babel.cfg - configuration for localization generation
- makefile - commands for re-generating the localization files
Third party TreeTagger project location.
PyBabel translations go here.
Topic model storage (gzipped).
Cascading stylesheets are found here, along with some of the image resources used.
Clientside javascript used to drive the UI/UX.
- analyzer.js - Miscellaneous functions for the analyzer page for display or positioning for in-page interactions. If you are looking for functions that involve toggling a certain window, it is probably here.
- events.js - Sole location for registering event handlers and calling initialization functions. Additionally, the keyword search code may be found here.
- hooks.js - Ajax calls that interact with the database through the backend
- intro.js - Interactive step-by-step introduction/guide for the website
- journal.js - Miscellaneous functions for the /journal page
- main.js - Entrypoint of the clientside javascript. Initializes values and other globals.
- query.js - Any functions that have to do with the /analyzer search bar, as well as showing any such results, will exist here.
- vis.js - The bulk of the OHT visualization tool to look for new search terms/build your query
- widget.js - The bulk of the Query Results widget on the /analyzer page
Libraries with discrete functionality are stored here.
- Bootstrap 3
- Capture - custom library for capturing from webcams
- Dropzone
- Jquery
All of the python serverside components.
Note: file_upload.py modifies its own sys.path so that it may import the files within this directory directly, without having to address them through the directories in between.
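The note above can be illustrated with a minimal sketch. The helper name `add_to_path` is hypothetical and is not taken from file_upload.py; the static/py location in the usage comment is assumed from the reference to /static/py elsewhere on this page.

```python
import os
import sys


def add_to_path(directory):
    """Prepend a directory to sys.path so its modules import by bare name.

    Hypothetical helper for illustration; not the project's actual code.
    """
    directory = os.path.abspath(directory)
    # Avoid adding the same entry twice if this runs more than once.
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


# Assumed usage inside an entrypoint such as file_upload.py:
#   add_to_path(os.path.join(os.path.dirname(__file__), "static", "py"))
#   import db  # instead of addressing the module through static/py
```

Prepending rather than appending means the project's modules take precedence over any installed package with the same name, which may or may not be what the real entrypoint does.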
File | Notes |
---|---|
common.py | Various common helper functions, mostly XML related |
constants.py | Project-wide constants |
db.py | MySQL helper object that manages connecting to and querying the database |
erudit_corpus.py | Erudit corpus search functions; uses the class in db.py to run SQL search queries |
erudit_parser.py | Data loader from Erudit XML into the MySQL database |
oht.py | Oxford Historical Thesaurus objects to enable traversing the OHT to support the tree visualization |
pickle_session.py | Persistent sessions for Flask using pickle to store the data in /app_session |
topic_model.py | Class that handles all topic modelling functionality (uses TreeTagger and Latent Dirichlet Allocation, processes text, saves/loads model, performs document tfidf, etc) |
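As a rough illustration of the idea behind pickle_session.py, the sketch below stores one pickle file per session in a directory. The function names, the `.pkl` suffix, and the relative `app_session` directory are assumptions for this example and are not taken from the project's code.

```python
import os
import pickle

SESSION_DIR = "app_session"  # assumed directory name, mirroring /app_session


def save_session(session_id, data, directory=SESSION_DIR):
    """Write the session data to <directory>/<session_id>.pkl."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, session_id + ".pkl")
    with open(path, "wb") as fh:
        pickle.dump(data, fh)
    return path


def load_session(session_id, directory=SESSION_DIR):
    """Return the stored session data, or an empty dict if none exists."""
    path = os.path.join(directory, session_id + ".pkl")
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Because unpickling can execute arbitrary code, a store like this should only ever read files the server itself wrote.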
Contains all the flask (jinja2) templates that are rendered server side before being sent to the client.
- analyzer.html - When analyzing a document to build queries this page is used
- base.html - Basic re-usable base used in interact.html
- explore.html - template for the query building/corpus exploring page.
- index.html - Index (home) page.
- journal.html - Start page for beginning a journal search. Uses dropzone to handle drag'n'drop (see static/lib/dropzone for more)
- journal_analyzer.html - Journal search results page
- journal_view.html - Per-journal results view
This list is non-exhaustive; it is meant as a place to start.
POST handling routes in file_upload.py that redirect to Analyzer.
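A POST route that redirects to the Analyzer might look roughly like the sketch below. The `/search` route, the `terms` form field, and both function bodies are hypothetical; the actual routes in file_upload.py may be named and structured differently.

```python
from flask import Flask, redirect, request, url_for

app = Flask(__name__)


@app.route("/search", methods=["POST"])  # hypothetical route name
def search():
    # Read the submitted search terms from the form body.
    terms = request.form.get("terms", "")
    # Redirect to the Analyzer page, carrying the terms as a query parameter.
    return redirect(url_for("analyzer", terms=terms))


@app.route("/analyzer")
def analyzer():
    # The real project renders analyzer.html; this sketch just echoes the terms.
    return "Analyzing: " + request.args.get("terms", "")
```

Redirecting after a POST keeps the Analyzer URL bookmarkable and avoids a form resubmission when the page is refreshed.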
File | Note |
---|---|
file_upload.py | Responsible for handling the HTTP requests. Flask fills in the analyzer.html template with data from functions implemented in erudit_corpus.py, oht.py, topic_model.py |
erudit_corpus.py, oht.py, topic_model.py | All play a part in providing data/processing to support the functionality of the document search/analyzer. See individual files in /static/py for more details. |
Journal search is just a differently aggregated view of the same results shown under Analyzer, so most of the functionality there is relevant here.
File | Note |
---|---|
file_upload.py | getJournalSearchResults(), journal_analyzer(), journal_view(), and related "journal" entries in that file |