Nathan Beals edited this page Jul 10, 2020 · 1 revision

Synomnic Search Wiki Home

Welcome to the home of the Synomnic and Journal search project. Here you will find in-depth resources related to the internals of this project.

Built With

  • Flask - The web framework used (Python 3.7)
  • Jinja2 - Template engine
  • Bootstrap - Front-end component library
  • D3 - JS visualization library
  • MySQL - Database back-end
  • Docker - Container / Dependency management

Summary

Debugging

All development was done using Visual Studio Code (https://code.visualstudio.com/), so the /.vscode files are provided to make debugging easy. Install the IDE along with its Python extension, set the debug configuration to Flask (note this is not the same as Flask (Old)), and press play.

The project should be available at the URL:

http://localhost:5000/

Versioning

This project is being developed using an iterative approach. Therefore, no releases have been made yet and the project is subject to drastic changes. No versioning practices will be followed until release. To see a history of changes made to this project, see the commit history.

Key Features

Home Page

  1. Search Box - Begin searching the Erudit corpus from here (redirects to Analyzer)
  2. Recent Searches
  3. Upload file to analyze

Analyzer

The main interaction page. From here the user may interactively build queries with a navigable visual thesaurus.

  1. Search results (articles ranked by relevance)
  2. Active search terms
  3. Query results topics (from topic modelling)

Journal Search

Like search results, but amalgamated by journal instead. Upload a document to search for relevant results grouped by journal.


Project Structure

/ - Root

  • Readme.md - project readme, getting started
  • file_upload.py - Main entrypoint for the Flask app. To run it, first set export FLASK_APP=file_upload.py
  • run.py - alternate entrypoint that redirects to file_upload.py
  • babel.cfg - configuration for localization generation
  • makefile - commands for re-generating the localization files

/treetagger

Third party TreeTagger project location.

/translations

PyBabel translations go here.

/model

Topic model storage (gzipped).
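
As a rough sketch of what gzipped model storage can look like, the snippet below saves and reloads a Python object through gzip and pickle. The use of pickle, the function names save_model and load_model, and the file name are assumptions for illustration only; topic_model.py may serialize its model differently.

```python
import gzip
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize `model` with pickle and write it gzip-compressed to `path`."""
    with gzip.open(path, "wb") as handle:
        pickle.dump(model, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Read a gzip-compressed pickle from `path` and return the object."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # Stand-in for a trained topic model (hypothetical structure).
    toy_model = {"num_topics": 2, "topics": [["poetry", "novel"], ["climate", "ice"]]}
    path = os.path.join(tempfile.mkdtemp(), "toy_model.pkl.gz")
    save_model(toy_model, path)
    print(load_model(path) == toy_model)
```

Only load pickled files from sources you trust, since unpickling can execute arbitrary code.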

/static - Bulk of Files

/static/css

Cascading stylesheets are found here. Additionally, some image resources used are found here too.

/static/js

Client-side JavaScript used to drive the UI/UX.

  • analyzer.js
    • Miscellaneous functions for the analyzer page that handle display or positioning for in-page interactions. If you are looking for a function that toggles a certain window, it is probably here.
  • events.js
    • Sole location for registering event handlers and calling initialization functions.
      • Additionally the keyword search code may be found here.
  • hooks.js
    • Ajax calls that interact with the database through the backend
  • intro.js
    • Interactive step-by-step introduction/guide for the website
  • journal.js
    • Miscellaneous functions for the /journal page
  • main.js
    • Entrypoint of the clientside javascript. Initializes values and other globals.
  • query.js
    • Functions related to the /analyzer search bar, including those that display its results.
  • vis.js
    • The bulk of the OHT visualization tool, used to look for new search terms and build your query
  • widget.js
    • The bulk of the Query Results widget on the /analyzer page

/static/lib

Libraries that have discrete functionality are stored here.

  • Bootstrap 3
  • Capture
    • custom library for capturing from webcams
  • Dropzone
  • jQuery

/static/py

All of the Python server-side components.

Note: file_upload.py modifies its own sys.path so that it can import the files in this directory directly, without addressing them through the intermediate directories.
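
A minimal sketch of this kind of import-path adjustment is shown below. The helper name add_to_import_path and the assumption that the script is started from the project root are illustrative; the exact code in file_upload.py may differ.

```python
import os
import sys


def add_to_import_path(directory):
    """Put `directory` at the front of sys.path unless it is already listed."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


if __name__ == "__main__":
    # Assumption: the current working directory is the project root.
    py_dir = add_to_import_path(os.path.join(os.getcwd(), "static", "py"))
    # From here on, `import db` would resolve to static/py/db.py directly.
    print(py_dir in sys.path)
```

Inserting at the front of sys.path means modules in this directory take precedence over installed packages of the same name, which is worth keeping in mind when naming new files.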

File | Notes
common.py | Various common helper functions, mostly XML related
constants.py | Project-wide constants
db.py | MySQL helper object that manages connecting to and querying the database
erudit_corpus.py | Erudit corpus search functions; uses the class in db.py to run SQL search queries
erudit_parser.py | Data loader from Erudit XML into the MySQL database
oht.py | Oxford Historical Thesaurus objects that enable traversing the OHT to support the tree visualization
pickle_session.py | Persistent sessions for Flask, using pickle to store the data in /app_session
topic_model.py | Class that handles all topic modelling functionality (uses TreeTagger and Latent Dirichlet Allocation, processes text, saves/loads the model, performs document TF-IDF, etc.)
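
To illustrate the document TF-IDF step mentioned for topic_model.py, here is a minimal pure-Python sketch. The input format (pre-split token lists), the weighting formula tf * log(N / df), and the function name tfidf are assumptions; the real class processes text with TreeTagger first and may weight terms differently.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document.

    `documents` is a list of token lists. Weight = tf * log(N / df), where
    tf is the term's relative frequency in the document, N is the number of
    documents, and df is the number of documents containing the term.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    docs = [
        ["journal", "search", "topic"],
        ["journal", "thesaurus", "thesaurus"],
    ]
    for doc_weights in tfidf(docs):
        # Terms present in every document get weight 0; rarer terms rank higher.
        print(sorted(doc_weights.items(), key=lambda item: -item[1]))
```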

/templates - Flask templates

Contains all the Flask (Jinja2) templates that are rendered server-side before being sent to the client.

  • analyzer.html - Page used when analyzing a document to build queries
  • base.html - Basic re-usable base used in interact.html
  • explore.html - template for the query building/corpus exploring page.
  • index.html - Index (home) page.
  • journal.html - Start page for beginning a journal search. Uses dropzone to handle drag'n'drop (see static/lib/dropzone for more)
  • journal_analyzer.html - Journal search results page
  • journal_view.html - Per-journal results view

Code Supporting Key Features

This list is non-exhaustive; it is meant as a starting point.

Home

POST handling routes in file_upload.py that redirect to Analyzer.

Analyzer

File | Note
file_upload.py | Responsible for handling the HTTP requests. Flask fills in the analyzer.html template with data from functions implemented in erudit_corpus.py, oht.py, and topic_model.py
erudit_corpus.py, oht.py, topic_model.py | All play a part in providing data and processing to support the document search/analyzer. See the individual files in /static/py for more details.

Journal Search

Journal search is just a differently aggregated view of the same results shown under Analyzer, so most of the functionality listed there is relevant here.

File | Note
file_upload.py | getJournalSearchResults(), journal_analyzer(), journal_view(), and related "journal" entries in that file
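
Since Journal Search is described as the Analyzer results aggregated by journal, the idea can be sketched as a simple grouping step. The result fields (title, journal, score), the summed-score ranking, and the function name group_by_journal are hypothetical; getJournalSearchResults() in file_upload.py may aggregate differently.

```python
from collections import defaultdict


def group_by_journal(results):
    """Group article results by journal, ordered by total relevance.

    `results` is a list of dicts with "title", "journal" and "score" keys.
    Returns a list of (journal, total_score, articles) tuples, highest
    total first; articles within a journal are sorted by score.
    """
    grouped = defaultdict(list)
    for article in results:
        grouped[article["journal"]].append(article)

    summary = []
    for journal, articles in grouped.items():
        articles.sort(key=lambda a: a["score"], reverse=True)
        total = sum(a["score"] for a in articles)
        summary.append((journal, total, articles))
    summary.sort(key=lambda entry: entry[1], reverse=True)
    return summary


if __name__ == "__main__":
    hits = [
        {"title": "A", "journal": "Journal X", "score": 0.9},
        {"title": "B", "journal": "Journal Y", "score": 0.7},
        {"title": "C", "journal": "Journal Y", "score": 0.6},
    ]
    for journal, total, articles in group_by_journal(hits):
        print(journal, round(total, 2), [a["title"] for a in articles])
```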