diff --git a/README.MD b/README.MD index 93fc8ee..1c1ba68 100644 --- a/README.MD +++ b/README.MD @@ -9,81 +9,13 @@ ShiCo is a tool for visualizing time shifting concepts. We refer to a concept as ![Mock concept shift](./docs/mockConcept1.png) ![Mock concept shift](./docs/mockConcept2.png) -You can find more details of how the concept shift works [here](./docs/howItWorks.md). +You can find more details of how the concept shift works [here](./docs/howItWorks.md) and you can read the user documentation [here](./docs/ui.md). - - *How is it structured (backend/frontend)* - - *What the back end does?* - - *What do I see in the front end?* +## How to use it? +You can read how to get your own instance of ShiCo up and running [here](./docs/deploy.md). -*How to use it?* - - *What do I need to run it? (python, some web server, word2vec models)* - -*How to extend it?* - - *Use different semantic model (other than word2vec)* - -## Launching server - -To launch the server run: -``` -# python shico/server.py -f "word2vecModels/195?_????.w2v" -``` - -*Note:* loading the word2vec models takes some time and may consume a large amount of memory. - -Then you can access trace a concept by connecting to the server using curl (or your web browser). 
Examples: - -``` -http://localhost:5000/track/oorlog -http://localhost:5000/track/oorlog?startKey=1952_1961 -http://localhost:5000/track/oorlog?startKey=1952_1961&maxTerms=5 -http://localhost:5000/track/oorlog?startKey=1952_1961&maxTerms=5&forwards= -http://localhost:5000/track/nederland?maxTerms=5&sumDistances=true -http://localhost:5000/track/nederland?maxTerms=5&sumDistances= -http://localhost:5000/track/oorlog,oorlogse -``` - -## Web app - -### Adding hooks - -You can add your own custom behaviour to the force directed graphs like this: -``` -(function() { - 'use strict'; - - angular - .module('shico') - .run(runBlock); - - function runBlock(GraphConfigService) { - GraphConfigService.addForceGraphHook(function(node) { - node.select('circle').attr('r', function(d) { - return d.name.length; - }); - }); - } -})(); - -``` - -This snippet modifies the size of the force directed graph nodes, and makes them dependent on the length of the name in the node's data. - - -## Unit testing -To run Python unit tests, run: -``` -$ nosetests -``` - -## Cleaning functions -In some cases, resulting vocabularies may contain words which we would like to filter. ShiCo offers the possibility of using a *cleaning* function, for filtering vocabularies after they have been generated. To use this option, it is necessary to indicate the name of the cleaning function when starting the ShiCo server. A sample cleaning function is provided (*shico.extras.cleanTermList*). You can use this function as follows: -``` -$ python shico/server.py -c "shico.extras.cleanTermList" -``` - -## Speeding up ShiCo - -Current implementation of ShiCo relies on gensim word2vec model `most_similar` function, which in turn requires the calculation of the dot product between two large matrices, via `numpy.dot` function. For this reason, ShiCo greatly benefits from using libraries which accelerate matrix multiplications, such as OpenBLAS. 
ShiCo has been tested using [Numpy with OpenBLAS](https://hunseblog.wordpress.com/2014/09/15/installing-numpy-and-openblas/), producing a significant increase in speed. +## How to extend it +If you would like to modify ShiCo, read the developer manual [here](./docs/develop.md). ## Licensing diff --git a/docs/deploy.md b/docs/deploy.md index a9f4bb7..1c321f1 100644 --- a/docs/deploy.md +++ b/docs/deploy.md @@ -1,6 +1,78 @@ -# Making a release +# Deploying ShiCo +If you want to run your own instance of ShiCo, there are a few things you will need: - - Merge changes on branch `demo` - - Run `gulp build` - - Make github release - + - A set of word2vec models which your ShiCo instance will use. + - A server running the Python back end (with enough memory to hold your word2vec models). + - A web server to serve the front end to the browser. + +## Word2vec models + +You are welcome to use our [existing w2v models](https://github.com/NLeSC/ShiCo/tree/master/word2vecModels); you might need to use [git-lfs](https://git-lfs.github.com/) to download them. If you do, please contact us for more details on how the models were built and how to cite our work. You can also [create your own](./docs/buildingModels.md) models, based on your own corpus. + +## Launching the back end + +Once you have downloaded the code (or cloned this repo) and installed all Python requirements (listed in *requirements.txt*), you can launch the Flask server as follows: +``` +$ python shico/server/app.py -f "word2vecModels/????_????.w2v" +``` + +*Note:* loading the word2vec models takes some time and may consume a large amount of memory. 
+ +You can check that the server is up and running by connecting to it using curl (or your web browser): +``` +http://localhost:5000/load-settings +``` + +Alternatively, you can use [Gunicorn](http://gunicorn.org/) by setting your configuration in *shico/server/config.py* and then running: + +``` +$ gunicorn --bind 0.0.0.0:8000 --timeout 1200 shico.server.wsgi:app +``` + +## Launching the front end + +The necessary files for serving the front end are located in the *webapp* folder. You will need to edit your configuration file (*webapp/src/config.json*) to tell the front end where your back end is running. For example, if your back end is running on *localhost* port 5000 as in the example above, you would set your configuration file as follows: + +``` +{ + "baseURL": "http://localhost:5000" +} +``` + +If you are familiar with the JavaScript world, you can use the *gulp* tasks provided. You can serve your front end as follows (from the *webapp* folder): +``` +$ gulp serve +``` + +You can build a deployable version (minified, uglified, etc.) as follows: +``` +$ gulp build +``` +This will build a deployable version in the *webapp/dist* folder. + +## Pre-built deployable version + +If you are not familiar with the JavaScript world (or just don't feel like building your own deployable version), the *demo* branch of this repository contains a pre-built version of the front end. You can check out (or download) that branch, and then you are ready to go. + +## Serve with your favorite web server + +Once you have a *webapp/dist* folder (whether downloaded or self-built), you can serve its contents using your favorite web server. For example, you could use Python's SimpleHTTPServer as follows (from the *webapp/dist* folder): +``` +$ python -m SimpleHTTPServer +``` + +## Cleaning functions +In some cases, resulting vocabularies may contain words which we would like to filter. 
ShiCo lets you apply a *cleaning* function to filter vocabularies after they have been generated. To use this option, pass the name of the cleaning function when starting the ShiCo server. A sample cleaning function is provided (*shico.extras.cleanTermList*). You can use this function as follows: +``` +$ python shico/server/app.py -c "shico.extras.cleanTermList" +``` + +If you are using Gunicorn, you can set `cleaningFunctionStr` in your *config.py* to the name of your cleaning function, for instance: + +``` +cleaningFunctionStr = "shico.extras.cleanTermList" +``` + +## Speeding up ShiCo + +The current implementation of ShiCo relies on the `most_similar` function of the gensim word2vec model, which in turn computes the dot product between two large matrices via the `numpy.dot` function. For this reason, ShiCo greatly benefits from using libraries which accelerate matrix multiplications, such as OpenBLAS. ShiCo has been tested using [Numpy with OpenBLAS](https://hunseblog.wordpress.com/2014/09/15/installing-numpy-and-openblas/), producing a significant increase in speed. diff --git a/docs/develop.md b/docs/develop.md new file mode 100644 index 0000000..d166624 --- /dev/null +++ b/docs/develop.md @@ -0,0 +1,48 @@ +# What should you do if you want to modify ShiCo? + +Be brave! And get in touch if you need help. Pull requests are very welcome. + +## Backend + +Written in Python. + +### Unit testing +If you modify the ShiCo back end, make sure to write unit tests for your code. + +To run Python unit tests, run: +``` +$ nosetests +``` + +## Web app + +Written in JavaScript (Angular). 
+ +### Adding hooks + +You can add your own custom behaviour to the force-directed graphs like this: +``` +(function() { + 'use strict'; + + angular + .module('shico') + .run(runBlock); + + function runBlock(GraphConfigService) { + GraphConfigService.addForceGraphHook(function(node) { + node.select('circle').attr('r', function(d) { + return d.name.length; + }); + }); + } +})(); + +``` + +This snippet sets the radius of each node in the force-directed graph to the length of the name in the node's data. + +## Making a release on GitHub + - Merge changes on branch `demo` + - Run `gulp build` + - Make a GitHub release diff --git a/docs/embeddingGraph.png b/docs/embeddingGraph.png new file mode 100644 index 0000000..e7ee688 Binary files /dev/null and b/docs/embeddingGraph.png differ diff --git a/docs/networkGraph.png b/docs/networkGraph.png new file mode 100644 index 0000000..d31d2ea Binary files /dev/null and b/docs/networkGraph.png differ diff --git a/docs/searchBar.png b/docs/searchBar.png new file mode 100644 index 0000000..6391476 Binary files /dev/null and b/docs/searchBar.png differ diff --git a/docs/streamGraph.png b/docs/streamGraph.png new file mode 100644 index 0000000..626b182 Binary files /dev/null and b/docs/streamGraph.png differ diff --git a/docs/ui.md b/docs/ui.md new file mode 100644 index 0000000..52ba1e8 --- /dev/null +++ b/docs/ui.md @@ -0,0 +1,57 @@ +# How to use ShiCo? + +This guide introduces the elements of ShiCo's user interface. + +## User interface components + +When you first open ShiCo in your browser, you will see a simple search bar: + +![Search bar](./searchBar.png) + +You can enter one or multiple (comma-separated) *seed terms*. These seed terms are the entry point for your concept search. Click *Submit* to begin your search. The results from your search will be displayed in the results panel below the search bar. 
+ +The search bar has some additional features: + - It allows you to modify the search parameters. Click the *+* button to display additional search parameters. + - It allows you to save the parameters of your current search, or load the parameters of a previous search. + +## Search parameters + +The following is the list of parameters (with a link to a brief explanation) which can be used to control your concept search: + + - [Max Terms](/webapp/src/help/maxTerms.md) + - [Max related terms](/webapp/src/help/maxRelatedTerms.md) + - [Minimum concept similarity](/webapp/src/help/minSim.md) + - [Word boost](/webapp/src/help/wordBoost.md) + - [Boost method](/webapp/src/help/boostMethod.md) + - [Algorithm](/webapp/src/help/algorithm.md) + - [Track direction](/webapp/src/help/direction.md) + - [Years in interval](/webapp/src/help/yearsInInterval.md) + - [Words per year](/webapp/src/help/wordsPerYear.md) + - [Weighing function](/webapp/src/help/weighFunc.md) + - [Function shape](/webapp/src/help/wFParam.md) + - [Do cleaning ?](/webapp/src/help/doCleaning.md) (only shown if your backend uses a cleaning function). + - [Year period](/webapp/src/help/yearPeriod.md) + +## Produced graphics + +Once a search is complete, ShiCo displays results in the results panel. Results are displayed using various graphs: + + - Stream graph -- this shows each word of the resulting vocabulary as a stream over time. The stream gets wider or narrower according to the weight the word is given in the vocabulary. + +![Stream graph](./streamGraph.png) + + - Network graphs -- this shows a collection of graphs displaying the resulting vocabulary as a network graph. Words which are related to each other are connected with an arrow. The direction of the arrow indicates which word was the product of which seed word. + +![Network graph](./networkGraph.png) + + - Space embedding -- this shows an estimate of the spatial relationship between words in the final vocabulary at every time step. 
Please keep in mind that these spatial relations are approximate and should be considered with care. + +![Space embedding graph](./embeddingGraph.png) + + - Plain text vocabulary -- this shows a text representation of the concept search. This consists, for each time step, of the seed words used and the produced vocabulary. + +## Saving and loading search parameters + +When you click the *Save parameters* button, a text box with your search parameters will be displayed. Copy these parameters and save them somewhere. Click *Ok* to hide the text box. + +When you click the *Load parameters* button, another text box will be displayed. Enter previously saved search parameters in this box and click *Ok* to load the parameters. diff --git a/shico/server/app.py b/shico/server/app.py index 6b1715d..02c7e43 100644 --- a/shico/server/app.py +++ b/shico/server/app.py @@ -1,7 +1,7 @@ '''ShiCo server. Usage: - server.py [-f FILES] [-n] [-d] [-p PORT] [-c FUNCTIONNAME] + app.py [-f FILES] [-n] [-d] [-p PORT] [-c FUNCTIONNAME] -f FILES Path to word2vec model files (glob format is supported) [default: word2vecModels/195[0-1]_????.w2v] @@ -48,6 +48,7 @@ def trackWord(terms): response.''' params = app.config['trackParser'].parse_args() termList = terms.split(',') + termList = [ term.strip() for term in termList ] termList = [ term.lower() for term in termList ] results, links = \ app.config['vm'].trackClouds(termList, maxTerms=params['maxTerms'],