'Sokrates' is an ML-powered assistant to help write better questions!
- Esteban Lopez: esteban@factored.ai
- David Stiles: david@factored.ai
- The `app` directory contains all code necessary to run the HTTP service. Within it, the `app_core` package handles all core logic such as model inference, while the `api` package handles HTTP requests for model inference. The two communicate through the `app_core.handlers` module.
- Most code is contained in the `app/app_core` package.
- The `app_core.data_processing` package contains data extraction and preprocessing functionality. Within it:
  - The `text_extract` package contains classes used to extract features from text. They should all follow the `Extractor` interface (see the sketch after this list).
  - The `XMLparser` module contains functionality to parse the StackExchange `.xml` files and convert them to dataframes.
  - The `make_dataset_csv` module uses `text_extract` and `XMLparser` to process the StackExchange `.xml` files and export them as CSV files.
- The `app_core.ml_models` package contains managers (wrappers) that handle the ML models themselves.
- The `basic_nlp_model` package contains code to quickly build and test neural network models on the dataset.
- The `notebooks` directory contains several Jupyter notebooks with data exploration and model testing.
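For orientation, here is a minimal sketch of what an `Extractor`-style class could look like. The method name `extract`, the abstract-base-class approach, and the `WordCountExtractor` example are illustrative assumptions only; refer to `app_core.data_processing.text_extract` for the actual interface.

```python
from abc import ABC, abstractmethod


class Extractor(ABC):
    """Hypothetical sketch of the feature-extractor contract (names assumed)."""

    @abstractmethod
    def extract(self, text: str) -> dict:
        """Return a mapping of feature names to feature values for the given text."""


class WordCountExtractor(Extractor):
    """Toy extractor: counts whitespace-separated tokens."""

    def extract(self, text: str) -> dict:
        return {"word_count": len(text.split())}
```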
In order to run the project, first ensure that the required packages are installed, which you can do with:
pip install -r requirements.txt
You must also install the nltk dependencies. To do this, run the following in a `python` session:
import nltk
nltk.download("punkt") # Punkt for tokenizing
To build the dataset, you must first download and decompress the data files from the Stack Exchange data dump. After this you will have a collection of directories (one per topic) containing `.xml` files. If `mydir` is the directory that contains these folders and `outdir` is the directory where you want to store the output CSVs, you can generate them with:
python -m data_processing mydir outdir
You can also add an optional third argument (`True` or `False`) to force the re-processing of existing CSVs.
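For example, to force re-processing of CSVs that already exist (assuming `True` enables it, as described above):

```
python -m data_processing mydir outdir True
```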
To run the first simple baseline of the model, run:
python -m ml_models
This will then prompt you for the title and body of your question; the body can also be given as a path to a file where the question is stored as rendered HTML.
To start the HTTP server with Docker, do the following:
- First, install Docker.
- Second, prepare your `.env` file. It must follow this template:
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
BUCKET_NAME=
MODEL_PATH=
ENV=development
- Note that you must have access to the S3 bucket where we are storing our models! For production deployment, the `ENV` variable must be set to `production` and the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` variables SHOULD NOT BE SET. The credentials should be handled via an AWS IAM role! (See the example production `.env` sketched after these steps.)
- Third, navigate to the `app` directory and build the docker image with:
docker build -t sokrates:<version> .
- Finally, run the container with:
docker run -p 3000:3000 --env-file <path-to-.env-file> -d sokrates:<version>
This may take a minute or two to initialize while it downloads the model.
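For reference, a production-style `.env` under the constraints above would look something like the following. The placeholder values are illustrative, not real configuration:

```
BUCKET_NAME=<your-model-bucket>
MODEL_PATH=<path-to-model-within-bucket>
ENV=production
```

No AWS keys appear in the file; the container is expected to obtain credentials from its IAM role.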
If you have installed `docker-compose` and you prefer one-liners, you can also start the server by running
docker-compose up
You can add the `--build` flag to update the image.
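For example, to rebuild the image and then start the server in one step:

```
docker-compose up --build
```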
If you want a faster startup time LOCALLY, you can persist the downloaded model from the container in a Docker volume or bind mount.
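A minimal sketch of the named-volume approach, assuming the container stores the downloaded model under a directory such as `/app/models`. That path is an assumption for illustration; it must match the directory portion of your `MODEL_PATH` setting.

```
# 'sokrates-model' is a hypothetical volume name; '/app/models' is an assumed model directory.
docker volume create sokrates-model
docker run -p 3000:3000 --env-file <path-to-.env-file> \
  -v sokrates-model:/app/models \
  -d sokrates:<version>
```

If the model persists in the volume between runs, subsequent startups should be faster, as noted above.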