Skip to content

Latest commit

 

History

History
95 lines (63 loc) · 2.21 KB

INSTALLING.md

File metadata and controls

95 lines (63 loc) · 2.21 KB

Requirements for Computing Dataset

minimum ~40 GB RAM, has never been computed with less than 49 GB

Installing Benepar

pip install -r requirements.txt

Python3.7

Python 3.8 is only compatable with tensorflow 2.X, and the benepar tree parser we use is a tensorflow 1.X series model. So we must additionally install python3 using the deadsnakes ppa.

requires virtualenv as well

virtualenv --python=/usr/bin/python3.7 ./virtualenv
source ./virtualenv/bin/activate  # sh, bash, or zsh
python3.7 -m pip install --upgrade pip
pip list  
deactivate 

python --version should show Python3.7.X and likewise pip --version should by under python3. You should also see the (virtualenv) before your usual termanl prompt.

Now run pip install -r requirements.txt and follow that with pip install -r requrements_2.txt. These are separate to avoid a dependency cycle experienced in testing.

python3.7

import nltk
nltk.download('punkt')
import benepar
benepar.download('benepar_en')

mkdir logs

Making Dataset

Make sure you have the Yelp dataset, specifically yelp_academic_dataset_review.json in Data, then run Python3.7 Data/Run_all.py from orpheus.

Running

Expects excel file with one sheet and two columns: author_id and review_contents

type make features to generate features

Troubleshooting

Problem:

 File "feature_extraction/sample_parser.py", line 47, in <module>
    nlp = spacy.load('en')

    ...

    FileNotFoundError: [Errno 2] No such file or directory: '... Code/orpheus/virtualenv/lib/python3.7/site-packages/spacy/data/en/__init__.py'

Solution:

virtualenv/bin/python3.7 -m spacy link en_core_web_sm en --force

More information at explosion/spaCy#2406

If you run, you might get the error

benepar/chart_decoder.c:17:10: fatal error: Python.h: No such file or directory
       17 | #include "Python.h"
          |          ^~~~~~
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

To fix this,

sudo apt-get install python3.7-dev # for python3.x installs