We recommend using virtual environments, as some packages require an exact version.
If you only want to use the package, do the following:
sudo apt-get install python3-pip python3-venv python3-dev
python3 -m venv rbenv
(creates a virtual environment named rbenv)
source rbenv/bin/activate
(activates the virtual environment)
pip3 uninstall setuptools && pip3 install setuptools && pip3 install --upgrade pip && pip3 install --no-cache-dir rbpy-rb
- Use it as in: https://github.com/readerbench/ReaderBench/blob/master/usage.py
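Before installing packages, it can help to confirm that the virtual environment is actually active. The following is a minimal sketch that uses only the standard library; the function name in_virtual_environment is illustrative and not part of ReaderBench.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run 'source rbenv/bin/activate' first.")
```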
If you want to contribute to the code base of the package:
sudo apt-get install python3-pip python3-venv python3-dev
git clone git@git.readerbench.com:ReaderBench/readerbenchpy.git && cd readerbenchpy/
python3 -m venv rbenv
(creates a virtual environment named rbenv)
source rbenv/bin/activate
(activates the virtual environment)
pip3 uninstall setuptools && pip3 install setuptools && pip3 install --upgrade pip
pip3 install -r requirements.txt
python3 nltk_download.py
Optional: pre-install the English model (otherwise most English processing will fail and ask you to run this command):
python3 -m spacy download en_core_web_lg
If you also want to install spellchecking (hunspell), you need these non-Python libraries:
sudo apt-get install libhunspell-1.6-0 libhunspell-dev hunspell-ro
pip3 install hunspell
For usage (parsing, lemmatization, NER, WordNet, content words, indices, etc.), see the file usage.py from https://github.com/readerbench/ReaderBench
You may also need some spaCy models, which are downloaded through spaCy.
You have to download these models yourself, using the command:
python3 -m spacy download name_of_the_model
The logger will also write instructions on which models you need, and how to download them.
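To check up front which models are missing, one option is to test whether each model can be imported. This is a sketch, not part of ReaderBench: it relies on spaCy installing downloaded models as regular Python packages, and the helper name missing_models is hypothetical.

```python
import importlib.util


def missing_models(model_names):
    """Return the names from model_names that are not importable as packages."""
    # find_spec returns None when no installed package has the given name.
    return [name for name in model_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # en_core_web_lg is the English model mentioned in the install steps.
    for name in missing_models(["en_core_web_lg"]):
        print(f"Missing model, install it with: python3 -m spacy download {name}")
```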
Our models are also available on the HuggingFace platform: https://huggingface.co/readerbench
You can use them directly from HuggingFace:
# tensorflow
from transformers import AutoTokenizer, TFAutoModel
tokenizer = AutoTokenizer.from_pretrained("readerbench/RoBERT-base")
model = TFAutoModel.from_pretrained("readerbench/RoBERT-base")
inputs = tokenizer("exemplu de propoziție", return_tensors="tf")
outputs = model(inputs)
# pytorch
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("readerbench/RoBERT-base")
model = AutoModel.from_pretrained("readerbench/RoBERT-base")
inputs = tokenizer("exemplu de propoziție", return_tensors="pt")
outputs = model(**inputs)
or from ReaderBench:
from rb.core.lang import Lang
from rb.processings.encoders.bert import BertWrapper
from tensorflow import keras
bert_wrapper = BertWrapper(Lang.RO, max_seq_len=128)
inputs, bert_layer = bert_wrapper.create_inputs_and_model()
cls_output = bert_wrapper.get_output(bert_layer, "cls") # or "pool"
# Add decision layer and compile model
# eg.
# hidden = keras.layers.Dense(..)(cls_output)
# output = keras.layers.Dense(..)(hidden)
# model = keras.Model(inputs=inputs, outputs=[output])
# model.compile(..)
bert_wrapper.load_weights()  # must be called after compile
# Process inputs for model
feed_inputs = bert_wrapper.process_input(["text1", "text2", "text3"])
# feed_output = ...
# model.fit(feed_inputs, feed_output, ...)
In each file you have to initialize the logger:
from rb.utils.rblogger import Logger
logger = Logger.get_logger()
logger.info("info msg")
logger.warning("warning msg")
logger.error("error msg")
rm -r dist/
pip3 install twine wheel
./upload_to_pypi.sh
- Follow the installation steps for contributors
- run
pip3 install xmltodict
- run
export PYTHONPATH=/add/path/to/repo/readerbenchpy/
- add the JSON resources in a jsons directory in readerbenchpy/rb/core/cscl/
- run
cd rb/core/cscl/ && python3 csv_parser.py
ReaderBench is able to perform conversation analysis from chats and communities. Each utterance must have the time expressed in one of the following formats:
- %Y-%m-%d %H:%M:%S.%f %Z
- %Y-%m-%d %H:%M:%S %Z
- %Y-%m-%d %H:%M %Z
- %Y-%m-%d %H:%M:%S.%f
- %Y-%m-%d %H:%M:%S
- %Y-%m-%d %H:%M
The format codes are Python's standard date format codes.
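To validate timestamps before running the analysis, the formats above can be tried in order with datetime.strptime. This is a minimal sketch and not ReaderBench's own parser; the names TIME_FORMATS and parse_utterance_time are illustrative. Note that strptime's %Z only recognizes UTC, GMT and the local time zone names, and it returns a naive datetime.

```python
from datetime import datetime

# The accepted formats, most specific first.
TIME_FORMATS = [
    "%Y-%m-%d %H:%M:%S.%f %Z",
    "%Y-%m-%d %H:%M:%S %Z",
    "%Y-%m-%d %H:%M %Z",
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M",
]


def parse_utterance_time(value: str) -> datetime:
    """Parse value with the first matching format, or raise ValueError."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            # This format did not match; try the next one.
            continue
    raise ValueError(f"Unsupported time format: {value!r}")


if __name__ == "__main__":
    print(parse_utterance_time("2020-03-15 14:05:09"))
```

Trying the most specific format first keeps the loop simple: a string either matches one format exactly or falls through to the final error.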