Out_of_domain_ABSA

The repo code is based on SpanEmo.

This repo performs "out of domain" aspect-based sentiment analysis (ABSA). That means you train an ABSA model on one dataset, then use it to extract the aspects and sentiment from a different, unrelated dataset.

You can run the notebook.

First, you need an aspect detection model. This repo uses SpanEmo for that: it predicts all of the aspects in a sentence. You then need to train SpanEmo and build a second model that outputs the aspect_polarity.

Let's start with the sentiment detection model. Put your data in the "data" folder. The preprocessed files already in the 'data' folder show the expected format.

Then you can train the model like this.

!python scripts/train.py    --train-path {"data/sentihood-train.tsv"}\
                            --dev-path {"data/sentihood-dev.tsv"} \
                            --bert-type {"base-bert"}\
                            --max-length 128 \
                            --output-dropout 0.1 \
                            --seed 0 \
                            --train-batch-size 32 \
                            --eval-batch-size 32 \
                            --max-epoch 20 \
                            --ffn-lr 0.001 \
                            --bert-lr 2e-5

The parameters are as follows:

"""
Usage:
    main.py [options]

Options:
    -h --help                         show this screen
    --max-length=<int>                text length [default: 128]
    --output-dropout=<float>          prob of dropout applied to the output layer [default: 0.1]
    --seed=<int>                      fixed random seed number [default: 42]
    --train-batch-size=<int>          batch size [default: 32]
    --eval-batch-size=<int>           batch size [default: 32]
    --max-epoch=<int>                 max epoch [default: 20]
    --ffn-lr=<float>                  ffn learning rate [default: 0.001]
    --bert-lr=<float>                 bert learning rate [default: 2e-5]
    --bert-type=<str>                 language choice [default: base-bert]
    --dev-path=<str>                  file path of the dev set [default: '']
    --train-path=<str>                file path of the train set [default: '']
"""

Once the model is trained, you can evaluate it on the test dataset like this:

!python scripts/test.py --test-path {'data/ABSA_15_Restaurants_Test.tsv'} \
                        --models-path {"2021-11-12-16:39:24_checkpoint.pt"} \
                        --max-length 160 \
                        --bert-type {"base-bert"}

The parameters are as follows:

"""
Usage:
    main.py [options]

Options:
    -h --help                         show this screen
    --models-path=<str>               path of the trained models
    --max-length=<int>                text length [default: 128]
    --seed=<int>                      seed [default: 0]
    --test-batch-size=<int>           batch size [default: 32]
    --bert-type=<str>                 language choice [default: base-bert]
    --test-path=<str>                 file path of the test set [default: ]
"""

Next, you need to train SpanEmo.

%cd SpanEmo
!python scripts/train.py    --train-path {"SemEval16/SemEval2016SpanEmoTrain.csv"}\
                            --dev-path {"SemEval16/SemEval2016SpanEmoValidation.csv"} \
                            --loss-type {'cross-entropy'} \
                            --max-length 128 \
                            --output-dropout 0.1 \
                            --seed 42 \
                            --train-batch-size 32 \
                            --eval-batch-size 32 \
                            --max-epoch 20 \
                            --ffn-lr 0.001 \
                            --bert-lr 2e-5 \
                            --lang {"English"} \
                            --alpha-loss 0.2 \
                            --bert-type {'BERT'}

After training, you will get a model checkpoint. You need to load it when evaluating the model.

!python scripts/test.py --test-path {'SemEval16/SevEval2016SpanEmo.csv'} \
                        --model-path {"/content/Out_of_domain_ABSA/SpanEmo/models/2021-11-28-08:09:30_checkpoint.pt"} \
                        --bert-type {"BERT"}
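
Checkpoint names in the commands above start with a timestamp, so the newest one can be picked automatically instead of being copied by hand. This is a minimal sketch, assuming every checkpoint follows the `YYYY-MM-DD-HH:MM:SS_checkpoint.pt` pattern seen in the examples. The helper names `checkpoint_time` and `latest_checkpoint` are hypothetical and not part of this repo.

```python
from datetime import datetime

CHECKPOINT_SUFFIX = "_checkpoint.pt"
TIMESTAMP_FORMAT = "%Y-%m-%d-%H:%M:%S"  # assumed from the example file names


def checkpoint_time(filename):
    """Parse the timestamp prefix of a checkpoint file name, or return None."""
    if not filename.endswith(CHECKPOINT_SUFFIX):
        return None
    stamp = filename[: -len(CHECKPOINT_SUFFIX)]
    try:
        return datetime.strptime(stamp, TIMESTAMP_FORMAT)
    except ValueError:
        return None


def latest_checkpoint(filenames):
    """Return the most recent checkpoint name, ignoring unrelated files."""
    dated = [(checkpoint_time(name), name) for name in filenames]
    dated = [(when, name) for when, name in dated if when is not None]
    if not dated:
        raise FileNotFoundError("no *_checkpoint.pt file found")
    return max(dated)[1]


names = [
    "2021-11-12-16:39:24_checkpoint.pt",
    "2021-11-28-08:09:30_checkpoint.pt",
    "notes.txt",
]
print(latest_checkpoint(names))
```

In practice the list of names would come from something like `os.listdir("models")`, assuming the checkpoints are written to that folder.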

The aspects predicted for each sentence are now in predict.csv. Run the following scripts to convert them into the format that the sentiment detection model expects.

%cd scripts
!python data_integration.py
!python data_preprocess.py
!python create_all_aspect.py
%cd ../
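
The three scripts above have to run in that order, from inside the scripts folder. Outside a notebook, a small wrapper could do this and stop at the first failure. This is only a sketch: `run_preprocessing` is a hypothetical helper, and the `scripts_dir` default is an assumption about where you start from.

```python
import subprocess
import sys

PREPROCESSING_SCRIPTS = [
    "data_integration.py",
    "data_preprocess.py",
    "create_all_aspect.py",
]


def run_preprocessing(scripts_dir="scripts", runner=subprocess.run):
    """Run each script from inside scripts_dir, in order."""
    for script in PREPROCESSING_SCRIPTS:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on incomplete output.
        runner([sys.executable, script], cwd=scripts_dir, check=True)


# Call this from the folder that contains the scripts directory:
#   run_preprocessing()
```

The `runner` parameter exists only so the order of the steps can be checked without actually executing the scripts.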

Lastly, you can predict the aspects and sentiment of the sentences and validate the output.

%cd /content/Out_of_domain_ABSA
!python scripts/predict_test.py --models-path {"2021-11-28-07:28:04_checkpoint.pt"} \
                        --max-length 160 \
                        --bert-type {"base-bert"} \
                        --real-test-path {"data/semEval2016.tsv"} \
                        --fake-test-path {'data/fakerSemEval16.tsv'}

We have two baselines:

1. Train and test sentimentPredictor with no aspect. The input will be [CLS] + "what do you think of " + "NULL" + " of " + target + [SEP] + sentence + [SEP].
2. Train and test sentimentPredictor with all aspects. The input will be [CLS] + "what do you think of " + "all aspects" (like "food, restaurant, and drinks") + " of " + target + [SEP] + sentence + [SEP].
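
The two input formats above can be sketched in a few lines of Python. The function names are hypothetical and this is not the repo's own preprocessing code. How two aspects are joined ("a and b") is an assumption, since the text only gives a three-aspect example. With a Hugging Face BERT tokenizer you would normally pass the question and the sentence as a text pair and let the tokenizer add [CLS] and [SEP] itself; `render` writes them out only to make the layout visible.

```python
def join_aspects(aspects):
    """Format ['food', 'restaurant', 'drinks'] as 'food, restaurant, and drinks'."""
    aspects = list(aspects)
    if not aspects:
        return "NULL"  # baseline 1: no aspect
    if len(aspects) == 1:
        return aspects[0]
    if len(aspects) == 2:
        return f"{aspects[0]} and {aspects[1]}"  # assumed format for two aspects
    return ", ".join(aspects[:-1]) + f", and {aspects[-1]}"


def build_question(target, aspects=None):
    """Build the first segment: 'what do you think of <aspects> of <target>'."""
    return f"what do you think of {join_aspects(aspects or [])} of {target}"


def render(target, sentence, aspects=None):
    """Show the full input with the special tokens written out."""
    return f"[CLS] {build_question(target, aspects)} [SEP] {sentence} [SEP]"


# Made-up sentence and target, for illustration only.
sentence = "The pasta was great but the service was slow."
print(render("the restaurant", sentence))
print(render("the restaurant", sentence, ["food", "restaurant", "drinks"]))
```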

Here are the results (please ignore the last table):

tengwang0318/Out_of_domain_ABSA