GitHub - aitor-garcia-p/bert_simple_example: Simple educational example of BERT usage for text classification and similarity/clustering

Example of BERT for Classification, Similarity and Clustering

This is very basic (toy) code, intended for educational purposes only, that uses BERT for:

Training a basic document classifier
Using the trained model to classify new documents
Using BERT for document-similarity ranking (e.g. semantic search)
Using BERT to encode documents and group them using some clustering algorithm
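To make the similarity-ranking idea more concrete, here is a minimal sketch of ranking documents by cosine similarity to a query. It is not the code from this repository: the function names are hypothetical, and the short hand-written vectors only stand in for the embeddings that BERT would produce for each document.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_index, score) pairs, sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for BERT document embeddings (assumption).
doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vec = [0.9, 0.1, 0.0]
ranking = rank_by_similarity(query_vec, doc_vecs)
```

With real embeddings the same ranking step applies unchanged. The same vectors could also be passed to a clustering algorithm to group the documents.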

The code is mostly complete and working. The training code contains the minimal required parts (data loading, model updates, evaluation, model saving, etc.). However, many other details are not covered here because, although useful, they are not required to train a simple model (learning-rate scheduling, early stopping, mixed-precision training, distributed training, model logging/reporting, etc.).
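The minimal parts named above (data loading, model update, evaluation, model saving) can be sketched roughly as follows. This is an illustration under stated assumptions, not the training code from this repository: a tiny linear layer stands in for BERT with a classification head, random tensors stand in for encoded text, and the function name and hyperparameters are made up.

```python
import os
import tempfile

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_toy_classifier(num_epochs=5, seed=0):
    """Train a tiny stand-in classifier and return (accuracy, checkpoint_path)."""
    torch.manual_seed(seed)

    # Data loading: random features stand in for encoded documents (assumption).
    features = torch.randn(200, 8)
    labels = (features[:, 0] > 0).long()
    train_ds = TensorDataset(features[:160], labels[:160])
    eval_ds = TensorDataset(features[160:], labels[160:])
    train_loader = DataLoader(train_ds, batch_size=16, shuffle=True)
    eval_loader = DataLoader(eval_ds, batch_size=16)

    # Stand-in model; the real example would use BERT plus a classification head.
    model = nn.Linear(8, 2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    # Model update: one optimizer step per batch, for a few epochs.
    for _ in range(num_epochs):
        model.train()
        for batch_x, batch_y in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(batch_x), batch_y)
            loss.backward()
            optimizer.step()

    # Evaluation: accuracy on the held-out split, without tracking gradients.
    model.eval()
    correct = 0
    with torch.no_grad():
        for batch_x, batch_y in eval_loader:
            correct += (model(batch_x).argmax(dim=1) == batch_y).sum().item()
    accuracy = correct / len(eval_ds)

    # Model saving: store only the weights (state dict) in a temporary folder.
    checkpoint_path = os.path.join(tempfile.mkdtemp(), "toy_model.pt")
    torch.save(model.state_dict(), checkpoint_path)
    return accuracy, checkpoint_path
```

Calling `train_toy_classifier()` returns the held-out accuracy and the path of the saved weights. The omitted extras (schedulers, early stopping, mixed precision) would all attach to this same loop.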

Used data

The data used for this little example has been borrowed from: https://github.com/mhjabreel/CharCnn_Keras/tree/master/data/ag_news_csv

Only a small subset of label-title pairs has been used. Even with this toy subset of data the resulting classifier works surprisingly well, showing the power of BERT and Transformers in general.
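As a rough illustration of what reading label-title pairs can look like, here is a small sketch using Python's standard csv module. It assumes a CSV layout with the label in the first column and the title in the second, which may differ from the files in example_data. The function name is hypothetical and the sample rows are made up, not taken from the dataset.

```python
import csv
import io


def read_label_title_pairs(csv_file):
    """Return a list of (label, title) tuples from an open CSV file object."""
    pairs = []
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        label, title = row[0].strip(), row[1].strip()
        pairs.append((label, title))
    return pairs


# Made-up rows for illustration only; they are not taken from the dataset.
sample = io.StringIO(
    '"1","Example headline about politics"\n'
    '"2","Example headline about sports"\n'
)
pairs = read_label_title_pairs(sample)
```

For a real file, the same function can be called with `open(path, newline="", encoding="utf-8")` in place of the in-memory sample.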

How to use this code

You need Python 3.7+. It is recommended to create a fresh Python virtual environment before installing the dependencies. You also need to install PyTorch 1.7+ (with CUDA support if you have a CUDA-capable device and plan to use it). A CUDA device is advisable for training, because the process runs 10-15x faster. However, since the data is small, you can also use the CPU if you don't mind waiting several minutes for completion.
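A common way to support both setups is to pick the device at runtime. The sketch below shows that pattern. It is not taken from this repository, and the small linear layer is only a stand-in for the BERT classifier.

```python
import torch

# Use the GPU when PyTorch can see a CUDA device; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The model and every batch of tensors must live on the same device.
model = torch.nn.Linear(4, 2).to(device)  # stand-in for the BERT classifier
batch = torch.randn(3, 4).to(device)
logits = model(batch)
```

With this pattern the same script runs unchanged on machines with and without a GPU.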

Then install the dependencies listed in requirements.txt:

pip install -r requirements.txt

You should be ready to go. Have a look at the "main" sections of each Python file, and if necessary, adjust the paths to your system (there are a few hard-coded paths).

Folders and files in the repository:

bert_examples
example_data
.gitignore
LICENSE.txt
README.md
requirements.txt
