
Releases: Tiiiger/bert_score

Version 0.3.3

10 May 22:23
3a45dc2
  • Fixing a bug with empty strings (issue #47).
  • Supporting 6 ELECTRA models and 24 smaller BERT models (see the sketch after this list).
  • A new Google sheet tracking the performance (i.e., Pearson correlation with human judgment) of different models on WMT16 to-English.
  • Including a script for tuning the best number of layers of an English pre-trained model on WMT16 to-English data.
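A minimal sketch of scoring with one of the newly supported checkpoints. The ELECTRA identifier and the num_layers value below are illustrative assumptions, not tuned recommendations; the tuned layers live in the Google sheet mentioned above:

```python
from bert_score import score

cands = ["The quick brown fox jumps over the lazy dog."]
refs = ["A fast brown fox leaps over a lazy dog."]

# model_type accepts a Hugging Face checkpoint name; the ELECTRA
# identifier and layer choice here are assumptions for illustration.
P, R, F1 = score(
    cands,
    refs,
    model_type="google/electra-base-discriminator",
    num_layers=9,  # hypothetical; consult the Google sheet for the tuned value
)
print(f"F1: {F1.mean().item():.4f}")
```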

Version 0.3.2

18 Apr 17:04
  • Bug fix: resolves the bug in v0.3.1 when using multiple reference sentences.
  • Supporting multiple reference sentences with our command-line tool.

Version 0.3.1

18 Apr 17:09
58bc5d7
  • A new BERTScorer object that caches the model to avoid re-loading it multiple times. Please see our Jupyter notebook example for the usage.
  • Supporting multiple reference sentences for each example. The score function can now take a list of lists of strings as the references and return the score between the candidate sentence and its closest reference sentence (see the sketch after this list).
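A minimal sketch of the cached scorer together with multi-reference scoring, assuming English defaults:

```python
from bert_score import BERTScorer

# The underlying model is loaded once and reused across score() calls.
scorer = BERTScorer(lang="en")

cands = ["The cat sat on the mat."]
# One list of reference strings per candidate; the score against the
# closest reference is returned for each candidate.
refs = [["A cat was sitting on the mat.", "There is a cat on the mat."]]

P, R, F1 = scorer.score(cands, refs)
print(f"P={P.mean().item():.4f} R={R.mean().item():.4f} F1={F1.mean().item():.4f}")
```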

Version 0.3.0

18 Apr 17:11
926c516
  • Supporting Baseline Rescaling: we apply a simple linear transformation to improve the readability of BERTScore, using pre-computed "baselines". It has been pointed out (e.g., in #20 and #23) that the numerical range of BERTScore is very narrow when computed with RoBERTa models. In other words, although BERTScore correctly distinguishes examples through ranking, the numerical scores of good and bad examples are very similar. We detail our approach in a separate post; a usage sketch follows.
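A minimal sketch of rescaled scoring; a language must be given so the matching pre-computed baseline can be loaded:

```python
from bert_score import score

cands = ["The weather is nice today."]
refs = ["It is sunny outside."]

# Rescaled scores spread over more of the [0, 1] range: unrelated pairs
# land near 0 instead of clustering in a narrow high band.
P, R, F1 = score(cands, refs, lang="en", rescale_with_baseline=True)
print(F1)
```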

Version 0.2.3

18 Apr 17:17
  • Supporting DistilBERT (Sanh et al.), ALBERT (Lan et al.), and XLM-R (Conneau et al.) models (see the sketch after this list).
  • Including the version of huggingface's transformers in the hash code for reproducibility.
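A minimal sketch using one of the newly supported model families. The checkpoint name is the standard Hugging Face identifier, and the layer choice is an assumption rather than the tuned value:

```python
from bert_score import score

# DistilBERT checkpoint from the Hugging Face hub; num_layers below is
# an assumption for illustration, not the tuned setting.
P, R, F1 = score(
    ["A candidate sentence."],
    ["A reference sentence."],
    model_type="distilbert-base-uncased",
    num_layers=5,
)
```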

Version 0.2.2

18 Apr 17:18
  • Bug fix: when using RoBERTaTokenizer, we now set add_prefix_space=True, which was the default setting in huggingface's pytorch_transformers (where we ran the experiments in the paper) before it was migrated to transformers. This breaking change in transformers leads to a lower correlation with human evaluation. To reproduce our RoBERTa results in the paper, please use version 0.2.2.
  • The best number of layers for DistilRoBERTa is now included.
  • Supporting loading a custom model (see the sketch after this list).
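A minimal sketch of loading a custom model. The path is hypothetical, and num_layers must be chosen by the user, since no tuned default exists for arbitrary checkpoints:

```python
from bert_score import score

# Hypothetical path to a user-provided, transformers-compatible checkpoint.
P, R, F1 = score(
    ["A candidate sentence."],
    ["A reference sentence."],
    model_type="/path/to/your/custom-model",
    num_layers=10,  # user-chosen; no tuned default for custom models
)
```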

Version 0.2.1

18 Apr 17:19
  • SciBERT (Beltagy et al.) models are now included. Thanks to AI2 for sharing the models. By default, we use the 9th layer (the same as BERT-base), but this choice has not been tuned; a usage sketch follows.
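A minimal sketch with SciBERT. The identifier below is AI2's published checkpoint name on the Hugging Face hub, which is an assumption and may differ from the name expected by this exact release:

```python
from bert_score import score

# Layer 9 mirrors the untuned default mentioned above; the checkpoint
# identifier is an assumption based on AI2's published models.
P, R, F1 = score(
    ["The protein binds to the receptor."],
    ["The receptor is bound by the protein."],
    model_type="allenai/scibert_scivocab_uncased",
    num_layers=9,
)
```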

Version 0.2.0

18 Apr 17:20
88d6b5d
  • Supporting BERT, XLM, XLNet, and RoBERTa models through huggingface's Transformers library.
  • Automatically picking the best model for a given language (see the sketch after this list).
  • Automatically picking the layer based on the model.
  • IDF weighting is no longer enabled by default, as we show in the new version of the paper that the improvement brought by importance weighting is not consistent.
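A minimal sketch of the automatic defaults and the now opt-in IDF weighting:

```python
from bert_score import score

cands = ["A candidate translation."]
refs = ["A reference translation."]

# lang="en" picks a default English model and its layer automatically.
P, R, F1 = score(cands, refs, lang="en")

# Importance weighting is opt-in; IDF statistics are computed from the refs.
P_idf, R_idf, F1_idf = score(cands, refs, lang="en", idf=True)
```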