v2.3.0

Latest

github-actions released this 17 Dec 16:10

573f9e4

2.3.0 - 2024-12-17

Features

Release new multilabel biomedBERT model trained on NER data synthetically generated by an LLM (Gemini). The model was trained on over 7000 LLM-annotated documents with a total of 295822 samples.
The model was trained for 21 epochs and achieved an F1 score of 95.6% on a held-out test set. (multilabel_bert)
Added multilabel NER training example and config.
Added docs and an example for scaling kazu with Ray.

Bugfixes

Fix issue with TransformersModelForTokenClassificationNerStep when processing large numbers of documents. The fix offloads tensors to the CPU before performing the torch.cat operation, which previously resulted in a zero tensor. (pytorch_memory_issue)
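
The general pattern behind this fix can be sketched as follows. This is a minimal illustration only, not the actual kazu implementation: the helper name `concat_on_cpu` and the tensor shapes are hypothetical, and the sketch assumes the per-batch outputs are plain tensors collected in a list.

```python
import torch


def concat_on_cpu(batches):
    """Move each per-batch tensor to the CPU, then concatenate along dim 0.

    Hypothetical helper for illustration; not part of the kazu API.
    """
    # Detach from the autograd graph and copy to host memory first, so the
    # concatenated result is allocated on the CPU rather than on the accelerator.
    cpu_batches = [t.detach().cpu() for t in batches]
    return torch.cat(cpu_batches, dim=0)


# Example: five batches of per-token outputs (shapes chosen arbitrarily).
device = "cuda" if torch.cuda.is_available() else "cpu"
batch_logits = [torch.randn(4, 16, 3, device=device) for _ in range(5)]

all_logits = concat_on_cpu(batch_logits)
print(all_logits.shape, all_logits.device)
```

Moving each batch to the CPU as it is collected keeps accelerator memory use bounded by a single batch, at the cost of a device-to-host copy per batch.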
