v0.3.0
What's Changed
- add multilabel f1 eval usage by @fcakyon in #221
- Force get_supported_tasks() to return a list instead of dict keys by @mathemakitten in #227
- Unpin rouge_score by @albertvillanova in #220
- Remove import statement in Measurement Card by @meg-huggingface in #231
- make rouge support multi-ref by @lvwerra in #229
- Fix enforce string by @lvwerra in #230
- Fix examples in perplexity measurement docs by @mathemakitten in #238
- Add Wilcoxon's signed rank test by @douwekiela in #237
- Add support for two input columns for TextClassificationEvaluator by @fxmarty in #205
- fix bug in TEMPLATE_REQUIRE: add comma by @BramVanroy in #248
- Minor quicktour doc suggestions by @stevhliu in #236
- Clarify error message for ChrF no. references by @BramVanroy in #247
- only track unique missing dependencies by @BramVanroy in #246
- Update evaluate in spaces by @lvwerra in #228
- add commit_hash to args by @lvwerra in #253
- Change perplexity to be calculated with base e by @mathemakitten in #242
- Rebase for previous PR by @mathemakitten in #254
- Fix docstrings with new perplexities with base e by @mathemakitten in #255
- add a tokenizer option to rouge by @lvwerra in #258
- Adding list_duplicates=True to example. by @meg-huggingface in #263
- Minor change in describing what this does. by @meg-huggingface in #267
- Mapping example output to returned output. by @meg-huggingface in #268
- Changes "duplicates_list" to "duplicates_dict" (since it's dict) by @meg-huggingface in #265
- Changes "duplicates_list" to "duplicates_dict" in the example. by @meg-huggingface in #264
- Add slow flag to two column parity test by @lvwerra in #273
- Remove handle_impossible_answer from the default PIPELINE_KWARGS in the question answering evaluator by @fxmarty in #272
- Toxicity Measurement by @sashavor in #262
- Automatically choose dataset split if none provided by @mathemakitten in #232
- Fix YAML in Toxicity by @lvwerra in #278
- Added metric Brier Score by @kadirnar in #275
- Check for mismatch in device setup in evaluator by @mathemakitten in #287
- Fix transformers import in the evaluator by @mathemakitten in #291
- Add support for name field when loading data by @mathemakitten in #283
- Adding regard measurement by @sashavor in #271
- Raise exception instead of assert in BertScore by @BramVanroy in #292
- fix regard yaml by @lvwerra in #295
- Add CONTRIBUTING.md by @mathemakitten in #293
- Refactor kwargs and configs by @lvwerra in #188
- Revert "Refactor kwargs and configs" by @lvwerra in #299
- Add missing split and subset kwarg into other evaluators by @mathemakitten in #301
- Adding HONEST score by @sashavor in #279
- fix wrong sorting in check by @sanderland in #305
- Fix HONEST yaml by @lvwerra in #303
- Refactor current_features to selected_feature_format by @mathemakitten in #306
- replace datasets list with local list of tasks by @lvwerra in #309
- Adding torch to the requirements by @sashavor in #311
- Honest space fix by @sashavor in #312
- Use HTML relative paths for tiles by @lewtun in #318
- Test for valid YAML files by @mathemakitten in #308
- add versioning the HubEvaluationModuleFactory by @lvwerra in #314
- Add text2text evaluator by @lvwerra in #261
- try main if tag does not work by @lvwerra in #322
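Note on the perplexity change (#242, #255): the entries above say perplexity is now calculated with base e. As a rough illustration of what a base-e perplexity means, and not the library's actual implementation, the sketch below computes it as the exponential of the mean negative natural-log likelihood. The function name `perplexity_base_e` and the toy probabilities are assumptions made up for this example.

```python
import math


def perplexity_base_e(token_probs):
    """Hypothetical helper: exp of the mean negative natural-log likelihood."""
    # Negative log-likelihood of each token, measured in nats (natural log).
    nll = [-math.log(p) for p in token_probs]
    # Exponentiate the average with base e to get the perplexity.
    return math.exp(sum(nll) / len(nll))


# Toy input: a model that assigns probability 0.25 to every observed token.
probs = [0.25, 0.25, 0.25, 0.25]
ppl = perplexity_base_e(probs)
print(ppl)  # approximately 4.0, i.e. 1 / 0.25
```

The point of the sketch is that the logarithm and the exponentiation use the same base, so a uniform probability of 0.25 per token gives a perplexity of about 4.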
New Contributors
- @fcakyon made their first contribution in #221
- @meg-huggingface made their first contribution in #231
- @stevhliu made their first contribution in #236
- @kadirnar made their first contribution in #275
- @sanderland made their first contribution in #305
Full Changelog: v0.2.2...v0.3.0