Release v0.3.0 · huggingface/evaluate

What's Changed

add multilabel f1 eval usage by @fcakyon in #221
Force get_supported_tasks() to return a list instead of dict keys by @mathemakitten in #227
Unpin rouge_score by @albertvillanova in #220
Remove import statement in Measurement Card by @meg-huggingface in #231
make rouge support multi-ref by @lvwerra in #229
Fix enforce string by @lvwerra in #230
Fix examples in perplexity measurement docs by @mathemakitten in #238
Add Wilcoxon's signed rank test by @douwekiela in #237
Add support for two input columns for TextClassificationEvaluator by @fxmarty in #205
fix bug in TEMPLATE_REQUIRE: add comma by @BramVanroy in #248
Minor quicktour doc suggestions by @stevhliu in #236
Clarify error message for ChrF no. references by @BramVanroy in #247
only track unique missing dependencies by @BramVanroy in #246
Update evaluate in spaces by @lvwerra in #228
add commit_hash to args by @lvwerra in #253
Change perplexity to be calculated with base e by @mathemakitten in #242
Rebase for previous PR by @mathemakitten in #254
Fix docstrings with new perplexities with base e by @mathemakitten in #255
add a tokenizer option to rouge by @lvwerra in #258
Adding list_duplicates=True to example. by @meg-huggingface in #263
Minor change in describing what this does. by @meg-huggingface in #267
Mapping example output to returned output. by @meg-huggingface in #268
Changes "duplicates_list" to "duplicates_dict" (since it's dict) by @meg-huggingface in #265
Changes "duplicates_list" to "duplicates_dict" in the example. by @meg-huggingface in #264
Add slow flag to two column parity test by @lvwerra in #273
Remove handle_impossible_answer from the default PIPELINE_KWARGS in the question answering evaluator by @fxmarty in #272
Toxicity Measurement by @sashavor in #262
Automatically choose dataset split if none provided by @mathemakitten in #232
Fix YAML in Toxicity by @lvwerra in #278
Added metric Brier Score by @kadirnar in #275
Check for mismatch in device setup in evaluator by @mathemakitten in #287
Fix transformers import in the evaluator by @mathemakitten in #291
Add support for name field when loading data by @mathemakitten in #283
Adding regard measurement by @sashavor in #271
Raise exception instead of assert in BertScore by @BramVanroy in #292
fix regard yaml by @lvwerra in #295
Add CONTRIBUTING.md by @mathemakitten in #293
Refactor kwargs and configs by @lvwerra in #188
Revert "Refactor kwargs and configs" by @lvwerra in #299
Add missing split and subset kwarg into other evaluators by @mathemakitten in #301
Adding HONEST score by @sashavor in #279
fix wrong sorting in check by @sanderland in #305
Fix HONEST yaml by @lvwerra in #303
Refactor current_features to selected_feature_format by @mathemakitten in #306
replace datasets list with local list of tasks by @lvwerra in #309
Adding torch to the requirements by @sashavor in #311
Honest space fix by @sashavor in #312
Use HTML relative paths for tiles by @lewtun in #318
Test for valid YAML files by @mathemakitten in #308
add versioning to the HubEvaluationModuleFactory by @lvwerra in #314
Add text2text evaluator by @lvwerra in #261
try main if tag does not work by @lvwerra in #322
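
For readers unsure what "perplexity calculated with base e" (#242) refers to, here is a minimal plain-Python sketch of the general definition. It does not use or reproduce the library's implementation; the `perplexity` function, its `base` parameter, and the sample probabilities are hypothetical and exist only for illustration.

```python
import math


def perplexity(token_probs, base=math.e):
    """Perplexity of a sequence, given the probability assigned to each token.

    Hypothetical helper for illustration only; not part of the evaluate API.
    """
    # Mean negative log-likelihood, with the logarithm taken in `base`.
    nll = -sum(math.log(p, base) for p in token_probs) / len(token_probs)
    # Exponentiate with the same base to undo the logarithm.
    return base ** nll


# Made-up per-token probabilities.
probs = [0.25, 0.5, 0.125]

ppl_e = perplexity(probs)          # log and exponent both in base e
ppl_2 = perplexity(probs, base=2)  # log and exponent both in base 2

# Mixing bases (natural-log loss, base-2 exponent) gives a different number.
nll_nats = -sum(math.log(p) for p in probs) / len(probs)
ppl_mixed = 2 ** nll_nats
```

When the logarithm and the exponent share a base, the result is the same (here both come out at about 4). A mismatch between the two, as in `ppl_mixed`, changes the value, so it matters which base a reported perplexity was computed with.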

New Contributors

@fcakyon made their first contribution in #221
@meg-huggingface made their first contribution in #231
@stevhliu made their first contribution in #236
@kadirnar made their first contribution in #275
@sanderland made their first contribution in #305

Full Changelog: v0.2.2...v0.3.0
