Add support for NoiseBench #3512

Merged on Dec 19, 2024 (14 commits)
raise ValueError and list supported noise types in the message
elenamer committed Dec 13, 2024
commit 1893d65c067857d375b99eeba0d2ce134f5c9b1b
7 changes: 5 additions & 2 deletions flair/datasets/sequence_labeling.py
@@ -5251,9 +5251,12 @@ def __init__(
in_memory (bool): If True the dataset is kept in memory achieving speedups in training.
**corpusargs: The arguments propagated to :meth:'flair.datasets.ColumnCorpus.__init__'.
"""
-if noise not in ["clean", "crowd", "crowdbest", "expert", "distant", "weak", "llm"]:
-    raise Exception("Please choose a valid version")

+VALUE_NOISE_VALUES = ["clean", "crowd", "crowdbest", "expert", "distant", "weak", "llm"]
+
+if noise not in VALUE_NOISE_VALUES:
+    raise ValueError(f"Unsupported value for noise type argument. Got {noise}, expected one of {VALUE_NOISE_VALUES}!")

self._set_path(base_path)

I would remove this function and just set the base_path here in __init__:

if base_path:
    self.base_path = Path(base_path)
else:
    self.base_path = flair.cache_root / "datasets" / "noisebench"
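
For illustration, the suggestion could be sketched as a standalone function. This is only a sketch under assumptions: `resolve_base_path` and `CACHE_ROOT` are hypothetical names that are not part of the PR, and `CACHE_ROOT` stands in for `flair.cache_root` so the snippet runs without flair installed.

```python
from pathlib import Path
from typing import Optional, Union

# Hypothetical stand-in for flair.cache_root (assumption, not the real flair value).
CACHE_ROOT = Path.home() / ".flair"


def resolve_base_path(base_path: Optional[Union[str, Path]] = None) -> Path:
    """Return the user-supplied path, or fall back to the default cache location."""
    if base_path:
        # An explicit path wins; normalise str input to a Path object.
        return Path(base_path)
    # None or an empty string falls through to the default dataset folder.
    return CACHE_ROOT / "datasets" / "noisebench"
```

Inlined in `__init__`, the same branch would assign to `self.base_path` directly, which is why a separate `_set_path` method may not be needed.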


filename = "clean" if noise == "clean" else f"noise_{noise}"
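
To see how the validation in this commit and the filename line above fit together, here is a minimal self-contained sketch. The names `SUPPORTED_NOISE_TYPES` and `noise_filename` are hypothetical and chosen for illustration only; the list of noise types and the filename rule are taken from the diff.

```python
# Noise types accepted by the check in this commit (copied from the diff).
SUPPORTED_NOISE_TYPES = ("clean", "crowd", "crowdbest", "expert", "distant", "weak", "llm")


def noise_filename(noise: str) -> str:
    """Validate the noise type, then map it to the filename stem used for the data files."""
    if noise not in SUPPORTED_NOISE_TYPES:
        # ValueError signals a bad argument value and tells the caller what is allowed.
        raise ValueError(
            f"Unsupported value for noise type argument. "
            f"Got {noise}, expected one of {SUPPORTED_NOISE_TYPES}!"
        )
    # The clean variant has no prefix; every noisy variant is named noise_<type>.
    return "clean" if noise == "clean" else f"noise_{noise}"
```

Raising `ValueError` instead of a bare `Exception` lets callers catch the invalid-argument case specifically, and listing the supported values in the message spares them a trip to the source.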