docs: document schema for datasets in tasks #250

Closed
MartinBernstorff opened this issue Mar 15, 2024 · 3 comments · Fixed by #255

MartinBernstorff commented Mar 15, 2024

When I added #247, it was not obvious to me which columns a dataset should contain.

I propose documenting this at the AbsTask level, e.g. for classification something like:

class AbsTaskClassification(AbsTask):
    """
    Abstract class for kNN classification tasks
    The similarity is computed between pairs and the results are ranked.

    Dataset must be a Hugging Face dataset split into train/test and contain the following columns:
        text: str
        label: int
    """
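
To make the proposed schema concrete, here is a minimal sketch of a check for it. The helper name `validate_classification_rows` and the two constants are hypothetical and not part of the library; plain dicts and lists stand in for a Hugging Face dataset so the example runs without extra dependencies. The required splits and columns are taken directly from the docstring above.

```python
from typing import List, Mapping, Sequence

# Assumption: these mirror the docstring proposal (train/test splits, text: str, label: int).
REQUIRED_SPLITS = ("train", "test")
REQUIRED_COLUMNS = {"text": str, "label": int}


def validate_classification_rows(
    splits: Mapping[str, Sequence[Mapping[str, object]]]
) -> List[str]:
    """Return a list of readable problems; an empty list means the data fits the schema."""
    problems = []
    for split in REQUIRED_SPLITS:
        if split not in splits:
            problems.append(f"missing split: {split}")
            continue
        # Check every row of the split for the required columns and their types.
        for i, row in enumerate(splits[split]):
            for column, expected_type in REQUIRED_COLUMNS.items():
                if column not in row:
                    problems.append(f"{split}[{i}]: missing column {column!r}")
                elif not isinstance(row[column], expected_type):
                    problems.append(
                        f"{split}[{i}]: {column!r} should be {expected_type.__name__}"
                    )
    return problems


# Usage: a toy dataset that satisfies the schema yields no problems.
toy = {
    "train": [{"text": "great movie", "label": 1}],
    "test": [{"text": "terrible plot", "label": 0}],
}
print(validate_classification_rows(toy))
```

A check like this reports every violation at once rather than failing on the first one, which may be more helpful to someone adding a new dataset.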

imenelydiaker commented Mar 15, 2024

I completely agree with you! We ran into the same issue when creating the benchmark for French. Maybe we should add a docstring like this one for each task type 🤔

MartinBernstorff commented Mar 15, 2024

I'd gladly take on part of this, by the way 👍

KennethEnevoldsen commented Mar 17, 2024

Perfect, @MartinBernstorff, I'll assign it to you. Feel free to add me as the reviewer.
