Fix PawsX eval splits #316

imenelydiaker · 2024-04-04T12:49:45Z

Fix for issue #309. Eval splits didn't have the correct name.

KennethEnevoldsen

Can you run one of the smaller models on this just to check that it works without issue?

MartinBernstorff · 2024-04-04T12:58:17Z

Looks good! Agree with Kenneth.

As a meta-question, maybe it's possible to check if the splits exist in the dataset without downloading it? Might be worth it if we get more errors like this.

imenelydiaker · 2024-04-04T13:02:41Z

> Looks good! Agree with Kenneth.
>
> As a meta-question, maybe it's possible to check if the splits exist in the dataset without downloading it? Might be worth it if we get more errors like this.

Yep, I was thinking about adding it to the test that checks whether a dataset + revision ID exists.

KennethEnevoldsen · 2024-04-04T13:10:47Z

> As a meta-question, maybe it's possible to check if the splits exist in the dataset without downloading it? Might be worth it if we get more errors like this.

As I understand there is no API for this (at least documented), but you could probably do something like:

>>> ds = load_dataset("paws-x", "en", revision="8a04d940a42cd40658986fdd8e3da561533a3646", streaming=True)
>>> ds.keys()
dict_keys(['train', 'test', 'validation'])

It only downloads metadata, I believe:

Downloading builder script: 100%|██████████| 6.54k/6.54k [00:00<00:00, 9.68MB/s]
Downloading metadata: 100%|████████████████| 17.6k/17.6k [00:00<00:00, 9.76MB/s]
Downloading readme: 100%|██████████████████| 11.8k/11.8k [00:00<00:00, 7.18MB/s]

Might be a way to generalize the existing tests.
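A minimal sketch of how that check could be turned into a reusable test helper. The function names (`missing_splits`, `streaming_split_names`, `check_eval_splits`) are hypothetical and not part of mteb. The streaming call mirrors the snippet above and is assumed to fetch only the builder script and metadata. It needs network access, and a version of `datasets` that can still load this dataset.

```python
from typing import Callable, Iterable, List


def missing_splits(available: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required split names that are absent from `available`, sorted."""
    return sorted(set(required) - set(available))


def streaming_split_names(path: str, config: str, revision: str) -> List[str]:
    """List split names by opening the dataset in streaming mode."""
    # Imported lazily so the pure helpers above work without `datasets` installed.
    from datasets import load_dataset

    ds = load_dataset(path, config, revision=revision, streaming=True)
    return list(ds.keys())


def check_eval_splits(
    path: str,
    config: str,
    revision: str,
    eval_splits: Iterable[str],
    list_splits: Callable[[str, str, str], List[str]] = streaming_split_names,
) -> None:
    """Raise KeyError if any of `eval_splits` is not a split of the dataset."""
    missing = missing_splits(list_splits(path, config, revision), eval_splits)
    if missing:
        raise KeyError(f"{path}/{config}@{revision}: missing eval splits {missing}")
```

Passing `list_splits` as a parameter is only there so the check can be unit-tested with a stub instead of hitting the Hub. `datasets.get_dataset_split_names` might be worth trying as an alternative lister, though I have not checked how it behaves with a pinned revision here.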

KennethEnevoldsen · 2024-04-04T13:12:08Z

> Yep was thinking about adding it to the test that checks if a dataset + revision id exists

@imenelydiaker this test already exists (based on a recent PR)

imenelydiaker · 2024-04-04T13:17:48Z

> Can you run one of the smaller models on this just to check that it works without issue?

Yep, I used average_word_embeddings_komninos with the mteb CLI and it worked 🙂

imenelydiaker · 2024-04-04T13:26:32Z

> As a meta-question, maybe it's possible to check if the splits exist in the dataset without downloading it? Might be worth it if we get more errors like this.
>
> As I understand there is no API for this (at least documented), but you could probably do something like:
>
> >>> ds = load_dataset("paws-x", "en", revision="8a04d940a42cd40658986fdd8e3da561533a3646", streaming=True)
> >>> ds.keys()
> dict_keys(['train', 'test', 'validation'])
>
> It only download metadata I believe:
>
> Downloading builder script: 100%|██████████| 6.54k/6.54k [00:00<00:00, 9.68MB/s]
> Downloading metadata: 100%|████████████████| 17.6k/17.6k [00:00<00:00, 9.76MB/s]
> Downloading readme: 100%|██████████████████| 11.8k/11.8k [00:00<00:00, 7.18MB/s]
>
> Might be a way to generalize the existing tests.

So you just load the dataset in streaming mode, so that nothing is stored locally, and check the keys of the returned dataset object? Looks like a good alternative 🤔

imenelydiaker · 2024-04-04T14:06:52Z

Should I merge the fix and we'll open another PR for the test?

MartinBernstorff · 2024-04-05T10:07:31Z

Yeah, I'll give it a stab today 👍

Fix PawsX eval splits

d875793

imenelydiaker linked an issue Apr 4, 2024 that may be closed by this pull request

KeyError: 'test.full' #309

Closed

imenelydiaker requested review from MartinBernstorff and KennethEnevoldsen April 4, 2024 12:49

KennethEnevoldsen approved these changes Apr 4, 2024


MartinBernstorff approved these changes Apr 4, 2024


imenelydiaker merged commit 15285d4 into main Apr 5, 2024
5 checks passed

imenelydiaker deleted the fix/309-keyerror-testfull branch April 5, 2024 08:20

MartinBernstorff pushed a commit that referenced this pull request Apr 10, 2024

Fix PawsX eval splits (#316)

f180cc8
