Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] OpenMIC2018 #544

Merged
merged 16 commits into from
Sep 21, 2022
Merged

[WIP] OpenMIC2018 #544

merged 16 commits into from
Sep 21, 2022

Conversation

bmcfee
Copy link
Contributor

@bmcfee bmcfee commented Nov 9, 2021

This PR adds a dataset class for openmic2018. Fixes #253

The basic functionality is there, but there are a few things left to sort out:

  1. How should we handle the individual (anonymized, disaggregated) annotations?
  2. JAMS conversion is not done yet. This is currently blocked by Issues with tag_to_jams #541
  3. More tests (?) -- the basics are here, but I wanted to get feedback on what's here so far before developing further.

@bmcfee
Copy link
Contributor Author

bmcfee commented Nov 9, 2021

I think I don't understand why the loader tests are failing / what the expected behavior is here:

def test_track_placeholder_case():
data_home_dir = "not/a/real/path"
for dataset_name in DATASETS:
print(dataset_name)
data_home = os.path.join(data_home_dir, dataset_name)
dataset = mirdata.initialize(dataset_name, data_home, version="test")
if not dataset._track_class:
continue
if dataset_name in CUSTOM_TEST_TRACKS:
trackid = CUSTOM_TEST_TRACKS[dataset_name]
else:
trackid = dataset.track_ids[0]
try:
track_test = dataset.track(trackid)
except:
assert False, "{}: {}".format(dataset_name, sys.exc_info()[0])
track_data = get_attributes_and_properties(track_test)
for attr in track_data["attributes"]:
ret = getattr(track_test, attr)
for prop in track_data["properties"]:
with pytest.raises(Exception):
ret = getattr(track_test, prop)
for cprop in track_data["cached_properties"]:
with pytest.raises(Exception):
ret = getattr(track_test, cprop)

(Incidentally, this would be a bit cleaner using a fixture to iterate over the datasets, rather than sticking them all in a loop inside the test. You could cut down on prints this way and isolate failures.)

@bmcfee
Copy link
Contributor Author

bmcfee commented Nov 10, 2021

Still not making any headway on the loader tests. The placeholder test fails because it's trying to load metadata from a path that doesn't exist:

   @core.cached_property
    def _metadata(self):
        metadata_path = Path(self.data_home) / "openmic-2018-metadata.csv"
    
        try:
            with open(metadata_path, "r") as fdesc:
                # index column is second to last
                metadata = pd.read_csv(fdesc, index_col=-2)
        except FileNotFoundError as exc:
>           raise FileNotFoundError(
                f"Metadata file {metadata_path} not found. " "Did you run .download?"
            ) from exc
E           FileNotFoundError: Metadata file not/a/real/path/openmic2018/openmic-2018-metadata.csv not found. Did you run .download?

This seems to me like exactly what should happen? Clearly I'm missing something here: why is this expected to work?

@bmcfee
Copy link
Contributor Author

bmcfee commented Nov 24, 2021

Bumping this one -- lil help?

@genisplaja
Copy link
Collaborator

genisplaja commented Nov 25, 2021

Hello @bmcfee! The problem is in mirdata/datasets/openmic2018.py, lines 159-160:

# -- set the split
self.split = self._track_metadata.get("split") 

The problem is in loading the metadata file in the Track class init. The issue could be fixed by converting this self.split attribute into a track property just as artist or title in your loader are defined. Hope this helps!

@bmcfee
Copy link
Contributor Author

bmcfee commented Nov 25, 2021

Thanks @genisplaja, that's easy enough to fix. I still don't understand what this test is trying to accomplish though: it seems like it ought to be an intentional failure.

@codecov
Copy link

codecov bot commented Nov 25, 2021

Codecov Report

Merging #544 (c125594) into master (c4759b4) will increase coverage by 0.02%.
The diff coverage is 97.91%.

@@            Coverage Diff             @@
##           master     #544      +/-   ##
==========================================
+ Coverage   96.59%   96.61%   +0.02%     
==========================================
  Files          48       49       +1     
  Lines        5908     6004      +96     
==========================================
+ Hits         5707     5801      +94     
- Misses        201      203       +2     

@bmcfee
Copy link
Contributor Author

bmcfee commented Apr 17, 2022

Might be a good time to revive this one.. is there anything that needs doing from my side here?

@bmcfee
Copy link
Contributor Author

bmcfee commented Jun 8, 2022

Attempting to bump this one again

@bmcfee
Copy link
Contributor Author

bmcfee commented Jul 7, 2022

image

@nkundiushuti nkundiushuti self-requested a review July 18, 2022 12:30
Copy link
Collaborator

@nkundiushuti nkundiushuti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the delay, Brian. this looks good. I have just added some small questions regarding the way you construct the paths.

mirdata/datasets/openmic2018.py Outdated Show resolved Hide resolved
mirdata/datasets/openmic2018.py Outdated Show resolved Hide resolved
mirdata/datasets/openmic2018.py Show resolved Hide resolved
mirdata/datasets/openmic2018.py Outdated Show resolved Hide resolved
mirdata/datasets/openmic2018.py Show resolved Hide resolved
mirdata/datasets/openmic2018.py Show resolved Hide resolved
mirdata/datasets/openmic2018.py Show resolved Hide resolved
mirdata/datasets/openmic2018.py Show resolved Hide resolved
scripts/make_openmic2018_index.py Show resolved Hide resolved
setup.py Show resolved Hide resolved
@bmcfee
Copy link
Contributor Author

bmcfee commented Jul 18, 2022

CI failures here seem to be unrelated to this PR. I've fetched upstream and rebased, so not sure what's going on here (unless master is broken).

@nkundiushuti
Copy link
Collaborator

there is an error in one of the existing tests in the master branch indeed. I will look into it this week.

@nkundiushuti
Copy link
Collaborator

I have fixed the problem on the main branch. so you can update this PR with the master

@nkundiushuti
Copy link
Collaborator

if no one objects, I propose merging this into main branch. pandas is already installed as a dependency of jams

@nkundiushuti nkundiushuti merged commit 8db5795 into mir-dataset-loaders:master Sep 21, 2022
@bmcfee
Copy link
Contributor Author

bmcfee commented Sep 21, 2022

Thanks for merging this - I do think there are still some unresolved issues to clear up as noted in my original post:

  1. How should we handle the individual (anonymized, disaggregated) annotations?
  2. JAMS conversion is not done yet. This is currently blocked by Issues with tag_to_jams #541
  3. More tests (?) -- the basics are here, but I wanted to get feedback on what's here so far before developing further.

@nkundiushuti
Copy link
Collaborator

hi Brian!
could you add 1 and 3 as issues so we keep track of them?

@bmcfee
Copy link
Contributor Author

bmcfee commented Sep 28, 2022

@nkundiushuti to be honest, this PR languished for so long that I don't have a clear recollection of the issues.

For (3), I don't think I ever fully understood what the tests were, or what else could/should be added. I think this should be more on the maintainers to decide rather than contributors.

For (1), the question is pretty well described in #253 - I'd suggest re-opening that issue until it's entirely resolved. There are also some other finnicky points in #253 that I raised for discussion, but never received any feedback on (eg erroring out when users request random splits).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OpenMic2018
4 participants