[WIP] OpenMIC2018 #544

bmcfee · 2021-11-09T16:24:11Z

This PR adds a dataset class for openmic2018. Fixes #253

The basic functionality is there, but there are a few things left to sort out:

1. How should we handle the individual (anonymized, disaggregated) annotations?
2. JAMS conversion is not done yet. This is currently blocked by Issues with tag_to_jams #541
3. More tests (?) -- the basics are here, but I wanted to get feedback on what's here so far before developing further.

bmcfee · 2021-11-09T17:28:48Z

I think I don't understand why the loader tests are failing / what the expected behavior is here:

Lines 323 to 355 in e591c54

    
    def test_track_placeholder_case():
        data_home_dir = "not/a/real/path"
        for dataset_name in DATASETS:
            print(dataset_name)
            data_home = os.path.join(data_home_dir, dataset_name)
            dataset = mirdata.initialize(dataset_name, data_home, version="test")
            if not dataset._track_class:
                continue
            if dataset_name in CUSTOM_TEST_TRACKS:
                trackid = CUSTOM_TEST_TRACKS[dataset_name]
            else:
                trackid = dataset.track_ids[0]
            try:
                track_test = dataset.track(trackid)
            except:
                assert False, "{}: {}".format(dataset_name, sys.exc_info()[0])
            track_data = get_attributes_and_properties(track_test)
            for attr in track_data["attributes"]:
                ret = getattr(track_test, attr)
            for prop in track_data["properties"]:
                with pytest.raises(Exception):
                    ret = getattr(track_test, prop)
            for cprop in track_data["cached_properties"]:
                with pytest.raises(Exception):
                    ret = getattr(track_test, cprop)

(Incidentally, this would be a bit cleaner using a fixture to iterate over the datasets, rather than sticking them all in a loop inside the test. You could cut down on prints this way and isolate failures.)
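For illustration, here is a rough sketch of the parametrized form I have in mind. It is not the actual mirdata test: the `DATASETS` list and the `placeholder_home` helper below are stand-ins I made up so the snippet runs on its own, and I'm using `pytest.mark.parametrize` (a fixture with `params` would work similarly).

```python
import os

import pytest

# Hypothetical stand-in for the DATASETS list defined in the real test module.
DATASETS = ["openmic2018", "some_other_dataset"]


def placeholder_home(dataset_name, root="not/a/real/path"):
    """Build the deliberately nonexistent data_home for one dataset."""
    return os.path.join(root, dataset_name)


@pytest.mark.parametrize("dataset_name", DATASETS)
def test_placeholder_home(dataset_name):
    # pytest generates one test case per dataset, so a failure is reported
    # under that dataset's name and does not stop the remaining cases.
    data_home = placeholder_home(dataset_name)
    assert not os.path.exists(data_home)
    assert os.path.basename(data_home) == dataset_name
```

With this layout the `print(dataset_name)` call becomes unnecessary, since the dataset name shows up in the test ID of each generated case.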

bmcfee · 2021-11-10T12:00:14Z

Still not making any headway on the loader tests. The placeholder test fails because it's trying to load metadata from a path that doesn't exist:

    @core.cached_property
    def _metadata(self):
        metadata_path = Path(self.data_home) / "openmic-2018-metadata.csv"
    
        try:
            with open(metadata_path, "r") as fdesc:
                # index column is second to last
                metadata = pd.read_csv(fdesc, index_col=-2)
        except FileNotFoundError as exc:
>           raise FileNotFoundError(
                f"Metadata file {metadata_path} not found. " "Did you run .download?"
            ) from exc
E           FileNotFoundError: Metadata file not/a/real/path/openmic2018/openmic-2018-metadata.csv not found. Did you run .download?

This seems to me like exactly what should happen? Clearly I'm missing something here: why is this expected to work?

bmcfee · 2021-11-24T14:47:37Z

Bumping this one -- lil help?

genisplaja · 2021-11-25T09:59:05Z

Hello @bmcfee! The problem is in mirdata/datasets/openmic2018.py, lines 159-160:

# -- set the split
self.split = self._track_metadata.get("split")

The Track class loads the metadata file in its init, so constructing a track fails when the file is missing. The issue could be fixed by converting this self.split attribute into a track property, just as artist and title are defined in your loader. Hope this helps!
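As a minimal sketch of the difference (not the actual openmic2018 loader: the class names, the `_load_split` helper, and the `metadata.csv` file name are all hypothetical), compare an attribute set in init with a property evaluated on access:

```python
from pathlib import Path


def _load_split(data_home, track_id):
    """Illustrative loader: look up a track's split in a two-column CSV."""
    metadata_path = Path(data_home) / "metadata.csv"  # hypothetical file name
    with open(metadata_path, "r") as fdesc:
        for line in fdesc:
            key, split = line.strip().split(",")
            if key == track_id:
                return split
    return None


class EagerTrack:
    """Hypothetical track that reads metadata in __init__ (the failing pattern)."""

    def __init__(self, track_id, data_home):
        self.track_id = track_id
        self.data_home = data_home
        # Touches the disk immediately, so construction fails without data.
        self.split = _load_split(data_home, track_id)


class LazyTrack:
    """Hypothetical track that defers the metadata read until first access."""

    def __init__(self, track_id, data_home):
        self.track_id = track_id
        self.data_home = data_home

    @property
    def split(self):
        # Only evaluated when a caller asks for the split.
        return _load_split(self.data_home, self.track_id)
```

With a nonexistent `data_home`, `LazyTrack("t1", "not/a/real/path")` can be constructed and only raises `FileNotFoundError` once `.split` is read, while `EagerTrack` raises during construction. That matches what the placeholder test appears to check: the track object can be built without data on disk, and accessing its properties raises.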

bmcfee · 2021-11-25T12:58:39Z

Thanks @genisplaja, that's easy enough to fix. I still don't understand what this test is trying to accomplish though: it seems like it ought to be an intentional failure.

codecov · 2021-11-25T15:41:40Z

Codecov Report

Merging #544 (c125594) into master (c4759b4) will increase coverage by 0.02%.
The diff coverage is 97.91%.

@@            Coverage Diff             @@
##           master     #544      +/-   ##
==========================================
+ Coverage   96.59%   96.61%   +0.02%     
==========================================
  Files          48       49       +1     
  Lines        5908     6004      +96     
==========================================
+ Hits         5707     5801      +94     
- Misses        201      203       +2

bmcfee · 2022-04-17T14:51:53Z

Might be a good time to revive this one.. is there anything that needs doing from my side here?

bmcfee · 2022-06-08T12:20:17Z

Attempting to bump this one again

bmcfee · 2022-07-07T15:52:50Z

nkundiushuti

sorry for the delay, Brian. this looks good. I have just added some small questions regarding the way you construct the paths.

mirdata/datasets/openmic2018.py

scripts/make_openmic2018_index.py

setup.py

bmcfee · 2022-07-18T14:44:40Z

CI failures here seem to be unrelated to this PR. I've fetched upstream and rebased, so not sure what's going on here (unless master is broken).

nkundiushuti · 2022-07-19T14:32:46Z

there is an error in one of the existing tests in the master branch indeed. I will look into it this week.

nkundiushuti · 2022-07-22T08:57:35Z

I have fixed the problem on the main branch, so you can update this PR from master.

nkundiushuti · 2022-08-08T09:11:40Z

if no one objects, I propose merging this into the main branch. pandas is already installed as a dependency of jams.

bmcfee · 2022-09-21T17:22:49Z

Thanks for merging this - I do think there are still some unresolved issues to clear up as noted in my original post:

1. How should we handle the individual (anonymized, disaggregated) annotations?
2. JAMS conversion is not done yet. This is currently blocked by Issues with tag_to_jams #541
3. More tests (?) -- the basics are here, but I wanted to get feedback on what's here so far before developing further.

nkundiushuti · 2022-09-22T08:45:58Z

hi Brian!
could you add 1 and 3 as issues so we keep track of them?

bmcfee · 2022-09-28T19:18:53Z

@nkundiushuti to be honest, this PR languished for so long that I don't have a clear recollection of the issues.

For (3), I don't think I ever fully understood what the tests were, or what else could/should be added. I think this should be more on the maintainers to decide rather than contributors.

For (1), the question is pretty well described in #253 - I'd suggest re-opening that issue until it's entirely resolved. There are also some other finicky points in #253 that I raised for discussion, but never received any feedback on (e.g. erroring out when users request random splits).

nkundiushuti self-requested a review July 18, 2022 12:30

nkundiushuti requested changes Jul 18, 2022

PRamoneda reviewed Jul 18, 2022

bmcfee added 15 commits July 22, 2022 07:08

e0bdb97 added openmic2018 indexer
75ddb49 basic openmic dataclass. tracks still need much work.
7b8b255 promoted pandas to a proper dependency
490e820 blacked openmic2018, added some features
8fe76d6 building tests for openmic2018
f2ffda5 building tests for openmic2018
32f7e74 generalize openmic to multiple splits going forward
fa5641a autodocced openmic track class
13c264c blacked
6e8641b fixing some naming inconsistencies, use proper open wrapper
23fbc78 mypy is mysterious
e6e01ee blacking
b2cc3ec added exception chaining and download message to openmic
c72326f Trying openmic split as a property
a73ba9e fixed a typo in a docstring
c125594 simplified partial annotation filter

bmcfee force-pushed the openmic branch from 73710f8 to c125594 Compare July 22, 2022 11:08

nkundiushuti approved these changes Sep 21, 2022

nkundiushuti merged commit 8db5795 into mir-dataset-loaders:master Sep 21, 2022
