Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding loader for BAF #577

Closed
wants to merge 45 commits into from
Closed

Adding loader for BAF #577

wants to merge 45 commits into from

Conversation

guillemcortes
Copy link
Collaborator

Adding loader for BAF

Description

Please include the following information at the top level docstring for the dataset's module mydataset.py:

  • Describe annotations included in the dataset
  • Indicate the size of the datasets (e.g. number files and duration, hours)
  • Mention the origin of the dataset (e.g. creator, institution)
  • Describe the type of music included in the dataset
  • Indicate any relevant papers related to the dataset
  • Include a description about how the data can be accessed and the license it uses (if applicable)

Dataset loaders checklist:

  • Create a script in scripts/, e.g. make_my_dataset_index.py, which generates an index file.
  • Run the script on the canonical version of the dataset and save the index in mirdata/indexes/ e.g. my_dataset_index.json.
  • Create a module in mirdata, e.g. mirdata/my_dataset.py
  • Create tests for your loader in tests/datasets/, e.g. test_my_dataset.py
  • Add your module to docs/source/mirdata.rst and docs/source/table.rst
  • Run tests/test_full_dataset.py on your dataset.

If your dataset is not fully downloadable there are two extra steps you should follow:

  • Contacting the mirdata organizers by opening an issue or PR so we can discuss how to proceed with the closed dataset. --> Talked to @genisplaja
  • Show that the version used to create the checksum is the "canonical" one, either by getting the version from the dataset creator, or by verifying equivalence with several other copies of the dataset.
  • Make sure someone has run pytest -s tests/test_full_dataset.py --local --dataset my_dataset once on your dataset locally and confirmed it passes

Please-do-not-edit flag

To reduce friction, we will make commits on top of contributor's pull requests by default unless they use the please-do-not-edit flag. If you don't want this to happen don't forget to add the flag when you start your pull request.

@guillemcortes guillemcortes changed the title Adding loader for BAF [WIP] Adding loader for BAF Feb 23, 2023
@guillemcortes guillemcortes changed the title [WIP] Adding loader for BAF Adding loader for BAF Feb 24, 2023
@guillemcortes guillemcortes changed the title Adding loader for BAF [WIP] Adding loader for BAF Feb 24, 2023
@guillemcortes
Copy link
Collaborator Author

I might need some help passing the failing checks.

  • Both buildpy37 and buildpy38 are failing in tests/datasets/test_egfxset.py, a test from another dataset.
  • I've ran black --target-version py38 mirdata/ tests/ but I don't pass formatting check.
    I see that @genisplaja is facing the same issues above in PR-576
  • readthedocs.org:mirdata check fails due to some error when importing pandas. I see that openmic2018 uses pandas, so I'm wondering if this has happened before. This is the Traceback:
WARNING: autodoc: failed to import module 'baf' from module 'mirdata.datasets'; the following exception was raised:
Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/mirdata/envs/577/lib/python3.7/site-packages/sphinx/ext/autodoc/importer.py", line 70, in import_module
    return importlib.import_module(modname)
  File "/home/docs/checkouts/readthedocs.org/user_builds/mirdata/envs/577/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/docs/checkouts/readthedocs.org/user_builds/mirdata/checkouts/577/mirdata/datasets/baf.py", line 120, in <module>
    import pandas as pd
  File "/home/docs/checkouts/readthedocs.org/user_builds/mirdata/envs/577/lib/python3.7/site-packages/pandas/__init__.py", line 22, in <module>
    from pandas.compat import (
  File "/home/docs/checkouts/readthedocs.org/user_builds/mirdata/envs/577/lib/python3.7/site-packages/pandas/compat/__init__.py", line 15, in <module>
    from pandas.compat.numpy import (
  File "/home/docs/checkouts/readthedocs.org/user_builds/mirdata/envs/577/lib/python3.7/site-packages/pandas/compat/numpy/__init__.py", line 7, in <module>
    from pandas.util.version import Version
  File "/home/docs/checkouts/readthedocs.org/user_builds/mirdata/envs/577/lib/python3.7/site-packages/pandas/util/__init__.py", line 1, in <module>
    from pandas.util._decorators import (  # noqa
  File "/home/docs/checkouts/readthedocs.org/user_builds/mirdata/envs/577/lib/python3.7/site-packages/pandas/util/_decorators.py", line 14, in <module>
    from pandas._libs.properties import cache_readonly  # noqa
  File "/home/docs/checkouts/readthedocs.org/user_builds/mirdata/envs/577/lib/python3.7/site-packages/pandas/_libs/__init__.py", line 13, in <module>
    from pandas._libs.interval import Interval
  File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
TypeError: numpy.dtype is not a type object

@dagett
Copy link

dagett commented Feb 25, 2023

About the failing test in test_egfxset.py, looks like it is due to recent librosa release, see #578

@guillemcortes guillemcortes changed the title [WIP] Adding loader for BAF Adding loader for BAF Feb 28, 2023
guillemcortes and others added 2 commits March 7, 2023 09:26
* Fix tox for formatting test

* Pin black version to 23.1.0

* Upgrade librosa version and ensure python3.6 compatibility

* Black formatting with new 23.1.0 version

* Fixing egfxset expected return value

* Mock pandas import at sphinx autodoc

* Fix black version for python3.6
@codecov
Copy link

codecov bot commented Mar 7, 2023

Codecov Report

Merging #577 (0bf86c5) into master (c9fd249) will increase coverage by 0.00%.
The diff coverage is 98.03%.

@@           Coverage Diff            @@
##           master     #577    +/-   ##
========================================
  Coverage   96.85%   96.86%            
========================================
  Files          55       57     +2     
  Lines        6705     6818   +113     
========================================
+ Hits         6494     6604   +110     
- Misses        211      214     +3     

@guillemcortes guillemcortes closed this by deleting the head repository Mar 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants