Version 0.5.0
Labbeti committed Jan 5, 2024
1 parent c9312b5 commit 781b759
Showing 55 changed files with 3,655 additions and 5,421 deletions.
34 changes: 29 additions & 5 deletions .github/workflows/python-package-pip.yaml
@@ -10,6 +10,7 @@ on:

env:
CACHE_NUMBER: 0 # increase to reset cache manually
AAC_DATASETS_ROOT: "$HOME/.cache/data"

# Cancel workflow if a new push occurs
concurrency:
@@ -23,7 +24,7 @@ jobs:
strategy:
matrix:
os: [ubuntu-latest]
python-version: ["3.7", "3.8", "3.9", "3.10", "3.11"]
python-version: ["3.7", "3.11"]

steps:
# --- INSTALLATIONS ---
@@ -43,10 +44,13 @@ jobs:
run: |
python -m pip install "aac-datasets[dev] @ git+https://github.com/Labbeti/aac-datasets@${GITHUB_REF##*/}"
- name: Install soundfile for torchaudio
- name: Install soundfile for torchaudio, ffmpeg and yt-dlp for AudioCaps download
run: |
# For soundfile dep
sudo add-apt-repository ppa:tomtomtom/yt-dlp # Add ppa repo to apt
sudo apt-get update
sudo apt-get install libsndfile1
sudo apt-get install ffmpeg
sudo apt-get install yt-dlp
# --- TESTS ---
- name: Compile python files
@@ -60,11 +64,31 @@ jobs:
- name: Check format with Black
run: |
python -m black --check --diff src
- name: Print install info
run: |
aac-datasets-info
ffmpeg -version
yt-dlp --version
- name: Test with pytest
run: |
python -m pytest -v
- name: Build data root
run: |
dataroot=`eval echo $AAC_DATASETS_ROOT`
echo "Building directory '$dataroot'..."
mkdir -p "$dataroot"
- name: Try to download Clotho val
run: |
aac-datasets-download --verbose 2 clotho --subsets val
- name: Try to download AudioCaps val
run: |
aac-datasets-download --verbose 2 audiocaps --subsets val --max_workers none --with_tags true
- name: Check data root
run: |
aac-datasets-check --verbose 2 --datasets clotho audiocaps
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,14 @@
# Change log

All notable changes to this project will be documented in this file.

## [0.5.0] 2024-01-05
### Changed
- Update typing for paths with python class `Path`.
- Refactor functional interface to load raw metadata for each dataset.
- Refactor class variables to init arguments.
- Faster AudioCaps download with `ThreadPoolExecutor`.
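
The last changelog entry mentions `ThreadPoolExecutor`. As a rough illustration of that general pattern (not the package's actual implementation), the sketch below runs an I/O-bound download function over many clips in a thread pool. The names `download_all` and `fake_download` are hypothetical and exist only for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional


def download_all(
    items: Iterable[str],
    download_one: Callable[[str], bool],
    max_workers: Optional[int] = None,
) -> Dict[str, bool]:
    """Run download_one on each item in a thread pool; report success per item."""
    results: Dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit one task per item and remember which future belongs to which item.
        futures = {executor.submit(download_one, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception:
                # One failed clip should not abort the remaining downloads.
                results[item] = False
    return results


def fake_download(clip_id: str) -> bool:
    # Hypothetical stand-in for an I/O-bound call such as a yt-dlp/ffmpeg subprocess.
    if clip_id == "broken":
        raise RuntimeError("clip unavailable")
    return True


if __name__ == "__main__":
    print(download_all(["a", "b", "broken"], fake_download, max_workers=2))
```

Threads suit this kind of work because each task mostly waits on the network or on an external process, so the interpreter lock is rarely the bottleneck. Passing `max_workers=None` lets the standard library choose a default pool size.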

## [0.4.1] 2023-10-25
### Added
- `AudioCaps.DOWNLOAD_AUDIO` class variable for compatibility with [audiocaps-download 1.0](https://github.com/MorenoLaQuatra/audiocaps-download).
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -22,5 +22,5 @@ keywords:
- captioning
- audio-captioning
license: MIT
version: 0.4.1
date-released: '2023-10-25'
version: 0.5.0
date-released: '2024-01-05'
82 changes: 47 additions & 35 deletions README.md
@@ -56,37 +56,55 @@ for batch in dataloader:
...
```

## Datasets stats
## Download datasets
To download a dataset, you can use the `download` argument when constructing the dataset:
```python
dataset = Clotho(root=".", subset="dev", download=True)
```
If you want to download datasets from a script instead, you can use the following command:
```bash
aac-datasets-download --root "." clotho --subsets "dev"
```

## Datasets information
Here are the statistics for each dataset:

| | AudioCaps | Clotho | MACS | WavCaps |
<!-- | | AudioCaps | Clotho | MACS | WavCaps |
|:---:|:---:|:---:|:---:|:---:|
| Subsets | train, val, test | dev, val, eval, dcase_aac_test, dcase_aac_analysis, dcase_t2a_audio, dcase_t2a_captions | full | as, as_noac, bbc, fsd, fsd_nocl, sb |
| Subsets | `train`, `val`, `test` | `dev`, `val`, `eval`, `dcase_aac_test`, `dcase_aac_analysis`, `dcase_t2a_audio`, `dcase_t2a_captions` | `full` | `as`, `as_noac`, `bbc`, `fsd`, `fsd_nocl`, `sb` |
| Sample rate (kHz) | 32 | 44.1 | 48 | 32 |
| Estimated size (GB) | 43 | 53 | 13 | 941 |
| Audio source | AudioSet | FreeSound | TAU Urban Acoustic Scenes 2019 | AudioSet, BBC Sound Effects, FreeSound, SoundBible |
| Audio source | AudioSet | FreeSound | TAU Urban Acoustic Scenes 2019 | AudioSet, BBC Sound Effects, FreeSound, SoundBible | -->

For Clotho, the dev subset should be used for training, val for validation and eval for testing.
| Dataset | Sampling<br>rate (kHz) | Estimated<br>size (GB) | Source | Subsets |
|:---:|:---:|:---:|:---:|:---:|
| AudioCaps | 32 | 43 | AudioSet | `train`<br>`val`<br>`test`<br>`train_v2` |
| Clotho | 44.1 | 53 | Freesound | `dev`<br>`val`<br>`eval`<br>`dcase_aac_test`<br>`dcase_aac_analysis`<br>`dcase_t2a_audio`<br>`dcase_t2a_captions` |
| MACS | 48 | 13 | TAU Urban Acoustic Scenes 2019 | `full` |
| WavCaps | 32 | 941 | AudioSet<br>BBC Sound Effects<br>FreeSound<br>SoundBible | `as`<br>`as_noac`<br>`bbc`<br>`fsd`<br>`fsd_nocl`<br>`sb` |

For Clotho, the **dev** subset should be used for training, val for validation and eval for testing.

Here are the **train** subset statistics for the AudioCaps, Clotho and MACS datasets:
Here are additional statistics on the train subset for AudioCaps, Clotho and MACS:

| | AudioCaps/train | Clotho/dev | MACS/full |
|:---:|:---:|:---:|:---:|
| Nb audios | 49,838 | 3,840 | 3,930 |
| Total audio duration (h) | 136.6<sup>1</sup> | 24.0 | 10.9 |
| Audio duration range (s) | 0.5-10 | 15-30 | 10 |
| Nb captions per audio | 1 | 5 | 2-5 |
| Nb captions | 49,838 | 19,195 | 17,275 |
| Total nb words<sup>2</sup> | 402,482 | 217,362 | 160,006 |
| Sentence size<sup>2</sup> | 2-52 | 8-20 | 5-40 |
| | AudioCaps/train | Clotho/dev | MACS/full | WavCaps/full |
|:---:|:---:|:---:|:---:|:---:|
| Nb audios | 49,838 | 3,840 | 3,930 | 403,050 |
| Total audio duration (h) | 136.6<sup>1</sup> | 24.0 | 10.9 | 7563.3 |
| Audio duration range (s) | 0.5-10 | 15-30 | 10 | 1-67,109 |
| Nb captions per audio | 1 | 5 | 2-5 | 1 |
| Nb captions | 49,838 | 19,195 | 17,275 | 403,050 |
| Total nb words<sup>2</sup> | 402,482 | 217,362 | 160,006 | 3,161,823 |
| Sentence size<sup>2</sup> | 2-52 | 8-20 | 5-40 | 2-38 |
| Vocabulary<sup>2</sup> | 4724 | 4369 | 2721 | 24600 |

<sup>1</sup> This duration is estimated on the total duration of 46230/49838 files of 126.7h.

<sup>2</sup> The sentences are cleaned (lowercase+remove punctuation) and tokenized using the spacy tokenizer to count the words.
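
The two footnotes above can be sketched in a few lines of Python. This is only an approximation under stated assumptions: it splits on whitespace rather than using the spaCy tokenizer mentioned in the footnote, and it assumes the AudioCaps duration was extrapolated linearly from the measured files. The function names are hypothetical.

```python
import string
from typing import Dict, List


def clean(sentence: str) -> str:
    """Lowercase a sentence and strip ASCII punctuation."""
    return sentence.lower().translate(str.maketrans("", "", string.punctuation))


def caption_stats(captions: List[str]) -> Dict[str, int]:
    # Assumption: whitespace splitting stands in for the spaCy tokenizer.
    tokenized = [clean(caption).split() for caption in captions]
    lengths = [len(tokens) for tokens in tokenized]
    vocabulary = {word for tokens in tokenized for word in tokens}
    return {
        "total_words": sum(lengths),
        "min_size": min(lengths),
        "max_size": max(lengths),
        "vocabulary": len(vocabulary),
    }


def extrapolate_hours(measured_hours: float, n_measured: int, n_total: int) -> float:
    """Scale a duration measured on part of the files up to the full file count."""
    return measured_hours * n_total / n_measured


if __name__ == "__main__":
    print(caption_stats(["A dog barks.", "The dog is barking, loudly!"]))
    print(round(extrapolate_hours(126.7, 46230, 49838), 1))
```

With the figures from the first footnote, the linear extrapolation rounds to the 136.6 h reported in the table. Exact word counts will differ from the table wherever spaCy splits tokens differently from plain whitespace.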

## Requirements

This package has been developed for Ubuntu 20.04, and it is expected to work on most Linux distributions.
This package has been developed for Ubuntu 20.04, and it is expected to work on most Linux-based distributions.
### Python packages

Python requirements are automatically installed when using pip on this repository.
@@ -104,7 +122,7 @@ numpy >= 1.21.2

The external requirements needed to download **AudioCaps** are **ffmpeg** and **yt-dlp**.
**ffmpeg** can be installed on Ubuntu using `sudo apt install ffmpeg`, and **yt-dlp** can be installed from the [official repo](https://github.com/yt-dlp/yt-dlp).
<!-- programs can be downloaded on Ubuntu using `sudo apt install ffmpeg`. -->
<!-- programs can be downloaded on Ubuntu using `sudo apt install ffmpeg`. -->

You can also override their paths for AudioCaps:
```python
@@ -116,16 +134,6 @@ dataset = AudioCaps(
)
```

## Download datasets
To download a dataset, you can use the `download` argument when constructing the dataset:
```python
dataset = Clotho(root=".", subset="dev", download=True)
```
If you want to download datasets from a script instead, you can use the following command:
```bash
aac-datasets-download --root "." clotho --subsets "dev"
```

## Additional information
### Compatibility with audiocaps-download
If you want to use [audiocaps-download 1.0](https://github.com/MorenoLaQuatra/audiocaps-download) package to download AudioCaps, you will have to respect the AudioCaps folder tree:
@@ -139,9 +147,13 @@ downloader.download(format="wav")
Then disable audio download and set the correct audio format before initializing AudioCaps:
```python
from aac_datasets import AudioCaps
AudioCaps.AUDIO_FORMAT = "wav"
AudioCaps.DOWNLOAD_AUDIO = False # this will only download labels and metadata files
dataset = AudioCaps(root=root, subset="train", download=True)
dataset = AudioCaps(
root=root,
subset="train",
download=True,
audio_format="wav",
download_audio=False, # this will only download labels and metadata files
)
```

## References
@@ -155,21 +167,21 @@ dataset = AudioCaps(root=root, subset="train", download=True)
[3] F. Font, A. Mesaros, D. P. W. Ellis, E. Fonseca, M. Fuentes, and B. Elizalde, Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2021). Barcelona, Spain: Music Technology Group - Universitat Pompeu Fabra, Nov. 2021. Available: https://doi.org/10.5281/zenodo.5770113

#### WavCaps
[1] X. Mei et al., “WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research,” arXiv preprint arXiv:2303.17395, 2023, [Online]. Available: https://arxiv.org/pdf/2303.17395.pdf
[4] X. Mei et al., “WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research,” arXiv preprint arXiv:2303.17395, 2023, [Online]. Available: https://arxiv.org/pdf/2303.17395.pdf

## Cite the aac-datasets package
If you use this software, please consider citing it as "Labbe, E. (2013). aac-datasets: Audio Captioning datasets for PyTorch.", or use the following BibTeX citation:

```
@software{
Labbe_aac_datasets_2023,
Labbe_aac_datasets_2024,
author = {Labbé, Etienne},
license = {MIT},
month = {10},
month = {01},
title = {{aac-datasets}},
url = {https://github.com/Labbeti/aac-datasets/},
version = {0.4.1},
year = {2023}
version = {0.5.0},
year = {2024}
}
```

File renamed without changes.
File renamed without changes.
File renamed without changes.
7 changes: 7 additions & 0 deletions docs/aac_datasets.datasets.functional.audiocaps.rst
@@ -0,0 +1,7 @@
aac\_datasets.datasets.functional.audiocaps module
==================================================

.. automodule:: aac_datasets.datasets.functional.audiocaps
:members:
:undoc-members:
:show-inheritance:
7 changes: 7 additions & 0 deletions docs/aac_datasets.datasets.functional.clotho.rst
@@ -0,0 +1,7 @@
aac\_datasets.datasets.functional.clotho module
===============================================

.. automodule:: aac_datasets.datasets.functional.clotho
:members:
:undoc-members:
:show-inheritance:
7 changes: 7 additions & 0 deletions docs/aac_datasets.datasets.functional.common.rst
@@ -0,0 +1,7 @@
aac\_datasets.datasets.functional.common module
===============================================

.. automodule:: aac_datasets.datasets.functional.common
:members:
:undoc-members:
:show-inheritance:
7 changes: 7 additions & 0 deletions docs/aac_datasets.datasets.functional.macs.rst
@@ -0,0 +1,7 @@
aac\_datasets.datasets.functional.macs module
=============================================

.. automodule:: aac_datasets.datasets.functional.macs
:members:
:undoc-members:
:show-inheritance:
19 changes: 19 additions & 0 deletions docs/aac_datasets.datasets.functional.rst
@@ -0,0 +1,19 @@
aac\_datasets.datasets.functional package
=========================================

.. automodule:: aac_datasets.datasets.functional
:members:
:undoc-members:
:show-inheritance:

Submodules
----------

.. toctree::
:maxdepth: 4

aac_datasets.datasets.functional.audiocaps
aac_datasets.datasets.functional.clotho
aac_datasets.datasets.functional.common
aac_datasets.datasets.functional.macs
aac_datasets.datasets.functional.wavcaps
7 changes: 7 additions & 0 deletions docs/aac_datasets.datasets.functional.wavcaps.rst
@@ -0,0 +1,7 @@
aac\_datasets.datasets.functional.wavcaps module
================================================

.. automodule:: aac_datasets.datasets.functional.wavcaps
:members:
:undoc-members:
:show-inheritance:
7 changes: 0 additions & 7 deletions docs/aac_datasets.datasets.legacy.audiocaps.rst

This file was deleted.

7 changes: 0 additions & 7 deletions docs/aac_datasets.datasets.legacy.clotho.rst

This file was deleted.

7 changes: 0 additions & 7 deletions docs/aac_datasets.datasets.legacy.macs.rst

This file was deleted.

17 changes: 0 additions & 17 deletions docs/aac_datasets.datasets.legacy.rst

This file was deleted.

2 changes: 1 addition & 1 deletion docs/aac_datasets.datasets.rst
@@ -12,7 +12,7 @@ Subpackages
.. toctree::
:maxdepth: 4

aac_datasets.datasets.legacy
aac_datasets.datasets.functional

Submodules
----------
7 changes: 7 additions & 0 deletions docs/aac_datasets.utils.audioset_mapping.rst
@@ -0,0 +1,7 @@
aac\_datasets.utils.audioset\_mapping module
============================================

.. automodule:: aac_datasets.utils.audioset_mapping
:members:
:undoc-members:
:show-inheritance:
7 changes: 7 additions & 0 deletions docs/aac_datasets.utils.globals.rst
@@ -0,0 +1,7 @@
aac\_datasets.utils.globals module
==================================

.. automodule:: aac_datasets.utils.globals
:members:
:undoc-members:
:show-inheritance:
7 changes: 0 additions & 7 deletions docs/aac_datasets.utils.paths.rst

This file was deleted.

4 changes: 3 additions & 1 deletion docs/aac_datasets.utils.rst
@@ -12,7 +12,9 @@ Submodules
.. toctree::
:maxdepth: 4

aac_datasets.utils.audioset_mapping
aac_datasets.utils.cmdline
aac_datasets.utils.collate
aac_datasets.utils.collections
aac_datasets.utils.download
aac_datasets.utils.paths
aac_datasets.utils.globals