PAM is a no-reference metric for assessing audio quality across different audio processing tasks. It prompts Audio-Language Models (ALMs) with an antonym prompt strategy to compute an audio quality score. It requires no reference data or task-specific models and correlates well with human perception.
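At a glance, the antonym prompt strategy compares the audio embedding against a pair of opposing text prompts (one describing good quality, one describing poor quality) and turns the similarities into a score. The sketch below is an illustrative simplification with assumed prompt wording and embedding sizes, not the exact code in this repository:

```python
import torch
import torch.nn.functional as F

def pam_score(audio_emb: torch.Tensor, text_embs: torch.Tensor) -> float:
    """Score audio quality by contrasting an antonym prompt pair.

    audio_emb: (d,) audio embedding from an ALM.
    text_embs: (2, d) text embeddings; row 0 = "good quality" prompt,
               row 1 = "bad quality" prompt (wording is illustrative).
    """
    audio_emb = F.normalize(audio_emb, dim=-1)
    text_embs = F.normalize(text_embs, dim=-1)
    # Cosine similarity of the audio to each prompt, softmax over the pair;
    # the probability mass on the "good quality" prompt is the quality score.
    sims = audio_emb @ text_embs.T
    return sims.softmax(dim=-1)[0].item()

# Toy usage with random tensors standing in for real ALM embeddings.
score = pam_score(torch.randn(512), torch.randn(2, 512))
print(f"example score: {score:.3f}")
```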
[Jul 24] Improved human correlation across tasks [commit]
[Mar 24] PAM is accepted at INTERSPEECH 2024
Open the Anaconda terminal and run:
> git clone https://github.com/soham97/PAM.git
> cd PAM
> conda create -n pam python=3.10
> conda activate pam
> pip install -r requirements.txt
To compute PAM on a folder containing audio files, you can directly run:
> python run.py --folder {folder_path}
The symbol {..} indicates user input.
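For example, if your clips live in a (hypothetical) folder named `audio_samples`:
> python run.py --folder audio_samples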
To compute PAM on a hierarchy of folders or multiple directories, we recommend creating a custom dataset:
- In `dataset.py`, create a custom dataset by inheriting from `AudioDataset`, similar to `ExampleDataset` (a minimal sketch follows this list)
- Modify the `get_filelist` function to fit your directory structure
- Update `run.py` with your custom dataset and make changes to the evaluation if needed
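A minimal sketch of such a subclass is below. The attribute name `self.folder` and the return type of `get_filelist` are assumptions; check `AudioDataset` and `ExampleDataset` in `dataset.py` for the actual interface:

```python
import glob
import os

from dataset import AudioDataset  # provided by this repo

class MyNestedDataset(AudioDataset):
    """Hypothetical dataset for audio files spread across nested subfolders."""

    def get_filelist(self):
        # Recursively collect .wav files from every subdirectory of the root
        # folder. Assumes the base class stores the root path as self.folder;
        # adapt to whatever ExampleDataset actually uses.
        pattern = os.path.join(self.folder, "**", "*.wav")
        return sorted(glob.glob(pattern, recursive=True))
```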
The manuscript uses data from multiple sources, which can be obtained as follows:
- For text-to-audio and text-to-music generation, we conducted the human listening test using Amazon Mechanical Turk. The model-generated audio and the human listening scores are available at: Zenodo
- For text-to-music generation with FAD comparison (Figure 6), we used the data and human listening scores from Adapting Frechet Audio Distance for Generative Music Evaluation (ICASSP 24). The website is here
- For text-to-speech generation, we used the data and human listening scores from Evaluating speech synthesis by training recognizers on synthetic speech (2023)
- For distortions (Figure 4), we sourced the data from NISQA. The data with human listening scores can be downloaded from the GitHub repo: here.
- For voice conversion, we used the voice conversion subset from the VoiceMOS Challenge data. The data can be downloaded at: Zenodo
This section covers reproducing the numbers for text-to-audio and text-to-music generation. First, download the human listening test data by following the instructions listed above. The download should contain a folder titled `human_eval`. Then run the following command:
> python pcc.py --folder {folder_path}
where `{folder_path}` points to the `human_eval` folder.
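To sanity-check the correlation outside of `pcc.py`, a minimal sketch is below. The CSV file name, the column names, and the assumption that the reported numbers are Pearson correlation coefficients between PAM scores and mean human ratings are all illustrative; see `pcc.py` for the exact procedure:

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical file and column names; pcc.py defines the real layout.
df = pd.read_csv("human_eval/scores.csv")
r, p = pearsonr(df["pam_score"], df["human_mos"])
print(f"Pearson r = {r:.3f} (p = {p:.3g})")
```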
@article{deshmukh2024pam,
title={PAM: Prompting Audio-Language Models for Audio Quality Assessment},
author={Soham Deshmukh and Dareen Alharthi and Benjamin Elizalde and Hannes Gamper and Mahmoud Al Ismail and Rita Singh and Bhiksha Raj and Huaming Wang},
journal={arXiv preprint arXiv:2402.00282},
year={2024}
}