PAM is a no-reference metric for assessing audio quality across different audio processing tasks. It prompts Audio-Language Models (ALMs) with an antonym prompt strategy to compute an audio quality score. It requires no reference data or task-specific models and correlates well with human perception.
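At a glance, the antonym prompt strategy compares the audio embedding against a pair of opposing text prompts (one describing good quality, one describing poor quality) and turns the similarities into a score. The sketch below is an illustrative simplification with assumed prompt wording and embedding sizes, not the exact code in this repository:

```python
import torch
import torch.nn.functional as F

def pam_score(audio_emb: torch.Tensor, text_embs: torch.Tensor) -> float:
    """Score audio quality by contrasting an antonym prompt pair.

    audio_emb: (d,) audio embedding from an ALM.
    text_embs: (2, d) text embeddings; row 0 = "good quality" prompt,
               row 1 = "bad quality" prompt (wording is illustrative).
    """
    audio_emb = F.normalize(audio_emb, dim=-1)
    text_embs = F.normalize(text_embs, dim=-1)
    # Cosine similarity of the audio to each prompt, softmax over the pair;
    # the probability mass on the "good quality" prompt is the quality score.
    sims = audio_emb @ text_embs.T
    return sims.softmax(dim=-1)[0].item()

# Toy usage with random tensors standing in for real ALM embeddings.
score = pam_score(torch.randn(512), torch.randn(2, 512))
print(f"example score: {score:.3f}")
```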
[Jul 24] Improved human correlation across tasks [commit]
[Mar 24] PAM is accepted at INTERSPEECH 2024
Open the Anaconda terminal and run:
> git clone https://github.com/soham97/PAM.git
> cd PAM
> conda create -n pam python=3.10
> conda activate pam
> pip install -r requirements.txt
To compute PAM on a folder containing audio files, you can directly run:
> python run.py --folder {folder_path}
The symbol {..} indicates user input.
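For example, if your clips live in a (hypothetical) folder named `audio_samples`:
> python run.py --folder audio_samples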
To compute PAM on a hierarchy of folders or multiple directories, we recommend creating a custom dataset:
- In `dataset.py`, create a custom dataset by inheriting from `AudioDataset`, similar to `ExampleDataset` (a minimal sketch follows this list)
- Modify the `get_filelist` function to fit your directory structure
- Update `run.py` with your custom dataset and make changes to the evaluation if needed
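A minimal sketch of such a subclass is below. The attribute name `self.folder` and the return type of `get_filelist` are assumptions; check `AudioDataset` and `ExampleDataset` in `dataset.py` for the actual interface:

```python
import glob
import os

from dataset import AudioDataset  # provided by this repo

class MyNestedDataset(AudioDataset):
    """Hypothetical dataset for audio files spread across nested subfolders."""

    def get_filelist(self):
        # Recursively collect .wav files from every subdirectory of the root
        # folder. Assumes the base class stores the root path as self.folder;
        # adapt to whatever ExampleDataset actually uses.
        pattern = os.path.join(self.folder, "**", "*.wav")
        return sorted(glob.glob(pattern, recursive=True))
```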
The manuscript uses data from multiple sources, which can be obtained as follows:
- For text-to-audio and text-to-music generation, we conducted the human listening test using Amazon Mechanical Turk. The model-generated audio and the human listening scores are available at: Zenodo
- For text-to-music generation with FAD comparison (Figure 6), we used the data and human listening scores from Adapting Frechet Audio Distance for Generative Music Evaluation (ICASSP 24). The website is here
- For text-to-speech generation, we used the data and human listening scores from Evaluating speech synthesis by training recognizers on synthetic speech (2023)
- For distortions (Figure 4), we sourced the data from NISQA. The data with human listening scores can be downloaded from the GitHub repo: here.
- For voice conversion, we used the voice conversion subset from the VoiceMOS Challenge data. The data can be downloaded at: Zenodo
This section covers reproducing the numbers for text-to-audio and text-to-music generation. First, download the human listening test data by following the instructions listed above. The download should contain a folder titled `human_eval`. Then run the following command:
> python pcc.py --folder {folder_path}
where `{folder_path}` points to the `human_eval` folder.
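To sanity-check the correlation outside of `pcc.py`, a minimal sketch is below. The CSV file name, the column names, and the assumption that the reported numbers are Pearson correlation coefficients between PAM scores and mean human ratings are all illustrative; see `pcc.py` for the exact procedure:

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical file and column names; pcc.py defines the real layout.
df = pd.read_csv("human_eval/scores.csv")
r, p = pearsonr(df["pam_score"], df["human_mos"])
print(f"Pearson r = {r:.3f} (p = {p:.3g})")
```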
@article{deshmukh2024pam,
title={PAM: Prompting Audio-Language Models for Audio Quality Assessment},
author={Soham Deshmukh and Dareen Alharthi and Benjamin Elizalde and Hannes Gamper and Mahmoud Al Ismail and Rita Singh and Bhiksha Raj and Huaming Wang},
journal={arXiv preprint arXiv:2402.00282},
year={2024}
}