Speechviz

An annotation tool for analyzing real-world auditory soundscapes

![interface](docs/interface.png)

Speechviz is a tool to

  1. Automatically process audio and video data—performing speaker diarization, voice-activity detection, speech recognition, and face detection
  2. Visualize the generated annotations in a user-friendly interface that allows playing the audio segments and refining the generated annotations to correct any errors

pyannote access

Before you can get started, you'll have to get an access token to use pyannote. You can do so by following these steps:

  1. Log in to or sign up for https://huggingface.co/
  2. Visit each of the following and accept the user conditions:
  3. Go to https://huggingface.co/settings/tokens and create an access token
  4. Set your PYANNOTE_AUTH_TOKEN environment variable to your access token
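
For example, in a POSIX shell (the token value below is only a placeholder for your own token):

export PYANNOTE_AUTH_TOKEN="hf_xxxxxxxxxxxxxxxx"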

Docker / Podman image

The image is built from the Aria Data Tools image, so you'll need to build that image first.

git clone https://github.com/facebookresearch/Aria_data_tools.git --recursive
cd Aria_data_tools
docker build -t aria_data_tools .

After that's finished, build the Speechviz image.

git clone https://research-git.uiowa.edu/uiowa-audiology-reu-2022/speechviz.git
cd speechviz
docker build --build-arg \
    PYANNOTE_AUTH_TOKEN="${PYANNOTE_AUTH_TOKEN}" \
    -t speechviz .

Note that the above commands build the image with PyTorch CPU support only. If you'd like to include support for CUDA, follow the instructions for using the NVIDIA Container Toolkit and add --build-arg cuda=true to the docker build command above:

docker build --build-arg \
    PYANNOTE_AUTH_TOKEN="${PYANNOTE_AUTH_TOKEN}" \
    --build-arg cuda=true -t speechviz .

You'll want to mount your data into the container. To create the data folder, repository, and database, run the following commands:

npm run mkdir
python3 scripts/init_fossil.py

You can then start the container by running

docker run -it \
    -v ./data:/speechviz/data \
    -v ./speechviz.sqlite3:/speechviz/speechviz.sqlite3 \
    speechviz
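
Note that some Docker versions reject relative bind-mount paths like ./data (Podman accepts them). If that happens, an equivalent command using absolute paths is:

docker run -it \
    -v "$(pwd)/data":/speechviz/data \
    -v "$(pwd)/speechviz.sqlite3":/speechviz/speechviz.sqlite3 \
    speechviz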

If you're going to use the interface in the container, use the -p PORT:PORT option. By default, the interface uses port 3000, so the command for that port is

docker run -it -p 3000:3000 \
    -v ./data:/speechviz/data \
    -v ./speechviz.sqlite3:/speechviz/speechviz.sqlite3 \
    speechviz

Manual installation

git clone https://research-git.uiowa.edu/uiowa-audiology-reu-2022/speechviz.git
cd speechviz

Set up the interface

npm install
npm run mkdir
python3 scripts/init_fossil.py

Install script dependencies

To use process_audio.py, you will need to install audiowaveform and ffmpeg. The remaining dependencies for process_audio.py can be installed using pip or conda. For encode_faces.py and cluster_faces.py, you will need to install dlib. If you'll be using extract-vrs-data.py, you will need to install VRS. Lastly, for create_poses.py, you will need to install Aria Data Tools.
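
As one example, on Ubuntu, ffmpeg is available from the standard repositories and audiowaveform from the BBC's PPA (other platforms have their own packages; see each project's install instructions):

sudo apt-get install ffmpeg
sudo add-apt-repository ppa:chris-needham/ppa
sudo apt-get update
sudo apt-get install audiowaveform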

pip

To install with PyTorch CPU support only:

pip3 install --extra-index-url \
    "https://download.pytorch.org/whl/cpu" \
    -r requirements.txt

To install with PyTorch CUDA support (Linux and Windows only):

pip3 install --extra-index-url \
    "https://download.pytorch.org/whl/cu116" \
    -r requirements.txt cuda-python nvidia-cudnn

conda

conda env create -f environment.yml
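
Then activate the environment before running any scripts. The name below assumes environment.yml names the environment speechviz; use the name field from that file if it differs:

conda activate speechviz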

Usage

Audio can be processed by moving the audio file to data/audio (or data/video for video files) and running

python3 scripts/process_audio.py data/audio/FILE
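
For example, for a hypothetical recording saved as data/audio/interview.wav:

python3 scripts/process_audio.py data/audio/interview.wav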

Then, to view the results on the interface, run

npm start

and open http://localhost:3000.
For a more in-depth usage guide, see USAGE.md.

Troubleshooting

If installing on Bigcore, you are likely to run into an error relating to a proxy URL. To resolve this, prepend the http:// scheme to the proxy environment variables:

export http_proxy="http://${http_proxy}"
export https_proxy="http://${https_proxy}"

If you receive a subprocess.CalledProcessError relating to ffmpeg, running the following should resolve the issue:

conda update ffmpeg

If installing for the first time on a fresh WSL instance and you get the error /usr/bin/env: ‘bash\r’: No such file or directory, the problem is likely that Node.js is not installed. This should fix it:

export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
sudo apt install nodejs npm
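
You can then confirm that both are available:

node --version
npm --version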