
Whisper API Server

This project is a RESTful API server that provides endpoints for transcribing and translating audio files. The APIs are compatible with the OpenAI audio transcription and translation APIs.

The following audio formats and codecs are supported:

  • Formats: caf, isomp4, mkv, ogg, aiff, wav

  • Codecs: aac, adpcm, alac, flac, mp1, mp2, mp3, pcm, vorbis
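
If your audio is in a format outside this list, one option is to convert it to wav before uploading. Below is a minimal sketch using ffmpeg (assumed to be installed; input.m4a is a hypothetical file, and 16 kHz mono matches the sample rate whisper.cpp models expect):

    # Convert a hypothetical input.m4a to 16 kHz mono PCM wav (ffmpeg assumed installed)
    ffmpeg -i input.m4a -ar 16000 -ac 1 -c:a pcm_s16le input.wav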

Note

The project is still under active development. Existing features are still being improved, and more features will be added in the future.

Quick Start

Setup

  • Install WasmEdge v0.14.1 with the wasi_nn-whisper plugin

    curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s -- -v 0.14.1
  • Deploy the wasi_nn-whisper plugin

    # Download the whisper plugin for macOS on Apple Silicon
    curl -LO https://github.com/WasmEdge/WasmEdge/releases/download/0.14.1/WasmEdge-plugin-wasi_nn-whisper-0.14.1-darwin_arm64.tar.gz
    
    # Extract the plugin into $HOME/.wasmedge/plugin
    tar -xzf WasmEdge-plugin-wasi_nn-whisper-0.14.1-darwin_arm64.tar.gz -C $HOME/.wasmedge/plugin
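
To confirm the setup, you can check the WasmEdge version and verify that the plugin library landed in the plugin directory (a quick sanity check; the exact library file name varies by platform):

    # Print the WasmEdge version and list the installed plugin files
    wasmedge --version
    ls $HOME/.wasmedge/plugin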

Run whisper-api-server

  • Download whisper-api-server.wasm binary

    curl -LO https://github.com/LlamaEdge/whisper-api-server/releases/download/0.3.0/whisper-api-server.wasm
  • Download model

    Pre-converted ggml whisper models are available at https://huggingface.co/ggerganov/whisper.cpp/tree/main

    The following command downloads ggml-medium.bin as an example; you can replace it with any other model from the repository.

    curl -LO https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin
  • Start server

    wasmedge --dir .:. whisper-api-server.wasm -m ggml-medium.bin

    To start the server on a different port, use --socket-addr to specify the address and port you want to use, for example:

    wasmedge --dir .:. whisper-api-server.wasm -m ggml-medium.bin --socket-addr 0.0.0.0:10086
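
Smaller models trade accuracy for lower memory use and faster inference. As an illustrative example, the base English-only model from the same Hugging Face repository can be used instead (the file name below follows that repository's naming scheme):

    # Download a smaller English-only model and start the server with it
    curl -LO https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
    wasmedge --dir .:. whisper-api-server.wasm -m ggml-base.en.bin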

Usage

Transcribe an audio file

  • Download audio file

    curl -LO https://github.com/LlamaEdge/whisper-api-server/raw/main/data/test.wav
    
  • Send curl request to the transcriptions endpoint

    curl --location 'http://localhost:8080/v1/audio/transcriptions' \
      --header 'Content-Type: multipart/form-data' \
      --form 'file=@"test.wav"'

    If everything is set up correctly, you should see the following generated transcription:

    {
        "text": "[00:00:00.000 --> 00:00:03.540]  This is a test record for Whisper.cpp"
    }
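
For scripting, it can be convenient to extract just the text field from the JSON response. A small sketch assuming jq is installed:

    # Transcribe test.wav and print only the transcribed text (jq assumed installed)
    curl -s --location 'http://localhost:8080/v1/audio/transcriptions' \
      --header 'Content-Type: multipart/form-data' \
      --form 'file=@"test.wav"' | jq -r '.text'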

Translate an audio file

  • Download audio file

    curl -LO https://github.com/LlamaEdge/whisper-api-server/raw/main/data/test_cn.wav

    This audio contains the Chinese sentence 这里是中文广播, which means "This is a Chinese broadcast" in English.

  • Send curl request to the translations endpoint

    curl --location 'http://localhost:8080/v1/audio/translations' \
      --header 'Content-Type: multipart/form-data' \
      --form 'file=@"test_cn.wav"' \
      --form 'language="cn"'

    If everything is set up correctly, you should see the following generated translation:

    {
      "text": "[00:00:00.000 --> 00:00:04.000]  This is a Chinese broadcast."
    }
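
To process several files, the same request can be wrapped in a loop. A minimal sketch that transcribes every wav file in the current directory (assumes the server is already running on the default port):

    # Transcribe each wav file in the current directory and print the responses
    for f in *.wav; do
      echo "== $f =="
      curl -s --location 'http://localhost:8080/v1/audio/transcriptions' \
        --header 'Content-Type: multipart/form-data' \
        --form "file=@\"$f\""
      echo
    done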

Build

To build the whisper-api-server.wasm binary, you need to have the Rust toolchain installed. If you don't have it installed, you can install it by following the instructions on the Rust website. You also need the wasm32-wasip1 compilation target, which can be added with rustup target add wasm32-wasip1.

If you are working on macOS, you need to download the wasi-sdk from https://github.com/WebAssembly/wasi-sdk/releases. Then set the WASI_SDK_PATH environment variable to the path of the wasi-sdk directory, and set the CC environment variable to the clang shipped with wasi-sdk, for example:

export WASI_SDK_PATH=/path/to/wasi-sdk-22.0
export CC="${WASI_SDK_PATH}/bin/clang --sysroot=${WASI_SDK_PATH}/share/wasi-sysroot"
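
Before building, you can verify that the wasi-sdk toolchain resolves correctly (the version string depends on the wasi-sdk release you downloaded):

    # Confirm the wasi-sdk clang is found and runnable
    "${WASI_SDK_PATH}/bin/clang" --version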

Now, you can build the whisper-api-server.wasm binary by following the steps below:

  • Clone the repository

    git clone https://github.com/LlamaEdge/whisper-api-server.git
  • Build the whisper-api-server.wasm binary

    cd whisper-api-server
    
    cargo build --target wasm32-wasip1 --release

    If the build is successful, you should see the whisper-api-server.wasm binary in the target/wasm32-wasip1/release directory.
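
You can then run the freshly built binary directly from the build directory, using the same command as in the Quick Start (this assumes ggml-medium.bin is in the current directory):

    # Run the locally built binary with the model downloaded earlier
    wasmedge --dir .:. target/wasm32-wasip1/release/whisper-api-server.wasm -m ggml-medium.bin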

CLI Options

$ wasmedge whisper-api-server.wasm -h

Whisper API Server

Usage: whisper-api-server.wasm [OPTIONS] --model <MODEL>

Options:
  -n, --model-name <MODEL_NAME>    Model name [default: default]
  -a, --model-alias <MODEL_ALIAS>  Model alias [default: default]
  -m, --model <MODEL>              Path to the whisper model file
      --threads <THREADS>          Number of threads to use during computation [default: 4]
      --processors <PROCESSORS>    Number of processors to use during computation [default: 1]
      --task <TASK>                Task type [default: full] [possible values: transcribe, translate, full]
      --port <PORT>                Port number [default: 8080]
      --socket-addr <SOCKET_ADDR>  Socket address of LlamaEdge API Server instance. For example, `0.0.0.0:8080`
  -h, --help                       Print help (see more with '--help')
  -V, --version                    Print version
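
As an illustrative example of combining these options, the following starts a transcription-only server on port 9000 with 8 compute threads (the values here are arbitrary):

    # Serve only the transcriptions endpoint, with more threads and a custom port
    wasmedge --dir .:. whisper-api-server.wasm -m ggml-medium.bin \
      --task transcribe --threads 8 --port 9000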