FA-LAMP

This is an FPGA implementation of the Learned Approximate Matrix Profile (LAMP) algorithm on the Ultra96-V2 board. LAMP-FPGA takes a time series as input and predicts its matrix profile values for a particular window size. You can read more at the Matrix Profile Homepage.
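For readers new to the matrix profile, here is a minimal brute-force sketch of the quantity that LAMP approximates. It assumes the common definition based on z-normalized Euclidean distance and an exclusion zone of half the window size. The function names and the toy series are illustrative only and are not part of this repository.

```python
import math

def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in window]

def matrix_profile(series, m):
    """For every window of length m, return the z-normalized Euclidean
    distance to its nearest neighbor outside the exclusion zone."""
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    exclusion = m // 2  # skip overlapping windows (trivial matches)
    profile = []
    for i, a in enumerate(windows):
        best = math.inf
        for j, b in enumerate(windows):
            if abs(i - j) <= exclusion:
                continue
            dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
            best = min(best, dist)
        profile.append(best)
    return profile

# Toy series: a repeating pattern with one anomalous spike (the 5).
series = [0, 1, 2, 1, 0, 1, 2, 1, 0, 5, 0, 1, 2, 1, 0]
mp = matrix_profile(series, m=4)
print([round(v, 3) for v in mp])
```

Windows that repeat elsewhere in the series get a value near zero, while windows covering the spike have no close match. The brute-force version is quadratic in the series length, which is the cost a learned approximation avoids at inference time.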

Folder Structure

.
├── models              # Pre-trained models and weight values for the custom kernel
├── scripts             # Scripts for compiling the model for the DPU and for evaluation
├── src
│   ├── hls             # Custom kernel HLS files
│   └── host            # Source code for the ARM processor
├── vitis               # Configurations and Makefile for project synthesis
├── LICENSE
└── README.md

Build Instructions

In this tutorial we show how to deploy LAMP on the Xilinx Deep Learning Processor Unit (DPU), integrate it with the custom kernel, and run it on the FPGA, given a model pre-trained on a GPU. The instructions require Docker, Vitis-AI 1.1, Vitis 2019.2, and PetaLinux 2019.2 to be installed on the system.

HLS Kernel

To generate the IP file for the custom kernel, run the following in the hls directory (board files for the Ultra96-V2 can be downloaded from the Avnet GitHub repository)

vivado_hls -f script.tcl

The sigmoid implementation can be configured by choosing either SIGMOID_ULTRA_FAST or SIGMOID_EXP_512 in the defines.h file. To evaluate a different benchmark, replace the original weights.cpp with one of the weights.cpp files from the models directory. The generated .xo file will be used in the next step.
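This README does not describe how the two sigmoid variants are implemented. The sketch below only illustrates the general accuracy versus cost trade-off between hardware-friendly sigmoid approximations, using two generic techniques: a 512-entry lookup table and a piecewise-linear "hard" sigmoid. Both are assumptions chosen for illustration and are not claimed to match the code selected by SIGMOID_EXP_512 or SIGMOID_ULTRA_FAST.

```python
import math

TABLE_SIZE = 512           # assumption: echoes the "512" in SIGMOID_EXP_512
X_MIN, X_MAX = -8.0, 8.0   # assumption: inputs outside this range saturate
STEP = (X_MAX - X_MIN) / (TABLE_SIZE - 1)
TABLE = [1.0 / (1.0 + math.exp(-(X_MIN + i * STEP))) for i in range(TABLE_SIZE)]

def sigmoid_exact(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_lut(x):
    """Nearest-entry table lookup; out-of-range inputs saturate."""
    if x <= X_MIN:
        return TABLE[0]
    if x >= X_MAX:
        return TABLE[-1]
    return TABLE[round((x - X_MIN) / STEP)]

def sigmoid_hard(x):
    """Piecewise-linear approximation: clamp(0.25 * x + 0.5, 0, 1)."""
    return min(1.0, max(0.0, 0.25 * x + 0.5))

# Compare both approximations against the exact function on [-10, 10].
xs = [i / 10.0 for i in range(-100, 101)]
err_lut = max(abs(sigmoid_lut(x) - sigmoid_exact(x)) for x in xs)
err_hard = max(abs(sigmoid_hard(x) - sigmoid_exact(x)) for x in xs)
print(f"max error: lookup table {err_lut:.4f}, piecewise-linear {err_hard:.4f}")
```

The lookup table costs memory but stays close to the exact curve, while the piecewise-linear form needs only a multiply, an add, and a clamp at the price of a larger error.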

DPU Integration

Clone Xilinx's Vitis-AI GitHub repository

git clone https://github.com/Xilinx/Vitis-AI
cd Vitis-AI
git checkout v1.1
export VITIS_AI_HOME="$PWD"

Download the Ultra96-V2 Vitis platform
Set the location of the platform

export SDX_PLATFORM=/home/Avnet/vitis/platform_repo/ULTRA96V2/ULTRA96V2.xpfm

Replace the dpu_conf.vh file in /prj/Vitis with the file provided in the vitis directory of this repository, and do the same for the config_file/prj_config file.
Using the provided Makefile, run

make KERNEL=DPU DEVICE=ULTRA96V2

Synthesis takes around one hour, after which the sd_card directory is generated.

Compiling the Model

Launch the Docker tools from the Vitis-AI directory

sh -x docker_run.sh xilinx/vitis-ai:latest
conda activate vitis-ai-tensorflow

1. Freezing the TensorFlow graph

The Vitis-AI flow requires a frozen model for the quantization and optimization steps. A frozen model contains the graph definition together with the checkpoint variables, with those variables stored as constants within the graph structure. This allows some of the layers to be fused together for deployment on the DPU. We can generate a binary protobuf (.pb) file by running the freeze_graph.py script

python freeze_graph.py input_model

where input_model is the pre-trained LAMP model.

2. Quantization

We quantize the weights, biases, and activations of the model to improve inference performance on the FPGA. Currently, the Xilinx DPU only supports 8-bit models, so we quantize everything to 8 bits.
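As a rough illustration of what 8-bit quantization does to a set of weights, the sketch below maps floats to signed 8-bit integers and back. It assumes a symmetric scheme with a power-of-two scale, a common choice for fixed-point hardware. The function names and the weight values are made up for the example.

```python
import math

def quantize_int8(values):
    """Quantize floats to signed 8-bit integers with a power-of-two scale."""
    max_abs = max(abs(v) for v in values)
    # Largest number of fractional bits for which max_abs still fits in int8.
    frac_bits = math.floor(math.log2(127.0 / max_abs)) if max_abs > 0 else 7
    scale = 2.0 ** frac_bits
    quantized = [max(-128, min(127, round(v * scale))) for v in values]
    return quantized, frac_bits

def dequantize(quantized, frac_bits):
    """Map the integers back to floats to inspect the rounding error."""
    return [q / 2.0 ** frac_bits for q in quantized]

weights = [0.42, -1.37, 0.05, 0.9]  # made-up values for illustration
q, frac_bits = quantize_int8(weights)
restored = dequantize(q, frac_bits)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q, frac_bits, max_err)
```

The rounding error per value is bounded by half a quantization step, which is why the calibration data matters: it determines the value range and therefore the step size. vai_q_tensorflow applies its own scheme across the whole graph, which may differ in detail from this sketch.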

vai_q_tensorflow quantize \
                 --input_frozen_graph frozen_graph.pb \
                 --input_fn input_func.calib_input \
                 --output_dir quantized \
                 --input_nodes input_1 \
                 --output_nodes reshape_1/Reshape \
                 --input_shapes ?,256,1,32 \
                 --calib_iter 32

frozen_graph.pb is the frozen model generated in the previous step. input_func is the Python file that generates the input data for the quantizer; since there is no backpropagation step here, an unlabeled dataset is sufficient. calib_iter is the number of iterations used to calibrate the activations. We noticed that values larger than 32 do not noticeably improve the accuracy of the quantized model.
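A calibration input function for the quantizer takes the iteration index and returns a dictionary that maps each input node name to a NumPy batch. The sketch below shows the expected shape of such a function for the input_1 node and the ?,256,1,32 input shape used above. The batch size and the data loader are hypothetical, and the random values are only a stand-in: a real input_func.py should return windows taken from actual time series.

```python
import numpy as np

BATCH_SIZE = 8               # hypothetical; pick what fits in memory
CALIB_ITER = 32              # matches --calib_iter 32
INPUT_SHAPE = (256, 1, 32)   # matches --input_shapes ?,256,1,32

def load_calibration_data():
    """Stand-in for reading real, unlabeled input windows from disk."""
    rng = np.random.default_rng(0)
    total = BATCH_SIZE * CALIB_ITER
    return rng.standard_normal((total,) + INPUT_SHAPE).astype(np.float32)

CALIB_DATA = load_calibration_data()

def calib_input(iter):
    """Return one calibration batch, keyed by the input node name."""
    start = iter * BATCH_SIZE
    return {"input_1": CALIB_DATA[start:start + BATCH_SIZE]}
```

The quantizer calls calib_input once per calibration iteration, so the data set needs at least BATCH_SIZE * CALIB_ITER samples for every batch to be full.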

3. Evaluation

We test the accuracy of the generated quantized model before deploying it to the FPGA.

python evaluate.py

evaluate.py reads in the TensorFlow frozen binary graph, runs inference, and reports the least squared error by comparing the model output with the labels (matrix profile values).
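The exact formula used by evaluate.py is not given here. As a minimal sketch of a squared-error comparison between predictions and labels, assuming both are flat lists of matrix profile values (the helper names and numbers are illustrative):

```python
def squared_error(predicted, target):
    """Sum of squared differences between predictions and labels."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target))

def mean_squared_error(predicted, target):
    """Squared error averaged over the number of values."""
    return squared_error(predicted, target) / len(predicted)

# Made-up values for illustration only.
predicted = [0.9, 1.8, 3.2]
target = [1.0, 2.0, 3.0]
print(squared_error(predicted, target), mean_squared_error(predicted, target))
```

Running the same metric on the floating-point model and on the quantized model gives a direct measure of how much accuracy the 8-bit conversion costs.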

4. Compilation

The Vitis-AI Docker image does not support the Ultra96-V2 board, so we need to generate the DPU configuration file (Ultra96.dcf) required in the compile step. To do so, take the DPU Hardware Handoff file (dpu.hwh) generated in the DPU Integration step (located in the sd_card directory) and run the following command

dlet -f dpu.hwh

dlet is a host tool that extracts the DPU information from the input file and generates the configuration file.

Next, we will compile the model for the target hardware

vai_c_tensorflow --frozen_pb quantized/deploy_model.pb \
                 --arch /opt/vitis_ai/compiler/arch/DPUCZDX8G/ultra96/arch.json \
                 --output_dir . \
                 --net_name lamp

arch.json is located in the scripts directory. Since the Sigmoid and Global Average Pool layers are not supported by the DPU, the command generates four kernels; we will only use the lamp_0 model.
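To see why unsupported layers lead to several kernels, the sketch below splits a layer sequence into consecutive groups that can run on the DPU and groups that cannot. The layer list is hypothetical and is not the actual LAMP architecture, and the real compiler's partitioning rules may differ; only the set of unsupported layers comes from the text above.

```python
UNSUPPORTED = {"GlobalAveragePooling", "Sigmoid"}  # not supported by the DPU

def partition(layers):
    """Group consecutive layers by target; each group becomes one kernel."""
    kernels = []
    for layer in layers:
        target = "CPU" if layer in UNSUPPORTED else "DPU"
        if kernels and kernels[-1][0] == target:
            kernels[-1][1].append(layer)   # extend the current kernel
        else:
            kernels.append((target, [layer]))  # start a new kernel
    return kernels

# Hypothetical layer sequence, for illustration only.
layers = ["Conv", "Conv", "Conv", "GlobalAveragePooling", "Dense", "Sigmoid"]
kernels = partition(layers)
for index, (target, group) in enumerate(kernels):
    print(f"lamp_{index}: {target} {group}")
```

With this layer list, every unsupported layer ends the current DPU kernel and starts a separate one, giving four kernels in total, the first of which holds the leading convolutional layers.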

Compiling Host Program

Download and install the SDK for cross-compilation

wget -O sdk.sh https://www.xilinx.com/bin/public/openDownload?filename=sdk.sh
chmod +x sdk.sh
./sdk.sh -d ~/petalinux_sdk_vai_1_1_dnndk

Set up the environment for cross-compilation

unset LD_LIBRARY_PATH
source ~/petalinux_sdk_vai_1_1_dnndk/environment-setup-aarch64-xilinx-linux

Download and extract the additional DNNDK runtime content

wget -O vitis-ai_v1.1_dnndk.tar.gz  https://www.xilinx.com/bin/public/openDownload?filename=vitis-ai_v1.1_dnndk.tar.gz
tar -xvzf vitis-ai_v1.1_dnndk.tar.gz

Install the additional DNNDK runtime content to the previously installed SDK

cd vitis-ai_v1.1_dnndk
./install.sh $SDKTARGETSYSROOT

In the host directory, run the make command.

5. Running inference

Copy the generated sd_card folder from the DPU Integration step, the lamp_0.elf model file, and the executable host program from the previous step to the boot directory of the SD card. Extract the rootfs.tar.gz file to the root directory of the SD card.
Boot the Ultra96-V2 board with the SD card and use "root" for both login and password.
Navigate to the SD card folder and run the lamp program

cd /run/media/mmcblk0p1
cp dpu.xclbin /usr/lib/.
./lamp
