aminiok1/fccm-lamp

FA-LAMP

This is an FPGA implementation of the Learned Approximate Matrix Profile (LAMP) algorithm on the Ultra96-V2 board. LAMP-FPGA takes a time series as input and predicts its matrix profile values for a particular window size. You can read more at the Matrix Profile Homepage.
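
For readers new to the term, the quantity that LAMP approximates can be sketched with a brute-force computation. This is only an illustration of the general matrix profile definition (the z-normalized Euclidean distance from each subsequence to its nearest non-trivial neighbour), written here as a self-join. The function names and the exclusion zone of half a window are assumptions, and nothing below is taken from this repository, which replaces the exact computation with a neural-network prediction.

```python
import math

def znorm(seq):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / n)
    if std == 0.0:
        return [0.0] * n
    return [(v - mean) / std for v in seq]

def matrix_profile(ts, m):
    """Brute-force O(n^2 * m) self-join matrix profile for window size m."""
    subs = [znorm(ts[i:i + m]) for i in range(len(ts) - m + 1)]
    excl = max(1, m // 2)  # assumed exclusion zone, skips trivial matches
    profile = []
    for i, a in enumerate(subs):
        best = math.inf
        for j, b in enumerate(subs):
            if abs(i - j) < excl:
                continue
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
            best = min(best, d)
        profile.append(best)
    return profile

# Toy series: a repeating pattern with one anomalous spike (the value 5).
ts = [0, 1, 2, 1, 0, 1, 2, 1, 0, 5, 0, 1, 2, 1, 0]
mp = matrix_profile(ts, m=4)
```

Low profile values mark subsequences that repeat elsewhere in the series; high values mark unusual ones, such as the windows covering the spike.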

Folder Structure

.
├── models              # Pre-trained models and weight values for the custom kernel
├── scripts             # Scripts for compiling the model for the DPU and for evaluation
├── src
│   ├── hls             # Custom kernel HLS files
│   └── host            # Source code for the ARM processor
├── vitis               # Configuration files and Makefile for project synthesis
├── LICENSE
└── README.md

Build Instructions

In this tutorial we show how to deploy LAMP on the Xilinx Deep Learning Processor Unit (DPU), integrate it with the custom kernel, and run it on the FPGA, starting from a model pre-trained on a GPU. The instructions require Docker, Vitis-AI 1.1, Vitis 2019.2, and PetaLinux 2019.2 to be installed on the system.

HLS Kernel

To generate the IP file for the custom kernel, run the following in the hls directory (board files for the Ultra96-V2 can be downloaded from the Avnet GitHub repository):
vivado_hls -f script.tcl

The sigmoid implementation is selected in defines.h by choosing either SIGMOID_ULTRA_FAST or SIGMOID_EXP_512. To evaluate a different benchmark, replace the original weights.cpp with one of the weights.cpp files from the models directory. The generated .xo file is used in the next step.
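
The trade-off behind such a switch can be sketched in software. The two functions below are one plausible reading of the option names, not the repository's HLS code: a 512-entry lookup table (the table size is assumed from the name SIGMOID_EXP_512, the input range of [-8, 8) is an assumption) and a division-based approximation that avoids the exponential entirely. All names are hypothetical.

```python
import math

TABLE_SIZE = 512          # assumed from the name SIGMOID_EXP_512
X_MIN, X_MAX = -8.0, 8.0  # assumed input range covered by the table

# Precompute sigmoid at evenly spaced points, as a hardware ROM might store it.
_STEP = (X_MAX - X_MIN) / TABLE_SIZE
_TABLE = [1.0 / (1.0 + math.exp(-(X_MIN + i * _STEP))) for i in range(TABLE_SIZE)]

def sigmoid_table(x):
    """Table lookup: compute the index for x and clamp it to the table."""
    idx = int((x - X_MIN) / _STEP)
    idx = min(max(idx, 0), TABLE_SIZE - 1)
    return _TABLE[idx]

def sigmoid_fast(x):
    """Approximation 0.5 * x / (1 + |x|) + 0.5, which needs no exponential."""
    return 0.5 * x / (1.0 + abs(x)) + 0.5

def sigmoid_exact(x):
    """Reference implementation used to measure the approximation error."""
    return 1.0 / (1.0 + math.exp(-x))

# Largest absolute error of each variant on a grid over [-8, 8).
grid = [X_MIN + k * 0.01 for k in range(1600)]
err_table = max(abs(sigmoid_table(x) - sigmoid_exact(x)) for x in grid)
err_fast = max(abs(sigmoid_fast(x) - sigmoid_exact(x)) for x in grid)
```

In this sketch the table variant is considerably more accurate, while the fast variant trades accuracy for a cheaper datapath; which option suits a given benchmark has to be checked against the actual kernel.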

DPU Integration

  1. Clone Xilinx's Vitis-AI GitHub repository
git clone https://github.com/Xilinx/Vitis-AI
cd Vitis-AI
git checkout v1.1
export VITIS_AI_HOME="$PWD"
  2. Download the Ultra96-V2 Vitis platform
  3. Set the location of the platform
export SDX_PLATFORM=/home/Avnet/vitis/platform_repo/ULTRA96V2/ULTRA96V2.xpfm
  4. Replace dpu_conf.vh in /prj/Vitis with the file provided in the vitis directory of this repository, and do the same for the config_file/prj_config file.
  5. Using the provided Makefile, run
make KERNEL=DPU DEVICE=ULTRA96V2

Synthesis takes around one hour, after which the sd_card directory is generated.

Compiling the Model

Launch the Docker tools from the Vitis-AI directory
sh -x docker_run.sh xilinx/vitis-ai:latest
conda activate vitis-ai-tensorflow

1. Freezing Tensorflow graph

The Vitis-AI flow requires a frozen model for the quantization and optimization steps. A frozen model combines the graph definition with the checkpoint variables, storing the trained parameters as constants within the graph structure. This allows some of the layers to be fused together for deployment on the DPU. We can generate a binary protobuf (.pb) file by running the freeze_graph.py script
python freeze_graph.py input_model

where input_model is the pre-trained LAMP model.

2. Quantization

We will quantize the weights, biases, and activations of the model to improve inference performance on the FPGA. Currently, the Xilinx DPU only supports 8-bit models, so we quantize everything to 8 bits.
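
As a rough illustration of what 8-bit quantization does to a set of values, the sketch below uses a symmetric fixed-point format with a power-of-two scale. The scheme, the helper names, and the sample weights are assumptions made for the example; the actual format is chosen by the Vitis-AI quantizer.

```python
import math

def choose_frac_bits(values, bits=8):
    """Pick the largest fractional bit count for which max |value| still fits."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return bits - 1
    qmax = 2 ** (bits - 1) - 1
    return math.floor(math.log2(qmax / max_abs))

def quantize(values, frac_bits, bits=8):
    """Scale by 2**frac_bits, round, and saturate to the signed integer range."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scale = 2.0 ** frac_bits
    return [min(max(round(v * scale), lo), hi) for v in values]

def dequantize(q, frac_bits):
    """Map the integers back to real values to inspect the rounding error."""
    scale = 2.0 ** frac_bits
    return [v / scale for v in q]

weights = [0.5, -0.25, 0.8, -1.5]   # made-up example values
fb = choose_frac_bits(weights)      # 6 fractional bits for this range
q = quantize(weights, fb)           # [32, -16, 51, -96]
approx = dequantize(q, fb)          # 0.8 becomes 0.796875
```

Values that are not multiples of the step size (here 1/64) pick up a small rounding error, which is why the quantized model is evaluated before deployment.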

vai_q_tensorflow quantize \
                 --input_frozen_graph frozen_graph.pb \
                 --input_fn input_func.calib_input \
                 --output_dir quantized \
                 --input_nodes input_1 \
                 --output_nodes reshape_1/Reshape \
                 --input_shapes ?,256,1,32 \
                 --calib_iter 32

frozen_graph.pb is the frozen model generated in the previous step. input_func is the Python file that generates the input data for the quantizer; since there is no backpropagation step here, an unlabeled dataset is sufficient. calib_iter is the number of iterations used to calibrate the activations; we observed that values larger than 32 bring little further improvement in quantizer accuracy.
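
A minimal sketch of what such an input_func.py could look like follows. The quantizer calls the function named in --input_fn once per calibration step, passing the step number, and expects a dictionary mapping each input node name to a NumPy array. The batch size and the random placeholder data are assumptions; the repository's own script should load real subsequences from the time series instead.

```python
import numpy as np

CALIB_ITER = 32              # matches --calib_iter in the command above
CALIB_BATCH_SIZE = 16        # hypothetical batch size
INPUT_NODE = "input_1"       # matches --input_nodes
INPUT_SHAPE = (256, 1, 32)   # matches --input_shapes ?,256,1,32

# Placeholder data (assumption): replace with real, unlabeled model inputs.
_rng = np.random.RandomState(0)
_calib_data = _rng.standard_normal(
    (CALIB_ITER * CALIB_BATCH_SIZE,) + INPUT_SHAPE
).astype(np.float32)

def calib_input(iter):
    """Return the batch for calibration step `iter`, keyed by input node name."""
    start = iter * CALIB_BATCH_SIZE
    batch = _calib_data[start:start + CALIB_BATCH_SIZE]
    return {INPUT_NODE: batch}
```

No labels are returned because calibration only observes the range of the activations.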

3. Evaluation

We will test the accuracy of the generated quantized model before deploying it to the FPGA.
python evaluate.py

evaluate.py reads in the frozen TensorFlow binary graph, runs inference, and reports the least squared error between the model output and the labels (matrix profile values).
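
The comparison itself can be sketched as follows. This is not the code in evaluate.py: it assumes the error is reported as a mean of squared differences, and the function name and sample numbers are made up for the example.

```python
def mean_squared_error(predicted, target):
    """Average squared difference between model outputs and reference values."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if not target:
        raise ValueError("inputs must not be empty")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

# Made-up values: model outputs versus reference matrix profile values.
predicted = [1.0, 2.0, 3.0]
labels = [1.0, 2.5, 2.0]
error = mean_squared_error(predicted, labels)  # (0 + 0.25 + 1) / 3
```

Running the same comparison on the frozen and the quantized model shows how much accuracy the 8-bit conversion costs.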

4. Compilation

The Vitis-AI Docker image does not support the Ultra96-V2 board, so we need to generate the DPU configuration file (Ultra96.dcf) required by the compile step. To do so, run the following command on the DPU Hardware Handoff file (dpu.hwh) generated in the DPU Integration step (located in the sd_card directory)
dlet -f dpu.hwh

dlet is a host tool that extracts the DPU information from the input file and generates the configuration file.

Next, we will compile the model for the target hardware

vai_c_tensorflow --frozen_pb quantized/deploy_model.pb \
                 --arch /opt/vitis_ai/compiler/arch/DPUCZDX8G/ultra96/arch.json \
                 --output_dir . \
                 --net_name lamp

arch.json is located in the scripts directory. Because the Sigmoid and Global Average Pool layers are not supported by the DPU, the command generates four kernels; we will only use the lamp_0 kernel.
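
Why unsupported layers lead to several kernels can be illustrated with a toy partitioner that groups consecutive layers by where they can run. The layer list below is hypothetical and chosen only to show the splitting; it is not the LAMP architecture, and the real compiler's partitioning rules may differ.

```python
DPU_UNSUPPORTED = {"sigmoid", "global_average_pool"}

def partition(layers):
    """Group consecutive (name, op) layers into kernels by execution target."""
    kernels = []
    for name, op in layers:
        target = "CPU" if op in DPU_UNSUPPORTED else "DPU"
        if kernels and kernels[-1][0] == target:
            kernels[-1][1].append(name)      # extend the current kernel
        else:
            kernels.append((target, [name])) # start a new kernel
    return kernels

# Hypothetical layer order, for illustration only.
layers = [
    ("conv_1", "conv2d"),
    ("conv_2", "conv2d"),
    ("gap", "global_average_pool"),
    ("dense_1", "dense"),
    ("out", "sigmoid"),
]
kernels = partition(layers)
```

Each unsupported layer in the middle of the graph ends the current DPU kernel, so the layers after it land in a separate kernel.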

Compiling Host Program

  1. Download and install the SDK for cross-compilation
wget -O sdk.sh "https://www.xilinx.com/bin/public/openDownload?filename=sdk.sh"
chmod +x sdk.sh
./sdk.sh -d ~/petalinux_sdk_vai_1_1_dnndk
  2. Set up the environment for cross-compilation
unset LD_LIBRARY_PATH
source ~/petalinux_sdk_vai_1_1_dnndk/environment-setup-aarch64-xilinx-linux
  3. Download and extract the additional DNNDK runtime content
wget -O vitis-ai_v1.1_dnndk.tar.gz "https://www.xilinx.com/bin/public/openDownload?filename=vitis-ai_v1.1_dnndk.tar.gz"
tar -xvzf vitis-ai_v1.1_dnndk.tar.gz
  4. Install the additional DNNDK runtime content to the previously installed SDK
cd vitis-ai-v1.1_dnndk
./install.sh $SDKTARGETSYSROOT
  5. In the host directory, run the make command.

5. Running inference

  1. Copy the generated sd_card folder from the DPU Integration step, the lamp_0.elf model file, and the host executable from the previous step to the boot directory of the SD card. Extract the rootfs.tar.gz file to the root directory of the SD card.

  2. Boot the Ultra96-V2 board with the SD card and use "root" for both login and password

  3. Navigate to the SD card folder and run the lamp program

cd /run/media/mmcblk0p1
cp dpu.xclbin /usr/lib/.
./lamp

About

Code repository for FCCM 2021 submission
