This is an FPGA implementation of the Learned Approximate Matrix Profile (LAMP) algorithm on the Ultra96-V2 board. LAMP-FPGA takes a time series as input and predicts its matrix profile values for a particular window size. You can read more at the Matrix Profile Homepage.
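For readers new to the matrix profile, the quantity LAMP approximates can be sketched with a brute-force computation. This is a minimal illustration, assuming the common definition (z-normalized Euclidean distance from each subsequence to its nearest non-overlapping neighbor); the function names and the exclusion-zone size of half the window are assumptions for this example and are not taken from this repository. LAMP itself may predict an equivalent representation such as correlation.

```python
import math

def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0.0:
        return [0.0] * n
    return [(x - mean) / std for x in window]

def matrix_profile(series, m):
    """Brute-force matrix profile for window size m (O(n^2 * m) time)."""
    n_sub = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n_sub)]
    exclusion = m // 2  # assumed exclusion zone to skip trivial self-matches
    profile = []
    for i in range(n_sub):
        best = math.inf
        for j in range(n_sub):
            if abs(i - j) <= exclusion:
                continue
            best = min(best, math.dist(subs[i], subs[j]))
        profile.append(best)
    return profile
```

A low profile value marks a subsequence that repeats elsewhere in the series, while a high value marks an anomaly. The quadratic cost of the exact computation is what motivates a learned approximation.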
.
├── models # pre-trained models, weight values for custom kernel
├── scripts # Scripts for generating the compiled model for the DPU and for evaluation
├── src
│ ├── hls # Custom kernel HLS files
│ └── host # Source code for ARM processor
├── vitis # Configurations and Makefile for project synthesis
├── LICENSE
└── README.md
In this tutorial we show how to deploy LAMP on the Xilinx Deep Learning Processor Unit (DPU), integrate it with the custom kernel, and run it on the FPGA, given a model pre-trained on a GPU. The instructions require Docker, Vitis-AI 1.1, Vitis 2019.2, and PetaLinux 2019.2 installed on the system.
To generate the IP file for the custom kernel, run the following in the hls directory (board files for the Ultra96-V2 can be downloaded from the Avnet GitHub repository):
vivado_hls -f script.tcl
The sigmoid implementation can be configured by choosing either SIGMOID_ULTRA_FAST or SIGMOID_EXP_512 in the defines.h file. To evaluate a different benchmark, replace the original weights.cpp with one of the other weights.cpp files from the models directory.
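The two sigmoid options trade accuracy for hardware cost. The sketch below illustrates that kind of trade-off in Python. It is not the code in defines.h: the division-based approximation, the 512-entry table over the range [-8, 8), and all function names are assumptions chosen for illustration only.

```python
import math

def sigmoid_exact(x):
    """Reference sigmoid using the exponential function."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_fast(x):
    """Assumed 'fast' variant: avoids exp entirely, at the cost of accuracy."""
    return 0.5 * (x / (1.0 + abs(x))) + 0.5

LUT_SIZE = 512    # assumed table size
LUT_RANGE = 8.0   # assumed input range [-8, 8)
_STEP = 2.0 * LUT_RANGE / LUT_SIZE
# Precompute the sigmoid at the center of each table bin.
_LUT = [sigmoid_exact(-LUT_RANGE + (i + 0.5) * _STEP) for i in range(LUT_SIZE)]

def sigmoid_lut(x):
    """Assumed table-based variant: saturate outside the range, look up inside."""
    if x <= -LUT_RANGE:
        return 0.0
    if x >= LUT_RANGE:
        return 1.0
    idx = int((x + LUT_RANGE) / _STEP)
    return _LUT[min(idx, LUT_SIZE - 1)]
```

In this sketch the table lookup stays within a few thousandths of the exact value, while the division-based form is cheaper but deviates by several hundredths for moderate inputs.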
The generated .xo file will be used in the next step.
- Clone Xilinx's Vitis-AI github repository
git clone https://github.com/Xilinx/Vitis-AI
cd Vitis-AI
git checkout v1.1
export VITIS_AI_HOME="$PWD"
- Download Ultra96-V2 Vitis Platform
- Set the location of platform
export SDX_PLATFORM=/home/Avnet/vitis/platform_repo/ULTRA96V2/ULTRA96V2.xpfm
- Replace dpu_conf.vh in /prj/Vitis with the file provided in the vitis directory of this repository, and do the same for the config_file/prj_config file.
- Using the provided Makefile, run
make KERNEL=DPU DEVICE=ULTRA96V2
Synthesis takes around 1 hour, after which the sd_card directory is generated.
sh -x docker_run.sh xilinx/vitis-ai:latest
conda activate vitis-ai-tensorflow
Freeze the model using the freeze_graph.py script:
python freeze_graph.py input_model
where input_model is the pre-trained LAMP model.
We quantize the weights, biases, and activations of the model to improve inference performance on the FPGA. The Xilinx DPU currently supports only 8-bit models, so we quantize everything to 8 bits.
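To make the effect of 8-bit quantization concrete, here is a minimal sketch of signed 8-bit fixed-point quantization. It assumes a power-of-two scale given by a number of fractional bits; the function names and the choice of 6 fractional bits in the usage note are illustrative assumptions, and the exact scheme applied by vai_q_tensorflow is not described in this document.

```python
def quantize_int8(values, frac_bits):
    """Map floats to signed 8-bit integers with scale 2**frac_bits (assumed scheme)."""
    scale = 2 ** frac_bits
    quantized = []
    for v in values:
        q = round(v * scale)
        # Saturate to the representable int8 range.
        quantized.append(max(-128, min(127, q)))
    return quantized

def dequantize_int8(q_values, frac_bits):
    """Recover approximate floats from the 8-bit integers."""
    scale = 2 ** frac_bits
    return [q / scale for q in q_values]
```

With 6 fractional bits, 0.5 and -0.25 are represented exactly, whereas 3.0 saturates at 127 and is recovered as 1.984375. Values outside the representable range are clipped, which is why the quantizer needs calibration data to pick suitable scales for the activations.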
vai_q_tensorflow quantize \
  --input_frozen_graph frozen_graph.pb \
  --input_fn input_func.calib_input \
  --output_dir quantized \
  --input_nodes input_1 \
  --output_nodes reshape_1/Reshape \
  --input_shapes ?,256,1,32 \
  --calib_iter 32
frozen_graph.pb is the frozen model generated in the previous step, input_func is the Python file that generates the input data for the quantizer (since there is no backpropagation step here, an unlabeled dataset is sufficient), and calib_iter is the number of iterations for calibrating the activations. We noticed that values larger than 32 do not substantially improve the quantizer accuracy.
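The calibration function named by --input_fn is called once per calibration iteration with the iteration index and returns a dictionary mapping input node names to NumPy arrays. The sketch below shows the general shape such a file could take. It is not the input_func.py shipped with this repository: the batch size is a hypothetical value, and the random array is only a stand-in for real, unlabeled time series windows loaded from disk.

```python
import numpy as np

BATCH_SIZE = 8               # hypothetical batch size
CALIB_ITER = 32              # matches --calib_iter
INPUT_NODE = "input_1"       # matches --input_nodes
SAMPLE_SHAPE = (256, 1, 32)  # matches --input_shapes ?,256,1,32

# Stand-in data: replace with real unlabeled samples from the training set.
_rng = np.random.default_rng(0)
_calib_data = _rng.standard_normal(
    (BATCH_SIZE * CALIB_ITER,) + SAMPLE_SHAPE
).astype(np.float32)

def calib_input(iter):
    """Return the batch for calibration step `iter` as {node_name: array}."""
    start = iter * BATCH_SIZE
    return {INPUT_NODE: _calib_data[start:start + BATCH_SIZE]}
```

Because calibration only observes activation ranges, no labels are returned, which matches the note above that an unlabeled dataset is sufficient.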
python evaluate.py
evaluate.py reads in the TensorFlow frozen binary graph, runs inference, and reports the least squared error by comparing the model output with the labels (matrix profile values).
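The comparison step can be sketched as follows, assuming the reported metric is the mean squared error between predictions and labels. The function name is hypothetical, and the actual evaluate.py may compute or normalize the error differently.

```python
def mean_squared_error(predictions, labels):
    """Average of squared differences between model outputs and labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    total = sum((p - y) ** 2 for p, y in zip(predictions, labels))
    return total / len(predictions)
```

Running the same metric on the frozen graph and on the quantized model gives a direct view of how much accuracy the 8-bit conversion costs.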
dlet -f dpu.hwh
dlet is a host tool that extracts the DPU information from the input file and generates the configuration file.
Next, we compile the model for the target hardware:
vai_c_tensorflow --frozen_pb quantized/deploy_model.pb \
  --arch /opt/vitis_ai/compiler/arch/DPUCZDX8G/ultra96/arch.json \
  --output_dir . \
  --net_name lamp
arch.json is located in the scripts directory. Since the Sigmoid and Global Average Pool layers are not supported by the DPU, the command generates four kernels; we will only use the lamp_0 model.
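Since the Global Average Pool layer is not supported by the DPU, it has to be computed elsewhere. As a reminder of what that layer does, here is a minimal sketch that reduces a height x width x channels feature map to one mean per channel. The nested-list layout and the function name are assumptions for illustration and do not reflect how the custom kernel or host code stores its data.

```python
def global_average_pool(feature_map):
    """Average an H x W x C feature map over H and W, returning C values."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    totals = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                totals[c] += value
    return [t / (height * width) for t in totals]
```

For a 2 x 2 map with two channels holding the values 1, 3, 5, 7 and 2, 4, 6, 8, the result is the pair of channel means 4.0 and 5.0.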
- Download and install the SDK for cross-compilation
wget -O sdk.sh "https://www.xilinx.com/bin/public/openDownload?filename=sdk.sh"
chmod +x sdk.sh
./sdk.sh -d ~/petalinux_sdk_vai_1_1_dnndk
- Set up the environment for cross-compilation
unset LD_LIBRARY_PATH
source ~/petalinux_sdk_vai_1_1_dnndk/environment-setup-aarch64-xilinx-linux
- Download and extract the additional DNNDK runtime content
wget -O vitis-ai_v1.1_dnndk.tar.gz "https://www.xilinx.com/bin/public/openDownload?filename=vitis-ai_v1.1_dnndk.tar.gz"
tar -xvzf vitis-ai_v1.1_dnndk.tar.gz
- Install the additional DNNDK runtime content to the previously installed SDK
cd vitis-ai_v1.1_dnndk
./install.sh $SDKTARGETSYSROOT
- In the host directory, run the make command.
- Copy the generated sd_card folder from the DPU Integration step, the lamp_0.elf model file, and the executable host program from the previous step to the boot directory of the SD card. Extract the rootfs.tar.gz file to the root directory of the SD card.
- Boot the Ultra96-V2 board with the SD card and use "root" for both login and password.
- Navigate to the SD card folder and run the lamp program:
cd /run/media/mmcblk0p1
cp dpu.xclbin /usr/lib/.
./lamp