KITTI Road, KITTI Semantic, Road Detection, Semantic Segmentation, Evidence Theory
Adjustments to the decision-making rule and to the number of prototypes have to be considered, together with the organization of the files.
In this work, evidence theory is combined with a camera-LiDAR deep learning fusion architecture. The encoding section and the context module consist of two processing pipelines of 13 layers each, one for the LiDAR input and one for the camera input. Each layer of one modality is fused with the corresponding layer of the other modality by a weighted sum. The fusion weights are trainable, so the position and extent of the fusion are determined by the data. Once the LiDAR and camera inputs are represented by basic belief assignments (BBAs) in the evidential formulation layer, a decision can be made on the desired class. The coupling is based on generating basic belief functions from distances to prototypes, together with a distance-based decision rule. The project extends the cross-fusion reduction repository: https://github.com/geletumn/cf_reduction The new benchmark introduces evidence theory for the decision-making part.
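For illustration, the trainable weighted sum at one fusion point can be sketched as follows (a minimal sketch, assuming one scalar weight per modality; the layer and variable names are not the repository's actual code):

```python
import tensorflow as tf

class WeightedFusion(tf.keras.layers.Layer):
    """Fuse a camera and a LiDAR feature map by a trainable weighted sum."""

    def build(self, input_shape):
        # One trainable scalar per modality; training then decides how much
        # each modality contributes at this fusion point.
        self.w_cam = self.add_weight(name="w_cam", shape=(), initializer="ones")
        self.w_lidar = self.add_weight(name="w_lidar", shape=(), initializer="ones")

    def call(self, inputs):
        cam_feat, lidar_feat = inputs  # same-shape feature maps
        return self.w_cam * cam_feat + self.w_lidar * lidar_feat

# Usage: fused = WeightedFusion()([camera_features, lidar_features])
```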
Python version and TF framework:
python 3.7
tensorflow==2.8.0
Before running the Jupyter notebook, the user is advised to import the corresponding libraries and activation functions related to the evidential formulation part (a loading sketch follows the list below). In the utils.py file, the simplification of the classes is hard-coded and not meant to be changed. Two architectures and their corresponding weights can be found in model_arch:
ds_layer_p2p
DS1_activate
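A hypothetical loading snippet (the weights file name and the import location of the custom objects are assumptions, not confirmed by the repository):

```python
import tensorflow as tf
from utils import ds_layer_p2p, DS1_activate  # import location is an assumption

# "model_arch/evidential_model.h5" is a placeholder; substitute the actual
# weights file shipped in model_arch.
model = tf.keras.models.load_model(
    "model_arch/evidential_model.h5",
    custom_objects={"ds_layer_p2p": ds_layer_p2p, "DS1_activate": DS1_activate},
)
```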
The belief theory approach is distance to prototypes, with Interval Dominance and Decision Based Interval rules, adapted to road segmentation as well as multi-class segmentation.
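As a minimal sketch of the interval-dominance idea (not the repository's exact rule), a class is kept only if no other class's belief exceeds its plausibility; when no single class survives, the decision falls back to ignorance:

```python
import numpy as np

def interval_dominance_decision(bel, pl):
    """Decide from belief/plausibility intervals [bel[i], pl[i]] per class.

    Class i is dominated if some other class j has bel[j] > pl[i].
    Returns the index of the single undominated class, or -1 (ignorance)
    when several classes remain incomparable.
    """
    bel, pl = np.asarray(bel, float), np.asarray(pl, float)
    undominated = [i for i in range(len(bel))
                   if not np.any(np.delete(bel, i) > pl[i])]
    return undominated[0] if len(undominated) == 1 else -1

# bel[0] = 0.6 exceeds the plausibility of every other class -> class 0 wins.
print(interval_dominance_decision([0.6, 0.1, 0.05], [0.9, 0.5, 0.2]))  # 0
```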
The E-CNN classifier refers to Tong Zheng's repository (@tongzheng1992) and is inspired by the evidential classifier of Prof. Denoeux.
Two datasets, both from the KITTI benchmark, were used in this work: KITTI Road and KITTI Semantic (pixel-wise). https://www.cvlibs.net/datasets/kitti/ The second dataset, used for semantic segmentation, contains 127 frames (LiDAR and camera) from the KITTI raw dataset. The number of classes is simplified to three: road, vehicle and background.
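For illustration, such a class simplification could look as follows (the label IDs and grouping below are assumptions made for the sketch; the actual mapping is hard-coded in utils.py):

```python
import numpy as np

BACKGROUND, ROAD, VEHICLE = 0, 1, 2
# Hypothetical grouping of Cityscapes-style label IDs used by KITTI Semantic.
ID_TO_SIMPLE = {7: ROAD, 26: VEHICLE, 27: VEHICLE, 28: VEHICLE}

def simplify_labels(label_img: np.ndarray) -> np.ndarray:
    """Map a full semantic label image to {background, road, vehicle}."""
    out = np.full_like(label_img, BACKGROUND)
    for src_id, simple_id in ID_TO_SIMPLE.items():
        out[label_img == src_id] = simple_id
    return out
```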
The LiDAR-camera dataset
The KITTI Semantic dataset originally has only 200 camera images; it is similar to the KITTI Stereo and KITTI Flow 2012/2015 datasets. Since KITTI Semantic has no LiDAR frames (unlike the road dataset, for instance), the 3D point clouds corresponding to the existing camera frames have to be identified in the large original KITTI raw dataset, which contains the data for all tasks. Hence, for 127 out of the 200 camera images, LiDAR frames have been successfully projected and up-sampled to create dense depth images. A 3D LiDAR point x is mapped to a point y in the camera plane as y = P R T x, using the KITTI projection P, rectification R and translation T matrices.
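A minimal sketch of this projection (the homogeneous 4x4 padding of R and T is an assumption about the calibration matrix shapes):

```python
import numpy as np

def project_lidar_to_image(points_velo, P, R, T):
    """Map 3D LiDAR points x to camera-plane points y via y ~ P @ R @ T @ x.

    points_velo: (N, 3) LiDAR points.
    P: (3, 4) projection; R, T: (4, 4) rectification and LiDAR-to-camera
    transforms in homogeneous form, as read from the KITTI calibration files.
    Returns (N, 2) pixel coordinates after perspective division.
    """
    x = np.hstack([points_velo, np.ones((points_velo.shape[0], 1))])  # (N, 4)
    y = (P @ R @ T @ x.T).T                                           # (N, 3)
    return y[:, :2] / y[:, 2:3]
```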
The evidential formulation introduces an extra class called ignorance to handle the uncertainties. A deeper explanation is given in the paper cited below.
Road Detection: Bird Eye View
Evidential Prediction
Probabilistic Prediction
Semantic Segmentation
Far-end points are classified as ignorance rather than being assigned a wrong prediction.
Predicted frame (4 classes, including ignorance)
Ground truth frame, simplified to 3 classes.
Predicted frame (4 classes, including ignorance)
Ground truth frame, simplified to 3 classes.
Evidential deep learning-based multi-modal environment perception for intelligent vehicles
BibTex:
@INPROCEEDINGS{10186581,
  author={Geletu, Mihreteab Negash and Giurgi, Dănuţ-Vasile and Josso-Laurain, Thomas and Devanne, Maxime and Wogari, Mengesha Mamo and Lauffenburger, Jean-Philippe},
  booktitle={2023 IEEE Intelligent Vehicles Symposium (IV)},
  title={Evidential deep learning-based multi-modal environment perception for intelligent vehicles},
  year={2023},
  pages={1-6},
  doi={10.1109/IV55152.2023.10186581}}
Plain Text:
M. N. Geletu, D. -V. Giurgi, T. Josso-Laurain, M. Devanne, M. M. Wogari and J. -P. Lauffenburger, "Evidential deep learning-based multi-modal environment perception for intelligent vehicles," 2023 IEEE Intelligent Vehicles Symposium (IV), Anchorage, AK, USA, 2023, pp. 1-6, doi: 10.1109/IV55152.2023.10186581.