Home
Welcome to the nn.based.intersection.classficator wiki!
The idea of this project is to assess the generalization capabilities of modern deep learning networks in the context of road intersection classification. The goal is to understand the typology of the intersection in front of the vehicle, aiming to support a wide range of advanced driving assistance systems (ADAS) as well as self-driving algorithms, which can greatly benefit from it in many sub-tasks such as localization.
Following our previous works in this field, i.e., An Online Probabilistic Road Intersection Detector and Visual Localization at Intersections with Digital Maps, and the segmentation capabilities of our 3D-DEEP network, we propose to identify the following intersection topologies. Besides the limitation to seven classes, this allows us to compare the improvements with respect to the previous state of the art, while paving the way for further investigation of other kinds of intersection configurations (roundabouts and so on, cite Carla approach).
The following picture depicts the underlying idea of our work.
In this work, we used ten sequences of the well-known KITTI dataset, choosing challenging sequences that include all seven different road configurations shown in the picture below.
Intersection types: 0 | 1 | 2 | 3 | 4 | 5 | 6
- ALVAROMASK : results of 3D-DEEP on KITTI image_02 frames
- CAMERA-BEV : created from RGB(image_02) + DEPTH(aanet[image_02+image_03]) + MASK(alvaromask[image_02]) [using fromAANETandDualBisenet]
- AUGMENTED-CAMERA-BEV : the set of CAMERA-BEV that were saved as png+json; jsons contain the 3D-PARAMETERS used in fromAANETandDualBisenet
- OG : Occupancy Grid
- OSM : OpenStreetMap
- GENERATED-OSM-OG : the ones we sample randomly
- REAL-OSM-OG : the ones that come from OpenStreetMap + the KITTI position
Interesting yet tricky transforms (a minimal composition sketch follows this list):
- GenerateNewDataset [PNG writer]: nice trick to save to disk the results of the fromAANETandDualBisenet dataloader. If the dataloader has this transform in its list, then the output of the ONLINE SAMPLING of fromAANETandDualBisenet will be saved.
- WriteDebugInfoOnNewDataset [JSON writer]: similar to GenerateNewDataset; with this flag the parameters used in the ONLINE SAMPLING will be stored.
- GenerateBev: data augmentation routine to generate new CAMERA-BEVs. Input: RGB+DEPTH+MASK. Output: CAMERA-BEV. Parameters: [maxdistance, decimate, random_Rx_degrees, random_Ry_degrees, random_Rz_degrees, random_Tx_meters, random_Ty_meters, random_Tz_meters, returnPoints]
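The sketch below shows how these transforms could be composed so that the ONLINE samples produced by fromAANETandDualBisenet are also written to disk. The constructor arguments and numeric values are assumptions based on the parameter list above, not the project's actual defaults, and the import of the project-local classes is omitted.

```python
# Minimal composition sketch (argument values are placeholders, signatures assumed).
# GenerateBev, GenerateNewDataset, WriteDebugInfoOnNewDataset are project-local transforms.
from torchvision import transforms

online_bev_pipeline = transforms.Compose([
    GenerateBev(maxdistance=30.0, decimate=1.0,
                random_Rx_degrees=2.0, random_Ry_degrees=2.0, random_Rz_degrees=2.0,
                random_Tx_meters=1.0, random_Ty_meters=1.0, random_Tz_meters=0.5,
                returnPoints=False),                  # RGB+DEPTH+MASK -> CAMERA-BEV
    GenerateNewDataset('/path/to/augmented/output'),  # PNG writer: saves the sampled BEV
    WriteDebugInfoOnNewDataset(),                     # JSON writer: saves the 3D parameters used
])
```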
More classic ones:
- Rescale
- GrayScale
- ToTensor
- Normalize
- NormalizePretrained
- Mirror
- generate.bev.dataset.py: used to generate the offline dataset consumed by the fromGeneratedDataset dataloader.
- alvaro_process_all_folders.bash: used to create the AANET .npz files; it takes an input file listing the folders to process, each of which needs two sub-folders, 'left' and 'right'; we usually put symbolic links there (see the sketch below).
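A small hedged helper (not part of the repository) that creates the 'left'/'right' symbolic links expected by alvaro_process_all_folders.bash, pointing to image_02/image_03 as in the folder tree shown later:

```python
import os

def make_stereo_links(sequence_dir):
    """Create relative 'left' -> image_02 and 'right' -> image_03 symlinks in a KITTI sequence folder."""
    for link_name, target in (('left', 'image_02'), ('right', 'image_03')):
        link_path = os.path.join(sequence_dir, link_name)
        if not os.path.lexists(link_path):
            os.symlink(target, link_path)  # relative link, e.g. left -> image_02

make_stereo_links('/home/malvaro/Documentos/DualBiSeNet/data_raw/2011_09_30_drive_0018_sync')
```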
Data Loaders
- Baseline [OFFLINE]
- fromAANETandDualBisenet [ONLINE]
- fromGeneratedDataset [OFFLINE]
- teacher_tripletloss [OFFLINE]
- teacher_tripletloss_generated [ONLINE]
- TestDataset [OFFLINE]
- Standard dataset
- Generated dataset
In the following, [ONLINE] means that the loader creates/transforms the images on-the-fly, while [OFFLINE] means it reads the images from disk.
- [OFFLINE] - class BaseLine
- [ONLINE] - class fromAANETandDualBisenet. Used in Direct classification of BEV. This dataloader uses images from [image_02 + aanet + alvaromask], i.e., [RGB + DEPTH + ROAD_MASK], to generate the CAMERA-BEV
- [OFFLINE] - class fromGeneratedDataset
- [OFFLINE] - class teacher_tripletloss. Input: OpenStreetMap-Files + Ground-Truth (frames_topology.txt) + KITTI-OXTS. Output: TRIPLET
- [ONLINE] - class teacher_tripletloss_generated
- [OFFLINE] - class TestDataset. Used for the dataset without data augmentation.
This is an [OFFLINE] dataloader; it is the dataloader used to work directly with the RGB images.
sample = {'data': image,
'label': gTruth}
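A minimal sketch of how this (or any other [OFFLINE]) loader plugs into a standard PyTorch DataLoader; the BaseLine constructor arguments and import path are assumptions, not taken from the code.

```python
from torch.utils.data import DataLoader
# from dataloaders import BaseLine   # project-local class, import path assumed

dataset = BaseLine('/home/malvaro/Documentos/DualBiSeNet/data_raw', transform=None)  # args assumed
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=4)

for sample in loader:
    images, labels = sample['data'], sample['label']  # keys as documented above
    # ...forward pass / loss computation here...
```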
This is an [ONLINE] dataloader.
sample = {'aanet': aanet_image,
'alvaromask': alvaromask_image,
'image_02': image_02_image,
'label': gTruth}
If the dataloader contains the GenerateNewDataset(...path...) TRANSFORM (nice trick uh?!) then we save the output to disk.
This is an [OFFLINE] dataloader. It loads the AUGMENTED-CAMERA-BEVs. The folder containing these images+jsons is generated using generate.bev.dataset.py.
Depending on the addGeneratedOSM parameter we can have either (with addGeneratedOSM=True):
sample = {'data': bev_image,
          'label': bev_label,
          'generated_osm': generated_osm[0]}
or (with addGeneratedOSM=False):
sample = {'data': bev_image,
          'label': bev_label}
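A sketch (constructor signature assumed) showing how the two variants differ for the consumer: the 'generated_osm' key is only present when the flag is set.

```python
# from dataloaders import fromGeneratedDataset   # project-local class, import path assumed

dataset = fromGeneratedDataset('/home/malvaro/Documentos/DualBiSeNet/data_raw_bev',
                               addGeneratedOSM=True)   # flag documented above, other args assumed

sample = dataset[0]
bev_image, bev_label = sample['data'], sample['label']
if 'generated_osm' in sample:            # only returned when addGeneratedOSM=True
    osm_og = sample['generated_osm']     # on-the-fly GENERATED-OSM-OG of the same class
```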
This is an [OFFLINE] dataloader. It uses OpenStreetMap files (namely the .osm), the GT file (frames_topology.txt) and the associated OXTS files to create a triplet containing the following items. Please note that the data loader uses the pre-processed files containing the ground truth images, i.e., the ground truth OGs (occupancy grids, as they were called in the ICRA2017/19 projects by Ballardini/Cattaneo). This means that it READS the files contained in the folder downloaded from the IRALAB website and, only for those filenames (filename = frame number), creates the triplets, using the frames_topology.txt info to choose suitable positive/negative elements.
Notice that the downloaded ground truth contains only the OGs that are closer than 20 meters from an intersection center.
sample = {'anchor': anchor_image,
'positive': positive_image,
'negative': negative_image,
'label_anchor': anchor_type,
'label_positive': positive_item[2], # [2] is the type
'label_negative': negative_item[2], # [2] is the type
'filename_anchor': self.osm_data[idx][0], # [0] is the filename
'filename_positive': positive_item[0],
'filename_negative': negative_item[0],
'ground_truth_image': ground_truth_img, # for debugging purposes
'anchor_oxts_lat': self.osm_data[idx][4], # [4] lat
'anchor_oxts_lon': self.osm_data[idx][5], # [5] lon
'positive_oxts_lat': positive_item[4], # [4] lat
'positive_oxts_lon': positive_item[5], # [5] lon
'negative_oxts_lat': negative_item[4], # [4] lat
'negative_oxts_lon': negative_item[5] # [5] lon
}
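For context, a hedged sketch of how such a triplet could feed a standard triplet objective. The ResNet-18 backbone without the last fully-connected layer follows the teacher description given later; the loss choice and the dummy input tensors are illustrative, not necessarily what the repository uses.

```python
import torch
import torch.nn as nn
from torchvision import models

# ResNet-18 backbone without the last fully-connected layer (as described for the teacher)
backbone = models.resnet18(pretrained=False)
teacher_net = nn.Sequential(*list(backbone.children())[:-1], nn.Flatten())

criterion = nn.TripletMarginLoss(margin=1.0)  # illustrative choice of triplet objective

# sample['anchor'] / 'positive' / 'negative' would come from teacher_tripletloss; dummies here
anchor   = torch.randn(4, 3, 224, 224)
positive = torch.randn(4, 3, 224, 224)
negative = torch.randn(4, 3, 224, 224)

# pull anchor/positive embeddings together, push the negative away
loss = criterion(teacher_net(anchor), teacher_net(positive), teacher_net(negative))
```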
This is an [ONLINE] dataloader.
sample = {'anchor': anchor_image[0], # [0] is the image
'positive': positive_image[0],
'negative': negative_image[0],
'canonical': canonical_image[0],
'label_anchor': anchor_type,
'label_positive': positive_item[0], # [0] is the type
'label_negative': negative_item[0], # [0] is the type
'ground_truth_image': anchor_image[0], # for debugging purposes; in this dataloader it equals the anchor
'anchor_xx': anchor_image[1], # [1] is the xx coordinate
'anchor_yy': anchor_image[2], # [2] is the yy coordinate
'positive_xx': positive_image[1],
'positive_yy': positive_image[2],
'negative_xx': negative_image[1],
'negative_yy': negative_image[2],
# the following are not used; they are here to maintain compatibility with "teacher_tripletloss"
'filename_anchor': 0, 'filename_positive': 0, 'filename_negative': 0, 'anchor_oxts_lat': 0,
'anchor_oxts_lon': 0, 'positive_oxts_lat': 0, 'positive_oxts_lon': 0, 'negative_oxts_lat': 0,
'negative_oxts_lon': 0}
This is an [OFFLINE] dataloader.
sample = {'data': image,
'label': gTruth}
Standard dataset, containing the data described below.
Images are saved in /home/malvaro/Documentos/DualBiSeNet/data_raw
/home/malvaro/Documentos/DualBiSeNet/data_raw
├── 2011_09_30_drive_0018_sync
├── 2011_09_30_drive_0020_sync
├── 2011_09_30_drive_0027_sync
├── 2011_09_30_drive_0028_sync
├── 2011_09_30_drive_0033_sync
├── 2011_09_30_drive_0034_sync
├── 2011_10_03_drive_0027_sync
└── 2011_10_03_drive_0034_sync
and for each folder, the sub-structure is:
├── 2011_09_30_drive_0018_sync
│ ├── altdiff
│ ├── alvaromask
│ ├── bev
│ ├── image_02
│ ├── image_03
│ ├── left -> image_02
│ ├── OSM
│ ├── OSM_TYPES
│ ├── oxts
│ │ └── data
│ ├── pred
│ └── right -> image_03
alvaromask | CAMERA-BEV | image_02 | image_03 | OSM
Using the Standard Dataset in real time during training generates all the required data on-the-fly (this means creating the BEVs from aanet+alvaroMask). This pipeline makes the training slow. We therefore saved an augmented dataset containing 100 sampled elements for each element of the Standard Dataset, along with a JSON file (for each element) containing the information related to its generation. We refer to these images as AUGMENTED-CAMERA-BEV. For example, for each intersection we have a set of augmented BEVs,
all representing the same intersection. The JSON looks like the following:
{"label": 3,
"random_Tx": 1.7707240855972204,
"random_Ty": 1.062781057344313,
"random_Tz": 0.3012331042687739,
"random_Rx": 1.404327135460766,
"random_Ry": 2.6811639810346044,
"random_Rz": 1.1197293956636485,
"starting_points": 62152,
"remaining_points": 62152,
"mirrored": true,
"debug": true,
"path": "/home/malvaro/Documentos/DualBiSeNet/data_raw_bev",
"bev_path_filename": "/home/malvaro/Documentos/DualBiSeNet/data_raw_bev/2011_09_30_drive_0020_sync/0000000031.002.png"}
Images are saved in /home/malvaro/Documentos/DualBiSeNet/data_raw_bev
/home/malvaro/Documentos/DualBiSeNet/data_raw_bev
.
├── 2011_09_30_drive_0018_sync
├── 2011_09_30_drive_0020_sync
├── 2011_09_30_drive_0027_sync
├── 2011_09_30_drive_0028_sync
├── 2011_09_30_drive_0033_sync
├── 2011_09_30_drive_0034_sync
├── 2011_10_03_drive_0027_sync
└── 2011_10_03_drive_0034_sync
We evaluated different learning paradigms to try to match the intersection topology. This section describes these attempts, which are summarized as follows:
- Direct classification of BEV
- Direct Triplet Loss (OSM - BEV - BEV) *not yet implemented
- Teacher / Student
We first tried to directly classify the images resulting from the ALVARO network and the AANET output, i.e., a bird's eye view (BEV) of the intersection area masked with the points labelled as 'road'. Despite the many efforts, including different network configurations and different parameterizations, the results were not satisfactory. In this process we evaluated the following networks:
- resnet-18
- resnet-101
- ALVARO, SHALL WE WRITE SOMETHING ABOUT WHAT WE DID?
After this, we decided to try new approaches described in issue 10
not yet implemented
It is a general classifier built with a RESNET18 network and trained using the teacher_tripletloss_generated dataloader.
Based on a RESNET-18 network w/o the last fully-connected layer. The teacher can be tested with standard triplets (anchor+positive+negative) or with "canonical" intersection images.
In the following image
- Anchor: this is generated from OSM data and the LAT/LON coordinates stored in the KITTI-OXTS files. The ground truth of this intersection, as stored in the GROUND TRUTH file, is shown in the last box.
- Positive: using the ground truth of the anchor, a similar intersection is identified.
- Negative: in a similar way, a negative is identified.
Please notice that these elements are occupancy grids without "injected noise", like the following:
The number of elements in the ground truth, i.e., the intersections in the selected sequences or REAL-OSM-OG, is not sufficient to adequately train any network. To efficiently train a network we propose a new data loader that generates intersections by sampling from the ICRA2017/19 model. This is teacher_tripletloss_generated. Similarly to the teacher_tripletloss data loader, here we "randomly generate" a list of N elements. The list contains numbers in the range 0-7 only. Then, in the getitem method, we create the three OG images by sampling from the intersection model. The generator has the following parameters (per instance, even though you can change them at runtime with the getter/setter methods; this is useful in order to increase the randomness/deviation of the intersection from the canonical intersection shape); an instantiation sketch follows the list.
* elements=1000, how many elements the dataset contains
* rnd_width=2.0, the size of each road/branch will be increased/decreased depending on uniform(-rnd_width, rnd_width)
* rnd_angle=0.4, the angle between roads/branches will be increased/decreased depending on uniform(-rnd_angle, rnd_angle)
* rnd_spatial=9.0, intersection center randomness added as xx = 15.0 + uniform(-rnd_spatial, rnd_spatial) and yy = 0.0 + uniform(-rnd_spatial, rnd_spatial)
* noise=True, we can add noise to the generated intersection; the idea is to have occupancy grids that can then be compared to the actual BEVs, in which the density of the points decreases along the longitudinal axis
* transform=None, no transform will be used here.
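Putting the parameters together, a hedged instantiation sketch; the constructor signature is assumed from the list above and the import of the project-local class is omitted.

```python
# from dataloaders import teacher_tripletloss_generated   # project-local class, path assumed

dataset = teacher_tripletloss_generated(elements=1000,
                                        rnd_width=2.0,
                                        rnd_angle=0.4,
                                        rnd_spatial=9.0,
                                        noise=True,
                                        transform=None)

triplet = dataset[0]            # one sampled ANCHOR / POSITIVE / NEGATIVE (+ CANONICAL) element
anchor, positive = triplet['anchor'], triplet['positive']
```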
The same as above, now with "Anchor | Positive | Negative | GT | Canonical".
Here we describe what we've done for training, following what is described in Learning schemes.
We propose two types of teacher, a Teacher Classificator and a Teacher with Triplet Loss. Both teachers are trained on GENERATED-OSM-OG (not RGB, not AUGMENTED-CAMERA-BEV).
This is a classic classificator.
It uses a GENERATED-OSM-OG + LABEL to create an embedding.
- We first trained the teacher network using the GENERATED-OSM-OG in the form ANCHOR+POSITIVE+NEGATIVE to create an embedding. The wandb experiment record is stored in young-firefly-129.
- We then tested the teacher network on the REAL-OSM-OG to see whether the intersections sampled from our model are representative of the "reality". One issue that arises is that we test/compare the Anchor with the Positive using the cosine-similarity function; since we sample a Positive item at every iteration, the accuracy value is different for each execution of the test. To limit this anomaly we added the so-called Canonical match, that is, the intersection type with no random error injected into any of the arm values (width/angle/center position). The Canonical is added both to the teacher_tripletloss and teacher_tripletloss_generated data loaders.
We have two methods to train the student, depending on the type of teacher trained:
- Student with Teacher Classificator
- Student with Teacher-Triplet-Loss
Both student modalities are trained over AUGMENTED-CAMERA-BEV images.
Training the student networks requires a set of images originated from the fromGeneratedDataset dataloader with the flag addGeneratedOSM set to True. With this flag, the dataloader calls test_crossing_pose during its getitem method, returning a python dictionary with the following keys:
sample = {'data': bev_image, --> from the AUGMENTED-CAMERA-BEV
'label': bev_label, --> the LABEL, 0-6
'generated_osm': generated_osm[0]} --> on-the-fly GENERATED-OSM-OG (0-6) with noise
ALVARO, FOR TESTING DO WE USE TestDataset, CORRECT?
The two following images are an example of the output of the generator with the extra addGeneratedOSM flag.
For each of the images we generate the embeddings using
- teacher-network with input 'data' key
- student-network with input 'generated_osm' key.
The loss function we used is CosineEmbeddingLoss. The accuracy is then calculated with CosineSimilarity scaled to the range [0..1].
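A sketch of that loss/accuracy pairing; the two embedding batches are stand-ins for the teacher/student outputs described above, so their shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# dummy embeddings standing in for the teacher/student outputs (shapes assumed)
teacher_emb = torch.randn(8, 128)
student_emb = torch.randn(8, 128)

criterion = nn.CosineEmbeddingLoss()
target = torch.ones(teacher_emb.size(0))            # +1 target: the two embeddings should match
loss = criterion(teacher_emb, student_emb, target)

# accuracy as cosine similarity rescaled from [-1, 1] to [0, 1]
accuracy = ((F.cosine_similarity(teacher_emb, student_emb) + 1.0) / 2.0).mean()
```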
The first student network training is recorded in the generous-snow-151 wandb experiment. This experiment was stopped after one day due to the slow training process. After 11 epochs the training ended with 0.9777 train accuracy and 0.04423 train loss (the validation acc/loss values were corrupted due to a little bug). We then ran a test that is recorded in the astral-pyramid-157 wandb experiment.
Here we describe what we've done for testing, following what is described in [Training procedure](#training-procedure).
Following the training described for the teacher network, a test is made with the [teacher_tripletloss](#teacher_tripletloss) dataset, using real OSM images. The test results are recorded in the [sunny-pine-988](https://wandb.ai/chiringuito/nn-based-intersection-classficator/runs/3r8vj5iz) wandb experiment.
In this confusion matrix you can see seven classes for the crossings, despite there being six crossing types. This is because in testing we can only measure the similarity between the real OSM and an artificial one generated from the same class. This lack of information has been handled by creating class number seven, which represents the crossings that could not be classified with sufficient reliability. The reliability threshold for the classification is 0.92.
Within the teacher/student category we have devised three different forms of testing. The first one consists of testing the student network by comparing the features of the test images in bird's-eye perspective with the features obtained from the teacher network through an OSM of the same class as the BEV (same as in training). The second allows us to get rid of the teacher network once the student is trained: it consists in using an SVM to cluster the features obtained during training and thus categorize the features that the student network produces from the test BEV images. The third and last one consists in freezing the weights of the student network, adding a fully connected layer at the end, and retraining with the training data until the embeddings returned by the student network on the test images are correctly categorized.
Old info moved from the README
- The camera images are taken from the original KITTI RAW dataset, synchronized version.
- Crossing-frames and ground truth labels are those available in Intersection Ground Truth
- files in alvaromask are created with DualBiSeNet (this corresponds to the IV2020 Alvaro submission)
- files in pred are created with aanet; these are numpy-compressed images created with this version of aanet
- files in bev and pcd are created with the reproject.py Python script; the PCD creation needs the PCL library installed as it uses pcl_ply2pcd; you don't need to call this script directly, but instead use the generate.bev.and.pcds.sh bash script in /scripts
├── 2011_09_30_drive_0018_sync
│ ├── alvaromask
│ ├── bev
│ ├── image_02
│ ├── image_03
│ ├── pcd
│ └── pred
├── 2011_09_30_drive_0020_sync
│ ├── alvaromask
│ ├── bev
│ ├── image_02
│ ├── image_03
│ ├── pcd
│ └── pred
.
.
reproject.py takes as input the following list of arguments (a loading sketch for the first one follows this list):
- aanet output (disparity) as numpy npz, not the image. Various tests were performed because the original aanet code saves to png with
skimage.io.imsave(save_name, (disp * 256.).astype(np.uint16))
and the results of the projection were not good.
- alvaro mask
- the intermediate output for the PCL generation; it will be auto-deleted
- PCD OUTPUT FILE NAME
- image_02 (from KITTI)
- image_03 (from KITTI)
- BEV OUTPUT FILE NAME
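As referenced in the first item above, a hedged sketch of inspecting one of the aanet .npz disparity files; the array key inside the archive is not documented here, so the first stored array is taken.

```python
import numpy as np

npz_path = '/media/ballardini/4tb/ALVARO/Secuencias/2011_09_30_drive_0034_sync/pred/0000001018_pred.npz'
archive = np.load(npz_path)
disparity = archive[archive.files[0]]   # first (and presumably only) array in the archive
print(disparity.shape, disparity.dtype)
```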
Example for Pycharm debug:
/media/ballardini/4tb/ALVARO/Secuencias/2011_09_30_drive_0034_sync/pred/0000001018_pred.npz
/media/ballardini/4tb/ALVARO/Secuencias/2011_09_30_drive_0034_sync/alvaromask/0000001018pred.png
out_aanet.ply
/media/ballardini/4tb/ALVARO/Secuencias/2011_09_30_drive_0034_sync/pcd/0000001018.pcd
/media/ballardini/4tb/ALVARO/Secuencias/2011_09_30_drive_0034_sync/image_02/0000001018.png
/media/ballardini/4tb/ALVARO/Secuencias/2011_09_30_drive_0034_sync/image_03/0000001018.png
/media/ballardini/4tb/ALVARO/Secuencias/2011_09_30_drive_0034_sync/bev/0000001018.png
example output during execution:
python reproject.py /media/ballardini/4tb/ALVARO/Secuencias/2011_09_30_drive_0034_sync/pred/0000001017_pred.npz /media/ballardini/4tb/ALVARO/Secuencias/2011_09_30_drive_0034_sync/alvaromask/0000001017pred.png out_aanet.ply /media/ballardini/4tb/ALVARO/Secuencias/2011_09_30_drive_0034_sync/pcd/0000001017.pcd /media/ballardini/4tb/ALVARO/Secuencias/2011_09_30_drive_0034_sync/image_02/0000001017.png /media/ballardini/4tb/ALVARO/Secuencias/2011_09_30_drive_0034_sync/image_03/0000001017.png /media/ballardini/4tb/ALVARO/Secuencias/2011_09_30_drive_0034_sync/bev/0000001017.png
out_aanet.ply saved
Convert a PLY file to PCD format. For more information, use: pcl_ply2pcd -h
PCD output format: binary
> Loading out_aanet.ply [done, 110 ms : 45604 points]
Available dimensions: x y z rgb
> Saving /media/ballardini/4tb/ALVARO/Secuencias/2011_09_30_drive_0034_sync/pcd/0000001017.pcd [done, 2 ms : 45604 points]
Given the folders' structure you can generate a mosaic of BEVs+Images using the script generate.videos.bash
The following scripts use as input a txt file (folders.txt) generated with
ls -d1 */ > folders.txt
- generate.bev.and.pcds.sh
- generate.videos.bash
- copy.img2.to.img3.bash: this script copies the "selected" images from the original KITTI/DATA folder (e.g., image_03/data) to somewhere else, using the filenames contained in some other folder, e.g., image_02