
LiDAR-Camera fusion approaches for perception in off-road environments

MENABENI, MICHELE
2024/2025

Abstract

Autonomous navigation in open-pit mines and other unstructured terrains poses unique perception challenges: irregular surfaces, sparse semantic cues, and harsh environmental conditions undermine the camera-only pipelines that dominate urban self-driving research. This thesis explores and evaluates deep-learning approaches for perception in unstructured environments; given the specific use case and deployment platform, an autonomous truck operating in a quarry, two directions are investigated: camera-based 2D semantic segmentation and camera-LiDAR middle-fusion methods. First, a comprehensive benchmark of state-of-the-art 2D semantic-segmentation networks (DDRNet-23-Slim, PIDNet-S, and PIDNet-M) is conducted on the GOOSE off-road dataset. After a targeted data-enrichment strategy with Great Outdoor Dataset (GO) and AutoMine-SimWorld images, PIDNet-S attains 60.4 mIoU while sustaining 169 FPS on an NVIDIA Jetson AGX Orin using TensorRT FP16, meeting the latency budget for on-board deployment. Qualitative tests on in-house video streams confirm its robustness to dust, low contrast, and the presence of the truck bonnet. Given the additional sensors on board, the study then extends perception to 3D by surveying mid-level camera-LiDAR fusion frameworks. Dense BEV encoders (BEVFusion) and sparse query-based detectors (CMT, MV2DFusion) are analyzed for object detection, while EPMF, UniSeg, and U2MKD are reviewed for point-cloud segmentation. The comparison exposes a clear trade-off: BEVFusion offers the highest frame rate (25 FPS) and holistic bird's-eye-view context, while MV2DFusion delivers the best detection precision (0.78 mAP). For semantic understanding, EPMF emerges as the most suitable off-road candidate thanks to its 70 FPS throughput and its reliance on widely available LiDAR labels. The results achieved by PIDNet-S in the off-road environment enabled deployment on a real-world vehicle using the TensorRT acceleration framework.
The findings highlight that careful sensor fusion, dataset adaptation, and hardware-aware optimization are pivotal for transferring deep-learning perception from structured roads to the unpredictability of off-road environments.
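The abstract reports segmentation quality as mIoU (mean Intersection-over-Union). As background only, a minimal sketch of how this metric is typically computed from a class confusion matrix; the classes and numbers below are hypothetical and not taken from the thesis:

```python
# Illustrative mIoU computation from a confusion matrix.
# confusion[i][j] = number of pixels with true class i predicted as class j.

def miou(confusion):
    """Return mean IoU over classes present in labels or predictions."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]                                  # true positives
        fp = sum(confusion[r][c] for r in range(n)) - tp      # column sum minus tp
        fn = sum(confusion[c]) - tp                           # row sum minus tp
        denom = tp + fp + fn                                  # union of pred and label
        if denom:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious)

# Hypothetical 3-class off-road example (e.g. ground, obstacle, vegetation):
cm = [
    [50, 2, 3],
    [4, 30, 1],
    [0, 5, 20],
]
print(round(miou(cm), 3))
```

A benchmark such as the one above accumulates one such confusion matrix over the whole validation set before taking the mean, rather than averaging per-image IoUs.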
RGB Segmentation
3D object detection
Real-time
LiDAR-camera fusion
Off-Road Perception
Files in this item:

File: Michele.Menabeni.pdf (under embargo until 15/07/2028)
Size: 31.82 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14251/3521