
LiDAR-Camera fusion approaches for perception in off-road environments

MENABENI, MICHELE
2024/2025

Abstract

Autonomous navigation in open-pit mines and other unstructured terrains poses unique perception challenges: irregular surfaces, sparse semantic cues, and harsh environmental conditions undermine the camera-only pipelines that dominate urban self-driving research. This thesis explores and evaluates deep-learning approaches for perception in unstructured environments; given the specific use case and deployment platform, an autonomous truck operating in a quarry, two directions are investigated: camera-based 2D semantic segmentation and camera-LiDAR middle-fusion methods. First, a comprehensive benchmark of state-of-the-art 2D semantic-segmentation networks (DDRNet-23-Slim, PIDNet-S, and PIDNet-M) is conducted on the GOOSE off-road dataset. After a targeted data-enrichment strategy with Great Outdoor Dataset (GO) and AutoMine-SimWorld images, PIDNet-S attains 60.4 mIoU while sustaining 169 FPS on an NVIDIA Jetson AGX Orin using TensorRT FP16, meeting the latency budget for on-board deployment. Qualitative tests on in-house video streams confirm its robustness to dust, low contrast, and the presence of the truck bonnet. Given the additional sensors on board, the study then extends perception to 3D by surveying mid-level camera-LiDAR fusion frameworks. Dense BEV encoders (BEVFusion) and sparse query-based detectors (CMT, MV2DFusion) are analyzed for object detection, while EPMF, UniSeg, and U2MKD are reviewed for point-cloud segmentation. The comparison exposes a clear trade-off: BEVFusion offers the highest frame rate (25 FPS) and holistic bird's-eye-view context, while MV2DFusion delivers the best detection precision (0.78 mAP). For semantic understanding, EPMF emerges as the most suitable off-road candidate thanks to its 70 FPS throughput and its reliance on widely available LiDAR labels. The results achieved by PIDNet-S in the off-road environment enabled deployment on a real-world vehicle using the TensorRT acceleration framework.
The findings highlight that careful sensor fusion, dataset adaptation, and hardware-aware optimization are pivotal for transferring deep-learning perception from structured roads to the unpredictability of off-road environments.
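The abstract reports segmentation quality as mIoU (mean Intersection-over-Union). As background only, a minimal sketch of how this metric is typically computed from a class confusion matrix; the classes and numbers below are hypothetical and not taken from the thesis:

```python
# Illustrative mIoU computation from a confusion matrix.
# confusion[i][j] = number of pixels with true class i predicted as class j.

def miou(confusion):
    """Return mean IoU over classes present in labels or predictions."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]                                  # true positives
        fp = sum(confusion[r][c] for r in range(n)) - tp      # column sum minus tp
        fn = sum(confusion[c]) - tp                           # row sum minus tp
        denom = tp + fp + fn                                  # union of pred and label
        if denom:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious)

# Hypothetical 3-class off-road example (e.g. ground, obstacle, vegetation):
cm = [
    [50, 2, 3],
    [4, 30, 1],
    [0, 5, 20],
]
print(round(miou(cm), 3))
```

A benchmark such as the one above accumulates one such confusion matrix over the whole validation set before taking the mean, rather than averaging per-image IoUs.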
RGB Segmentation
3D object detection
Real-time
LiDAR-camera fusion
Off-Road Perception
Files in this item:

File: Michele.Menabeni.pdf (under embargo until 15/07/2028)
Size: 31.82 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14251/3521