ControlNet for Unpaired RGB-to-Thermal Image Translation Using Edge-Based Guidance
CORRADI, LORENZO
2024/2025
Abstract
The generation of realistic synthetic thermal images is a critical challenge for vision systems operating in data-scarce environments. Thermal imaging is increasingly used in various applications, with particular importance in autonomous driving, as well as in surveillance, industrial inspection, search and rescue, and medical diagnostics. Unlike visible-spectrum imaging, thermal cameras capture infrared radiation emitted by objects, making them especially valuable in scenarios with poor illumination, adverse weather conditions, or environments with smoke, fog, or dust. These properties make thermal imagery a crucial modality for developing robust perception systems in real-world scenarios where conventional RGB imaging may fail.

Despite these advantages, the adoption of thermal imaging in machine learning pipelines is hindered by the scarcity of large, annotated datasets. This limitation motivates the exploration of synthetic data generation to compensate for the lack of data. This work explores generative methods conditioned on spatial priors to synthesize realistic thermal images without requiring pixel-level alignment between thermal and RGB images, enhancing their applicability in real-world scenarios where such alignment is often difficult to obtain.

The main contribution of this work is a novel pipeline for synthetic thermal image generation based on ControlNet, which outperforms existing methods in both visual fidelity and downstream object detection performance. Built on diffusion models, ControlNet enables precise and controllable image synthesis through explicit spatial conditioning. In this pipeline, edge maps extracted from segmentation masks produced by the Segment Anything Model (SAM) are used to guide generation, allowing the output images to maintain structural coherence and semantic consistency with the source content.
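The abstract does not specify how edge maps are derived from SAM's segmentation masks. A minimal illustrative sketch, assuming the simplest approach (the one-pixel boundary of each binary mask, i.e. the mask minus its erosion), is shown below; the function name `mask_to_edge_map` is hypothetical, and the thesis's actual extractor may instead use a standard edge detector such as Canny.

```python
import numpy as np

def mask_to_edge_map(mask: np.ndarray) -> np.ndarray:
    """Approximate boundary of a binary segmentation mask.

    A pixel is an edge pixel if it belongs to the mask but at least one
    of its 4-connected neighbours does not (a one-pixel erosion residue).
    Returns a uint8 image with edges at 255, background at 0.
    """
    mask = mask.astype(bool)
    # Pad with False so objects touching the image border still produce edges.
    padded = np.pad(mask, 1, constant_values=False)
    eroded = (
        padded[1:-1, 1:-1]
        & padded[:-2, 1:-1] & padded[2:, 1:-1]   # up / down neighbours
        & padded[1:-1, :-2] & padded[1:-1, 2:]   # left / right neighbours
    )
    return (mask & ~eroded).astype(np.uint8) * 255

# Toy example: a 5x5 mask containing a filled 3x3 square.
m = np.zeros((5, 5), dtype=np.uint8)
m[1:4, 1:4] = 1
edges = mask_to_edge_map(m)  # outline of the square; its centre pixel is 0
```

In a ControlNet pipeline the resulting edge image would serve as the spatial conditioning input, analogous to the Canny-conditioned variants commonly used with diffusion models.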
The quality of the generated images was assessed both qualitatively and quantitatively, comparing the proposed ControlNet pipeline with two state-of-the-art baselines: an edge-guided GAN and a two-stage diffusion approach (ECDM). ControlNet demonstrated clear improvements in perceptual quality and distributional metrics compared to baseline methods. Notably, it achieved a KID score of 0.0106, approximately 74% lower than the best baseline score, indicating a significantly closer statistical alignment with the distribution of real thermal images. Furthermore, the synthetic images generated with ControlNet were used to construct thermal datasets for training object detectors. Results show that a detector trained on ControlNet-generated data achieves a mean mAP@50 improvement of approximately 14% and a mean mAP@50:95 improvement of about 9% compared to detectors trained on datasets generated by the best baseline method, confirming the superior effectiveness of the proposed approach in real downstream perception tasks.
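For context on the reported 0.0106, KID (Kernel Inception Distance) is the unbiased squared MMD between Inception feature sets of real and generated images, computed with a cubic polynomial kernel (Binkowski et al., 2018). The sketch below implements that estimator in plain NumPy; the random vectors stand in for Inception embeddings purely for illustration and are not part of the thesis's evaluation.

```python
import numpy as np

def polynomial_kernel(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # Cubic polynomial kernel used by KID: k(a, b) = (a . b / d + 1)^3
    d = x.shape[1]
    return (x @ y.T / d + 1.0) ** 3

def kid_score(real: np.ndarray, fake: np.ndarray) -> float:
    """Unbiased squared-MMD estimate between two feature sets."""
    m, n = len(real), len(fake)
    k_rr = polynomial_kernel(real, real)
    k_ff = polynomial_kernel(fake, fake)
    k_rf = polynomial_kernel(real, fake)
    # Exclude self-similarity (diagonal) terms for the unbiased estimator.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    term_rf = 2.0 * k_rf.mean()
    return float(term_rr + term_ff - term_rf)

rng = np.random.default_rng(0)
feats_real = rng.normal(size=(100, 64))        # stand-in for real-image features
feats_close = rng.normal(size=(100, 64))       # same distribution -> small KID
feats_far = rng.normal(size=(100, 64)) + 1.0   # shifted distribution -> larger KID

score_close = kid_score(feats_real, feats_close)
score_far = kid_score(feats_real, feats_far)
```

Lower values indicate that the generated features are statistically closer to the real ones, which is why the 74% reduction over the best baseline is meaningful.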
| File | Size | Format |
|---|---|---|
| Corradi.Lorenzo.pdf (open access) | 13.22 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14251/3416