Development and Optimization of a Deep Learning Framework for the Anonymization of Surgical Videos in Robot-Assisted Minimally Invasive Surgery

This thesis presents the development and optimization of deep learning methods for real-time anonymization of surgical videos, carried out during an internship at ORSI Academy, an international training center specializing in robotic and minimally invasive surgery. The project builds on the Robotic Anonymization Network (ROBAN), a deep learning model designed to automatically detect and anonymize frames recorded while the endoscopic camera is outside the patient's body, preventing privacy violations during data sharing and live broadcasting. The experiments aimed to improve the model's performance and generalization capability while keeping it lightweight enough for real-time deployment on hardware with limited computational resources. Several methodological strategies were explored, including replacing the MobileNetV2 backbone with the more advanced MobileNetV3 architecture, integrating temporal consistency mechanisms, and incorporating depth-based information. Adopting MobileNetV3 improved feature extraction and reduced misclassifications. Temporal consistency was achieved with a Hidden Markov Model, which exploits temporal correlations between consecutive frames to smooth out inconsistent predictions and improve overall prediction stability. Monocular depth estimation was also investigated, exploiting geometric differences between in-body and out-of-body scenes to improve classification in visually ambiguous or transitional conditions. Finally, an exploratory study applied machine learning techniques to trocar transition recognition, assessing whether depth-based representations could help identify the passage of the endoscopic camera through the trocar cannula.
The proposed optimizations improved robustness and temporal stability, supporting the feasibility of deploying real-time, privacy-preserving systems for surgical video analysis and educational data sharing.
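
The temporal smoothing idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the thesis's actual Hidden Markov Model, so everything below is an assumption made for illustration: a two-state model (0 = in-body, 1 = out-of-body), a hypothetical function name `viterbi_smooth`, a self-transition probability `p_stay` of 0.95, and the use of per-frame classifier probabilities as emission scores. Viterbi decoding is used here because it is compact; it processes a whole sequence at once, so a real-time system would more likely need a causal forward filter.

```python
import math


def viterbi_smooth(p_out, p_stay=0.95, prior_out=0.5):
    """Smooth per-frame out-of-body probabilities with a two-state HMM.

    p_out     : per-frame probabilities that the frame is out-of-body
                (assumed to come from a frame classifier).
    p_stay    : assumed probability of staying in the same state
                between consecutive frames (illustrative value).
    prior_out : assumed prior probability of starting out-of-body.
    Returns the most likely state sequence as a list of 0/1 labels.
    """
    eps = 1e-6
    probs = [min(max(float(p), eps), 1.0 - eps) for p in p_out]
    if not probs:
        return []

    # log_trans[i][j]: log-probability of moving from state i to state j.
    log_trans = [
        [math.log(p_stay), math.log(1.0 - p_stay)],
        [math.log(1.0 - p_stay), math.log(p_stay)],
    ]

    def log_emit(p):
        # Classifier probability used as a stand-in emission likelihood.
        return (math.log(1.0 - p), math.log(p))

    first = log_emit(probs[0])
    score = [math.log(1.0 - prior_out) + first[0],
             math.log(prior_out) + first[1]]
    back = []  # back[k][j]: best predecessor of state j at frame k + 1

    for p in probs[1:]:
        emit = log_emit(p)
        new_score, pointers = [], []
        for j in (0, 1):
            best_i = max((0, 1), key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j] + emit[j])
        back.append(pointers)
        score = new_score

    # Backtrack from the best final state.
    state = max((0, 1), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With these assumed parameters, an isolated high-probability frame surrounded by confident in-body frames is relabeled as in-body, while a sustained run of high-probability frames is kept as out-of-body. A larger `p_stay` suppresses more flicker but delays the response to genuine transitions.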

SITTI BOARINI, DYLAN

2024/2025

Faculty/Department: Dipartimento di Ingegneria "Enzo Ferrari"
Degree programme: Ingegneria informatica (Computer Engineering)
Academic year: 2024
Keywords: video anonymization; deep learning; temporal modelling; depth estimation; hidden Markov model
Supervisor: FERRAGUTI, FEDERICA
Co-examiner: DE BACKER, PIETER
Appears in collections: Lauree Magistrali (Master's degree theses)

Files in this item:

File	Size	Format
Sittiboarini.Dylan.pdf.pdf (embargoed until 01/12/2028)	38.2 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14251/4137