Development and Optimization of a Deep Learning Framework for the Anonymization of Surgical Videos in Robot-Assisted Minimally Invasive Surgery

SITTI BOARINI, DYLAN
2024/2025

Abstract

This thesis presents the development and optimization of deep learning methods for the real-time anonymization of surgical videos, carried out during an internship at ORSI Academy, an international training center specializing in robotic and minimally invasive surgery. The project builds on the Robotic Anonymization Network (ROBAN), a deep learning model designed to automatically detect and anonymize frames recorded while the endoscopic camera is outside the patient's body, preventing privacy violations during data sharing and live broadcasting. The experiments aimed to improve the model's performance and generalization capability while keeping it lightweight enough for real-time deployment on hardware with limited computational resources. Several methodological strategies were explored, including replacing the MobileNetV2 backbone with the more recent MobileNetV3 architecture, integrating temporal consistency mechanisms, and incorporating depth-based information. Adopting MobileNetV3 improved feature extraction and reduced misclassifications. Temporal consistency was achieved with a Hidden Markov Model, which exploits the correlation between consecutive frames to smooth out isolated inconsistent predictions and improve overall prediction stability. Monocular depth estimation was also investigated, exploiting geometric differences between in-body and out-of-body scenes to improve classification in visually ambiguous or transitional conditions. Finally, an exploratory study applied machine learning techniques to trocar transition recognition, to assess whether depth-based representations can help identify the passage of the endoscopic camera through the trocar cannula.
The proposed optimizations improved robustness and temporal stability, supporting the feasibility of deploying real-time, privacy-preserving systems for surgical video analysis and educational data sharing.
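
The temporal smoothing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the implementation used in the thesis: it assumes a two-state Hidden Markov Model (0 = in-body, 1 = out-of-body), treats the per-frame classifier probability as a stand-in for the emission likelihood, and uses a hypothetical `stay` parameter for the probability of remaining in the same state between consecutive frames. The function name `viterbi_smooth` and all parameter values are illustrative assumptions.

```python
import math


def viterbi_smooth(p_out, stay=0.95, eps=1e-6):
    """Return the most likely 0/1 label sequence (0 = in-body, 1 = out-of-body).

    p_out: per-frame classifier probabilities that the frame is out-of-body.
    stay:  assumed probability of keeping the same state between frames.
    eps:   clamp that keeps the logarithms finite for probabilities of 0 or 1.
    """
    if not p_out:
        return []

    # Log transition matrix: log_trans[previous_state][next_state].
    log_trans = [
        [math.log(stay), math.log(1 - stay)],
        [math.log(1 - stay), math.log(stay)],
    ]

    def log_emit(p):
        # Classifier output used as an approximate emission score per state.
        p = min(max(p, eps), 1 - eps)
        return [math.log(1 - p), math.log(p)]

    # Initialise with a uniform prior over the two states.
    score = [math.log(0.5) + e for e in log_emit(p_out[0])]
    back = []  # back[t][s] = best previous state when frame t+1 is in state s

    for p in p_out[1:]:
        emit = log_emit(p)
        new_score, ptr = [], []
        for s in (0, 1):
            prev = max((0, 1), key=lambda r: score[r] + log_trans[r][s])
            ptr.append(prev)
            new_score.append(score[prev] + log_trans[prev][s] + emit[s])
        back.append(ptr)
        score = new_score

    # Backtrack from the best final state to recover the full path.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

With a high `stay` value, a single frame whose probability briefly spikes is relabelled to match its neighbours, while a sustained change in the probabilities still produces a state switch. In practice the transition probability would have to be tuned or estimated from annotated videos.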
2024
video anonymization
deep learning
temporal modelling
depth estimation
hidden markov model
Files in this item:
File: Sittiboarini.Dylan.pdf.pdf (embargoed until 01/12/2028)
Size: 38.2 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14251/4137