AI-Driven Report Generation through Audio Analysis and Keyword Recognition

BERTACCHINI, MATTIA
2024/2025

Abstract

This thesis presents the design and development of an AI-based web platform that automates report generation by analyzing the audio track of video recordings. Users upload videos and associate them with structured reports containing several types of keyword fields (basic, multi-choice, tabular, and notes). Using speech recognition and natural language analysis, the system transcribes and segments the audio content, then automatically matches it to the relevant report fields. The architecture is organized into microservices, with components for speech transcription (based on models such as Whisper, Vosk, and Google Speech API), punctuation modeling, and keyword identification using n-gram and Levenshtein similarity algorithms. The technology stack also includes React, FastAPI, RabbitMQ, Firebase, MySQL, PostgreSQL, and Google Cloud Storage. The system significantly reduces manual workload in contexts such as medical visits, audits, and construction site inspections.
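The keyword identification step named in the abstract (n-grams combined with Levenshtein similarity) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `levenshtein`, `similarity`, and `find_keyword`, the normalization by the longer string, and the 0.8 threshold are all assumptions chosen for illustration. The idea is to slide a window of as many words as the keyword contains over the transcript and keep the window whose edit-distance similarity to the keyword is highest.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to [0, 1]; 1.0 means identical (assumed scheme)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def find_keyword(transcript: str, keyword: str, threshold: float = 0.8):
    """Return (word_index, matched_text, score) for the best n-gram, or None.

    The threshold of 0.8 is an illustrative assumption, not a value from the thesis.
    """
    words = transcript.lower().split()
    target = keyword.lower()
    n = len(target.split())  # n-gram size equals the keyword's word count
    best = None
    for start in range(len(words) - n + 1):
        candidate = " ".join(words[start:start + n])
        score = similarity(candidate, target)
        if score >= threshold and (best is None or score > best[2]):
            best = (start, candidate, score)
    return best


if __name__ == "__main__":
    text = "the patient reports blod pressure of 120 over 80"
    print(find_keyword(text, "blood pressure"))
```

In this sketch a transcription error such as "blod pressure" still matches the keyword "blood pressure", because a single missing character keeps the similarity above the threshold. Punctuation handling and extraction of the value that follows the keyword are deliberately left out.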
2024
Audio Analysis
Speech Recognition
Keyword Matching
Whisper
NLP
Files in this item:
File: bertacchini.mattia.pdf (restricted access)
Size: 3.39 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14251/3364