AI-Driven Report Generation through Audio Analysis and Keyword Recognition

BERTACCHINI, MATTIA

2024/2025

Abstract

This thesis presents the design and development of an AI-based web platform that automates report generation by analyzing the audio track of video recordings. Users upload videos and associate them with structured reports containing several types of keyword fields (basic, multi-choice, tabular, and notes). Using speech recognition and natural language analysis, the system transcribes and segments the audio content and automatically matches it to the relevant report fields. The architecture is organized into microservices, with components for speech transcription (based on models such as Whisper, Vosk, and the Google Speech API), punctuation modeling, and keyword identification using n-gram and Levenshtein similarity algorithms. Other key technologies include React, FastAPI, RabbitMQ, Firebase, MySQL, PostgreSQL, and Google Cloud Storage. The system significantly reduces manual workload in contexts such as medical visits, audits, and construction site inspections.
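The abstract names n-gram and Levenshtein similarity as the basis for keyword identification but does not give details. The following is a minimal illustrative sketch of how such matching could work, not the thesis's actual implementation: it slides a word n-gram window over a transcript and scores each window against a keyword with a normalized Levenshtein similarity. The function names, the whitespace tokenization, and the 0.8 threshold are all assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def find_keyword(transcript: str, keyword: str, threshold: float = 0.8):
    """Return (word_index, score, matched_text) for the best n-gram, or None.

    Hypothetical helper: n is the number of words in the keyword, and the
    threshold is an arbitrary value chosen for illustration.
    """
    words = transcript.lower().split()
    target = keyword.lower()
    n = len(target.split())
    best = None
    for start in range(len(words) - n + 1):
        candidate = " ".join(words[start:start + n])
        score = similarity(candidate, target)
        if score >= threshold and (best is None or score > best[1]):
            best = (start, score, candidate)
    return best


if __name__ == "__main__":
    text = "the patient reports blod pressure of 120 over 80"
    print(find_keyword(text, "blood pressure"))
```

In this sketch a transcription error such as "blod pressure" still matches the keyword "blood pressure", because a single missing character leaves the similarity well above the threshold. A real pipeline would likely also strip punctuation and tune the threshold per field type.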

Faculty/Department: Dipartimento di Ingegneria "Enzo Ferrari"
Degree programme: Artificial intelligence engineering
Academic year: 2024
Keywords: Audio Analysis; Speech Recognition; Keyword Matching; Whisper; NLP
Supervisor: CALDERARA, SIMONE
Appears in collections: Lauree Magistrali (Master's theses)

Files in this item:

File	Access	Size	Format
bertacchini.mattia.pdf	Restricted access	3.39 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14251/3364