Development of a Distributed Retrieval Augmented Generation System with Multi-Client Orchestration

REGGIANINI, GIACOMO

2024/2025

Abstract

This thesis presents the design, development, and evaluation of a comprehensive 360-degree RAG (Retrieval-Augmented Generation) platform engineered for enterprise deployment. The system addresses the growing need for scalable, multi-tenant AI solutions that can be deployed either as a dedicated tenant for an individual enterprise or as a multi-client SaaS platform serving multiple organizations simultaneously.

The system provides complete RAG lifecycle management through an integrated architecture that encompasses client management, sub-client hierarchies, dynamic pipeline creation, document ingestion, and intelligent conversational interfaces. Unlike traditional RAG implementations that focus solely on retrieval and generation, this system delivers end-to-end enterprise functionality, including user authentication, role-based access control, real-time processing monitoring, and comprehensive administrative interfaces.

The document processing architecture implements four specialized ingestion pipelines optimized for different content types and business requirements. The Mistral Pipeline (1) leverages state-of-the-art OCR for image-heavy and scanned documents, the Semantic Pipeline (2) provides high-throughput processing for digital content, the Section Pipeline (3) preserves hierarchical document structures, and the GPT-4o Pipeline (4) employs large language models for complex .docx document understanding. Each pipeline maintains complete document traceability while integrating visual content directly into responses through automated image captioning and reference linking.

The microservices-based architecture ensures modularity and deployment flexibility across different infrastructure environments. The system combines TypeScript-based orchestration services with Python-based AI processing components, enabling independent scaling and maintenance of different functional areas. This architectural approach facilitates both on-premises deployment in client-owned infrastructure and cloud-based multi-tenant operations.

A comprehensive frontend management system provides no-code administration capabilities, enabling non-technical users to configure processing pipelines, manage document collections, create client hierarchies, and monitor system performance through intuitive interfaces. The intelligent chat interface integrates document references and visual content into conversational responses, providing users with complete context and source attribution.

The evaluation methodology focuses on production metrics rather than academic benchmarks, assessing the system through actual deployment across legal and financial sector clients. The complete RAG workflow maintains document-to-response traceability by preserving original document references, enabling precise page-level citations, and automatically incorporating relevant visual content into generated responses.
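As an illustration of how the four ingestion pipelines named in the abstract relate to content types, the following is a minimal Python sketch of a routing rule. It is not taken from the thesis: the abstract states that pipelines are configured through the administration interface, so automatic routing is an assumption made only for this example. The names `Pipeline`, `DocumentProfile`, `choose_pipeline`, and the `image_threshold` parameter are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Pipeline(Enum):
    """The four pipelines listed in the abstract, numbered as in the text."""
    MISTRAL = 1   # OCR for image-heavy and scanned documents
    SEMANTIC = 2  # high-throughput processing for digital content
    SECTION = 3   # preserves hierarchical document structure
    GPT4O = 4     # LLM-based understanding of complex .docx files


@dataclass(frozen=True)
class DocumentProfile:
    """Hypothetical summary of a document, computed before ingestion."""
    extension: str                # e.g. ".pdf" or ".docx"
    is_scanned: bool = False      # pages are images without a text layer
    image_ratio: float = 0.0      # share of image-dominated pages, 0.0 to 1.0
    has_heading_tree: bool = False  # a heading hierarchy was detected
    complex_layout: bool = False  # e.g. nested tables or embedded objects


def choose_pipeline(doc: DocumentProfile, image_threshold: float = 0.5) -> Pipeline:
    """Pick a pipeline from a document profile (assumed rule order)."""
    # Scanned or image-heavy content needs OCR first.
    if doc.is_scanned or doc.image_ratio >= image_threshold:
        return Pipeline.MISTRAL
    # Complex .docx files go to the LLM-based pipeline.
    if doc.extension.lower() == ".docx" and doc.complex_layout:
        return Pipeline.GPT4O
    # Structured digital documents keep their section hierarchy.
    if doc.has_heading_tree:
        return Pipeline.SECTION
    # Everything else takes the high-throughput default.
    return Pipeline.SEMANTIC
```

For example, `choose_pipeline(DocumentProfile(extension=".pdf", is_scanned=True))` selects the Mistral Pipeline under these assumed rules. The ordering of the checks is a design choice of the sketch, not a claim about the deployed system.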

Faculty/Department: Dipartimento di Ingegneria "Enzo Ferrari"
Degree programme: Artificial intelligence engineering
Academic year: 2024
Keywords: Retrieval, Generation, AI, RAG, LLM
Supervisor: SIMONINI, GIOVANNI
Appears in collections: Lauree Magistrali (Master's degree theses)

Files in this record:

File	Access	Size	Format
Reggianini.Giacomo.pdf	Restricted access	3.72 MB	Adobe PDF

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14251/3738