Development of a Distributed Retrieval Augmented Generation System with Multi-Client Orchestration
REGGIANINI, GIACOMO
2024/2025
Abstract
This thesis presents the design, development, and evaluation of a comprehensive 360-degree RAG (Retrieval-Augmented Generation) platform engineered for enterprise deployment. The system addresses the growing need for scalable, multi-tenant AI solutions that can be deployed either as a dedicated tenant for an individual enterprise or as a multi-client SaaS platform serving multiple organizations simultaneously.

The system provides complete RAG lifecycle management through an integrated architecture that encompasses client management, sub-client hierarchies, dynamic pipeline creation, document ingestion, and intelligent conversational interfaces. Unlike traditional RAG implementations that focus solely on retrieval and generation, this system delivers end-to-end enterprise functionality, including user authentication, role-based access control, real-time processing monitoring, and comprehensive administrative interfaces.

The document processing architecture implements four specialized ingestion pipelines optimized for different content types and business requirements. The Mistral Pipeline (1) leverages state-of-the-art OCR for image-heavy and scanned documents, the Semantic Pipeline (2) provides high-throughput processing for digital content, the Section Pipeline (3) preserves hierarchical document structures, and the GPT-4o Pipeline (4) employs large language models for complex .docx document understanding. Each pipeline maintains complete document traceability while integrating visual content directly into responses through automated image captioning and reference linking.

The microservices-based architecture ensures modularity and deployment flexibility across different infrastructure environments. The system combines TypeScript-based orchestration services with Python-based AI processing components, enabling independent scaling and maintenance of different functional areas. This architectural approach facilitates both on-premises deployment in client-owned infrastructure and cloud-based multi-tenant operations.

A comprehensive frontend management system provides no-code administration capabilities, enabling non-technical users to configure processing pipelines, manage document collections, create client hierarchies, and monitor system performance through intuitive interfaces. The intelligent chat interface integrates document references and visual content into conversational responses, providing users with complete context and source attribution.

The evaluation methodology focuses on production metrics rather than academic benchmarks, assessing the system through actual deployment across legal and financial sector clients. The complete RAG workflow maintains document-to-response traceability by preserving original document references, enabling precise page-level citations, and automatically incorporating relevant visual content into generated responses.

| File | Size | Format |
|---|---|---|
| Reggianini.Giacomo.pdf (restricted access) | 3.72 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14251/3738