Enhancing Retrieval-Augmented Generation Systems with Advanced Retrieval Control and Agentic Behavior
MORINI, MARCO
2024/2025
Abstract
In this thesis, we explore advanced Retrieval-Augmented Generation (RAG) techniques to enhance the performance of Large Language Models (LLMs) in question answering. LLMs represent one of the most significant breakthroughs in Artificial Intelligence in recent years: their ability to process and generate human-like text enables natural, back-and-forth interactions with users and makes coherent, context-aware conversations possible. Despite these capabilities, LLM answers may be subject to hallucinations, a well-known phenomenon in which the model fails to acknowledge its lack of knowledge and instead produces false information. To reduce hallucinations and obtain more domain-specific responses, RAG can be applied: a separate model, called the retriever, searches external knowledge bases for information relevant to the user query, and the retrieved passages are inserted into the LLM context window so that the model can generate its response with that information available. This thesis explores advanced RAG techniques in which the retrieved information undergoes several evaluation steps before being used to generate the answer. Our approach is based on a general pipeline in which different agents interact with each other, where an agent is an LLM configured through task-specific prompts and instructed to produce output following a predefined schema with the required fields. The work focuses mainly on the in-the-loop evaluations that these agents perform before the final answer is returned to the user.
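As a rough illustration only (not the thesis implementation), the retrieve-then-generate loop with a schema-constrained agent output can be sketched as follows. The knowledge base, retriever, and generator are all hypothetical stubs: a real system would query a vector store and prompt an LLM, then parse its structured response into the schema.

```python
from dataclasses import dataclass

# Hypothetical in-memory knowledge base standing in for a real vector store.
KNOWLEDGE_BASE = {
    "capital of france": "Paris is the capital and largest city of France.",
}

@dataclass
class AgentOutput:
    """Predefined output schema the agent is instructed to follow."""
    answer: str
    is_grounded: bool  # whether the answer is supported by retrieved context

def retrieve(query: str) -> str:
    """Toy retriever: picks the passage whose key overlaps the query most."""
    words = set(query.lower().split())
    best = max(KNOWLEDGE_BASE, key=lambda k: len(set(k.split()) & words))
    return KNOWLEDGE_BASE[best]

def generate(query: str, context: str) -> AgentOutput:
    """Stand-in for the LLM call: the real agent would receive the context
    in its prompt and emit a response matching the schema."""
    grounded = any(w in context.lower() for w in query.lower().split())
    return AgentOutput(answer=context, is_grounded=grounded)

def rag_answer(query: str) -> AgentOutput:
    context = retrieve(query)        # 1. retrieve from the knowledge base
    return generate(query, context)  # 2. generate with context in the window
```

The in-the-loop evaluations studied in the thesis would sit between these two steps, e.g. an agent that checks whether `context` is actually relevant before generation proceeds.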
Different pipeline configurations are analyzed to compare how performance grows together with the complexity and latency of the pipeline: each configuration adds an evaluation step that provides more control over the data passed between the agents and to the final output. A significant part of the work is devoted to the evaluation of the different pipelines, carried out both with exact-match/accuracy metrics and with LLM-as-a-judge metrics that assess the correctness of the responses against a target response. The evaluation test sets are partly taken from the literature and partly generated. Finally, mainly for implementation purposes, the Model Context Protocol (MCP) is used to achieve a fully integrated system in which a specialized agent is created for each knowledge base uploaded by the user and exposed through MCP.
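The two evaluation styles mentioned above can be sketched minimally as follows. The `judge_correct` function is a hypothetical stand-in: a real LLM-as-a-judge would prompt a model to compare prediction and target, whereas here simple token overlap is used as a proxy so the example is self-contained.

```python
def exact_match(prediction: str, target: str) -> bool:
    """Normalized exact match between predicted and target answers."""
    norm = lambda s: " ".join(s.lower().strip().split())
    return norm(prediction) == norm(target)

def judge_correct(prediction: str, target: str) -> bool:
    """Stand-in for an LLM-as-a-judge call; token overlap as a crude proxy."""
    p, t = set(prediction.lower().split()), set(target.lower().split())
    return len(p & t) / max(len(t), 1) >= 0.5

def evaluate(pairs):
    """Score a test set of (prediction, target) pairs with both metrics."""
    em = sum(exact_match(p, t) for p, t in pairs) / len(pairs)
    judged = sum(judge_correct(p, t) for p, t in pairs) / len(pairs)
    return {"exact_match": em, "llm_judge": judged}
```

The gap between the two scores illustrates why both metric families are used: a verbose but correct answer fails exact match while still being judged correct.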
| File | Size | Format | |
|---|---|---|---|
| Morini.Marco.pdf (open access) | 1.02 MB | Adobe PDF | View/Open |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14251/3650