Enhancing Retrieval-Augmented Generation Systems with Advanced Retrieval Control and Agentic Behavior
MORINI, MARCO
2024/2025
Abstract
In this thesis, we explore advanced Retrieval-Augmented Generation (RAG) techniques to enhance the performance of Large Language Models (LLMs) in question answering. LLMs represent one of the most significant breakthroughs in Artificial Intelligence in recent years: their ability to process and generate human-like text enables natural, back-and-forth interactions with users and makes coherent, context-aware conversations possible. Despite these capabilities, LLM answers may be subject to hallucinations, a well-known phenomenon in which the model fails to acknowledge its lack of knowledge and instead produces false information. To reduce hallucinations and obtain more domain-specific responses, RAG can be applied: a separate model, called the retriever, searches external knowledge bases for information relevant to the user query, and the retrieved passages are inserted into the LLM context window so that the model can generate its response with that information available. This thesis explores advanced RAG techniques in which the retrieved information undergoes several evaluation steps before being used to generate the answer. Our approach is based on a general pipeline in which different agents interact with each other, where an agent is an LLM configured through task-specific prompts and instructed to produce output following a predefined schema with the required fields. The work focuses mainly on the in-the-loop evaluations that these agents perform before the final answer is returned to the user.
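As a rough illustration only (not the thesis implementation), the retrieve-then-generate loop with a schema-constrained agent output can be sketched as follows. The knowledge base, retriever, and generator are all hypothetical stubs: a real system would query a vector store and prompt an LLM, then parse its structured response into the schema.

```python
from dataclasses import dataclass

# Hypothetical in-memory knowledge base standing in for a real vector store.
KNOWLEDGE_BASE = {
    "capital of france": "Paris is the capital and largest city of France.",
}

@dataclass
class AgentOutput:
    """Predefined output schema the agent is instructed to follow."""
    answer: str
    is_grounded: bool  # whether the answer is supported by retrieved context

def retrieve(query: str) -> str:
    """Toy retriever: picks the passage whose key overlaps the query most."""
    words = set(query.lower().split())
    best = max(KNOWLEDGE_BASE, key=lambda k: len(set(k.split()) & words))
    return KNOWLEDGE_BASE[best]

def generate(query: str, context: str) -> AgentOutput:
    """Stand-in for the LLM call: the real agent would receive the context
    in its prompt and emit a response matching the schema."""
    grounded = any(w in context.lower() for w in query.lower().split())
    return AgentOutput(answer=context, is_grounded=grounded)

def rag_answer(query: str) -> AgentOutput:
    context = retrieve(query)        # 1. retrieve from the knowledge base
    return generate(query, context)  # 2. generate with context in the window
```

The in-the-loop evaluations studied in the thesis would sit between these two steps, e.g. an agent that checks whether `context` is actually relevant before generation proceeds.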
Different pipeline configurations are analyzed to compare how performance grows together with the complexity and latency of the pipeline: each configuration adds an evaluation step that provides more control over the data passed between the agents and to the final output. A significant part of the work is devoted to the evaluation of the different pipelines, carried out both with exact-match/accuracy metrics and with LLM-as-a-judge metrics that assess the correctness of the responses against a target response. The evaluation test sets are partly taken from the literature and partly generated. Finally, mainly for implementation purposes, the Model Context Protocol (MCP) is used to achieve a fully integrated system in which a specialized agent is created for each knowledge base uploaded by the user and exposed through MCP.
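The two evaluation styles mentioned above can be sketched minimally as follows. The `judge_correct` function is a hypothetical stand-in: a real LLM-as-a-judge would prompt a model to compare prediction and target, whereas here simple token overlap is used as a proxy so the example is self-contained.

```python
def exact_match(prediction: str, target: str) -> bool:
    """Normalized exact match between predicted and target answers."""
    norm = lambda s: " ".join(s.lower().strip().split())
    return norm(prediction) == norm(target)

def judge_correct(prediction: str, target: str) -> bool:
    """Stand-in for an LLM-as-a-judge call; token overlap as a crude proxy."""
    p, t = set(prediction.lower().split()), set(target.lower().split())
    return len(p & t) / max(len(t), 1) >= 0.5

def evaluate(pairs):
    """Score a test set of (prediction, target) pairs with both metrics."""
    em = sum(exact_match(p, t) for p, t in pairs) / len(pairs)
    judged = sum(judge_correct(p, t) for p, t in pairs) / len(pairs)
    return {"exact_match": em, "llm_judge": judged}
```

The gap between the two scores illustrates why both metric families are used: a verbose but correct answer fails exact match while still being judged correct.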
| File | Size | Format | |
|---|---|---|---|
| Morini.Marco.pdf (open access) | 1.02 MB | Adobe PDF | View/Open |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14251/3650