Architectural Trade-offs in LLM-Based Agentic Systems: Complexity, Cost, and Performance in Multi-Turn Scenarios
GRANDI, ANDREA
2024/2025
Abstract
The adoption of systems based on Large Language Models (LLMs) is transforming the artificial intelligence landscape, with growing interest in agentic architectures capable of autonomously interacting with complex environments. However, the design of effective multi-agent systems raises fundamental questions regarding optimal architectures, communication protocols, and evaluation methodologies. This thesis presents a comparative evaluation of single-agent and multi-agent architectures in multi-turn scenarios, analyzing the trade-off among architectural complexity, computational cost, and performance. The work also examines the integration of emerging protocols such as the Model Context Protocol (MCP) and Agent-to-Agent (A2A) for implementing distributed systems, comparing them with traditional monolithic approaches. The analysis uses MABench, a benchmarking framework derived from $\tau$-bench, which enables systematic evaluation of different agentic strategies (single-agent, supervisor-based, swarm, and decentralized) in simulated user-agent interaction environments. The evaluation metrics, implemented through the DeepEval framework, include task completion, tool correctness, step efficiency, communication quality, and verification thoroughness. Experimental results show that more complex architectures do not necessarily guarantee superior performance, and that the choice of optimal architecture depends strongly on the application domain and operational constraints. Specifically, distributed systems based on MCP and A2A show performance comparable to their local counterparts, opening prospects for scalable enterprise deployments. This work contributes reproducible evaluation techniques and practical guidelines for selecting agentic architectures and evaluation criteria in enterprise contexts.

| File | Size | Format |
|---|---|---|
| Grandi.Andrea.pdf (restricted access) | 5.13 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14251/5404