Assessing Translation Quality in NMT and LLMs: DeepL, Google Translate and ChatGPT on Olympic Games Texts (1956–2026)

The rapid evolution of neural machine translation (NMT) systems and the recent emergence of generative artificial intelligence (GenAI) are radically redefining translation quality standards and the professional role of the translator. This thesis analyses the extent to which these technologies influence the translation process, focusing on translation from Italian (L1) into English (L2). The research aims to identify the differences between two of the most widely used NMT systems today (DeepL and Google Translate) and a Large Language Model (LLM), namely ChatGPT, in terms of their translation reliability and the human post-editing effort their outputs may require. The core of the study is a comparative linguistic analysis of two articles on the opening ceremonies of the Winter Olympic Games. The analysis compares a text from the Milano-Cortina 2026 Winter Games, which is pre-edited and better suited to machine translation, with a historical text from the 1956 Winter Games in Cortina d'Ampezzo, published at a time when texts were not pre-edited for machine translation. This comparison makes it possible to evaluate how NMT systems and LLMs handle standardised language and more complex linguistic structures. The methodology adopts an evaluation approach that integrates both automatic metrics and human assessment. Specifically, the outputs were evaluated using the COMET and TER metrics to quantify both accuracy and fluency and the post-editing effort required. To classify the errors in the outputs and their severity, a human evaluation was carried out using the MQM (Multidimensional Quality Metrics) framework, based on key criteria such as terminology and semantic accuracy. The empirical results show a marked discrepancy in the engines' performance.
For the contemporary text on the 2026 Olympics, all systems showed high levels of accuracy and fluency. The synergy between pre-editing and the systems considerably reduced the post-editing effort needed to improve translation quality. Nevertheless, even in these outputs human intervention remains essential to identify and correct some linguistic inconsistencies that persist. By contrast, all systems had considerable difficulty translating the 1956 article: the analysis revealed a higher frequency of serious semantic and contextualisation errors requiring substantial post-editing. In conclusion, this study highlights the crucial role of pre-editing in improving the performance of NMT systems and LLMs. The better results observed for the contemporary text depend not only on advances in how the engines work, but also on the linguistic simplification of the source text. By reducing syntactic complexity and ambiguity, pre-editing creates conditions that facilitate machine translation, improving the fluency and accuracy of the output while reducing post-editing effort. However, this process has a drawback: the need to adapt texts for machine translation may reduce their linguistic complexity, potentially diminishing not only stylistic richness but also the depth and nuance of the content. Therefore, although these technologies are powerful tools, the professional translator remains essential as a mediator able to preserve the conceptual depth, linguistic variation and communicative intent of the text to be translated.

The rapid evolution of Neural Machine Translation (NMT) and the recent advent of Generative Artificial Intelligence (GenAI) are radically redefining quality standards and the professional role of the translator. This dissertation investigates the extent to which these technologies affect the translation process, focusing specifically on the Italian (L1) to English (L2) direction. The research aims to identify the differences between two of the most widely used NMT systems (DeepL and Google Translate) and one Large Language Model (LLM), namely ChatGPT, particularly regarding their reliability and the resulting post-editing workload for the professional translator. The core of the study is a comparative linguistic analysis of two articles about the Opening Ceremonies of the Winter Olympic Games. The analysis compares a contemporary, pre-edited and more “machine translation-friendly” text from the 2026 Milano-Cortina Winter Games with a historical text from the 1956 Winter Games held in Cortina d’Ampezzo, written at a time when texts were not pre-edited for international dissemination or for machine translation. This contrast makes it possible to evaluate how modern NMT systems and LLMs handle standardised contemporary language compared with more complex, traditional linguistic structures. The methodology employs a mixed-methods evaluation approach that integrates automatic metrics and human assessment. Specifically, the outputs were assessed using COMET (Cross-lingual Optimized Metric for Evaluation of Translation) to estimate accuracy and fluency, and TER (Translation Edit Rate) to quantify the post-editing effort required. To complement these automatic scores, a human evaluation was conducted using the MQM (Multidimensional Quality Metrics) framework, classifying the errors found in the outputs, and their severity, according to key dimensions such as terminology, semantic accuracy and fluency.
The empirical results reveal a marked difference in performance depending on the nature of the source text. For the contemporary 2026 Olympic text, all systems showed high levels of accuracy and fluency. The synergy between pre-editing and advanced neural architectures considerably reduced the post-editing effort, suggesting that these engines are reaching a high level of efficiency for standardised contemporary material. However, even in these high-quality outputs, human intervention remains essential to identify and correct linguistic inconsistencies and subtle inaccuracies that persist despite the engines' sophistication. Conversely, the 1956 article posed substantial difficulties for all systems: the analysis revealed a higher frequency of serious semantic and contextualisation errors that require substantial post-editing. Ultimately, this study highlights the crucial role of pre-editing in enhancing the performance of NMT systems and LLMs. The improved results observed for the contemporary text depend not only on advances in neural architectures but also on the prior simplification and standardisation of the source text itself. By reducing syntactic complexity and ambiguity, pre-editing creates conditions that facilitate machine translation, improving fluency and accuracy while reducing the post-editing effort. However, this advantage comes at a cost: adapting texts for machine translation may reduce their linguistic complexity, potentially diminishing not only stylistic richness but also the depth and nuance of the content to be translated. Therefore, although these technologies are powerful tools, the professional translator remains essential as a mediator capable of preserving the conceptual depth, linguistic variation and communicative intent of the source text.