Hybrid Claim Verification with Large Language Models: A Benchmark on Corporate Reports
BRUNELLI, SIMONE
2024/2025
Abstract
Corporate non-financial reports are a key resource for evaluating companies’ sustainability performance and their adherence to Environmental, Social, and Governance (ESG) principles. These reports are widely consulted by investors, regulators, and stakeholders, yet their automated analysis remains highly challenging due to heterogeneous structures, specialized terminology, and the coexistence of text with complex tables. To address these issues, the thesis introduces two benchmark datasets designed for hybrid text-and-table reasoning. The first focuses on a monotable setting, where claims are verified against a single table and its accompanying text. The second extends this framework to a multitable scenario, involving up to five interdependent tables whose values and relationships must be jointly considered. To validate the relevance of the proposed benchmarks, an evaluation was conducted using state-of-the-art Large Language Models (LLMs), including GPT-o4 mini, Qwen, and LLaMA. This evaluation highlighted that performance is modest even in the monotable setting and drops substantially when reasoning across multiple linked tables. These results emphasize both the complexity of claim verification in non-financial reports and the importance of the proposed datasets as a foundation for advancing research in hybrid reasoning.
| File | Size | Format |
|---|---|---|
| Brunelli.Simone.pdf (under embargo until 02/12/2026) | 14.5 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14251/3934