Uncertainty quantification in symbolic regression for modelling ion diffusion in electrolytes

DUGONI, PAOLO
2024/2025

Abstract

For centuries, scientists have sought mathematical laws that describe and predict physical phenomena from empirical observations. Symbolic regression (SR) formalizes this heuristic process and aims to automate it by extracting compact, interpretable analytical expressions directly from data. Unlike black-box models such as deep neural networks, SR prioritizes transparency and theoretical insight, which makes it especially useful in physics and chemistry. However, existing implementations, often based on genetic algorithms and tree-based expression representations, can be computationally expensive. They are also prone to producing overly complex formulas, are affected by operator-induced bias in choosing the complexity level of the formulas, and typically lack rigorous uncertainty quantification. This thesis introduces a Bayesian extension of the SR workflow, focusing on candidate models generated by state-of-the-art SR algorithms. The proposed framework approximates the posterior probability distribution over candidate symbolic models conditioned on the dataset. By assigning each formula a posterior probability, the method enables principled model selection via maximum a posteriori (MAP) estimation and provides quantitative uncertainty estimates at both the model and the parameter level. The approach is applied to a dataset on sodium diffusivity in amorphous crystalline materials, a topic relevant to solid-state electrolytes, which are a fundamental component of solid-state batteries. By combining interpretability with statistically grounded error estimation, the proposed method offers a robust tool for uncovering physically meaningful relationships in complex materials data.
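The abstract does not state how the posterior over symbolic models is approximated. The sketch below is only a generic illustration of the idea, p(M | D) ∝ p(D | M) p(M), followed by MAP selection. It rests on several assumptions that are not taken from the thesis. First, the candidate formulas are linear in their parameters. Second, the noise is Gaussian. Third, the marginal likelihood p(D | M) is replaced by the BIC approximation. Fourth, a hypothetical complexity prior p(M) ∝ exp(-λ·complexity) is used. The candidate names, the complexity scores, `lam`, and the data (a synthetic Arrhenius-type ln D versus T curve) are all invented for illustration and do not come from the sodium-diffusivity dataset.

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def _inv_T(T):
    # Rescaled inverse temperature (1/kK); improves conditioning of X^T X.
    return 1000.0 / T


# Hypothetical candidate formulas for y = ln(D), each given as a design-matrix
# builder (linear in its parameters) plus a hypothetical complexity score.
CANDIDATES = {
    "arrhenius": (lambda T: np.column_stack([np.ones_like(T), _inv_T(T)]), 5),
    "quadratic_in_inv_T": (
        lambda T: np.column_stack([np.ones_like(T), _inv_T(T), _inv_T(T) ** 2]), 11),
    "linear_in_T": (lambda T: np.column_stack([np.ones_like(T), T]), 5),
}


def fit_model(X, y):
    """Least-squares fit under Gaussian noise: params, covariance, max log-likelihood."""
    n = X.shape[0]
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ theta
    sigma2 = resid @ resid / n  # maximum-likelihood noise variance
    log_lik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    cov = sigma2 * np.linalg.inv(X.T @ X)  # Gaussian (Laplace) parameter posterior
    return theta, cov, log_lik


def model_posterior(T, y, lam=0.5):
    """Approximate posterior model probabilities via BIC plus a complexity prior."""
    n = len(y)
    log_post, results = {}, {}
    for name, (design, complexity) in CANDIDATES.items():
        X = design(T)
        theta, cov, log_lik = fit_model(X, y)
        k = X.shape[1] + 1  # fitted parameters plus the noise variance
        log_evidence = log_lik - 0.5 * k * np.log(n)  # BIC approximation
        log_post[name] = log_evidence - lam * complexity  # hypothetical prior
        results[name] = (theta, np.sqrt(np.diag(cov)))  # estimates and std. errors
    m = max(log_post.values())
    z = sum(np.exp(v - m) for v in log_post.values())  # log-sum-exp normalization
    post = {name: float(np.exp(v - m) / z) for name, v in log_post.items()}
    return post, results


# Synthetic Arrhenius data: ln D = ln D0 - E_a / (k_B T) + noise.
rng = np.random.default_rng(0)
T = np.linspace(300.0, 900.0, 30)  # temperatures in K
E_a, ln_D0 = 0.30, -8.0            # synthetic activation energy (eV) and prefactor
y = ln_D0 - E_a / (K_B * T) + rng.normal(0.0, 0.05, T.size)

post, results = model_posterior(T, y)
map_model = max(post, key=post.get)  # MAP model selection
theta, stderr = results["arrhenius"]
E_a_hat = -theta[1] * 1000.0 * K_B  # slope is per kK because x = 1000/T
print(post, map_model)
print(f"E_a = {E_a_hat:.3f} +/- {stderr[1] * 1000.0 * K_B:.3f} eV")
```

Here the posterior probabilities quantify uncertainty at the model level. The parameter standard errors from the Gaussian approximation quantify it at the parameter level. A full treatment would handle nonlinear SR formulas and might replace BIC with a better evidence estimate, for example Laplace integration or sampling.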
2024
Keywords: Symbolic Regression; AI; Materials for Energy; Uncertainty Quantification; Solid-State Electrolytes
File: Dugoni.Paolo.pdf (open access, Adobe PDF, 2.59 MB)

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14251/5753