Problem setting
Numerical simulations allow scientists and engineers to investigate in silico phenomena that would otherwise be too difficult or expensive to observe and study directly. However, high-fidelity simulations often require substantial computing resources and long run times. This computational burden becomes especially problematic for repeated-analysis tasks, such as optimization or uncertainty quantification (UQ), which call for many simulator evaluations. Practitioners therefore often resort to surrogate models (also called metamodels or emulators) that approximate the input-output map of a costly simulator [1]. One common approach is to construct surrogate models from available datasets using machine learning (ML). More recently, scientific ML (SciML) surrogates that combine ML with physics-based modeling principles have been gaining traction [2]. Nonetheless, most data-driven surrogate models, whether based on pure ML or SciML, produce point estimates that carry no information about the uncertainty in their predictions. For surrogate-based estimates that may affect safety, cost, or scientific conclusions, information about predictive uncertainty is often as important as the prediction itself. This challenge is particularly relevant for data-scarce applications in physics and engineering.
Goal and tasks
This thesis will investigate conformal prediction (CP) methods [3] for providing reliable predictive UQ metrics to accompany the predictions of SciML surrogate models (see Figure 1). Methodology development will combine domain-aware algorithmic design with careful empirical evaluation to produce predictive UQ tools that are both statistically principled and usable in data-scarce applications in physics and engineering.
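To make the idea concrete, the following is a minimal sketch of split conformal prediction, one common CP variant, wrapped around a stand-in surrogate. It is an illustration under stated assumptions, not the methodology this thesis will develop: the toy simulator, the cubic polynomial surrogate, the function name split_conformal_interval, and the miscoverage level alpha = 0.1 are all hypothetical choices made for this example. The resulting intervals are meant to cover the true output with probability at least 1 - alpha on average, provided the calibration and test points are exchangeable.

```python
import numpy as np


def split_conformal_interval(predict, x_cal, y_cal, x_new, alpha=0.1):
    """Symmetric prediction intervals from split conformal prediction.

    `predict` is any already-fitted point predictor (hypothetical surrogate).
    The calibration data must not have been used to fit it.
    """
    # Nonconformity scores: absolute residuals on the held-out calibration set.
    scores = np.abs(y_cal - predict(x_cal))
    n = scores.size
    # Finite-sample-corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # With too few calibration points the interval is unbounded.
    q = np.inf if k > n else np.sort(scores)[k - 1]
    y_hat = predict(x_new)
    return y_hat - q, y_hat + q


# Toy stand-in for a costly simulator (assumption for illustration only).
rng = np.random.default_rng(0)


def simulator(x):
    return np.sin(x) + 0.1 * rng.standard_normal(x.shape)


# Separate training, calibration, and test sets.
x_train = rng.uniform(-3.0, 3.0, 100)
x_cal = rng.uniform(-3.0, 3.0, 500)
x_test = rng.uniform(-3.0, 3.0, 2000)
y_train, y_cal, y_test = simulator(x_train), simulator(x_cal), simulator(x_test)

# Hypothetical surrogate: a cubic polynomial fitted on the training set only.
coef = np.polyfit(x_train, y_train, deg=3)


def surrogate(x):
    return np.polyval(coef, x)


lower, upper = split_conformal_interval(surrogate, x_cal, y_cal, x_test, alpha=0.1)

# Empirical fraction of test outputs that fall inside their interval.
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage: {coverage:.3f}")
```

The interval width here is the same for every input, which is a known limitation of this basic variant. The sketch also makes the cost in data-scarce settings visible: every point held out for calibration is a point not used to fit the surrogate.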