Symbolic Regression Based Surrogate Modelling of a High-Fidelity Multiphysics CO2 Corrosion Model
Wednesday, April 9, 2025 9:30 AM to 10:00 AM · 30 min. (US/Central)
Presentation
Digital Transformation · Emerging Topics
Information
Paper ID: C2025-00433
ABSTRACT: A high-fidelity mechanistic CO2 corrosion model developed at the University of Leeds has been implemented in the multiphysics software package COMSOL. The model integrates the bulk solution equilibria, interfacial mass transport and solution reactions, and surface electrochemical processes.
In this work, the high-fidelity model was randomly sampled across its five input dimensions using a Beta distribution, Β(α, β), with the parameters α and β both set to 1/2 so that the extremes of each input range are oversampled. The model was sampled approximately 200,000 times and the data was refined using a Deep Neural Network (DNN).
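The sampling step can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `sample_inputs`, the fixed seed, and the unit input ranges are assumptions made for illustration, since the abstract does not name the five inputs or their bounds.

```python
import numpy as np

def sample_inputs(n_samples, bounds, alpha=0.5, beta=0.5, seed=0):
    """Draw Beta(alpha, beta) samples and rescale them to each input's range.

    bounds is a sequence of (low, high) pairs, one per input dimension.
    With alpha = beta = 1/2 the density is U-shaped, so values near the
    lower and upper limits are drawn more often than mid-range values.
    """
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    # Independent draws on [0, 1] for every sample and every dimension
    unit = rng.beta(alpha, beta, size=(n_samples, bounds.shape[0]))
    low, high = bounds[:, 0], bounds[:, 1]
    return low + unit * (high - low)

# Hypothetical unit ranges for the five inputs (the real ranges are not given)
bounds = [(0.0, 1.0)] * 5
samples = sample_inputs(200_000, bounds)

# Share of values falling in the outer 10% at either end of the range;
# a uniform sampler would give about 0.2 here
edge_fraction = np.mean((samples < 0.1) | (samples > 0.9))
print(samples.shape, round(float(edge_fraction), 3))
```

Each row of `samples` would then be one set of inputs passed to the high-fidelity model. Sampling the dimensions independently is also an assumption of this sketch.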
After refinement, the data was further analysed using a symbolic regression genetic algorithm. Each generation from the symbolic regression was plotted on a Pareto optimization plot displaying the relationship between model complexity and the fit quality, (1 − R²). From the Pareto front, which identifies the maximum quality of fit for the minimum model complexity, the ‘best’ mathematical expression representing the data was chosen. Additionally, the mathematical expressions generated were reviewed subjectively for ‘physical reasonableness’ and for ‘ease of execution’ in Excel. The outputs from the three models were compared in a 3‑dimensional correlation diagram, with good agreement between all models, R² > 0.99.
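The Pareto-front selection can be sketched in Python as follows. This is an illustrative sketch only: the candidate list, the expression labels, and the tolerance-based rule in `simplest_within` are hypothetical, and the abstract describes the final choice as also involving subjective review.

```python
def pareto_front(candidates):
    """Return the non-dominated candidates, ordered by increasing complexity.

    Each candidate is a tuple (complexity, error, label), with error = 1 - R^2.
    A candidate is kept only if its error is lower than that of every
    candidate of equal or lower complexity.
    """
    front = []
    best_error = float("inf")
    for cand in sorted(candidates, key=lambda c: (c[0], c[1])):
        if cand[1] < best_error:
            front.append(cand)
            best_error = cand[1]
    return front

def simplest_within(front, tol):
    """Pick the least complex front member whose error is at most tol.

    One possible selection rule (an assumption); returns None if no
    member meets the tolerance.
    """
    for cand in front:
        if cand[1] <= tol:
            return cand
    return None

# Made-up (complexity, 1 - R^2, label) values, for illustration only
candidates = [
    (1, 0.40, "expr_a"),
    (3, 0.10, "expr_b"),
    (3, 0.25, "expr_c"),   # dominated by expr_b (same complexity, worse fit)
    (5, 0.12, "expr_d"),   # dominated by expr_b (more complex, worse fit)
    (7, 0.008, "expr_e"),
    (9, 0.008, "expr_f"),  # dominated by expr_e (more complex, no better fit)
]

front = pareto_front(candidates)
choice = simplest_within(front, tol=0.01)  # tol = 0.01 corresponds to R^2 >= 0.99
print(front)
print(choice)
```

A fixed tolerance is only one way to read a Pareto front. Looking for the "knee", where extra complexity stops buying a meaningful improvement in fit, is another common choice.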
Author(s)
Richard Woollam, Michael Jones, Harvey Thompson, Ethan Proudlove, Richard Barker
Educational Track
Strategic & Emerging Technologies