Title: Symbolic Regression Authors: Gabriel Kronberger, Bogdan Burlacu, Michael Kommenda, Stephan M. Winkler Publisher: CRC Press Year: 2025 Pages: 308 Language: English Format: pdf (true) Size: 12.9 MB
Symbolic Regression (SR) is one of the most powerful Machine Learning (ML) techniques that produce transparent models: it searches the space of mathematical expressions for a model that captures the relationship between the predictors and the dependent variable, without requiring assumptions about the model structure. Currently, the most prevalent learning algorithms for SR are based on Genetic Programming (GP), an evolutionary algorithm inspired by the well-known principles of natural selection. This book is an in-depth guide to GP for SR, discussing its advanced techniques as well as examples of applications in science and engineering.
The basic idea of GP is to evolve a population of solution candidates in an iterative, generational manner, by repeated application of selection, crossover, mutation, and replacement, thus allowing the model structure, coefficients, and input variables to be searched simultaneously. Given that explainability and interpretability are key to keeping humans in the loop of learning in Artificial Intelligence (AI), enabling data scientists to understand internal algorithmic processes and the models they produce benefits the learning process as a whole.
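The generational loop described above can be sketched in a few dozen lines. This is a deliberately minimal illustration, not the book's implementation: expressions are nested tuples over one variable `x`, fitness is mean squared error, selection is elitist truncation, and the operator set and all parameter values are arbitrary choices for the sketch.

```python
import math
import random

# Expressions are nested tuples ('+', a, b), the variable 'x', or a constant.
OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b}

def random_tree(depth=3):
    # Terminal (variable or constant) at depth 0, otherwise a random operator node.
    if depth == 0 or random.random() < 0.3:
        return 'x' if random.random() < 0.5 else round(random.uniform(-2, 2), 2)
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == 'x':
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree, data):
    # Mean squared error over the labeled samples (lower is better).
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data) / len(data)

def mutate(tree, depth=2):
    # Replace a randomly chosen subtree with a freshly generated one.
    if not isinstance(tree, tuple) or random.random() < 0.3:
        return random_tree(depth)
    op, left, right = tree
    if random.random() < 0.5:
        return (op, mutate(left, depth), right)
    return (op, left, mutate(right, depth))

def crossover(a, b):
    # Graft material from parent b into a random position of parent a.
    if not isinstance(a, tuple) or random.random() < 0.3:
        return b if not isinstance(b, tuple) else random.choice([b[1], b[2]])
    op, left, right = a
    if random.random() < 0.5:
        return (op, crossover(left, b), right)
    return (op, left, crossover(right, b))

def evolve(data, pop_size=50, generations=30):
    pop = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half; replacement: refill with offspring.
        parents = sorted(pop, key=lambda t: fitness(t, data))[:pop_size // 2]
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=lambda t: fitness(t, data))

# Target relationship: y = x^2 + x, sampled on a small grid.
data = [(x / 10, (x / 10) ** 2 + x / 10) for x in range(-10, 11)]
random.seed(1)
best = evolve(data)
```

Note how structure, coefficients, and variable usage are all searched at once: a single mutation can swap an operator, change a constant, or remove the variable entirely, which is exactly the property the paragraph above highlights.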
More than three decades ago, John Koza coined the term "Symbolic Regression" in his book series with which he popularized genetic programming, a nature-inspired method to evolve programs to solve algorithmic tasks. Since then, symbolic regression has become almost the flagship application for genetic programming. This goes so far that some researchers even started to use genetic programming (GP) and symbolic regression (SR) interchangeably. However, we think SR – which describes the idea of using a symbolic representation for nonlinear regression in combination with symbolic operations for fitting the model – is an important concept that is worth treating independently, not least because of its capability to produce interpretable models. From our point of view, GP is one of many possible approaches to solve SR tasks, albeit the most popular one. Consequently, we also mainly treat GP in this book.
The book consists of two larger parts: the first part is focused on SR methods and the second part on applications. In the first part we start with a brief introduction to machine learning for data-based modelling, and then give a detailed description of the mechanics of GP. Practitioners should certainly read the first chapter but may want to skip the chapter on evolutionary computation on a first reading. Several sections in that chapter discuss concepts in depth and can be interesting for readers who want to gain a deeper understanding of evolutionary computation methods.
We dedicate a large chapter to GP and the fundamentals of evolutionary computation because GP has the longest history and is so far the best tested and studied method for SR. Even though several non-evolutionary solution methods have been described more recently, GP still works very well for many datasets. Evolutionary methods are inherently parallel and can easily be run on multiple cores or distributed across the nodes of a larger cluster. For many practically relevant SR tasks, however, a common office computer is sufficient to run GP.
The second part of the book consists of a large chapter that is devoted to different application examples. Each section in this chapter describes a different application and can be read independently. We invite our readers to read the sections in any order and skip those examples that are less relevant to their own work.
Machine Learning (ML) is the branch of Computer Science that studies the development of methods and algorithms that can learn how to describe a system or how to perform a particular task from data. Depending on how the data is organized and structured, ML can be divided into two main subfields, called supervised and unsupervised learning. If the sample data is labeled, meaning that each data point is already mapped to a corresponding measurement known to be correct, then we are dealing with a supervised learning problem, where the goal is to develop a predictive model that can predict a system's future behaviour from past measurements or observations. This learning process is also called fitting or training, which refers to adapting model parameters so that the model matches the observations. Classification, regression, and time series modelling are the principal examples of supervised learning tasks.
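The notion of fitting can be made concrete with the simplest possible supervised model, a line y = a*x + b: the parameters a and b are adapted so the model matches the labeled observations. This tiny example uses the ordinary least squares closed form and is only an illustration of the general idea, not a method from the book; the data values are made up.

```python
# Labeled observations, generated here by the (unknown to the learner) rule y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Fitting: choose parameters (a, b) of the model y = a*x + b that best
# match the observations, here via the ordinary least squares closed form.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x
# Recovers a = 2.0, b = 1.0, matching the rule that generated the labels.
```

SR generalizes this step: instead of adapting parameters inside one fixed structure, it also searches over the structure itself.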
This book represents a practical guide for industry professionals and students across a range of disciplines, particularly Data Science, engineering, and applied mathematics. Focused on state-of-the-art SR methods and providing ready-to-use recipes, this book is especially appealing to those working with empirical or semi-analytical models in science and engineering.