CWI designs algorithms for the improvement of Genetic Programming

Marco Virgolin of CWI’s Life Sciences & Health group has researched ways to improve the efficiency and effectiveness of Genetic Programming (GP). He defends his thesis ‘Design and Application of Gene-Pool Optimal Mixing Evolutionary Algorithms for Genetic Programming’ on Monday 6 June.

Publication date: 05-06-2020

As applications of Machine Learning spread throughout modern society, so do cases in which it fails in unacceptable ways. This increasingly motivates scientists and policy makers to demand more responsible use of Machine Learning, including the adoption of Machine Learning models that are explainable or interpretable.

Genetic Programming (GP) is a meta-heuristic inspired by natural evolution for the synthesis of programs. Programs can be considered Machine Learning models in the form of sequences of human-readable and interpretable instructions. Virgolin focused mostly on situations where programs need to be compact enough for interpretation to be possible, and brought to GP the mechanisms of modern model-based evolutionary algorithms that belong to the family of Gene-pool Optimal Mixing Evolutionary Algorithms (GOMEAs).

This resulted in GP-GOMEA, a stochastic population-based search algorithm that operates by (1) identifying building blocks in the form of hierarchical clusters of instruction patterns, and (2) recombining these building blocks efficiently and effectively. Experiments showed that this speeds up the search process compared to classical forms of GP. GP-GOMEA was also shown to build interpretable machine learning models by leveraging automated feature construction, again outperforming classical GP on this task. In addition, the benefits and limitations of other recent forms of GP were shown.
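The recombination step described above can be illustrated with a minimal sketch. It is not the thesis implementation: it works on fixed-length bit strings rather than programs, it takes the hierarchical groups of positions (the building blocks) as a hand-specified input instead of learning them, and the names `gom_step`, `run_gom` and `onemax` are hypothetical. What it shows is the core idea of gene-pool optimal mixing: for each group of linked positions, copy the values from a random donor and keep the change only if the solution does not get worse.

```python
import random


def onemax(genome):
    """Toy objective (an assumption for illustration): count the ones."""
    return sum(genome)


def gom_step(parent, population, linkage_sets, fitness, rng):
    """Apply gene-pool optimal mixing to one individual.

    linkage_sets is a list of index groups; here it is given by hand,
    whereas a real GOMEA would learn it from the population.
    """
    child = list(parent)  # work on a copy so the parent stays intact
    best = fitness(child)
    order = list(linkage_sets)
    rng.shuffle(order)  # visit the groups in random order
    for subset in order:
        donor = rng.choice(population)
        backup = [child[i] for i in subset]
        if backup == [donor[i] for i in subset]:
            continue  # donor brings nothing new, skip the evaluation
        for i in subset:
            child[i] = donor[i]
        candidate = fitness(child)
        if candidate >= best:
            best = candidate  # keep the change: not worse than before
        else:
            for i, value in zip(subset, backup):
                child[i] = value  # revert the change
    return child


def run_gom(population, linkage_sets, fitness, generations, seed=0):
    """Run several generations; donors come from the previous generation."""
    rng = random.Random(seed)
    for _ in range(generations):
        population = [
            gom_step(ind, population, linkage_sets, fitness, rng)
            for ind in population
        ]
    return population
```

Because a change is only kept when it does not reduce fitness, no individual ever becomes worse from one generation to the next. In this sketch the groups could, for example, be all single positions, then pairs, then groups of four, which mimics a hierarchical clustering of positions.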

Marco Virgolin’s PhD research was funded through a project granted by Stichting Kinderen Kankervrij (KiKa), so the research on GP-GOMEA was motivated by a clinical application of this form of machine learning. The project was carried out in collaboration with the Amsterdam University Medical Centers (Amsterdam UMC, location AMC). To achieve its main goal, Marco studied how machine learning, and in particular GP-GOMEA, could be used to reconstruct 3D radiation dose distributions for childhood cancer survivors for whom 3D anatomical imaging from the time of their childhood treatment is not available.

Together with PhD student Ziyuan Wang from Amsterdam UMC, Marco Virgolin studied two methods that differ markedly from the traditional approach in the field, which uses a 3D surrogate anatomy that is heuristically matched to features of the patient. First, he showed that an original, patient-specific anatomy can be created on the fly by assembling 3D imaging of different patients according to machine learning predictions. Second, a machine learning training method was developed and studied to produce models that predict the radiation dose directly, without the need for any 3D surrogate anatomy. For this clinical application, Marco showed experimentally that GP-GOMEA can construct machine learning models that perform competitively with models trained by other machine learning algorithms, while being much more likely to be interpretable by (medical) experts.

Marco Virgolin will defend his thesis online on Monday 6 June at 12.30 hrs. You are welcome to attend the defense. Details can be found here.