Evolutionary Intelligence Seminars 2022

Overview of the seminars in 2022

Date: 14 June 2022

Sanne Abeln
Title: Can we predict the biological impact of a genomic variant in a tumor sample?

Abstract: Cancer cells typically display some type of genomic instability, causing many changes to the genome in their offspring. These changes can for example be mutations, deletion, gains, or translocations. Typically, these events occur randomly. Nevertheless, some of these alterations may have a profound impact on the cell biology. It is difficult to know, which alterations can serve as a biomarker for disease progression and treatment response. Currently, the only method at hand is enrichment of the alteration over many patients. Here, we show how we can also show biological impact of such events by considering gene expression profiles. Effectively showing a preselection method for clinically relevant genomic biomarkers.

Date: 17 May 2022

Monika Grewal
Title: Uncertainty Estimation to Learn Effectively from Clinically Available Data for Organs at Risk Segmentation

Abstract: Epistemic uncertainty refers to the lack of knowledge in a model about the underlying data. Epistemic uncertainty is higher for out-of-domain samples or in other words the samples for which making a prediction is difficult (hard examples). In this presentation, I will discuss how this concept can be used to effectively learn a deep learning model for automatic segmentation of organs at risk from a large dataset of Computed Tomography (CT) scans containing class imbalance and missing annotations. I will share the basic idea, preliminary results, and challenges.

Dazhuang Liu
Title: Probability distribution based constants optimization in symbolic regression

Abstract: Symbolic regression (SR) is a method to discover machine learning models from data that are mathematical equations. In fact, these models are made by composing operators (+, -, * and /), variables (features of the dataset), and constants (0.1, 10, e, Pi). Since SR models are composed of commonly used mathematical operations, they are well suited for human interpretation (important for explainable AI). However, SR is hard because both the structure of the model (i.e., how to compose the operations) and the constant values must be optimized at the same time. Several works concerning this topic have been proposed in the literature, such as combining evolutionary algorithms with gradient descent. However, existing approaches have several limitations. To solve this problem in an efficient and effective way, we are working on a new evolutionary algorithm (EA) to jointly optimize the structure and the constants at the same time. Namely, our approach works by clustering together multiple “structures” that may benefit from similar constants from the population of the EA. Then, for each cluster, a different probability distribution (PD) is modelled and used to sample new promising constant values. Early experiment suggest that the approach is very promising.

Date: 19 April 2022

Leah Dickhoff
Title: Automated optimization for cervix brachytherapy requires more then the EMBRACE-II planning aims

Abstract: A bi-objective model was introduced for MO-RV-GOMEA-based optimization for prostate HDR brachytherapy, intuitively capturing the dose trade-off between tumour target coverage and healthy organ sparing. This approach entails direct optimization on dose volume indices specified in a clinical protocol. We studied its immediate extension to cervical cancer brachytherapy by optimizing on planning aims from the officially recommended EMBRACE-II protocol. Results were judged as clinically unsatisfactory by medical specialists because of undesirable properties in the dose distributions, indicating that additional aims are needed. A first approach optimizing on both the EMBRACE-II aims as well as additionally tuned aims has been developed and indicates promising results.

Evi Sijben
Title: AI for decision support of paraganglioma treatment planning

Abstract: Paraganglioma are tumors that appear in the head and neck area. Although they are usually benign slow-growing tumors, they can cause severe complaints. These complaints are dependent on the growth of the tumor. being able to predict the future volume of the tumor would be helpful to give inevitable treatment early on but to avoid unnecessary and stressful surveillance. In this talk, methods for paraganglioma tumor measurement and knowledge about tumor growth are presented as well as possible ways to further support the decision making process with help of AI.

Date: 22 March 2022

Aleksandr Chebykin
Title: Searching for architectures and hyperparameters of neural networks

Abstract: What does it actually mean to search for architectures and hyperparameters of neural networks? In this talk I will give a general step-by-step explanation of how one solves problems via neural networks, in order to provide the necessary context to our research; having explained the existing process, I'll describe how automatic architecture & hyperparameter can potentially improve it, and what our current research direction is.

Date: 22 February 2022

Marco Virgolin
Title: Counterfactual explanations: A bird-eye view

Abstract: Counterfactual explanations (CEs) are a powerful means for understanding how decisions made by machine learning models can be changed. In particular, a CE prescribes how a user (a point in feature space x with coordinates age_x=32, gender_x=Male, salary_x=...) needs to intervene (x->x') so that the classification made by a model (f(x)="no loan") changes, typically in favor of the user (f(x')="yes loan"). Obtaining "useful" CEs is a search problem that depends on many aspects. To begin with, we must define a distance d such that x'=argmin d(x,x') will reasonably translate to minimal intervention effort in real life. Next, multiple constraints and desiderata must be considered for practical adoption, such as plausibility of intervention (age_x' - age_x > 0), causal relationships (increment in salary->increment in savings), and robustness to perturbations of different nature (e.g., to measurements in x or updates to f). Finally, availability of access to and information on the model is important: f may or may not be re-trained (e.g., to smoothen its decision boundary), expose gradients or provide only labels, be queried an unlimited or limited number of times, and so on. This multitude of important aspects contributes to establishing important links to many research areas, such as optimization (multi-objective, gradient-based or gradient-free, expensive), machine learning, and also areas concerning ethics and fairness.

Speaker: Esteban Gabory
Title: On Strings Having the Same Length-k Substrings

Abstract: Let Substr_k(X) denote the set of length-k substrings of a given string X for a given integer k>0. We study the following basic string problem, called Shortest equivalent string: Given a set S_k of n length-k strings and an integer z>0, find a shortest string T such that Substr_k(T)=S_k. The Shortest equivalent string problem arises naturally as an encoding problem in many real-world applications; e.g., in data privacy, in data compression, and in bioinformatics. We answer this problem by showing that it is equivalent to finding a shortest walk that crosses every edge of a graph, and we give an algorithm for the latter problem.

Date: 8 February 2022

Wiktor Zuba
Title: Stringology - different models and approaches

Abstract: In the talk I am going to present an overview of the results from some of the papers I co-authored with the stringology group of the University of Warsaw. The talk contains an overview of the problem of finding a cover (a smaller text whose occurrences cover all the positions of the input text) in many distinct stringological models and the differences encountered while studying them. A different result shows a use of stringology for solving a graph problem - a compression of a representation of a Hamiltonian cycle of a sigma-tau Cayley graph of permutations. The use of the compressed representation results in finding efficient ranking and unranking algorithms for the cycle. Yet another result shows a conditional hardness of a problem of finding an Abelian square (a word whose two parts contain the same multiset of letters) - that is show, that the problem (almost surely) cannot be solved in strongly subquadratic time (while most stringology papers aim in obtaining linear time solutions to problems).

Date: 25 January 2022

Georgios Andreadis
Title: Multi-Objective Dual Simplex-Mesh Based Deformable Image Registration for 3D Medical Images

Abstract: Transferring information between images with large anatomical differences is an open challenge in medical image analysis. Existing methods require extensive up-front parameter tuning and have difficulty capturing large deformations and content mismatches. Using a multi-objective optimization approach with a dual-dynamic grid transformation model has previously proven effective at overcoming these issues while also producing a diverse set of high-quality registrations for 2D images. We successfully introduce the first method for multi-objective deformable image registration for 3D images, based on a new 3D dual-dynamic grid transformation model and using the Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (RV-GOMEA) parallelized on a GPU. Our proof-of-concept prototype shows promising results on a synthetic and clinical registration problem. This talk is based on a conference paper to be published at the SPIE’22 Medical Imaging - Image Processing conference.

Solon P Pissis
Title: Differentially Private String Sanitization for Frequency-Based Mining Tasks

Abstract: Strings are used to model genomic, natural language, and web activity data, and are thus often shared broadly. However, string data sharing has raised privacy concerns stemming from the fact that knowledge of length-k substrings of a string and their frequencies (multiplicities) may be sufficient to uniquely reconstruct the string; and from that the inference of such substrings may leak confidential information. We thus introduce the problem of protecting length-k substrings of a single string S by applying Differential Privacy (DP) while maximizing data utility for frequency-based mining tasks. Our theoretical and empirical evidence suggests that classic DP mechanisms are not suitable to address the problem. In response, we employ the order-k de Bruijn graph G of S and propose a sampling-based mechanism for enforcing DP on G. We consider the task of enforcing DP on G using our mechanism while preserving the normalized edge multiplicities in G. We define an optimization problem on integer edge weights that is central to this task and develop an algorithm based on dynamic programming to solve it exactly. We also consider two variants of this problem with real edge weights. By relaxing the constraint of integer edge weights, we are able to develop linear-time exact algorithms for these variants, which we use as stepping stones towards effective heuristics. An extensive experimental evaluation using real-world large-scale strings (in the order of billions of letters) shows that our heuristics are efficient and produce near-optimal solutions which preserve data utility for frequency-based mining tasks.
This talk is based on a conference paper presented at ICDM 2021 (joint work with Huiping Chen, Changyu Dong, Liyue Fan, Grigorios Loukides, and Leen Stougie).

Date: 11 January 2022

Arkadiy Dushatskiy
Title: Data variation-aware medical image segmentation

Abstract: Deep learning algorithms have become the golden standard for segmentation of medical imaging data. In most works, the variability and heterogeneity of real clinical data is acknowledged to still be a problem. One way to automatically overcome this is to capture and exploit this variation explicitly. Here, we propose an approach that improves on our previous work in this area and explain how it potentially can improve clinical acceptance of (semi-)automatic segmentation methods. In experiments with a real clinical dataset of CT scans with prostate segmentations, our approach provides an improvement of several percentage points in terms of Dice and surface Dice coefficients compared to a standard Deep Learning approach. Noticeably, the largest improvement occurs in the upper part of the prostate that is known to be most prone to inter-observer segmentation variation.

Michelle Sweering
Title:Making de Bruijn Graphs Eulerian via Near-Optimal Algorithms for Connecting and Balancing