Life Sciences and Health Seminar


2021/11/02    Aleksandr    Solon
2021/11/16    Leen         Joe
2021/11/30    Arthur       Anton
2021/12/14    Arkadiy



Date: 19 October 2021

Speaker: Riccardo Guidotti
Title: Explaining Explanation Methods

Abstract: The most effective Artificial Intelligence (AI) systems exploit complex machine learning models to fulfil their tasks due to their high performance. Unfortunately, the most effective machine learning models base their decision processes on a logic that humans cannot understand, which makes them real black-box models. The lack of transparency on how AI systems make decisions is a clear limitation in their adoption in safety-critical and socially sensitive contexts. Consequently, since the applications in which AI is employed are various, research in eXplainable AI (XAI) has recently caught much attention, with specific distinct requirements for different types of explanations for different users. In this webinar, we briefly present the existing explanation problems and the main strategies adopted to solve them, and we illustrate the most common types of explanations with references to state-of-the-art explanation methods able to retrieve them.
Short bio: Riccardo Guidotti is Assistant Professor at the University of Pisa. He was born in 1988 in Pitigliano (GR), Italy. He graduated cum laude in Computer Science at the University of Pisa (BS in 2010, MS in 2013). He received his PhD in Computer Science with a thesis on Personal Data Analytics at the same institution. He is currently an Assistant Professor at the Department of Computer Science, University of Pisa, Italy, and a member of the Knowledge Discovery and Data Mining Laboratory (KDDLab), a joint research group with the Information Science and Technology Institute of the National Research Council in Pisa. He won the IBM fellowship program and was an intern at IBM Research Dublin, Ireland, in 2015. He also won the DSAA New Generation Data Scientist Award 2018. His research interests are in explainable artificial intelligence, interpretable machine learning, quantum computing, fairness and bias detection, data generation and causal models, personal data mining, clustering, and analysis of transactional data.

Date: 5 October 2021

Speaker: Sanne Abeln
Title: Bioinformatics: from amyloid formation to oncology

Abstract: In our research group at the Computer Science Department of the VU we look at topics ranging from molecular dynamics simulations to large scale omics analysis with machine learning, most of it applied to the health domains of neurodegenerative disease and oncology. In this talk I will give an overview of current results on multi-task learning for protein structural features, simulations that show the role of hydrophobicity in amyloid formation, a machine learning based breakpoint analysis for colorectal cancer and a simple model for prediction of proteins in extracellular vesicles.

Speaker: Leah Dickhoff
Title: Adaptive optimization for bi-objective treatment planning in cervical cancer brachytherapy

Abstract: The previously developed bi-objective optimization method using the Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (RV-GOMEA) for prostate high-dose-rate brachytherapy (BT) has been extended to cervical cancer BT. This model directly optimizes on dose volume indices (DVIs), which describe specific maximum or minimum dose to subvolume relationships for each of the targets and organs at risk. Discussions with medical specialists have revealed that optimizing solely on the DVIs from the ESTRO-recommended protocol EMBRACE II does not suffice to obtain clinically acceptable treatment plans. Therefore, additional (potentially hospital-specific) DVIs need to be added to the objective functions. Because of interpatient variations in organ, target and applicator geometry, patient-specific aspiration values are required for these added DVIs. This entails the need for an adaptive optimization method, possibly incorporating different priorities in the objectives.

Date: 21 September 2021

Speaker: Giulia Bernardini
Title: Incomplete Directed Perfect Phylogeny in Linear Time

Abstract: Reconstructing the evolutionary history of a set of species is a central task in computational biology. In real data, it is often the case that some information is missing: the Incomplete Directed Perfect Phylogeny problem asks, given a collection of species described by a set of binary characters with some unknown states, to complete the missing states in such a way that the result can be explained with a perfect directed phylogeny. Pe'er et al. proposed a solution that takes quasilinear time. Their algorithm relies on pre-existing dynamic connectivity data structures: a computational study recently conducted by Fernández-Baca and Liu showed that, in this context, complex data structures perform worse than simpler ones with worse asymptotic bounds. This gives us the motivation to look into the particular properties of the dynamic connectivity problem in this setting, so as to avoid the use of sophisticated data structures as a black box. Not only are we successful in doing so, giving a much simpler quasilinear-time algorithm for the Incomplete Directed Perfect Phylogeny problem; our insights into the specific structure of the problem also lead to an asymptotically faster algorithm that runs in optimal linear time.
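
As an illustration of the target property only (not the linear-time algorithm of the talk), the sketch below uses the standard characterization that a complete binary matrix admits a directed perfect phylogeny exactly when, for every pair of characters, the sets of species with state 1 are disjoint or nested. The function names and the use of `None` for an unknown state are assumptions made for this example, and the completion routine is a deliberately naive exponential search.

```python
from itertools import combinations, product


def admits_directed_perfect_phylogeny(matrix):
    """Check a complete 0/1 matrix (rows = species, columns = characters).

    Uses the pairwise test: the 1-sets of any two characters must be
    disjoint or nested.
    """
    if not matrix:
        return True
    n_chars = len(matrix[0])
    one_sets = [
        {i for i, row in enumerate(matrix) if row[c] == 1}
        for c in range(n_chars)
    ]
    for a, b in combinations(one_sets, 2):
        if a & b and not (a <= b or b <= a):
            return False
    return True


def complete_by_brute_force(matrix):
    """Fill every None entry with 0 or 1; exponential, for tiny inputs only."""
    unknown = [
        (i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value is None
    ]
    for values in product((0, 1), repeat=len(unknown)):
        filled = [list(row) for row in matrix]
        for (i, j), value in zip(unknown, values):
            filled[i][j] = value
        if admits_directed_perfect_phylogeny(filled):
            return filled
    return None
```

For example, `complete_by_brute_force([[1, 1], [1, 0], [None, 1]])` has to set the unknown state to 1, because setting it to 0 would leave two characters whose 1-sets overlap without being nested.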

Speaker: Cedric Rodriguez
Title: Bio-mechanical modeling of large deformations of the uterus and bladder between External Beam Radiotherapy and Brachytherapy for cervical cancer

Abstract: The validation of Deformable Image Registration (DIR) algorithms within the radiotherapy field is challenging due to the missing ground truth in medical imaging. Therefore, we are trying to create a virtual phantom of the abdominal region to perform controlled bio-mechanical deformations. Our goal is to simulate the expected tissue deformation of the uterus, cervix, vagina and bladder due to the applicator insertion between External Beam Radiotherapy (EBRT) and Brachytherapy (BT) for cervical cancer. Our approach consists of the generation of volumetric models of the organs at risk (OARs) using the initial diagnostic MR scans. Thereafter, a Finite Element Method will simulate the interaction between the applicator and the OARs to predict the final organ state before BT. This approach will result in ground truth deformation vector fields that could be used to create augmented medical imaging for the validation of state-of-the-art DIR algorithms for radiotherapy.

Date: 7 September 2021

Speaker: Prof. Marcel van Herk, Chair in Radiotherapy Physics, The University of Manchester
Title: On software development for medical applications

Date: 27 July 2021

Speaker: Arthur Guijt
Title: Improving GOMEA for Neural Architecture Search by ‘Kernelization’

Abstract: Currently GOMEA for discrete spaces performs its search in a mostly global fashion, learning a single linkage model per population, and visiting each solution exactly once a generation to perform GOM. Within the context of Neural Architecture Search these traits have some notable downsides. The aim is to get rid of this generational lockstep by ‘kernelization’, letting each solution be its own ‘individual’ that uses similarity & locality to improve itself, for example by learning its own linkage model. I display some initial results obtained by running GOMEA in this fashion, showing that a performance increase can be obtained in certain settings.

Date: 13 July 2021

Speaker: Damy Ha
Title: Hybridizing UHV-GOMEA with UHV-ADAM for real-valued multi objective optimization

Abstract: In real-valued single-objective optimization, algorithms that make use of gradient information have been shown to be very efficient in finding the optimal solution if the problem has few local optima. Despite evolutionary algorithms being known for their ability to avoid converging to local optima, problems that have many local optima are generally still solved faster by so-called hybrid evolutionary algorithms (which employ gradient information) than by pure evolutionary algorithms. This faster convergence of hybrid evolutionary algorithms has not yet been shown for multi-objective problems. Recently, a new technique has been created that redefines multi-objective problems as single-objective problems. This raises the question of whether this new technique allows a hybrid evolutionary algorithm to converge faster to the optimal solution than a pure evolutionary algorithm. This work hybridizes the evolutionary algorithm UHV-GOMEA with the gradient technique UHV-ADAM and investigates its performance.

Date: 29 June 2021

Speaker: Martijn Bosma
Title: The Effect of Performance Estimation on Evolutionary Neural Architecture Search for Medical Image Segmentation

Abstract: To automate MRI/CT scan interpretation, accurate ML-based organ segmentation is needed. As organs have different characteristics (volume, density, clear boundaries), using Neural Architecture Search to create neural networks tailored to a specific task can help in obtaining accurate segmentations. In this research, a novel search space is designed and evaluated. The performance of the obtained networks is compared to other SOTA neural networks. In addition to these results, an analysis is done on the effect of performance estimation on evolutionary search algorithms in terms of accuracy and speed-up. This analysis demonstrates that performance estimation at different Spearman correlation levels can deteriorate the accuracy of a NAS algorithm, for similar run times.
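
To make the notion of a Spearman correlation level concrete, the sketch below computes the rank correlation between estimated and true performance scores of candidate networks. It is a minimal illustration that assumes all scores are distinct (no ties); the function names are chosen for this example and are not taken from the talk.

```python
def ranks(values):
    """Return the 0-based rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(estimated, actual):
    """Spearman rank correlation via the no-ties formula 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(estimated)
    if n != len(actual) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rank_e, rank_a = ranks(estimated), ranks(actual)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_e, rank_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

A value of 1 means the estimator ranks the candidate networks exactly as full training would; lower values mean the search is increasingly guided by a misleading ordering.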

Date: 15 June 2021

Speaker: Joost Commandeur
Title: Improving the homogeneity (reducing hot spots) of treatment plans for high dose rate brachytherapy as generated by BRIGHT (MO-RV-GOMEA)

Abstract: BRIGHT (an implementation of Multi-Objective Real-Valued Gene-pool Optimal Mixing) has been in use to create brachytherapy treatment plans for patients with prostate cancer since the spring of 2020. A treatment plan consists of so-called 'dwell times', each of which is the duration that a radioactive source stays in a single 'dwell position' within a catheter (hollow needle) that has been placed inside the body. The goal of such a treatment plan is to deliver the prescribed radioactive dose to the target organs (i.a. prostate), whilst sparing surrounding organs and tissue (i.a. bladder). This multi-objective goal is formulated in a clinical protocol which states the prescribed upper and lower bounds for the different organs involved. In practice it has been noticed that even though the plans generated by BRIGHT adhere to the clinical protocol, the clinical experts still adjust the resulting plans. In our research we try to tackle one of the aspects that requires adjustment, namely the homogeneity of the delivered radioactive dose, by adjusting BRIGHT.

Date: 1 June 2021

Speaker: Evi Sijben
Title: Diverse high quality solutions using multi-tree multi-objective gene pool optimal mixing evolutionary algorithms

Abstract: Being able to find diverse high-quality solutions in machine learning can serve many purposes. One such example is to compute uncertainty estimates. Here we assume that the more aligned the different models are, the more certain we are about the validity of the output. We search for multiple models, using a multi-tree representation. These multi-tree models are evaluated on multiple objectives, namely diversity between trees and quality of all trees. By doing so, we integrate into the search process a desirable characteristic of human explanatory cognition: exploring alternative hypotheses to explain observations.

Speaker: Arkadiy Dushatskiy
Title: A Novel Surrogate-assisted Evolutionary Algorithm Applied to Partition-based Ensemble Learning

Abstract: We propose a novel surrogate-assisted Evolutionary Algorithm for solving expensive combinatorial optimization problems. We integrate a surrogate model, which is used for fitness value estimation, into a state-of-the-art P3-like variant of the Gene-Pool Optimal Mixing Evolutionary Algorithm (GOMEA) and adapt the resulting algorithm for solving non-binary combinatorial problems. We test the proposed algorithm on an ensemble learning problem. In our experiments we use five classification datasets from the OpenML-CC18 benchmark and Support-vector Machines as learners in an ensemble. The proposed algorithm demonstrates better performance than alternative approaches, including Bayesian optimization algorithms. It manages to find better solutions using just several thousand fitness function evaluations for an ensemble learning problem with up to 500 variables.

Date: 18 May 2021

Speaker: Leen Stougie
Title: An Approximation Algorithm for a Phylogenetic Network Problem

Abstract: Often different parts of the genome give rise to different phylogenetic relationships in the form of trees. This is usually not a matter of inexact data, but a true representation of the phenomenon that hybridization between subspecies may have occurred. This should be displayed by a network rather than by a tree. Therefore, fairly recently, various models for displaying such hybridization events have been proposed. One of the oldest ones is the Maximum Agreement Forest problem. This is an NP-hard problem and over the past two decades ever-improving approximation algorithms have been presented. In this lecture I will explain the problem and give its raison d'être. Then I will present a high brow view of a 2-approximate algorithm for the problem, which is currently the best in terms of worst-case performance analysis. This is joint work with Neil Olver (London School of Econ.), Frans Schalekamp (Cornell), Anke van Zuylen (Cornell).

Speaker: Georgios Andreadis
Title: Multi-Objective Deformable Image Registration for 3D Images Using the Gene-pool Optimal Mixing Evolutionary Algorithm

Abstract: Finding the most likely deformation of one image to another image is a problem with important applications in medical imaging, where large deformations and structural changes are common. Approaching this problem of Deformable Image Registration (DIR) from a multi-objective perspective has proven successful at producing a set of realistic registrations for 2D images for the user to choose from. This multi-objective optimization process is driven by the Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (RV-GOMEA). Recent parallelization efforts with this algorithm on the Graphics Processing Unit (GPU) have delivered large speed-ups for 2D images by exploiting the conditional independence of local regions. This work builds on these advances and adapts the approach to support 3D images, introducing a new spatial model and a refined deformation energy model while still supporting annotated guidance information and multi-resolution schemes. We find that our proof-of-concept prototype can successfully tackle synthetic registration problems and are currently conducting experiments on patient scans to validate our approach on real-world problems.

Date: 4 May 2021

Speaker: Alexandr Chebykin
Title: Improving Neural Architecture Search by encouraging diversity

Abstract: Neural Architecture Search (NAS) is concerned with learning the structure of a neural network from data. During the search, many candidate architectures need to be considered, but computational constraints make it impossible to independently train all candidates until convergence (or even evaluate them all). Various techniques make the search more efficient, while at the same time reducing the space of architectures that can be found. In this talk, I will describe our approach to improving upon a SOTA algorithm by optimizing an additional objective function that measures the novelty of newly discovered solutions. Limitations of the approach, as well as of the current NAS paradigm in general, will also be discussed.

Speaker: Joe Harrison
Title: Differentiable Cartesian genetic programming for small interpretable symbolic expressions

Abstract: Interpretability is a desirable quality when it comes to machine learning. In genetic programming, (binary) trees are typically used to evolve symbolic expressions that are human-readable. The leaf nodes of the tree can either be input data or ephemeral random constants (ERC) and the other nodes are operators (x, +, -, %) that take the output of their child nodes as input. In contrast, Cartesian genetic programming (CGP) uses a directed acyclic graph as a genotype. This allows for skip connections and reuse of salient patterns within the graph and possibly leads to more expressive formulas compared to a tree-based GP algorithm. Another issue with standard GP is that the ERC leaf nodes are instantiated once and do not change over time. By making the expression differentiable, the 'constant' leaf nodes can be updated using backpropagation. This possibly takes away part of the burden of the GP algorithm having to evolve specific values using operator nodes. GP trees often suffer from bloat, a significant increase in solution size with insignificant improvement in terms of performance, which makes resulting expressions long and unreadable. CGP naturally uses a constrained template which prevents bloat. This template makes CGP an ideal candidate for Gene-pool Optimal Mixing (GOM), a method where subsets of nodes are exchanged among solutions. GOM ensures improvement and offers the possibility of exploiting linkage information. With these adjustments, more expressive and shorter formulas are possible.

Date: 20 April 2021

Speaker: Leah Dickhoff
Title: Cervical cancer case modelling for bi-objective treatment planning in brachytherapy

Abstract: The previously developed bi-objective optimization method using the Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (RV-GOMEA) for prostate high-dose-rate brachytherapy (BT) is extended to cervical cancer patients. The clinical objectives in terms of dose and dose-point indices, describing minimum doses to targets and maximum doses to organs at risk, are calculated directly from the EMBRACE-II protocol. They are then converted into a bi-objective formulation in which the trade-off is characterized by the Least Coverage Index versus Least Sparing Index. In order to generate clinically acceptable treatment plans, additional constraints on dwell time modulation and needle dose contribution are deemed necessary.

Speaker: Timo Deist
Title: Multi-objective learning for asymmetric Pareto fronts

Abstract: We will revisit the multi-objective (MO) learning problem to predict Pareto fronts presented in December. This time, I will highlight the issue of asymmetric Pareto fronts (asymmetry along the line L_1(x)=L_2(x)) that can occur when the competing loss functions are different in scale and curvature. Our proposed MO approach uses HV maximization and does not require specifying trade-offs before training. It is expected to outperform existing Multi-Task Learning (MTL) methods that require trade-offs to be known a priori which is a difficult requirement when Pareto front characteristics are unknown. Results from three experiments will be used to illustrate differences between our HV maximization-based approach and existing MTL algorithms.

Date: 6 April 2021

Speaker: Solon Pissis
Title: Bidirectional string anchors: A new mechanism for string sampling

Abstract: We will present a new mechanism for sampling strings, which has small density and is computable in linear time. Furthermore, we will show that by using this sampling, we can efficiently construct a small sketch that answers pattern matching queries in near-optimal time. This is joint work (in progress) with Grigorios Loukides from King's College London.

Speaker: Thomas Uriot
Title: Interpretable dimensionality reduction using multi-objective Genetic Programming

Abstract: In this talk, we will discuss some of the ways to perform interpretable dimensionality reduction using Genetic Programming. These include using a teacher model, such as a neural-based autoencoder, a fully GP-based autoencoder and a manifold learning approach using GP as the function mapping from high to low dimensions. We also introduce variants of GP representation (vanilla, multi-tree, shared multi-tree). We compare all these approaches on several datasets by looking at their predictive power for downstream classification tasks.

Date: 23 March 2021

Speaker: Giulia Bernardini
Title: Constructing Strings Avoiding Forbidden Substrings

Abstract: We consider the problem of constructing strings over an alphabet Σ that start with a given prefix u, end with a given suffix v, and avoid occurrences of a given set of forbidden substrings. In the decision version of the problem, given a set S of forbidden substrings, each of length k, over Σ, we are asked to decide whether there exists a string x over Σ such that u is a prefix of x, v is a suffix of x, and no element of S occurs in x. Our first result is an O(|u|+|v|+k|S|)-time algorithm to decide this problem. In the more general optimization version of the problem, given a set S of forbidden arbitrary-length substrings over Σ, we are asked to construct a shortest string x over Σ such that u is a prefix of x, v is a suffix of x, and no element of S occurs in x. Our second result is an O(|u|+|v|+||S||*|Σ|)-time algorithm to solve this problem, where ||S|| denotes the total length of the elements of S.
Interestingly, our results can be directly applied to solve the reachability and shortest path problems in complete de Bruijn graphs in the presence of forbidden edges or of forbidden paths. Our algorithms are motivated by data privacy, and in particular, by the data sanitization process. In the context of strings, sanitization consists in hiding forbidden substrings from a given string by introducing the least amount of spurious information. We consider the following problem. Given a string w of length n over Σ, an integer k, and a set S of forbidden substrings, each of length k, over Σ, construct a shortest string y over Σ such that no element of S occurs in y and the sequence of all other length-k fragments occurring in w is a subsequence of the sequence of the length-k fragments occurring in y. Our third result is an O(nk|S|*|Σ|)-time algorithm to solve this problem.
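
The link to shortest paths can be illustrated with a naive breadth-first search over string suffixes. This is a minimal sketch of the optimization version of the problem, not the algorithm from the talk: it explores up to |Σ|^m states, where m is the larger of |v| and the longest forbidden length minus one, and the function name and parameters are assumptions made for this example.

```python
from collections import deque


def shortest_avoiding(u, v, forbidden, alphabet):
    """Return a shortest string with prefix u and suffix v that contains no
    forbidden substring, or None if no such string exists.

    Assumes every forbidden string is non-empty.
    """
    longest = max((len(s) for s in forbidden), default=1)
    # A suffix of this length decides both future forbidden occurrences
    # and whether the string currently ends with v.
    memory = max(longest - 1, len(v))

    def has_forbidden(text):
        return any(s in text for s in forbidden)

    def trim(text):
        return text[max(0, len(text) - memory):]

    if has_forbidden(u):
        return None
    if u.endswith(v):
        return u

    queue = deque([u])
    seen = {trim(u)}
    while queue:
        current = queue.popleft()
        state = trim(current)
        for letter in alphabet:
            # Any new forbidden occurrence must end at the appended letter.
            if has_forbidden(state + letter):
                continue
            candidate = current + letter
            if candidate.endswith(v):
                return candidate
            new_state = trim(candidate)
            if new_state not in seen:
                seen.add(new_state)
                queue.append(candidate)
    return None
```

With `u = "aa"`, `v = "bb"` and the single forbidden substring `"ab"`, no solution exists over the alphabet `"ab"`, while over `"abc"` the search returns `"aacbb"`.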

Speaker: Monika Grewal
Title: Multi-Objective learning for Deformable Image Registration

Abstract: Deformable Image Registration (DIR) is an optimization task, wherein a Deformation Vector Field (DVF) is optimized to deform the source image such that it aligns with the target image. DIR is inherently multi-objective, requiring maximization or minimization of multiple conflicting objectives e.g., maximize image similarity and minimize deformation magnitude. Further, the computationally expensive nature of the DVF optimization gives a strong motivation to develop learning-based algorithms. In this talk, I will discuss a potential method for multi-objective learning of the DIR task.

Date: 9 March 2021

Speaker: Anton Bouter
Title: GPU-Accelerated Parallel Gene-pool Optimal Mixing applied to Multi-Objective Deformable Image Registration

Abstract: The Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (RV-GOMEA) has previously been successfully used to achieve highly scalable optimization of various real-world problems in a gray-box optimization setting. Deformable Image Registration (DIR) is a multi-objective problem, aimed at finding the most likely non-rigid deformation of a given source image so that it matches a given target image. We specifically consider the case where the deformation model allows for finite-element-type modeling of tissue properties. This optimization problem is non-smooth, necessitating techniques like EAs to get good results. Though the objectives of DIR are non-separable, non-neighboring regions of the deformation grid are conditionally independent. We show that GOMEA makes it possible to exploit such knowledge through the large-scale parallel application of variation steps, where each is only accepted when leading to an improvement, on a Graphics Processing Unit (GPU). On various 2-dimensional DIR problems, we find that in this way results similar to those of sequential processing can be achieved, while allowing for substantial speed-ups (up to a factor of 111) for the highest-dimensional problems (i.e., the highest deformation-grid resolution). This work opens the door to the extension of this type of DIR to larger (3-dimensional) deformation grids, and its application to other real-world problems.

Date: 23 February 2021
Speaker: Michelle Sweering

Date: 9 February 2021
next set of 8 presentations

Date: 26 January 2021
set of 8 presentations

Date: 15 December 2020

Speaker: Arkadiy Dushatskiy
Title: Deep learning for real-world medical image segmentation

Abstract: Automatic organ segmentation is an essential part of automatic radiotherapy treatment planning. We study how state-of-the-art deep learning methods can be successfully applied to real-world medical image segmentation. The research is based on datasets used in clinical practice such as MRI data of radiotherapy treatments at AMC.

Date: 1 December 2020

Speaker: Timo Deist
Title: Multi-objective inference by hypervolume-based Pareto front generation

Abstract: Multi-objective (MO) decision-making requires trading-off conflicting goals. To assist decision-making when preferred trade-offs are unknown, MO optimization literature describes techniques to approximate Pareto fronts which consist of optimal decisions for each trade-off. In this talk, we present preliminary results on treating MO statistical inference as a machine learning problem using a hypervolume-based loss function. We describe how to train sets of neural networks to produce Pareto front estimates for new problem instances.
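
For readers unfamiliar with the hypervolume indicator that such a loss builds on, the sketch below computes it for two objectives that are both minimized. It illustrates the indicator only, not the loss function or training procedure of the talk; the function name and the choice of reference point are assumptions made for this example.

```python
def hypervolume_2d(points, reference):
    """Area dominated by `points` and bounded by `reference` (minimization).

    points: iterable of (f1, f2) pairs; reference: (r1, r2) pair that is
    worse than every point of interest in both objectives.
    """
    ref_x, ref_y = reference
    # Only points that are strictly better than the reference contribute.
    candidates = sorted(p for p in points if p[0] < ref_x and p[1] < ref_y)
    volume = 0.0
    best_y = ref_y
    for x, y in candidates:
        # After sorting by f1, a point adds area only if it improves f2.
        if y < best_y:
            volume += (ref_x - x) * (best_y - y)
            best_y = y
    return volume
```

A larger hypervolume indicates a set of solutions that is both closer to the Pareto front and better spread along it, which is why it can serve as a single scalar to maximize.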

Date: 17 November 2020

Speaker: Solon Pissis
Title: Pattern Masking for Dictionary Matching

Abstract: Data masking is a common technique for sanitizing sensitive data maintained in database systems, and it is also becoming increasingly important in various application areas, such as in record linkage of personal data. In this talk, we will investigate the Pattern Masking for Dictionary Matching (PMDM) problem. In PMDM, we are given a dictionary D of d strings, each of length L, a query string q of length L, and a positive integer z, and we are asked to compute a smallest set K⊆ {1,...,L}, so that if q[i], for all i∈K, is replaced by a wildcard, then q matches at least z out of d strings from D.
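
The problem statement can be made concrete with an exhaustive search that tries ever larger position sets. This is a minimal sketch for tiny inputs only (it is exponential in L) and is not an algorithm from the talk; the function name is an assumption, and positions are 0-based here rather than 1-based as in the abstract.

```python
from itertools import combinations


def pmdm_brute_force(dictionary, query, z):
    """Return a smallest set K of positions such that masking query at K
    makes it match at least z dictionary strings, or None if impossible.

    Assumes all dictionary strings have the same length as the query.
    """
    length = len(query)
    for size in range(length + 1):
        for masked in combinations(range(length), size):
            masked_set = set(masked)
            matches = sum(
                all(i in masked_set or s[i] == query[i] for i in range(length))
                for s in dictionary
            )
            if matches >= z:
                return masked_set
    return None
```

For the dictionary `["abc", "abd", "xbc"]` and query `"abc"`, no masking is needed for z = 1, one wildcard suffices for z = 2, and two wildcards (positions 0 and 2) are needed for z = 3.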

Date: 3 November 2020

Speaker: Haodi Zhong
Title: Clustering datasets with demographics and diagnosis codes

Abstract: Clustering data derived from Electronic Health Record (EHR) systems is important to discover relationships between the clinical profiles of patients and as a pre-processing step for analysis tasks, such as classification. However, the heterogeneity of these data makes the application of existing clustering methods difficult and calls for new clustering approaches. In this paper, we propose the first approach for clustering a dataset in which each record contains a patient’s values in demographic attributes and their set of diagnosis codes. Our approach represents the dataset in a binary form in which the features are selected demographic values, as well as combinations (patterns) of frequent and correlated diagnosis codes. This representation enables measuring similarity between records using cosine similarity, an effective measure for binary-represented data, and finding compact, well-separated clusters through hierarchical clustering. Our experiments using two publicly available EHR datasets, comprised of over 26,000 and 52,000 records, demonstrate that our approach is able to construct clusters with correlated demographics and diagnosis codes, and that it is efficient and scalable.
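
The binary representation and the similarity measure can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the record layout (a dict with `demographics` and `codes` sets), the feature lists, and the function names are all hypothetical, and the selection of features and the hierarchical clustering step are omitted.

```python
import math


def encode(record, demographic_values, code_patterns):
    """Turn one record into a 0/1 vector.

    A demographic feature is 1 if the record has that value; a pattern
    feature is 1 if all codes of the pattern occur in the record.
    """
    demo_bits = [1 if value in record["demographics"] else 0
                 for value in demographic_values]
    pattern_bits = [1 if pattern <= record["codes"] else 0
                    for pattern in code_patterns]
    return demo_bits + pattern_bits


def cosine_similarity(a, b):
    """Cosine similarity of two equally long numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For binary vectors the dot product counts shared features, so two records are similar when most of the demographic values and code patterns present in one are also present in the other.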

Date: 20 October 2020

Speaker: Rosanne Wallin
Title: Applicability of phylogenetic network algorithms to represent the evolutionary history of SARS-CoV-2

Abstract: The outbreaks of SARS (2003), MERS (2012) and most recently COVID-19 have led to a lot of public and scientific interest in the origins of the coronaviruses that cause these diseases. Traditionally, evolutionary history is inferred and displayed by constructing phylogenetic trees, but these are restricted to displaying vertical evolution. Phylogenetic networks are designed to represent more complex evolutionary relationships and are able to display horizontal evolutionary events, such as recombination in viruses. Several algorithms have been developed to construct such networks, but their suitability for coronavirus data is still unclear. In this presentation, I will briefly explain the concepts of viral recombination and phylogenetic networks and talk about the applicability of five phylogenetic network methods (TriLoNet, TriL2Net, Tree-Child Networks, Temporal Hybridization Number and Maximum-Pseudo Likelihood in PhyloNet) to a genomic data set of SARS-CoV-2 and related coronaviruses. I will show the influence of preprocessing steps (taxon selection, edge contraction and resolving multifurcations) and discuss the limitations of the different methods in terms of input size and consistency of the constructed networks. I will also discuss the biological interpretation of the resulting networks with respect to the evolutionary history of SARS-CoV-2.

Date: 6 October 2020

Speaker: Monika Grewal
Title: Developing a deep learning method for automatic organ segmentation

Abstract: Automatic organ segmentation can save hours of manual work required by clinicians in the process of radiotherapy treatment planning. Despite the availability of a plethora of deep learning methods for semantic segmentation and their surprisingly good performance on new unseen data, developing a deep learning method for a new problem is still full of challenges. At the bottom of these challenges is the need for a large annotated dataset for supervised learning of a neural network. In the medical imaging domain, the time and effort required for building a large annotated dataset become even scarcer and costlier due to the underlying requirement of clinical expertise for annotating the data. Therefore, in order to avoid the need for manual annotation, we scraped a big dataset from the clinical database of a hospital along with the clinically available annotations. In this presentation, I will talk about the steps we followed to build a sizeable, fully annotated dataset suitable for supervised learning. I will also discuss the effects of different decisions related to sampling, preprocessing, and cleaning of data on the performance of a baseline deep learning method for organ segmentation.

Date: 22 September 2020

Speaker: Tom den Ottelander
Title: What matters most for Neural Architecture Search: An analysis of search strategies, from simple to complex

Abstract: Computer vision tasks, like supervised image classification, are effectively tackled by convolutional neural networks, provided that the architecture, which defines the structure of the network, is set correctly. Neural Architecture Search (NAS) is a relatively young and increasingly popular field that is concerned with automatically optimizing the architecture of neural networks. Previously known work shows that even though recent works typically develop increasingly complex search strategies for NAS, several do not significantly outperform simple approaches like randomly sampling from the search space on single-objective NAS tasks. Additionally, proper ablation studies are often missing, therefore it is uncertain which mechanisms are fundamental for algorithms to achieve excellent NAS performances. In the first part of this thesis, Local Search (LS) and Uniform Size Random Search (USRS), are proposed for multi-objective (MO) NAS, demonstrating that very simple algorithms can provide searching performances close to state-of-the-art evolutionary algorithms (EAs), while outperforming random search. The second part explores what mechanisms are essential for the Multi-Objective Gene-pool Optimal Mixing Evolutionary Algorithm (MO-GOMEA), a state-of-the-art model-based EA, to achieve excellent performances for searching NAS spaces. The automatic population-sizing scheme of MO-GOMEA offers a welcome anytime-performance, but objective space clustering has only a small beneficial impact. The number of clusters can be set arbitrarily. Extreme clusters can be enabled to the practitioner’s preference, resulting in different search behaviors. The improved performance gained by automatic linkage learning is limited, although it can be helpful on future, more complex search spaces.

Date: 8 September 2020

Speaker: Michelle Sweering
Title: String Sanitization under Edit Distance: Improved and Generalized

Let W be a string of length n over an alphabet Σ, k be a positive integer, and S be a set of length-k substrings of W. The ETFS problem asks us to construct a string X_ED such that: (i) no string of S occurs in X_ED; (ii) the order of all other length-k substrings over Σ is the same in W and in X_ED; and (iii) X_ED has minimal edit distance to W. When W represents an individual's data and S represents a set of confidential patterns, the ETFS problem asks for transforming W to preserve its privacy and its utility [Bernardini et al., ECML PKDD 2019].
ETFS can be solved in O(n^2k) time [Bernardini et al., CPM 2020]. The same paper shows that ETFS cannot be solved in O(n^{2−δ}) time, for any δ>0, unless the Strong Exponential Time Hypothesis (SETH) is false. Our main results can be summarized as follows: (i) an O(n^2log^2k)-time algorithm to solve ETFS; and (ii) an O(n^2log^2n)-time algorithm to solve AETFS, a generalization of ETFS in which the elements of S can have arbitrary lengths.
Our algorithms are thus optimal up to polylogarithmic factors, unless SETH fails. Let us also stress that our algorithms work under edit distance with arbitrary weights at no extra cost. As a bonus, we show how to modify some known techniques, which speed up the standard edit distance computation, to be applied to our problems. Beyond string sanitization, our techniques may inspire solutions to other problems related to regular expressions or context-free grammars.
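
For readers unfamiliar with the distance measure the abstract builds on, here is a minimal sketch of the textbook quadratic-time dynamic program for edit distance with arbitrary operation weights. It is not the ETFS algorithm from the talk: it only computes the distance between two given strings and does not construct a sanitized string. The function name and the three weight parameters are illustrative assumptions.

```python
def weighted_edit_distance(w, x, ins_cost=1, del_cost=1, sub_cost=1):
    """Edit distance from string w to string x with per-operation weights.

    Textbook O(len(w) * len(x)) dynamic program; names and weights are
    illustrative assumptions, not taken from the talk.
    """
    n, m = len(w), len(x)
    # prev[j] holds the cost of turning the current prefix of w into x[:j].
    prev = [j * ins_cost for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * del_cost] + [0] * m
        for j in range(1, m + 1):
            # Match (free) or substitute the last characters.
            diag = prev[j - 1] + (0 if w[i - 1] == x[j - 1] else sub_cost)
            # Delete w[i-1], or insert x[j-1].
            curr[j] = min(diag, prev[j] + del_cost, curr[j - 1] + ins_cost)
        prev = curr
    return prev[m]
```

With unit weights this is the standard Levenshtein distance; raising `sub_cost` above `ins_cost + del_cost` makes the program prefer a deletion followed by an insertion over a substitution.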

Date: 14 July 2020

Speaker: Monika Grewal
Title: Automatic Detection and Matching of Landmarks in 3D CT scans with an application to Deformable Image Registration

Deformable image registration (DIR) in lower abdominal Computed Tomography (CT) scans has tremendous applications in radiotherapy treatment planning, dose accumulation, and brachytherapy. However, DIR in the lower abdomen is quite challenging especially due to large variations in anatomy. Moreover, the physical conditions such as bladder filling, the presence of gas pockets, and contrast agents pose additional challenges to the DIR problem. Landmark correspondences in the fixed and moving image scans may provide additional guidance information to the DIR methods and in this way, help in finding a better DIR solution. However, the added value of using landmark pairs for DIR is largely understudied, especially due to a lack of automatic methods for landmark detection and matching in three-dimensional (3D) images. In this study, we present an end-to-end deep learning approach, called “DCNN-Match”, that learns to predict landmark correspondences in 3D image scans in a self-supervised manner. We integrated DCNN-Match with open-source registration software Elastix to make an automatic DIR pipeline using additional guidance information from landmark correspondences. The entire pipeline was evaluated on CT scans of cervical cancer patients. The results on both simulated deformations and clinical data demonstrate the added value of using automatic landmark correspondences in the DIR pipeline.

Date: 30 June 2020

Speaker: Timo Deist
Title: Multi-objective Optimization by Uncrowded Hypervolume Gradient Ascent

Evolutionary algorithms (EAs) are the preferred method for solving black-box multi-objective optimization problems, but when gradients of the objective functions are available, it is not straightforward to exploit these efficiently. By contrast, gradient-based optimization is well-established for single-objective optimization. A single-objective reformulation of the multi-objective problem could therefore offer a solution. Of particular interest to this end is the recently introduced uncrowded hypervolume (UHV) indicator, which is Pareto compliant and also takes into account dominated solutions. We show that the gradient of the UHV can often be computed, which allows for a direct application of gradient ascent algorithms. We compare this new approach with two EAs for UHV optimization as well as with one gradient-based algorithm for optimizing the well-established hypervolume. On several bi-objective benchmarks, we find that gradient-based algorithms outperform the tested EAs by obtaining a better hypervolume with fewer evaluations whenever exact gradients of the multiple objective functions are available and in case of small evaluation budgets. For larger budgets, however, EAs perform similarly or better. We further find that, when finite differences are used to approximate the gradients of the multiple objectives, our new gradient-based algorithm is still competitive with EAs in most considered benchmarks.
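
As a point of reference for the hypervolume indicator mentioned above, here is a minimal sketch that computes the plain bi-objective hypervolume of a set of points. It assumes both objectives are minimized and that a reference point is supplied by the user. It does not implement the uncrowded hypervolume or its gradient: dominated points are simply ignored here, whereas the UHV described in the abstract also takes them into account.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref`.

    Assumes both objectives are minimized; `points` is an iterable of
    (f1, f2) pairs and `ref` is the reference point (an assumption of
    this sketch, chosen by the user).
    """
    # Only points that are strictly better than the reference contribute.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]
    for f1, f2 in pts:  # ascending f1
        if f2 < best_f2:  # point is non-dominated among those seen so far
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area
```

For example, the front (1, 3), (2, 2), (3, 1) with reference point (4, 4) covers three stacked rectangles of area 3, 2 and 1.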

Date: 16 June 2020

Speaker: Mark Jones
Title: A third strike against perfect phylogeny

Consider a set X (e.g. a set of species), and a set C of characters that each assign a state to each element in X (such as a DNA sequence alignment, where each column in the alignment assigns one of {A,C,G,T} to each row/species). An unrooted tree T on X is said to be a "perfect phylogeny" if each character in C can be extended to an assignment on all vertices of T, such that all vertices that are assigned the same state form a connected subtree. Perfect phylogenies are a useful tool in the study of evolutionary trees, where an important principle is that we expect changes in state to be rare.
A classical result states that 2-state characters permit a perfect phylogeny precisely if each subset of 2 characters permits one. More recently, it was shown that for 3-state characters the same property holds but with size-3 subsets. A long-standing open problem asked whether such a constant set size exists for each number of states. More precisely, it has been conjectured that for any fixed integer r, there exists a constant f(r) such that a set of r-state characters C has a perfect phylogeny if and only if every subset of at most f(r) characters has a perfect phylogeny. In this talk, we show that this conjecture is in fact false. In particular, we show that for any constant t, there exists a set C of 8-state characters such that C has no perfect phylogeny, but there exists a perfect phylogeny for every subset of t characters.
This talk is based on joint work with Leo van Iersel and Steven Kelk. It will also feature pictures of lobsters, for somewhat flimsy reasons.
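
The classical 2-state result mentioned in the abstract can be sketched in a few lines. The check below assumes each character is given as a sequence of states, one per species, with at most two distinct states per character; two such characters are compatible unless all four combinations of states occur together. The function names are illustrative, and the sketch covers only the 2-state case, not the 3-state or 8-state results discussed in the talk.

```python
from itertools import combinations


def pair_compatible(c1, c2):
    """Two 2-state characters conflict only if all four state pairs occur."""
    return len(set(zip(c1, c2))) < 4


def has_perfect_phylogeny_2state(characters):
    """Pairwise criterion for 2-state characters (illustrative sketch).

    `characters` is a list of equal-length sequences; entry i of each
    sequence is the state that character assigns to species i.
    """
    return all(pair_compatible(a, b) for a, b in combinations(characters, 2))
```

For instance, the characters [0, 0, 1, 1] and [0, 1, 0, 1] exhibit all four state pairs and therefore admit no perfect phylogeny together.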

Date: 10 March 2020

Speaker: Michelle Sweering
Title: String Sanitization under Edit Distance

Date: 25 February 2020

Speaker: Solon Pissis
Title: Reverse-Safe Data Structures for Text Indexing

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode. A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any z-reverse-safe data structure. The construction algorithm takes O(n^ω log n) time, where ω is the matrix multiplication exponent.

Date: 11 February 2020

Speaker: Sumanta Ray
Title: CODC: A copula based model to identify differential coexpression

I will explain a copula-based framework to model differential coexpression between gene pairs in two different phenotype conditions. Identifying the difference in coexpression patterns, commonly known as differential coexpression, is a challenging task in computational biology. It is essential for getting a more informative picture of the differential regulation pattern of genes under two phenotype conditions. A copula produces a multivariate probability distribution from multiple uniform marginal distributions. Here, it is used to identify the dependency between expression patterns of a gene pair in two conditions separately. The Kolmogorov-Smirnov distance between the two joint distributions is treated as the differential coexpression score of a gene pair. I will show the results of applying CODC to 5 pan-cancer datasets from the TCGA data portal.
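
To make the idea of comparing dependency structures concrete, here is a minimal sketch that scores a gene pair by the largest gap between the empirical copulas of its expression values in two conditions. This is a stand-in built on assumptions: the abstract does not specify which copula CODC fits or how the Kolmogorov-Smirnov distance is evaluated, so the rank-based empirical copula, the evaluation grid, and all function names below are illustrative only.

```python
def pseudo_observations(values):
    """Scale ranks into (0, 1); ties are broken by input order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return [r / (n + 1) for r in ranks]


def empirical_copula(x, y, grid):
    """Empirical copula of the paired samples (x, y) evaluated on grid x grid."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    n = len(x)
    return [
        [sum(1 for a, b in zip(u, v) if a <= s and b <= t) / n for t in grid]
        for s in grid
    ]


def copula_ks_distance(x1, y1, x2, y2, grid_size=10):
    """Largest absolute gap between the two empirical copulas on the grid."""
    grid = [k / grid_size for k in range(1, grid_size)]
    c1 = empirical_copula(x1, y1, grid)
    c2 = empirical_copula(x2, y2, grid)
    return max(abs(a - b) for r1, r2 in zip(c1, c2) for a, b in zip(r1, r2))
```

A pair of genes whose expression is positively associated in one condition and negatively associated in the other gets a large score, while a pair with the same rank dependence in both conditions scores zero.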

Date: 14 January 2020

Speaker: Marjolein van der Meer
Title: Optimization for high-dose-rate prostate brachytherapy: robustness and catheter positions

High-dose-rate (HDR) brachytherapy is a type of radiotherapy that can be used to treat prostate cancer, whereby small radioactive sources are temporarily placed inside the prostate. To achieve this, several very thin needles, called catheters, are placed inside the prostate, for a radioactive source to move through. A treatment plan consists of the positions of the catheters, combined with the movement of the source through those catheters. Although the radiation is aimed at cancer cells, it still poses a risk for the healthy organs surrounding the prostate. Therefore, it is important to use a treatment plan with the best possible trade-off between radiation to the prostate and radiation to the other organs. In a project together with the Academic Medical Center (AMC), we are developing optimization software to find such treatment plans. In my talk, I will discuss how to optimize the positions of the catheters, and how to ensure the robustness of the resulting treatment plan.

Date: 19 November 2019

Speaker: Peter Bosman
Title: Everything You Always Wanted to Know About Peter A.N. Bosman's Scientific Career* (*But Were Afraid to Ask)

In this talk, I will openly revisit the different stages of my career, ultimately leading to my part-time professorship at Delft University of Technology. I will pause to identify which moments, moves, or choices had the most impact and the lessons I learned from those about having a scientific career. Besides trying to give a few generalizing words of advice, I will leave the floor open to questions and discussions, that will hopefully also evoke opinions from other seniors in the group.

Date: 5 November 2019

Speaker: Giulia Bernardini
Title: A Rearrangement Distance for Fully-Labelled Trees

The problem of comparing trees representing the evolutionary histories of cancerous tumors has turned out to be crucial, since there is a variety of different methods which typically infer multiple possible trees. A departure from the widely studied setting of classical phylogenetics, where trees are leaf-labelled, tumoral trees are fully labelled, i.e., every vertex has a label. We provide a rearrangement distance measure between two fully-labelled trees. This notion originates from two operations: one which modifies the topology of the tree, the other which permutes the labels of the vertices, hence leaving the topology unaffected. While we show that the distance between two trees in terms of each such operation alone can be decided in polynomial time, the more general notion of distance when both operations are allowed is NP-hard to decide. Despite this result, we show that it is fixed-parameter tractable, and we give a 4-approximation algorithm when one of the trees is binary. Ongoing work includes a constant-factor approximation algorithm for trees of general degree.

Date: 22 October 2019

Speaker: Arkadiy Dushatskiy
Title: Observer variation-aware medical image segmentation by combining deep learning and surrogate-assisted genetic algorithms.

We propose a deep learning based approach to capture observer variations in organ segmentation. Instead of training one neural net on all available data, we train several neural nets on subgroups of scans belonging to different segmentation variations separately. Because a priori it may be unclear what styles of segmentation exist in the data, the subgroups are determined automatically by finding an optimal data partition using a surrogate-assisted genetic algorithm. Such an approach potentially provides a better quality of segmentation, and segmentations in different styles can be presented to a doctor, which contributes to the ultimate goal of improving the acceptance of automatic segmentations in clinical practice.

Date: 10 September 2019

Speaker: Vincent Luo
Title: Haplotype aware de novo assembly of diploid genome from long reads

A diploid organism has two homologous copies of every type of chromosome, one from each parent. Determining the DNA sequence of each copy, which is called haplotype aware genome assembly, plays a crucial role in genetics and precision medicine. Over the last few years, long-read sequencing technologies such as SMRT Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have greatly improved genome assembly due to their tremendous advantage in read length. However, current long-read assemblers usually collapse homologous sequences into one consensus sequence or rely on a pre-known high-quality reference as a backbone, and therefore fail to obtain genome sequences with high resolution. Here, we present a novel approach, WhatsHap-denovo, for haplotype reconstruction of a diploid genome in a de novo paradigm using only long-read PacBio data. Benchmarking experiments on both simulated and real data demonstrate that our method outperforms state-of-the-art tools in various aspects.

Date: 2 July 2019

Speaker: Arkadiy Dushatskiy (GECCO talk)
Title: Convolutional neural net surrogate-assisted GOMEA

Speaker: Marco Virgolin (GECCO talk)
Title: Linear Scaling with and within Semantic Backpropagation-based Genetic Programming for Symbolic Regression

Optimization problems with time-consuming objective function evaluations (expensive optimization) arise in different domains. We introduce a novel surrogate-assisted genetic algorithm for solving such optimization problems. The key novel features of our algorithm are keeping the strengths of the GOMEA algorithm while using a convolutional neural network as a surrogate model with a pairwise regression approach for model training to be able to train the CNN with small numbers of samples.

Both speakers will test-run their talks for the upcoming Genetic and Evolutionary Computation Conference (GECCO).

Date: 1 July 2019

Speaker: Stef Maree
Title: Real-valued Evolutionary Multi-modal Multi-objective Optimization by Hill-valley Clustering

Stef will test-run his conference talk as well as his pitch for the Human-Competitive Awards. Both will be featured in the upcoming Genetic and Evolutionary Computation Conference (GECCO).

Date: 18 June 2019

Speaker: Mike Preuss (LIACS, Leiden University)

Mike Preuss is assistant professor at LIACS, the Computer Science department of Leiden University. He works in AI, namely game AI, natural computing, and social media computing. In this seminar, he will talk about the new links they see between evolutionary computation and AI, and about his Nature paper where they transferred AlphaGo to the chemical retrosynthesis problem.

Date: 4 June 2019

Speaker: Chantal Olieman
Title: Fitness-based Linkage Learning in the Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm

The recently introduced Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (RV-GOMEA) has been shown to be among the state-of-the-art for solving grey-box optimization problems where partial evaluations can be leveraged. A core strength is its ability to effectively exploit the linkage structure of a problem. For many real-world optimization problems the linkage structure is unknown a priori and has to be learned online. Previously published work on RV-GOMEA however demonstrated excellent scalability only when the linkage structure is pre-specified appropriately. A mutual-information-based metric to learn linkage structure online, as commonly adopted in EDAs and the original discrete version of GOMEA, did not lead to similarly excellent results, especially in a black-box setting. In this article the strengths of RV-GOMEA are combined with a new fitness-based linkage learning approach that is inspired by differential grouping, but reduces its computational overhead by an order of magnitude for problems with fewer interactions. The resulting new version of RV-GOMEA achieves scalability similar to when a predefined linkage model is used, and also outperforms, for the first time, the EDA AMaLGaM, upon which it is partially based, in the black-box case where partial evaluations cannot be leveraged.

Date: 23 May 2019

Speaker: David Craft
Title: MHC class 1 peptide binding prediction using structurally informed machine learning

The major histocompatibility complex (MHC) is a set of proteins that display peptides on the outer cell surface for recognition by T-cells. MHC class 1 displays small protein fragments degraded from inside the cell to cytotoxic T-cells; an immune response is invoked if the peptides displayed are of foreign origin or mutated. Predicting if a given peptide will bind to a given MHC protein is needed for the design of class 1 based personalized cancer vaccines. We developed a systematic framework for training and evaluating the performance of algorithms in predicting MHC-binding peptides, and will present a comparison of several modeling approaches trained and tested with a ground truth dataset of 95 MHC alleles with their binding peptides.

Date: 7 May 2019

Speaker: Divyae Prasad
Title: Epistasis unravelled by deep ConvNets: towards near-perfect phenotype classification from microbial genotypes

Sequence similarity based methods are the workhorse for identification of well described genetic fragments associated with a phenotype of interest. But how are new genetic markers (variations or risk loci) discovered? Genome-wide association studies (GWAS) are the current gold standard for such genotype to phenotype mapping efforts, and have been applied to understand the variations in pathogen genomes, especially those that are antimicrobial resistant. Yet GWAS often struggles to achieve both the statistical power needed for confident association calling and the precision required to reject spurious findings. Here we develop and introduce a deep ConvNet for tackling the genotype to phenotype mapping problem. Using the P. aeruginosa (a bacterial species) pangenome as an example, we train our ConvNet to predict a binary drug response: a susceptible or resistant phenotype. We show near-perfect classification performance of our ConvNet models, demonstrating that non-linear additive effects (formally known as epistasis) of variants may entirely explain rapid evolution in pathogens. I take this opportunity to discuss the remaining challenges in our project: to demystify the workings of the ConvNet black-box and connect/compare it to a statistical test. Finally, we would like to integrate evolutionary relatedness measures (in the form of a “kinship matrix”, perhaps as an additional net) into the ConvNet, thereby augmenting the statistical power of our method. Ideas on how to do this would be appreciated!

Date: 23 April 2019

Speaker: Timo Deist
Title: Distributed learning and prediction modelling in radiation oncology

I will present an overview of my PhD thesis: - empirical comparisons of machine learning algorithms (classifiers) for radiotherapy treatment outcome prediction, - a technique to embed simulation models in machine learning algorithms, - the concept and results of distributed learning studies to train prediction models on multiple (disjoint) databases of oncology institutes spread across Europe/Asia.

Date: 9 April 2019

Speaker: Jasmijn Baaijens
Title: De novo approaches to viral quasispecies assembly

A virus infection usually consists of a group of closely related virus strains, together referred to as viral quasispecies. This diversity is the result of extremely high mutation rates, which allows a viral population to rapidly adapt to its environment and evolve resistance to antiviral drugs. It is therefore of great importance to identify each of the individual genomes (haplotypes). However, this is a challenging task because of sequencing errors and low strain abundance rates, especially in the absence of high-quality reference sequences. In this talk, I will present an overview of the methods I've developed during my PhD research for de novo viral quasispecies assembly. And as this will be my last time presenting in the LSH seminar, there will be cake afterwards!

Date: 26 March 2019

Speaker: Luca Denti
Title: MALVA: genotyping by Mapping-free ALlele detection of known VAriants

The amount of genetic variation discovered and characterized in human populations is huge, and is growing rapidly with the widespread availability of modern sequencing technologies. Such a great deal of variation data, that accounts for human diversity, leads to various challenging computational tasks, including variant calling and genotyping of newly sequenced individuals. The standard pipelines for addressing these problems include read alignment, which is a computationally expensive procedure. A few mapping-free approaches were proposed in recent years to speed up the genotyping process. While such tools are very fast, they focus on isolated, bi-allelic SNPs, providing limited support for multi-allelic SNPs, indels, and genomic regions with high variant density. To address these issues, we introduce MALVA, a fast and lightweight mapping-free method to genotype an individual directly from a sample of reads. MALVA is the first mapping-free tool that is able to genotype multi-allelic SNPs and indels, even in high density genomic regions, and to effectively handle a huge number of variants such as those provided by the 1000 Genomes Project. An experimental evaluation on whole-genome data shows that MALVA requires one order of magnitude less time to genotype a donor than alignment-based pipelines, providing similar accuracy. Remarkably, on indels, MALVA provides even better results than the most widely adopted variant discovery tools.

Date: 12 March 2019

Speaker: Marleen Balvert
Title: Metabolic Network Analysis and Phylogeny

In this lecture I will sketch two subfields of my research in life sciences, that have otherwise little overlap. First I will show how optimization and enumeration play a role in metabolic network analysis and that mathematics can help solving these problems. In particular I will consider so-called flux balance analysis. This is research that has been done and is going on in collaboration with Marie-France Sagot and Arne Reimers. Then I will give you an example of my research in phylogeny. In particular I will show you that reticulation events like horizontal gene transfer (especially in lower order organisms) require us to abolish the generically used pedigree tree model, but instead settle for a phylogenetic network. I will give some examples of results that we obtained within this model. This is research in collaboration with Leo van Iersel and Steven Kelk.

Date: Tuesday 26 February 2019

Speaker: Leen Stougie
Title: Metabolic Network Analysis and Phylogeny

In this lecture I will sketch two subfields of my research in life sciences, that have otherwise little overlap. First I will show how optimization and enumeration play a role in metabolic network analysis and that mathematics can help solving these problems. In particular I will consider so-called flux balance analysis. This is research that has been done and is going on in collaboration with Marie-France Sagot and Arne Reimers. Then I will give you an example of my research in phylogeny. In particular I will show you that reticulation events like horizontal gene transfer (especially in lower order organisms) require us to abolish the generically used pedigree tree model, but instead settle for a phylogenetic network. I will give some examples of results that we obtained within this model. This is research in collaboration with Leo van Iersel and Steven Kelk.

Date: Tuesday 12 February 2019

Speaker: Anton Bouter
Title: Latest Advancements in Bi-Objective Treatment Planning for HDR Brachytherapy

In our bi-objective treatment planning method for HDR brachytherapy, we use a model-based multi-objective evolutionary algorithm to find a large set of treatment plans with different trade-offs between the two optimization objectives: coverage of the tumor and sparing of the organs at risk. I will present some of the latest advancements and some work in progress for this treatment planning method. This includes adaptations to the optimization model to be robust against varying clinical protocols, the application of a Graphics Processing Unit (GPU) to substantially reduce optimization time through large-scale parallelization, and some of the steps required to introduce bi-objective treatment planning in clinical practice.

Date: Tuesday 29 January 2019

Speaker: Marjolein van der Meer
Title: HDR prostate brachytherapy: dwell time and catheter position optimization

Abstract: Although automatic dwell time optimization is common practice in HDR prostate brachytherapy (BT) treatment planning, this is less the case for automatic catheter position optimization for pre-planning. Recently, a bi-objective optimization model has been introduced, to automatically optimize dwell times and catheter positions simultaneously, creating a set of plans with different trade-offs between target coverage and organ sparing. The model can be optimized with the Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA), but this requires too large running times on an ordinary Central Processing Unit (CPU). In this work, we parallelize this optimization on a modern Graphics Processing Unit (GPU), and improve on the optimization model to ensure clinically feasible catheter positions.

Date: Tuesday 22 January 2019

Speaker: Nadia Pisanti
Title: Mapping Reads and Palindromic Decomposition on Pan-Genomes

Abstract: An elastic-degenerate string (ED-string) has been introduced to compactly represent a multiple alignment of several closely-related sequences (a pan-genome). In this representation, substrings of these sequences that match exactly are collapsed, while in positions where the sequences differ, all possible variants observed at that location are listed. The natural problem that arises is finding all matches of a deterministic pattern in an elastic-degenerate text. We introduce an algorithm to solve this problem on-line after a pre-processing stage. Moreover, we study the same problem under the edit- and Hamming-distance model. Finally, we describe a linear time algorithm for the comparison of non elastic D-strings (a special case of ED-strings) and a consequent efficient decomposition of D-strings into palindromes.

Date: Tuesday 4 December 2018

Speaker: Mick van Dijk
Title: Classifying C. albicans species by using mixed integer optimization based optimal classification trees

Abstract: World-wide medical use of the antifungal azole has led to an enormous increase in azole-resistant C. albicans species, which is most commonly associated with fungal infections. A possible reason for resistance is point mutations on the ERG11 gene. To provide patient-specific medication it would be beneficial to be able to, in silico, classify the fungal isolates as being resistant or susceptible and identify the locations of the responsible point mutations. The aim of this thesis is to apply and compare several classification algorithms, in particular decision tree algorithms. Bertsimas and Dunn recently introduced a novel formulation based on Mixed Integer Optimization to generate optimal classification trees. We have implemented this method and applied it to the C. albicans data set to construct univariate and multivariate classification trees. Moreover, by adding extra constraints and variables to the original formulation we are able to model extensions such as minimizing false negative misclassifications and non-binary classification trees.

Date: Tuesday 6 November 2018

Speaker: Solon P. Pissis
Title: Elastic-degenerate strings: a new representation for searching a collection of similar texts

Abstract: An elastic-degenerate string is a sequence of sets of strings. It has been introduced to represent multiple sequence alignments of closely-related sequences in a compact form. In this talk we will review some recent results on single and multiple pattern matching on this representation. We will then go on to present a real-world application of these results on searching pan-genomes, the full complement of genes in a clade.
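
The definition above (a sequence of sets of strings) can be made concrete with a naive matcher. The sketch below is not one of the algorithms reviewed in the talk: it is a straightforward scan that assumes a non-empty pattern, represents the elastic-degenerate text as a list of lists of strings (an empty string stands for a deletion variant), and reports the indices of the segments in which an occurrence ends. The function name and the return convention are illustrative assumptions.

```python
def ed_match(segments, pattern):
    """Indices of segments in which an occurrence of `pattern` ends.

    `segments` is a list of lists of strings (the elastic-degenerate text);
    `pattern` is a non-empty plain string. Naive illustrative sketch.
    """
    m = len(pattern)
    active = set()  # prefix lengths matched up to the previous segment boundary
    ends = []
    for idx, segment in enumerate(segments):
        next_active = set()
        found = False
        for s in segment:
            # Extend partial matches carried over from earlier segments.
            for j in active:
                need = m - j
                if len(s) >= need:
                    if s[:need] == pattern[j:]:
                        found = True
                elif pattern[j:j + len(s)] == s:
                    next_active.add(j + len(s))
            # Start new matches at every position inside s.
            for p in range(len(s)):
                tail = s[p:]
                if len(tail) >= m:
                    if tail[:m] == pattern:
                        found = True
                elif pattern.startswith(tail):
                    next_active.add(len(tail))
        if found:
            ends.append(idx)
        active = next_active
    return ends
```

For the text [["A"], ["C", "G"], ["T"]], the pattern "AGT" is found ending in segment 2 by choosing the variant "G", while "AT" only matches if the middle segment also offers the empty string.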

Date: Thursday 18 October, 14h00

Speaker: Kalyanmoy Deb, Koenig Endowed Chair Professor
Michigan State University, East Lansing, USA
Title: Breaking the Billion-Variable Barrier Using Customized Evolutionary Optimization

Abstract: Optimization methods and practices have been around for more than 50 years, but they are still criticized for their "curse of dimensionality". In this talk, we shall look at a specific large-dimensional integer-valued resource allocation problem class from practice and review the performance of well-known software packages, such as IBM's CPLEX, on the problem. Thereafter, we shall present a population-based heuristic search algorithm that has the ability to recombine short-sized building blocks, despite having overlapping variable linkage, to form larger-sized building blocks. The process is eventually able to solve a billion-variable version of the problem to near-optimality in polynomial computational time, making the application one of the largest optimization problems ever solved.

Bio-sketch of the Speaker: Kalyanmoy Deb is Koenig Endowed Chair Professor at the Department of Electrical and Computer Engineering at Michigan State University, USA. Prof. Deb's research interests are in evolutionary optimization and its application in multi-criterion optimization, modeling, and machine learning. He has worked at various universities across the world including IITs in India, University of Dortmund and Karlsruhe Institute of Technology in Germany, Aalto University in Finland, University of Skovde in Sweden, and Nanyang Technological University in Singapore. He was awarded the Infosys Prize, TWAS Prize in Engineering Sciences, CajAstur Mamdani Prize, Distinguished Alumni Award from IIT Kharagpur, Edgeworth-Pareto award, Bhatnagar Prize in Engineering Sciences, and Bessel Research award from Germany. He has been awarded IEEE CIS's "EC Pioneer Award". He is a fellow of IEEE, ASME, and three Indian science and engineering academies. He has published over 490 research papers, with over 117,000 Google Scholar citations and an h-index of 110. He is on the editorial board of 18 major international journals. More information about his research contribution can be found from

Date: Tuesday 9 October, 16h00

Speaker: Alejandro Lopez Rincon
Title: Human Motion as a Complex System

Abstract: Movement of the human body is the result of complex processes involving several subsystems. Motion execution is generated by electrical impulses in the motor cortex, which propagate through the nervous system until reaching the alpha neurons associated with the specific activation of a particular group of muscle fibers. Several studies have shown that post-stroke patients develop higher activity in the sensorimotor areas of the affected hemisphere of the brain compared to healthy people during motor tasks. A proper understanding of the activity in the brains of post-stroke patients will help us develop mathematical models that clarify the underlying mechanisms associated with movement. This research describes an anatomically based brain computer model of movement impairment in stroke patients, providing an understanding of the mechanisms of neuromuscular complications. The overall system is composed of abstractions of the following three subsystems: the brain, the skeletal muscle, and a cable equation that connects both systems, modeled using a bidomain approach and simulated with the finite element method. Two scenarios were simulated, a healthy subject and a post-stroke patient with motion impairment, to create activity maps and compare them with the measurements.

Date: Monday 23 September, 16h00

Speaker: Andre Dekker (MAASTRO clinic, Maastricht)
Title: From Big Data to Better Cancer Care – FAIR, Linked Data & Personal Health Train

Abstract: Big data, artificial intelligence, machine learning and data science are expected to have a major impact on day-to-day cancer practice. Big data based services such as automated image segmentation, radiomics, decision support systems and literature mining are products already available to the cancer community and these are expected to rapidly change the way we practice medicine. Since 2008 Maastricht University and MAASTRO Clinic have developed a research program on this topic. A global IT infrastructure has been developed in which cancer centers are being connected with currently up to 25 partners. The aim is to enable cross-institute, privacy-preserving data sharing and machine learning and more efficient clinical evidence generation: a concept now commonly referred to as "Rapid Learning". In the seminar, innovative technology to extract, store and process (big) data for Rapid Learning will be discussed. All this data is often seen as tremendously promising and is predicted to change health care radically, but at this point in time it is mostly a challenge, as we keep accumulating data without a clear path to clinical applications while privacy concerns are on the rise. Methods and examples of how we go from data to making a difference in the lives of cancer patients will be presented, as will methods to do this in a way that preserves the privacy of patients, such as the Personal Health Train and distributed learning.

Date: Tuesday 11 September, 16h00
Location: CWI, room L016

Speaker: Ziyuan Wang
Title: Automatic radiotherapy plan emulation for 3D dose reconstruction to enable big data analysis for historically treated patients

Abstract: 3D Dose Reconstruction (DR) for radiotherapy (RT) is the estimation of the 3D radiation dose distribution patients received during RT. Big DR data is needed to accurately model the relationship between the dose and onset of adverse effects, to ultimately gain insights and improve today’s treatments. DR is often performed by emulating the original RT plan on a surrogate anatomy for dose estimation. This is especially essential for historically treated patients with long-term follow-up, as solely 2D radiographs were used for RT planning, and no 3D imaging was acquired for these patients. Performing DRs for a large group of patients requires a large amount of manual work, where the geometry of the original RT plan is emulated on the surrogate anatomy by visually comparing the latter with the original 2D radiograph of the patient. This is a labor-intensive process that for practical use needs to be automated. This work presents an image-processing pipeline to automatically emulate plans on surrogate CTs. The pipeline was designed for pediatric cancer survivors that historically received abdominal RT with anterior-to-posterior and posterior-to-anterior RT field set-up. First, anatomical landmarks are automatically identified on 2D radiographs. Next, these landmarks are used to derive the parameters needed to finally emulate the plan on a surrogate CT. Validation was performed by an experienced RT planner, blindly visually assessing 12 cases of automatic and manual plan emulations. Automatic emulations were approved 11 out of 12 times. This work paves the way to effortless scaling of DR data generation.

Date: Tuesday 3 July, 16h00
Location: CWI, room L016

Speaker: Alexander Schönhuth
Title: Efficient identification of genetic variants in single cells

Abstract: Only the analysis of the DNA or RNA of single cells, and not just the DNA or RNA of larger samples (bulks), allows us to understand nature at its finest resolution. Recently introduced DNA/RNA sequencing technology has led to breakthroughs in understanding stem cell development, cancer formation and progression and immune cell differentiation that were hardly conceivable before.
The amount of DNA carried by a single cell is tiny, however, which requires an experimental amplification step at the beginning of the analysis. This step introduces considerable statistical biases, which in turn introduce non-negligible data uncertainties. These uncertainties pose tough computational challenges when aiming to identify the genetic variants inherent to single cells.
Here, we present an approach that efficiently quantifies these uncertainties, and thereby overcomes these challenges. Key to success is a statistical model that captures the conditional independencies among (both hidden and observed) variables that affect the variant identification process, and thereby points out an efficient computation scheme. Using the resulting efficient computation scheme allows for drastic improvements in comparison with other single cell variant discovery tools.

Date: Tuesday 19 June, 16h00
Location: CWI, room L016

Speaker: Marjolein van der Meer
Title: Better and Faster Catheter Position Optimization in HDR Brachytherapy for Prostate Cancer using Multi-Objective Real-Valued GOMEA

Abstract: The recently-introduced Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) family has been shown to be capable of excellent performance on academic benchmark problems, especially when efficient partial evaluations are possible. This holds true also for the latest extension, the Multi-Objective Real-Valued GOMEA (MO-RV-GOMEA). We apply MO-RV-GOMEA to the real-world multi-objective optimization problem of catheter placement in High-Dose-Rate (HDR) brachytherapy for prostate cancer. Due to the underlying geometric structure of the variables, partial evaluations can be performed, allowing MO-RV-GOMEA to exploit this structure. The performance of MO-RV-GOMEA is tested on three real-world patient cases and is shown to be superior to a state-of-the-art mixed-integer evolutionary algorithm which was recently applied to this problem. Moreover, new insights on the objectives used for prostate brachytherapy treatment planning are obtained.

Date: Tuesday 5 June, 16h00
Location: CWI, room L016

Speaker: Marco Virgolin
Title: Symbolic Regression and Feature Construction with GP-GOMEA applied to Radiotherapy Dose Reconstruction of Childhood Cancer Survivors

Abstract: The recently introduced Gene-pool Optimal Mixing Evolutionary Algorithm for Genetic Programming (GP-GOMEA) has been shown to find much smaller solutions of equally high quality compared to other state-of-the-art GP approaches. This is an interesting aspect as small solutions better enable human interpretation. In this paper, an adaptation of GP-GOMEA to tackle real-world symbolic regression is proposed, in order to find small yet accurate mathematical expressions, and with an application to a problem of clinical interest. For radiotherapy dose reconstruction, a model is sought that captures anatomical patient similarity. This problem is particularly interesting because while features are patient-specific, the variable to regress is a distance, and is defined over patient pairs. We show that on benchmark problems as well as on the application, GP-GOMEA outperforms variants of standard GP. To find even more accurate models, we further consider an evolutionary meta learning approach, where GP-GOMEA is used to construct small, yet effective features for a different machine learning algorithm. Experimental results show how this approach significantly improves the performance of linear regression, support vector machines, and random forest, while providing meaningful and interpretable features.

Date: Tuesday 29 May, 16h00
Location: CWI, room L016

Speaker: Jakub Tomczak (UvA)
Title: Deep generative modeling using Variational Auto-Encoders

Abstract: Learning generative models that are capable of capturing rich distributions from vast amounts of data like image collections remains one of the major challenges of artificial intelligence. In recent years, different approaches to achieve this goal were proposed by formulating alternative training objectives to the log-likelihood like the adversarial loss or by utilizing variational inference. The latter approach could be made especially efficient through the application of the reparameterization trick, resulting in a highly scalable framework now known as the variational auto-encoder (VAE). VAEs are scalable and powerful generative models that can be easily utilized in any probabilistic framework. The tractability and the flexibility of the VAE follow from the choice of the variational posterior (the encoder), the prior over latent variables and the decoder. In this presentation I will outline different ways of improving the VAE. Moreover, I will discuss current applications and possible future directions.
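As a rough illustration of the reparameterization trick mentioned in the abstract, the sketch below draws a latent sample as a deterministic function of the encoder outputs and independent noise. It assumes a diagonal Gaussian posterior parameterized by a mean and a log-variance; the function name `sample_latent` and the example values are hypothetical and not taken from the talk.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) as mu + sigma * eps (assumed diagonal Gaussian)."""
    z = []
    for m, lv in zip(mu, log_var):
        eps = rng.gauss(0.0, 1.0)    # noise that does not depend on mu or log_var
        sigma = math.exp(0.5 * lv)   # standard deviation recovered from the log-variance
        z.append(m + sigma * eps)    # the sample is a smooth function of mu and sigma
    return z

# Illustrative values only: a two-dimensional latent with std 1.0 and 0.5.
rng = random.Random(0)
samples = [sample_latent([1.0, -2.0], [0.0, math.log(0.25)], rng) for _ in range(20000)]
mean0 = sum(s[0] for s in samples) / len(samples)
mean1 = sum(s[1] for s in samples) / len(samples)
```

Because the randomness enters only through `eps`, gradients with respect to `mu` and `log_var` can pass through the sampling step when this is written in an automatic differentiation framework.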

Date: Tuesday 22 May, 16h00
Location: CWI, room M290

Speaker: Anton Bouter
Title: Large-Scale Parallelization of Partial Evaluations in Evolutionary Algorithms for Real-World Problems

Abstract: The importance and potential of Gray-Box Optimization (GBO) with evolutionary algorithms is becoming increasingly clear lately, both for benchmark and real-world problems. We show that the efficiency of GBO can be greatly improved through large-scale parallelism, exploiting the fact that each evaluation function requires the calculation of a number of independent sub-functions. This is especially interesting for real-world problems where often the majority of the computational effort is spent on the evaluation function. Moreover, we show how the best parallelization technique largely depends on factors including the number of sub-functions and their required computation time, revealing that for different parts of the optimization the best parallelization technique should be selected based on these factors. As an illustration, we show how large-scale parallelization can be applied to optimization of high-dose-rate brachytherapy treatment plans for prostate cancer.
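The idea of an evaluation function built from independent sub-functions can be sketched as follows. This is a minimal, sequential illustration under assumed names (`full_evaluation`, `partial_evaluation`, and the toy sub-functions are hypothetical); it shows only that a change to a few variables requires recomputing just the affected terms, and it does not reproduce the parallelization techniques of the talk.

```python
def full_evaluation(x, subfunctions):
    """Sum all sub-functions; each entry is (variable indices, function of those variables)."""
    return sum(fn(*(x[i] for i in idx)) for idx, fn in subfunctions)

def partial_evaluation(x_old, f_old, changes, subfunctions):
    """Update the fitness after changing some variables, recomputing only affected terms."""
    x_new = list(x_old)
    for i, value in changes.items():
        x_new[i] = value
    f_new = f_old
    for idx, fn in subfunctions:
        if any(i in changes for i in idx):
            f_new -= fn(*(x_old[i] for i in idx))  # remove the stale contribution
            f_new += fn(*(x_new[i] for i in idx))  # add the updated contribution
    return x_new, f_new

# Toy problem (assumption): squared differences between neighbouring variables.
subfunctions = [((i, i + 1), lambda a, b: (a - b) ** 2) for i in range(4)]
x = [0, 1, 3, 6, 10]
f = full_evaluation(x, subfunctions)
x2, f2 = partial_evaluation(x, f, {2: 2}, subfunctions)
```

Since the terms do not depend on each other, each one could in principle be computed by a separate worker, which is the property the abstract exploits.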

Date: Tuesday 8 May, 16h00
Location: CWI, room M290

Speaker: Hoang Luong (LSH)
Title: Improving the Performance of MO-RV-GOMEA on Problems with Many Objectives using Tchebycheff Scalarizations

Abstract: The Multi-Objective Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (MO-RV-GOMEA) has been shown to exhibit excellent performance in solving various bi-objective benchmark and real-world problems. We assess the competence of MO-RV-GOMEA in tackling many-objective problems, which are normally defined as problems with at least four conflicting objectives. Most Pareto dominance-based Multi-Objective Evolutionary Algorithms (MOEAs) typically diminish in performance if the number of objectives is more than three because selection pressure toward the Pareto-optimal front is lost. This is potentially less of an issue for MO-RV-GOMEA because its variation operator creates each offspring solution by iteratively altering a currently existing solution in a few decision variables each time, and changes are only accepted if they result in a Pareto improvement. For most MOEAs, integrating scalarization methods is potentially beneficial in the many-objective context. Here, we investigate the possibility of improving the performance of MO-RV-GOMEA by further guiding improvement checks during solution variation in MO-RV-GOMEA with carefully constructed Tchebycheff scalarizations. Results obtained from experiments performed on a selection of well-known problems from the DTLZ and WFG test suites show that MO-RV-GOMEA is by design already well-suited for many-objective problems. Moreover, by enhancing it with Tchebycheff scalarizations, it outperforms MOEA/D-2TCHMFI, a state-of-the-art decomposition-based MOEA.
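For readers unfamiliar with the term, a weighted Tchebycheff scalarization reduces a vector of objective values to a single number: the largest weighted distance to a reference (ideal) point. The sketch below shows this generic form for minimization; the function name and the example values are assumptions, and it does not show how the scalarizations are constructed or used inside MO-RV-GOMEA.

```python
def tchebycheff(objectives, weights, ideal):
    """Weighted Tchebycheff value: max_i w_i * |f_i - z_i| (smaller is better)."""
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, ideal))

# Two illustrative bi-objective solutions and an assumed ideal point at the origin.
a = [1.0, 4.0]
b = [2.0, 2.0]
ideal = [0.0, 0.0]

balanced = (tchebycheff(a, [0.5, 0.5], ideal), tchebycheff(b, [0.5, 0.5], ideal))
skewed = (tchebycheff(a, [0.9, 0.1], ideal), tchebycheff(b, [0.9, 0.1], ideal))
```

With equal weights the balanced solution `b` scores better, while weights that emphasize the first objective favour `a`; different weight vectors therefore steer the search toward different parts of the Pareto front.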

Date: 10 April 2018, 16h15
Location: CWI, room L016

Speaker: Stef Maree (LSH)
Title: Solving multi-modal optimization problems using Estimation of Distribution Algorithms

Abstract: Estimation of Distribution Algorithms (EDAs) are heuristic optimization algorithms that try to iteratively find better solutions to a given (black-box) problem. Each iteration, new candidate solutions are sampled from a probability distribution. EDAs equipped with a Gaussian distribution have been shown to be successful in real-valued optimization. However, performance often deteriorates when the problem at hand is multi-modal, as multiple modes in the fitness landscape have to be modelled with a unimodal Gaussian. In this presentation, we focus on models that can adapt to the multi-modality of the fitness landscape. Specifically, we discuss Hill-Valley Clustering, a remarkably simple approach to adaptively cluster the search space into niches, such that a single mode resides in each niche. In each of the located niches, an EDA is initialized to optimize that niche. Combined with an EDA and a restart scheme, the resulting Hill-Valley Evolutionary Algorithm (HillVallEA) is, despite its remarkable simplicity, competitive with state-of-the-art algorithms and shows superior performance in the long run.
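The core test behind hill-valley style clustering can be sketched in a few lines: two solutions are taken to share a niche if no point sampled on the line segment between them is worse than both. The version below assumes minimization and a fixed number of test points; the function name `same_niche` and the example function are hypothetical, and the clustering procedure built on top of this test in the talk is not reproduced here.

```python
def same_niche(f, x, y, num_test_points=5):
    """Return True if no sampled point between x and y is worse than both (minimization)."""
    worst_endpoint = max(f(x), f(y))
    for k in range(1, num_test_points + 1):
        t = k / (num_test_points + 1)
        point = [a + t * (b - a) for a, b in zip(x, y)]
        if f(point) > worst_endpoint:
            return False  # a hill lies between x and y, so they belong to different niches
    return True

# Illustrative one-dimensional function with two minima (at -1 and +1) and a hill at 0.
def double_well(p):
    return (p[0] ** 2 - 1.0) ** 2

near_same_minimum = same_niche(double_well, [-1.2], [-0.8])
across_the_hill = same_niche(double_well, [-1.0], [1.0])
```

In this example the first pair lies in the same basin and the second pair is separated by the hill at zero, so the test reports them as belonging to one niche and two niches respectively.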

Date: 27 March 2018, 15h00
Location: CWI, room L016

Speaker: Kalia Orphanou (LSH)
Title: Learning Bayesian Network Structures with GOMEA

Abstract: Bayesian networks (BNs) are probabilistic graphical models which are widely used for knowledge representation and decision making tasks, especially in the presence of uncertainty. Finding or learning the structure of BNs from data is an NP-hard problem. Evolutionary algorithms (EAs) have been extensively used to automate the learning process. In this presentation, I will talk about the Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA), a recently introduced EA, applied to the specific case of learning BNs. The performance of GOMEA has also been compared against recently published results on commonly used datasets of varying size. The results show that GOMEA outperforms standard algorithms for learning the BN structure as well as other EAs, such as Genetic Algorithms (GAs) and Estimation of Distribution Algorithms (EDAs), even when efficient local search techniques are added.

Date: 13 March 2018, 15h00
Location: CWI, room L016

Speaker: Jasmijn Baaijens (LSH)
Title: De novo viral quasispecies assembly: challenges and solutions

Abstract: A viral quasispecies, the ensemble of viral strains populating an infected person, can be highly diverse. For optimal assessment of virulence, pathogenesis and therapy selection, determining the haplotypes of the individual strains can play a key role. This task is known as viral quasispecies assembly. As many viruses are subject to high mutation and recombination rates, high-quality reference genomes are often not available at the time of a new disease outbreak. Reference-free ('de novo') approaches therefore have clear benefits.
In this talk I will discuss the challenges that arise in de novo viral quasispecies assembly, along with several ideas to tackle these challenges. Many different techniques will be addressed, from FM-indexes to variation graphs and from maximal clique enumeration to linear programming. These techniques form the building blocks for our viral quasispecies assembly tools, SAVAGE and Virus-VG, together enabling full-length reconstruction of viral haplotypes without the use of any reference genome or prior information.

Date: Tuesday 27 February 2018, 15h00
Location: CWI, room L016

Speaker: Prof. dr. Gusz Eiben (VU)
Title: Evolving embodied intelligence in robotic populations

Abstract: Evolutionary robotics (ER) is the art of employing evolution to develop the brains, the bodies, or both for autonomous robots. In this talk I explain the benefits of the Evolution of Things for engineering as well as for fundamental scientific studies. I argue that constructing systems of self-reproducing machines will lead to a new exciting mix of evolutionary computing, robotics, and artificial life with new challenges and opportunities. In particular, I outline the concept of EvoSphere, a robotic ecosystem that evolves in real space and real time and review on-going activities following up the “Robot Baby Project”, our first proof-of-concept implementation. I hope to inspire a discussion about the perspectives of using this technology for studies into evolution, the emergence of intelligence, and the interplay between the body and the mind.

- A.E. Eiben and J. Smith, From evolutionary computation to the evolution of things, *Nature*, 521:476-482, doi:10.1038/nature14544, 2015.
- M.J. Jelisavcic, M. De Carlo, E. Hupkes, P. Eustratiadis, J. Orlowski, E. Haasdijk, J. Auerbach, and A.E. Eiben, Real-World Evolution of Robot Morphologies: A Proof of Concept, *Artificial Life*, 23(2):206-235, 2017.

Date: Tuesday 13 February 2018, 16h00
Location: CWI, room L016

Speaker: Peter Bosman
Title: A Biography of the GOMEA Family

Abstract: In 2011, together with Dirk Thierens, I put forth the Gene-pool Optimal Mixing Evolutionary Algorithm. Since then, many improvements and adaptations have been made, mostly by current members of the Life Sciences and Health group. GOMEA has even inspired other researchers across the world, resulting in algorithms strongly related to GOMEA, such as P3 and DSMGA-II. Today, GOMEA therefore is no longer a single algorithm, but truly a family of algorithms that is going strong. GOMEA has come to redefine the limits of what can be achieved with evolutionary algorithms for various types of problem variables, and is even being used by our group to improve real-world clinical practice, e.g., by optimizing brachytherapy treatment plans at AMC. The results and the future of GOMEA are great and inspiring, but where did GOMEA come from, why does it work so well, and why is it designed the way it is? To answer these questions, in this talk, I will present the backstory of GOMEA and show how its modern-day design traces back to simple genetic algorithms, evolution strategies, and genetic programming, along a path of probabilistic modelling and machine learning techniques that have helped a popular search paradigm that was once blind to find the light.

Date: Tuesday 16 January 2018, 16h00
Location: CWI, room L016

Speaker: Marleen Balvert
Title: Bioinformatics: decoding the information in biological systems

Abstract: The genetic code can be considered as a recipe book for a living organism: it indirectly describes all the properties and functions of the organism such as its physical form and its metabolic processes. Over the past decades, reading the genetic code of an individual has become easier and cheaper. This yields large amounts of data on the DNA of viruses, bacteria, plants, humans, etcetera. Analyzing these databases results in an increasing understanding of the relation between genetic variation and biological processes. For example, analyzing the DNA of disease-causing viruses may help identify those that are resistant to medication, and analyzing human DNA can aid in identifying the causes of familial diseases. The field of bioinformatics is concerned with the development of software tools and algorithms for the collection, management and analysis of genetic data.

Date: Monday December 19th, at 13h00
Location: CWI, room L016

Speaker: Marc van Dijk
Title: Will it bind? Using computational simulation and modeling techniques to predict how small molecular compounds (drugs) bind to their target protein

Abstract: Powerful in-silico methods able to predict (un)wanted binding to target and off-target proteins during early-stage drug development can help to prevent failures and waste of resources during later stages. An illustrative example is the prediction of the interaction between drug candidates and members of the family of Cytochrome P450 (CYP) enzymes that play a central role in drug metabolism. Binding affinity or binding free energy prediction for these proteins is challenging because of the dynamic nature of the protein and the possible variety of ligand-binding orientations. Recently we introduced an automated workflow to account for such protein plasticity effects, while keeping computational costs tractable. Our workflow is based on Linear Interaction Energy (LIE) theory and relies on a docking stage to sample multiple ligand binding conformations, followed by short molecular dynamics simulations on the protein-ligand complexes to obtain values for binding free energy. A single, overall value for the binding free energy is obtained using a statistical weighting-based scheme. LIE is an empirical method that requires calibration of model parameters for the system under study. A carefully calibrated model with an associated applicability domain is essential in obtaining accurate predictions. We utilize a semi-unsupervised learning method using Bayesian statistics and distribution analysis to identify and learn potentially multiple models, followed by a series of applicability domain analyses to explain the obtained models in terms of protein-ligand interactions and ligand characteristics. We successfully applied these methods to train predictive models for CYP19A1 using a typical diverse ligand dataset obtained from an industrial high-throughput study.

Date: Monday July 31st, starting at 15h00
Location: CWI, room L016

Speaker: Bastiaan van der Roest
Title: Using variation graphs to find virus strains in sequenced reads

Abstract: Due to high mutation rates, necessary for viruses to adapt to changing environments, a viral infection is most of the time caused by a number of strains of a specific virus species. To fight such an infection properly, all present strains must be known. All the different mutated strains can be brought together in a mutant distribution: the viral quasispecies. The viral quasispecies can be represented as a variation graph. This is a directed acyclic graph, where each strain is represented by a path in the graph. We developed an algorithm to build variation graphs. To do this, the algorithm performs a partial order alignment on assemblies of reads to get an initial graph, which is then compressed. Moreover, using the compressed variation graph, the algorithm predicts the sequences and frequencies of the present strains from the assembled reads.

Date: Wednesday July 19th, starting at 16h00
Location: CWI, room L016

Speaker: Jasmijn Baaijens
Title: De novo viral quasispecies assembly using overlap graphs

Abstract: A virus infection usually consists of a group of closely related virus strains, together referred to as viral quasispecies. This diversity is the result of extremely high mutation rates, which allow a viral population to rapidly adapt to its environment and evolve resistance to antiviral drugs. It is therefore of great importance to identify each of the individual genomes (haplotypes). However, this is a challenging task because of sequencing errors and low strain abundance rates, especially in the absence of high-quality reference sequences. We present SAVAGE, a computational tool for reconstructing individual haplotypes of intra-host virus strains without the need for a reference genome. SAVAGE makes use of overlap graphs, where nodes represent sequencing reads, while edges reflect that two reads, based on statistical considerations, originate from the same virus strain. First, we use maximal cliques in the overlap graph to correct errors in the input sequences. Then, we apply an iterative scheme which extends the corrected sequences until individual haplotypes are found. In benchmark experiments on both simulated and real datasets, SAVAGE outperforms state-of-the-art tools and we are able to reconstruct individual strains in Zika virus and hepatitis C virus patient samples.

Date: Friday July 7th, starting at 13h00
Location: CWI, room L016

Speaker: Eli Zamir
Title: Resolving the building blocks and assembly principles of cell-matrix adhesion sites

Abstract: Cell adhesion is a highly complex process, since it underlies the self-organization of cells in multi-cellular organisms. Cell-matrix adhesion is mediated by dynamic sites along the plasma membrane that anchor the actin cytoskeleton to the extracellular matrix. More than a hundred different proteins, collectively termed the integrin adhesome, are localized in cell-matrix adhesion sites. We study how the integrin adhesome gets self-organized locally, rapidly and correctly as diverse adhesion sites. To address this question, we monitor the integrin adhesome network in individual adhesion sites and in the cytosol. Using fluorescence cross-correlation spectroscopy (FCCS), fluorescence recovery after photobleaching (FRAP) and fluorescence lifetime imaging microscopy (FLIM), we revealed that the integrin adhesome is extensively pre-assembled already in the cytosol, forming multi-protein building blocks for adhesion sites. These building blocks are combinatorially diversified, confined in their size and correlate with the structural and functional organization of proteins across focal adhesions. We also found that stationary focal adhesions symmetrically release the same types of protein complexes that they recruit, thereby keeping the cytosolic pool of their building blocks spatiotemporally uniform. Based on these results we arrived at a model in which multi-protein building blocks enable rapid and modular self-assembly of adhesion sites, while symmetric exchange of these building blocks preserves their specifications and thus the assembly logic of the system. In a broader sense, these results emphasize the need to study how interdependencies between protein interactions shape the repertoire of protein complexes in the cytosol and how this shaping facilitates cellular processes.

Date: Thursday June 29th, starting at 13h00
Location: CWI, room M290

Speaker: Kees Storm (TU/e)
Title: Cell adhesion and motility: The physics of mechanosensing and mechanoresponse.

Abstract: Cells are acutely aware of the mechanical properties of their environment, and base some of the most important decisions on these properties. What basic physical mechanism allows cells to translate external mechanical information into the biochemical language they understand? A family of proteins called the integrins, which connect the cell to the outside world, was recently demonstrated to possess some curious physical properties that may provide mechanosensory functionality.
These integrins form so-called catch bonds: cellular receptor-ligand pairs whose lifetime, counterintuitively, increases with increasing load. While their existence was initially pure theoretical speculation, recent years have seen several experimental demonstrations of catch behavior in biologically relevant protein-protein bonds. I discuss the implications of single catch-bond characteristics for the behavior of a load-sharing cluster of such bonds: these are shown to possess a regime of strengthening with increasing applied force. I will discuss load distributions in focal adhesions containing mixtures of slip and catch bonds and, time permitting, the implications of stiffness-dependent integrin binding for the persistence of durotactic motility, which in recent experiments was shown to correlate with substrate stiffness.

Date: Friday June 23rd, starting at 13h00
Location: CWI, room L016

Speaker: Eva Deinum, Wageningen University & Research
Title: How selective severing by katanin promotes order in the plant cortical microtubule array

Abstract: During interphase in plant cells, microtubules are attached to the plasma membrane where they play a crucial role in plant growth and development by guiding the anisotropic deposition of wall material. A large number of experimental studies show that the microtubule severing enzyme katanin is important for the self-organisation of the cortical microtubule array, but the underlying mechanism is not understood. In fact, our current understanding of self organisation of cortical microtubules would predict that microtubule severing would interfere with alignment for the following reason: Interactions between microtubules drive self organisation, and severing reduces the average microtubule length and life time, thereby reducing the number of interactions per microtubule. Using computer simulations and theoretical considerations, we are now able to resolve this paradox. Our results provide mechanistic insight into how microtubule severing can be modulated to drive microtubule alignment and reveal an unexpected additional requirement of the mechanism.

Date: Monday March 27th, starting at 15h00
Location: CWI, room L016

Speaker: Marco Virgolin
Title: Exploiting Machine Learning to Understand Similarity among Children: A First Step Towards Highly Individualized 3D Dose Reconstruction.

Abstract: External beam radiation is a powerful therapy used to exterminate cancer cells which can't be reached by surgery or treated with chemotherapy. Although often needed to ensure patient survival, radiotherapy (RT) can bring a number of Adverse Effects (AEs) due to its toxic effect on healthy tissues. Aiming at reducing AEs, a number of studies look at the relationship between the detailed, 3D radiation dose absorbed by specific subvolumes of organs at risk and the onset of particular AEs. Unfortunately, such studies can't be carried out with respect to late-occurring AEs, which are now observed in patients treated when they were children, many years ago. This is because no 3D image acquisition and 3D planning could be performed at the time. To bridge this knowledge gap, and to gain valuable information to improve the treatment of children, a dose reconstruction technique is needed.
Here, we leverage Machine Learning techniques to study the first step for a novel, highly-individualized dose reconstruction strategy: how to choose a representative 3D anatomy for a historically-treated child, based on recently acquired CT scans and historical patient records.

Date: Thursday March 9th, starting at 13:00hrs
Location: CWI, room L017

Speaker: Endre Bakken-Stovner (Trondheim University)
Title: Brief presentation of the background, hypothesis, methods and the current state of the project

Abstract: In the Programmable Epigenetics project we are investigating how chromatin modifications are preserved after DNA duplication. More specifically, we are testing the hypothesis that a group of heterochromatin-located RNAs, which are only transcribed after DNA duplication, are involved in remembering the location of H3K27me3 and H3K4me3 after mitosis.
For this purpose, we have 4 different types of data from human keratinocytes: RNA-Seq, ChIP-Seq H3K4me3, ChIP-Seq H3K27me3 and ChIP-Seq Phosphorylated PolII. The dataset is a time-series experiment with 9 timepoints from 0 to 24 hours, where there are two biological replicates for each timepoint.
By looking for RNAs that display a cyclic pattern of expression throughout the RNA time series, we hope to find candidates for RNAs that might help preserve the chromatin state.

Date: Friday March 3rd, starting at 13h00
Location: CWI, room L016

Speaker: Stephen Smith (University of Edinburgh)
Title: Linking noise across scales

Abstract: Noisy gene expression is known to be of fundamental importance to single cells, and is therefore widely studied and modelled in single-celled organisms. Extending these studies to multicellular organisms is challenging, since their cells are generally not isolated, but rather individuals in a tissue, closely coupled to several neighbour cells. Transport of molecules between neighbouring cells via gap junctions or plasmodesmata ensures that tissue-bound cells are neither fully independent of each other, nor an entirely homogeneous population. From a completely general mathematical description of a tissue with direct cell-to-cell transport, we derive two equations connecting the noise at the tissue scale with the noise in a single cell. These equations have a number of surprising implications for both modelling and experimentally studying cells within tissues, and make easily testable predictions. We confirm these predictions on detailed stochastic simulations of biochemical networks, and experimental data from a leaf of Arabidopsis thaliana and a population of mouse fibroblast cells.

Date: Friday February 17th, starting at 15h00
Location: CWI, room L016

Speaker: Nina Kudryashova (Ghent University)
Title: Virtual cardiac monolayers for arrhythmogeneity studies

Abstract: Cardiac tissue has a complex structure that is considered one of the main determinants of the arrhythmogenic substrate. Such a morphology is a result of various dynamic cellular processes involving cell motion, cell-cell adhesion and cell-substrate interaction. We present a joint in vitro and in silico study aimed at developing the first mathematical model that describes the formation of cardiac tissue. First, we performed experiments in which we carefully characterised the morphology of cardiac tissue in a culture of neonatal rat cardiac cells. We considered two cell types, cardiomyocytes and fibroblasts, and characterised their cell shapes in various experimental conditions. Second, we proposed a modelling approach based on the Glazier-Graner-Hogeweg model, which is widely used in tissue growth studies. Using this model, we were able to reproduce the shapes of isolated cells and of cells in monolayers, both on a uniform scaffold and on nanofibres that resemble the extracellular matrix of the heart. The resulting morphology was coupled to the detailed electrophysiological Korhonen-Majumder model for neonatal rat cells to study wave propagation. The simulated waves had the same anisotropy ratio and wavefront complexity as those in the experiment. We conclude that our approach allows us to reproduce the morphological and physiological properties of cardiac tissue, and that it can further be used in a wide range of studies on the relation between tissue morphology and electrophysiological phenomena.
Authors: Nina Kudryashova, Valeriya Tsvelaya, Konstantin Agladze, and Alexander Panfilov

Date: Friday January 27th, 13h00
Room: CWI, room L016

Speaker: Murray Patterson
Title: Correlated Evolution of Metabolic Functions over the Tree of Life

Abstract: We are interested in the structure and evolution of metabolism in order to better understand its complexity. We study metabolic functions in 1459 species within which several hundreds of thousands of families of homologous genes have been identified [1]. Given a protein sequence, PRIAM search [2] delivers probabilities of the presence of several thousand enzymes (ECs). This allows us to infer reaction sets and to construct a metabolic network for an organism, given its set of sequences.
We then propagate these ECs to the ancestral nodes of the species tree using maximum likelihood methods. These evolutionary scenarios are systematically compared using pairwise mutual information. We identify co-evolving enzyme sets from the graph of these relationships using community detection algorithms [3,4]. This sheds light on the structure of the metabolic networks in terms of co-evolving metabolic modules. These modules are also interpreted from a functional perspective using stoichiometric models of metabolic networks.
[1] Penel et al., BMC Bioinformatics, 10(6):S3, 2009
[2] Claudel-Renard et al., Nucleic Acids Research, 31(22):6633--6639, 2003
[3] Ahn et al., Nature, 466:761--764, 2010
[4] Blondel et al., Journal of Statistical Mechanics, 2008(10):P10008, 2008
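
As an illustration of the pairwise comparison step mentioned in the abstract, the sketch below computes the mutual information between two binary profiles. It assumes, purely for the example, that each enzyme's evolutionary scenario has been reduced to a 0/1 vector (for instance presence or absence per tree node). The function name and the toy profiles are hypothetical and not taken from the talk.

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length binary profiles."""
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))  # joint frequency
            p_a = np.mean(x == a)                # marginal frequencies
            p_b = np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return float(mi)

# Toy profiles over four nodes (illustrative only).
enzyme_a = [0, 0, 1, 1]
enzyme_b = [0, 0, 1, 1]   # identical to enzyme_a
enzyme_c = [0, 1, 0, 1]   # statistically independent of enzyme_a

print(mutual_information(enzyme_a, enzyme_b))  # 1.0
print(mutual_information(enzyme_a, enzyme_c))  # 0.0
```

Pairs with high mutual information could then form the weighted edges of the graph on which community detection is run.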

Date: Monday November 7th, at 15h00
Room: L016

Speaker: Neda Sepasian (TU/e)
Title: Brain white matter reconstruction and the clinical challenges

Abstract: The main focus of this talk will be contributions inspired by differential geometry and machine learning in diffusion weighted imaging (DWI) and tractography for the qualitative and quantitative interpretation of DW images. For almost two decades DWI has been applied to the evaluation of various diseases and has been shown to be capable of detecting early or subtle changes. I will discuss the challenges that remain before the full potential of this imaging technique can be exploited in real-world clinical applications.

Date: Monday October 3rd, starting at 11h00 (!)
Location: room L120

Speaker: Tim Otto Roth
Title: Pixelsex meets Cellular Automata – a minimalistic view on the dynamics of self-organization

Abstract: For more than ten years Dr. Tim Otto Roth has explored cellular automata as an artist and composer, but also as a science historian. 'Pixelsex' translates the basics of that self-organization principle into the terms of the art world: discrete units interact according to local rules, resulting in complex macroscopic behaviour. In his presentation Roth introduces a few exemplary projects in which he worked with the self-organization principle, be it a huge façade animation in public space in Rotterdam, a composition for choir and string orchestra, or a self-organizing water organ. He follows a minimalist approach, asking for instance what kind of rules to apply to a minimal set of sound units to weave a continuously changing sound carpet. He will also present a discovery that describes in a new way the dynamics and robustness of 1D cellular automata beyond the Wolfram classification scheme. Finally, he will look back at the still young history of cellular automata and work out how important the pictorial component of that model is.
Selected publications: Roth, Tim Otto; Deutsch, Andreas: Universal synthesizer and window – cellular automata as a new kind of cybernetic images, in: Grau, Oliver; Veigl, Thomas (Ed.): Imagery in the 21st Century, Cambridge/Mass (MIT Press) 2011, pp. 269-288.

Date: Monday September 26th, starting at 15h00
Location: CWI, room L017

Speaker: Anton Bouter
Title: Designing the Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm and Applying it to Substantially Improve the Efficiency of Multi-Objective Deformable Image Registration

Abstract: The recently introduced Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) for discrete variables has been shown to be able to efficiently and effectively exploit the decomposability of optimization problems, especially in a grey-box setting, in which a solution can be efficiently updated after a modification of a subset of its variables. GOMEA is considered to be state of the art, but currently no version of GOMEA for real-valued variables exists. In this thesis, we design a real-valued version of GOMEA, for both single-objective and multi-objective optimization. Our novel GOMEA variant is then applied to the Deformable Image Registration (DIR) problem, which was adapted to allow for efficient partial evaluations. DIR concerns the calculation of a deformation that transforms one image to another, and is of great importance for many medical applications. Experiments are performed to assess GOMEA’s performance in black-box and grey-box settings on a range of single-objective and multi-objective benchmark problems, including comparisons with the state-of-the-art real-valued optimization algorithm AMaLGaM. From the results of these experiments, we find that GOMEA performs substantially better on all considered single-objective and multi-objective benchmark problems in a grey-box setting, in terms of required time and number of evaluations. Moreover, the improvement becomes larger as problem dimensionality increases. In a black-box setting, GOMEA still performed better than AMaLGaM in terms of time, and comparable in terms of the number of evaluations. On DIR problems, GOMEA achieved solutions of similar quality while achieving a speed-up of up to a factor of 1600.

Date: Wednesday September 28th, starting at 12h00 (!)
Location: CWI, room L016

Speaker: James Collier (Monash University, Australia)
Title: Statistical Inductive Inference of Protein Structural Alignments

Abstract: The problem of finding reliable structural alignments is commonly posed as a combinatorial optimisation problem, which requires an optimisation strategy (a search method to find the best alignments) and an objective function (a measure of alignment quality). The objective function must arbitrate a trade-off between the structural fidelity of the proteins being aligned, and the complexity of the alignment itself. The alignment search algorithm then finds the alignment that the objective function considers optimal. Over the past five decades, many alignment methods have been conceived to identify structural alignments between proteins. Concerningly, the alignments obtained by these methods differ substantially and often produce contradictory results. Many comparative studies on methods generating structural alignments have highlighted the absence of a clear consensus on what constitutes a good structural alignment and the lack of a statistically rigorous measure of alignment quality. This has been stated as a leading cause of the observed proliferation of new structural alignment methods, which tend to perform small modifications to previous approaches.
We propose a fundamental shift in the way structural alignment quality is formalised and measured, and in the way biologically meaningful alignments are identified. Our approach brings together ideas from the fields of information theory, data compression, and statistical inductive inference to develop a statistically rigorous framework for measuring structural alignment quality. The resulting alignment quality measure, called I-value, is built on the Bayesian framework of minimum message length inference. Furthermore, we have developed a search algorithm that employs I-value to consistently identify high-quality and statistically significant structural alignments. This search method is also able to identify significant alternative structural alignments of comparable quality. The culmination of this work is an open-source pairwise structural alignment program called MMLigner. MMLigner was found to be highly competitive with other methods, and it consistently outperforms them in identifying alternative structural alignments, a challenging problem when aligning oligomeric proteins and protein complexes.

Date: Thursday September 8th, starting at 15:30hrs
Location: CWI, room L016

Speaker: Sander Bohte
Title: Fast and Efficient Asynchronous Neural Computation with Adapting Spiking Neurons

Abstract: Real neurons communicate with each other using electrical pulses: spikes. While this is well known and understood, one of the central questions in neuroscience remains how neurons communicate information with these spikes. Specifically, a recurring topic of debate is whether or not the specific presence and timing of single spikes is of importance for neural information communication: traditionally, the notion is held that spikes are generated stochastically as a function of the neuron's membrane potential, and that individual spikes and spike times are unimportant. Recent ideas, however, have pointed out the close relationship between detailed models of neural behavior, like Leaky-Integrate-and-Fire models, and a class of analog-to-digital conversion algorithms known as Asynchronous Pulsed Sigma Delta Modulators. While these algorithms work well, they need unreasonably high pulse rates compared to biological neurons. Here, we show that by using more advanced adapting neuron models, we can optimize information transmission, effectively dynamically adjusting the dynamic range of the neuron's spike-coding mechanism. We show that with these spiking neurons, we can carry out asynchronous neural computation. We demonstrate this in a number of toy problems, including a standard deep neural network, where we replace the standard artificial neurons with spiking neurons. We show that these networks are just as effective as standard deep neural networks, while using very few spikes and remaining responsive to changes in inputs.
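
For readers unfamiliar with the Leaky-Integrate-and-Fire model mentioned in the abstract, the following is a minimal sketch of a plain (non-adapting) LIF neuron integrated with forward Euler. It is not the adapting neuron model of the talk. The function name and all parameter values (time step, time constant, threshold, reset) are illustrative assumptions.

```python
def simulate_lif(current, dt=1e-3, tau=0.02, v_thresh=1.0, v_reset=0.0):
    """Simulate a leaky integrate-and-fire neuron; return the list of spike times (s)."""
    v = 0.0
    spike_times = []
    for step, i_in in enumerate(current):
        # Leaky integration: the potential decays towards the input level.
        v += dt * (-v + i_in) / tau
        if v >= v_thresh:
            # Threshold crossing: emit a spike and reset the potential.
            spike_times.append(step * dt)
            v = v_reset
    return spike_times

# One second of constant input, above and below the threshold (assumed values).
strong_spikes = simulate_lif([1.5] * 1000)
weak_spikes = simulate_lif([0.5] * 1000)
print(len(strong_spikes), len(weak_spikes))
```

A sub-threshold input produces no spikes, while a stronger constant input produces a regular spike train whose rate grows with the input. An adapting neuron would additionally change its effective threshold or gain over time.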

Date: Thursday September 1st, starting at 13h00
Location: CWI, room L016

Speaker: Margriet van Gendt (KNO)
Title: A fast, stochastic and adaptive model to predict auditory nerve responses to cochlear implant stimulation

Abstract: Cochlear implants (CI) rehabilitate hearing impairment through direct electrical stimulation of the auditory nerve. In many modern CIs sound is coded through the Continuous Interleaved Sampling (CIS) Strategy. Although many different sound-coding strategies have been introduced in the last decade, no major advances have been made since the introduction of the CIS strategy. New stimulation strategies are commonly investigated by means of psychophysical experiments and clinical trials, which is time-consuming for both patient and researcher. Alternatively, strategies can be evaluated using computational models. In this study a computationally efficient model that accurately predicts auditory nerve responses to CI pulse train input is developed. The model includes the 3D volume conduction and active nerve model developed in the Leiden University Medical Center, and is extended with stochasticity, adaptation and accommodation. This complete model includes spatial as well as temporal characteristics of both the cochlea and the auditory nerve. Power-law behaviour in the temporal domain is investigated. The stochastic and adaptive auditory nerve model is used to investigate full-nerve responses to amplitude modulated long duration stimulation. Understanding responses to amplitude modulation is important because current speech coding strategies are based on the principle of speech information distribution through amplitude modulation of the input pulse trains. The model is validated by comparison to experimentally measured single fiber action potential (SFAP) responses to pulse trains published in literature. The effects of different pulse-train parameters such as pulse rate, pulse amplitude and amplitude modulation are investigated. The neural spike patterns produced in response to CI stimulation are very similar to spike patterns obtained with single fiber action potential measurements in animal experiments in response to CI stimulation. 
Modeled effects of stimulus amplitude, pulse rate and amplitude modulation are similar to the effects seen in animal experiments. Adaptation is found to be an important factor in modeling nerve outcomes to amplitude modulated pulse trains and their spatial effects. The model can be used to predict full auditory nerve responses to electrical pulse trains, and thus to different sound coding strategies. The next step will be to apply this model to evaluate complete auditory nerve responses to different sound coding strategies.

Date: Friday July 22nd, starting at 13h00
Location: CWI, room L017

Speaker: Femke van Wageningen-Kessels
Title: Crowd and traffic flow modelling and simulation

Abstract: Crowd flow models describe and predict the behaviour of large groups of pedestrians. Traffic flow models do the same for traffic on roads. The models are used in a variety of applications, including prediction, control, and evacuation planning. We have developed and studied several such models. In this talk, I will present some of the basics of the models, but the focus will be on efficient numerical methods for the simulation of traffic and crowds. The simulation methods that I have been working on are applied to continuum models and often use a moving (Lagrangian) grid. I show examples of such methods applied to both (one-dimensional) road traffic flow and (two-dimensional) crowd flow.
About the presenter: Dr. ir. Femke van Wageningen-Kessels is an independent scholar based in Muscat, the Sultanate of Oman, and a guest researcher at Delft University of Technology in The Netherlands. Her main interests include the modelling and simulation of road traffic and crowds. She is specialised in continuum models for both types of flow and has done extensive research on numerical methods for efficient simulations. Van Wageningen-Kessels has a Master's degree in Applied Mathematics from Delft University of Technology and obtained her PhD degree (Cum Laude) from the same university in 2013.

Date: Monday July 4th, starting at 15h00
Location: CWI, room L016

Speaker: Kleopatra Pirpinia
Title: Parameter tuning in deformable image registration using multi-objective optimization applied to breast MR images

Abstract: In deformable image registration, the cost function to be optimized is typically formulated as a linear weighted combination of multiple objectives of interest, such as similarity between the images and transformation smoothness. Successfully tuning the weights associated with these objectives is not trivial, leading to trial-and-error approaches in practice. Such an approach assumes an intuitive interplay between weights, optimization objectives and registration outcome, which, however, is not well established. In this work, we investigate this interplay, using multi-objective optimization as a weight-tuning strategy applied to breast MR deformable image registration. In multi-objective optimization, multiple objectives of interest are optimized simultaneously, causing a set of multiple optimal solutions to exist. To obtain this set of multiple registration solutions, we employ a state-of-the-art multi-objective evolutionary algorithm in combination with a well-known open-source image registration framework. Further, we validate the quality of our solutions by calculating the mean target registration error based on expert-defined anatomical landmarks. The multi-objective weight tuning strategy removes the need to pre-determine a singular combination of objectives via trial-and-error and provides insight into the interaction between objectives of interest, facilitating finding and selecting the preferred registration outcome.
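
The contrast between a weighted-sum cost and a set of multi-objective optima can be sketched as follows. This is only an illustration, not the evolutionary algorithm or registration framework used in the work. The candidate solutions are made-up pairs of (dissimilarity, roughness), both to be minimised, and the function names are hypothetical.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (all objectives minimised)."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(q[k] <= p[k] for k in range(len(p)))
            and any(q[k] < p[k] for k in range(len(p)))
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front

def weighted_sum_choice(points, weight):
    """Pick the single point minimising weight*obj0 + (1 - weight)*obj1."""
    return min(points, key=lambda p: weight * p[0] + (1.0 - weight) * p[1])

# Made-up candidate registrations: (image dissimilarity, deformation roughness).
candidates = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]

print(pareto_front(candidates))              # [(1, 5), (2, 2), (5, 1)]
print(weighted_sum_choice(candidates, 0.5))  # (2, 2)
print(weighted_sum_choice(candidates, 0.9))  # (1, 5)
```

Each weight setting selects one trade-off, whereas the non-dominated set exposes all of them at once, so the preferred outcome can be chosen afterwards.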