Outsmarting digital intruders: AI in cybersecurity

In recent years, universities, hospitals, businesses, and even government institutions have been hit by cyberattacks that paralyzed their systems. These incidents highlight how vulnerable our digital infrastructure is. Etienne van de Bijl (CWI’s Stochastics group) studied how artificial intelligence can detect such threats. Today, he defends his PhD thesis on this subject at Vrije Universiteit Amsterdam.

Many existing security systems rely on a kind of dictionary of known attacks. As soon as criminals change their methods, these systems can no longer recognize the threat. Machine learning (algorithms that learn to recognize patterns in data) offers a more flexible approach, one that can also respond to new, previously unseen attack variants.
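To make the contrast between a dictionary of known attacks and a learned model concrete, here is a deliberately tiny sketch in Python. The signature list, the single feature (the share of characters such as quotes and brackets that are typical of injection strings), and the training strings are all hypothetical illustrations, not material from the thesis. The exact-match check misses a slightly altered injection string, while a threshold learned from labelled examples still flags it.

```python
# Hypothetical example: exact signature matching vs. a tiny learned detector.
# Not the method from the thesis; signatures, feature and data are made up.

KNOWN_SIGNATURES = {"' OR '1'='1", "<script>alert(1)</script>"}
SPECIAL = set("'\"<>;=-()/*")  # characters common in injection payloads


def signature_detect(payload: str) -> bool:
    """Flag only payloads that exactly match a known attack string."""
    return payload in KNOWN_SIGNATURES


def special_ratio(payload: str) -> float:
    """Single feature: fraction of characters that belong to SPECIAL."""
    if not payload:
        return 0.0
    return sum(ch in SPECIAL for ch in payload) / len(payload)


def fit_threshold(benign, malicious) -> float:
    """Learn a decision threshold halfway between the two class means."""
    mean_b = sum(map(special_ratio, benign)) / len(benign)
    mean_m = sum(map(special_ratio, malicious)) / len(malicious)
    return (mean_b + mean_m) / 2


benign = ["alice", "john.smith@example.com", "hello world", "order 12345"]
malicious = ["' OR '1'='1", "<script>alert(1)</script>", "1; DROP TABLE users--"]
threshold = fit_threshold(benign, malicious)


def learned_detect(payload: str) -> bool:
    """Flag payloads whose feature value exceeds the learned threshold."""
    return special_ratio(payload) > threshold


variant = "' OR 2>1 --"  # unseen variant of the first known signature
print(signature_detect(variant))  # False
print(learned_detect(variant))    # True
```

A single hand-picked feature is far too crude for real traffic: it would, for example, also flag harmless text full of punctuation. Practical detectors learn from many features with proper classifiers; the point here is only that a learned decision rule can generalize beyond an exact list.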


Smart algorithms as digital watchdogs

Etienne van de Bijl investigated whether machine learning could help detect denial-of-service (DoS) and distributed denial-of-service (DDoS) attacks at an early stage. In these attacks, a server is flooded with so many requests that regular users can no longer access it. He also examined web-based attacks such as SQL injection (in which attackers enter malicious database commands into input fields or forms to gain access to a database) and cross-site scripting (in which harmful code is inserted into a website and then unknowingly executed in visitors’ browsers).

Van de Bijl also studied another widely used method: the brute-force attack, in which countless password combinations are tried automatically until one succeeds.
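Before a model can learn to spot a brute-force attack, raw login events have to be turned into numeric features. The sketch below assumes a hypothetical log format of (timestamp in seconds, source IP, success flag) and computes two plausible features per source and one-minute window: the number of login attempts and the fraction that failed. The window length and the feature choice are assumptions for illustration, not the features used in the thesis.

```python
from collections import defaultdict


def window_features(events, window=60):
    """Per (source, time window): number of login attempts and share that failed.

    `events` is an iterable of (timestamp_seconds, source_ip, succeeded) tuples,
    a hypothetical log format chosen for this sketch.
    """
    stats = defaultdict(lambda: [0, 0])  # (source, window index) -> [attempts, failures]
    for ts, src, succeeded in events:
        key = (src, int(ts // window))
        stats[key][0] += 1
        if not succeeded:
            stats[key][1] += 1
    return {key: (attempts, failures / attempts)
            for key, (attempts, failures) in stats.items()}


# Hypothetical log: one source tries 50 passwords in 25 seconds,
# while a regular user mistypes once and then logs in.
events = [(i * 0.5, "203.0.113.5", False) for i in range(50)]
events += [(3.0, "192.0.2.7", False), (10.0, "192.0.2.7", True)]

features = window_features(events)
print(features[("203.0.113.5", 0)])  # (50, 1.0)
print(features[("192.0.2.7", 0)])    # (2, 0.5)
```

A classifier trained on per-window vectors like these could then learn to separate the attacker's burst of failures from the user's single typo; per-source request counts of the same kind could also serve as input for flood (DoS) detection.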

Recognizing new variants

His research shows that AI systems can, in some cases, recognize new types of attacks without having been explicitly trained on them. A model designed to detect brute-force attacks, for instance, may also pick up certain variants of DDoS attacks. Such transfer is not guaranteed, however: a model that performs well on one type of attack is not necessarily suited to another.
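One way to probe whether a detector recognizes attacks it was not trained on is to train it on a single attack family and then measure its detection rate on other families. The following sketch does this with a nearest-centroid classifier on two hypothetical features (requests per second and mean payload size). The feature vectors are purely synthetic numbers chosen to illustrate the evaluation protocol; they are not data or results from the thesis.

```python
import math

# Purely synthetic 2-D feature vectors: (requests per second, mean payload bytes).
BENIGN = [(2, 500), (3, 450), (1, 600)]
BRUTE_FORCE = [(40, 120), (35, 110), (50, 130)]
DDOS_LIKE = [(300, 80), (250, 60)]
OTHER_ATTACK = [(3, 700), (2, 650)]


def centroid(points):
    """Mean point of a list of 2-D tuples."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def train_nearest_centroid(benign, attack):
    """Return a detector that flags points closer to the attack centroid."""
    c_benign, c_attack = centroid(benign), centroid(attack)

    def detect(point):
        return math.dist(point, c_attack) < math.dist(point, c_benign)

    return detect


def detection_rate(detector, attack_samples):
    """Fraction of attack samples the detector flags (recall on that family)."""
    return sum(detector(p) for p in attack_samples) / len(attack_samples)


# Train on brute force only, then evaluate on every attack family.
detector = train_nearest_centroid(BENIGN, BRUTE_FORCE)
for name, samples in [("brute force", BRUTE_FORCE),
                      ("DDoS-like", DDOS_LIKE),
                      ("other", OTHER_ATTACK)]:
    print(name, detection_rate(detector, samples))
```

In this toy setup the detector trained only on brute-force samples also flags the high-rate, DDoS-like samples but misses the third family entirely, mirroring the qualitative point that transfer between attack types can work in one case and fail in another. The features are left unscaled for brevity, so payload size dominates the distance; a real evaluation would normalize them.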

The findings also revealed that more training data does not automatically mean better performance. In some situations, a small and carefully curated dataset yields better results than a large, indiscriminate collection of data. Dataset quality is therefore crucial to effective cyber defence, Van de Bijl concludes.

Learning with limited data

A common challenge in practice is the lack of labelled data, that is, network activity that has been marked as either normal or suspicious. To address this, Van de Bijl developed ULTRA, a method that combines two techniques: active learning and transfer learning. Active learning lets the system select the most informative examples for an expert to label as either ‘attack’ or ‘normal’. Transfer learning draws on knowledge from another domain or an earlier dataset to speed up learning in a new system. Together, these techniques allow a detection system to produce meaningful results at an early stage, even when only a few labelled examples are available.
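The combination of active learning and transfer learning can be illustrated with a small sketch. This is not ULTRA itself: it assumes a one-feature logistic model whose starting weights are simply copied from a model trained on another network (a basic warm-start form of transfer learning), and it uses uncertainty sampling, a common active-learning strategy, to decide which event the expert should label next. All values, including the source weights and the labelling rule that stands in for the expert, are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: float, b: float, x: float) -> float:
    """Probability that an event with feature value x is an attack."""
    return sigmoid(w * x + b)


def fine_tune(w, b, labelled, lr=0.1, epochs=500):
    """Transfer step: continue gradient descent on log-loss from the given weights."""
    for _ in range(epochs):
        for x, y in labelled:
            err = predict_proba(w, b, x) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def most_uncertain(w, b, pool):
    """Uncertainty sampling: the unlabelled point whose probability is closest to 0.5."""
    return min(pool, key=lambda x: abs(predict_proba(w, b, x) - 0.5))


def active_learning(w, b, pool, oracle, budget):
    """Query `budget` labels from the expert (oracle), refitting after each one."""
    pool, labelled = list(pool), []
    for _ in range(budget):
        x = most_uncertain(w, b, pool)
        pool.remove(x)
        labelled.append((x, oracle(x)))  # expert answers 1 = attack, 0 = normal
        w, b = fine_tune(w, b, labelled)
    return w, b, labelled


# Hypothetical setup: weights "transferred" from a model trained on another
# network (decision boundary at x = 5); in the new network, attacks lie above x = 3.
source_w, source_b = 1.0, -5.0
pool = [0.5, 1.0, 2.0, 2.5, 3.5, 4.0, 4.5, 6.0, 8.0]
w, b, labelled = active_learning(source_w, source_b, pool,
                                 oracle=lambda x: int(x > 3.0), budget=3)
print(labelled)
```

Because the model starts from transferred weights rather than from scratch, its very first query already lands near the source model's decision boundary (x = 4.5 here), and each expert label is then used to refit the model for the new network instead of training a fresh one.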

A valuable instrument

In his dissertation, Van de Bijl concludes that machine learning can be a valuable instrument for detecting cyberattacks. At the same time, challenges remain: there is a need for better algorithms that have been rigorously tested and whose workings are transparent and explainable. Only then can AI become a trustworthy part of our digital defence.

About the thesis

Title: From Baselines to Breakthroughs: Fundamentals and Applications of Machine Learning in Cybersecurity

PhD supervisors: Rob van der Mei (CWI/VU) and Sandjai Bhulai (VU)

Header picture: Shutterstock