Fast and efficient data analysis thanks to new database architectures

CWI’s long-term software development has led to fundamentally new database architectures that have transformed a trillion-dollar global database market.

Publication date
5 Mar 2024

"Databases and database systems are fundamental ingredients of every IT-system", says Peter Boncz, group leader of the Database Architectures group at CWI. "Therefore our research is fundamental. Everyone builds solutions with these tools. Everyone is thinking in the terms database architects came up with. These tools set the limits of imagination for all those application developers."

Peter Boncz. Picture: Ivar Pel

Retail chains, banks, web shops and hospitals all collect data and want to get as many insights from it as possible. Once gathered, the data is brought together, filtered, cleaned, aggregated and then analyzed using dashboards, statistical tools or machine learning. Databases are pivotal in this data pipeline. The latest CWI result in database architectures is the invention of so-called embedded analytics, designed to work within running processes, without the need for a separate server. A number of prior scientific inventions made at CWI have been crucial in this: column stores, vectorized query execution and fast data compression methods.
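For readers who want a feel for two of these ideas, the following is a minimal sketch in plain Python, not code from any CWI system. It assumes a made-up sales table (the names `rows`, `columns`, `VECTOR_SIZE` and both functions are hypothetical) and contrasts row-at-a-time processing with a column layout that is processed in small batches, which is the basic idea behind column stores and vectorized query execution.

```python
from array import array

# Hypothetical sales records in row-oriented form: (city, quantity, price in cents).
rows = [
    ("amsterdam", 3, 1999),
    ("utrecht", 1, 500),
    ("amsterdam", 2, 750),
    ("rotterdam", 5, 200),
]

# Column-oriented form: one contiguous sequence per attribute.
# A query that only needs quantity and price never touches the city column.
columns = {
    "city": [r[0] for r in rows],
    "quantity": array("q", (r[1] for r in rows)),
    "price_cents": array("q", (r[2] for r in rows)),
}

# Tiny batch size for illustration only; real engines use far larger batches.
VECTOR_SIZE = 2


def revenue_row_at_a_time(rows):
    """Compute total revenue by visiting one complete record at a time."""
    total = 0
    for _city, quantity, price_cents in rows:
        total += quantity * price_cents
    return total


def revenue_vectorized(columns, vector_size=VECTOR_SIZE):
    """Compute total revenue by processing the two needed columns in batches."""
    quantity = columns["quantity"]
    price_cents = columns["price_cents"]
    total = 0
    for start in range(0, len(quantity), vector_size):
        q_vec = quantity[start:start + vector_size]
        p_vec = price_cents[start:start + vector_size]
        # One simple operation is applied to a whole batch of values.
        total += sum(q * p for q, p in zip(q_vec, p_vec))
    return total


if __name__ == "__main__":
    print(revenue_row_at_a_time(rows))
    print(revenue_vectorized(columns))
```

Both functions return the same total. The sketch only shows the difference in data layout and processing order; in pure Python it does not reproduce the speed-up that a real engine obtains from this approach.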

Hannes Mühleisen speaks at data summit in the US.

"Embedded analytics provide big savings because you don't have to drag around as much data and it's easy to build into a larger data pipeline", says Hannes Mühleisen, senior researcher at the Database Architectures group. In 2019 Mühleisen launched the open-source database system DuckDB together with his colleague Mark Raasveldt. DuckDB is small, agile and efficient. It requires ten to a hundred times less hardware capacity than competitor Spark. Unlike Pandas, another popular data science tool, it can handle data that is larger than memory and can profit from parallel processing using multiple cores, present in all computers. DuckDB rapidly became a huge success, with more than two million downloads per month at the beginning of 2023.

"The development of DuckDB was made possible by the great freedom I had at CWI to invent something myself," Mühleisen says. "I had the conviction that for most data problems you don’t need a scale-out of the data to multiple computers. I believed that you can do much more on one computer than most people thought. In the coming years I would like to expand that vision, on the one hand, to significantly reduce the carbon footprint of IT systems and, on the other hand, to give users more control over their own data, thus limiting the power of cloud companies."

Spin-offs

What the Database Architectures group does is very difficult to realize at a university, where such projects significantly exceed the scope of a PhD track, or in companies, where the focus is on relatively short-term results. Boncz: "For a database system, you have to work on it with at least five people for at least ten years. You can't have fifty people do it in a year. It is CWI’s commitment to invest in long-term software development that led our group to produce MonetDB, VectorWise, and now DuckDB."

In 2021 Mühleisen and Raasveldt founded the spin-off company DuckDB Labs, which provides services and development for DuckDB. In the fall of 2022 DuckDB Labs helped to create the startup company MotherDuck, which connects DuckDB to the cloud. MotherDuck managed to raise 47.5 million dollars in funding.

Data systems ecosystem

Scientific breakthroughs that inspire new businesses fit into Boncz's long-term vision for the Netherlands: to create a data systems ecosystem of research, education and business. Gradually he is seeing the first results of that vision. For example, CWI has been instrumental in the establishment of the R&D center of the American company Databricks in Amsterdam, in which Databricks has invested a hundred million euros over the past four years. "You could say that a hundred million euro has been pumped into the Dutch economy thanks to our work", says Boncz.

Boncz and Mühleisen are proud that CWI's long-term software development, which is part of its mission, is having such an impact on database applications used worldwide. Boncz: "If you look at the evolutionary lineage of all database systems, you can say that of the analytical systems 85% have a strong CWI signature." These systems prominently include Snowflake, which had the largest software IPO ever in 2020 and which was co-founded by Marcin Zukowski, a former PhD student in CWI’s Database Architectures group. Zukowski had previously created the VectorWise system.

Author: Bennie Mols