DuckDB: Introducing a new class of data management systems

As database architecture researchers at CWI, Hannes Mühleisen and Mark Raasveldt invented a new, much more efficient database technology for data analysis. Since its introduction in 2021, it has grown to two million downloads per month.

Publication date
23 May 2023

"The name? DuckDB comes from my late pet duck Wilbur", CEO and founder Hannes Mühleisen reveals. What DuckDB is, however, requires a more elaborate explanation. Mühleisen: "As database architecture researchers at CWI, it struck co-founder Mark Raasveldt and I that out of four possible directions of database technologies, only three types existed."

Database workloads are either analytical or transactional. Online Analytical Processing (OLAP) is optimised for queries and reports that draw on large amounts of data. Online Transactional Processing (OLTP), on the other hand, supports the execution of a large number of real-time transactions. A second division in databases is between the ‘client/server’ and the ‘in-process’ types. Put in a matrix, the quadrant for the combination of OLAP and in-process remained empty.
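The two distinctions above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module only because it ships with Python: SQLite is an in-process engine designed for transactional work, so it sits in a different quadrant of the matrix than DuckDB, and this is not DuckDB's API. The `orders` table and its rows are hypothetical.

```python
import sqlite3

# In-process: the engine runs inside this Python process. There is no server
# to install, start, or connect to over a network.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, country TEXT, amount INTEGER)"
)

# Transactional (OLTP-style) work: small writes that each touch a few rows.
with con:  # commits on success, rolls back on error
    con.executemany(
        "INSERT INTO orders (country, amount) VALUES (?, ?)",
        [("NL", 120), ("US", 80), ("NL", 40), ("DE", 60), ("US", 200)],
    )

# Analytical (OLAP-style) work: one query that scans and aggregates the table.
report = con.execute(
    "SELECT country, COUNT(*), SUM(amount) "
    "FROM orders GROUP BY country ORDER BY country"
).fetchall()
con.close()

print(report)
```

An engine in the empty quadrant keeps the first property (it lives inside the application) while being optimised for the second kind of query rather than the first.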

"That made it clear to us there was an opportunity for this new type of database technology", says Mühleisen. "So we set out to develop a prototype in 2018. It didn’t exist until then because it is complex to make. But Mark and I had a pretty good notion of how to tackle that. In 2021, we were ready to start a spin-off from CWI – which is what we did."

Open Source

As the founders had expected, demand for in-process analytical database technology is high. Mühleisen: "Especially as a component built into an application, it comes in very handy. Within two years, DuckDB managed to reach two million downloads a month worldwide." More than one-third of the visitors to the website come from the USA; other users are located in Germany, Canada, France, the Netherlands, the UK and China. DuckDB is widely used in sciences that work with huge datasets, such as genetics and astronomy. It is even used in satellites.

The reason for its popularity is the state-of-the-art data engine. Its efficiency saves resources and energy. In practice, it enables analyses on a single laptop that previously required dozens or even hundreds of computers. Jackpot! Well, not quite.

The founders chose to launch DuckDB as Open Source software. Mühleisen: "We find it unethical to make proprietary software based on taxpayer-funded research. Researchers should always remember who pays their bill in the end." The software project is managed by a non-profit foundation. Those who donate to it get to provide input to the development roadmap.

Commercial services

The Open Source set-up does not rule out commercial activity, though. The company DuckDB Labs offers paid services based on the DuckDB Open Source platform. Its customers include Google. Mühleisen explains: "There is only one version of DuckDB. Our roadmap describes which features we’d like to add and when. Customers pay us to develop extra features with priority, or to develop features that we had not envisioned so far but that make sense to add. All these features become available to all other users."

One interesting feature that DuckDB Labs developed works around computation and storage limitations as much as possible. Mühleisen: "Part of our success comes from listening closely to people who work with DuckDB, which is quite rare in fundamental research. We learned that it is important for our users – mostly data analysts and not programmers – that they can always finish their query, regardless of hardware limitations. So we saw to it that when a query uses up all memory capacity, it doesn’t abort but automatically switches to using other available options, such as using disk space."
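The general idea of finishing an operation on disk instead of aborting can be illustrated with a toy external merge sort. The article does not describe how DuckDB implements this, so the sketch below is not its mechanism: the function names are hypothetical, and the memory budget is counted in items rather than bytes to keep the example short.

```python
import heapq
import os
import tempfile


def _write_run(sorted_values, spill_dir):
    """Write one sorted run to a temporary file, one integer per line."""
    fd, path = tempfile.mkstemp(dir=spill_dir, suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        for v in sorted_values:
            f.write(f"{v}\n")
    return path


def _read_run(path):
    """Stream integers back from a run file without loading it fully."""
    with open(path) as f:
        for line in f:
            yield int(line)


def sort_with_spill(values, max_in_memory, spill_dir):
    """Yield the integers in `values` in ascending order while buffering at
    most `max_in_memory` of them (assumed >= 1) before spilling to disk."""
    buffer, runs = [], []
    for v in values:
        buffer.append(v)
        if len(buffer) >= max_in_memory:
            # Budget exhausted: spill a sorted run to disk instead of aborting.
            runs.append(_write_run(sorted(buffer), spill_dir))
            buffer = []
    buffer.sort()
    try:
        # heapq.merge lazily merges the spilled runs with the in-memory rest,
        # holding only one pending item per run at a time.
        yield from heapq.merge(buffer, *(_read_run(p) for p in runs))
    finally:
        for p in runs:
            os.remove(p)


with tempfile.TemporaryDirectory() as tmp:
    data = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]
    ordered = list(sort_with_spill(data, max_in_memory=3, spill_dir=tmp))

print(ordered)
```

With a budget of three items, the ten input values are written out as three sorted runs and merged with the one value left in memory, so the sort completes even though the input never fits in the buffer at once.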

Mühleisen tends to think that DuckDB represents the future of data analysis. "I’m biased, but I’m far from the only one to think that", he says. That counts for something, because as a senior researcher at CWI, professor of Data Engineering at Radboud University and employer of database specialists at DuckDB Labs, he is pretty well acquainted with the latest developments within the database research community. "I can assure you that during the next couple of years, we will have enough work on our hands."

Source: I/O magazine

This article appeared in I/O magazine in April 2023.

Header photo: Unsplash