Tidying up while you search: CWI paper Database Cracking wins CIDR Test of Time Award

A 2007 publication titled Database Cracking has received a Test of Time Award from CIDR, the Conference on Innovative Data Systems Research. Authors Stefan Manegold, the late Martin Kersten, and Stratos Idreos (all at CWI at the time) describe a way for a database to gradually become better organized, based on what users actually look up.

Anyone who has ever searched for a book in a library knows the value of a good catalogue. Databases have a “catalogue” of their own: an index, a structure that helps you find what you need faster. But building and maintaining an index costs time, money, and storage space, so in practice indexes are often missing. “That was the reason for the authors to turn everything around,” says Peter Boncz, head of Database Architectures, the group the authors were part of at the time.

The idea behind cracking

Their idea was to do a bit of tidying while you search. They called it database cracking, in the sense of breaking data up into manageable pieces. The principle is simple. Imagine your room is messy and you keep looking for your gym bag. You have two options: tidy the whole room perfectly first, including places you never use (which takes a lot of time), or tidy only the spot you need at that moment. If you keep coming back to that spot, that corner gradually becomes neater.

Database cracking takes the second approach. Instead of organizing everything neatly in advance, the database uses each query as a trigger to place a small part of the data in a more sensible order for future queries. In the paper, the authors call this “continuous physical reorganization”, with the data being divided into increasingly useful “pieces”. The system also remembers where those pieces are, so it does not have to look everywhere again next time.

The result: parts of the database that are searched become faster and faster to access. Parts that nobody ever searches remain unorganized, because tidying there would not pay off.
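
For readers who want to see the mechanism, the following is a minimal Python sketch of the idea described above. It is a simplification for illustration, not the authors' implementation: the class name CrackedColumn, its methods, and the choice of two plain sorted lists to remember the piece boundaries are all assumptions made here. Each range query splits only the pieces that contain its two bounds and records where the new boundaries are.

```python
from bisect import bisect_left


class CrackedColumn:
    """Toy cracked column (hypothetical names, illustration only)."""

    def __init__(self, values):
        self.data = list(values)  # reorganized a little by every query
        self.pivots = []          # sorted boundary values seen so far
        self.cuts = []            # data[:cuts[i]] < pivots[i] <= data[cuts[i]:]

    def _crack(self, pivot):
        """Make sure a boundary exists for `pivot`; return its position."""
        i = bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.cuts[i]   # boundary already known: no work needed
        # Only the one piece that contains the pivot is touched.
        lo = self.cuts[i - 1] if i > 0 else 0
        hi = self.cuts[i] if i < len(self.cuts) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        self.data[lo:hi] = smaller + [v for v in piece if v >= pivot]
        cut = lo + len(smaller)
        # Remember the new boundary for future queries.
        self.pivots.insert(i, pivot)
        self.cuts.insert(i, cut)
        return cut

    def range_query(self, low, high):
        """Return all values v with low <= v < high."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


column = CrackedColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6])
result = column.range_query(5, 10)
```

After this one query, the column consists of three pieces (values below 5, values from 5 up to but not including 10, and values of 10 and above), and a later query inside one of those ranges only has to look at that piece. The values within each piece stay unsorted until a query asks for a boundary inside it.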

A self-learning system

Boncz calls the idea interesting “because end users don’t know all the indexing tricks”, and because the database does not necessarily have to organize data in the order it was originally inserted.
“The nice thing about cracking is that you don’t need people who know how to index a database.”
In that sense, the system teaches itself how to search faster.

That CIDR is now honouring this paper fits the nature of the prize. The Test of Time Award recognizes research that continues to shape the field years later. The authors receive the award “for opening a new research direction into continuous physical reorganization based on query workload”. Boncz emphasizes that the impact of Database Cracking has mainly been scientific: many follow-up studies explored variants of cracking. “In practice, this isn’t used yet. That’s because tidying is relatively expensive, but people are working on it. So the idea does live on.”

About the authors

The award-winning paper was written by three researchers who worked in CWI’s Database Architectures (DA) group at the time:

  • Stefan Manegold is still affiliated with the DA group as a senior researcher, and previously led the group.
  • First author Stratos Idreos was a PhD student at CWI in 2007. The Database Cracking paper formed the basis of his PhD research. His dissertation won the 2011 ACM SIGMOD Jim Gray Doctoral Dissertation Award, and he also received the ERCIM Cor Baayen Early Career Researcher Award. He is now a Professor of Computer Science at Harvard’s John A. Paulson School of Engineering and Applied Sciences.
  • Martin Kersten (1953–2022) led the DA group in 2007 and was appointed CWI Fellow in 2011.

About CIDR

CIDR stands for Conference on Innovative Data Systems Research: an international conference on new ideas in data systems. CIDR started in 2002 as a forum for innovative system architectures, alongside the large mainstream database conferences. The first CIDR conference was held in 2003.

Since 2020, CIDR has taken place every year, alternating between Amsterdam and the USA. This year, CIDR will be held in the USA (Santa Cruz, CA) from 18–21 January, where the Test of Time Award will be presented.

Header photo: Andreas Kipf (Technische Universität Nürnberg).