New technology ‘database cracking’ speeds up searching in large data sets


The digital data held by companies and organizations keep growing. In science, too, ever larger amounts of data are becoming available, for example from astronomical observations and DNA analysis. Finding the right information in these data sets is becoming more complex and calls for a new look at database technology. In his thesis ‘Database Cracking: Towards Auto-tuning Database Kernels’, Stratos Idreos, a researcher at Centrum Wiskunde & Informatica (CWI) in Amsterdam, developed a new technique to speed up the search process.

Whether we transfer money online, book a flight or consult public records, database systems rely heavily on indices to achieve the best possible query performance. An index is a search structure that is built in advance, based on what users are expected to search for. The drawback of this strategy is that creating and maintaining indices takes up the database administrator's time, which has made it increasingly expensive. The database cracking technique developed by Idreos is the first in which the database system itself takes over this role of the administrator and adapts automatically. On June 24 Idreos received his PhD from the University of Amsterdam.
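To make the contrast concrete, here is a minimal Python sketch of the upfront approach. It assumes a single integer column and uses a sorted copy as a stand-in for an index (production systems typically use structures such as B-trees); the function names `build_index`, `index_range_query` and `scan_range_query` are hypothetical. The whole sorting cost is paid before the first query is answered, and someone has to decide in advance which columns deserve such an index.

```python
from bisect import bisect_left


def build_index(values):
    """Hypothetical upfront 'index': a fully sorted copy, built before any query runs."""
    return sorted(values)


def index_range_query(index, low, high):
    """Answer low <= v < high with two binary searches on the prebuilt sorted index."""
    start = bisect_left(index, low)
    end = bisect_left(index, high)
    return index[start:end]


def scan_range_query(values, low, high):
    """Without any index, every query has to scan the whole column."""
    return [v for v in values if low <= v < high]


values = [13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6]
idx = build_index(values)              # full cost paid here, before the first query
print(index_range_query(idx, 5, 10))   # [6, 7, 8, 9]
```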

Database cracking does not build a search structure upfront. Instead, it reorganizes the data piece by piece while answering queries, so that future queries can access it faster. Because no search index has to be designed and built beforehand, the technique saves time and money. Idreos illustrates the principle with an unsorted stack of playing cards: "If a user asks for the two of hearts, the system may also pick out all the hearts along the way and make one stack with only hearts and one with only non-hearts. In a subsequent search for all the clubs, the system knows that it only needs to look at the stack of non-hearts."
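The idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation in Idreos's thesis or in MonetDB: it assumes one integer column, half-open range queries `low <= v < high`, and a hypothetical `CrackerColumn` class. Each query partitions only the piece of the column that contains its boundaries, much as the card example splits the stack into hearts and non-hearts, and records where it split in a small "cracker index" so that later queries can skip pieces that cannot match.

```python
from bisect import bisect_left, insort


class CrackerColumn:
    """Hypothetical sketch of database cracking on a single integer column.

    Invariant: for every recorded pivot p, all values before position pos[p]
    are < p and all values from pos[p] onward are >= p.
    """

    def __init__(self, values):
        self.data = list(values)  # cracker column: a copy we are free to reorder
        self.bounds = []          # sorted list of pivots cracked so far
        self.pos = {}             # pivot -> first position holding a value >= pivot

    def _piece(self, pivot):
        """Return [lo, hi): the only piece that can contain the split point for pivot."""
        i = bisect_left(self.bounds, pivot)
        lo = self.pos[self.bounds[i - 1]] if i > 0 else 0
        hi = self.pos[self.bounds[i]] if i < len(self.bounds) else len(self.data)
        return lo, hi

    def _crack(self, pivot):
        """Split the relevant piece around pivot in place and return the split position."""
        if pivot in self.pos:
            return self.pos[pivot]  # already cracked here by an earlier query
        lo, hi = self._piece(pivot)
        # Two-pointer partition of data[lo:hi], like quicksort's partition step.
        i, j = lo, hi - 1
        while i <= j:
            if self.data[i] < pivot:
                i += 1
            else:
                self.data[i], self.data[j] = self.data[j], self.data[i]
                j -= 1
        insort(self.bounds, pivot)
        self.pos[pivot] = i
        return i

    def range_query(self, low, high):
        """Return all values v with low <= v < high, cracking the column as a side effect."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


col = CrackerColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6])
print(sorted(col.range_query(5, 10)))  # [6, 7, 8, 9]
print(sorted(col.range_query(3, 8)))   # [3, 4, 6, 7]; only the piece below 10 is touched
```

Because every split is recorded, repeated or overlapping queries work on ever smaller pieces, so the column drifts towards sorted order only in the value ranges that users actually ask about.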

The CWI research group Database Architectures has applied database cracking to the SkyServer of the Sloan Digital Sky Survey, the world’s largest scientific database, containing more than three terabytes of astronomical data. On this database, cracking sped up the search process by a factor of ten to twenty. The platform for this SkyServer is the open-source database system MonetDB, which is developed at CWI and used worldwide.