PhD student Sándor Héman from CWI has developed a new method for compressing large databases, allowing data to be transported much faster from storage to the processor. He also developed efficient algorithms for making changes within such a compressed storage layout. He will defend his thesis on this topic on Wednesday 28 October at VU University Amsterdam.
Companies increasingly collect very large amounts of data and store it in database management systems. This data is constantly modified, but it is also analysed. These two tasks each call for a specific, and sometimes conflicting, database system architecture.
Data analysis is performed by the computer's processor. To do this, the data must first be transported from storage (usually a hard disk) to the processor. This transport is relatively slow and forms a bottleneck in large-scale data analysis. “I reduced this bottleneck by compressing the database to a smaller size,” Héman says. “Fast, transparent compression allows the processor to perform the data analysis without delay.”
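To give a rough idea of why lightweight compression can speed up analysis, the Python sketch below uses frame-of-reference encoding. This scheme was chosen purely for illustration, because the article does not say which compression methods Héman used. The sketch stores a column's minimum value once and every value as a one-byte offset from it, so fewer bytes have to travel from disk, and decompressing costs just one addition per value. Some aggregates, such as a sum, can even be computed without decompressing at all. The function names and sample data are hypothetical.

```python
from array import array


def for_compress(values):
    """Frame-of-reference encoding: keep the column minimum once and store
    every value as a one-byte offset from it (assumes offsets fit in 0..255)."""
    base = min(values)
    # array("B") holds unsigned bytes; it raises OverflowError if an offset is too large.
    offsets = array("B", (v - base for v in values))
    return base, offsets


def for_decompress(base, offsets):
    # Decompression costs one addition per value, cheap enough to do while scanning.
    return [base + o for o in offsets]


def for_sum(base, offsets):
    # Some aggregates need no decompression: sum(base + o) == n * base + sum(o).
    return len(offsets) * base + sum(offsets)


prices = [1000, 1003, 1001, 1010, 1002, 1007]
raw = array("q", prices)  # uncompressed: 64-bit integers
base, packed = for_compress(prices)
print(len(raw.tobytes()), len(packed.tobytes()))  # bytes to transport: raw vs. compressed
print(for_decompress(base, packed) == prices, for_sum(base, packed))
```

Real systems use more elaborate variants that also handle values falling outside the small offset range, but the trade-off is the same: a little extra processor work in exchange for far less data movement.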
However, a compressed database cannot be modified without constantly decompressing and recompressing it. To circumvent this problem, Héman introduced a technique that stores modifications differentially, as a separate set of changes kept alongside the compressed data, much like the errata for a book, so that they are readily available while the data is being read.
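The errata idea can be illustrated with a small, hypothetical Python sketch. It is not Héman's actual technique, which the article does not describe in detail. Here the base column is compressed once and never rewritten (zlib merely stands in for a real columnar compression scheme), while updates, deletions and appended rows are kept uncompressed on the side and applied on the fly during a scan. The class and method names are invented for illustration, and update and delete positions refer to rows of the compressed base.

```python
import json
import zlib


class DeltaColumn:
    """Hypothetical sketch: an immutable compressed base column plus an
    uncompressed set of 'errata' that is applied while reading."""

    def __init__(self, values):
        self._base = zlib.compress(json.dumps(values).encode())
        self._updates = {}     # base row position -> replacement value
        self._deletes = set()  # base row positions marked as deleted
        self._appends = []     # new rows, added after the base rows

    def update(self, pos, value):
        self._updates[pos] = value  # the compressed base is left untouched

    def delete(self, pos):
        self._deletes.add(pos)

    def append(self, value):
        self._appends.append(value)

    def scan(self):
        # Decompress the base, then apply the errata on the fly as rows stream out.
        base = json.loads(zlib.decompress(self._base))
        for pos, value in enumerate(base):
            if pos in self._deletes:
                continue
            yield self._updates.get(pos, value)
        yield from self._appends

    def checkpoint(self):
        # Fold the accumulated errata into a freshly compressed base.
        merged = list(self.scan())
        self._base = zlib.compress(json.dumps(merged).encode())
        self._updates.clear()
        self._deletes.clear()
        self._appends.clear()


col = DeltaColumn([10, 20, 30, 40])
col.update(1, 25)
col.delete(2)
col.append(50)
print(list(col.scan()))  # [10, 25, 40, 50]
```

Because the errata grow with every change, a system built this way would occasionally fold them into a freshly compressed copy of the data, which is what the hypothetical `checkpoint` method does here.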
Héman’s research is relevant to any domain that works with big data. This includes web search engines that crawl and index the entire web, scientific research in fields such as astronomy and genomics, and commercial applications such as the analysis of customer data. His research is also used in the Vectorwise database system. Héman co-founded Vectorwise as a CWI spin-off company in 2008; it was sold in 2011 to Actian Corporation, where it is still a successful product.
Text: VU University Amsterdam