CWI develops new tools for data exploration

PhD student Thibault Sellam from CWI has developed new tools to explore large databases. They allow explorative users to find interesting information hidden inside a database, with little or no prior knowledge of the data. He will defend his thesis on this topic on Thursday 3 November at the University of Amsterdam (UvA). Sellam carried out his research in the Database Architectures (DA) group at CWI, supervised by Prof. Martin Kersten and supported by the Dutch national program COMMIT/.

Database management systems rely on an implicit pact with the user. They give quick and correct answers, provided that they get precise and complete questions, expressed correctly in a query language such as SQL. This is a problem for data explorers who want a global overview of the database and of the interesting new facts hidden inside it, while having little or no knowledge of the data. They typically resort to trial and error, which is tedious and error-prone on large databases.

Sellam presents four database assistants to help these users compose and refine interesting new queries:

  •    Claude generates hypotheses for databases by exploiting statistical dependencies between various dimensions of the database
  •    Bleau helps users to build and refine queries by allowing them to select and project clusters of objects
  •    Ziggy shows what makes a selection of objects unique by highlighting the differences between those and the rest of the database
  •    Raimond detects and organizes text fragments describing a news event, for instance in social media data

Many of the results of this research are implemented in the R package findviews.