Advanced Database Techniques

Scientific databases: Seismology

The laboratory work in 2010 for this course consists of a sizeable project intended to expose you to a real-life system design problem: managing seismic information with the MonetDB DBMS as the target platform. Unlike ordinary labwork, which has a clearly defined task and road to success, this project is open-ended. We confront you with the hard life of a scientist by taking you through the steps a researcher has to follow.

The group is split into teams, each assigned a senior researcher for guidance and backup. The solution of the team with the most successful demonstration at the end of the semester will become part of a portal for scientific database examples.

Team name Senior Members

Fabian Groffen

Stefan Manegold/Jenny Zhang

Martin Kersten

The Scouting Phase

The first phase has a deadline of September 26, 23:59.

In the scouting phase we enter a field we do not know, i.e. seismic data gathering, storage and retrieval. A quick Google search on seismic data produces a large amount of information, much of which is hard to organize and digest. Many institutions seem to be active, collecting data and disseminating it to their clients. For example, within the Netherlands KNMI has a data center with a lot of data at its (and our) disposal (http://www.orfeus-eu.org/).

In the scouting phase we explore these internet resources and try to answer the following (incomplete list of) questions:

What is seismology all about? Give a short characterisation of the field.
What kind of data is being collected? Since when? How much? Is there a trend?
Where are the data centers providing data? How do they provide it? Are there standards for data exchange?
Are there libraries and tools that can be used in the context of a DBMS? Which would fit the MonetDB context?
What are the 10 typical queries? What queries can *not* be handled?
Is there an Entity-Relationship diagram, or relational scheme, to describe the content of the seismology repositories?
If there is no clear relational scheme, what would be a good starting point?
What would be the unique selling point for a MonetDB-enhanced seismology warehouse?
What questions did we not ask?

As far as we are aware, there is no single document answering all such questions. The purpose of the scouting phase is to build up a frame of reference and share your results with the other teams. A 5-page document should be written as a LaTeX science paper using the ACM SIG Proceedings Templates.
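To make the relational-scheme and typical-query questions more concrete, here is a minimal sketch of what a starting point could look like. Everything in it is an assumption made for illustration: the table and column names (station, event, waveform_segment), the sample rows, and the two queries are hypothetical and not taken from any seismology standard or data center. SQLite, via Python's standard sqlite3 module, is used only as a stand-in so the sketch runs anywhere; the same SQL would still have to be checked and adapted for MonetDB/SQL.

```python
import sqlite3

# Hypothetical starting-point schema. All names are illustrative assumptions.
SCHEMA = """
CREATE TABLE station (
    station_id INTEGER PRIMARY KEY,
    network    TEXT NOT NULL,
    code       TEXT NOT NULL,
    latitude   REAL NOT NULL,
    longitude  REAL NOT NULL
);
CREATE TABLE event (
    event_id    INTEGER PRIMARY KEY,
    origin_time TEXT NOT NULL,      -- ISO 8601 timestamp
    latitude    REAL NOT NULL,
    longitude   REAL NOT NULL,
    depth_km    REAL,
    magnitude   REAL
);
CREATE TABLE waveform_segment (
    segment_id     INTEGER PRIMARY KEY,
    station_id     INTEGER NOT NULL REFERENCES station(station_id),
    channel        TEXT NOT NULL,
    start_time     TEXT NOT NULL,   -- ISO 8601 timestamp
    sample_rate_hz REAL NOT NULL,
    sample_count   INTEGER NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO station VALUES (?, ?, ?, ?, ?)",
        [(1, "XX", "AAA", 52.0, 5.0), (2, "XX", "BBB", 53.0, 6.0)],
    )
    conn.executemany(
        "INSERT INTO event VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "2010-09-01T10:00:00", 52.1, 5.2, 3.0, 2.4),
            (2, "2010-09-15T04:30:00", 53.2, 6.7, 3.0, 4.1),
        ],
    )
    conn.executemany(
        "INSERT INTO waveform_segment VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 1, "BHZ", "2010-09-01T10:00:00", 40.0, 2400),
            (2, 1, "BHZ", "2010-09-15T04:30:00", 40.0, 2400),
            (3, 2, "BHZ", "2010-09-15T04:30:00", 40.0, 2400),
        ],
    )
    return conn


def strong_events(conn, min_magnitude):
    """Candidate typical query: events at or above a magnitude threshold."""
    return conn.execute(
        "SELECT event_id, magnitude FROM event "
        "WHERE magnitude >= ? ORDER BY origin_time",
        (min_magnitude,),
    ).fetchall()


def segments_per_station(conn):
    """Candidate typical query: number of stored segments per station."""
    return conn.execute(
        "SELECT s.code, COUNT(*) FROM station s "
        "JOIN waveform_segment w ON w.station_id = s.station_id "
        "GROUP BY s.code ORDER BY s.code"
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(strong_events(db, 3.0))
    print(segments_per_station(db))
```

The sketch stores only metadata about waveform segments, not the samples themselves. Whether the samples belong inside the DBMS or in external files is exactly the kind of design question the scouting report should address.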

The Design Phase

On September 28, we will share the papers, discuss them during the lecture, and derive the set of minimal requirements for each implementation track. The teams use this input to design and plan the seismic data warehouse demonstrator. It is conceivable that some portions can be shared between the teams, e.g. loading and display, but this should be negotiated to ensure that all teams carry an equal share of the burden in realising the demonstrator.

The Implementation Phase

By the beginning of November, the prototypes should have been realized and plans should be made to scale to larger datasets and online experimentation.

The labwork is graded based on the quality of the reports and the maturity of the subsequent implementation.

Questions and requests for assistance can be sent to a mailing list. The labwork can be undertaken by teams of at most 2 students. The target database platform is MonetDB/SQL, which is developed at CWI. MonetDB is distributed as Windows installers, Fedora RPMs, Debian & Ubuntu packages, and a source tarball, all of which are available from the MonetDB download area. Please consult the MonetDB website for installation and compilation advice. End-user experiences and questions can be emailed to monetdb-users@lists.sourceforge.net.