CWI Database Architectures Afternoon
On the occasion of the PhD defense of Romulo Goncalves on Friday March 22
(cf., http://www.cwi.nl/events/phd-defence-romulo-goncalves)
we are delighted to announce the following Database Architectures afternoon at CWI
with presentations by
C. Mohan (IBM Almaden Research Center), Gustavo Alonso (ETH Zurich) and Ted Dunning (MapR Technologies):
Location: Centrum Wiskunde & Informatica (CWI)
Room L017
Science Park 123
1098 XG Amsterdam
Date: Friday March 22 2013
Agenda:
14:30 - 14:35 Opening & Welcome
14:35 - 15:35 Gustavo Alonso (ETH Zurich) "Performance in the Multicore Era" (See Abstract & Bio below)
15:35 - 15:45 Coffee break
15:45 - 16:45 C. Mohan (IBM Almaden) "Implications of Storage Class Memories (SCMs) on Software and Hardware Architectures" (See Abstract & Bio below)
16:45 - 16:55 Break
16:55 - 17:55 Ted Dunning (MapR Technologies) "Fast, High Quality, Single Pass k-means Clustering" (See Abstract & Bio below)
17:55 - 18:00 Closing
Abstracts & Bios:
Performance in the Multicore Era
Gustavo Alonso, ETH Zurich
Abstract:
The pace and nature of the changes taking place at the processor and computer architecture level are a formidable challenge to system designers. Very few of the established assumptions about bottlenecks, optimizations, implementation techniques, and algorithm behavior hold when modern multicore machines are involved.
In this talk I will briefly overview some of our recent work in this area, from the optimization of operators for multicore to the design of hardware accelerators in data appliances. Then I will go in depth on an analysis of the behavior of relational join operators on multicore machines. The results of this work provide valuable insights into the problems of exploiting the parallelism inherent in multicore. I will also discuss, on the side, some of the problems one encounters when doing performance analysis work in a research community that does not enforce any standards for reporting and comparing to related work. These problems are well known, but the advent of multicore makes them even more acute.
Bio:
Gustavo Alonso is a professor at the Department of Computer Science at ETH Zurich in Switzerland. At ETHZ, he is part of the Systems Group and the Enterprise Computing Center. Gustavo has a degree in electrical engineering from the Madrid Technical University in Spain and an M.S. and Ph.D. in Computer Science from UC Santa Barbara. Before joining ETH, he worked at the IBM Almaden Research Center. Gustavo's research interests encompass almost all aspects of systems, from design to run time. Most of his research these days is related to multi-core architectures, large clusters, FPGAs, and cloud computing, with an emphasis on adapting traditional system software (OS, database, middleware) to these new hardware platforms.
Gustavo is a Fellow of the ACM and Senior Member of the IEEE. He has been awarded the AOSD 2012 Most Influential Paper Award, the VLDB 2010 Ten Year Best Paper Award, and the ICDCS 2009 Best Paper Award for work on Remote Direct Memory Access. He has served on the VLDB Endowment, on the ACM/IFIP/IEEE Middleware Steering Committee, as an associate editor of the VLDB Journal, as Chair of EuroSys, and as general chair or PC-chair/vice-chair of numerous conferences (VLDB, ICDE, Middleware, BPM, ICDCS, IEEE MDM).
Implications of Storage Class Memories (SCMs) on Software and Hardware Architectures
C. Mohan, IBM Almaden Research Center
Abstract:
Flash memories have been in widespread use for a while, but they have had some performance and reliability problems which have made them unsuitable for long-term storage of traditional database data. A new class of memory called Storage Class Memories (SCMs) is emerging, built using different technologies than flash devices. SCMs overcome many of the shortcomings of flash devices while approaching the cost of flash memories. SCMs fall in between DRAM and traditional disk storage along many dimensions (performance, cost, energy usage, ...). As a result, large SCM-based memory systems will be built. Main memory database management system (MMDBMS) companies like TimesTen and SolidDB have been around for a while, and they have been acquired recently by Oracle and IBM, respectively. SCMs will permit the databases managed by MMDBMSs to be very large while being cheaper than those using only DRAM. SCMs may be viewed as disks or as memory from an architectural perspective. Depending on the viewpoint, the implications for DBMS architectures will be very different. Some preliminary ideas on the usage of a small amount of non-volatile memory, realized using battery-backed DRAM, were presented in a paper design called Safe RAM at VLDB 1989. Technology has evolved tremendously in two decades and it is time for us to revisit system architectures.
At IBM Research, we have been working on multiple projects to understand the implications of SCMs on software and hardware architectures in general and on DBMS architectures in particular. Traditional locking, recovery, storage management and query processing ideas would need to be extended to take advantage of SCMs. In this talk, I will discuss what we have learnt from our investigations and what needs to be further explored. I believe this presentation will generate a lot of discussion and debate. This talk should be of interest to hardware, software, systems and storage people in both industry and academia.
Bio:
Dr. C. Mohan has been an IBM researcher for 30 years in the information management area, impacting numerous IBM and non-IBM products, the research community and standards, especially with his invention of the ARIES family of locking and recovery algorithms, and the Presumed Abort commit protocol. This IBM, ACM and IEEE Fellow has also served as the IBM India Chief Scientist. In addition to receiving the ACM SIGMOD Innovation Award, the VLDB 10 Year Best Paper Award and numerous IBM awards, he has been elected to the US and Indian National Academies of Engineering, and has been named an IBM Master Inventor. This distinguished alumnus of IIT Madras received his PhD from the University of Texas at Austin. He is an inventor of 38 patents. He serves on the advisory board of IEEE Spectrum and on the IBM Software Group Architecture Board’s Council. More information can be found on his home page at http://bit.ly/cmohan
Fast, High Quality, Single Pass k-means Clustering
Ted Dunning, MapR Technologies
Abstract:
I will describe an implementation of recent results that provide high quality k-means clustering at very high speed. For well-clusterable data, this algorithm provides good bounds on quality, but practically speaking, it makes clustering feasible in many applications by providing a speedup of roughly 3 orders of magnitude relative to the standard algorithm based on Lloyd's initial efforts. In addition, the algorithm is highly amenable to implementation using map-reduce and shows essentially linear speedup. Just as significant, this new algorithm allows clustering with a very large number of clusters, which makes it practical to use as a feature extraction algorithm or as a setup step for a nearest neighbor search.
I will provide an outline of how and why this class of algorithms works so well, and demonstrate the practical aspects of the implementation. I will also describe how this algorithm can be integrated into an industrial scale nearest neighbor implementation.
Bio:
Ted Dunning has been involved with a number of startups, the latest being MapR Technologies, where he is Chief Application Architect working on advanced Hadoop-related technologies. He is also a PMC member for the Apache Zookeeper and Mahout projects. Opinionated about software and data-mining and passionate about open source, he is an active participant in the Hadoop and related communities and loves helping projects get going with new technologies.
Prior to his work in the startup world, he worked as a researcher at the Computing Research Laboratory in New Mexico under Yorick Wilks. His published work includes advances in statistical natural language processing, genomics and machine translation. One paper in particular, on the statistics of rare coincidences, has been widely cited academically and used in practical systems.

