Speakers: CWI Lectures on Database Research (2020)

Video recordings and the presentations of the event are now available to view.

* Anastasia Ailamaki (EPFL, Switzerland)


Nothing is for granted: Making wise decisions using real-time intelligence
With today’s ever-growing demand for fast data analytics, heterogeneity severely undermines performance. On one hand, data format variety either forces people to load their data into a single format, spending tons of resources and often losing valuable structural information, or requires a separate database system for each data type plus an integration tool to bring all the results together. Both options are costly and waste valuable resources. On the other hand, “franken-chips” equipped with different types of potent compute units are severely under-utilised when running data analytics, as we’re used to coding with a CPU in mind and other core types are employed opportunistically, as an accelerating luxury. Nevertheless, hardware roadmaps indicate increasing levels of compute heterogeneity, and accelerator-level parallelism (ALP) is indeed the new way to make the best out of any hardware platform. Writing programs that are both fast and portable, however, remains an unsolved tradeoff. Real-time intelligence makes decisions during execution, when all relevant information is available for optimal utilisation of resources. I will show how just-in-time data virtualisation and code generation technologies can be used to execute queries fast across all kinds of data without costly preparation or heavy installations, as well as enable excellent utilisation of different hardware devices.

Bio:
Anastasia Ailamaki is a Professor of Computer and Communication Sciences at the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland and the co-founder of RAW Labs SA, a Swiss company developing real-time analytics infrastructures for heterogeneous big data. She earned a Ph.D. in Computer Science from the University of Wisconsin-Madison in 2000. She works on strengthening the interaction between the database software and emerging hardware and I/O devices, and on automating data management to support computationally-demanding, data-intensive scientific applications. She has received the 2019 ACM SIGMOD Edgar F. Codd Innovations Award and the 2020 VLDB Women in Database Research Award. She is also the recipient of an ERC Consolidator Award (2013), a Finmeccanica endowed chair from the Computer Science Department at Carnegie Mellon (2007), a European Young Investigator Award from the European Science Foundation (2007), an Alfred P. Sloan Research Fellowship (2005), an NSF CAREER award (2002), and ten best-paper awards in database, storage, and computer architecture conferences. She is an ACM fellow, an IEEE fellow, the Laureate for the 2018 Nemitsas Prize in Computer Science, and an elected member of the Swiss, the Belgian, and the Cypriot National Research Councils. She is a member of the Academia Europaea and of the Expert Network of the World Economic Forum.

* Gustavo Alonso (ETH, Switzerland)


Data Processing in the Era of Specialization
Current trends in workloads, cloud computing, as well as in hardware have led to a new era of specialization. Increasingly, the needs of different applications are being served through specialized systems tailored to those applications as a way to better cope with demanding throughput and/or latency requirements. The advent of machine learning as one of the main drivers for data processing has also accelerated the trend towards specialization and not only at the software level but also at the hardware level. In this talk I will describe our efforts to explore specialization from a data processing perspective, including our new hardware platform (Enzian) and the software infrastructure we are building on top of it to facilitate data processing.

Bio:
Gustavo Alonso is a Professor of Computer Science at ETH Zürich where he is a member of the Systems Group (www.systems.ethz.ch). He has a degree in electrical engineering from the Madrid Technical University as well as M.S. and Ph.D. degrees in Computer Science from UC Santa Barbara. Gustavo's research interests encompass almost all aspects of systems, from design to run time. He works on distributed systems, data processing in data centers and the cloud, as well as hardware acceleration using FPGAs. Gustavo has received numerous awards for his work, including four Test-of-Time awards for contributions to databases, programming languages, mobile computing, and systems. He is a Fellow of the ACM and of the IEEE as well as a Distinguished Alumnus of the Department of Computer Science of UC Santa Barbara.

* Gerhard Weikum (MPI for Informatics, Germany)


Machine Knowledge: How I Stopped Worrying about Databases and Started Loving the Web

Thirty years ago, databases were premium assets in the digital world upon which enterprises and public services relied. My career started in this vibrant research community, with a focus on performance and dependability. At that time, AI had the grand vision of equipping machines with comprehensive world knowledge to power intelligent applications. Back then, this looked like an unreachable goal.

The rapid growth of rich contents on the Web and the advent of Wikipedia became game-changing factors and brought the formerly elusive goal into reach. Over the last fifteen years, huge knowledge bases, also known as knowledge graphs, have been automatically constructed from Web data and text sources, and have become a key asset for search, analytics, recommendations, language processing and data integration. A large part of my work has been devoted to this theme. This talk reviews these advances and discusses lessons learned and new research opportunities.

So it seems that I left the database research community. The truth, however, is that the notion of databases has changed. Today, it comprises Web tables, data lakes and semi-structured contents, all of which play a major role in constructing and curating knowledge bases. The themes of data, Web contents and knowledge discovery have become intertwined, and the community at large keeps thriving.

Bio:
Gerhard Weikum is a Scientific Director at the Max Planck Institute for Informatics in Saarbruecken, Germany, and an Adjunct Professor at Saarland University.
He co-authored a comprehensive textbook on transactional systems, received the VLDB Test-of-Time Award 2002 for his work on automatic database tuning, and is one of the creators of the YAGO knowledge base which was recognized by the WWW Test-of-Time Award in 2018.
Weikum received the ACM SIGMOD Contributions Award in 2011, a Google Focused Research Award in 2011, an ERC Synergy Grant in 2014, and the ACM SIGMOD Edgar F. Codd Innovations Award in 2016.

* Patrick Valduriez (Inria & LeanXcale, France)

Distributed Database Systems: the case for NewSQL

NewSQL [Valduriez & Jimenez-Peris 2019] is the latest technology in the big data management landscape, enjoying fast growth in the DBMS and BI markets. NewSQL combines the scalability and availability of NoSQL with the consistency and usability of SQL. By blending capabilities previously available only in different kinds of database systems, such as fast data ingestion and SQL queries, and by providing online analytics over operational data, NewSQL opens up new opportunities in many application domains where real-time decision making is critical. Important use cases are eAdvertisement (such as Google Adwords), IoT, performance monitoring, proximity marketing, risk monitoring, real-time pricing, real-time fraud detection, etc.
NewSQL may also simplify data management, by removing the traditional separation between NoSQL and SQL (ingest data fast, query it with SQL), as well as between operational database and data warehouse / data lake (no more ETLs!). However, a hard problem is scaling out transactions in mixed operational and analytical (HTAP) workloads over big data, possibly coming from different data stores (HDFS, SQL, NoSQL). Today, only a few NewSQL systems have solved this problem. In this talk, I introduce the solution for scalable transaction and polystore data management in LeanXcale, a recent NewSQL DBMS.

Bio:

Patrick Valduriez is a senior scientist at Inria, France, and the scientific advisor of the LeanXcale company. He has also been a professor of computer science at University Pierre et Marie Curie (UPMC), now Sorbonne University, in Paris (2000-2002) and a researcher at Microelectronics and Computer Technology Corp. in Austin, Texas (1985-1989).

He is currently the head of the Zenith team that focuses on data science, in particular, scientific data management. He has authored and co-authored many technical papers and several textbooks, among which “Principles of Distributed Database Systems” (with Professor Tamer Özsu, University of Waterloo). He has served as PC chair of major conferences such as SIGMOD and VLDB. He was the general chair of SIGMOD04, EDBT08 and VLDB09.

He has received several prestigious awards and prizes, including best paper awards such as the one at VLDB00. He was the recipient of the 1993 IBM scientific prize in Computer Science in France and the 2014 Innovation Award from the French Academy of Science. He is an ACM Fellow.