Dutch Seminar on Data Systems Design

The next edition of our DSDSD series, introducing three speakers who will deliver the upcoming talks. We are going to have our regular monthly session on Friday, July 7th 2023 at 3:30pm (2 talks), plus one additional session on Thursday, 6th July at 11am (1 talk). The seminar will be held via Zoom.

When
6 Jul 2023 from 11 a.m. to 6 Jul 2023 noon CEST (GMT+0200)
Where
The seminar will be held via Zoom
Web
Add

I am pleased to announce the next edition of our DSDSD series, introducing three speakers who will deliver the upcoming talks. We are going to have our regular monthly session on Friday, July 7th 2023 at 3:30pm (2 talks), plus one additional session on Thursday, 6th July at 11am (1 talk).

The seminar will be held via Zoom. The link will be sent separately in two following e-mails (one for Thursday, one for Friday).

Please see below for details on the talk and the speaker on Thursday.

-----------------------------------------------------------------

1st talk - Thursday 11am - Semih Salihoglu

Title: Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs

Abstract: This paper is an experimental and analytical study of two classes of summary-based cardinality estimators that use statistics about input relations and small-size joins in the context of graph database management systems: (i) optimistic estimators that make uniformity and conditional independence assumptions; and (ii) the recent pessimistic estimators that use information theoretic linear programs (LPs). We begin by analyzing how optimistic estimators use pre-computed statistics to generate cardinality estimates. We show these estimators can be modeled as picking bottom-to-top paths in a cardinality estimation graph (CEG), which contains sub-
queries as nodes and edges whose weights are average degreestatistics. We show that existing optimistic estimators have either undefined or fixed choices for picking CEG paths as their estimates and ignore alternative choices. Instead, we outline a space of optimistic estimators to make an estimate on CEGs, which subsumes existing estimators. We show, using an extensive empirical analysis, that effective paths depend on the structure of the queries. While on acyclic queries and queries with small-size cycles, using the maximum-weight path is effective to address the well known underestimation problem, on queries with larger cycles these estimates tend to overestimate, which can be addressed by using minimum weight paths. We next show that optimistic estimators and seemingly disparate LP-based pessimistic estimators are in fact connected. Specifically, we show that CEGs can also model some recent pessimistic estimators. This connection allows us to adopt an optimization from pessimistic estimators to optimistic ones, and provide insights into the pessimistic estimators, such as showing that they have combinatorial solutions.

Bio: Semih Salihoglu is an Associate Professor at University of Waterloo. His research focuses on developing systems for managing, querying, or doing analytics on graph-structured data. His main on-going systems projects include Graphflow (http://graphflow.io/), which is a new graph database management system that integrates novel storage, indexing and query processing techniques, and GraphSurge (https://github.com/dsg-uwaterloo/graphsurge), which is a system designed to run batch computations over multiple graph views with significant computation sharing. He holds a PhD from Stanford University and is a recipient of the 2018 VLDB best paper award.

-----------------------------------------------------------------