I am pleased to announce the next edition of our DSDSD series and to introduce the two speakers who will deliver the upcoming talks. The next event will be on Friday, June 2nd, 2023, from 3:30pm to 5:00pm CET, featuring talks by Laurens Kuiper and Andrew Lamb.
The seminar will be held via Zoom.
Please see below for details on the talks and the speakers.
-----------------------------------------------------------------
1st talk
Title: Shredding deeply nested JSON, one vector at a time
Abstract:
JSON is a popular semi-structured data format. Despite being semi-structured, users often want to analyze it in a structured way, e.g., by analyzing JSON log files to find out what their users are doing. Analytical database systems would be the tool of choice for this, but these systems often cannot process semi-structured data or the nested data such as OBJECTs and ARRAYs found in JSON. DuckDB, however, supports efficient columnar STRUCT and LIST types and, therefore, supports the same nestedness as JSON. Since 0.7.0, DuckDB supports reading JSON files directly as if they were tables, with automatic schema detection. In this talk, I will explain how DuckDB reads JSON and transforms it into vectors for efficient analytics.
Bio: Laurens Kuiper is a PhD Student at the Database Architectures group at CWI in Amsterdam. He is also a Software Developer at DuckDB Labs. His research interests include OLAP systems, specifically graceful performance degradation when data sizes are larger than memory.
-----------------------------------------------------------------
2nd talk
Title: Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust
Abstract:
It is easier than ever to build new analytic database systems. The trend towards deconstructed databases, high performance interchange standards, and high quality open source components means that cutting edge performance and connectivity are possible without building everything from scratch in a tightly integrated database system. In this talk, we will describe some key technologies such as Apache Arrow, Parquet, DataFusion, and Arrow Flight, and describe how we use them in InfluxData's new database system, InfluxDB IOx <https://www.influxdata.com/blog/influxdb-engine/>.
Bio: Andrew Lamb is a Staff Engineer at InfluxData, working on InfluxDB IOx, and a member of the Apache Arrow PMC. His experience ranges from startups such as Vertica to large multinational corporations and distributed open source projects, and he has paid leadership dues as an architect and VP. He holds an SB and MEng from MIT in Electrical Engineering and Computer Science.
-----------------------------------------------------------------
We look forward to seeing you all at the next session!
Ilaria, on behalf of the CWI Database Architectures team