The Database Architectures (DA) research group specializes in software architecture for large-scale data systems, and is well known in both industry and academia for its work on column stores, vectorized execution and, more recently, DuckDB. An MSc project here is often a stepping stone to a PhD track or a job at a tech company. The group has extensive expertise in the interaction between computer architecture and database architecture. The DA group builds large software systems and has been involved in more than five spin-off companies in the past decade.
Options for MSc projects
One option is to do an internal MSc project at CWI. In this case, you will work from CWI at Science Park, and you will also be entitled to a CWI internship grant of approximately EUR 400 per month if your average grade in the MSc program is 8 or higher.
Peter Boncz is also a professor at VU, so he would be your primary MSc advisor there; he can also act as co-advisor for VU students when another CWI researcher is the primary advisor. We are open to students from other universities as well; you then need a co-advisor from that university. The DA group also has connections in Leiden (Stefan Manegold) and Nijmegen (Hannes Mühleisen), who can act as advisors for students from these universities. Peter Boncz is a Fellow at TU Munich, and we have MSc students from there as well.
How to apply
We are selective in accepting MSc students. The first application step is to send your CV and grade list (boncz@cwi.nl); the next is to come over to CWI and talk about any of the topics in the list below, which is grouped by topic area (note: all require C++ programming skills):
Graph-related (DuckPGQ - Daniel):
- C1: Graph Algorithms Library
- C2: Optimizing GraphRAG using DuckPGQ
Incremental view maintenance (OpenIVM - Ilaria):
- C3: Extending Incremental View Maintenance on DuckDB
- C4: Adaptive Optimization For Incremental View Maintenance
- C5: Logical Plan to SQL String
- C6: View Matching
Secure & Private Data Management (Lotte / Ilaria):
- C7: Differential Privacy in DuckDB
- C8: Streaming Joins in SIDRA SQL
- C9: Smart Compilation for SIDRA SQL
- C10: Vectorized Encryption for FastLanes on the GPU
- C11: Evaluating Trusted Execution Environments
- C12: DuckDB on Intel SGX
Vector Search (Leonardo, PDX):
- C13: Vector Search in DuckLake
- C14: Blazing Fast Vector Search on GPUs
- C15: Towards a novel Vector Similarity Join algorithm
Core DB topics (Paul & Pedro):
- C16: Execution on Compressed Strings in DuckDB
- C17: Run-time Optimized Join Hash Tables in DuckDB
- C18: Increasing CSV Robustness in DuckDB
- C19: Variable Integer (VARINT) Types in DuckDB
The new FastLanes file format (Omid):
- C20: Advanced Arrow and DuckDB Decoding
- C21: Learned Compression
MSc projects at MotherDuck
There is a close collaboration between the Database Architectures research group and MotherDuck, which stems from the fact that MotherDuck’s Amsterdam office started at CWI; they have since moved to a location on the Oostelijke Handelskade.
Peter Boncz is an advisor at MotherDuck and will spend time in its new office.
This collaboration gives rise to opportunities for MSc projects, such as:
- M1: Adaptive Data-Clustering in DuckLake
- M2: Adapt DuckDB’s storage format to be more S3 friendly
- M3: Safe Python UDFs in MotherDuck
- M4: Query-aware DuckDB memory allocator
MSc projects at Databricks Amsterdam
We also have projects at Databricks Amsterdam, with which CWI collaborates as well. Doing an internship there means you will be employed (and paid) in their office near RAI station. In addition to sending us your CV and grades, you will have to pass the Databricks interview process (US tech company style), as they seek MSc interns who could become Databricks engineers afterwards.
List of Databricks topics:
- TBD
In the Databricks section you will find a description (by Databricks) of their approach and team, and the qualifications they expect from you.