FastLanes: redesigning data files for faster analytics

Computer scientist Azim Afroozeh redesigned data compression and storage so that large datasets can be analyzed faster while taking up less space. His work led to FastLanes, a new file format tuned for modern processors. On 9 January 2026, he defends his PhD thesis at Vrije Universiteit Amsterdam.

Many widely used analytics formats, such as Parquet, were designed when computer processors worked in a more sequential way. Today’s machines can do much more work at the same time. A CPU (the main all-purpose processor) has SIMD instructions (Single Instruction, Multiple Data) that can process many data items at once, while a GPU (originally developed for graphics) contains a much larger number of simpler cores that excel at performing the same operation on large numbers of data items simultaneously.

Because older file formats are not organized to “feed” these processors efficiently or use their full capacity to process data at the same time, computers spend time waiting, computing power is wasted, and data analysis slows down.

Working prototype

Afroozeh’s core finding is that files can be both smaller and faster to read when their data is stored in a layout that matches how modern processors work. This makes it possible to decode and process thousands of values in parallel, with fewer bottlenecks.

His approach combines careful analysis of real datasets and processor behaviour with extensive engineering. He designed lightweight, fast-to-decode compression methods and new data layouts and implemented them in portable C++ (high-performance code that runs across many different computer systems). The approach was tested on platforms ranging from Intel, AMD and Apple processors to cloud hardware and NVIDIA GPUs. The result is a working prototype of the FastLanes file format.
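To give a feel for what a "lightweight, fast-to-decode" compression method can look like, here is a minimal sketch of frame-of-reference encoding with bit-packing, a well-known textbook technique. It is a generic illustration and not the FastLanes layout or code; the function names `for_pack` and `for_unpack` are hypothetical, and the sketch assumes a non-empty list of integers.

```python
def for_pack(values):
    """Frame-of-reference: keep the minimum once, pack small offsets tightly."""
    base = min(values)                      # assumes a non-empty list of ints
    offsets = [v - base for v in values]    # all offsets are >= 0
    bit_width = max(offsets).bit_length()   # bits needed for the largest offset
    packed = 0
    for i, off in enumerate(offsets):
        # place each offset in its own fixed-width slot
        packed |= off << (i * bit_width)
    return base, bit_width, len(values), packed


def for_unpack(base, bit_width, count, packed):
    """Decode every value with the same independent shift, mask and add."""
    mask = (1 << bit_width) - 1
    return [base + ((packed >> (i * bit_width)) & mask) for i in range(count)]


values = [100_003, 100_000, 100_007, 100_001]
base, bit_width, count, packed = for_pack(values)
decoded = for_unpack(base, bit_width, count, packed)
```

Each value is recovered by the same few operations, independently of its neighbours. That property is what lets hardware with SIMD instructions or many GPU cores decode many values at once; this pure-Python version only shows the logic, not the speed.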

Open source

In the experiments reported in the thesis, FastLanes achieved substantial performance gains: for example, up to 40 times faster than Parquet on an Apple M1 processor, while also improving the compression ratio in that setting. To make FastLanes practical for real-world datasets, Afroozeh also developed ALP, a lightweight, lossless compression method for numerical (floating-point) columns that is designed to decode extremely quickly. Both FastLanes and ALP were released as open source, so that others can reproduce the results and build on the work.
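As a rough illustration of how floating-point numbers can be compressed without losing information, the sketch below stores decimal-looking values as integers and keeps any value that does not survive the round trip as an exception. This is a simplified, generic example and not the ALP implementation; the function names and the fixed `exponent` parameter are assumptions made for illustration.

```python
import math
import struct


def _bits(x):
    # exact 64-bit representation, so the comparison is bit-for-bit
    return struct.pack("<d", x)


def encode_decimals(values, exponent):
    """Store each float as an integer scaled by 10**exponent when that is lossless."""
    factor = 10 ** exponent
    ints, exceptions = [], {}
    for i, v in enumerate(values):
        scaled = v * factor
        if math.isfinite(scaled):
            n = round(scaled)
            if _bits(n / factor) == _bits(v):
                ints.append(n)          # decodes back to exactly the same float
                continue
        ints.append(0)                  # placeholder; real value kept separately
        exceptions[i] = v
    return ints, exceptions


def decode_decimals(ints, exceptions, exponent):
    """Decoding is one division per value, plus patching the exceptions."""
    factor = 10 ** exponent
    out = [n / factor for n in ints]
    for i, v in exceptions.items():
        out[i] = v
    return out


values = [1.5, 20.25, 0.1, 3.141592653589793]
ints, exceptions = encode_decimals(values, exponent=2)
decoded = decode_decimals(ints, exceptions, exponent=2)
```

The resulting small integers can then be compressed with integer techniques, while the exceptions guarantee that no value is altered. How ALP chooses its parameters and handles such cases is described in the thesis itself.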

Portrait of Azim Afroozeh, placed in front of an image of a corridor in a data center, filled with rows of servers as blinking lights indicate constant processing.

PhD defence details

Azim Afroozeh: 'FastLanes: A Next-Gen File Format'

Date of the PhD defence: 9 January 2026

Location: Vrije Universiteit Amsterdam

Supervisor: Peter Boncz (CWI/VU)

Co-supervisor: Hannes Mühleisen (CWI/Radboud University)

Link to the content of the thesis.

Header photo: Shutterstock
Second photo: Shutterstock (background), CWI/Minnie Middelberg