Real-time, immersive telecommunication systems are quickly becoming a reality, thanks to advances in acquisition, transmission, and rendering technologies. Point clouds in particular serve as a promising representation in such systems, offering photorealistic rendering capabilities with low complexity. Further development of transmission, coding, and quality evaluation algorithms, though, is currently hindered by the lack of publicly available datasets that represent realistic scenarios of real-time remote communication between people.
Sample frames from the point cloud sequences released with this dataset.
To bridge this gap, the DIS group has released a dynamic point cloud dataset that depicts humans interacting in social eXtended Reality (XR) settings. In particular, audio-visual data (RGB + depth + infrared + synchronized audio) were captured and released for a total of 45 unique sequences of people performing scripted actions. The screenplays for the human actors were devised to simulate a variety of common use cases in social XR, namely: (i) education and training, (ii) healthcare, (iii) communication and social interaction, and (iv) performance and sports. Moreover, diversity in gender, age, ethnicity, materials, textures, and colors was also considered.
The capturing system was composed of commodity hardware to better resemble realistic setups that are easy to replicate. The dataset complements existing ones: it uses the latest generation of commercial depth-sensing devices and records lifelike human behavior in social contexts at reasonable quality. The release provides annotated raw material, the resulting point cloud sequences, and an auxiliary software toolbox to acquire, process, encode, and visualize data suitable for real-time applications.
Illustration of RGB (1st row) and Depth (2nd row) raw data captured by our camera arrangement, and corresponding point cloud frames (3rd row) that are generated offline.
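The offline generation step illustrated above back-projects each depth pixel into 3D space and colors it from the aligned RGB image. The following is a minimal sketch of that idea using the standard pinhole camera model; the function name, intrinsics, and depth scale are illustrative assumptions, not the toolbox's actual API.

```python
import numpy as np

def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project a depth map (H x W, raw sensor units) and an aligned
    RGB image (H x W x 3) into a colored point cloud.

    fx, fy, cx, cy are the pinhole intrinsics of the depth camera;
    depth_scale converts raw depth units to meters (hypothetical default).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float32) * depth_scale      # raw units -> meters
    x = (u - cx) * z / fx                           # pinhole back-projection
    y = (v - cy) * z / fy
    valid = z > 0                                   # drop pixels with no depth
    points = np.stack([x[valid], y[valid], z[valid]], axis=-1)
    colors = rgb[valid]
    return points, colors
```

In a multi-camera rig like the one shown, each camera's partial cloud would additionally be transformed by its extrinsic calibration before the clouds are fused into a single frame.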
This dataset can be useful for different applications and computer science communities. For example:
- The point cloud data can serve as original content for the development of algorithms and models for immersive multimedia systems, such as post-processing, compression, streaming, user adaptation, and quality evaluation.
- The volumetric videos combined with the synchronized audio can facilitate research communities focused on the human behavioral aspects of social interactions in XR applications, which we consider a particular strength of this release.
- The raw captured data can serve as a benchmark for multiple areas of computer vision and image processing, such as calibration, alignment, outlier detection, reconstruction, inpainting, and geometry and/or color smoothing and enhancement.
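To experiment with any of the applications above, the released point cloud frames first have to be loaded. As a sketch, here is a minimal reader for ASCII PLY files with `x y z` coordinates followed by RGB color properties, a common layout for point cloud frames; the exact header of the released files may differ, so this is an assumption to adapt, not the dataset's guaranteed format.

```python
import numpy as np

def read_ascii_ply(path):
    """Minimal reader for an ASCII PLY point-cloud frame whose vertex
    properties are x y z followed by three 8-bit color channels.
    Assumed layout; check the header of the actual files before use."""
    with open(path) as f:
        assert f.readline().strip() == "ply", "not a PLY file"
        n_vertices = 0
        line = ""
        while line != "end_header":
            line = f.readline().strip()
            if line.startswith("element vertex"):
                n_vertices = int(line.split()[-1])
        # Each remaining row: x y z r g b
        data = np.loadtxt(f, max_rows=n_vertices, ndmin=2)
    points = data[:, :3].astype(np.float32)
    colors = data[:, 3:6].astype(np.uint8)
    return points, colors
```

For real workloads, a dedicated library (or the toolbox shipped with the release) is preferable, since the dataset's frames may be stored in binary PLY or a compressed format for which a hand-rolled ASCII parser is insufficient.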
Physical camera arrangement used to record the dataset.
The dataset and software releases will be presented during the ACM Multimedia Systems conference on September 29th, in the Open Dataset and Software / Demo and Industry session. They can be downloaded from the website below, under the Downloads section.
- Video about the dataset
- Article: Ignacio Reimat, Evangelos Alexiou, Jack Jansen, Irene Viola, Shishir Subramanyam and Pablo Cesar. 2021. CWIPC-SXR: Point Cloud dynamic human dataset for Social XR. Proceedings of the ACM Multimedia Systems Conference (MMSys’21), September 28 - October 1, 2021, Istanbul, Turkey.
Distributed and Interactive Systems (DIS) group
CWI’s DIS research group focuses on facilitating and improving the way people use interactive systems and how people communicate with each other. We combine data science with a strong human-centric, empirical approach to understand the experience of users. This enables us to design and develop next generation intelligent and empathic systems. We base our results on realistic testing grounds and data sets, and embrace areas such as ubiquitous computing, human-centered multimedia systems, and languages.