Fine-Grained Temporal Emotion Recognition in Video

CWI PhD student Tianyi Zhang will defend his PhD thesis "On Fine-Grained Temporal Emotion Recognition in Video: How to Trade off Recognition Accuracy with Annotation Complexity?" at TU Delft on 3 October.

Publication date: 03-10-2022

Emotions play an important role in users’ selection and consumption of video content. According to the gratification theory, emotions influence users’ video selection either directly, by providing gratifying experiences, or indirectly, by contributing to their cognitive and social needs. Fine-grained emotion recognition is the process of automatically identifying users’ emotions at a fine level of granularity, typically in segments of 0.5 s to 4 s, in line with how long emotions last. Recognizing emotions at this granularity makes it possible to add an emotion layer to a video that shows how users’ emotions change over time. Such a layer can help content providers better understand users’ emotions towards their products and adjust the content accordingly.

The thesis “On Fine-Grained Temporal Emotion Recognition in Video: How to Trade off Recognition Accuracy with Annotation Complexity?” by CWI PhD student Tianyi Zhang of the Distributed and Interactive Systems (DIS) research group explores new machine learning algorithms for fine-grained emotion recognition. Most previous work on fine-grained emotion recognition requires fine-grained emotion labels to train the recognition algorithm. However, the experiments needed to collect these labels are usually costly and time-consuming. The thesis therefore investigates whether users’ emotions can be predicted accurately at a fine level of granularity when only a limited number of ground-truth emotion labels is available for training.

For his thesis, Tianyi developed machine learning algorithms that use weakly supervised learning and few-shot learning to obtain fine-grained, segment-by-segment estimates of users’ emotions. He found that with fully supervised training, deeper or more sophisticated recognition networks overfit because of the temporal mismatch between physiological signals and fine-grained self-reports. Weakly supervised learning avoids this overfitting and can train the network with fewer annotations. However, it can only distinguish the post-stimuli emotions from emotions labelled as neutral, and it assigns all other emotions to this neutral label. Few-shot learning can predict more emotions while using fewer annotated signals. However, because the training samples are limited, it achieves high accuracy only with subject-dependent models.

More information

Everyone is welcome to attend the public defence of Tianyi Zhang’s thesis “On Fine-Grained Temporal Emotion Recognition in Video: How to Trade off Recognition Accuracy with Annotation Complexity?” on 3 October from 12:00 at TU Delft.

Promotor: Prof. dr. Alan Hanjalic

Promotor: Prof. dr. Pablo Cesar

Daily Supervisor: Dr. Abdallah El Ali

Relevant publications

  • Tianyi Zhang, Abdallah El Ali, Alan Hanjalic, and Pablo Cesar. 2022. Few-shot Learning for Fine-grained Emotion Recognition using Physiological Signals. IEEE Transactions on Multimedia. https://doi.org/10.1109/TMM.2022.3165715
  • Tianyi Zhang, Abdallah El Ali, Chen Wang, Alan Hanjalic, and Pablo Cesar. 2022. Weakly-supervised Learning for Fine-grained Emotion Recognition using Physiological Signals. IEEE Transactions on Affective Computing. https://doi.org/10.1109/TAFFC.2022.3158234
  • Tianyi Zhang, Abdallah El Ali, Chen Wang, Alan Hanjalic, and Pablo Cesar. 2020. CorrNet: Fine-grained emotion recognition for video watching using wearable physiological sensors. Sensors. https://www.mdpi.com/1424-8220/21/1/52
  • Tianyi Zhang, Abdallah El Ali, Chen Wang, Alan Hanjalic, and Pablo Cesar. 2020. RCEA: real-time, continuous emotion annotation for collecting precise mobile video ground truth labels. ACM CHI. https://dl.acm.org/doi/abs/10.1145/3313831.3376808
  • Tianyi Zhang, Abdallah El Ali, Chen Wang, Xintong Zhu, and Pablo Cesar. 2019. CorrFeat: correlation-based feature extraction algorithm using skin conductance and pupil diameter for emotion recognition. International Conference on Multimodal Interaction. https://dl.acm.org/doi/10.1145/3340555.3353716
  • Tianyi Zhang. 2019. Multi-modal Fusion Methods for Robust Emotion Recognition using Body-worn Physiological Sensors in Mobile Environments. International Conference on Multimodal Interaction. https://dl.acm.org/doi/fullHtml/10.1145/3340555.3356089