Speech Emotion Recognition

Introduction

Speech Emotion Recognition (SER) is a subfield of speech processing and affective computing that aims to identify human emotions from speech signals. With the rapid development of deep learning, SER has made significant progress, enabling practical applications such as mental health monitoring, customer service analysis, human-computer interaction, and enhanced voice assistants.

Our research group applies machine learning and deep learning techniques, combined with acoustic features and linguistic knowledge, to develop high-performance SER systems. We also investigate methods for constructing emotional speech datasets and for building robust models for cross-corpus and cross-language emotion recognition.

Contact: Dr. Nguyen Thi Thu Trang | ✉️ trangntt@soict.hust.edu.vn

Research Directions

Multimodal Emotion Recognition: Combining speech, text, and facial expressions to improve emotion recognition accuracy. We investigate fusion techniques and attention mechanisms to leverage complementary information from multiple modalities.
Cross-corpus Emotion Recognition: Developing robust models that can generalize across different datasets and recording conditions. We address domain adaptation and transfer learning challenges in SER.
Real-time Emotion Detection: Building efficient models that detect emotions in streaming audio. We focus on lightweight architectures and optimization techniques for deployment on edge devices.
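The attention-based fusion mentioned under Multimodal Emotion Recognition can be illustrated with a minimal sketch. It assumes each modality (speech, text, facial expressions) has already been encoded into a fixed-length embedding of the same dimension, and it scores the modalities with a single vector `query` (random here; in a real system it would be learned jointly with the encoders). The function name `attention_fusion` and all values are hypothetical and do not describe the group's actual models.

```python
import numpy as np


def attention_fusion(embeddings: np.ndarray, query: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Fuse per-modality embeddings using softmax attention weights.

    embeddings: shape (num_modalities, dim), one row per modality,
                assumed to be already projected into a shared dimension.
    query:      shape (dim,), a scoring vector (hypothetical; learned in practice).
    Returns the fused vector, shape (dim,), and the attention weights,
    shape (num_modalities,).
    """
    scores = embeddings @ query            # one relevance score per modality
    scores = scores - scores.max()         # subtract the max for numerical stability
    exp_scores = np.exp(scores)
    weights = exp_scores / exp_scores.sum()  # softmax over modalities
    fused = weights @ embeddings           # attention-weighted sum of embeddings
    return fused, weights


# Usage with random stand-in embeddings for speech, text, and face (illustrative only).
rng = np.random.default_rng(0)
dim = 8
embeddings = rng.normal(size=(3, dim))
query = rng.normal(size=dim)
fused, weights = attention_fusion(embeddings, query)
print("attention weights (speech, text, face):", np.round(weights, 3))
print("fused embedding shape:", fused.shape)
```

Because the weights come from a softmax, they are positive and sum to one, so the fused vector stays a convex combination of the modality embeddings; a modality judged more informative for a given utterance simply receives a larger share.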