Automatic Speaker Verification

Our AI team focuses on cutting-edge artificial intelligence research and applications.

Introduction

Automatic Speaker Verification (ASV) is a subfield of speech processing and biometrics that aims to verify a person's claimed identity from speech signals. With the rapid development of deep learning, ASV has advanced considerably and now supports a range of practical applications, including biometric authentication for e-banking, secure access control, forensic speaker identification, and personalized voice assistant services.

Our research group applies machine learning and deep learning techniques, combined with acoustic features and speaker-specific embeddings, to develop high-performance ASV systems. We also investigate methods for constructing large-scale, multi-genre speech datasets for the Vietnamese language, such as the VoxVietnam, Vietnam Celeb, and VSASV corpora. Furthermore, we build robust spoof detection models, including countermeasures against replay, text-to-speech, and voice conversion attacks, to keep systems secure in cross-corpus and real-world scenarios.
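To make the verification step concrete, here is a minimal sketch of how a claimed identity is commonly checked once speaker embeddings are available: the enrolled embedding and the test embedding are compared with cosine similarity, and the claim is accepted if the score reaches a threshold. The function names, the toy vectors, and the threshold of 0.5 are illustrative assumptions, not the group's actual system; in practice the threshold is tuned on a development set.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def verify(enrolled: Sequence[float], test: Sequence[float],
           threshold: float = 0.5) -> bool:
    """Accept the identity claim if the similarity reaches the threshold.

    The default threshold is a placeholder (assumption), not a tuned value.
    """
    return cosine_similarity(enrolled, test) >= threshold


if __name__ == "__main__":
    enrolled = [0.9, 0.1, 0.3]    # hypothetical embedding of the enrolled speaker
    same_voice = [0.8, 0.2, 0.35]  # hypothetical embedding of a matching test utterance
    other_voice = [-0.2, 0.9, -0.4]  # hypothetical embedding of a different speaker
    print(verify(enrolled, same_voice))
    print(verify(enrolled, other_voice))
```

Raising the threshold lowers the false acceptance rate at the cost of more false rejections, so the operating point depends on the application.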

Contact: Phuong Tuan Dat | ✉️ phuongtuandat2915@gmail.com

Research Directions

Vietnamese Speaker Recognition Datasets: Developing high-quality, large-scale corpora tailored to the Vietnamese language. This involves designing novel construction pipelines that address the high proportion of label noise that arises when speech data for Vietnamese speakers is retrieved at scale.
Deep Speaker Recognition Models: Investigating state-of-the-art architectures and Self-Supervised Learning (SSL) frameworks to extract robust speaker embeddings.
Speech Spoof Detection Models: Developing advanced countermeasures that distinguish "bonafide" human speech from malicious "spoof" attacks, including AI-generated deepfakes, text-to-speech, and replay attacks.
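One simple way the speaker recognition and spoof detection directions can fit together is a cascaded decision: a trial is accepted only if the countermeasure judges the audio bonafide and the speaker model judges it to match the claimed identity. The sketch below assumes both models already output a score where higher is better; the class name, field names, and thresholds are hypothetical, and this is not necessarily how the group's systems combine the two.

```python
from dataclasses import dataclass


@dataclass
class TrialScores:
    """Scores for one verification trial (both fields are assumptions)."""
    speaker_score: float   # higher means more likely the claimed speaker
    bonafide_score: float  # higher means more likely genuine human speech


def accept_trial(scores: TrialScores,
                 speaker_threshold: float = 0.5,
                 bonafide_threshold: float = 0.5) -> bool:
    """Cascade: reject spoofed audio first, then check the speaker match.

    The default thresholds are placeholders, not tuned values.
    """
    if scores.bonafide_score < bonafide_threshold:
        return False  # flagged as a spoof, regardless of speaker similarity
    return scores.speaker_score >= speaker_threshold


if __name__ == "__main__":
    genuine_target = TrialScores(speaker_score=0.82, bonafide_score=0.91)
    replayed_target = TrialScores(speaker_score=0.85, bonafide_score=0.12)
    genuine_impostor = TrialScores(speaker_score=0.20, bonafide_score=0.88)
    for trial in (genuine_target, replayed_target, genuine_impostor):
        print(trial, accept_trial(trial))
```

The replayed trial illustrates why the countermeasure matters: its speaker score is high because the voice really is the target's, so only the bonafide check can reject it.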