
Mispronunciation Detection and Diagnosis Without Model Training: A Retrieval-Based Approach

Authors: Huu, T.T., Ha, V.K., Tran, T.D., Vu, H., Thien, V.L., Nguyen, T.C., Nguyen, T.T.T.

Mispronunciation Detection and Diagnosis (MDD) is crucial for language learning and speech therapy. Unlike conventional methods that require scoring models or training phoneme-level models, we propose a novel training-free framework that leverages retrieval techniques with a pretrained Automatic Speech Recognition model. Our method avoids phoneme-specific modeling or additional task-specific training,...

2026 | ICASSP | Mispronunciation Detection and Diagnosis

O_O-VC: Synthetic Data-Driven One-to-One Alignment for Any-to-Any Voice Conversion

Authors: Huu Tuong Tu, Huan Vu, Cuong Tien Nguyen, Dien Hy Ngo, Nguyen Thi Thu Trang

Voice conversion (VC) is a challenging task that aims to transform the voice of a source speaker into that of a target speaker. Recent advances in VC have been limited by the need for large amounts of aligned data, which is difficult to obtain in practice. This paper proposes a...

2025 | EMNLP | Mispronunciation Detection and Diagnosis

The Vietnamese Spoofing-aware Speaker Verification Challenge 2025: Summary and Results

Authors: P. T. Dat, V. H. L, N. T. T. Trang

The VLSP 2025 Vietnamese Spoofing-Aware Speaker Verification (VSASV) Challenge extends the study of spoofing-aware speaker verification (SASV) to Vietnamese, a low-resource language with limited anti-spoofing data resources. Building upon prior SASV challenges, VSASV introduces an evaluation framework encompassing both bonafide and spoofed trials, including replay, voice conversion (VC), text-to-speech (TTS),...

2025 | VLSP | Automatic Speaker Verification

SV++'s Vietnamese Spoofing-Aware Speaker Verification Systems for VLSP 2025

Authors: P. V. Hoang, H. B. Thu, H. V. Khanh

This paper presents our system for the Vietnamese Spoofing-Aware Speaker Verification in VLSP 2025 challenge. The proposed system consists of an automatic speaker verification sub-system, a spoof detection sub-system, and a fusion module operating at either the score or embedding level. To overcome limited model generalization caused by insufficient data,...

2025 | VLSP | Automatic Speaker Verification

SVBK System Description to the VLSP 2025 Challenge on Vietnamese Spoofing-Aware Speaker Verification

Authors: N. T. Trung, T. D. An, C. H. Viet

This technical report describes the SVBK team's approach to the Vietnamese Spoofing-Aware Speaker Verification Challenge at VLSP 2025. Our system consists of two independently trained components: an Automatic Speaker Verification module and a Countermeasure module, whose outputs are fused at the score level to produce the final SASV decision. The...

2025 | VLSP | Automatic Speaker Verification

VoxVietnam: a Large-Scale Multi-Genre Dataset for Vietnamese Speaker Recognition

Authors: H. L. Vu, P. T. Dat, P. T. Nhi, N. S. Hao, N. T. T. Trang

Recent research in speaker recognition aims to address vulnerabilities due to variations between enrolment and test utterances, particularly in the multi-genre phenomenon where the utterances are in different speech genres. Previous resources for Vietnamese speaker recognition are either limited in size or do not focus on genre diversity, leaving studies...

2025 | ICASSP | Automatic Speaker Verification

A Robust Pitch-Fusion Model for Speech Emotion Recognition in Tonal Languages

Authors: P. V. Thanh, N. T. T. Huyen, P. N. Quan, N. T. T. Trang

Speech Emotion Recognition (SER) is an essential task in spoken language processing, applicable across various domains. While research on SER systems for English datasets is growing rapidly, the reliability of these models for tonal languages remains a significant concern. Therefore, this paper introduces Pitch-fusion, a novel SER model tailored for...

2025 | ICASSP | Speech Emotion Recognition

VSASV: a Vietnamese Dataset for Spoofing-Aware Speaker Verification

Authors: V. Hoang, V. T. Pham, H. N. Xuan, P. Nhi, P. Dat, T. T. T. Nguyen

Recent research in improving speaker verification systems to detect spoofed speech has seen a concentrated focus on the English language, while the performance of such systems in other languages remains unexplored. This paper introduces the VSASV dataset for Spoofing-Aware Speaker Verification (SASV) in the Vietnamese language. The dataset comprises over 174,000 spoofed...

2024 | Interspeech | Automatic Speaker Verification

Vietnamceleb: a large-scale dataset for Vietnamese speaker recognition

Authors: V. T. Pham, X. T. H. Nguyen, V. Hoang, T. T. T. Nguyen

The success of speaker recognition systems heavily depends on large training datasets collected under real-world conditions. While common languages like English or Chinese have vastly available datasets, low-resource ones like Vietnamese remain limited. This paper presents a large-scale spontaneous dataset gathered under noisy environments, with over 87,000 utterances from 1,000...

2023 | Interspeech | Automatic Speaker Verification

Mispronunciation detection and diagnosis model for tonal language applied to Vietnamese

Authors: H. T. Tu, V. T. Pham, T. T. T. Nguyen, T. L. Dao

A tonal language is a language in which the meaning of words is not only determined by the sounds of the consonants and vowels, but also by the pitch or tone used to pronounce them. Mispronunciation Detection and Diagnosis (MD&D) of tonal languages is challenging since tone presentation is difficult...

2023 | Interspeech | Mispronunciation Detection and Diagnosis