Authors: Huu, T.T., Ha, V.K., Tran, T.D., Vu, H., Thien, V.L., Nguyen, T.C., Nguyen, T.T.T.
Mispronunciation Detection and Diagnosis (MDD) is crucial for language learning and speech therapy. Unlike conventional methods that require scoring models or training phoneme-level models, we propose a novel training-free framework that leverages retrieval techniques with a pretrained Automatic Speech Recognition model. Our method avoids phoneme-specific modeling or additional task-specific training,...
2026 | ICASSP | Mispronunciation Detection and Diagnosis
Authors: Huu Tuong Tu, Huan Vu, Cuong Tien Nguyen, Dien Hy Ngo, Nguyen Thi Thu Trang
Voice conversion (VC) is a challenging task that aims to transform the voice of a source speaker into that of a target speaker. Recent advances in VC have been limited by the need for large amounts of aligned data, which is difficult to obtain in practice. This paper proposes a...
2025 | EMNLP | Mispronunciation Detection and Diagnosis
Authors: P. T. Dat, V. H. L, N. T. T. Trang
The VLSP 2025 Vietnamese Spoofing-Aware Speaker Verification (VSASV) Challenge extends the study of spoofing-aware speaker verification (SASV) to Vietnamese, a low-resource language with limited anti-spoofing data resources. Building upon prior SASV challenges, VSASV introduces an evaluation framework encompassing both bonafide and spoofed trials, including replay, voice conversion (VC), text-to-speech (TTS),...
2025 | VLSP | Automatic Speaker Verification
Authors: P. V. Hoang, H. B. Thu, H. V. Khanh
This paper presents our system for the Vietnamese Spoofing-Aware Speaker Verification challenge at VLSP 2025. The proposed system consists of an automatic speaker verification sub-system, a spoof detection sub-system, and a fusion module operating at either the score or embedding level. To overcome limited model generalization caused by insufficient data,...
2025 | VLSP | Automatic Speaker Verification
Authors: N. T. Trung, T. D. An, C. H. Viet
This technical report describes the SVBK team's approach to the Vietnamese Spoofing-Aware Speaker Verification Challenge at VLSP 2025. Our system consists of two independently trained components: an Automatic Speaker Verification module and a Countermeasure module, whose outputs are fused at the score level to produce the final SASV decision. The...
2025 | VLSP | Automatic Speaker Verification
Authors: H. L. Vu, P. T. Dat, P. T. Nhi, N. S. Hao, N. T. T. Trang
Recent research in speaker recognition aims to address vulnerabilities due to variations between enrolment and test utterances, particularly in the multi-genre phenomenon where the utterances are in different speech genres. Previous resources for Vietnamese speaker recognition are either limited in size or do not focus on genre diversity, leaving studies...
2025 | ICASSP | Automatic Speaker Verification
Authors: P. V. Thanh, N. T. T. Huyen, P. N. Quan, N. T. T. Trang
Speech Emotion Recognition (SER) is an essential task in spoken language processing, applicable across various domains. While research on SER systems for English datasets is growing rapidly, the reliability of these models for tonal languages remains a significant concern. Therefore, this paper introduces Pitch-fusion, a novel SER model tailored for...
2025 | ICASSP | Speech Emotion Recognition
Authors: V. Hoang, V. T. Pham, H. N. Xuan, P. Nhi, P. Dat, T. T. T. Nguyen
Recent research on improving speaker verification systems to detect spoofed speech has focused on the English language, while the performance of such systems in other languages remains unexplored. This paper introduces the VSASV dataset for Spoofing-Aware Speaker Verification (SASV) in the Vietnamese language. The dataset comprises over 174,000 spoofed...
2024 | Interspeech | Automatic Speaker Verification
Authors: V. T. Pham, X. T. H. Nguyen, V. Hoang, T. T. T. Nguyen
The success of speaker recognition systems heavily depends on large training datasets collected under real-world conditions. While widely spoken languages such as English or Chinese have abundant datasets, resources for low-resource languages such as Vietnamese remain limited. This paper presents a large-scale spontaneous dataset gathered in noisy environments, with over 87,000 utterances from 1,000...
2023 | Interspeech | Automatic Speaker Verification
Authors: H. T. Tu, V. T. Pham, T. T. T. Nguyen, T. L. Dao
A tonal language is a language in which the meaning of words is not only determined by the sounds of the consonants and vowels, but also by the pitch or tone used to pronounce them. Mispronunciation Detection and Diagnosis (MD&D) of tonal languages is challenging since tone presentation is difficult...
2023 | Interspeech | Mispronunciation Detection and Diagnosis