Automatic Speaker Verification

Automatic Speaker Verification (ASV) is a subfield of speech processing and biometrics that aims to verify a person's claimed identity from speech signals. With the...

Automatic Speech Recognition

Automatic Speech Recognition (ASR), also known as Speech-to-Text (STT), is the task of transforming speech signals into written content. While not a new field, ASR...

Text-to-Speech

Text-to-Speech (TTS) is a fundamental task in speech processing and human–computer interaction that aims to convert written text into natural and intelligible speech. Over the...

Speech Emotion Recognition

Speech Emotion Recognition (SER) is a subfield of speech processing and affective computing that aims to identify human emotions from speech signals. With the rapid...

Mispronunciation Detection and Diagnosis

Mispronunciation Detection and Diagnosis (MDD) is a subfield of speech processing and computer-assisted language learning (CALL) that aims to automatically detect and analyze pronunciation errors...

Singing Voice Synthesis

AI music uses artificial intelligence models to analyze, generate, and transform musical elements such as melody, harmony, rhythm, and timbre. By learning from large collections...

Chatbot

Chatbot systems based on Retrieval-Augmented Generation (RAG) represent an emerging research direction that aims to build reliable, knowledge-grounded conversational agents by combining information retrieval with...