Automatic Speaker Verification (ASV) is a subfield of speech processing and biometrics that aims to verify a person's claimed identity from speech signals. With the...
Automatic Speech Recognition (ASR), also known as Speech-to-Text (STT), is the task of transforming speech signals into written content. While not a new field, ASR...
Text-to-Speech (TTS) is a fundamental task in speech processing and human–computer interaction that aims to convert written text into natural and intelligible speech. Over the...
Speech Emotion Recognition (SER) is a subfield of speech processing and affective computing that aims to identify human emotions from speech signals. With the rapid...
Mispronunciation Detection and Diagnosis (MDD) is a subfield of speech processing and computer-assisted language learning (CALL) that aims to automatically detect and analyze pronunciation errors...
AI music uses artificial intelligence models to analyze, generate, and transform musical elements such as melody, harmony, rhythm, and timbre. By learning from large collections...
Chatbot systems based on Retrieval-Augmented Generation (RAG) are an emerging research direction that aims to build reliable, knowledge-grounded conversational agents by combining information retrieval with...