Automatic Speech Recognition

Our AI team focuses on cutting-edge artificial intelligence research and applications.

Introduction

Automatic Speech Recognition (ASR), also known as Speech-to-Text (STT), is the task of transforming speech signals into written text. While not a new field, ASR remains a challenging area of research due to the complexities of natural language and acoustic variability.

Our research group investigates the theoretical and practical limits of ASR across two distinct computational paradigms:

1. Real-Time Spontaneous Speech Understanding (Low-Latency ASR)

This research stream focuses on the "latency-accuracy trade-off." We aim to construct lightweight, highly optimized architectures capable of processing streaming audio with minimal delay.

Research Challenges: Robustness in spontaneous environments and contextual biasing, i.e., the ability to recognize domain-specific entities unseen during training without retraining the core model.
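
As a concrete illustration of this trade-off, the sketch below chunks an audio stream and feeds it to a stateful recognizer. `StreamingRecognizer`, its `decode_chunk` method, and the 160 ms chunk size are hypothetical placeholders, not a real library API or our production settings.

```python
# Hypothetical sketch of a low-latency streaming loop. StreamingRecognizer
# and decode_chunk are placeholders, not a real library API; the chunk size
# is illustrative.
import numpy as np

SAMPLE_RATE = 16_000
CHUNK_MS = 160                                   # smaller chunks -> lower latency
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000   # 2,560 samples per chunk


class StreamingRecognizer:
    """Placeholder for a stateful streaming model (e.g., a chunked Conformer)."""

    def __init__(self) -> None:
        self.state = None  # encoder/decoder state carried across chunks

    def decode_chunk(self, chunk: np.ndarray) -> str:
        # A real model would update self.state and emit partial tokens here.
        return ""


def stream(audio: np.ndarray) -> str:
    recognizer = StreamingRecognizer()
    partials = []
    # Fixed-size chunks bound the per-chunk delay, but each chunk sees less
    # right-context, i.e., the latency-accuracy trade-off described above.
    for start in range(0, len(audio), CHUNK_SAMPLES):
        partials.append(recognizer.decode_chunk(audio[start:start + CHUNK_SAMPLES]))
    return "".join(partials)


if __name__ == "__main__":
    print(stream(np.zeros(2 * SAMPLE_RATE, dtype=np.float32)))  # 2 s of silence
```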

2. High-Fidelity Asynchronous Transcription (Large-Scale ASR)

This stream prioritizes transcription precision and semantic coherence over immediate latency. We explore heavyweight architectures designed for offline inference on long-form audio.

Research Challenges: Efficient processing of long-context audio sequences, and unified systems that handle code-switching (e.g., Vietnamese-English mixing) and generalize across languages with diverse syntax.
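
For the offline setting, a common pattern is overlap-chunked inference over long recordings. The sketch below assumes a placeholder `transcribe_window` model call and illustrative window/overlap sizes; it is not our actual pipeline.

```python
# Hypothetical sketch of overlap-chunked offline transcription for long-form
# audio. transcribe_window is a placeholder for any offline ASR model; the
# 30 s window and 5 s overlap are illustrative values.
import numpy as np

SAMPLE_RATE = 16_000
WINDOW = 30 * SAMPLE_RATE   # samples per window
OVERLAP = 5 * SAMPLE_RATE   # samples shared between consecutive windows


def transcribe_window(segment: np.ndarray) -> str:
    return ""  # stand-in for a heavyweight offline model call


def transcribe_long(audio: np.ndarray) -> list[str]:
    hypotheses = []
    start, step = 0, WINDOW - OVERLAP
    # Overlapping windows give the model context at chunk boundaries; a real
    # pipeline would also merge the duplicated words in each overlap region.
    while start < len(audio):
        hypotheses.append(transcribe_window(audio[start:start + WINDOW]))
        start += step
    return hypotheses


if __name__ == "__main__":
    print(transcribe_long(np.zeros(90 * SAMPLE_RATE, dtype=np.float32)))
```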

Contact: Nguyen Thi Tra My | ✉️ my.hust225049@gmail.com

Research Directions

  • Contextualized ASR: Developing models that can precisely recognize context-sensitive bias phrases, such as proper names, addresses, locations, and phone numbers, which are critical for accurate information extraction (see the biasing sketch after this list).
  • Multilingual ASR: Creating models capable of recognizing speech that mixes two or more languages within the same stream (code-switching), such as the Vietnamese-English mix common in casual conversation (e.g., "enjoy cái moment này", roughly "enjoy this moment").
  • New Architectures for ASR: Investigating novel architectural approaches, including self-conditioned models, the use of discrete tokens, and weakly supervised loss functions to improve model efficiency and generalization.
  • LLM in ASR: Integrating Large Language Models (LLMs) into the ASR pipeline, leveraging their semantic understanding for second-pass correction and significantly improved transcription quality (see the correction sketch below).
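
As a toy illustration of the contextual biasing mentioned in the first item, the sketch below adds a shallow-fusion-style bonus to beam-search hypotheses that complete, or are extending, a phrase from a user-supplied bias list. The phrases, bonus values, and prefix set are illustrative assumptions, not a production system.

```python
# Toy illustration of shallow-fusion contextual biasing: hypotheses that
# match (or are extending) a user-supplied bias phrase get a score bonus
# during search. Phrases and bonus values are illustrative assumptions.
BIAS_PHRASES = ["tra my", "hust"]   # domain entities unseen during training
BIAS_BONUS = 2.0                    # log-score boost, tuned on held-out data

# All non-empty prefixes of every bias phrase (a stand-in for a prefix trie).
PREFIXES = {p[:i] for p in BIAS_PHRASES for i in range(1, len(p) + 1)}


def biased_score(base_log_prob: float, hypothesis: str) -> float:
    """Rescore one beam-search hypothesis with a contextual-biasing bonus."""
    h = hypothesis.lower()
    if any(p in h for p in BIAS_PHRASES):
        return base_log_prob + BIAS_BONUS        # completed bias phrase
    if any(h.endswith(pref) for pref in PREFIXES):
        return base_log_prob + 0.5 * BIAS_BONUS  # partial match, keep it alive
    return base_log_prob


print(biased_score(-4.2, "xin chao tra my"))  # boosted: contains a bias phrase
print(biased_score(-4.2, "xin chao ban"))     # unchanged
```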

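A minimal sketch of the LLM-based second-pass correction mentioned in the last item is shown below. `llm_complete` is a placeholder for any chat/completion API, and the prompt wording is an illustrative assumption.

```python
# Minimal sketch of LLM second-pass correction. llm_complete is a placeholder
# for any chat/completion API (a hosted model, a local LLM, etc.); the prompt
# wording is an illustrative assumption.
CORRECTION_PROMPT = (
    "You are an ASR post-editor. Fix recognition errors in the transcript "
    "below without changing its meaning. Return only the corrected text.\n\n"
    "Transcript: {hypothesis}"
)


def llm_complete(prompt: str) -> str:
    # Placeholder: echo the transcript back; swap in a real LLM call here.
    return prompt.rsplit("Transcript: ", 1)[-1]


def correct_transcript(hypothesis: str) -> str:
    """Use the LLM's semantic prior to repair first-pass ASR errors."""
    return llm_complete(CORRECTION_PROMPT.format(hypothesis=hypothesis))


print(correct_transcript("enjoy cai moment nay"))
```
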
Members

Nguyen Thi Tra My

Team Leader

Vu Nhat Minh

Researcher