Confidence Estimation of Speech Recognition Modules Using Deep Learning
Files
Pyo_Youngbin_2024.pdf (745.68 KB)
Aalto login required (access for Aalto Staff only).
School of Engineering
Bachelor's thesis
Electronic archive copy is available locally at the Harald Herlin Learning Centre. The staff of Aalto University has access to the electronic bachelor's theses by logging into Aaltodoc with their personal Aalto user ID. Read more about the availability of the bachelor's theses.
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Authors
Date
2024-09-20
Major/Subject
Computational Engineering
Mcode
ENG3082
Degree programme
Aalto Bachelor’s Programme in Science and Technology
Language
en
Pages
27
Abstract
This paper provides a general overview of confidence estimation in automatic speech recognition (ASR) systems, focusing on two state-of-the-art methods: OpenAI’s Whisper and NVIDIA’s NeMo framework. The goal of the study is to address challenges in ASR by improving confidence estimation modules. The methodology involved evaluating Whisper on the LibriSpeech and TIMIT datasets, measuring performance by Word Error Rate (WER). Moreover, confidence estimation techniques were applied to the Conformer-CTC and Conformer-Transducer models built in the NeMo framework. The results align with previous studies of the same methods and also demonstrate the exceptional performance of Whisper on the TIMIT dataset, with a WER as low as 2.73% for large-v1. For NeMo, a proposed modification to the confidence estimation methods, particularly using Gibbs entropy-based measures, showed improvements in certain metrics for RNN-T methods on the LibriSpeech ‘test-other’ dataset. This paper confirms that while Whisper and NeMo demonstrate strong performance, there is room for improvement in confidence estimation techniques.
Supervisor
St-Pierre, Luc
Thesis advisor
Rech, Silas
Keywords
ASR, confidence estimation, end-to-end models