Confidence Estimation of Speech Recognition Modules Using Deep Learning

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.advisor: Rech, Silas
dc.contributor.author: Pyo, Youngbin
dc.contributor.school: Insinööritieteiden korkeakoulu [fi]
dc.contributor.supervisor: St-Pierre, Luc
dc.date.accessioned: 2024-11-19T09:14:53Z
dc.date.available: 2024-11-19T09:14:53Z
dc.date.issued: 2024-09-20
dc.description.abstract: This paper provides a general overview of confidence estimation in automatic speech recognition (ASR) systems, focusing on two state-of-the-art methods: OpenAI's Whisper and NVIDIA's NeMo framework. The goal of the study is to address challenges in ASR by improving confidence estimation modules. The methodology involved evaluating Whisper on the LibriSpeech and TIMIT datasets, measuring performance by word error rate (WER). Moreover, confidence estimation techniques were applied to the Conformer-CTC and Conformer-Transducer models built in the NeMo framework. The results from this paper align with previous studies of the same methods and also demonstrate the exceptional performance of Whisper on the TIMIT dataset, with a WER as low as 2.73% for large-v1. For NeMo, the proposed modifications to confidence estimation methods, particularly those using Gibbs entropy-based measures, showed improvements in certain metrics for RNN-T methods on the LibriSpeech 'test-other' dataset. This paper confirms that while Whisper and NeMo demonstrate strong performance, there is room for improvement in confidence estimation techniques. [en]
dc.format.extent: 27
dc.format.mimetype: application/pdf [en]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/131666
dc.identifier.urn: URN:NBN:fi:aalto-202411197184
dc.language.iso: en [en]
dc.programme: Aalto Bachelor's Programme in Science and Technology [fi]
dc.programme.major: Computational Engineering [en]
dc.programme.mcode: ENG3082 [fi]
dc.subject.keyword: ASR [en]
dc.subject.keyword: confidence estimation [en]
dc.subject.keyword: end-to-end models [en]
dc.title: Confidence Estimation of Speech Recognition Modules Using Deep Learning [en]
dc.type: G1 Kandidaatintyö [fi]
dc.type.dcmitype: text [en]
dc.type.ontasot: Bachelor's thesis [en]
dc.type.ontasot: Kandidaatintyö [fi]

Files

Original bundle

Name: Pyo_Youngbin_2024.pdf
Size: 745.68 KB
Format: Adobe Portable Document Format