Confidence Estimation of Speech Recognition Modules Using Deep Learning
dc.contributor | Aalto-yliopisto | fi |
dc.contributor | Aalto University | en |
dc.contributor.advisor | Rech, Silas | |
dc.contributor.author | Pyo, Youngbin | |
dc.contributor.school | Insinööritieteiden korkeakoulu | fi |
dc.contributor.supervisor | St-Pierre, Luc | |
dc.date.accessioned | 2024-11-19T09:14:53Z | |
dc.date.available | 2024-11-19T09:14:53Z | |
dc.date.issued | 2024-09-20 | |
dc.description.abstract | This paper provides a general overview of confidence estimation in automatic speech recognition (ASR) systems, focusing on two state-of-the-art methods: OpenAI’s Whisper and NVIDIA’s NeMo framework. The goal of the study is to address challenges in ASR by improving confidence estimation modules. The methodology involved evaluating Whisper on the LibriSpeech and TIMIT datasets, measuring performance by Word Error Rate (WER). Moreover, confidence estimation techniques were applied to the Conformer-CTC and Conformer-Transducer models built in the NeMo framework. The results of this paper align with previous studies of the same methods and also demonstrate the exceptional performance of Whisper on the TIMIT dataset, with a WER as low as 2.73% for large-v1. For NeMo, the proposed modifications to confidence estimation methods, particularly those using Gibbs entropy-based measures, showed improvements in certain metrics for RNN-T methods on the LibriSpeech ’test-other’ dataset. This paper confirms that while Whisper and NeMo demonstrate strong performance, there is room for improvement in confidence estimation techniques. | en |
dc.format.extent | 27 | |
dc.format.mimetype | application/pdf | en |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/131666 | |
dc.identifier.urn | URN:NBN:fi:aalto-202411197184 | |
dc.language.iso | en | en |
dc.programme | Aalto Bachelor’s Programme in Science and Technology | fi |
dc.programme.major | Computational Engineering | en |
dc.programme.mcode | ENG3082 | fi |
dc.subject.keyword | ASR | en |
dc.subject.keyword | confidence estimation | en |
dc.subject.keyword | end-to-end models | en |
dc.title | Confidence Estimation of Speech Recognition Modules Using Deep Learning | en |
dc.type | G1 Kandidaatintyö | fi |
dc.type.dcmitype | text | en |
dc.type.ontasot | Bachelor's thesis | en |
dc.type.ontasot | Kandidaatintyö | fi |
Files
Original bundle
- Name: Pyo_Youngbin_2024.pdf
- Size: 745.68 KB
- Format: Adobe Portable Document Format
Aalto login required (access for Aalto Staff only).