Exploring Generative Error Correction for Dysarthric Speech Recognition

La Quatra, Moreno; Koudounas, Alkis; Salerno, Valerio Mario; Siniscalchi, Sabato Marco

doi:10.21437/Interspeech.2025-1553

Despite the remarkable progress in end-to-end Automatic Speech Recognition (ASR) engines, accurately transcribing dysarthric speech remains a major challenge. In this work, we propose a two-stage framework for the Speech Accessibility Project Challenge at INTERSPEECH 2025, which combines state-of-the-art speech recognition models with LLM-based generative error correction (GER). We assess different configurations of model scale and training strategy, incorporating targeted hypothesis selection to improve transcription accuracy. Experiments on the Speech Accessibility Project dataset demonstrate the strength of our approach on structured and spontaneous speech, while highlighting challenges in single-word recognition. Through comprehensive analysis, we provide insights into the complementary roles of acoustic and linguistic modeling in dysarthric speech recognition.
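
The two-stage idea (ASR first, then LLM-based correction over its hypotheses) can be illustrated with a minimal sketch, assuming the first-stage ASR model already returns an N-best list of text hypotheses. Everything named below is hypothetical: the helper names (`build_ger_prompt`, `select_consensus`), the prompt wording, and the selection rule. The abstract does not say how the authors select hypotheses, so the sketch swaps in a simple minimum-Bayes-risk-style choice (the hypothesis with the lowest summed WER against the others) and leaves out the actual LLM call.

```python
from typing import List


def word_errors(ref: str, hyp: str) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # distances from an empty reference prefix
    for i in range(1, len(r) + 1):
        curr = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # delete r[i-1]
                          curr[j - 1] + 1,      # insert h[j-1]
                          prev[j - 1] + cost)   # substitute or match
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: edit distance divided by the reference length."""
    return word_errors(ref, hyp) / max(len(ref.split()), 1)


def build_ger_prompt(nbest: List[str]) -> str:
    """Hypothetical GER prompt: numbered, deduplicated N-best hypotheses.

    The wording is an illustrative assumption, not the authors' template.
    """
    unique = list(dict.fromkeys(h.strip() for h in nbest if h.strip()))
    numbered = [f"{k}. {h}" for k, h in enumerate(unique, start=1)]
    return (
        "Below are N-best ASR hypotheses for one utterance of dysarthric speech.\n"
        + "\n".join(numbered)
        + "\nWrite the most likely correct transcription:"
    )


def select_consensus(nbest: List[str]) -> str:
    """Assumed selection rule (MBR-style): return the hypothesis whose summed
    WER against all other hypotheses is lowest. Not the paper's method."""
    def risk(i: int) -> float:
        return sum(wer(nbest[j], nbest[i]) for j in range(len(nbest)) if j != i)
    return nbest[min(range(len(nbest)), key=risk)]


if __name__ == "__main__":
    nbest = [
        "the cat sat on the mat",
        "the cat sad on the mat",
        "a cat sat on that mat",
        "the cat sat on the mat",
    ]
    print(build_ger_prompt(nbest))   # three unique hypotheses, numbered 1-3
    print(select_consensus(nbest))   # the cat sat on the mat
    # In a full pipeline, the prompt would be sent to an LLM (not shown) and
    # its output scored against the reference transcript with wer().
```

Deduplicating before prompting keeps the LLM input short, while the consensus pick gives an acoustic-side baseline that a corrected transcript could be compared against with the same `wer` function.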

2025-01-01
