Exploring Generative Error Correction for Dysarthric Speech Recognition
Moreno La Quatra;Valerio Mario Salerno;Sabato Marco Siniscalchi
2025-01-01
Abstract
Despite the remarkable progress in end-to-end Automatic Speech Recognition (ASR) engines, accurately transcribing dysarthric speech remains a major challenge. In this work, we propose a two-stage framework for the Speech Accessibility Project Challenge at INTERSPEECH 2025, which combines cutting-edge speech recognition models with LLM-based generative error correction (GER). We assess different configurations of model scales and training strategies, incorporating specific hypothesis selection to improve transcription accuracy. Experiments on the Speech Accessibility Project dataset demonstrate the strength of our approach on structured and spontaneous speech, while highlighting challenges in single-word recognition. Through comprehensive analysis, we provide insights into the complementary roles of acoustic and linguistic modeling in dysarthric speech recognition.
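As an illustration only, the two-stage idea summarized in the abstract (an ASR stage that produces candidate transcriptions, followed by hypothesis selection and LLM-based correction) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `select_hypotheses`, `build_prompt`, and `correct`, the score-based top-k selection rule, the prompt wording, and the `llm` callable are all hypothetical placeholders. The abstract does not specify how hypotheses are selected or how the LLM is prompted.

```python
from typing import Callable, List, Tuple


def select_hypotheses(nbest: List[Tuple[str, float]], k: int = 3) -> List[str]:
    """Keep up to k distinct hypotheses, highest ASR score first.

    Assumption: each hypothesis comes with a score where larger is better
    (for example a log-probability). The paper's actual rule may differ.
    """
    seen = set()
    selected: List[str] = []
    for text, _score in sorted(nbest, key=lambda pair: pair[1], reverse=True):
        if len(selected) >= k:
            break
        # Lowercase and collapse whitespace so trivial variants count as duplicates.
        normalized = " ".join(text.lower().split())
        if normalized and normalized not in seen:
            seen.add(normalized)
            selected.append(normalized)
    return selected


def build_prompt(hypotheses: List[str]) -> str:
    """Format the selected hypotheses as a numbered list (hypothetical wording)."""
    lines = [
        "The following are candidate transcriptions of one utterance.",
        "Return the single most likely correct transcription.",
    ]
    lines += [f"{i}. {hyp}" for i, hyp in enumerate(hypotheses, start=1)]
    return "\n".join(lines)


def correct(
    nbest: List[Tuple[str, float]],
    llm: Callable[[str], str],
    k: int = 3,
) -> str:
    """Second stage: pass the selected hypotheses to an LLM for correction.

    `llm` is any function mapping a prompt string to generated text; it
    stands in for whichever model is used and is not a real library API.
    """
    hypotheses = select_hypotheses(nbest, k)
    if not hypotheses:
        return ""
    return llm(build_prompt(hypotheses)).strip()
```

In this sketch the acoustic model contributes the candidate list and its scores, while the language model contributes the final choice or rewrite, which mirrors the complementary roles of acoustic and linguistic modeling that the abstract refers to. Passing the LLM as a plain callable keeps the example independent of any particular model or inference library.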