Combining Speech Attribute Detection and Penalized Logistic Regression for Phoneme Recognition
SINISCALCHI, SABATO MARCO
2012-01-01
Abstract
Over the past few years, there has been a resurgence of interest in designing high-accuracy automatic speech recognition (ASR) systems due to the key role they can play in many real-world applications, such as voiceprint-based biometric identification, language identification, and call scanning. Improving current state-of-the-art technology is therefore vital for the success of these applications, yet doing so is not simple with the standard technology based on hidden Markov models (HMMs) trained on short-term spectral features. This paper offers an innovative perspective on how two novel and prominent approaches to ASR, namely speech attribute detection and discriminative training, can be combined into a unified framework with beneficial effects on overall speech recognition performance. This goal is achieved by embedding phonetic feature detection into a penalized logistic regression machine (PLRM). The proposed approach is evaluated on both isolated and continuous phoneme recognition tasks. Experimental evidence indicates that the proposed framework achieves state-of-the-art performance in the isolated speech recognition task and outperforms current technology in the continuous speech recognition task.