A study on cross-language knowledge integration in Mandarin LVCSR

Siniscalchi, Sabato Marco
2012

Abstract

We present a cross-language knowledge integration framework to improve performance in large vocabulary continuous speech recognition (LVCSR). Two types of knowledge sources are incorporated: manner of articulation attributes and prosodic structure. For manner of articulation, cross-lingual attribute detectors trained on an American English corpus (WSJ0) are used to verify and rescore hypothesized Mandarin syllables in word lattices generated by state-of-the-art systems. For prosodic structure, models trained on a Mandarin corpus (TCC300) with an unsupervised joint prosody labeling and modeling technique are used in lattice rescoring. Experimental results on Mandarin syllable, character, and word recognition with the TCC300 corpus show that the proposed approach significantly outperforms a baseline system that uses no articulatory or prosodic information. The results also demonstrate the potential of using the outputs of cross-lingual attribute detectors as a language-universal front end for automatic speech recognition.
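The abstract does not specify how the attribute and prosodic scores are combined with the baseline scores. One common way to integrate extra knowledge sources in rescoring is a weighted log-linear sum of per-hypothesis scores. The sketch below illustrates that idea on a short N-best list rather than a full lattice. The `Hypothesis` fields, the weight names, and all numeric scores are hypothetical and chosen only for illustration; this is not the paper's method.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hypothesis:
    # Hypothetical per-hypothesis log-domain scores (illustrative values only).
    syllables: list[str]
    acoustic: float   # baseline acoustic model log-likelihood
    language: float   # language model log-probability
    attribute: float  # assumed score from manner-of-articulation attribute detectors
    prosody: float    # assumed score from prosodic models


def combined_score(h: Hypothesis, weights: dict[str, float]) -> float:
    """Weighted log-linear combination of the knowledge-source scores."""
    return (weights["acoustic"] * h.acoustic
            + weights["language"] * h.language
            + weights["attribute"] * h.attribute
            + weights["prosody"] * h.prosody)


def rescore(hyps: list[Hypothesis], weights: dict[str, float]) -> list[Hypothesis]:
    """Return hypotheses sorted from best to worst combined score."""
    return sorted(hyps, key=lambda h: combined_score(h, weights), reverse=True)


# Usage: two competing hypotheses for the same utterance.
a = Hypothesis(["shi4", "jie4"], acoustic=-100.0, language=-20.0, attribute=-15.0, prosody=-12.0)
b = Hypothesis(["shi2", "jie4"], acoustic=-101.0, language=-20.5, attribute=-8.0, prosody=-6.0)

baseline = {"acoustic": 1.0, "language": 1.0, "attribute": 0.0, "prosody": 0.0}
full = {"acoustic": 1.0, "language": 1.0, "attribute": 0.5, "prosody": 0.5}

print(rescore([a, b], baseline)[0].syllables)  # baseline alone prefers a
print(rescore([a, b], full)[0].syllables)      # extra knowledge flips the choice to b
```

Setting a knowledge-source weight to zero recovers the baseline ranking, so the contribution of each source can be measured separately. In practice such weights are usually tuned on held-out data.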
ISBN: 9781467325059; 9781467325066; 9781467325073

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11387/77330
Citations
  • Scopus: 9