Towards a direct Bayesian adaptation framework for deep models

SINISCALCHI, SABATO MARCO
2017-01-01

Abstract

We attempt to formulate Bayesian speaker adaptation for deep models and explore two solutions. In the first, “indirect” approach, Bayesian adaptation is applied to context-dependent hidden Markov models based on Gaussian mixture models (CD-GMM-HMMs) that use bottleneck (BN) features derived from deep neural networks (DNNs). The second approach formulates Bayesian adaptation directly for CD-DNN-HMMs by casting the adaptation step into a generative framework, from which maximum-likelihood (ML) and maximum a posteriori (MAP) adaptation schemes are derived. Experiments on the Wall Street Journal task demonstrate that both MAP and structural MAP (SMAP) adaptation are effective even with discriminative BN features. Furthermore, SMAP can attain a word error reduction (WERR) of 7.3% even when 80 hours of data and 284 different speakers are available at training time. We have also observed a notable performance improvement with the indirect approach, which supports the plausibility of the proposed solution in this novel direction.
2017
ISBN: 978-988-14768-2-1

Use this identifier to cite or link to this document: https://hdl.handle.net/11387/123764