We propose a novel approach to addressing the adaptation effectiveness issue in parameter adaptation for deep neural network (DNN) based acoustic models for automatic speech recognition by adding one or more small auxiliary output layers modeling broad acoustic units, such as mono-phones or tied-state (often called senone) clusters. In scenarios with a limited amount of available adaptation data, most senones are usually rarely seen or not observed, and consequently the ability to model them in a new condition is often not fully exploited. With the original senone classification task as the primary task, and adding auxiliary mono-phone/senone-cluster classification as the secondary tasks, multi-task learning (MTL) is employed to adapt the DNN parameters. With the proposed MTL adaptation framework, we improve the learning ability of the original DNN structure, then enlarge the coverage of the acoustic space to deal with the unseen senone problem, and thus enhance the discrimination power of the adapted DNN models. Experimental results on the 20,000-word open vocabulary WSJ task demonstrate that the proposed framework consistently outperforms the conventional linear hidden layer adaptation schemes without MTL by providing 5.4% relative reduction in word error rate (WERR) with only 1 single adaptation utterance, and 10.7% WERR with 40 adaptation utterances against the un-adapted DNN models.
File in questo prodotto:
Non ci sono file associati a questo prodotto.