A Theory on Deep Neural Network Based Vector-to-Vector Regression With an Illustration of Its Expressive Power in Speech Enhancement

IRIS

This paper focuses on a theoretical analysis of deep neural network (DNN) based functional approximation. Leveraging upon two classical theorems on universal approximation, an artificial neural network (ANN) with a single hidden layer of neurons is used. With modified ReLU and Sigmoid activation functions, we first generalize the related concepts to vector-to-vector regression. Then, we show that the width of the hidden layer of ANN is numerically related to the approximation of the regression function. Furthermore, we increase the number of hidden layers and show that the depth of the ANN-based regression function can enhance its expressive power. We illustrate this representation with recently-emerged DNN based speech enhancement. We first compare the expressive power by varying ANN structures and then test its related regression performance under different noisy conditions in various noise types and signal-to-noise-ratio levels. Experimental results verify our theoretical prediction that an ANN of a broader hidden layer and a deeper architecture can jointly ensure a closer approximation of the vector-to-vector regression functions in terms of the Euclidean distance between the log power spectra of noisy and expected clean speech. Moreover, a DNN with a broader width at the top hidden layer can improve the regression performance relative to those with a narrower width at the top hidden layers.

A Theory on Deep Neural Network Based Vector-to-Vector Regression With an Illustration of Its Expressive Power in Speech Enhancement

J. Qi;J. Du;S. M. SINISCALCHI^{Formal Analysis};C. -H. Lee

2019-01-01

Abstract

This paper focuses on a theoretical analysis of deep neural network (DNN) based functional approximation. Leveraging upon two classical theorems on universal approximation, an artificial neural network (ANN) with a single hidden layer of neurons is used. With modified ReLU and Sigmoid activation functions, we first generalize the related concepts to vector-to-vector regression. Then, we show that the width of the hidden layer of ANN is numerically related to the approximation of the regression function. Furthermore, we increase the number of hidden layers and show that the depth of the ANN-based regression function can enhance its expressive power. We illustrate this representation with recently-emerged DNN based speech enhancement. We first compare the expressive power by varying ANN structures and then test its related regression performance under different noisy conditions in various noise types and signal-to-noise-ratio levels. Experimental results verify our theoretical prediction that an ANN of a broader hidden layer and a deeper architecture can jointly ensure a closer approximation of the vector-to-vector regression functions in terms of the Euclidean distance between the log power spectra of noisy and expected clean speech. Moreover, a DNN with a broader width at the top hidden layer can improve the regression performance relative to those with a narrower width at the top hidden layers.

Scheda breve

Scheda completa

Scheda completa (DC)

Anno

2019

Appare nelle tipologie:

1.1 Articolo in rivista

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11387/137160

Citazioni

ND

33

25

social impact