IISc Logo    Title

etd AT Indian Institute of Science >
Division of Electrical Sciences >
Electrical Communication Engineering (ece) >

Please use this identifier to cite or link to this item: http://hdl.handle.net/2005/1007

Title: Spectro-Temporal Features For Robust Automatic Speech Recognition
Authors: Suryanarayana, Venkata K
Advisors: Sreenivas, T V
Keywords: Speech Processing (Artificial Intelligence)
Speech Recognition
Speech Signal Processing
Automatic Speech Recognition (ASR)
Robust Speech Recognition
Amplitude Modulated (AM) And Frequency Modulated (FM) Modeling
AM-FM Modeling
Submitted Date: Jan-2009
Series/Report no.: G23375
Abstract: The speech signal is inherently characterized by its variations in time, which get reflected as variations in frequency. The specto temporal changes are due to changes in vocaltract, intonation, co-articulation and successive articulation of different phonetic sounds. In this thesis we are looking for improving the speech recognition performance through better feature parameters using a non-stationary model of speech. One effective means of modeling a general non-stationary signal is using the AM-FM model. AM-FM model can be extended to speech through a sub-band analysis, which can be mimic the auditory analysis. In this thesis, we explore new methods for estimating AM and FM parameters based on the non-uniform samples of the signal. The non-uniform sample approach along with adaptive window estimation provides for important advantage because of multi-resolution analysis. We develop several new methods based on ZC intervals, local extrema intervals and signal derivative at ZC’s as different sample measures of the signal and explore their effectiveness for instantaneous frequency (IF) and instantaneous envelope (IE) estimation. To deal with speech signal for automatic speech recognition, we explore the use of auditory motivated spectro temporal information through the use of an auditory filter bank and signal parameters (or features) are derived from the instantaneous energy in each band using the non-linear energy operator over a larger window length. The temporal correlation present in the signal is exploited by using DCT and keeping the lower few coefficients of DCT to keep the trend in the energy in each band. The DCT coefficients from different frequency bands are concatenated together, and a further spectral decorrelation is achieved through KLT (Karhunen-Loeve Transform) of the concatenated feature vector. The changes in the vocaltract are well captured by the change in the formant structure and to emphasize these details for ASR we have defined a temporal formant by using the AM-FM decomposition of sub-band speech. A uniform wideband non-overlaping filters are used for sub-band decomposition. The temporal formant is defined using the AM-FM parameters of each subband signal. The temporal evolution of a formant is represented by the lower order DCT coefficients of the temporal formant in each band and its use for ASR is explored. To address the robustness of ASR performance to environmental noisy conditions, we have used a hybrid approach of enhancing the speech signal using statistical models of the speech and noise. Use of GMM for statistical speech enhancement has been shown to be effective. It is found that the spectro-temporal features derived from enhanced speech provide further improvement to ASR performance.
URI: http://hdl.handle.net/2005/1007
Appears in Collections:Electrical Communication Engineering (ece)

Files in This Item:

File Description SizeFormat
G23375.pdf1.22 MBAdobe PDFView/Open

Items in etd@IISc are protected by copyright, with all rights reserved, unless otherwise indicated.


etd@IISc is a joint service of NCSI & IISc Library ||
|| Powered by DSpace || Compliant to OAI-PMH V 2.0 and ETD-MS V 1.01