etd@IISc Community:
http://hdl.handle.net/2005/1
Thu, 20 Aug 2015 12:50:25 GMT
http://hdl.handle.net/2005/2475
Title: Learning Robust Support Vector Machine Classifiers With Uncertain Observations
Authors: Bhadra, Sahely
Abstract: The central theme of the thesis is to study linear and nonlinear SVM formulations in the presence of uncertain observations. The main contribution of this thesis is to derive robust classifiers from partial knowledge of the underlying uncertainty.
In the case of linear classification, a new bounding scheme based on the Bernstein inequality has been proposed, which models interval-valued uncertainty in a less conservative fashion and is hence expected to generalize better than existing methods. Next, the potential of partial information, such as bounds on second-order moments along with support information, has been explored. Bounds on second-order moments make the resulting classifiers robust to moment estimation errors.
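The interval-uncertainty idea can be illustrated with a small sketch (not the thesis's Bernstein-based formulation): under a box model where each feature may move by at most δ_j, the worst-case hinge loss shrinks the margin by Σ_j |w_j| δ_j, and this robust loss can be minimized by subgradient descent. The data, step sizes, and the helper name `robust_svm` are all illustrative.

```python
# Toy sketch of a robust linear SVM under interval (box) uncertainty.
# Worst-case hinge loss: max(0, 1 - y*(w.x + b) + sum_j |w_j| * delta_j),
# i.e. the margin is shrunk by the largest perturbation inside the box.
import random

def robust_svm(X, y, delta, lam=0.01, lr=0.05, epochs=200, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        for i in idx:
            margin = y[i] * (sum(w[j] * X[i][j] for j in range(d)) + b)
            robust_margin = margin - sum(abs(w[j]) * delta[j] for j in range(d))
            if robust_margin < 1:  # subgradient step on the worst-case hinge
                for j in range(d):
                    sgn = 1.0 if w[j] >= 0 else -1.0
                    w[j] += lr * (y[i] * X[i][j] - delta[j] * sgn - 2 * lam * w[j])
                b += lr * y[i]
            else:                  # only the regularizer is active
                for j in range(d):
                    w[j] -= lr * 2 * lam * w[j]
    return w, b

# Linearly separable toy data with interval half-widths delta.
X = [[2.0, 2.0], [3.0, 1.5], [-2.0, -1.0], [-3.0, -2.5]]
y = [1, 1, -1, -1]
w, b = robust_svm(X, y, delta=[0.1, 0.1])
preds = [1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1 for x in X]
print(preds)  # should match y = [1, 1, -1, -1] on this separable toy set
```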
Uncertainty in the dataset leads to uncertainty in the kernel matrices. A novel distribution-free large-deviation inequality has been proposed which handles uncertainty in kernels through copositive programming in a chance-constrained setting. Although such formulations are NP-hard, in several cases of interest the problem reduces to a convex program. However, the underlying independence assumption is restrictive and may not always define a valid uncertain kernel. To alleviate this problem, an affine-set-based alternative is proposed, and using a robust optimization framework the resulting problem is posed as a minimax problem.
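Minimax and chance-constrained formulations of this kind are commonly attacked with mirror-descent-type first-order methods. Below is a minimal sketch of mirror descent with the entropy mirror map (exponentiated gradient) over the probability simplex; the objective, step size, and function names are illustrative, not from the thesis.

```python
# A minimal mirror descent (MDA-style) loop with the entropy mirror map:
# multiplicative update followed by renormalization onto the simplex.
import math

def mirror_descent_simplex(grad, x0, eta=0.5, iters=100):
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi * math.exp(-eta * gi) for xi, gi in zip(x, g)]
        s = sum(x)
        x = [xi / s for xi in x]  # Bregman projection back onto the simplex
    return x

# Minimize a linear objective c.x over the probability simplex;
# the mass should concentrate on the smallest-cost coordinate.
c = [0.9, 0.2, 0.7]
x = mirror_descent_simplex(lambda p: c, [1 / 3, 1 / 3, 1 / 3])
print(max(range(3), key=lambda i: x[i]))  # -> 1
```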
In both the chance-constrained and robust optimization formulations (for the nonlinear SVM), mirror descent algorithm (MDA)-like procedures have been applied.
Date: Tue, 18 Aug 2015
http://hdl.handle.net/2005/2472
Title: Sentiment-Driven Topic Analysis Of Song Lyrics
Authors: Sharma, Govind
Abstract: Sentiment Analysis is an area of Computer Science that deals with the impact a document makes on a user. The field is further subdivided into Opinion Mining and Emotion Analysis, the latter of which is the basis for the present work. Work on songs is aimed at building affective interactive applications such as music recommendation engines. Using song lyrics, we are interested in both supervised and unsupervised analyses, each of which has its own pros and cons.
For an unsupervised analysis (clustering), we use a standard probabilistic topic model called Latent Dirichlet Allocation (LDA). It mines topics from songs, which are probability distributions over the vocabulary of words. Some of the topics appear sentiment-based, motivating us to continue with this approach. We evaluate our clusters using a gold-standard dataset collected from a suitable website and obtain positive results. This approach is useful in the absence of a supervised dataset.
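As an illustration of the topic model involved, here is a toy collapsed Gibbs sampler for LDA on a made-up corpus; the thesis's actual corpus, vocabulary, and hyperparameters are not reproduced.

```python
# Toy collapsed Gibbs sampler for LDA. Each topic ends up as a
# probability distribution over the vocabulary, as the abstract describes.
import random

def lda_gibbs(docs, V, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    rng = random.Random(seed)
    ndk = [[0] * K for _ in docs]        # document-topic counts
    nkw = [[0] * V for _ in range(K)]    # topic-word counts
    nk = [0] * K                         # words assigned to each topic
    z = []
    for d, doc in enumerate(docs):       # random initial assignments
        zs = []
        for w in doc:
            t = rng.randrange(K)
            zs.append(t)
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
        z.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]              # remove current assignment
                ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                weights = [(ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
                           for k in range(K)]
                r = rng.random() * sum(weights)
                acc, t = 0.0, K - 1      # sample a new topic
                for k in range(K):
                    acc += weights[k]
                    if r <= acc:
                        t = k
                        break
                z[d][i] = t
                ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    # smoothed topic-word distributions (each sums to 1 over the vocabulary)
    return [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
            for k in range(K)]

# Two "themes": words 0-2 co-occur, words 3-5 co-occur.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 5], [0, 2, 1, 2], [4, 3, 5, 4]]
topics = lda_gibbs(docs, V=6, K=2)
for t in topics:
    print(round(sum(t), 6))  # each topic is a distribution summing to 1
```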
In another part of our work, we argue that some supervision is inescapable, in that the returned topics must be analysed manually. Further, we also use explicit supervision in the form of a training dataset for a classifier to learn sentiment-specific classes. This analysis helps reduce dimensionality and improve classification accuracy. We obtain excellent dimensionality reduction using Support Vector Machines (SVM) for feature selection. For re-classification, we use the Naive Bayes Classifier (NBC) and SVM, both of which perform well. We also use Non-negative Matrix Factorization (NMF) for classification, but observe that its results coincide with those of NBC, with no exceptions. This drives us towards establishing a theoretical equivalence between the two.
Date: Sun, 16 Aug 2015
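The NBC step can be sketched with a minimal multinomial Naive Bayes classifier; the word indices, sentiment labels, and smoothing constant below are invented for illustration.

```python
# Minimal multinomial Naive Bayes: log-prior plus summed log-likelihoods
# of the word tokens, with Laplace smoothing over a vocabulary of size V.
import math

def train_nb(X, y, V, smooth=1.0):
    classes = sorted(set(y))
    prior = {c: math.log(y.count(c) / len(y)) for c in classes}
    counts = {c: [smooth] * V for c in classes}
    for xi, yi in zip(X, y):
        for w in xi:
            counts[yi][w] += 1
    loglik = {c: [math.log(n / sum(counts[c])) for n in counts[c]]
              for c in classes}
    return classes, prior, loglik

def predict_nb(model, x):
    classes, prior, loglik = model
    return max(classes, key=lambda c: prior[c] + sum(loglik[c][w] for w in x))

# words 0-1 mark "happy" songs, words 2-3 mark "sad" songs (invented labels)
X = [[0, 1, 0], [1, 0], [2, 3, 2], [3, 2]]
y = ["happy", "happy", "sad", "sad"]
model = train_nb(X, y, V=4)
print(predict_nb(model, [0, 1]))  # -> happy
print(predict_nb(model, [2, 3]))  # -> sad
```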
http://hdl.handle.net/2005/2460
Title: A Hierarchical Approach To Music Analysis And Source Separation
Authors: Thoshkahna, Balaji
Abstract: Music analysis and source separation have become important and allied areas of research over the last decade. Towards this, analyzing a music signal for important events such as onsets, offsets and transients is an important problem, as these tasks help in music source separation and transcription. Approaches to source separation have also been making great strides, but most of these techniques are aimed at Western music and fail to perform well for Indian music. The fluid style of instrumentation in Indian music requires a slightly modified approach to analysis and source separation.
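Flagging onsets as events in a music signal can be sketched with a simple energy-novelty detector. This is a generic illustration, not the auditory-model-based algorithm of the thesis; the frame size and threshold are invented.

```python
# Energy-novelty onset detection: compute a short-time energy envelope,
# then half-wave rectify its first difference so that only sharp rises
# (onsets) remain; sharp falls would analogously mark offsets.
import math

def onsets(signal, frame=64, threshold=0.5):
    # short-time energy envelope, normalized to its peak
    env = [sum(s * s for s in signal[i:i + frame])
           for i in range(0, len(signal) - frame, frame)]
    peak = max(env) or 1.0
    env = [e / peak for e in env]
    # half-wave rectified first difference: keep rises only
    nov = [max(0.0, env[i] - env[i - 1]) for i in range(1, len(env))]
    return [i * frame for i, n in enumerate(nov, start=1) if n > threshold]

# synthetic signal: silence, then a sudden sinusoidal burst
sig = [0.0] * 512 + [math.sin(0.3 * n) for n in range(512)]
print(onsets(sig))  # -> [512], the start of the burst
```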
We propose an onset detection algorithm motivated by the human auditory system. It has the advantage of a unified framework for detecting both onsets and offsets in music signals, and is further extended to detect percussive transients. Percussive transients have sharp onsets followed closely by sharp offsets, a characteristic that the detection algorithm exploits. Detection alone, however, does not lend itself well to the extraction of transients, and hence we propose an iterative algorithm to extract all types of transients from a polyphonic music signal. The proposed iterative algorithm is both fast and accurate in extracting transients of various strengths.
The problem of transient extraction can be extended to harmonic/percussion sound separation (HPSS), where a music signal is separated into two streams consisting of components mainly from percussion and harmonic instruments, respectively. Most algorithms proposed to date deal with HPSS for Western music. Indian classical/film music, however, exhibits a different style of instrumentation and singing, including a high degree of vibrato or glissando content, which requires new approaches to HPSS. We propose extensions to two existing HPSS techniques, adapting them for Indian music. In both extensions we retain the original framework of the algorithm, showing that it is easy to incorporate the changes needed to handle Indian music. We also propose a new HPSS algorithm inspired by our transient extraction technique. It can be considered a generalized extension of our transient extraction algorithm and showcases our view that HPSS can be seen as an extension of transient analysis.
Even the best HPSS techniques leak harmonic components into the percussion stream, which can lead to poor performance in tasks like rhythm analysis. To reduce this leakage, we propose a post-processing technique on the percussion stream of the HPSS algorithm, which performs signal stitching by exploiting a commonly used model for percussive envelopes. We also developed a vocals extraction algorithm that operates on the harmonic stream of the HPSS algorithm. It follows the popular paradigm of extracting the predominant pitch and then generating the vocals signal corresponding to that pitch. We show that HPSS as a pre-processing step helps reduce the interference from percussive sources in the extraction stage. It is also shown that the performance of vocal extraction algorithms improves with knowledge of the locations of the vocal segments; this is demonstrated with the help of an oracle that locates the vocal segments, whose use greatly reduces the interference from other dominating sources in the extracted vocals signal.
Date: Wed, 05 Aug 2015
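The harmonic/percussive intuition, that harmonic energy varies smoothly along time while percussive energy varies smoothly along frequency, can be sketched with a median-filtering separation in the spirit of Fitzgerald-style HPSS (not the thesis algorithms); the toy spectrogram and filter length are invented.

```python
# Toy median-filtering HPSS on a magnitude spectrogram S[frequency][time]:
# a time-direction median enhances horizontal (harmonic) ridges, a
# frequency-direction median enhances vertical (percussive) ridges, and
# a binary mask assigns each bin to whichever enhancement wins.
from statistics import median

def hpss(S, k=3):
    F, T = len(S), len(S[0])
    h = k // 2
    H = [[median(S[f][max(0, t - h):t + h + 1]) for t in range(T)]
         for f in range(F)]
    P = [[median([S[ff][t] for ff in range(max(0, f - h), min(F, f + h + 1))])
          for t in range(T)] for f in range(F)]
    harm = [[S[f][t] if H[f][t] >= P[f][t] else 0.0 for t in range(T)]
            for f in range(F)]
    perc = [[S[f][t] if P[f][t] > H[f][t] else 0.0 for t in range(T)]
            for f in range(F)]
    return harm, perc

# toy spectrogram: a sustained tone at bin 2, a click at frame 4
S = [[0.0] * 8 for _ in range(6)]
for t in range(8):
    S[2][t] += 1.0          # horizontal line: harmonic
for f in range(6):
    S[f][4] += 1.0          # vertical line: percussive
harm, perc = hpss(S)
print(harm[2][0], perc[5][4])  # -> 1.0 1.0 (tone vs. click separated)
```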
http://hdl.handle.net/2005/2452
Title: Nonstationary Techniques For Signal Enhancement With Applications To Speech, ECG, And Nonuniformly-Sampled Signals
Authors: Sreenivasa Murthy, A
Abstract: For time-varying signals such as speech and audio, short-time analysis becomes necessary to compute specific signal attributes and to keep track of their evolution. The standard technique is the short-time Fourier transform (STFT), in which one decomposes a signal in terms of windowed Fourier bases. An advance over the STFT is wavelet analysis, in which a function is represented in terms of shifted and dilated versions of a localized function called the wavelet. A specific modeling approach, particularly in the context of speech, is based on short-time linear prediction or short-time Wiener filtering of noisy speech. In most nonstationary signal processing formalisms, the key idea is to analyze the properties of the signal locally, either by first truncating the signal and then performing a basis expansion (as in the case of the STFT), or by choosing compactly-supported basis functions (as in the case of wavelets). We retain the same motivation as these approaches, but use polynomials to model the signal on a short-time basis (“short-time polynomial representation”). To emphasize the local nature of the modeling, we refer to it as “local polynomial modeling (LPM).”
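The LPM idea can be sketched as a least-squares polynomial fit over a short window; the window length and polynomial order here are illustrative choices, not those of the thesis.

```python
# Local polynomial modeling (LPM) sketch: fit a low-order polynomial to
# the samples in a short window by least squares, then read off the
# model's value at the window centre.
def polyfit(ts, xs, order):
    # normal equations A c = b for the monomial basis, solved by
    # Gaussian elimination with partial pivoting
    n = order + 1
    A = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(x * t ** i for t, x in zip(ts, xs)) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (b[i] - sum(A[i][j] * c[j] for j in range(i + 1, n))) / A[i][i]
    return c

def lpm_estimate(t0, ts, xs, window, order=2):
    # keep only samples inside the local window centred at t0
    sel = [(t - t0, x) for t, x in zip(ts, xs) if abs(t - t0) <= window / 2]
    coeffs = polyfit([t for t, _ in sel], [x for _, x in sel], order)
    return coeffs[0]  # value of the local polynomial at t0

# a smooth signal sampled on a grid; the local quadratic fit is exact here
ts = [0.1 * n for n in range(50)]
xs = [2 + 3 * t - t * t for t in ts]
print(round(lpm_estimate(1.0, ts, xs, window=1.0), 6))  # -> 4.0
```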
We pursue two main threads of research in this thesis: (i) Short-time approaches for speech enhancement; and (ii) LPM for enhancing smooth signals, with applications to ECG, noisy nonuniformly-sampled signals, and voiced/unvoiced segmentation in noisy speech.
Improved iterative Wiener filtering for speech enhancement
A constrained iterative Wiener filter solution for speech enhancement was proposed by Hansen and Clements. Sreenivas and Kirnapure improved the performance of the technique by imposing codebook-based constraints in the process of parameter estimation; the key advantage is that the optimal parameter search space is confined to the codebook. These nonstationary signal enhancement solutions, however, assume stationary noise. In practical applications noise is not stationary, and updating the noise statistics becomes necessary. We present a new approach to reliable noise estimation based on spectral subtraction: we first estimate the signal spectrum and perform signal subtraction to estimate the noise power spectral density, and we further smooth the estimated noise spectrum to ensure reliability. The key contributions are: (i) adaptation of the technique to nonstationary noises; (ii) a new initialization procedure for faster convergence and higher accuracy; (iii) experimental determination of the optimal LP-parameter space; and (iv) objective criteria and speech recognition tests for performance comparison.
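The noise-tracking idea — estimate the signal spectrum, subtract it to expose the noise power, then smooth recursively — can be sketched as follows. The frame data, smoothing constant, and the function name `track_noise` are invented for illustration.

```python
# Per-frame noise PSD tracking by signal subtraction plus recursive
# smoothing, so the estimate stays reliable as noise statistics drift.
def track_noise(frames, alpha=0.8):
    # frames: per-frame magnitude-squared spectra (lists of equal length)
    noise = list(frames[0])          # initialise from the first frame
    history = []
    for frame in frames:
        # crude signal estimate: whatever exceeds the current noise floor
        signal = [max(0.0, p - n) for p, n in zip(frame, noise)]
        # subtract the signal estimate to expose the noise power ...
        raw = [max(0.0, p - s) for p, s in zip(frame, signal)]
        # ... and smooth recursively for reliability
        noise = [alpha * n + (1 - alpha) * r for n, r in zip(noise, raw)]
        history.append(list(noise))
    return history

# noise floor at power 1.0, with a loud "speech" bin in later frames
frames = [[1.0, 1.0, 1.0]] * 5 + [[1.0, 9.0, 1.0]] * 5
hist = track_noise(frames)
print([round(v, 2) for v in hist[-1]])  # -> [1.0, 1.0, 1.0]
```

Note that the loud bin does not inflate the noise estimate, because the signal estimate is subtracted out before smoothing.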
Optimal local polynomial modeling and applications
We next address the problem of fitting a piecewise-polynomial model to a smooth signal corrupted by additive noise. Since the signal is smooth, it can be represented using low-order polynomial functions provided that they are locally adapted to the signal. We choose the mean-square error as the criterion of optimality. Since the model is local, it preserves the temporal structure of the signal and can also handle nonstationary noise. We show that there is a trade-off between the adaptability of the model to local signal variations and its robustness to noise (the bias-variance trade-off), which we solve using the intersection of confidence intervals (ICI) technique, a stochastic optimization approach. The key trade-off parameter is the duration of the window over which the optimum LPM is computed.
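The ICI rule itself can be sketched directly: as the window grows, intersect the confidence intervals of the successive estimates and stop at the first empty intersection. The estimates, standard deviations, and κ below are invented.

```python
# Intersection-of-confidence-intervals (ICI) window selection: keep the
# running intersection of [e - k*s, e + k*s] over increasing window
# lengths; the last index before the intersection empties is selected.
def ici_select(estimates, stds, kappa=2.0):
    lo, hi = float("-inf"), float("inf")
    best = 0
    for i, (e, s) in enumerate(zip(estimates, stds)):
        lo = max(lo, e - kappa * s)
        hi = min(hi, e + kappa * s)
        if lo > hi:          # intervals no longer intersect: bias dominates
            break
        best = i
    return best

# variance shrinks with window length, but the longest window is biased
estimates = [1.0, 1.05, 0.98, 1.6]   # last estimate drifts away (bias)
stds = [0.5, 0.25, 0.12, 0.06]
print(ici_select(estimates, stds))  # -> 2
```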
Within the LPM framework, we address three problems: (i) Signal reconstruction from noisy uniform samples; (ii) Signal reconstruction from noisy nonuniform samples; and (iii) Classification of speech signals into voiced and unvoiced segments.
The generic signal model is

x(t_n) = s(t_n) + d(t_n),  0 ≤ n ≤ N − 1,

where d(t_n) denotes additive noise. In problems (i) and (iii) above, t_n = nT (uniform sampling); in (ii) the samples are taken at nonuniform instants. The signal s(t) is assumed to be smooth; i.e., it should admit a local polynomial representation. The problem in (i) and (ii) is to estimate s(t) from x(t_n); i.e., we are interested in optimal signal reconstruction on a continuous domain starting from uniform or nonuniform samples.
We show that, in both cases, the bias and variance take the general form

bias(t; L) = L^(p+1) f(s(t)),   var(t; L) = g(σ) / L,

and the mean square error (MSE) is given by

MSE(t; L) = bias^2(t; L) + var(t; L),

where L is the length of the window over which the polynomial fitting is performed, p is the order of the polynomial, f is a function of s(t) that typically comprises the higher-order derivatives of s(t) (the order itself dependent on the order of the polynomial), and g is a function of the noise variance. It is clear that the bias and variance have complementary characteristics with respect to L. Directly optimizing for the MSE would give a value of L which involves the functions f and g. The function g may be estimated, but f is not known since s(t) is unknown. Hence, it is not practical to compute the minimum MSE (MMSE) solution. Therefore, we obtain an approximate result by solving the bias-variance trade-off in a probabilistic sense using the ICI technique. We also propose a new approach to optimally select the ICI technique parameters, based on a new cost function that is the sum of the probability of false alarm and the area covered by the confidence interval. In addition, we address issues related to optimal model-order selection, the search space for window lengths, the accuracy of noise estimation, etc.
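The complementary behaviour of bias and variance in L can be checked numerically with a zeroth-order LPM (a moving average) on a noisy smooth signal; the signal, noise level, and window lengths below are illustrative.

```python
# Bias-variance trade-off in the window length L: a moving average of
# length L applied to a noisy quadratic. Short windows leave noise
# (variance), long windows smear the curvature (bias); the MSE is
# smallest at an intermediate L.
import random

random.seed(0)
s = [0.001 * n * n for n in range(400)]           # smooth signal
x = [v + random.gauss(0, 0.3) for v in s]         # noisy samples

def mse_for_window(L):
    h = L // 2
    pts = range(h, len(x) - h)
    err = 0.0
    for n in pts:
        est = sum(x[n - h:n + h + 1]) / (2 * h + 1)
        err += (est - s[n]) ** 2
    return err / len(pts)

for L in (3, 21, 201):
    print(L, round(mse_for_window(L), 4))
```

On this data the MSE drops from L = 3 to L = 21 (variance reduction) and then blows up at L = 201 (bias domination), the U-shape that the ICI rule navigates.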
The next issue addressed is voiced/unvoiced (V/UV) segmentation of speech signals. Speech segments show different spectral and temporal characteristics depending on whether the segment is voiced or unvoiced, and most speech processing techniques process the two types of segment differently. The challenge lies in making detection techniques robust in the presence of noise. We propose a new technique for voiced/unvoiced classification that takes into account the fact that voiced segments have a certain degree of regularity, whereas unvoiced segments do not possess any smoothness. To capture the regularity in voiced regions, we employ the LPM; the key idea is that regions where the LPM is inaccurate are more likely to be unvoiced than voiced. Within this framework, we formulate a hypothesis testing problem based on the accuracy of the LPM fit and devise a test statistic for performing V/UV classification. Since the technique is based on LPM, it is capable of adapting to nonstationary noises. We present Monte Carlo results to demonstrate the accuracy of the proposed technique.
Date: Tue, 21 Jul 2015
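A toy version of the LPM-based V/UV idea: fit a local line to each frame and use the residual-to-energy ratio as the test statistic, so that smooth (voiced-like) frames fit well while noise-like (unvoiced) frames do not. The threshold, frame length, and data are invented; this is not the thesis's exact test statistic.

```python
# V/UV classification sketch: least-squares line fit per frame, with the
# normalized fit residual as the decision statistic.
import math
import random

def line_fit_residual(frame):
    n = len(frame)
    tm = (n - 1) / 2
    xm = sum(frame) / n
    num = sum((t - tm) * (x - xm) for t, x in enumerate(frame))
    den = sum((t - tm) ** 2 for t in range(n))
    a = num / den
    resid = sum((x - (xm + a * (t - tm))) ** 2 for t, x in enumerate(frame))
    energy = sum(x * x for x in frame) or 1.0
    return resid / energy  # near 0 for locally smooth frames

def classify(frame, threshold=0.1):
    return "voiced" if line_fit_residual(frame) < threshold else "unvoiced"

random.seed(1)
voiced = [math.sin(0.05 * n) for n in range(10)]      # locally smooth
unvoiced = [random.gauss(0, 1) for _ in range(10)]    # noise-like
print(classify(voiced), classify(unvoiced))  # -> voiced unvoiced
```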