Please use this identifier to cite or link to this item: http://hdl.handle.net/2005/1076

Title: Efficient Kernel Methods For Large Scale Classification
Authors: Asharaf, S
Advisors: Narasimha Murty, M
Keywords: Machine Learning
Automatic Classification
Kernel Method
Classification Algorithms
Support Vector Machine (SVM)
Core Vector Machine (CVM)
Rough Support Vector Clustering (RSVC)
Multiclass Core Vector Machine (MCVM)
Submitted Date: Jul-2007
Series/Report no.: G21537
Abstract: Classification algorithms are widely used in many application domains. Most of these domains involve massive collections of data and hence demand classification algorithms that scale well with the size of the data sets involved. A classification algorithm is said to be scalable if its time and space requirements do not increase significantly as the training set grows, without compromising generalization performance.

Support Vector Machine (SVM) is one of the most celebrated kernel-based classification methods in machine learning, and an SVM capable of handling large-scale classification problems would be an ideal candidate for many real-world applications. SVM training is usually formulated as a Quadratic Programming (QP) problem. Existing solution strategies for this problem have time and space complexity that is (at least) quadratic in the number of training points, which makes SVM training very expensive even on classification problems with a few thousand training examples.

This thesis addresses the scalability of the training algorithms used in both two-class and multiclass Support Vector Machines, and proposes efficient training schemes that reduce the space and time requirements of SVM training. For large-scale two-class classification problems, the thesis discusses a) two selective-sampling-based training schemes for scaling non-linear SVMs and b) clustering-based approaches for handling unbalanced data sets with the Core Vector Machine (CVM). To handle large-scale multiclass classification problems, the thesis proposes the Multiclass Core Vector Machine (MCVM), a scalable SVM-based multiclass classifier.
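The quadratic cost of SVM training can be made concrete with a small illustration (not the thesis's method): the sketch below builds the dense n × n kernel (Gram) matrix that a straightforward dual QP solver would store, and computes its memory footprint. The Gaussian (RBF) kernel, the value gamma = 0.5, 8-byte float64 entries, and the function names are all assumptions made for this example.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    # Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2).
    # Assumption: the RBF kernel and gamma = 0.5 are illustrative choices.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def gram_matrix(points, kernel=rbf_kernel):
    # Dense n x n matrix of pairwise kernel values:
    # n^2 kernel evaluations and n^2 stored entries.
    return [[kernel(p, q) for q in points] for p in points]


def gram_matrix_bytes(n, bytes_per_entry=8):
    # Storage for a dense Gram matrix of float64 entries grows as n^2.
    return n * n * bytes_per_entry


if __name__ == "__main__":
    K = gram_matrix([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
    print(f"3-point Gram matrix has {len(K) * len(K[0])} entries")
    for n in (1_000, 10_000, 100_000):
        print(f"n = {n:>7,}: {gram_matrix_bytes(n) / 1e9:.3f} GB")
```

Under these assumptions the dense matrix needs about 0.8 GB at n = 10,000 and about 80 GB at n = 100,000: ten times more training points means a hundred times more storage, before counting the solver's own work.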
In MCVM, the multiclass SVM problem is shown to be equivalent to a Minimum Enclosing Ball (MEB) problem, which is then solved using a fast approximate MEB-finding algorithm. Experiments were conducted on several large real-world data sets, such as the IJCNN1 and Acoustic data sets from the LIBSVM page, the Extended USPS data set from the CVM page, and the network intrusion detection data sets of DARPA (US Department of Defense) used in the KDD 99 contest. The empirical results show that the proposed classification schemes achieve good generalization performance with low time and space requirements, and scalability experiments on large training data sets demonstrate that the proposed schemes scale well. Another contribution of this thesis is a novel soft clustering scheme called Rough Support Vector Clustering (RSVC), which employs the idea of the Soft Minimum Enclosing Ball (SMEB) problem. Experiments on a synthetic data set and the real-world IRIS data set show that RSVC finds meaningful soft cluster abstractions.
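The idea of a fast approximate MEB algorithm can be illustrated with the simple Bădoiu-Clarkson core-set iteration, substituted here as a named stand-in: it repeatedly moves the center toward the current farthest point, and after ⌈1/ε²⌉ steps the returned radius is within a factor (1 + ε) of the optimal radius. This is a minimal sketch under stated assumptions: it works on explicit points in input space with Euclidean distance, whereas MCVM solves its MEB in a kernel-induced feature space, and CVM-style methods use a related core-set scheme rather than this exact update. The function names are hypothetical.

```python
import math


def farthest_point(points, center):
    # Index and Euclidean distance of the point farthest from the center.
    best_i, best_d = 0, -1.0
    for i, p in enumerate(points):
        d = math.dist(p, center)
        if d > best_d:
            best_i, best_d = i, d
    return best_i, best_d


def approx_meb(points, eps=0.1):
    """(1 + eps)-approximate minimum enclosing ball (Badoiu-Clarkson iteration).

    Assumption: points are explicit coordinate tuples (input space),
    not kernel-mapped features as in CVM/MCVM.
    Returns (center, radius, core_set_indices).
    """
    center = list(points[0])  # start from an arbitrary input point
    core = {0}
    iterations = math.ceil(1.0 / eps ** 2)
    for i in range(1, iterations + 1):
        j, _ = farthest_point(points, center)
        core.add(j)  # the farthest points visited form the core set
        # Move the center a fraction 1/(i+1) of the way toward the farthest point.
        step = 1.0 / (i + 1)
        center = [c + step * (q - c) for c, q in zip(center, points[j])]
    # The enclosing radius is the distance to the farthest point from the final center.
    _, radius = farthest_point(points, center)
    return center, radius, sorted(core)


if __name__ == "__main__":
    square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    center, radius, core = approx_meb(square, eps=0.1)
    print(f"center = {center}, radius = {radius:.4f}, core set = {core}")
```

For the four corners of a 2 × 2 square, the optimal ball has center (1, 1) and radius √2; with eps = 0.1 the sketch is guaranteed to return a radius between √2 and 1.1 · √2, while only the few points in the core set ever drive the center updates.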
URI: http://etd.iisc.ernet.in/handle/2005/1076
Appears in Collections: Computer Science and Automation (csa)

Files in This Item:

G21537.pdf (741.66 kB, Adobe PDF)

Items in etd@IISc are protected by copyright, with all rights reserved, unless otherwise indicated.

