etd AT Indian Institute of Science >
Division of Electrical Sciences >
Computer Science and Automation (csa) >
Please use this identifier to cite or link to this item:
|Title: ||Computational Protein Structure Analysis : Kernel And Spectral Methods|
|Authors: ||Bhattacharya, Sourangshu|
|Advisors: ||Bhattacharyya, Chiranjib|
|Keywords: ||Protein - Structure|
Protein Structure - Data Processing
Protein Structure Alignment
Spectral Graph Theory
Protein Structure Classification
|Submitted Date: ||Aug-2008|
|Series/Report no.: ||G22590|
|Abstract: ||The focus of this thesis is to develop computational techniques for analysis of protein structures. We model protein structures as points in 3-dimensional space which in turn are modeled as weighted graphs. The problem of protein structure comparison is posed as a weighted graph matching problem and an algorithm motivated from the spectral graph matching techniques is developed. The thesis also proposes novel similarity measures by deriving kernel functions. These kernel functions allow the data to be mapped to a suitably defined Reproducing kernel Hilbert Space(RKHS), paving the way for efficient algorithms for protein structure classification.
Protein structure comparison (structure alignment)is a classical method of determining overall similarity between two protein structures. This problem can be posed as the approximate weighted subgraph matching problem, which is a well known NP-Hard problem. Spectral graph matching techniques provide efficient heuristic solution for the weighted graph matching problem using eigenvectors of adjacency matrices of the graphs. We propose a novel and efficient algorithm for protein structure comparison using the notion of neighborhood preserving projections (NPP) motivated from spectral graph matching. Empirically, we demonstrate that comparing the NPPs of two protein structures gives the correct equivalences when the sizes of proteins being compared are roughly similar. Also, the resulting algorithm is 3 -20 times faster than the existing state of the art techniques. This algorithm was used for retrieval of protein structures from standard databases with accuracies comparable to the state of the art.
A limitation of the above method is that it gives wrong results when the number of unmatched residues, also called insertions and deletions (indels), are very high. This problem was tackled by matching neighborhoods, rather than entire structures. For each pair of neighborhoods, we grow the neighborhood alignments to get alignments for entire structures. This results in a robust method that has outperformed the existing state of the art methods on standard benchmark datasets. This method was also implemented using MPI on a cluster for database search.
Another important problem in computational biology is classification of protein structures into classes exhibiting high structural similarity. Many manual and semi-automatic structural classification databases exist. Kernel methods along with support vector machines (SVM) have proved to be a robust and principled tool for classification. We have proposed novel positive semidefinite kernel functions on protein structures based on spatial neighborhoods. The kernels were derived using a general technique called convolution kernel, and showed to be related to the spectral alignment score in a limiting case. These kernels have outperformed the existing tools when validated on a well known manual classification scheme called SCOP. The kernels were designed keeping the general problem of capturing structural similarity in mind, and have been successfully applied to problems in other domains, e.g. computer vision.|
|Appears in Collections:||Computer Science and Automation (csa)|
Items in etd@IISc are protected by copyright, with all rights reserved, unless otherwise indicated.