<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>etd@IISc Collection:</title>
    <link>http://hdl.handle.net/2005/25</link>
    <description />
    <pubDate>Tue, 30 Apr 2013 11:37:11 GMT</pubDate>
    <dc:date>2013-04-30T11:37:11Z</dc:date>
    <item>
      <title>Algorithmic Approaches For Protein-Protein Docking And quarternary Structure Inference</title>
      <link>http://hdl.handle.net/2005/1066</link>
      <description>Title: Algorithmic Approaches For Protein-Protein Docking And quarternary Structure Inference
Authors: Mitra, Pralay
Abstract: Molecular interaction among proteins drives the cellular processes through the formation of complexes that perform the requisite biochemical function. While some of the complexes are obligate (i.e., they fold together while complexation) others are non-obligate, and are formed through macromolecular recognition. Macromolecular recognition in proteins is highly specific, yet it can be both permanent and non permanent in nature. Hallmarks of permanent recognition complexes include large surface of interaction, stabilization by hydrophobic interaction and other noncovalent forces. Several amino acids which contribute critically to the free energy of binding at these interfaces are called as “hot spot” residues. The non permanent recognition complexes, on the other hand, usually show small interface of interaction, with limited stabilization from non covalent forces. For both the permanent and non permanent complexes, the specificity of molecular interaction is governed by the geometric compatibility of the interaction surface, and the noncovalent forces that anchor them. A great deal of studies has already been performed in understanding the basis of protein macromolecular recognition.1; 2 Based on these studies efforts have been made to develop protein-protein docking algorithms that can predict the geometric orientation of the interacting molecules from their individual unbound states. Despite advances in docking methodologies, several significant difficulties remain.1 Therefore, in this thesis, we start with literature review to understand the individual merits and demerits of the existing approaches (Chapter 1),3 and then, we attempt to address some of the problems by developing methods to infer protein quaternary structure from the crystalline state, and improve structural and chemical understanding of protein-protein interactions through biological complex prediction. &#xD;
The understanding of the interaction geometry is the first step in a protein-protein interaction study. Yet, no consistent method exists to assess the geometric compatibility of the interacting interface because of its highly rugged nature. This suggested that new sensitive measures and methods are needed to tackle the problem. We, therefore, developed two new and conceptually different measures using the Delaunay tessellation and interface slice selection to compute the surface complementarity and atom packing at the protein-protein interface (Chapter 2).4 We called these Normalized Surface Complementarity (NSc) and Normalized Interface Packing (NIP). We rigorously benchmarked the measures on the non redundant protein complexes available in the Protein Data Bank (PDB) and found that they efficiently segregate the biological protein-protein contacts from the non biological ones, especially those derived from X-ray crystallography. Sensitive surface packing/complementarity recognition algorithms are usually computationally expensive and thus limited in application to high-throughput screening. Therefore, special emphasis was given to make our measure compute-efficient as well. Our final evaluation showed that NSc, and NIP have very strong correlation among themselves, and with the interface area normalized values available from the Surface Complementarity program (CCP4 Suite: &lt;http://smb.slac.stanford.edu/facilities/software/ccp4/html/sc.html&gt;); but at a fraction of the computing cost.  &#xD;
After building the geometry based surface complementarity and packing assessment methods to assess the rugged protein surface, we advanced our goal to determine the stabilities of the geometrically compatible interfaces formed. For doing so, we needed to survey the quaternary structure of proteins with various affinities. The emphasis on affinity arose due to its strong relationship with the permanent and non permanent life-time of the complex. We, therefore, set up data mining studies on two databases named PQS (Protein Quaternary structure database: http://pqs.ebi.ac.uk) and PISA (Protein Interfaces, Surfaces and Assemblies: www.ebi.ac.uk/pdbe/prot_int/pistart.html) that offered downloads on quaternary structure data on protein complexes derived from X-ray crystallographic methods. To our surprise, we found that above mentioned databases provided the valid quaternary structure mostly for moderate to strong affinity complexes. The limitation could be ascertained by browsing annotations from another curated database of protein quaternary structure (PiQSi:5 supfam.mrc-lmb.cam.ac.uk/elevy/piqsi/piqsi_home.cgi) and literature surveys. This necessitated that we at first develop a more robust method to infer quaternary structures of all affinity available from the PDB. We, therefore, developed a new scheme focused on covering all affinity category complexes, especially the weak/very weak ones, and heteromeric quaternary structures (Chapter 3).6 Our scheme combined the naïve Bayes classifier and point-group symmetry under a Boolean framework to detect all categories of protein quaternary structures in crystal lattice. We tested it on a standard benchmark consisting of 112 recognition heteromeric complexes, and obtained a correct recall in 95% cases, which are significantly better than 53% achieved by the PISA,7 a state-of-art quaternary structure detection method hosted at the European Bioinformatics Institute, Hinxton, UK. A few cases that failed correct detection through our scheme, offered interesting insights into the intriguing nature of protein contacts in the lattice. The findings have implications for accurate inference of quaternary states of proteins, especially weak affinity complexes, where biological protein contacts tend to be sacrificed for the energetically optimal ones that favor the formation/stabilization of the crystal lattice. We expect our method to be used widely by all researchers interested in protein quaternary structure and interaction. &#xD;
Having developed a method that allows us to sample all categories of quaternary structures in PDB, we set our goal in addressing the next problem that of accurately determining stabilities of the geometrically compatible protein surfaces involved in interaction. Reformulating the question in terms of protein-protein docking, we sought to ask how we could reliably infer the stabilities of any arbitrary interface that is formed when two protein molecules are brought sterically closer. In a real protein docking exercise this question is asked innumerable times during energy-based screening of thousands of decoys geometrically sampled (through rotation+translation) from the unbound subunits. The current docking methods face problems in two counts: (i), the number of interfaces from decoys to evaluate energies is rather large (64320 for a 9º rotation and translation for a dimeric complex), and (ii) the energy based screening is not quite efficient such that the decoys with native-like quaternary structure are rarely selected at high ranks. We addressed both the problems with interesting results. &#xD;
Intricate decoy filtering approaches have been developed, which are either applied during the search stage or the sampling stage, or both.  For filtering, usually statistical information, such as 3D conservation information of the interfacial residues, or similar facts is used; more expensive approaches screen for orientation, shape complementarity and electrostatics. We developed an interface area based decoy filter for the sampling stage, exploiting an assumption that native-like decoys must have the largest, or close to the largest, interface (Chapter 4).8 Implementation of this assumption and standard benchmarking showed that in 91% of the cases, we could recover native-like decoys of bound and unbound binary docking-targets of both strong and weak affinity. This allowed us to propose that “native-like decoys must have the largest, or close to the largest, interface” can be used as a rule to exclude non native decoys efficiently during docking sampling. This rule can dramatically clip the needle-in-a-haystack problem faced in a docking study by reducing &gt;95% of the decoy set available from sampling search. We incorporated the rule as a central part of our protein docking strategy. &#xD;
While addressing the question of energy based screening to rank the native-like decoys at high rank during docking, we came across a large volume of work already published. The mainstay of most of the energy based screenings that avoid statistical potential, involve some form of the Coulomb’s potential, Lennard Jones potential and solvation energy. Different flavors of the energy functions are used with diverse preferences and weights for individual terms. Interestingly, in all cases the energy functions were of the unnormalized form. Individual energy terms were simply added to arrive at a final score that was to be used for ranking. Proteins being large molecules, offer limited scope of applying semi-empirical or quantum mechanical methods for large scale evaluation of energy. We, therefore, developed a de novo empirical scoring function in the normalized form. As already stated, we found NSc and NIP to be highly discriminatory for segregating biological and non biological interface. We, therefore, incorporated them as parameters for our scoring function. Our data mining study revealed that there is a reasonable correlation of -0.73 between normalized solvation energy and normalized nonbonding energy (Coulombs + van der Waals) at the interface. Using the information, we extended our scoring function by combining the geometric measures and the normalized interaction energies. Tests on 30 unbound binary protein-protein complexes showed that in 16 cases we could identify at least one decoy in top three ranks with ≤10 Å backbone root-mean-square-deviation (RMSD) from true binding geometry. The scoring results were compared with other state-of-art methods, which returned inferior results. The salient feature of our scoring function was exclusion of any experiment guided restraints, evolutionary information, statistical propensities or modified interaction energy equations, commonly used by others. Tests on 118 less difficult bound binary protein-protein complexes with ≤35% sequence redundancy at the interface gave first rank in 77% cases, where the native like decoy was chosen among 1 in 10,000 and had ≤5 Å backbone RMSD from true geometry. The details about the scoring function, results and comparison with the other methods are extensively discussed in Chapter 5.9 The method has been implemented and made available for public use as a web server - PROBE (http://pallab.serc.iisc.ernet.in/probe). The development and use of PROBE has been elaborated in Chapter 7.10 &#xD;
On course of this work, we generated huge amounts of data, which is useful information that could be used by others, especially “protein dockers”. We, therefore, developed dockYard (http://pallab.serc.iisc.ernet.in/dockYard) - a repository for protein-protein docking decoys (Chapter 6).11 dockYard offers four categories of docking decoys derived from: Bound (native dimer co-crystallized), Unbound (individual subunits as well as the target are crystallized), Variants (match the previous two categories in at least one subunit with 100% sequence identity), and Interlogs (match the previous categories in at least one subunit with ≥90% or ≥50% sequence identity). There is facility for full or selective download based on search parameters. The portal also serves as a repository to modelers who may want to share their decoy sets with the community. &#xD;
In conclusion, although we made several contributions in development of algorithms for improved protein-protein docking and quaternary structure inference, a lot of challenges remain (Chapter 8). The principal challenge arises by considering proteins as flexible bodies, whose conformational states may change on quaternary structure formation. In addition, solvent plays a major role in the free energy of binding, but its exact contribution is not straightforward to estimate. Undoubtedly, the cost of computation is one of the limiting factors apart from good energy functions to evaluate the docking decoys. Therefore, the next generation of algorithms must focus on improved docking studies that realistically incorporate flexibility and solvent environment in all their evaluations.</description>
      <pubDate>Sun, 13 Feb 2011 18:30:00 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2005/1066</guid>
      <dc:date>2011-02-13T18:30:00Z</dc:date>
    </item>
    <item>
      <title>Efficiently Approximating Query Optimizer Diagrams</title>
      <link>http://hdl.handle.net/2005/920</link>
      <description>Title: Efficiently Approximating Query Optimizer Diagrams
Authors: Dey, Atreyee
Abstract: Modern database systems use a query optimizer to identify the most efficient strategy, called “query execution plan”, to execute declarative SQL queries. The role of the query optimizer is especially critical for the complex decision-support queries featured in current data warehousing and data mining applications.&#xD;
Given an SQL query template that is parametrized on the selectivities of the participating base relations and a choice of query optimizer, a plan diagram is a color-coded pictorial enumeration of the execution plan choices of the optimizer over the query parameter space. Complementary to the plan-diagrams are cost and cardinality diagrams which graphically plot the estimated execution costs and cardinalities respectively, over the query parameter space. These diagrams are collectively known as optimizer diagrams. Optimizer diagrams have proved to be a powerful tool for the analysis and redesign of modern optimizers, and are gaining interest in diverse industrial and academic institutions. However, their utility is adversely impacted by the impractically large computational overheads incurred when standard brute-force approaches are used for producing fine-grained diagrams on high-dimensional query templates.&#xD;
In this thesis, we investigate strategies for efficiently producing close approximations to complex optimizer diagrams. Our techniques are customized for different classes of optimizers, ranging from the generic Class I optimizers that provide only the optimal plan for a query, to Class II optimizers that also support costing of sub-optimal plans and Class III optimizers which offer enumerated rank-ordered lists of plans in addition to both the former features.&#xD;
For approximating plan diagrams for Class I optimizers, we first present database oblivious techniques based on classical random sampling in conjunction with nearest neighbor (NN) inference scheme. Next we propose grid sampling algorithms which consider database specific knowledge such as(a) the structural differences between the operator trees of plans on the grid locations and (b) parametric query optimization principle. These algorithms become more efficient when modified to exploit the sub-optimal plan costing feature available with Class II optimizers. The final algorithm developed for Class III optimizers assume plan cost monotonicity and utilize the rank-ordered lists of plans to efficiently generate completely accurate optimizer diagrams. Subsequently, we provide a relaxed variant, which trades quality of approximation, for reduction in diagram generation overhead. Our proposed algorithms are capable of terminating according to user given error bound for plan diagram approximation.&#xD;
For approximating cost diagrams, our strategy is based on linear least square regression performed on a mathematical model of plan cost behavior over the parameter space, in conjunction with interpolation techniques. Game theoretic and linear programming approaches have been employed to further reduce the error in cost approximation.&#xD;
For approximating cardinality diagrams, we propose a novel parametrized mathematical model as a function of selectivities for characterizing query cardinality behavior. The complete cardinality model is constructed by clustering the data points according to their cardinality values and subsequently fitting the model through linear least square regression technique separately for each cluster. For non-sampled data points the cardinality values are estimated by first determining the cluster they belong to and then interpolating the cardinality value according to the suitable model.&#xD;
Extensive experimentation with a representative set of TPC-H and TPC-DS-based query templates on industrial-strength optimizers indicates that our techniques are capable of delivering 90% accurate optimizer diagrams while incurring no more than 20% of the computational overheads of the exhaustive approach. Infact, for full-featured optimizers, we can guarantee zero error optimizer diagrams which usually require less than 10% overheads. Our results exhibit that (a) the approximation is materially faithful to the features of the exact optimizer diagram, with the errors thinly spread across the picture and Largely confined to the plan transition boundaries and (b) the cost increase at the non-sampled point due to assignment of sub-optimal plan is also limited.&#xD;
These approximation techniques have been implemented in the publicly available Picasso optimizer visualizer tool. We have also modified PostgreSQL’s optimizer to incorporate costing of sub-optimal plans and enumerating rank-ordered lists of plans. In addition to these, we have designed estimators for predicting the time overhead involved in approximating optimizer diagrams with regard to user given error bounds.&#xD;
In summary, this thesis demonstrates that accurate approximations to exact optimizer diagrams can indeed be obtained cheaply and consistently, with typical overheads being an order of magnitude lower than the brute-force approach. We hope that our results will encourage database vendors to incorporate the foreign-plan-costing and plan-rank-list features in their optimizer APIs.</description>
      <pubDate>Wed, 20 Oct 2010 18:30:00 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2005/920</guid>
      <dc:date>2010-10-20T18:30:00Z</dc:date>
    </item>
    <item>
      <title>Search On A Hypercubic Lattice Using Quantum Random Walk</title>
      <link>http://hdl.handle.net/2005/972</link>
      <description>Title: Search On A Hypercubic Lattice Using Quantum Random Walk
Authors: Rahaman, Md Aminoor
Abstract: Random walks describe diffusion processes, where movement at every time step is restricted only to neighbouring locations. Classical random walks are constructed using the non-relativistic Laplacian evolution operator and a coin toss instruction. In quantum theory, an alternative is to use the relativistic Dirac operator. That necessarily introduces an internal degree of freedom (chirality), which may be identified with the coin. The resultant walk spreads quadratically faster than the classical one, and can be applied to a variety of graph theoretical problems. &#xD;
We study in detail the problem of spatial search, i.e. finding a marked site on a hypercubic lattice in d-dimensions. For d=1, the scaling behaviour of classical and quantum spatial search is the same due to the restriction on movement. On the other hand, the restriction on movement hardly matters for d ≥ 3, and scaling behaviour close to Grover’s optimal algorithm(which has no restriction on movement) can be achieved. d=2 is the borderline critical dimension, where infrared divergence in propagation leads to logarithmic slow down that can be minimised using clever chirality flips. In support of these analytic expectations, we present numerical simulation results for d=2 to d=9, using a lattice implementation of the Dirac operator inspired by staggered fermions. We optimise the parameters of the algorithm, and the simulation results demonstrate that the number of binary oracle calls required for d= 2 and d ≥ 3 spatial search problems are O(√NlogN) and O(√N) respectively. Moreover, with increasing d, the results approach the optimal behaviour of Grover’s algorithm(corresponding to mean field theory or d → ∞ limit). In particular, the d = 3 scaling behaviour is only about 25% higher than the optimal value.</description>
      <pubDate>Wed, 29 Dec 2010 18:30:00 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2005/972</guid>
      <dc:date>2010-12-29T18:30:00Z</dc:date>
    </item>
    <item>
      <title>RETHROTTLE : Execution Throttling In The REDEFINE SoC Architecture</title>
      <link>http://hdl.handle.net/2005/1017</link>
      <description>Title: RETHROTTLE : Execution Throttling In The REDEFINE SoC Architecture
Authors: Satrawala, Amar Nath
Abstract: REDEFINE is a reconfigurable SoC architecture that provides a unique platform for high performance and low power computing by exploiting the synergistic interaction between coarse grain dynamic dataflow model of computation (to expose abundant parallelism in the applications) and runtime composition of efficient compute structures (on the reconfigurable computation resources). Computer architectures based on the dynamic dataflow model of computation have to be an infinite resource implementation to be able to exploit all available parallelism in all applications. It is not feasible for any real architectural implementation. When limited resource implementations are considered, there is a possibility of loss of performance (inability to efficiently exploit available parallelism). In this thesis, we study the throttling of execution in the REDEFINE architecture to maximize the architecture efficiency. We have formulated it as a design space exploration problem at two levels i.e. architectural configurations and throttling schemes. &#xD;
Reduced feature/high level simulation or feature specific analytical approaches are very useful for the selective study/exploration of early in design phase architectures/systems. Our approach is similar to that of SEASAME Framework which is used for the study of MPSoC (Multiprocessor SoC) architectures. We have used abstraction (feature reduction) at the levels of architecture and model of computation to make the problem approachable and practically feasible. A feature specific fast hybrid (mixed level) simulation framework for the early in design phase study is developed and implemented for the huge design space exploration (1284 throttling schemes, 128 architectural configurations and 10 applications i.e. 1.6 million executions). &#xD;
We have done performance modeling in terms of selection of important performance criteria, ranking of the explored throttling schemes and investigation of the effectiveness of the design space exploration using statistical hypothesis testing. We found some interesting obvious/intuitive and some non-obvious/counterintuitive results. The two performance criteria namely Exec.T and Avg.TU were found sufficient to represent the performance and the resource usage characteristics of the architecture independent of the throttling schemes, the architectural configurations and the applications. The ranking of the throttling schemes based on the selected performance criteria is found to be statistically very significant. The intuitive throttling schemes span the range of performance from the best to the worst. We found absence of trade-off amongst all of the performance criteria. The best throttling schemes give appreciable overall performance (25%) and resource usage (37%) gains in the throttling unit simultaneously. The design space exploration of the throttling schemes is found to be fine and uniform.</description>
      <pubDate>Wed, 19 Jan 2011 18:30:00 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2005/1017</guid>
      <dc:date>2011-01-19T18:30:00Z</dc:date>
    </item>
  </channel>
</rss>

