etd@IISc Collection: http://etd.iisc.ernet.in/2005/18 (2018-07-10T00:04:46Z)
http://etd.iisc.ernet.in/2005/3802 (2018-07-09T13:29:26Z; 2018-07-08T18:30:00Z)
Title: Towards a Characterization of the Symmetries of the Nisan-Wigderson Polynomial Family
Authors: Gupta, Nikhil
Abstract: Understanding the structure and complexity of a polynomial family is a fundamental problem of arithmetic circuit complexity. There are various approaches to this, such as studying lower bounds, which deals with finding the smallest circuit required to compute a polynomial, and studying the orbit and stabilizer of a polynomial with respect to invertible transformations. We have a rich understanding of some well-known polynomial families such as the determinant, permanent and IMM. In this thesis we study some of the structural properties of the polynomial family called the Nisan-Wigderson polynomial family. This polynomial family is inspired by a well-known combinatorial design called the Nisan-Wigderson design and has recently been used to prove strong lower bounds on some restricted classes of arithmetic circuits ([KSS14], [KLSS14], [KST16]). But unlike the determinant, permanent and IMM, our understanding of the Nisan-Wigderson polynomial family is inadequate. For example, we do not know whether this polynomial family is in VP, VNP-complete, or VNP-intermediate (assuming VP ≠ VNP), nor do we understand the complexity of its equivalence test. We hope that knowledge of some of the inherent properties of the Nisan-Wigderson polynomial, such as its group of symmetries and its Lie algebra, will provide some insights in this regard.
A matrix A ∈ GL_n(F) is called a symmetry of an n-variate polynomial f if f(Ax) = f(x). The set of symmetries of f forms a subgroup of GL_n(F), known as the group of symmetries of f and denoted G_f. A vector space is attached to G_f to get a complete understanding of the symmetries of f. This vector space is known as the Lie algebra of the group of symmetries of f (or the Lie algebra of f), denoted g_f. The Lie algebra of f contributes some elements of G_f, known as the continuous symmetries of f. The Lie algebra has also been instrumental in designing efficient randomized equivalence tests for some polynomial families such as the determinant, permanent and IMM ([Kay12], [KNST17]).
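To make the definition of a symmetry concrete, the following minimal sketch checks f(Ax) = f(x) for a toy polynomial f(x1, x2) = x1·x2, which is not the Nisan-Wigderson polynomial. The function names and the grid-based check are illustrative assumptions, not taken from the thesis. The diagonal matrix diag(t, 1/t) is a symmetry for every nonzero t (a continuous family), while the swap of the two variables is a discrete symmetry.

```python
from fractions import Fraction
from itertools import product


def apply_matrix(A, x):
    """Return the vector A·x for a square matrix A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def f(x):
    """Toy polynomial f(x1, x2) = x1 * x2 (illustrative only)."""
    return x[0] * x[1]


def is_symmetry_on_grid(A, poly, n, values):
    """Check poly(A·x) == poly(x) at every point of the grid values^n.

    If the grid has more points per coordinate than the degree of poly in
    any single variable, agreement on the whole grid implies the identity
    poly(A·x) = poly(x) as polynomials.
    """
    return all(
        poly(apply_matrix(A, list(x))) == poly(list(x))
        for x in product(values, repeat=n)
    )


grid = range(-2, 3)  # five points per coordinate, enough for degree 2

diagonal = [[Fraction(3), 0], [0, Fraction(1, 3)]]  # diag(t, 1/t) with t = 3
swap = [[0, 1], [1, 0]]                             # exchanges x1 and x2
scaling = [[2, 0], [0, 1]]                          # maps f to 2*f

print(is_symmetry_on_grid(diagonal, f, 2, grid))  # True
print(is_symmetry_on_grid(swap, f, 2, grid))      # True
print(is_symmetry_on_grid(scaling, f, 2, grid))   # False
```

Exact rational arithmetic is used so that the equality test is not affected by floating-point rounding.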
In this work we completely characterize the Lie algebra of the Nisan-Wigderson polynomial family. We show that g_NW contains diagonal matrices of a specific type. The knowledge of g_NW not only helps us to completely figure out the continuous symmetries of the Nisan-Wigderson polynomial family, but also gives some crucial insights into the other symmetries of the Nisan-Wigderson polynomial (i.e. the discrete symmetries). Thereafter, using the Hessian matrix of the Nisan-Wigderson polynomial and the concept of evaluation dimension, we are able to almost completely identify the structure of G_NW. In particular, we prove that any A ∈ G_NW is a product of diagonal and permutation matrices of a certain kind, which we call block-permuted permutation matrices. Finally, we give explicit examples of nontrivial block-permuted permutation matrices using the automorphisms of a finite field, which establishes the richness of the discrete symmetries of the Nisan-Wigderson polynomial family.
http://etd.iisc.ernet.in/2005/3803 (2018-07-09T13:54:47Z; 2018-07-08T18:30:00Z)
Title: Compilation of Graph Algorithms for Hybrid, Cross-Platform and Distributed Architectures
Authors: Patel, Parita
Abstract: 1. Main Contributions made by the supplicant:
This thesis proposes an Open Computing Language (OpenCL) framework to address the challenges of implementing graph algorithms on parallel architectures and of large-scale graph processing. The proposed framework uses the front-end of the existing Falcon DSL compiler, and so programmers enjoy a conventional, imperative, shared-memory programming style. The back-end of the framework generates implementations of graph algorithms in OpenCL to target single-device architectures. The generated OpenCL code is portable across various platforms, e.g., CPU and GPU, and also across vendors, e.g., NVIDIA, Intel and AMD. The framework automatically generates code for thread management and memory management for the devices. It hides all the lower-level programming details from the programmers. A few optimizations are applied to reduce the execution time.
The large graph processing challenge is tackled through graph partitioning over multiple devices of a single node and multiple nodes of a distributed cluster. The programmer codes a graph algorithm in Falcon assuming that the graph fits into the memory of a single machine, and the framework handles graph partitioning without any intervention by the programmer. The framework analyses the Abstract Syntax Tree (AST) generated by Falcon to find all the necessary information about communication and synchronization. It automatically generates code for message passing to hide the complexity of programming in a distributed environment. The framework also applies a set of optimizations to minimize the communication latency. The thesis reports results of several experiments conducted on widely used graph algorithms: single source shortest path, PageRank and minimum spanning tree, to name a few. Experimental evaluations show that the results are comparable to those of state-of-the-art non-portable graph DSLs and frameworks on a single node. Experiments in a distributed environment to show the scalability and efficiency of the framework are also described.
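As a rough illustration of why partitioning creates a need for message passing, the sketch below splits the vertices of a small graph into contiguous blocks and separates edges that stay inside one partition from edges that cross partitions. The block partitioning scheme and all names here are assumptions made for illustration; the report does not state which partitioning strategy the framework uses.

```python
def partition_vertices(num_vertices, num_parts):
    """Assign each vertex to a part using contiguous blocks of near-equal size."""
    block = -(-num_vertices // num_parts)  # ceiling division
    return [v // block for v in range(num_vertices)]


def split_edges(edges, owner):
    """Separate edges into per-part local edges and cross-part (cut) edges.

    Local edges can be processed by a single device or node on its own;
    each cut edge implies that data must be exchanged between two parts.
    """
    local = {}
    cut = []
    for u, v in edges:
        if owner[u] == owner[v]:
            local.setdefault(owner[u], []).append((u, v))
        else:
            cut.append((u, v))
    return local, cut


# Hypothetical 6-vertex ring graph split across two devices.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
owner = partition_vertices(6, 2)
local, cut = split_edges(edges, owner)

print(owner)  # [0, 0, 0, 1, 1, 1]
print(local)  # {0: [(0, 1), (1, 2)], 1: [(3, 4), (4, 5)]}
print(cut)    # [(2, 3), (5, 0)]
```

The number of cut edges gives a crude proxy for communication volume, which is the cost the candidate later cites as the limit on multi-node scalability.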
2. Summary of the Referees' Written Comments:
Extracts from the referees' reports are provided below. A copy of the written replies to the clarifications sought by the external examiner is appended to this report.
Referee 1: This thesis extends the Falcon framework with OpenCL for parallel graph processing on multi-device and multi-node architectures. The thesis makes important contributions. Processing large graphs in a short time is very important, and making use of multiple nodes and devices is perhaps the only way to achieve this. Towards this, the thesis makes good contributions for easy programming, compiler transformations and efficient runtime systems. One of the commendable aspects of the thesis is that it demonstrates with graphs that cannot be accommodated in the memory of a single device. The thesis is generally written well. The related work coverage is very good. The magnitude of the thesis is excellent for a Master's work. The experimental setup is very comprehensive, with a good set of graphs, good experimental comparisons with state-of-the-art works and good platforms. In particular, the demonstration with a GPU cluster with multiple GPU nodes (Chapter 5) is excellent. The attempt to demonstrate scalability with 2, 4 and 8 nodes is also noteworthy.
However, the contributions on optimizations are weak. Most of the optimizations and compiler transformations are straight-forward. There should be summary observations on the results in Chapter 3, especially given that the results are mixed and don't quite clearly convey the clear advantages of their work. The same is the case with multi-device results in chapter 4, where the results are once again mixed. Similarly, the speedups and scalability achieved with multiple nodes are not great. The problem size justification in the multi-node results is not clear. (Referee 1 also indicates a couple of minor changes to the thesis).
Referee 2: The thesis uses the OpenCL framework to address the problem of programming graph algorithms on distributed systems. The use of OpenCL ensures that the generated code is platform-agnostic and vendor-agnostic. Sufficient experimentation with large-scale graphs and reasonably sized clusters has been conducted to demonstrate the scalability and portability of the code generated by the framework. The automatically generated code is almost as efficient as manually written code. The thesis is well written and is of high quality. The related work section is well organized and displays a good knowledge of the subject matter under consideration. The author has made important contributions to a good publication as well.
3. An Account of the Open Oral Examination:
The oral examination of Ms. Parita Patel took place between 10 AM and 11 AM on 27th November 2017, in the Seminar Hall of the Department of Computer Science and Automation. The members of the Oral Examination Board present were Prof. Sathish Vadhiyar, external examiner, and Prof. Y. N. Srikant, research supervisor.
The candidate presented the work in an open defense seminar highlighting the problem domain, the methodology used, the investigations carried out by her, and the resulting contributions documented in the thesis before an audience consisting of the examiners, some faculty members, and students. Some of the questions posed by the examiners and the members of the audience during the oral examination are listed below.
1. How much is the overlap between Falcon work and this thesis?
Response: We have used the Falcon front end in our work. Further, the existing Falcon compiler was useful to us to test our own implementation of algorithms in Falcon.
2. Why are speedup and scalability not very high with multiple nodes?
Response: For the multi-node architecture, we were not able to achieve linear scalability because, with the increase in number of nodes, communication cost increases significantly. Unless the computation cost in the nodes is significant and is much more than the communication cost, this is bound to happen.
3. Do you have plans of making the code available for use by the community?
Response: The code includes some part of Falcon implementation (front-end parsing/grammar) also. After discussion with the author of Falcon, the code can be made available to the community.
4. How can a graph that does not fit into a single device fit into a single node in the case of multiple nodes?
Response: The single-node machine used in the “multi-device architecture” experiments contains multiple devices, while each node used in the “multi-node architecture” experiments contains only a single device. So a graph that does not fit into the memory of a single device on a single node can, after partitioning, fit across the multiple devices of a single node.
5. Is there a way to permit morph algorithms to be coded in your framework?
Response: Currently, our framework does not translate morph algorithms. Supporting morph algorithms will require some kind of runtime system to manage memory on GPU since morph algorithms add and remove the vertices and edges to the graph dynamically. This can be further explored in future work.
6. Is it possible to accommodate FPGA devices in your framework?
Response: Yes, we can support FPGA devices (or any other device that is compatible with OpenCL) just by specifying the device type as a command-line argument. We did not work with other devices because CPUs and GPUs are generally used to process graph algorithms.
The candidate provided satisfactory answers to all the questions posed and the clarifications sought by the audience and the examiners during the presentation. The candidate's overall performance during the open defense and the oral examination was very satisfactory to the oral examination board.
4. Certificate of Corrections and Changes: All the necessary corrections and changes suggested by the examiners have been made in the thesis and these have been verified by the members of the oral examination board. The thesis has been recommended for acceptance in its revised form.
5. Final Recommendation:
In view of the recommendations of the referees and the satisfactory performance of the candidate in the oral examination, the oral examination board recommends that the thesis of Ms. Parita Patel be accepted for the award of the M.Sc(Engg.) Degree of the Institute.
Response to the comments by the external examiner on the M.Sc(Engg.) thesis “Compilation of Graph Algorithms for Hybrid, Cross-Platform, and Distributed Architectures” by Parita Patel
1. Comment: The contributions on optimizations are weak.
Response: The novelty of this thesis is to make Falcon platform-agnostic and, additionally, to process large-scale graphs seamlessly on multiple devices of a single node and on multi-node clusters. Our framework performs similarly to the existing frameworks, but at the same time it targets several types of architectures, which is not possible in the existing works. Advanced optimizations are beyond the scope of this thesis.
2. Comment: The translation of Falcon to OpenCL is simple.
Response: While the translation of Falcon to OpenCL was not hard, figuring out the details of the translation for multi-device and multi-node architectures was not simple. For example, the designs of the implementations for collections, sets, global variables, concurrency, etc., were non-trivial. These designs have already been explained in the appropriate places in the thesis. Further, such large software introduced its own intricacies during development.
3. Comment: Lines between Falcon work and this work are not clear.
Response: Appendix-A shows the Falcon implementations of all the algorithms which we used to run the experiments. We compiled these Falcon implementations through our framework, ran the generated code on different types of target architectures, and compared the results with the code generated by other frameworks. These Falcon programs were written by us. We have also used the front-end of the Falcon compiler, and this has already been stated in the thesis (page 16).
4. Comment: There should be a summary of observations in chapter 3.
Response: Summaries of observations have been added to chapter 3 (pages 35-36), chapter 4 (page 46), and chapter 5 (page 51) of the thesis.
5. Comment: Speedup and scalability achieved with multiple nodes are not great.
Response: For the multi-node architecture, we were not able to achieve linear scalability because, with the increase in number of nodes, communication cost increases significantly. Unless the computation cost in the nodes is significant and is much more than the communication cost, this is bound to happen.
6. Comment: It will be good to separate the related work coverage into a separate chapter.
Response: The related work is coherent with the flow in chapter 1. It consists of just 4.5 pages and separating it into a separate chapter would make both (rest of) chapter 1 and the new chapter very small. Therefore, we do not recommend it.
7. Comment: The code should be made available for use by the community.
Response: The code includes some part of Falcon code (front-end parsing/grammar) also. After discussion with the author of Falcon, the code can be made available to the community.
8. Comment: Page 28: Shouldn’t the else part be inside the kernel?
Response: There was some missing text and a few minor changes in Figure 3.14 (page 28) which have been incorporated in the corrected thesis.
9. Comment: Figure 4.1 needs to be explained better.
Response: Explanation for Figure 4.1 (pages 38-39) has been added to the thesis.
10. Comment: The problem size justification in the multi-node results is not clear.
Response: The single-node machine used in the “multi-device architecture” experiments contains multiple devices, while each node used in the “multi-node architecture” experiments contains only a single device. So a graph that does not fit into the memory of a single device on a single node can, after partitioning, fit across the multiple devices of a single node.
Name of the Candidate: Parita Patel (S.R. No. 04-04-00-10-21-14-1-11610)
Degree Registered: M.Sc(Engg.)
Department: Computer Science & Automation
Title of the Thesis: Compilation of Graph Algorithms for Hybrid, Cross-Platform and Distributed Architectures
Graph algorithms are abundantly used in various disciplines. These algorithms perform poorly due to random memory access and negligible spatial locality. In order to improve performance, parallelism exhibited by these algorithms can be exploited by leveraging modern high performance parallel computing resources. Implementing graph algorithms for these parallel architectures requires manual thread management and memory management which becomes tedious for a programmer.
Large scale graphs cannot fit into the memory of a single machine. One solution is to partition the graph either on multiple devices of a single node or on multiple nodes of a distributed network. All the available frameworks for such architectures demand unconventional programming which is difficult and error prone.
To address these challenges, we propose a framework for compilation of graph algorithms written in an intuitive graph domain-specific language, Falcon. The framework targets shared-memory parallel architectures, computational accelerators and distributed architectures (CPU and GPU clusters). First, it analyses the abstract syntax tree (generated by Falcon) and gathers essential information. Subsequently, it generates optimized code in OpenCL for shared-memory parallel architectures and computational accelerators, and OpenCL coupled with MPI code for distributed architectures. The motivation behind generating OpenCL code is its platform-agnostic and vendor-agnostic behavior, i.e., it is portable to all kinds of devices. Our framework makes memory management, thread management, message passing, etc., transparent to the user. None of the available domain-specific languages, frameworks or parallel libraries handles portable implementations of graph algorithms.
Experimental evaluations demonstrate that the generated code performs comparably to the state-of-the-art non-portable implementations and hand-tuned implementations. The results also show portability and scalability of our framework.
http://etd.iisc.ernet.in/2005/3804 (2018-07-09T14:18:32Z; 2018-07-08T18:30:00Z)
Title: Supervised Classification of Missense Mutations as Pathogenic or Tolerated using Ensemble Learning Methods
Authors: Balasubramanyam, Rashmi
Abstract: Missense mutations account for more than 50% of the mutations known to be involved in human inherited diseases. Missense classification is a challenging task that involves sequencing of the genome, identifying the variations, and assessing their deleteriousness. This is a very laborious, time and cost intensive task to be carried out in the laboratory. Advancements in bioinformatics have led to several large-scale next-generation genome sequencing projects, and subsequently the identification of genome variations. Several studies have combined this data with information on established deleterious and neutral variants to develop machine learning based classifiers.
There are significant issues with the missense classifiers due to which missense classification is still an open area of research. These issues can be classified under two broad categories: (a) Dataset overlap issue - where the performance estimates reported by the state-of-the-art classifiers are overly optimistic as they have often been evaluated on datasets that have significant overlaps with their training datasets. Also, there is no comparative analysis of these tools using a common benchmark dataset that contains no overlap with the training datasets, therefore making it impossible to identify the best classifier among them. Also, such a common benchmark dataset is not available. (b) Inadequate capture of vital biological information of the protein and mutations - such as conservation of long-range amino acid dependencies, changes in certain physico-chemical properties of the wild-type and mutant amino acids, due to the mutation. It is also not clear how to extract and use this information. Also, some classifiers use structural information that is not available for all proteins.
In this study, we compiled a new dataset, containing around 2-15% overlap with the popularly used training datasets, with 18,036 mutations in 5,642 proteins. We reviewed and evaluated 15 state-of-the-art missense classifiers - SIFT, PANTHER, PROVEAN, PhD-SNP, Mutation Assessor, FATHMM, SNPs&GO, SNPs&GO3D, nsSNPAnalyzer, PolyPhen-2, SNAP, MutPred, PON-P2, CONDEL and MetaSNP - using six metrics: accuracy, sensitivity, specificity, precision, NPV and MCC. When evaluated on our dataset, these classifiers show large performance drops relative to what has been claimed. The average drops in performance for these 13 classifiers are around 15% in accuracy, 17% in sensitivity, 14% in specificity, 7% in NPV, 24% in precision and 30% in MCC. With this we show that the performance of these tools is not consistent across datasets, and thus not reliable for practical use in a clinical setting.
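For readers unfamiliar with the six metrics, the sketch below computes them from the four cells of a binary confusion matrix using their standard definitions. Treating "pathogenic" as the positive class is an assumption made here, and the counts in the usage example are hypothetical, not results from the thesis.

```python
from math import sqrt


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Compute six standard binary-classification metrics.

    tp, fn: positive (assumed pathogenic) cases predicted correctly / incorrectly.
    tn, fp: negative (assumed tolerated) cases predicted correctly / incorrectly.
    """
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "precision": _ratio(tp, tp + fp),     # positive predictive value
        "npv": _ratio(tn, tn + fn),           # negative predictive value
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical confusion-matrix counts, for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it less sensitive to class imbalance between pathogenic and tolerated variants.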
As we observed that the performance of the existing classifiers is poor in general, we tried to develop a new classifier that is robust, performs consistently across datasets, and is better than the state-of-the-art classifiers. We developed a novel method of capturing long-range amino acid dependency conservation by boosting the conservation frequencies of substrings of amino acids of various lengths around the mutation position using the AdaBoost learning algorithm. This score alone performed equivalently to the sequence conservation based tools in classifying missense mutations. Popularly used sequence conservation properties were combined with these boosted long-range dependency conservation scores using the AdaBoost algorithm. This reduced the class bias, and improved the overall accuracy of the classifier. We trained a third classifier by incorporating changes in 21 important physico-chemical properties due to the mutation. In this case, we observed that the overall performance further improved and the class bias further reduced. The performance of our final classifier is comparable with the state-of-the-art classifiers. We did not find any significant improvement, but the class-specific accuracies and precisions are marginally better, by around 1-2%, than those of the existing classifiers. In order to understand our classifier better, we dissected our benchmark dataset into: (a) seen and unseen proteins, and (b) pure and mixed proteins, and analysed the performance in detail. Finally, we concluded that our classifier performs consistently across each of these categories of seen, unseen, pure and mixed proteins.
http://etd.iisc.ernet.in/2005/3787 (2018-07-05T12:18:39Z; 2018-07-04T18:30:00Z)
Title: Ranking from Pairwise Comparisons: The Role of the Pairwise Preference Matrix
Authors: Rajkumar, Arun
Abstract: Ranking a set of candidates or items from pair-wise comparisons is a fundamental problem that arises in many settings such as elections, recommendation systems, sports team rankings, document rankings and so on. Indeed it is well known in the psychology literature that when a large number of items are to be ranked, it is easier for humans to give pair-wise comparisons as opposed to complete rankings. The problem of ranking from pair-wise comparisons has been studied in multiple communities such as machine learning, operations research, linear algebra, statistics etc., and several algorithms (both classic and recent) have been proposed. However, it is not well understood under what conditions these different algorithms perform well. In this thesis, we aim to fill this fundamental gap, by elucidating precise conditions under which different algorithms perform well, as well as giving new algorithms that provably perform well under broader conditions. In particular, we consider a natural statistical model wherein for every pair of items (i, j), there is a probability P_ij such that each time items i and j are compared, item j beats item i with probability P_ij. Such models, which we summarize through a matrix containing all these pair-wise probabilities, have been used explicitly or implicitly in much previous work in the area; we refer to the resulting matrix as the pair-wise preference matrix, and elucidate clearly the crucial role it plays in determining the performance of various algorithms.
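The statistical model above can be illustrated by estimating the pair-wise preference matrix from observed comparison outcomes. The sketch below follows the abstract's convention that entry (i, j) is the probability that item j beats item i. The function name, the input format and the choice of 0.5 for the diagonal and for pairs that were never compared are assumptions made for illustration.

```python
def empirical_preference_matrix(n, comparisons):
    """Estimate the pair-wise preference matrix from comparison outcomes.

    comparisons: iterable of (i, j, winner) with winner equal to i or j.
    Returns P_hat where P_hat[i][j] is the fraction of i-versus-j
    comparisons that item j won (the abstract's convention for P_ij).
    Unobserved pairs and the diagonal are set to 0.5 by convention here.
    """
    wins = [[0] * n for _ in range(n)]  # wins[i][j]: times j beat i
    for i, j, winner in comparisons:
        if winner == j:
            wins[i][j] += 1
        elif winner == i:
            wins[j][i] += 1
        else:
            raise ValueError("winner must be one of the two compared items")

    p_hat = [[0.5] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = wins[i][j] + wins[j][i]
            if i != j and total > 0:
                p_hat[i][j] = wins[i][j] / total
    return p_hat


# Hypothetical outcomes: items 0 and 1 compared three times, items 1 and 2 once.
outcomes = [(0, 1, 1), (0, 1, 1), (0, 1, 0), (1, 2, 2)]
p_hat = empirical_preference_matrix(3, outcomes)
for row in p_hat:
    print(row)
```

By construction, P_hat[i][j] + P_hat[j][i] = 1 for every pair that was compared at least once.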
In the first part of the thesis, we consider a natural generative model where all pairs of items can be sampled and where the underlying preferences are assumed to be acyclic. Under this setting, we elucidate the conditions on the pair-wise preference matrix under which popular algorithms such as matrix Borda, spectral ranking, least squares and maximum likelihood under a Bradley-Terry-Luce (BTL) model produce optimal rankings that minimize the pair-wise disagreement error. Specifically, we derive explicit sample complexity bounds for each of these algorithms to output an optimal ranking under interesting subclasses of the class of all acyclic pair-wise preference matrices. We show that none of these popular algorithms is guaranteed to produce optimal rankings for all acyclic preference matrices. We then propose a novel support vector machine based rank aggregation algorithm that provably does so.
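As a small illustration of this setting, the sketch below ranks items with a Borda-style score (the total probability of beating every other item) and counts pair-wise disagreements between a ranking and a preference matrix. Both functions are simplified readings of the terms used in the abstract and are assumptions, not the thesis's exact definitions. The convention that P[i][j] is the probability that item j beats item i follows the abstract.

```python
def borda_ranking(P):
    """Rank items by a Borda-style score, best item first.

    score[i] sums P[k][i] over all other items k, i.e. the total
    probability that item i beats each of its opponents.
    """
    n = len(P)
    scores = [sum(P[k][i] for k in range(n) if k != i) for i in range(n)]
    return sorted(range(n), key=lambda i: -scores[i])


def pairwise_disagreements(P, ranking):
    """Count pairs whose order in `ranking` contradicts the preference in P."""
    position = {item: rank for rank, item in enumerate(ranking)}
    n = len(P)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if P[i][j] > 0.5 and position[j] > position[i]:
                count += 1  # j is preferred to i but ranked below it
            elif P[i][j] < 0.5 and position[i] > position[j]:
                count += 1  # i is preferred to j but ranked below it
    return count


# Hypothetical acyclic preferences: item 2 is preferred to 1, and 1 to 0.
P = [
    [0.5, 0.7, 0.9],
    [0.3, 0.5, 0.6],
    [0.1, 0.4, 0.5],
]
ranking = borda_ranking(P)
print(ranking)                                # [2, 1, 0]
print(pairwise_disagreements(P, ranking))     # 0
print(pairwise_disagreements(P, [0, 1, 2]))   # 3
```

On this example the Borda-style ranking has zero disagreements, but the abstract states that this is not guaranteed for every acyclic preference matrix.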
In the second part of the thesis, we consider the setting where preferences may contain cycles. Here, finding a ranking that minimizes the pairwise disagreement error is in general NP-hard. However, even in the presence of cycles, one may wish to rank 'good' items ahead of the rest. We develop a framework for this setting using notions of winners based on tournament solution concepts from social choice theory. We first show that none of the existing algorithms are guaranteed to rank winners ahead of the rest for popular tournament solution based winners such as top cycle, Copeland set, Markov set etc. We propose three algorithms - matrix Copeland, unweighted Markov and parametric Markov - which provably rank winners at the top for these popular tournament solutions. In addition to ranking winners at the top, we show that the rankings output by the matrix Copeland and the parametric Markov algorithms also minimize the pair-wise disagreement error for certain classes of acyclic preference matrices.
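To see what a tournament-solution winner set looks like when preferences contain a cycle, the sketch below computes the classical Copeland set: the items that beat the largest number of opponents in the majority tournament induced by the preference matrix. This is the textbook solution concept, not the matrix Copeland algorithm proposed in the thesis, and the function names and example matrix are hypothetical. As in the abstract, P[i][j] is the probability that item j beats item i.

```python
def copeland_scores(P):
    """Return, for each item, the number of opponents it beats by majority.

    Item j beats item i in the induced tournament when P[i][j] > 0.5.
    """
    n = len(P)
    return [sum(1 for i in range(n) if i != j and P[i][j] > 0.5) for j in range(n)]


def copeland_set(P):
    """Return the set of items with the maximum Copeland score."""
    scores = copeland_scores(P)
    best = max(scores)
    return {item for item, score in enumerate(scores) if score == best}


def preference_matrix_from_wins(n, beats, strength=0.8):
    """Build a hypothetical preference matrix in which `a` beats `b`
    with probability `strength` for every pair (a, b) in `beats`."""
    P = [[0.5] * n for _ in range(n)]
    for a, b in beats:
        P[b][a] = strength       # probability that a beats b
        P[a][b] = 1 - strength   # probability that b beats a
    return P


# Items 0, 1, 2 form a cycle (0 beats 1, 1 beats 2, 2 beats 0); all beat item 3.
P = preference_matrix_from_wins(4, [(0, 1), (1, 2), (2, 0), (0, 3), (1, 3), (2, 3)])
print(copeland_scores(P))  # [2, 2, 2, 0]
print(copeland_set(P))     # {0, 1, 2}
```

No ranking can agree with all three preferences inside the cycle, yet every ranking that places items 0, 1 and 2 ahead of item 3 puts the winners at the top.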
Finally, in the third part of the thesis, we consider the setting where the number of items to be ranked is large and it is impractical to obtain comparisons among all pairs. Here, one samples a small set of pairs uniformly at random and compares each pair a fixed number of times; in particular, the goal is to come up with good algorithms that sample comparisons among only O(n log n) item pairs (where n is the number of items). Unlike existing results for such settings, where one either assumes a noisy permutation model (under which there is a true underlying ranking and the outcome of every comparison differs from the true ranking with some fixed probability) or assumes a BTL or Thurstone model, we develop a general algorithmic framework based on ideas from matrix completion, termed low-rank pair-wise ranking, which provably produces a good ranking by comparing only O(n log n) pairs, O(log n) times each, not only for popular classes of models such as BTL and Thurstone, but also for much more general classes of models wherein a suitable transform of the pair-wise probabilities leads to a low-rank matrix; this subsumes the guarantees of many previous algorithms in this setting.
Overall, our results help to understand at a fundamental level the statistical properties of various algorithms for the problem of ranking from pair-wise comparisons, and under various natural settings, lead to novel algorithms with improved statistical guarantees compared to existing algorithms for this problem.