etd@IISc Collection:
http://hdl.handle.net/2005/25
http://hdl.handle.net/2005/3007
Title: Development of Sparse Recovery Based Optimized Diffuse Optical and Photoacoustic Image Reconstruction Methods
Authors: Shaw, Calvin B
Abstract: Diffuse optical tomography uses near infrared (NIR) light as the probing medium to recover the distributions of tissue optical properties, with the ability to provide functional information about the tissue under investigation. As NIR light propagation in tissue is dominated by scattering, the image reconstruction problem (inverse problem) is non-linear and ill-posed, requiring advanced computational methods to compensate for this.
The diffuse optical image reconstruction problem is always rank-deficient, where finding the independent measurements among the available measurements becomes a challenging problem. Knowing these independent measurements helps in designing better data acquisition set-ups and lowering the associated costs. An optimal measurement selection strategy based on incoherence among rows (corresponding to measurements) of the sensitivity (or weight) matrix for near infrared diffuse optical tomography is proposed. As incoherence among the measurements can be seen as providing maximally independent information for the estimation of optical properties, this yields the optimization required to establish the independence of a particular measurement from its counterparts. The utility of the proposed scheme is demonstrated using simulated and experimental gelatin phantom data sets, comparing it with state-of-the-art methods.
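As a rough illustration of the idea (not the thesis's exact algorithm), the sketch below greedily picks rows of a sensitivity matrix J so that each new row has minimal coherence with the rows already chosen; the matrix J, the greedy strategy, and the choice of starting row are all assumptions made for the example.

```python
import numpy as np

def select_measurements(J, k):
    """Greedily pick k rows of the sensitivity matrix J that are
    mutually incoherent (small absolute cosine similarity).
    J : (m, n) array with nonzero rows, one per measurement.
    Returns the indices of the selected rows."""
    # Normalize rows so coherence reduces to an absolute dot product.
    Jn = J / np.linalg.norm(J, axis=1, keepdims=True)
    # Assumption: start from the row with the largest norm.
    selected = [int(np.argmax(np.linalg.norm(J, axis=1)))]
    while len(selected) < k:
        # Worst-case coherence of each candidate with the selected set.
        coh = np.abs(Jn @ Jn[selected].T).max(axis=1)
        coh[selected] = np.inf            # never re-pick a chosen row
        selected.append(int(np.argmin(coh)))
    return np.array(selected)

# Toy usage: 100 simulated measurements over 500 voxels.
rng = np.random.default_rng(0)
J = rng.standard_normal((100, 500))
print(select_measurements(J, 10))
```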
Traditional image reconstruction methods employ the ℓ2-norm in the regularization functional, resulting in smooth solutions in which sharp image features are absent. Sparse recovery methods, which utilize the ℓp-norm with p between 0 and 1 (0 < p ≤ 1) along with an approximation to the ℓ0-norm, have been deployed for the reconstruction of diffuse optical images. These methods are shown to be more quantitative in reconstructing realistic diffuse optical images than the traditional methods.
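For concreteness, the ℓp-regularized objective described here can be written as follows; the symbols are assumed from the standard linearized diffuse optical formulation (J the sensitivity matrix of the previous paragraph, δ the data-model misfit, Δμ the update to the optical properties, λ the regularization weight):

```latex
\min_{\Delta\mu}\; \left\| \delta - \mathbf{J}\,\Delta\mu \right\|_2^2
  + \lambda \left\| \Delta\mu \right\|_p^p,
\qquad
\left\| \Delta\mu \right\|_p^p = \sum_i \left| \Delta\mu_i \right|^p,
\quad 0 < p \le 1 .
```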
Utilization of ℓp-norm based regularization makes the objective (cost) function non-convex, and the algorithms that implement ℓp-norm minimization utilize approximations to the original ℓp-norm function. Three methods for implementing the ℓp-norm were considered, namely Iteratively Reweighted ℓ1-minimization (IRL1), Iteratively Reweighted Least Squares (IRLS), and the Iterative Thresholding Method (ITM). The results indicated that the IRL1 implementation of ℓp-minimization provides optimal performance in terms of shape recovery and quantitative accuracy of the reconstructed diffuse optical tomographic images.
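A minimal sketch of one of the three schemes, IRLS, assuming a generic linearized forward model A and measurement vector y as stand-ins for the diffuse optical quantities: each iteration replaces the ℓp term by a weighted quadratic surrogate and solves the resulting ridge problem in closed form.

```python
import numpy as np

def irls_lp(A, y, lam=1e-2, p=0.8, eps=1e-6, iters=30):
    """IRLS for  min ||y - A x||^2 + lam * sum_i |x_i|^p,  0 < p <= 1.
    Each pass uses the surrogate |x_i|^p ~ w_i * x_i^2 with
    w_i = (x_i^2 + eps)^(p/2 - 1), then solves a weighted ridge problem."""
    x = np.zeros(A.shape[1])
    AtA, Aty = A.T @ A, A.T @ y
    for _ in range(iters):
        w = (x**2 + eps) ** (p / 2 - 1)   # reweighting from current iterate
        x = np.linalg.solve(AtA + lam * np.diag(w), Aty)
    return x

# Toy usage: recover a 5-sparse vector from noisy measurements.
rng = np.random.default_rng(1)
A = rng.standard_normal((60, 200))
x_true = np.zeros(200)
x_true[rng.choice(200, 5, replace=False)] = 1.0
y = A @ x_true + 0.01 * rng.standard_normal(60)
print(np.sort(np.abs(irls_lp(A, y, lam=0.05)))[-5:])
```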
Photoacoustic tomography (PAT) is an emerging hybrid imaging modality combining optics with ultrasound imaging. PAT provides structural and functional imaging in diverse application areas, such as breast cancer and brain imaging. Model-based iterative reconstruction schemes are the most popular for recovering the initial pressure in the limited-data case, wherein a large linear system of equations needs to be solved. Often, these iterative methods require regularization parameter estimation, which tends to be a computationally expensive procedure, forcing the image reconstruction to be performed off-line. To overcome this limitation, a computationally efficient approach that computes the optimal regularization parameter is developed for PAT. This approach is based on the least squares-QR (LSQR) decomposition, a well-known dimensionality reduction technique for a large system of equations. It is shown that the proposed framework is effective in terms of quantitative and qualitative reconstructions of the initial pressure distribution.
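The abstract does not spell out the selection criterion, so the sketch below is only an assumed stand-in: it runs scipy's LSQR on the damped least-squares problem for each candidate parameter and picks one with a simple L-curve-style proxy (minimizing the product of residual and solution norms), which may differ from the thesis's actual rule.

```python
import numpy as np
from scipy.sparse.linalg import lsqr

def pick_lambda(A, b, lambdas):
    """Solve  min ||A x - b||^2 + lam^2 ||x||^2  with LSQR for each
    candidate lam and return the lam (and solution) minimizing the
    product of residual and solution norms (crude L-curve proxy)."""
    best = None
    for lam in lambdas:
        res = lsqr(A, b, damp=lam, atol=1e-8, btol=1e-8)
        x, rnorm, xnorm = res[0], res[3], res[8]
        score = rnorm * xnorm
        if best is None or score < best[0]:
            best = (score, lam, x)
    return best[1], best[2]

# Toy usage on a random underdetermined system.
rng = np.random.default_rng(2)
A = rng.standard_normal((300, 400))
b = A @ rng.standard_normal(400) + 0.1 * rng.standard_normal(300)
lam, x = pick_lambda(A, b, np.logspace(-3, 1, 20))
print("chosen lambda:", lam)
```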
http://hdl.handle.net/2005/2986
Title: Variance of Difference as Distance Like Measure in Time Series Microarray Data Clustering
Authors: Mukhopadhyay, Sayan
Abstract: Our intention is to find similarity among the time series expressions of genes in microarray experiments. It is hypothesized that at a given time point the concentration of one gene's mRNA is directly affected by the concentration of another gene's mRNA, and may have biological significance. We define the dissimilarity between two time series as the variance of the Euclidean distances at each time point. The large number of gene expression profiles makes the point-wise calculation of the variance of distances computationally expensive and therefore challenging in terms of execution time. For this reason, we use an autoregressive model which compresses a nineteen-point gene expression series into a three-point vector, allowing us to find the variance of difference between two series without point-to-point matching. Previous analysis of the microarray experiment data found that 62 genes are regulated following EGF (Epidermal Growth Factor) and HRG (Heregulin) treatment of MCF-7 breast cancer cells. We have chosen these suspected cancer-related genes as our reference and investigated which additional genes have similar time point expression profiles. Keeping variance of difference as a measure of distance, we have used several methods for clustering the gene expression data, such as our own maximum clique finding heuristic and hierarchical clustering. The results obtained were validated through a text mining study. New predictions from our study could be a basis for further investigations into the genesis of breast cancer. Overall, 84 new genes are found, of which 57 are related to cancer and 35 of those are associated with breast cancer.
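The distance-like measure itself is simple to state in code; the sketch below implements it directly from the definition in the abstract, with the point-wise Euclidean distance between two 1-D expression profiles reducing to an absolute difference.

```python
import numpy as np

def variance_of_difference(a, b):
    """Variance of the point-wise distances between two equal-length
    expression series. Profiles that differ only by a constant offset
    score near zero, i.e. are considered similar."""
    d = np.abs(np.asarray(a, float) - np.asarray(b, float))
    return float(np.var(d))

# Toy usage: an offset copy is "close", an uncorrelated series is not.
t = np.linspace(0, 1, 19)            # nineteen time points, as in the study
g1 = np.sin(2 * np.pi * t)
g2 = g1 + 0.5                        # constant offset -> variance ~ 0
g3 = np.random.default_rng(3).standard_normal(19)
print(variance_of_difference(g1, g2), variance_of_difference(g1, g3))
```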
http://hdl.handle.net/2005/2954
Title: Metascheduling of HPC Jobs in Day-Ahead Electricity Markets
Authors: Murali, Prakash
Abstract: High performance grid computing is a key enabler of large scale collaborative computational science. With the promise of exascale computing, high performance grid systems are expected to incur electricity bills that grow super-linearly over time. In order to achieve cost effectiveness in these systems, it is essential for the scheduling algorithms to exploit the electricity price variations, both in space and time, that are prevalent in dynamic electricity markets. Typically, a job submitted to the batch queues used in these systems incurs a variable queue waiting time before the resources necessary for its execution become available. In variably-priced electricity markets, electricity prices fluctuate over discrete intervals of time. Hence, the electricity prices incurred during a job execution depend on the start and end times of the job.
Our thesis consists of two parts. In the first part, we develop a method to predict the start and end times of a job at each system in the grid. In batch queue systems, similar jobs which arrive during similar system queue and processor states experience similar queue waiting times. We have developed an adaptive algorithm for the prediction of queue waiting times on a parallel system based on spatial clustering of the history of job submissions at the system. We represent each job as a point in a feature space using the job characteristics, queue state, and the state of the compute nodes at the time of job submission. For each incoming job, we use an adaptive distance function, which assigns a real-valued distance to each history job submission based on its similarity to the incoming job. Using a spatial clustering algorithm and a simple empirical characterization of the system states, we identify an appropriate prediction model for the job from among the standard deviation minimization method, ridge regression, and the k-weighted average. We have evaluated our adaptive prediction framework using historical production workload traces of many supercomputer systems with varying system and job characteristics, including two Top500 systems. Across workloads, our predictions result in up to a 22% reduction in the average absolute error and up to a 56% reduction in the percentage prediction errors over existing techniques. To predict the execution time of a job, we use a simple model based on the runtime estimate provided by the user at the time of job submission.
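A hypothetical sketch of the distance-weighted prediction step (the feature choice, weights, and k below are illustrative assumptions, not the thesis's tuned values): each history job is scored by a weighted feature-space distance to the incoming job, and the waiting times of the k nearest ones are averaged with inverse-distance weights.

```python
import numpy as np

def predict_wait(history_feats, history_waits, job_feat, weights, k=5):
    """Distance-weighted average of the waiting times of the k history
    jobs nearest to the incoming job in a weighted feature space."""
    diffs = history_feats - job_feat                 # (n_history, n_features)
    dists = np.sqrt(((diffs * weights) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    w = 1.0 / (dists[nearest] + 1e-9)                # inverse-distance weights
    return float((w * history_waits[nearest]).sum() / w.sum())

# Toy usage: features = (requested cores, requested runtime, queue length).
rng = np.random.default_rng(4)
feats = rng.uniform(0, 1, (500, 3))
waits = 3600 * feats @ np.array([0.5, 1.0, 2.0]) + rng.normal(0, 60, 500)
print(predict_wait(feats, waits, np.array([0.3, 0.6, 0.2]),
                   weights=np.array([1.0, 1.0, 2.0])))
```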
In the second part of the thesis, we have developed a metascheduling algorithm that schedules jobs to the individual batch systems of a grid, to reduce both the electricity costs for the systems and the response times for the users. We formulate the metascheduling problem as a Minimum Cost Maximum Flow problem and leverage execution period and electricity price predictions to accurately estimate the cost of job execution at a system. The network simplex algorithm is used to minimize the response time and electricity cost of job execution over an appropriate flow network. Using trace-based simulation with real and synthetic workload traces, and real electricity price data sets, we demonstrate our approach on two currently operational grids, XSEDE and NorduGrid. Our experimental setup collectively constitutes more than 433K processors spread across 58 compute systems in 17 geographically distributed locations. Experiments show that our approach simultaneously optimizes the total electricity cost and the average response time of the grid, without being unfair to users of the local batch systems. Considering that currently operational HPC systems budget millions of dollars for annual operational costs, our approach, which can save $167K in annual electricity bills compared to a baseline strategy for one of the grids in our test suite with over 76,000 cores, is very relevant for reducing grid operational costs in the coming years.
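As a toy illustration of the flow formulation (all capacities and costs below are made up for the example), networkx's network-simplex-based min-cost flow can route unit job flows to systems through edges whose weights stand in for a combined response-time and electricity-cost estimate.

```python
import networkx as nx

# Jobs supply one unit of flow each; systems absorb it through a sink.
jobs = ["j1", "j2", "j3"]
systems = {"s1": 2, "s2": 2}                 # capacity: jobs per window

G = nx.DiGraph()
for j in jobs:
    G.add_node(j, demand=-1)                 # each job pushes 1 unit
G.add_node("sink", demand=len(jobs))
for s, cap in systems.items():
    G.add_edge(s, "sink", capacity=cap, weight=0)

# weight ~ alpha * predicted_response_time + beta * predicted_energy_cost
cost = {("j1", "s1"): 4, ("j1", "s2"): 7,
        ("j2", "s1"): 6, ("j2", "s2"): 3,
        ("j3", "s1"): 5, ("j3", "s2"): 5}
for (j, s), w in cost.items():
    G.add_edge(j, s, capacity=1, weight=w)

flow = nx.min_cost_flow(G)                   # network simplex solver
for j in jobs:
    print(j, "->", [s for s, f in flow[j].items() if f == 1])
```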
http://hdl.handle.net/2005/2850
Title: Ranking And Classification of Chemical Structures for Drug Discovery : Development of Fragment Descriptors And Interpolation Scheme
Authors: Kandel, Durga Datta
Abstract: Deciphering the activity of chemical molecules against a pathogenic organism is an essential task in the drug discovery process. Virtual screening, in which a few plausible molecules are selected from a large set for further processing using computational methods, has become an integral part of this process and complements the expensive and time-consuming in vivo and in vitro experiments. To this end, it is essential to extract certain features from molecules which, on the one hand, are relevant to the biological activity under consideration and, on the other, are suitable for designing fast and robust algorithms. These features/representations, known as descriptors, are derived in numerical form either from physicochemical properties or from molecular structures.
In this work we develop two new molecular-fragment descriptors based on a critical analysis of existing descriptors. This development is primarily guided by the notion of coding degeneracy and by the ordering induced by the descriptor on the fragments. The first descriptor is derived from the simple graph representation of the molecule and attempts to encode topological features, i.e., the connectivity pattern, in a hierarchical way without discriminating atom or bond types. The second descriptor extends the first by weighting the atoms (vertices) according to the bonding pattern, valence state, and type of the atom.
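The exact construction is not given in the abstract; the following hypothetical sketch only conveys the flavor of a hierarchical, type-agnostic topological encoding by recording sorted degree sequences over breadth-first shells around a root atom.

```python
from collections import deque

def shell_descriptor(adj, root, depth=3):
    """Hypothetical hierarchical topological descriptor: for each
    breadth-first shell around `root`, record the sorted vertex-degree
    sequence, ignoring atom and bond types."""
    seen, frontier, shells = {root}, deque([(root, 0)]), {}
    while frontier:
        node, d = frontier.popleft()
        if d > depth:
            break                              # queue is ordered by depth
        shells.setdefault(d, []).append(len(adj[node]))
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return tuple(tuple(sorted(shells[d])) for d in sorted(shells))

# Toy molecule graph as an adjacency list (a methyl group on a ring).
adj = {0: [1], 1: [0, 2, 6], 2: [1, 3], 3: [2, 4],
       4: [3, 5], 5: [4, 6], 6: [5, 1]}
print(shell_descriptor(adj, root=0))
```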
Further, the usefulness of these indices is tested by ranking and classifying molecules in two previously studied large heterogeneous data sets with regard to their anti-tubercular and other antibacterial activity. This is achieved by developing a scoring function based on clustering using the new descriptors. Clusters are obtained by ordering the descriptors of the training set molecules and identifying the regions which come (almost) exclusively from active or inactive molecules. To test the activity of a new molecule, the overlap of its descriptors with those clusters (interpolation) is weighted, as sketched below. Our results are found to be superior to those of previous studies: we obtained better classification performance using only structural information, whereas previous studies used both structural features and some physicochemical parameters. This makes our model simpler, more interpretable, and less vulnerable to statistical problems such as chance correlation and overfitting. With a focus on predictive modeling, we have carried out rigorous statistical validation.
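A heavily simplified, hypothetical rendering of the ordering-and-overlap idea (the run-length threshold and the 0/1 scoring are assumptions made for illustration, not the thesis's weighting scheme):

```python
import numpy as np

def active_intervals(values, labels, min_run=3):
    """Sort training descriptor values and return intervals covered by
    runs of at least `min_run` consecutive active molecules (label 1)."""
    order = np.argsort(values)
    v, y = np.asarray(values)[order], np.asarray(labels)[order]
    intervals, start = [], None
    for i, lab in enumerate(y):
        if lab == 1 and start is None:
            start = i
        elif lab == 0 and start is not None:
            if i - start >= min_run:
                intervals.append((v[start], v[i - 1]))
            start = None
    if start is not None and len(y) - start >= min_run:
        intervals.append((v[start], v[-1]))
    return intervals

def score(value, intervals):
    """Count the active-dominated intervals containing the test value
    (a crude stand-in for the weighted overlap in the abstract)."""
    return sum(lo <= value <= hi for lo, hi in intervals)

vals = [0.1, 0.2, 0.25, 0.3, 0.7, 0.75, 0.8, 0.9]
labs = [1, 1, 1, 0, 1, 1, 1, 1]
iv = active_intervals(vals, labs)
print(iv, score(0.22, iv), score(0.5, iv))
```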
The new descriptors primarily utilize topological information in a hierarchical way. This can have significant implications for the design of new bioactive molecules (inverse QSAR, combinatorial library design), which is plagued by combinatorial explosion due to the use of a large number of descriptors. While the combinatorial generation of molecules with desirable properties is still a problem to be satisfactorily solved, our model has the potential to reduce the number of degrees of freedom, thereby reducing the complexity.