
etd AT Indian Institute of Science >
Division of Mechanical Sciences >
Aerospace Engineering (aero) >
Please use this identifier to cite or link to this item:
http://hdl.handle.net/2005/161

Title:  Distributed Computation With Communication Delays: Design And Analysis Of Load Distribution Strategies 
Authors:  Bharadwaj, V 
Advisors:  Ghose, Debasish Mani, V 
Submitted Date:  Jun1994 
Publisher:  Indian Institute of Science 
Abstract:  Load distribution problems in distributed computing networks have attracted much attention in the literature. A major objective in these studies is to distribute the processing load so as to minimize the time of processing of the entire load. In general, the processing load can be indivisible or divisible. An indivisible load has to be processed in its entirety on a single processor. On the other hand, a divisible load can be partitioned and processed on more than one processor. Divisible loads are either modularly divisible or arbitrarily divisible. Modularly divisible loads can be divided into predefined modules and cannot be further subdivided. Further, precedence relations between modules may exist. Arbitrarily divisible loads can be divided into several fractions of arbitrary lengths which usually do not have any precedence relations. Such type of loads are characterized by their large volume and the property that each data element requires an identical and independent processing. One of the important problems here is to obtain an optimal load distribution, which minimizes the processing time when the distribution is subject to communication delays in the interconnecting links. A specific application in which such loads are encountered is in edgedetection of images. Here the given image frame can be arbitrarily divided into many subframes and each of these can be independently processed. Other applications include processing of massive experimental data. The problems associated with the distribution of such arbitrarily divisible loads are usually analysed in the framework of what is known as divisible job theory.
The research work reported in this thesis is a contribution in the area of distributing arbitrarily divisible loads in distributed computing systems subject to communication delays. The main objective in this work is to design and analyseload distribution strategies to minimize the processing time of the entire load in a given network. Two types of networks are considered, namely (i) singlelevel tree (or star) network and (ii) linear network. In both the networks we assume that there is a nonzero delay associated with load transfer. Further, the processors in the network may or may not be equipped with frontends (Le., communication coprocessors). The main contributions in this thesis are summarized below.
First, a mathematical formulation of the load distribution problem in singlelevel tree and linear networks is presented. In both the networks, it is assumed that there are (m +1) processors and m communication links. In the case of singlelevel tree networks, the load to be processed is assumed to originate at the root processor, which divides the load into (m +1) fractions, keeps its own share of the load for processing, and distributes the rest to the child processors one at a time and in a fixed sequence. In all the earlier studies in the literature, it had been assumed that for a load distribution to be optimal, it should be such that all the processors must stop computing at the same time. In this thesis, it is shown that this assumption is in general not true, and holds only for a restricted class of singlelevel tree networks which satisfy a certain condition. The concept of an equivalent network is introduced to obtain a precise formulation of this condition in terms of the processor and link speed parameters. It is shown that this condition can be used to identify processorlink pairs which can be eliminated from a given network (i.e., these processors need not be given any computational load) without degrading its time performance. It is proved that the resultant reduced network (a network from which these inefficient processorlink pairs have been removed) gives the optimal time performance if and only if the load distribution is such that all the processors stop computing at the same time instant. These results are first proved for the case when the root processor is equipped with a frontend and then extended to the case when it is not. In the latter case, an additional condition, between the speed of the root processor and the speed of each of the links, to be satisfied by the network is specified. An optimal sequence for applying these conditions is also obtained. In the case of linear networks the processing load is assumed to originate at the processor situated at one end of the network. Each processor in the network keeps its own load fraction for computing and transmits the rest to its successor. Here too, in all the earlier studies in the literature, it has been assumed that for the processing time to be a minimum, the load distribution must be such that all the processors must stop computing at the same instant in time. Though this condition has been proved by others to be both necessary and sufficient, a different and more rigorous proof, similar to the case of singlelevel tree network, is presented here. Finally, the effect of inaccurate modelling on the processing time and on the above conditions are discussed through an illustrative example and it is shown that the model adopted in this thesis gives reasonably accurate results.
In the case of singlelevel tree networks, so far it has been assumed that the root processor distributes the processing load in a fixed sequence. However, since there are m child processors, a total of m! different sequences of load distribution are possible. Using the closedform derived for the processing time, it is proved here that the optimal sequence of load distribution follows the decreasing order of link speeds. Further, if physical rearrangement of processors and links is allowed, then it is shown that the optimal arrangement follows a decreasing order of link and processor speeds with the fastest processor at the root. The entire analysis is first done for the case when the root processor is equipped with a frontend, and then extended to the case when it is not. In the without frontend case, it is shown that the same optimal sequencing result holds. However, in an optimal arrangement, the root processor need not be the fastest. In this case an algorithm has been proposed for obtaining optimal arrangement. Illustrative examples are given for all the cases considered.
Next, a new strategy of load distribution is proposed by which the processing time obtained in earlier studies can be further minimized. Here the load is distributed by the root processor to a child processor in more than one installment (instead of in a single installment) such that the processing time is further minimized. First; the case in which all the processors are equipped :tn frontends is considered. Recursive equations are obtained for a heterogeneous network and these are solved for the special case of a homogeneous network (having identical processors and identical links). Using this closedform solution, the ultimate limits of performance are explored through an asymptotic analysis with respect to the number of installments and number of processors in the network. Tradeoff relationships between the number of installments and the number of processors in the network are also presented. These results are then extended to the case when the processors are not equipped with frontends. Finally, the efficiency of this new strategy of load distribution is demonstrated by comparing it with the existing singleinstallment strategy in the literature.
The multiinstallment strategy explained above is then applied to linear netAs. Here, .the processing load is assumed to originate at one extreme end of the network, First the case when all the processors are equipped with frontends is considered. Recursive equations for a heterogeneous network are obtained and these are solved for the special case of a homogeneous network. Using this closed form solution, an asymptotic analysis is performed with respect to the number of installments. However, the asymptotic results with respect to the number of processors was obtained computationally since analytical results could not be obtained. It is found that for a given network, once the number of installments is fixed, there is an optimum number of processors to be used in the network, beyond which the time performance degrades. Tradeoff relationships between the number of installments and the number of processors is obtained. These results are then extended to the case when the processors are not equipped with frontends. Comparisions with the existing singleinstallment strategy is also done.
The singleinstallment strategy discussed in the literature has the disadvantage that the frontends of the processors are not utilized efficiently in a linear network. This is due to the fact that a processor starts computing its own load fraction only after the entire load to be communicated through its frontend has been received. In this thesis, a new strategy is proposed in which a processor starts computing as soon as it receives its load fraction, simultaneously allowing its frontend to receive and transmit load to its successors. Recursive equations are developed and solved for the special case of a heterogeneous network in which the processors and links are arranged in the decreasing order of speeds. Further, it is shown that in this strategy, if the processing load originates in the interior of the network, the sequence of load distribution should be such that the load should be first distributed to the side with a lesser number of processors. An expression for the optimal load origination point in the network is derived. A comparative study of this strategy with an earlier strategy is also presented. Finally, it is shown that even though the analysis is carried out for a special case of a heterogeneous network, this load distribution strategy can also be applied to a linear network in which the processors and links are arbitrarily arranged and still obtain a significant improvement in the time performance. 
URI:  http://hdl.handle.net/2005/161 
Appears in Collections:  Aerospace Engineering (aero)

Items in etd@IISc are protected by copyright, with all rights reserved, unless otherwise indicated.
