Analysis of parallel algorithms

In computer science, the analysis of parallel algorithms is the process of finding the computational complexity of algorithms executed in parallel – the amount of time, storage, or other resources needed to execute them. In many respects, analysis of parallel algorithms is similar to the analysis of sequential algorithms, but is generally more involved because one must reason about the behavior of multiple cooperating threads of execution. One of the primary goals of parallel analysis is to understand how a parallel algorithm's use of resources (speed, space, etc.) changes as the number of processors is changed.

Definitions

Suppose a computation is executed on a machine that has p processors. Let T_p denote the time that elapses between the start of the computation and its end. Analysis of the computation's running time focuses on the following notions:

The work of a computation executed by p processors is the total number of primitive operations that the processors perform.^[6] Ignoring communication overhead from synchronizing the processors, this is equal to the time used to run the computation on a single processor, denoted T₁.
The depth or span is the length of the longest series of operations that have to be performed sequentially due to data dependencies (the critical path). The depth may also be called the critical path length of the computation.^[7] Minimizing the depth/span is important in designing parallel algorithms, because the depth/span determines the shortest possible execution time.^[8] Alternatively, the span can be defined as the time T_∞ spent computing using an idealized machine with an infinite number of processors.^[9]
The cost of the computation is the quantity pT_p. This expresses the total time spent, by all processors, in both computing and waiting.^[6]
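
The three notions above can be illustrated on a small task graph. The following is a minimal sketch, not a definitive implementation: the graph `deps`, the task names, and the assumption that every task costs exactly one primitive operation are all hypothetical and chosen only for illustration.

```python
from functools import lru_cache

# Hypothetical task graph: each key is a task, each value lists the tasks
# it depends on. Every task is assumed to cost one primitive operation.
deps = {
    "a": [],
    "b": [],
    "c": ["a", "b"],
    "d": ["a"],
    "e": ["c", "d"],
}


def work(deps):
    """T_1: total number of unit-cost operations in the graph."""
    return len(deps)


def span(deps):
    """T_inf: number of operations on the longest dependency chain."""

    @lru_cache(maxsize=None)
    def depth(task):
        # A task finishes one step after its deepest dependency.
        return 1 + max((depth(d) for d in deps[task]), default=0)

    return max(depth(t) for t in deps)


def cost(p, t_p):
    """Cost p * T_p: processor-time spent computing and waiting."""
    return p * t_p
```

For this graph the longest chain is a, c, e (or a, d, e), so the span is 3 while the work is 5.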

Several useful results follow from the definitions of work, span and cost:

Work law. The cost is always at least the work: pT_p ≥ T₁. This follows from the fact that p processors can perform at most p operations in parallel.^[6]^[9]
Span law. A finite number p of processors cannot outperform an infinite number, so that T_p ≥ T_∞.^[9]
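
Taken together, the two laws give a lower bound on T_p. The sketch below assumes that time is measured in whole unit steps (an assumption made here for illustration); the function names are hypothetical.

```python
import math


def lower_bound_tp(t1, t_inf, p):
    """Smallest T_p permitted by the work law and the span law.

    Work law: p * T_p >= T_1, so T_p >= ceil(T_1 / p) in whole steps.
    Span law: T_p >= T_inf.
    """
    return max(math.ceil(t1 / p), t_inf)


def satisfies_laws(t1, t_inf, p, t_p):
    """Check whether a claimed running time T_p is consistent with both laws."""
    return p * t_p >= t1 and t_p >= t_inf
```

With T₁ = 100 and T_∞ = 10, four processors are limited by the work law (at least 25 steps), while twenty processors are limited by the span law (at least 10 steps).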

Using these definitions and laws, the following measures of performance can be given:

Speedup is the gain in speed made by parallel execution compared to sequential execution: S_p = T₁ / T_p. When the speedup is Ω(p) for p processors (in asymptotic notation), the speedup is said to be linear. This is optimal in simple models of computation, because the work law implies that T₁ / T_p ≤ p; super-linear speedup can nonetheless occur in practice due to memory hierarchy effects. The situation T₁ / T_p = p is called perfect linear speedup.^[9] An algorithm that exhibits linear speedup is said to be scalable.^[6]
Efficiency is the speedup per processor, S_p / p.^[6]
Parallelism is the ratio T₁ / T_∞. It represents the maximum possible speedup on any number of processors. By the span law, the parallelism bounds the speedup: if p > T₁ / T_∞, then:^[9] ${\frac {T_{1}}{T_{p}}}\leq {\frac {T_{1}}{T_{\infty }}}<p.$
The slackness is T₁ / (pT_∞). A slackness less than one implies (by the span law) that perfect linear speedup is impossible on p processors.^[9]
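
The four measures are simple ratios, and can be sketched as follows. The function names and the sample figures in the usage note are hypothetical, not measurements from any real system.

```python
def speedup(t1, t_p):
    """S_p = T_1 / T_p."""
    return t1 / t_p


def efficiency(t1, t_p, p):
    """Speedup per processor, S_p / p."""
    return speedup(t1, t_p) / p


def parallelism(t1, t_inf):
    """T_1 / T_inf, an upper bound on the speedup for any p."""
    return t1 / t_inf


def slackness(t1, t_inf, p):
    """T_1 / (p * T_inf); below 1, perfect linear speedup is impossible."""
    return t1 / (p * t_inf)
```

For example, with T₁ = 1000 and T_∞ = 10 the parallelism is 100. On p = 8 processors the slackness is 12.5, so the span law does not rule out perfect linear speedup; on p = 200 processors the slackness is 0.5, so it does.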

Execution on a limited number of processors

Analysis of parallel algorithms is usually carried out under the assumption that an unbounded number of processors is available. This is unrealistic, but not a problem, since any computation that can run in parallel on N processors can be executed on p < N processors by letting each processor execute multiple units of work. A result called Brent's law states that one can perform such a "simulation" in time T_p, bounded by^[10]

$T_{p}\leq T_{N}+{\frac {T_{1}-T_{N}}{p}},$

or, less precisely,^[6]

$T_{p}=O\left(T_{N}+{\frac {T_{1}}{p}}\right).$
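
As a worked illustration of the first bound, consider summing 1024 numbers with a balanced tree of additions. This example is an assumption introduced here, not taken from the cited sources: the tree performs T₁ = 1023 additions in total and, with N = 512 processors, finishes in T_N = 10 steps. The hypothetical helper `brent_bound` evaluates the bound for a smaller p.

```python
def brent_bound(t1, t_n, p):
    """Upper bound on T_p from Brent's law: T_N + (T_1 - T_N) / p."""
    return t_n + (t1 - t_n) / p


# Tree sum of 1024 numbers (illustrative figures, see the text above).
T1 = 1023   # total additions
TN = 10     # steps with N = 512 processors (depth of the tree)

bound_on_8 = brent_bound(T1, TN, 8)  # 10 + 1013 / 8 = 136.625
```

So the same computation can be carried out on 8 processors in at most 136.625 time units, compared with 1023 on a single processor.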

An alternative statement of the law bounds T_p above and below by

${\frac {T_{1}}{p}}\leq T_{p}\leq {\frac {T_{1}}{p}}+T_{\infty },$

showing that the span (depth) T_∞ and the work T₁ together provide reasonable bounds on the computation time.^[2]
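
These bounds can be checked on a small example with a greedy scheduler, which at every time step runs as many ready tasks as there are processors. The sketch below makes several assumptions for illustration: unit-cost tasks, a hypothetical task graph `deps` (a tree sum of eight inputs), and a greedy policy that is one way, not necessarily the only way, to stay within the upper bound.

```python
# Hypothetical task graph for a tree sum of eight inputs:
# four first-level additions, two second-level additions, one final addition.
deps = {
    "s0": [], "s1": [], "s2": [], "s3": [],
    "m0": ["s0", "s1"],
    "m1": ["s2", "s3"],
    "r": ["m0", "m1"],
}


def greedy_schedule_length(deps, p):
    """Unit time steps needed when up to p ready tasks run in each step."""
    remaining = {task: set(d) for task, d in deps.items()}
    done = set()
    steps = 0
    while remaining:
        # A task is ready once all of its dependencies have finished.
        ready = [t for t, d in remaining.items() if d <= done]
        if not ready:
            raise ValueError("dependency cycle detected")
        running = ready[:p]
        for task in running:
            del remaining[task]
        done.update(running)
        steps += 1
    return steps


T1 = len(deps)   # work: 7 unit-cost additions
T_INF = 3        # span: three levels in the addition tree


def within_bounds(p):
    """Check T_1 / p <= T_p <= T_1 / p + T_inf for the greedy schedule."""
    t_p = greedy_schedule_length(deps, p)
    return T1 / p <= t_p <= T1 / p + T_INF
```

On two processors the greedy schedule takes 4 steps, which lies between T₁ / p = 3.5 and T₁ / p + T_∞ = 6.5.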

This article uses material from the Wikipedia article Analysis_of_parallel_algorithms, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.

References

[1] Shiloach, Yossi; Vishkin, Uzi (1982). "An O(n² log n) parallel max-flow algorithm". Journal of Algorithms. 3 (2): 128–146. doi:10.1016/0196-6774(82)90013-X.

[2] Brent, Richard P. (1974-04-01). "The Parallel Evaluation of General Arithmetic Expressions". Journal of the ACM. 21 (2): 201–206. CiteSeerX 10.1.1.100.9361. doi:10.1145/321812.321815. ISSN 0004-5411. S2CID 16416106.

[3] JaJa, Joseph (1992). An Introduction to Parallel Algorithms. Addison-Wesley. ISBN 978-0-201-54856-3.

[4] Keller, Jorg; Kessler, Christoph W.; Traeff, Jesper L. (2001). Practical PRAM Programming. Wiley-Interscience. ISBN 978-0-471-35351-5.

[5] Vishkin, Uzi (2009). Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques, 104 pages (PDF). Class notes of courses on parallel algorithms taught since 1992 at the University of Maryland, College Park, Tel Aviv University and the Technion.

[6] Casanova, Henri; Legrand, Arnaud; Robert, Yves (2008). Parallel Algorithms. CRC Press. p. 10. CiteSeerX 10.1.1.466.8142.

[7] Blelloch, Guy (1996). "Programming Parallel Algorithms" (PDF). Communications of the ACM. 39 (3): 85–97. CiteSeerX 10.1.1.141.5884. doi:10.1145/227234.227246. S2CID 12118850.

[8] McCool, Michael; Reinders, James; Robison, Arch (2013). Structured Parallel Programming: Patterns for Efficient Computation. Elsevier. pp. 4–5.

[9] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. pp. 779–784. ISBN 0-262-03384-4.

[10] Gustafson, John L. (2011). "Brent's Theorem". Encyclopedia of Parallel Computing. pp. 182–185. doi:10.1007/978-0-387-09766-4_80. ISBN 978-0-387-09765-7.
