Saturday, November 7, 2009

Thoughts on OPL patterns: Pipeline, Geometric Decomposition, and Data Parallelism patterns

This post is about another set of parallel algorithm strategy patterns in OPL: Pipeline, Geometric Decomposition, and Data Parallelism.

The concept of a pipeline should be very familiar to most CS students, as there are numerous examples at different levels of abstraction that leverage such a design. Here again, the Pipeline pattern, being a parallel algorithm strategy pattern, is more focused on ways to perform tasks concurrently. Yunsup Lee gives a good and concise statement of the prerequisite for this pattern: it has to be possible to perform different operations on different data elements simultaneously, although there can be ordering constraints on the operations applied to a single data element. When a pipeline's stages work on different data elements, those elements are indeed modified simultaneously; at the same time, the pipeline order enforces the ordering constraints on each individual element.
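A minimal sketch of this idea, using threads and FIFO queues (the stage functions here are made up for illustration): each stage works on a different element at the same time, yet every element still passes through the stages in a fixed order.

```python
import threading
import queue

def stage(func, inbox, outbox):
    """Pull items from inbox, apply func, push results to outbox."""
    while True:
        item = inbox.get()
        if item is None:          # sentinel: shut down and tell the next stage
            outbox.put(None)
            break
        outbox.put(func(item))

# Three hypothetical stages; each element must see them in this order.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
queues = [queue.Queue() for _ in range(len(stages) + 1)]

threads = [threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
           for i, f in enumerate(stages)]
for t in threads:
    t.start()

for x in range(5):                # feed data elements into the first stage
    queues[0].put(x)
queues[0].put(None)

results = []
while (r := queues[-1].get()) is not None:
    results.append(r)
for t in threads:
    t.join()
print(results)                    # → [-1, 1, 3, 5, 7]
```

The FIFO queues preserve element order, and because each stage runs in its own single thread, the per-element ordering constraint holds without any extra locking.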

The author states that having a long pipeline increases latency. This is true, but it is not an intrinsic property of pipelines. Rather, it is the result of overheads from splitting a problem across multiple stages, such as inter-stage queuing due to unequal processing times between stages, and communication delays. The actual time spent processing the data should remain the same. Thus, there could be ways to tune a long pipeline to avoid most of the latency increase. The author could have made this clearer.
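A rough back-of-the-envelope model (with made-up numbers) illustrates the point: with perfectly balanced stages and no per-stage overhead, deepening the pipeline raises throughput but leaves per-element latency untouched; it is the per-stage overhead term that makes latency grow with depth.

```python
# Hypothetical figure: total work per element is fixed at 90 time units.
WORK_PER_ELEMENT = 90

def pipeline_metrics(n_stages, overhead_per_stage):
    """Per-element latency and steady-state throughput of a balanced pipeline."""
    stage_time = WORK_PER_ELEMENT / n_stages + overhead_per_stage
    latency = n_stages * stage_time     # one element traverses every stage
    throughput = 1 / stage_time         # one result per stage_time, once full
    return latency, throughput

# Zero overhead: latency stays at 90 no matter the depth, throughput scales.
lat3, thr3 = pipeline_metrics(3, 0)     # latency 90.0
lat9, thr9 = pipeline_metrics(9, 0)     # latency 90.0, 3x the throughput
# Each unit of per-stage overhead (queuing, communication) inflates latency
# in proportion to pipeline depth:
lat9o, _ = pipeline_metrics(9, 5)       # latency 135.0
print(lat3, lat9, lat9o)
```

The model is deliberately simplistic (it assumes perfectly balanced stages), but it separates the intrinsic processing time from the overhead the blog attributes the latency increase to.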

The author then describes message passing as a means of communication between pipeline stages. With this approach as its communication method, the pattern looks like a specific case of the Discrete Event pattern.

With Geometric Decomposition, the "divide and conquer" applies to the data rather than the usual tasks. Here, the 64-thousand-dollar question is how to divide the data into chunks efficiently. The best chunk size is often hard to calculate analytically; it is usually obtained from empirical results. Tim Mattson and Mark Murphy make a good point that programs should be implemented with their decomposition strategy parameterized.
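A small sketch of what "parameterized decomposition" might look like (the function and grid here are my own illustration, not from the pattern text): the chunk shape is an argument, so different sizes can be tried empirically without touching the rest of the program.

```python
def decompose(grid, chunk_rows, chunk_cols):
    """Split a 2D grid (list of lists) into rectangular chunks.

    The chunk shape is a parameter, so the best size can be found
    empirically rather than hard-coded into the program.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    chunks = []
    for r in range(0, n_rows, chunk_rows):
        for c in range(0, n_cols, chunk_cols):
            chunks.append([row[c:c + chunk_cols]
                           for row in grid[r:r + chunk_rows]])
    return chunks

grid = [[r * 4 + c for c in range(4)] for r in range(4)]
print(len(decompose(grid, 2, 2)))   # 4 chunks of shape 2x2
print(len(decompose(grid, 1, 4)))   # 4 row-strips
```

Sweeping `chunk_rows`/`chunk_cols` over a benchmark run is then how the empirical tuning the authors suggest would actually be carried out.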

As for data parallelism, if the same instruction can be applied to all data chunks, the computation becomes much easier to parallelize.
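A sketch of that case, continuing the chunk idea: one function is mapped uniformly over every chunk by a worker pool. (A thread pool is used here to keep the example self-contained; for CPU-bound Python work a process pool would be the realistic choice.)

```python
from concurrent.futures import ThreadPoolExecutor

def scale_chunk(chunk):
    """The same operation, applied identically to every chunk."""
    return [x * 2 for x in chunk]

data = list(range(12))
# Decompose into equal chunks, then map one instruction over all of them.
chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
with ThreadPoolExecutor() as pool:
    scaled = [x for chunk in pool.map(scale_chunk, chunks) for x in chunk]
print(scaled)   # → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]
```

Because no chunk depends on any other, there is no inter-worker communication at all, which is exactly what makes this case the easy one.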
