Thursday, November 12, 2009
Thoughts on OPL patterns: Branch-and-Bound, Graphical Models, and Structured Grids
This blog is about three of the computational patterns in OPL: Branch-and-Bound, Graphical Models, and Structured Grids.
Branch-and-Bound is intuitive at a high level. Since the search space of a problem is simply too large for an exhaustive search, we explore only a subset of it to get a "feel" for where the correct answer lies. Based on that subset, we derive the ranges that the target falls in, and thus we can prune the space outside those ranges. Consequently, the solution of this pattern naturally falls into four steps: Branching, Evaluating, Bounding, and Pruning.
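To make the four steps concrete, here is a minimal sketch of my own on a toy 0/1 knapsack instance (not from the pattern text; the values and weights are made up):

```python
# A minimal branch-and-bound sketch for 0/1 knapsack.
# Branching: include or exclude the next item.
# Evaluating: track the value of the current partial solution.
# Bounding: an optimistic estimate assuming all remaining items fit.
# Pruning: abandon any branch whose bound cannot beat the best so far.

values = [60, 100, 120]    # made-up data for illustration
weights = [10, 20, 30]
capacity = 50

best = 0

def bound(i, value):
    # Optimistic estimate: pretend every remaining item fits.
    return value + sum(values[i:])

def search(i, value, weight):
    global best
    if weight > capacity:          # infeasible branch
        return
    best = max(best, value)        # evaluating
    if i == len(values):
        return
    if bound(i, value) <= best:    # bounding + pruning
        return
    search(i + 1, value + values[i], weight + weights[i])  # branch: take item i
    search(i + 1, value, weight)                           # branch: skip item i

search(0, 0, 0)
print(best)   # 220 for this data
```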
This pattern is grouped in the computational patterns category; however, some portion of it extensively addresses parallel issues, which makes it seem like a parallel algorithm strategy pattern. It might be clearer if the authors followed the example of some other patterns in OPL and separated it into two patterns.
Graphical Models is the one that I have trouble getting a firm grasp on. A high-level description of the pattern, such as "given the probability of occurrence of some events, one can use the model to infer probabilities of other events," seems to make sense, but I found myself looking for an example to gain a firm understanding. Unfortunately the Example section was omitted.
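Lacking the pattern's own Example section, this is the kind of toy case I had in mind: a two-event model with made-up probabilities, where observing one event lets us infer the probability of the other by enumerating the joint distribution.

```python
# A toy graphical model: Rain -> WetGrass, with made-up probabilities.
# Given the probability of some events, infer probabilities of others:
# here, P(Rain | WetGrass) via brute-force enumeration of the joint.

p_rain = 0.2
p_wet_given_rain = {True: 0.9, False: 0.1}   # P(WetGrass | Rain)

def joint(rain, wet):
    # Joint probability P(rain, wet) from the model's factors.
    pr = p_rain if rain else 1 - p_rain
    pw = p_wet_given_rain[rain]
    return pr * (pw if wet else 1 - pw)

# Infer P(Rain | WetGrass) = P(rain, wet) / P(wet).
p_wet = joint(True, True) + joint(False, True)
print(joint(True, True) / p_wet)   # ~0.69
```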
The Structured Grids pattern deals with data that can be represented nicely as multi-dimensional arrays. Here the pattern description is not about how to represent a problem on structured grids; rather, it is about having efficient ways to decompose the grids into chunks. In this sense, it is strongly tied to the Geometric Decomposition pattern.
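As a concrete picture of what such data looks like, here is a small sketch of my own: a five-point averaging stencil over a 2D array, with the interior rows split into chunks that could go to different workers.

```python
# A structured grid: the data is just a 2D array, and the update at
# each point touches only its four neighbors (a five-point stencil).
N = 8
grid = [[float(i + j) for j in range(N)] for i in range(N)]
new = [row[:] for row in grid]

def update_rows(lo, hi):
    # Update one chunk of interior rows; the regular neighbor access
    # is what makes decomposing the grid into chunks so natural.
    for i in range(lo, hi):
        for j in range(1, N - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                grid[i][j - 1] + grid[i][j + 1])

# Decompose the interior rows into two chunks, one per (hypothetical) worker.
mid = N // 2
update_rows(1, mid)
update_rows(mid, N - 1)
```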
Wednesday, November 11, 2009
Thoughts on OPL patterns: Shared Queue, Speculation, and Digital Circuits
This blog is about the Shared Queue, Speculation, and Digital Circuits patterns in OPL.
The Shared Queue pattern is one of those patterns having a dedicated entity that is responsible for distribution. In some patterns it distributes worker threads or units of execution to tasks; in contrast, this pattern distributes tasks to units of execution.
Having a central queue is the most straightforward and conceptually simple implementation of this pattern. However, such a centralized structure soon becomes a performance bottleneck. Timothy Mattson and Youngmin Yi propose distributed shared queues as the solution for the bottleneck. However, this has its own disadvantages, such as more complex logic and difficulty enforcing ordering. As mentioned in the Task Queue pattern, when going with the distributed-queue solution, one can set up local queues for local threads containing local tasks to take advantage of spatial locality.
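As a rough illustration of the centralized variant (my own sketch, not the authors' code), a single thread-safe queue feeding a pool of workers looks like this in Python:

```python
# A minimal central shared-queue sketch: one queue distributes tasks
# to a pool of worker threads.  The single queue is simple but, as
# noted above, becomes a bottleneck as the worker count grows.
import queue
import threading

tasks = queue.Queue()

def worker():
    while True:
        task = tasks.get()
        if task is None:           # sentinel: no more work
            break
        print(f"processed {task}")

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for n in range(10):
    tasks.put(n)
for _ in threads:
    tasks.put(None)                # one sentinel per worker
for t in threads:
    t.join()
```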
One comment on the structure of this pattern: the solution section seems to be an amalgam of solution, forces, implementation, and example sections, and it is quite long. It would be crisper if the authors separated them out.
Speculation is one of the patterns that I have not used. On the other hand, it is a very common technique in modern processors, so the concept is definitely not foreign. I tried to think about why I have never used it, and I believe it has to do with the difficulties of implementing the consolidation/recovery phase. I also noticed that in this pattern the degree of speculation is limited to one (a speculative task does not trigger the execution of more speculative tasks); otherwise it gets very hard to track.
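My mental model of the pattern, as a hedged sketch: start a likely-needed task early, then either commit its result (consolidation) or throw it away and recompute (recovery). The function and the guessed value here are hypothetical stand-ins.

```python
# A speculation sketch: begin the likely next task before knowing
# whether its input is final; commit on a correct guess, discard and
# recompute on a wrong one.
from concurrent.futures import ThreadPoolExecutor

def expensive_step(x):
    return x * x                     # stand-in for real work

with ThreadPoolExecutor() as pool:
    guess = 10                       # speculate on the likely input
    speculative = pool.submit(expensive_step, guess)

    actual = 10                      # ... the real input arrives later ...

    if actual == guess:
        result = speculative.result()   # consolidation: commit the guess
    else:
        speculative.cancel()            # recovery: discard and redo
        result = expensive_step(actual)

print(result)
```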
As for the Digital Circuits pattern, it seems to be about packing bits into words and performing bit-wise operations in parallel, due to the fact that most instruction set architectures have a data granularity of a word. If so, this is how I always do it, especially since C-like languages do support bitwise operators. In this sense, this pattern is pretty much universal.
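For instance, the way I have always done it (ordinary code, nothing specific to the pattern text): pack one-bit values into a word so that a single bitwise instruction acts on all of them at once.

```python
# Pack eight one-bit signals into each word; one bitwise operation
# then evaluates eight "gates" at once.
a = 0b10101100
b = 0b01101010

and_out = a & b   # eight AND gates in a single operation
or_out = a | b    # eight OR gates
xor_out = a ^ b   # eight XOR gates

print(f"{and_out:08b} {or_out:08b} {xor_out:08b}")
```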
Sunday, November 8, 2009
Thoughts on OPL patterns: Loop Parallelism, Task Queue, and Graph Partitioning
This blog is again about a set of parallel algorithm strategy patterns in OPL: Loop Parallelism, Task Queue, and Graph Partitioning.
Loop Parallelism deals specifically with the concurrent execution of loop iterations. This is a common form of optimization in compilers and processors. Mark Murphy goes through the common techniques in this pattern. It is interesting that the author states it is usually easiest to develop a program as sequential code and then gradually refactor in parallelism. It echoes various other reading materials we have in CS527 that say doing parallelism right is hard. This implementation approach also stresses the need for a good unit testing suite, which I feel is one of the most important enablers of refactoring.
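The sequential-first, refactor-later approach might look like the following sketch (my own toy example; the loop body is a stand-in for real work):

```python
# Develop the loop sequentially first, then refactor the independent
# iterations onto a pool of processes; a unit test (here an assert)
# checks that the refactoring preserved the result.
from multiprocessing import Pool

def body(i):
    return 2 * i + 1       # stand-in work; no loop-carried dependence

if __name__ == "__main__":
    data = range(100)
    serial = [body(i) for i in data]    # the original sequential loop

    with Pool(4) as pool:               # the parallel refactoring
        parallel = pool.map(body, data)

    assert parallel == serial
```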
Task Queue is also a familiar concept to me, although in my experience the task queues that I've encountered all use FIFO scheduling, since it is the simplest. Here Ekaterina Gonina also mentions more intelligent scheduling, such as having multiple queues (to guarantee locality) or assigning priorities (to guarantee ordering).
The last pattern is Graph Partitioning. As I commented on the Graph Algorithms pattern, coming up with a good representation of the problem is more than half the battle. After that, it is a matter of digging out and applying the algorithms learned in theory class. Just as in the Graph Algorithms pattern, here Mark Murphy also gives a good list of partitioning algorithms that we can use.
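To have something concrete next to that list, here is a deliberately naive bisection of my own: grow one side by breadth-first search until it holds half the vertices. The real partitioners the pattern lists do far better at minimizing the cut.

```python
# A graph-partitioning sketch: greedy BFS bisection on a made-up graph.
from collections import deque

graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}

half = len(graph) // 2
part_a, frontier = set(), deque([0])
while frontier and len(part_a) < half:
    v = frontier.popleft()
    if v in part_a:
        continue
    part_a.add(v)                 # claim v for the first partition
    frontier.extend(graph[v])     # and expand toward its neighbors
part_b = set(graph) - part_a

print(part_a, part_b)   # {0, 1, 2} and {3, 4, 5} for this graph
```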
Saturday, November 7, 2009
Thoughts on OPL patterns: Pipeline, Geometric Decomposition, and Data Parallelism
This blog is about another set of parallel algorithm strategy patterns in OPL: Pipeline, Geometric Decomposition, and Data Parallelism.
The concept of a pipeline should be very familiar to most CS students, as there are numerous examples at different abstraction levels that leverage such a design. Here again, the Pipeline pattern, being a parallel algorithm strategy pattern, is more focused on ways to perform tasks concurrently. Yunsup Lee gives a good and concise statement of the prerequisite of this pattern: it has to be possible to perform different operations on different data elements simultaneously, although there can be ordering constraints on the operations on a single data element. With different pipeline stages working on different data, multiple data elements are processed simultaneously; meanwhile, the pipeline order enforces the ordering constraints on each single data element.
The author states that having a long pipeline increases latency. This is true, but it is not an intrinsic property of pipelines. Rather, it is the result of the overheads of having multiple stages finish a problem, such as inter-stage queuing due to unequal processing times between stages, and communication delays. The actual time spent processing the data should still be the same. Thus, there could be ways to tune a long pipeline to avoid most of the latency increase. The author could have made this clearer.
The author then describes message passing as a means of communication between pipeline stages. With this approach as its communication method, the pattern now looks like a specific case of the Discrete Event pattern.
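Putting the two ideas together, here is a sketch of my own of a message-passing pipeline: each stage is a thread, and queues carry data elements downstream while each element still passes through the stages in order.

```python
# A pipeline with message passing between stages: each stage is a
# thread, and queues carry data elements downstream.  The stage
# functions are made-up stand-ins.
import queue
import threading

DONE = object()   # sentinel marking the end of the stream

def stage(inbox, outbox, f):
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)   # propagate shutdown downstream
            break
        outbox.put(f(item))

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage, args=(q1, q2, lambda x: x + 1)).start()
threading.Thread(target=stage, args=(q2, q3, lambda x: x * 2)).start()

for x in range(5):
    q1.put(x)        # different elements occupy different stages at once
q1.put(DONE)

while (item := q3.get()) is not DONE:
    print(item)      # 2, 4, 6, 8, 10
```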
With Geometric Decomposition, it is "divide and conquer" on the data instead of the usual tasks. Here, the sixty-four-thousand-dollar question is how to divide the data into chunks efficiently. It is often hard to calculate; rather, the answer is usually obtained from empirical results. Tim Mattson and Mark Murphy make a good point that a program should be implemented with its decomposition strategy parameterized.
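A parameterized decomposition can be as simple as the following sketch (the names and signature are mine), where the chunk dimensions are knobs to be tuned empirically:

```python
# A geometric-decomposition sketch with the chunk size parameterized,
# so the best values can be found empirically rather than calculated.
def decompose(rows, cols, chunk_rows, chunk_cols):
    """Yield (row_range, col_range) chunks of a rows x cols grid."""
    for r in range(0, rows, chunk_rows):
        for c in range(0, cols, chunk_cols):
            yield (range(r, min(r + chunk_rows, rows)),
                   range(c, min(c + chunk_cols, cols)))

# Tuning the decomposition is just a matter of changing parameters.
for row_range, col_range in decompose(1024, 1024, chunk_rows=256, chunk_cols=256):
    pass   # hand each chunk to a worker
```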
As for Data Parallelism, if the same instruction can be applied to all data chunks, the problem becomes much easier to parallelize.
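In that easy case, the whole computation collapses to one uniform operation over the data, e.g.:

```python
# Data parallelism in its simplest form: the same operation applied
# uniformly across all elements (numpy vectorizes it under the hood).
import numpy as np

data = np.arange(1_000_000, dtype=np.float64)
result = data * 2.0 + 1.0   # one operation for every chunk of the data
```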
Thursday, November 5, 2009
Thoughts on Armstrong thesis ch. 6 - Building An Application
In contrast to the previous chapters, which deal with implementing fault-tolerant systems, this chapter of Armstrong's thesis has more to do with OTP-specific implementation.
The concept of "behavior" is new to me. I think it is rather unusual for a programming environment to offer such a high-level support. The idea seems cool that there are common programming patterns ready for one to leverage and to piece them together into an application. On the other hand, it is hard to judge on how customizable these stock patterns really are just by reading through the examples, but it does seems promising.
What's also cool about the code structure Armstrong presents here is that by separating the application-specific code out into a module, the bulk of the pattern is generic and reusable. This style can also be leveraged in other environments to construct a more reusable infrastructure.
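Sketched in Python rather than Erlang (so only loosely in the spirit of OTP's gen_server), the separation looks something like this: the server loop is generic, and only the callback object is application-specific.

```python
# Generic, reusable part: a server loop that threads state through
# application-supplied callbacks -- loosely modeled on OTP behaviors.
import queue

class GenericServer:
    def __init__(self, callbacks):
        self.callbacks = callbacks
        self.inbox = queue.Queue()

    def call(self, request):
        self.inbox.put(request)

    def run(self):
        state = self.callbacks.init()
        while True:
            request = self.inbox.get()
            if request is None:        # stop request
                break
            state = self.callbacks.handle_call(request, state)

# Application-specific part, analogous to an Erlang callback module.
class Counter:
    def init(self):
        return 0

    def handle_call(self, request, state):
        print("count is now", state + request)
        return state + request

server = GenericServer(Counter())
server.call(1)
server.call(2)
server.call(None)
server.run()   # prints 1 then 3; a real server would run in its own thread
```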
The concept of "behavior" is new to me. I think it is rather unusual for a programming environment to offer such a high-level support. The idea seems cool that there are common programming patterns ready for one to leverage and to piece them together into an application. On the other hand, it is hard to judge on how customizable these stock patterns really are just by reading through the examples, but it does seems promising.
What's also cool about the code structure Armstrong presents here is that by separating out the application-specific code into a module, the bulk of the patterns are generic and reusable. This style can also be leveraged in other environments to construct a more reusable infrastructure.
Sunday, November 1, 2009
Thoughts on Armstrong thesis ch. 5 - Programming Fault-Tolerant Systems
This chapter of Armstrong's thesis presents a specific method for programming fault-tolerant systems. In my experience, it is often easier to envision how a system should work than how it may fail. Unfortunately, the latter is critical for error handling, and that in turn is a precursor to having a fault-tolerant system.
Armstrong's idea of having a hierarchy of tasks and supervisors is interesting. I have worked on systems that have supervisors monitoring worker processes, but arranging the supervisors in a tree here is different, and it is great as it also sets up a monitoring hierarchy among the supervisors themselves. I suspect this hierarchy eventually translates to the different task levels that the author has mentioned multiple times, although I'll still need to read more of the thesis to understand how it is actually set up.
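My reading of the structure, condensed into a toy sketch (Python, not Erlang, and much simplified): a supervisor restarts its child a bounded number of times and then escalates, and a supervisor can itself be the child of a higher supervisor.

```python
# A sketch of a supervision hierarchy: supervisors restart failing
# children, and failures that exhaust the restart budget escalate
# upward to the parent supervisor.
import random

def flaky_worker():
    if random.random() < 0.5:
        raise RuntimeError("worker crashed")
    print("worker finished")

def supervise(child, max_restarts=3):
    for _ in range(max_restarts + 1):
        try:
            child()
            return
        except RuntimeError as err:
            print(f"supervisor: restarting after: {err}")
    raise RuntimeError("too many restarts")   # escalate to the parent

# A two-level tree: the top supervisor supervises another supervisor.
supervise(lambda: supervise(flaky_worker))
```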
At the end of the chapter, Armstrong describes the four rules of a Well-Behaved Function (WBF). Rule 2, "if the specification doesn't say what to do, raise an exception," is particularly interesting to me. In my experience, when a specification doesn't say something, it usually does not mean that the thing is not supposed to happen, but rather that there is no required behavior; most of the time it is handled by a system-specific behavior. The author obviously has a different take on it, and it seems he would like to eliminate such ambiguity. I am not sure whether this is a realistic expectation for a specification.
Thoughts on OPL patterns: Task Parallelism, Recursive Splitting, and Discrete Event
This blog is about the three parallel algorithm strategy patterns I read in OPL: Task Parallelism, Recursive Splitting, and Discrete Event. All three patterns are written from the point of view of providing a strategy for the parallel implementation of a problem. Thus, while one may have already chosen a structural or a computational pattern to solve the problem conceptually, here the focus is on the actual implementation: mapping the conceptual solution to a parallel implementation.
It is a good thing that the Task Parallelism pattern brings up the two prerequisites to make the pattern work: understanding the application's concurrency characteristics and understanding the platform's characteristics. Typically, many people only master one or the other, since we are so used to concentrating our mastery effort on one particular layer of abstraction. Here it shows that in order to truly take advantage of parallelism, one needs thorough knowledge of both the algorithm and the hardware. On the flip side, this also shows the difficulty of having one universally efficient parallel algorithm implementation across platforms.
The Recursive Splitting pattern focuses on problems solved by recursion. Conceptually, we have been taught to think of recursion as continuously splitting a problem into smaller chunks. I believe most of us also conceptually think of those small chunks as being done in parallel, as we often draw the tasks at the same level as one horizontal row of a recursion tree. Thus, this pattern should not be too far-fetched for us. One good point that this pattern and the previous one both mention is the trade-off between the costly overhead of fine task division and the reduced chance of concurrency with coarse-grained tasks. This seems to be a common design decision one needs to make when making the solution to a problem concurrent.
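That trade-off usually shows up as a granularity threshold in the code; a rough sketch of my own:

```python
# Recursive splitting with a granularity cutoff: below the threshold
# it is cheaper to run sequentially than to pay the overhead of
# spawning more parallel tasks.
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 1_000   # the fine-grain vs. coarse-grain knob

def total(xs, pool):
    if len(xs) <= THRESHOLD:                     # coarse enough: run directly
        return sum(xs)
    mid = len(xs) // 2
    right = pool.submit(total, xs[mid:], pool)   # split: right half in parallel
    left = total(xs[:mid], pool)                 # left half in this task
    return left + right.result()

with ThreadPoolExecutor(max_workers=8) as pool:
    print(total(list(range(8_000)), pool))   # 31996000
```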
The last pattern is Discrete Event, which conceptually seems similar to the Event-Based Implicit Invocation pattern. The author raises a good point that the pattern can be used when the control flow of the data is not as clean-cut as in Pipes and Filters. After reading it, I am still unsure of the significance of the term "Discrete" in the context of this pattern; perhaps it is because I always picture events as discrete as they travel from a source to a sink.