Anhoe's Posts: 2009

Thursday, November 12, 2009

Thoughts on OPL patterns: Branch-and-Bound, Graphical Models, and Structured Grids

This time the blog is about three of the computational patterns in OPL: Branch-and-Bound, Graphical Models, and Structured Grids.

The Branch-and-Bound is intuitive from high-level. Since the search space of a problem is simply to grand to do an exhaustive search, we can only try a subset of it to get a "feel" of what the correct answer would be. Based on the subset, we derive the ranges that the target falls in and thus we can prune the space outside these ranges. Consequently, the solution of this pattern naturally falls into four steps: Branching, Evaluating, Bounding, and Pruning.

This pattern is grouped in the computational patterns category, however some portion of it extensively addresses parallel issues which makes it seems like a parallel algorithm strategy patterns. It might be more clear if the authors follow the examples of some other patterns in OPL and separate them into two patterns.

The Graphical Models is the one that I have trouble getting a firm grasp on. A high-level description of the pattern, such as "given the probability of occurrence of some events, one can use the model to infer probabilities of other events" seems to make sense but I found myself looking for an example to have a firm understanding. Unfortunately the Example section was omitted.

The structured grids pattern deals with data that can be represented nicely as a multi-dimensional arrays. Here the pattern description is not about how to represent a problem in structured grids rather it is about having efficient ways to decompose the grids into chucks. In this sense, it is strongly tied to the Geometric Decomposition pattern.

Wednesday, November 11, 2009

Thoughts on OPL patterns: Shared Qeue, Speculation, and Digital Circuits

This blog is about the Shared Queue, Speculation, and Digital Circuits patterns in OPL.

The Shared Queue pattern is one those patterns having a dedicated entity that is responsible for distribution. In some patterns it distributes worker threads or units of execution to tasks; in contrast this pattern distributes tasks to units of execution.

Having a central queue is the most straight forward and conceptually simple implementation of this pattern. However, such a centralized structure soon becomes the performance bottleneck. Timothy Mattson and Youngmin Yi propose using distributed shared queues as the solution for the bottleneck. However, this has its own disadvantages such as more complex logic and difficult to enforce ordering. As mentioned in the Task Queue pattern, when going with distributed queues solution one can set up local queues for local threads containing local tasks to take advantage of spacial locality.

One comment on the structure of this pattern: the solution section seems to be an amalgam of solution, forces, implementation, and example sections and it is quite long. It would be more crisp if the authors separate them out.

Speculation is one of the pattern that I have not used. On the other hand, it is a very common technique in modern processors and so the concept is definitely not foreign. I tried to think why I have never used it and I believe it has to do with the difficulties in implementing the consolidation/ recovery phase. I also noticed in this pattern the degree of speculation is limited to one (a speculative task does not trigger execution of more speculative tasks) else it gets very hard to track.

As for the Digital Circuits pattern, it seems to be about packing bits into words and perform bit-wise operation in parallel due to the fact that most of the instruction set architecture has a data granularity of a word. If so, this is how I always do it, especially since C-like languagess do support bitwise operators. In this sense, this pattern is pretty much universal.

Sunday, November 8, 2009

Thoughts on OPL patterns: Loop Parallelism, Task Queue, and Graph Partitioning

This blog is again about a set of parallel algorithm strategy patterns in OPL: Loop Parallelism, Task Queue, and Graph Partitioning.

Loop parallelism deals specifically with concurrent execution of loop recurrences. This is a common form of optimization in compilers and processors. Mark Murphy went through the common techniques in this pattern. It is interesting that the authors states it is usually easiest to develop a program as sequential and then gradually refactor in parallelism. It echos various other reading materials we have in CS527 that say doing parallelism right is hard. This implmentation approach also stresses the need to have a good unit testing suite which is I feel is one of the most important enablers of refactoring.

Task queue is also a familiar concept to me. Although in my experience the task queues that I've encountered are all using the FIFO scheduling since it is the simplest. Here Ekaterina Gonina also mentions about intelligent scheduling such as by having multiple queues (to guarantee locality) or by assigning priority (to guarantee ordering).

The last pattern is about graph partitioning. As I have commented on the Graph Algorithms pattern, coming up with a good representation of the problem is more than a half of the battle. After that, it is then to dig out and to apply the algorithms learned in the theory class. Just as the Graph Algorithm pattern, here in the pattern Mark Murphy also gives a good list of partitioning algorithms that we can use.

Saturday, November 7, 2009

Thoughts on OPL patterns: Pipeline, Geometric Decomposition, and Data Parallelism patterns

This blog is about another set of parallel algorithm strategy patterns in OPL: Pipeline, Geometric Decomposition, and Data Parallelism.

The concept of a pipeline should be vary familiar to most CS students as there are numerous examples in different abstraction levels that leverage such design. Here again the Pipeline pattern being a parallel algorithm strategy pattern is more focused on ways to perform tasks concurrently. Yunsup Lee gives a good and concise statement on the prerequisite of this pattern: it has to be possible to perform different operation on different data elements simultaneously, although there can be ordering constraints on the operation on a single data element. To have a pipeline with its different stages working on different data, the data elements will be modified simultaneously. On the other hand the pipeline order enforces the ordering constraints of a single data element.

The author states that having long pipeline increases latency. This is true but it is not the intrinsic property of pipeline. Rather, it is the result of overheads from having multiple stages to finish a problem such as inter-stage queuing due to unequal processing time between stages and communication delays. As for the actual time spent in processing the data, it should still be the same. Thus, there could be ways to tune a long pipeline to avoid most latency increases. The author could have made this more clear.

The author then describes message passing as a mean of communication between pipeline stages. With this approach as its communication method, the pattern now looks to be a specific case of the Discrete Event patttern.

With Geometric Decomposition, it is the "divide and conquer" of the data instead of the usual tasks. Here, the 64-thousand-dollar question is how to efficiently divide data into chunks. It is often hard to calculate it, rather it is usually obtained from emperical results. Tim Mattson and Mark Murphy make a good point that the programs should be implemented to have its decomposition strategy parameterized.

As for data parallelism, if the same instruction can be applied to all data chunks, it then makes it easier to parallelize.

Thursday, November 5, 2009

Thoughts on Armstrong theses ch. 6 - Building An Application

In contrast to the previous chapters that deal with implementing fault-tolerant systems, this chapter of the Armstrong's thesis has more to do with OTP-specific implementation.

The concept of "behavior" is new to me. I think it is rather unusual for a programming environment to offer such a high-level support. The idea seems cool that there are common programming patterns ready for one to leverage and to piece them together into an application. On the other hand, it is hard to judge on how customizable these stock patterns really are just by reading through the examples, but it does seems promising.

What's also cool about the code structure Armstrong presents here is that by separating out the application-specific code into a module, the bulk of the patterns are generic and reusable. This style can also be leveraged in other environments to construct a more reusable infrastructure.

Sunday, November 1, 2009

Thoughts on Armstrong thesis ch. 5 - Programming Fault-Tolerant Systems

This chapter of Armstrong's thesis presents a specific method for programming fault-tolerant systems. In my experience, it is often easier to envision how a system should work then how it may fail. Unfortunately, the later is critical for error handling and that in turn is a precursor for having a fault-tolerant system.

Armstrong's idea of having a hierarchy of tasks and supervisors is interesting. I have worked on systems that have supervisors monitoring worker processes, but the arranging of supervisors in a tree here is different and it is great as it also sets up monitoring hierarchy among the supervisors. I suspect this hierarchy eventually translates to the different task levels that the author has mentioned multiple time, although I'll still need to read more into the thesis to understand how it is actually set up.

At the end of the chapter, Armstrong describes the four rules of a Well-Behaved Function (WBF). Rule 2 is about "if the specification doesn't say what to do raise an exception" which is particularly interesting to me. In my experience, when a specification doesn't say something, it usually does not mean that something is not supposed to happen but rather there is not a required behavior. Thus, most of the time it is handled by a system-specific behavior. The author obviously has a different take on it and it seems he would like to eliminate such ambiguity. I am not sure whether this is a realistic expectation for the specification.

Thoughts on OPL patterns: Task Parallelism, Recursive Splitting, and Discrete Event

This blog is about the three parallel algorithm strategy patterns I read in OPL: Task Parallelism, Recursive Splitting, and Discrete Event. All three patterns are written from the point of view of providing a strategy on the parallel implementation of a problem. Thus, one may already choose a structural or a computational pattern to solve the problem conceptually, here we focus on the actual implementation, mapping the conceptual solution to a parallel implementation.

It is a good thing that the Task Parallelism pattern brings up the two prerequisites to make the pattern work: understanding the application concurrency characteristics and understanding the platform characteristics. Typically, many people only master one or the other. This is typical since we are so used to concentrating our mastery effort in one particular layer of abstraction. Here it shows in order to truly take advantage of parallelism, one needs to have thorough knowledge of both the algorithm and the hardware. On the flip side, this also shows the difficulty of having one universal efficient parallel algorithm implementation across platforms.

The Recursive Splitting pattern focus on problems solved by recursion. Conceptually, we have been taught that recursion as continuously splitting problems into smaller chunks. I believe most of us also conceptually think those small chunks are done in parallel as we often draw the tasks in the same level as one horizontal row of a recursion tree. Thus, this pattern should not be too far fetched for us. One good point this pattern and the previous pattern both mention is the trade-off between the costly overhead due to fine task division vs. less chance of concurrency due to course-grain tasks. This seems to be a common design decision one needs to make when making the solution of a problem concurrent.

The last pattern is Discrete Event which conceptually seems similar to the Event-Based Implicit Invocation pattern. The author raises a good point that the pattern can be used in the case when control flow of the data is not as clean-cut as in the case of Pipes and Filters. After reading it, I am still unsure the significance of the term "Discrete" in the context of this pattern, perhaps it is because I always picture the events as discrete from a source to a sink.

Wednesday, October 21, 2009

Thoughts on OPL patterns: Dense Linear Algebra, Graph Algorithms, and Monte Carlo

This time I read three of the computational patterns from OPL: Dense Linear Algebra, Graph Algorithms, and Monte Carlo. These are mainly suited for data-centric applications, in particular computing-intensive ones that need a lot of computations and thus the need to go parallel. Calling these systematic methods for computing and for aggregating results as patterns is new to me, in the past I am more familiar with structural/ architectural patterns or design patterns. With applications that are computing-intensive, during the high-level design stage the implementers would need to go back and forth between the structural patterns and the computational patterns to use for the overall architecture, since in this scenario optimizing the computation time is the key.

All three of the patterns are math-centric. The first two the name should be self-explanatory; for Monte Carlo it involves solving a problem by statistically sample its solution space under different parameter settings to derive a good estimation/ approximation. In one sense they are new to me since I have never done programming using those patterns, but at the same time they are familiar as I have done such exercises in my Math and Algorithms classes. For the types of the problems that we need to leverage these computational patterns, correctly formatting the problem to a linear algebra formula or a graph representation is more than half of the battle. In fact most of the time I spent in my Algorithm class was to try to reduce problems to a graph so I could then use well-known graph algorithms to solve them. (Incidentally, the solution section of the Graph Algorithms pattern makes a good final review sheet for CS473). So I definitely see benefits if there are also followup lower-level patterns that show how to convert some recurring types of problems to some mathematical representations.

Tuesday, October 20, 2009

Thoughts on Armstrong thesis ch. 2 - The Architectural Model

This is my first blog on the thesis written by Joe Armstrong titled "Making reliable
distributed systems in the presence of software errors" and it focuses on chapter 2. Chapter 1 of the thesis goes through some history of his projects and provides a breakdown of the rest of the thesis, thus chapter is actually the first chapter that touches the core of the thesis.

Joe Armstrong starts by defining his proposed architecture for building a fault-tolerant Telecom system. He does this by giving a set of descriptions to characterize his architecture. Out of the six descriptions that he provides, the fifth one, a way of describing things, seems to be a prelude to the actual description of an architecture. The first two, a problem domain and a philosophy, to me are the core to understand any architecture and thus I would place them above the rest. As for what else can be added to characterize an architecture better, these descriptions do not directly describe the interface between the modules but in this case it is probably OK since we are clear that all the communication is through message passing. Another description that I would expect from architecture dealing with data flows is a description of how an unit of data flows through the system, sometimes it is called "a day of life of a data packet" or something similar, which gives a lot of insight of a system.

The author then drills down to the various descriptions of the architecture. As for the problem domain, he makes it clearly that reliability and concurrency are critical. The system requirements also include the ability to support soft real-time and to be distributed. One of his way to achieve high reliability is to locally isolates faults. To be able to do this, he argues that there should not be data sharing between processes. Instead, all communication is through messages. Such discipline also prevent any of hidden dependencies between processes, many are unaware by the software maintainers. I also work on a system that use message passing as the primary mean of inter-process communication. Not only does it facilitate high reliability, it also makes in-state software upgrade easy. Very often when upgrading from one version of software to another its data structure also change; thus one can see if the structures are shared between processes, then it makes hard for an individual process to upgrade. In contrast, with message passing it is easy for a new version of software to add translation in its send or receive routines to deal with different version of the messages. Since each process knows which message versions that it supports, it also makes it possible to have a sanity checking to ensure two processes can communicate (share some common versions of a message).

The author then introduces the concept of concurrency oriented programming. He also creates Erlang, a Concurrency Oriented Language (COPL). He believes that programming in such language makes modeling the real world, which is concurrent, easy. Erlang handles process and concurrency management in the language. This is new to me as I am used to letting the OS to handle them. As for anything that is OS-agnostic, it makes an application written in Erlang more portable and to have a more consistent behavior across different OS. However, on the flop side this also means that the language and its libraries need to provide more support.

Sunday, October 18, 2009

Thoughts on OPL patterns: Event Based and Map Reduce

As a continuation of reading through patterns documented by OPL, this blog is on the Event Based pattern and the Map Reduce pattern.

For the Event Based pattern, the Yury Markovsky, Dai Bui, Hugo Andrade, and Jimmy Su frame it as a separation of intent to take a certain action from the actual implementation of the action; which I think it is both an interesting take and a concise way to describe the pattern. This pattern also makes the implementation of inter-module communication easy for the individual module implementers. They can just focus on implementing the event handlers.

As an invariant of this pattern, the author states that announcers of events do not know which listeners will be affected by those events. I think this invariant is too rigid. What about a system that the interested parties directly register to the announcer? I would think even this type of systems still qualify for Event Based pattern.

As for the Map Reduce pattern, it is used when a problem can be clearly separated into two stages: one stage for computation on subsets of its data, and the second stage to aggregate the result. As Tim Mattson points out, problems using this pattern has distinct two stages and thus one can provide a map-reduce framework shared by different applications; the domain expert for an application only needs to provide a map and a reduce function. This fits into an overall theme that we have encounters in various readings which is the separation of concerns between the concurrency experts and the application domain experts.

Thursday, October 15, 2009

Thoughts on OPL patterns: Pipes & Filters, Layered Systems, and Iterative Refinement

The reading for the first part of the next class consists of three patterns from Out Pattern Language (OPL). The first two, Pipes & Filters and Layered Systems are familiar to us as we have read about them from another source. The third pattern, Iterative Refinement, is new.

Comparing to the description from A System of Patterns, OPL's description on Pipes & Filters is definitely shorter. as a result some aspects of and details about the pattern are left out. For example, it does not mention about the correlation between the similarity of the interface of different filters and the ease of recombining them which is an important design consideration. It also does not cover liabilities of this pattern such as the difficulty of error reporting which the reader should also know.

The description seems to focus on a particular setup of the pipes & filters pattern that consists of a pipeline with each stage handled by a different resource; thus the discussion on dividing the amount of work at each filter equally and buffering capacity. Such discussion may not apply to a setup when all different processes responsible for the different filters are all running on a same processor.

The one on the Layered system seems very clear to me. It also helps greatly that I am already familiar with the pattern. I do feel that the authors put a lot of stress on portability. I guess this must be one of the main reason that people choose to use this pattern.

As for the third pattern, I find it hard to follow. I have seen some examples that avoids the dependencies between tasks to promote parallel execution; then addressing the dependencies at the end when merging the results. Since the invariants of this pattern are very general, I am not sure whether they actually fit into this pattern. Perhaps some pictures or a more concrete example can help to clarify.

Tuesday, October 13, 2009

Thoughts on Refactoring for Reentrancy

This paper is the third paper we read in CS527 regarding refactoring an aspect of Java program to enable it to run concurrently. This time it is to about making a Java program reentrant. As the other two papers, Jan Wloka, Manu Sridharan, and Frank Tip first characterize and analyze non-reentrant Java programs and propose steps to refactor them. After, they provide a semi-automatic Eclipse refactoring plugin and evaluate the tool to be both useful and efficient.

One major factor that makes a Java program non-reentrant is having global state. Thus, the refactoring involves replacing it and its static initialization with thread-local state and explicit lazy-initialization. Although doing so will affect the timing of the initialization but it is the authors' experience that such change often does not affect outcome.

While reading through it, I have a lingering doubt that such refactoring only work with the type of global state that does not correlate to the number of times the program is executed. For example, if there is a global state, counter, that tracks the number of times a program is called to perform a certain action, then this refactoring would not work as it does not increment a single copy of the counter. In this case, one will need to rely on other technique such as marking the global state synchronized. Granted that I don't use Java in my daily work but I think there are many cases with the global states that fall under this category. Since the authors do not explicitly call this out, I am still not sure.

Sunday, October 11, 2009

Thoughts on Beautiful Architecture ch. 14 - Reading the Classics

This chapter is roughly divided into three sections. First, Panagiotis Louridas generally describes the software classics and just as the literary classics they have heavy influences on the later works. The author also proposes that programming languages can also be considered as classics as they also have major influences on the later languages and the software architecture that build on top of them. The author uses the Design Patterns and the Smalltalk language to make his point. One thing about thinking in patterns that the author has pointed out is that it inspires design reusability. These days, many of us already think software as consisted of modules with clear APIs that are reusable and easily swappable.

The author then dives into an introduction of the Smalltalk language and at the same time describes typical Object-Oriented language features in general. Compare to many other OO languages, Smalltalk is the purist that everything in it is an object. Unlike the strongly-typed language, its users do not explicitly declare type information. Instead, types are defined implicitly by what they do and their interfaces thus the type-checking is done at runtime when a selector (message) is passed to an object. There, an unexpected receiver may not handle such message and thus an error is notified.

At the end, the author turns the focus to brick and mortar architecture and in an attempt to also explain Smalltalk's lack of popularity. It is a rather interesting read as the author illustartes that some of the most influential architectures are actually hardly usable. As the author quotes Louis Sullivan, "form ever follows function". While we set to in search of beautiful architecture, we should not sacrifice the function. This being at the end of the last chapter of the book serves as an awakening advice to its readers.

Friday, October 9, 2009

Thoughts on ReLooper: Refactoring for Loop Parallelism

Like the Concurrencer paper that I've posted a review on this week, this paper also presents a semi-automatic tool in Eclipse for refactoring Java program. It has a similar approach and philosophy as the Concurrencer paper. Just like the previous paper, it thinks the programmer knows that best and should make the ultimate call. It also has the same flow as the last paper which provides analysis on the underlying refactoring, presents a tool to retrofit parallelism into sequential Java programs semi-automatically, and gives empirical data to prove its accuracy and usefulness. In this paper it takes advantage of Java's ParalleArray framework and provides a tool, Relooper, that refactors loops on arrays and vectors acting on each element per iteration into parallel loops.

The tool should be a nice addition to the Eclipse refactoring toolkit. Looking at the results, it provides relatively few false positives except in one case. The authors have manually verified that there are no false negatives. Having no false negatives should be one of the most important characteristic this tool out to have to be really useful. Unfortunately it seems there is not a good way to test this thoroughly except by manual verification of the results after running Relooper on test programs.

The paper also outlines the characteristics of a loop that can be made parallel. It is a good idea to note down these so we can make sure our future programs can easily be parallel. What I gathered from this paper is that a loop should have these three characteristics:
1. A loop traverses all elements of an array.
2. No loop-carried dependencies.
3. Memory update in loop iterations are not conflicting.

Thursday, October 8, 2009

Thoughts on Beautiful Architecture ch. 13 - Software Architecture: Object-Oriented versus Functional

In this chapter, Bertrand Meyer attempts to solidify the sense of "beauty" in beautiful architectures with certain criteria. The author focuses on three qualities: reliability, extendability, and reusability. He then argues that the object-oriented programming style achieves higher marks in all the criteria. In this day of age, when the OO style is much more prevalent than functional programming, I suspect that he is mostly preaching to the choir.

One of the main advantage of functional programming is the avoidance of program state. As the author has successfully argued, having states is useful and sometimes almost necessary and thus even the modern functional languages provide some mechanism for states. On the flip side, the OO style can lessen the complexity that come with having states with programming styles such as the separation of commands and queries, which I believe is a beneficial and common style today.

A couple of major advantages I believe regarding OO programming that the authors have also mentioned are first, by thinking in terms of object, it brings a programmer's mind set to a higher abstraction level and that usually helps when designing the overall architecture. It becomes even more necessary as programs grow in complexity; second, class inheritance greatly improves code reuse. With inheritance one inherits the default behavior from its parent(s) for free. This automatically promotes code reuse. The code reduction is felt even greater with multi-layer inheritance.

Tuesday, October 6, 2009

Thoughts on Refactoring Sequential Java Code for Concurrency via Concurrent Libraries

This paper is regarding semi-automatic refactoring support for Java programs. It takes advantage of the java.util.concurrent package available in Java 5 and the FrontJoinTask framework in Java 7. The authors present a tool, Concurrencer that enables users to transform a sequential program to parallel via converting int to AtomicInteger, HashMap to ConcurrentHashMap, and Recursion to ForkJoinTask.

In the paper, one major sell point for this tool is that it does not need user annotation. However, this tool being semi-automatic means that it still requires manual inspection to identify the places in the code for transformation. Nevertheless, it is useful. As the authors have pointed out, manual refactoring is error-prone; having a tool to do the busy work should be a lot safer provided that it does the work correctly.

One factor that influences whether a programmer chooses to use a refactoring tool is trust. When a programmer uses a tool, he or she loses control. Thus, it is very important that he can comprehend the automatic refactoring being done. Out of the three types of refactoring Concurrencer provides, the first two a programmer should have no problems understand but the last one may not be obvious. It is very likely that a programmer needs to go through the changed code in detail just to make sure it is correct. Thus, although the authors advertises the LOC that the tool saves the programmer from typing, in reality it may not save time. Saying that, it does help if complete automatic unit tests are available to verify or if the programmer reads this paper beforehand to understand the ForkJoin conversion. The trust level will build up gradually over time.

Monday, October 5, 2009

Thoughts on Beautiful Architecture ch. 12 - When the Bazaar Sets Out to Build Cathedrals

In this chpater Till Adam and Mirko Boehm set out to describe the KDE project, focusing on Akonadi and ThreadWeaver. Along the way, they gave out insights on large-scale open source projects as a whole.

The authors made some keen statements on the strengths of open source projects. The mantra of "those who do the work decide" not only empowers the foot soldiers also install a sense of ownership and responsibilities in them. The ideas that people are the most scarce and valuable resource and the only criteria to include code in the software is its quality and if it is being maintained install honor in the authors. These are the same ideas being kicked around in for-profit software companies but are never fully practiced due to various non-technical reasons such as time to market, inter-team politics, leaders's desire to strongarm an decision, etc. Sometimes it is rather surprising what the scope and the quality an open source project can achieve when motivation by money and fame has failed.

Back to KDE and Akonadi in specific, it is driven by needs of the masses. It is the more Bazaar-style project of the two the authors talk about. Its background provides a good example about how an architecture based on some underlying assumptions deteriorated over time due to requirements changes. Since changes do not come instantly but rather gradually, there is always a constant struggle to either retrofit or rebuild. In the space of Akondi framework which deals with personal information management, there has been huge changes in its scope and scale as more applications converge to share and access personal data (a recurring theme in this class - data is king), the community finally come to a consensus to rebuild. To tackle this complex problem, the Layer architecture pattern is used to separate the concerns. Adding a platform API layer between applications and the storage access infrastructure so the applications using the Genome API and those using the KDE API can both share the infrastructure makes it development-platform-independent which greatly increases its usability.

In contrast, the ThreadWeaver project follows a more Cathedral-style path: it stemmed from an individual's vision. The individual has a vision to provide a conveniently available job scheduler for concurrency to make concurrent programming easy (another recurring theme - concurrent framework to shield the burden of concurrency from applications). The ThreadWeaver scheduler takes care of job scheduling and dependency enforcement at the same time allows its user enough control by specifying job dependency, priority, and queue policy. The specific example on the use of inverse ordering and priority to gives more weights on data that is already partially processed over the new data which prevents all data being processed at locked steps is a nice pattern that can be used by most of the Pipeline architecture that needs the similar behavior.

The authors made another key statement at the end: to take advantage of concurrency, it is not enough to provide a tool to move stuff to threads, the difference is scheduling. This is especially true if the move to concurrency is to achieve better performance.

Wednesday, September 30, 2009

Thoughts on Beautiful Architecture ch. 11 - GNU Emacs: Creeping Featurism Is a Strength

Being a long-time Emacs user, I am very impressed with its extendability and mutability. Almost everyone I know who uses Emacs has his or her Emacs set up in an unique way that suits one's own needs and style. The funny thing is that there are actually not too many people who are truly proficient in Lisp; however most people still can pick up some Emacs Lisp excerpts and just by reading the comments on them modify and incorporate the code pieces into their own environment. This shows the ease of extendability. Other systems like Eclipse have a steeper learning curve.

Not only is it easy to extend and customize, the good thing about Emacs is that in incorporating new changes even if some errors are introduced to one's own emacs.el file, the basic Emacs almost always still comes up. Thus one is able to go through the Emacs's Messages buffer and gather information on the errors. This gives people the assurance to try new things. In contrast, this is usually not true for other highly extendable systems.

Although I have been using Emacs for a long time, however I have never read a paper discussing its architecture. I see this chapter is able to explain it in a high level basically in one diagram and a few pages. This shows the core Emacs's simplicity; any bell and whistle is added an extension. This is how an application ought to be to satisfy a broad and diverse user pool. This model also supports piecemeal growth and it actually has an Agile-programming philosophy to it. "Necessity is the father of creation", if someone needs something, he or she will build an extension. The power of Emacs to incorporate new packages makes it easy to leverage on other people's creations.

Tuesday, September 29, 2009

Thoughts on Our Pattern Language

Our Pattern Language (OPL) is a pattern language that focuses on the design of parallel software. It is concerned with software architecture and ways to design and implement parallel algorithms. The required reading for my class is essentially an introduction paper on OPL. Here in the paper it breaks parallel systems into five layers with the first two side-by-side; it then lists out the patterns belonging to each of these five layers: architecture, computation, parallel algorithm strategy, implementation strategy, and concurrent execution.

The paper starts out by listing the four typical types of programmers. I agree with the authors that there is quite a difference between them. From my own experience I could easily discern the differences between application programmers and what I called system programmers (where the paper calls them the platform programmers). For one thing, system programmers need to take into account of hardware capabilities; they know the hardware is not perfect and it needs to work with clever software in fulfilling various requirements such as performance and scalability. On the other hand, most of the application programmers can work with the view of an ideal machine and get by with that. The paper also supports my view as well as it suggests that the application programmers are mostly domain experts who in the context of paper may not know much about parallel programming; this is fine since most of it is abstracted out from them and instead is supported by the framework or platform underneath.

If so, then I am not sure whether the top two layers are necessary for OPL. Granted that whatever implemented in the application domain needs to hook to the parallel framework below to achieve parallelism. However, taking a look at the top-layer patterns they do not seem parallel-system-specific. In fact, most of them are well known patterns that is used in all sorts of systems. In a way, this shows the success of the layers down below in supporting various architecture at top. If one then tries to document all the different architectural and computational patterns possible on a parallel system, won't it make the language top-heavy and out-of-focus? Perhaps it should just concentrate on the lower three layers where there is direct contact with parallelism.

Sunday, September 27, 2009

Thoughts on Beautiful Architecture ch. 10 - The Strength of Metacircular Virtual Machines: Jikes RVM

In this chapter Ian Rogers and Dave Grove described a metacircular runtime environment for Java - Jikes RVM. The authors made a convincing point - a program language should allow enough expression that itself can be written in its own language. Just like the C compiler writers usually are the ones that know C the best, same applies to Java VM implementers. Writing the runtime environment in Java gives the language experts a chance to test out the language and to identify any shortcomings.

One recurring architectural decision throughout the chapter has to do with cost of optimization vs. its potential performance gain such as the use of basic compiler vs. optimizing compiler that is more costly and requires more system resource. Taking a step back, even the runtime analysis itself requires resources, although the authors do point out these requirement are modest; on the other hand, I would image the cost also depends on the amount and frequency of the analysis. This gets me thinking whether one would need to take into account of the performance of the underlying hardware to set the VM behavior. Could different decisions be the best under different hardware setup, such as when the VM runs in a high-performance server vs. a low-speed PC?

Thursday, September 24, 2009

Thoughts on The Adaptive Object-Model Architectural Style

After reading the high-level description of the adaptive object-model architecture I think it is brilliant. Instead of implementing classes that correspond to the different business entities, in contrast this style is based on user defining descriptions of structures and rules within that particular business domain thus making it very flexible to changes in business rules. In the last lecture Professor Johnson was asking the class about what kinds of architecture could last a long time; with its advantages of adaptability and quick turnaround, applications done in this architectural style should last much longer than the traditional object model in our ever-changing business world.

Since this style involves another level of abstraction, I can see it is harder for one to comprehend. It also depends on each person's past experience and the programming language he or she uses. For Smalltalk programmers it should be more familiar due to their exposure to metaclasses. Here I have a suggestion on section 3, Examples of Adaptive Object-Models, it could go add an example of type creation, it is hard to grasp the brilliance of the architecture just by reading the high-level description and looking at the UML diagram.

This architecture hinges on two key patterns: TypeObject and Property. I was unfamiliar with the former and thus I digged up a bit online. The link below points to another paper written by professor Johnson which clearly explains the pattern:

http://www.comlab.ox.ac.uk/people/jeremy.gibbons/dpa/typeobject.pdf

I also saw the term "framework" being used multiple times in this paper. Typically, a framework enables easy creation of multiple programs. Since in this architectural style new types can be created dynamically based on different user rules, thus each run of the application with a different rule set will produce a different set of object types; one could argue that they are different programs. I think that is why the examples in the paper are referred to as frameworks, which means that this architectural style also gives us a template for creating a framework.

Tuesday, September 22, 2009

Thoughts on JPC: An x86 PC Emulator in Pure Java

In this chapter Rhys Newman and Christopher Dennis went through the design of JPC, an x86 PC emulator in Java and gave tips on constructing an efficient large-scale Java program. At the end, the authors gave a recipe for constructing a beautiful architecture.

The authors first presented their case for an emulator instead of a VM. They argued that VMs need to always rely on some degree of hardware support. In contrast, an emulator, especially a Java-based emulator, can run on any architecture. As the x86 architecture becomes even more dominating (perhaps except at the space of small embedded environment such as cell phones), this may not be too advantageous. I think the user's choice between one and the other will come down to performance and security. No doubt that the Java VM, being one of the earliest large-scale VM deployment, has been proven to be reliable and secure. The authors' task now is to make it as efficient as possible.

I agree with the authors that emulating different pieces of hardware require different considerations as there is a gradient of complexity and execution time between the hardware components. It is the right tradeoff to focus on code clarify and modular design for simple and less used components and on the ultimate performance for CPU and memory. After all, according to Amdahl's law, when optimizing a component in an attempt to improve the overall system performance, one can only improve at most up to the time or resource that component takes.

To increase the perforamce of JPC, it is necessary for the authors to have a good understanding of JVM. From my own experience, when working on various hardware platforms, we also usually have the same approach. Typically, during the development time one knows about the initial platform family that the code will be deployed on and thus one writes code to take advantage of the characteristics of that hardware to capture performance gain. However, as hardware progresses and the new generations come out replacing the old ones, they may have different characteristics such as the cache line size and the numbers of registers. The special tricks used to enhance the performance no longer makes sense, sometimes even become harmful. Unfortunately there is simply no time to go back and rewrite the code, sometimes it is even hard to do when the code needs to be capable to run on both hardware. This usually goes unnoticed externally just because the new hardware often gives better performance and thus shadow the software's inefficiency; however still this means that the code is no longer taking advantage of the hardware features. Reading this chapter, I was thinking having a VM or an emulator in the middle may be the solution. As a new hardware comes out, now only the VM or the emulator needs to be optimized for the new hardware, the rest of the code running on top of them can take advantage of the new hardware indirectly.

At the end, the authors gave a game plan for implementing a beautiful architecture. It is interesting that they talk about having an end-to-end prototype at an early stage and at each stage focusing the testing on the whole prototype. This runs orthogonal to the usual division of labor that I've seen where the whole system is broken into modules and each module owner is responsible for perfecting his or her module until a final integration at the end. It seems that we will need to change habits and to integrate early and to integrate often.

Sunday, September 20, 2009

Thoughts on the Big Ball of Mud Pattern

Big Ball of Mud can be thought as anti-pattern. Code in the Big Ball of Mud style lacks a form needed to be classified to a pattern. However, it is so prevalent thus the authors here classify it as the pattern that lacks clear patterns. It is almost impossible to give a precise description but we all know when we see it. One way to tell whether an architecture is a big ball of mud is to attempt to describe it either in a short document or a graph. If it is not possible or if it comes with all sorts of exceptions, then yeah, it is a big ball of mud.

Brian Foote and Joseph Yoder also listed out the various forces that cause the Big Ball of Mud architecture. Being at a big company where one does not need to worry about survival all the time, my personal experience was that most of the projects actually started out with a good design. However, some of them eventually turned to a big ball of mud. Besides the forces such as changes in requirements that the authors had already described, another force that I have noticed was Confidence, or the lack of.

Extreme Programming advocates thorough and automatic unit testing as the enabler of continuous refactoring. However, especially in a large project development, this is not so simple. It is hard to have the absolute confidence that one's own UT suite can catch everything that can go wrong with a code change, let alone the UT suite created by someone else. When proposing changes in common code that are shared by multiple projects, it is understandable that the other teams would have concerns about the new changes especially when they don't need them. Therefore there is always a strong put to leave the original code alone and instead to duplicate it for the new use. Gradually the code base becomes bloated and the architecture starts to lose its form. Once a precedence is set, more exceptions will follow. Inevitably the architecture eventually becomes a ball of mud.

Another source of architectural form degradation is during performance tuning. As the phrase says, "make it work, make it right, make it fast", performance tuning often occurs at the end of development when one starts to realize that the system performance will not fulfill the requirement (typically at the end of the development is the time when one can actually measure the performance). Unfortunately, many times the architecture suffers such as the module boundaries are violated for better performance. The end result is that the architecture the developers have so meticulously tried to maintain now is deformed; eventually it becomes a ball of mud.

Thoughts on Beautiful Architecture ch. 8 - Guardian: A Fault-Tolerant OS Environment

In this chapter Greg Lehey described the architecture of Guardian - the OS for Tandem's "NonStop" series of computer and attempted to answer the question on why such a revolutionary architecture had so little impact.

The first thing that impressed me was its total hardware redundancy at its time. I have also worked on a couple of high availability systems with hardware redundancy myself in this decade. In the systems that I've worked on, in contrast the design is to have one single set of components on one board with high speed links between boards. This way the user can decide whether to pay the additional cost for redundancy. Thus, it actually makes sense to have a total duplication of hardware to prevent the cost associated with a second design. As for Tandem's case, its hardware architecture actually incorporates redundancy on the same board which may be less flexible. It is interesting that they did not take into account of the difference between the mean time between failure (MTBF) of the different components. For low-risk components such as IPB, one might avoid having a standby to cut down on the cost; especially from reading the chapter it seems the additional cost was one of the disadvantages when competing with its competitors.

Since many devices can only be directly accessible by only a couple of CPUs, the Tandem machine relies heavily on the messages exchanges between CPUs. The Tandem engineers later cleverly extended the message-passing mechanism beyond one box to support EXPAND and FOX. I am curious about the time difference between local IO access, in-box IO access, and remote IO access and I think the difference most likely is significant. In this case, it may be necessary for the OS to allocate the resources with locality in mind. I am not sure whether this is done in Guardian.

The author reserves his explanation on Tandem machine's lack of impact at the end. Since this was way before my time, I really don't know how open that companies at the seventies on documenting their architecture. I can't help to speculate perhaps since this is a proprietary machine, disclosing the architecture is therefore discouraged? Perhaps that is why it did not make a bigger splash?

Wednesday, September 16, 2009

Thoughts on the Layers pattern

The author describes the Layers pattern as the architectural pattern that can help in a situation when a complex application can be decomposed to subtasks at different levels of abstraction. It is a fine idea however the choice of using the network stack as its example makes it confusing.

For network stack, it makes sense only when there are two entities in communication. One major theme here is to provide an illusion that each layer of the stack is directly communicating with its corresponding layer of the other entity. What's more, each layer has its own data. What the upper layer sends down to the lower layer is usually considered opaque by the lower layer. It simply encapsulates the data as the payload with its header consisting the information this layer cares about. In contrast, judging from the other parts of the chapter, such behaviors are not necessary true nor is important in deploying the Layers pattern. It seems to me that for network stack, the upper layer exchanges data through the lower layers which simply serves as a delivery service; in contrast here in the Layers pattern, the upper layer uses the lower layer to transform its data.

Thus I believe what's more essential here is to have the ability to recursively break a complex task to simpler subtasks and to enforce clearly defined APIs between a task and its subtasks for isolation and exchangeability. If we define the pattern this ways, then programs containing clear boundaries between modules, having a consistent abstraction level throughout the body of their functions, plus always maneuvering the abstraction layers sequentially can be considered as practicing this architectural pattern.

Tuesday, September 15, 2009

Thoughts on Beautiful Architecture ch. 7 - Xen and the Beauty of Virtualization

I am now at the half way of this book. Since each chapter of the book is written by different authors, one can definitely feel the different styles the various authors used in describing the architecture of their systems. Even the depth can vary considerably from chapter to chapter. For example, this chapter addresses the architecture at a higher level than the previous Facebook chapter. Just like a couple of the previous chapters are written by authors favoring agile programming methodologies, this chapter heavily promotes the open source concept.

Derek Murray and Keir Fraser first gave the background on the Xen project. It is interesting in their description of the Xenoservers that they mentioned that there is mutual distrust between the customer and the provider. It seems to me that the use of Xen only solves this problem in one direction. I can see that Xen provides the provider (assuming to be the one running the hypervisor) protection from the customer but not the other way around. Professor Sam King et al. wrote a paper on malicious virtual-machine monitors (SubVirt: Implementing malware with virtual machines: https://agora.cs.illinois.edu/download/attachments/19924902/04_king06.pdf?version=1&modificationDate=1200454246000) which shows that it is very difficult for upper-layer software to detect malicious lower-layer software. In this case, the customer is running a guest OS on top, it will be difficult for him/her to detect a malicious hypervisor installed by the producer.

The idea of paravirtualization represents the tradeoff between performance and ease of use. Since the authors has listed performance as the first goal, such tradeoff makes sense. It is great that the hardware vendors have then caught on the virtualization trend and provided the right hardware support that software can leverage on to avoid such decision.

Sunday, September 13, 2009

Thoughts on Pipes and Filters pattern

Any UNIX user is no stranger to the Pipes and Filters pattern. In fact, I consider it a signature of UNIX for its ease in combing various utilities (filters) to create a final output by piping the output of one utility as an input of another. Since this pattern enables the intermediate filters to execute in parallel, as the prevalence of multi-core systems increase, I can see it gaining even higher popularity.

The author lists direct function calling as the most simplistic but yet flexibility-limiting piping scheme. One variation I have seen to preserve flexibility, especially for passive filters, is to have a meta structure listing the ordering of the filters. When one filter produces output, it simply relies on the main routine or a controller to deliver it to the next filter based on the meta structure. To support a different pipeline configuration, one only needs to edit the meta structure.

Another variation that I've seen with uniform-data-type pipelines is instead of setting up various pipelines based on the current configuration, there is a super pipeline consisting of all filters. Then, based on the current configuration, a filter can be turned ON or OFF. When it is OFF, it simply pass along the result to the next filter.

Thoughts on Beautiful Architecture ch. 6 - Data Grows Up: The Architecture of the Facebook Platform

Dave Fetterman did a fantastic job explaining the architecture of Facebook. Even for a Faceboot-ignorant person like me, after reading through the chapter I have now a fairly good understanding of its architecture. The diagrams throughout the chapter are very useful, they really help the readers to picture the overall design.

A couple of years ago I got a chance to sit alongside the CTO of a networking gear at a banquet. One thing I asked was regarding Content vs Delivery. He firmly told me that money was definitely at Disney not Comcast. The content is what the end users care for and thus the delivery model has to be content-centric. The equivalent comparison at the web is Data vs. Website Technology; thus the model needs to be data-centric.

The genius of Web 2.0 is to have it users voluntarily contribute data that are interesting to the groups. The company no longer needs to worry about the content, instead it can focus on the delivery. On top of that, Facebook is clever to eliminate the need of each individual sites to create its own social network; instead they get to rely on the networks at Facebook and even present their own content at the site. In return, Facebook becomes the hub of these web destinations.

A large portion of the chapter focuses on FQL and FBML. It is a good architectural decision to leverage the proven technologies instead of creating new ones from scratch. I originally had some concerns on the inappropriate intimacy caused by FQL - the needs of the other websites o know the Facebook-internal table formats, however these can just be logical views and thus do not need to reflect the actual implementation. So it should turn out to be OK.

As I said, I am not yet a Facebook-user and so I do not know the actual implementation and interface. However, I do get from the chapter that security is definitely an important consideration. Even though, I still have doubts on exposing data to the third-party. For example, it seems that an user can agree to export the email addresses of his or her friends to a third-party website. Do the friends have a say on this? In addition, what prevent the third-party from caching this info and either use it or sell it to others for other purposes? What's more, how can an user ensure that the third-party will remove the data once he revokes the permission on data export?

Wednesday, September 9, 2009

Thoughts on Beautiful Architecture ch. 5 - Resource-Oriented Architectures

This chapter differs from the chapters before in the sense that the author's major focus is on describing an architectural style instead of a particular project. To a person with very limited experience in web applications, I feel the brilliance of REST is in addressing data like documents. It has an Unix resonance to it (where data is addressed like files). Just as in Unix one can do "cat /proc/1234" to retrieve information on a particular process, here one can retrieve http://server/employee/1234" to get information on an employee. Through just a few HTTP operations, the syntax supports the various operations on the data effectively. What's even better here is that the HTTP protocol supports providing information on the physical representation of the returned data and thus a client can present it correctly.

The author also envisions that a server can determine the amount of data to return based on the role or permission of a client. This gets me thinking, can a client specify the amount of data it wants? This surely would save network resource. Instead of a client receiving the whole data and then has to filter out the interesting portions, the server can just skipping them. On the other hand, to support such functionality it may cause unnecessary intimacy between the client and the server (client now needs to know the "views" of data the server can provide).

Tuesday, September 8, 2009

Thoughts on ArchJava: Connecting Software Architecture to Implementation

Although the title of the paper was about connecting architecture with implementation, the authors cleverly narrowed down the scope to a manageable subarea: communication integrity. I believe this is the right approach for the first phase of a grand topic as interface between components are certainly important and they are more tangible and thus much easier to enforce automatically than other aspects as design guidelines.

Two problems immediately come into mind, since the approach ArchJava requires annotating the source code, could it be done on the existing or the leveraged code? The authors's experience suggested that this answer is yes, although it did require quite a bit of effort; on the positively side this provided a refactoring opportunity. Second, could it be easily adapted to other languages? For starters, a separate effort is needed to produce a compiler for that each language to recognize the new annotations. I'm also not sure whether it can map well to non-OO languages such as C.

Thursday, September 3, 2009

Thoughts on Beautiful Architecture ch. 3 - Architecting for Scale

The author pointed out a major trend in the processor technology that really becomes significant in this decade - the trend of going multicore. This trend can now be felt in pretty much any software niche. Personally, being in the embedded system space, my company has resisted the trend for quite a while until we could no longer pretend the other cores do not exist since no one is planning to make a faster uni-core chip!

Another point I got from this chapter is the human factor in the architecture selection. When choosing an architecture, should the architect consider the skill sets of his or her team? In this chapter, the architect is facing such dilemma. A highly parallel system would be the right way to go however the team lacks the expertise in parallel programming. Fortunately, the architect was able to cleverly design an architecture that mostly shields the parallelism from the developers. This makes a great compromise.

This hardware trend to go multicore will eventually push for more software systems to become parallel, however many software developers are still not comfortable with it. I would image such architecture that hides parallelism from the developers would gain popularity.

Saturday, August 29, 2009

Thoughts on Beautiful Architecture ch. 2 - A Tale of Two Systems

In this chapter, the author tells the story of two comparable projects but with very different outcomes: one failed miserably requiring a rewrite while the other one successful even after many revisions. To me, the most important factor is to have everyone having the same big picture.

This is much easier said than done. From personal experience, even a relatively simple diagram can be interpreted differently by different people, even after numerous meetings and even when everyone agrees! It is very dangerous for an architect to have a false sense of group agreement.

A few words on the technical debt. My experience is that technical debt is hardly ever repaid especially in a large-scale software development. Typically in a large development the responsibilities of each team becomes very specific: with each team responsible for a module or a single aspect of the overall product with separate testing teams, regression teams, system testing teams, etc. Suddenly repaying the technical debt requires buy-in from multiple teams. One would not want the testing manager griping about rerunning the same tests and the schedule impact. It is equally difficult to justify to the upper management the additional time to work on something that's already "working".

Thoughts on Beautiful Architecture ch. 1 - What is Architecture?

This initial chapter focus on the overview of software architecture. It is interesting that the authors state that "architecture is a subset of design" especially since I have come across the question on architecture vs. design multiple time and still don't have a clear-cut answer to it. The statement certainly make sense intuitively as I often imagine software architecture as the high-level design focusing on inter-component structures with the goal to fulfill all user requirements.

The chapter also asks the question on who to make architectural decisions. This strikes me as the typical question on top-down vs. bottom-up. The book seems to suggest an architect should do it to preserve conceptual integrity. No doubt the end product of the top-down approach will be more consistent to the original blueprint. However, does it mean that it must be a better product? I think the Agile programming methodology in contrast can be seen as a more bottom-up approach and many would argue it is much more flexible in meeting ever-changing requirements. Also, taking an example in biology, all complex biological beings are a congregation of individual cells which at one point of the time are individual uni-cell organisms that then evolve to corporate and to co-exist together. There is not a master plan and the only driving force is to constantly adapt to the new requirements (to survive in constantly changing environment). Yet don't all come out pretty well?

From the short story at the end of the chapter, the authors seem to say that an architect should still code, to get his or her hand dirty. I agree with the authors and I think this is the best prevention for an architect becoming idealist and elitist.

Wednesday, August 26, 2009

Hello World

Welcome to my blog. I am a MCS student at Univ. of Illinois at Urbana-Champaign. This semseter I am taking CS527 - Adv. Topics in Software Engineering and I'll be blogging mainly about various readings and related topics in Software Engineering. Please stay tuned.