Anhoe's Posts: Thoughts on Armstrong thesis ch. 2

This is my first blog post on Joe Armstrong's thesis, "Making reliable distributed systems in the presence of software errors", and it focuses on chapter 2. Chapter 1 goes through some history of his projects and provides a breakdown of the rest of the thesis, so chapter 2 is actually the first chapter that touches the core of the thesis.

Joe Armstrong starts by defining his proposed architecture for building a fault-tolerant telecom system. He does this by giving a set of descriptions that characterize the architecture. Of the six descriptions he provides, the fifth, a way of describing things, seems to be a prelude to the actual description of an architecture. The first two, a problem domain and a philosophy, are to me the core of understanding any architecture, so I would place them above the rest. As for what else could be added to characterize an architecture better: these descriptions do not directly describe the interfaces between modules, but in this case that is probably OK, since it is clear that all communication is through message passing. Another description that I would expect from an architecture dealing with data flows is an account of how a unit of data flows through the system. This is sometimes called "a day in the life of a data packet" or something similar, and it gives a lot of insight into a system.

The author then drills down into the various descriptions of the architecture. For the problem domain, he makes it clear that reliability and concurrency are critical. The system requirements also include soft real-time support and the ability to be distributed. One of his ways of achieving high reliability is to isolate faults locally. To make this possible, he argues that processes should not share data. Instead, all communication is through messages. This discipline also prevents hidden dependencies between processes, many of which the software maintainers would not even be aware of. I also work on a system that uses message passing as the primary means of inter-process communication. Not only does it facilitate high reliability, it also makes in-service software upgrades easy. Very often, when upgrading from one version of software to another, the data structures change as well. If those structures are shared between processes, it is hard to upgrade an individual process on its own. In contrast, with message passing it is easy for a new version of the software to add translation in its send or receive routines to deal with different versions of the messages. Since each process knows which message versions it supports, it is also possible to add a sanity check that ensures two processes can communicate, that is, that they share at least one common version of a message.
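To make the versioning idea concrete, here is a minimal sketch in Python. It is not taken from the thesis or from the system I work on. The message layout (a `name` field in version 1, plus a `priority` field in version 2) and the function names `negotiate`, `encode`, and `decode` are all hypothetical, chosen only for illustration.

```python
# Hypothetical message layouts, assumed for illustration only:
#   version 1: {"version": 1, "name": ...}
#   version 2: {"version": 2, "name": ..., "priority": ...}
OLD_PROCESS_VERSIONS = {1}
NEW_PROCESS_VERSIONS = {1, 2}


def negotiate(mine, theirs):
    """Sanity check: return the highest message version both sides support."""
    common = set(mine) & set(theirs)
    if not common:
        raise ValueError("no common message version; processes cannot communicate")
    return max(common)


def encode(msg, version):
    """Send-side translation: convert the internal form to the agreed wire version."""
    if version == 1:
        # Version 1 has no priority field, so it is dropped on the wire.
        return {"version": 1, "name": msg["name"]}
    return {"version": 2, "name": msg["name"], "priority": msg["priority"]}


def decode(wire):
    """Receive-side translation: convert any supported wire version to the internal form."""
    if wire["version"] == 1:
        # Old senders know nothing about priority; fill in a default.
        return {"name": wire["name"], "priority": 0}
    return {"name": wire["name"], "priority": wire["priority"]}
```

A new process talking to an old one would call `negotiate(NEW_PROCESS_VERSIONS, OLD_PROCESS_VERSIONS)`, get version 1 back, and encode its messages accordingly. Neither side needs to know the other's internal data structures, which is what lets each process be upgraded on its own.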

The author then introduces the concept of concurrency-oriented programming. He also created Erlang, a concurrency-oriented programming language (COPL). He believes that programming in such a language makes it easy to model the real world, which is itself concurrent. Erlang handles process and concurrency management in the language itself. This is new to me, as I am used to letting the OS handle them. As with anything OS-agnostic, this makes an application written in Erlang more portable and gives it more consistent behavior across different operating systems. However, on the flip side, it also means that the language and its libraries need to provide more support.
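As a rough analogy for concurrency managed by the language runtime rather than the OS, here is a small Python sketch using `asyncio`, where tasks are scheduled by the runtime instead of being OS processes or threads. This is only an approximation: it is not Erlang, and `asyncio` tasks are cooperative and share one interpreter, so they do not give the isolation the thesis asks for. The `counter` process and its message protocol (numbers to add, `"report"`, `"stop"`) are made up for this example.

```python
import asyncio


async def counter(mailbox, replies):
    """A 'process' that owns its state and changes it only in response to messages."""
    total = 0  # private state; no other task can touch it directly
    while True:
        msg = await mailbox.get()
        if msg == "stop":
            break
        if msg == "report":
            await replies.put(total)
        else:
            total += msg  # any other message is assumed to be a number to add


async def main():
    mailbox = asyncio.Queue()
    replies = asyncio.Queue()
    task = asyncio.create_task(counter(mailbox, replies))
    # All interaction with the counter goes through its mailbox.
    for msg in (1, 2, 3, "report", "stop"):
        await mailbox.put(msg)
    result = await replies.get()
    await task  # wait for the counter to finish after "stop"
    return result


def run_demo():
    """Run the example on a fresh event loop and return the reported total."""
    return asyncio.run(main())
```

The point of the sketch is the shape of the program: the counter's state is reachable only through messages, and the scheduling is done by the runtime, so the same code behaves the same way regardless of the host OS.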

Tuesday, October 20, 2009

Thoughts on Armstrong thesis ch. 2 - The Architectural Model
