## Dining philosophers in C++11: Chandy-Misra algorithm

In my previous post, Dining Philosophers in C++11, I provided an implementation of the dining philosophers problem using modern C++ features, such as threads and mutexes. However, it was noted in the comments that the implementation did not prevent the philosophers from starving to death when you remove the waiting times.

An algorithm that prevents the philosophers from starving was proposed by Mani Chandy and J. Misra and is known as the Chandy/Misra solution. It is a bit different from the original problem because it requires the philosophers to communicate with each other. The algorithm, as described on Wikipedia, is the following:

1. For every pair of philosophers contending for a resource, create a fork and give it to the philosopher with the lower ID (n for agent Pn). Each fork can either be dirty or clean. Initially, all forks are dirty.
2. When a philosopher wants to use a set of resources (i.e. eat), said philosopher must obtain the forks from their contending neighbors. For all such forks the philosopher does not have, they send a request message.
3. When a philosopher with a fork receives a request message, they keep the fork if it is clean, but give it up when it is dirty. If the philosopher sends the fork over, they clean the fork before doing so.
4. After a philosopher is done eating, all their forks become dirty. If another philosopher had previously requested one of the forks, the philosopher that has just finished eating cleans the fork and sends it.

In order to implement this, we must make several changes to the solution proposed in the previous post:

• forks and philosophers must have identifiers
• there is an initial setup of both forks and philosophers
• use std::condition_variable to communicate between threads
• increase the number of philosophers

Because it has also been argued that std::string_view is only available in C++17, while this implementation is supposed to work with C++11, I have replaced it with std::string const&.

In this implementation, philosophers, i.e. threads, need to communicate with each other to request the forks, i.e. resources. For this, we will use a std::condition_variable, which is a synchronization primitive that enables the blocking of one or more threads until another thread notifies it. A std::condition_variable requires a std::mutex to protect access to a shared variable. The following class, sync_channel, contains both a condition variable and a mutex and provides two methods: one that waits on the condition variable, blocking the calling thread(s), and one that notifies the condition variable, unblocking all the threads that are waiting for a signal.

The table class from the previous implementation is modified: the forks are no longer defined here, but a sync_channel is used to prevent the philosophers from starting to dine until the table setup is completed. Its name has been changed to table_setup.

The fork class is no longer a wrapper for a mutex. It has an identifier, an owner, a flag to indicate whether it is dirty or clean, a mutex, and a sync_channel that enables owners to request used forks. It has two methods:

• request() enables a philosopher to request the fork. If the fork is dirty, it is cleaned and its ownership is given to the philosopher that asked for it. If the fork is clean (i.e. the current owner is eating), then the philosopher that asked for it will block, waiting for it to become dirty (i.e. for the current owner to finish eating).

• done_using() lets a philosopher indicate that they have finished eating; it notifies another philosopher that is waiting for the fork that they can have it.

There are fewer changes to the philosopher class: it has an identifier, and there are no more waiting times to simulate eating and thinking. There are some small changes to the following methods:

• dine(): each philosopher only starts eating after the entire table has been set up. A condition variable from the table_setup object is used for this.

• eat(): each philosopher first requests the left and right forks. When they are available, they are locked using std::lock() to avoid possible deadlocks, and then their ownership is transferred to std::lock_guard objects, so they are properly released when done. After eating, each fork is set as dirty and other philosophers waiting for it are notified of this.

According to the initial setup, each fork is given to the philosopher with the lower ID. That means fork 1, placed between philosophers 1 and 2, goes to philosopher 1. Fork 2, placed between philosophers 2 and 3, is given to philosopher 2. Eventually, fork N, placed between philosophers N and 1, is given to philosopher 1. Overall, this means all philosophers initially have one fork, except for the first one, who has two, and the last philosopher, who has none.

Putting it all together, the code looks like this:

The output of the program looks like this:


## Dining Philosophers in C++11

UPDATE: for an implementation of the Chandy/Misra solution see Dining philosophers in C++11: Chandy-Misra algorithm

The problem of the dining philosophers, first proposed by Edsger Dijkstra and reformulated by Tony Hoare, is a famous concurrent programming problem that illustrates the challenges of synchronizing access to shared data. The description of the problem, taken from Wikipedia, is the following:

Five silent philosophers sit at a round table with bowls of spaghetti. Forks are placed between each pair of adjacent philosophers.

Each philosopher must alternately think and eat. However, a philosopher can only eat spaghetti when they have both left and right forks. Each fork can be held by only one philosopher and so a philosopher can use the fork only if it is not being used by another philosopher. After an individual philosopher finishes eating, they need to put down both forks so that the forks become available to others. A philosopher can take the fork on their right or the one on their left as they become available, but cannot start eating before getting both forks.

Eating is not limited by the remaining amounts of spaghetti or stomach space; an infinite supply and an infinite demand are assumed.

The idea is to find a solution so that none of the philosophers starves, i.e. no philosopher is ever permanently denied the chance to acquire the forks necessary to eat.

Below I propose a simple implementation to this problem using C++11 language and library features. The following classes are defined:

• fork represents a fork at the table; the only member of this structure is a std::mutex that will be locked when the philosopher picks up the fork and unlocked when he puts it down.

• table represents the round table where the philosophers are dining. It has an array of forks, but also an atomic boolean that indicates that the table is ready for the philosophers to start thinking and eating.

• philosopher represents a philosopher dining at the table. It has a name and a reference to the forks on his left and right.

Most of the implementation of the solution is part of the philosopher class. When an object of this class is instantiated, a thread is started. This thread is joined when the object is destroyed. The thread runs a loop of thinking and eating until the dinner is signaled to end by setting the ready member of the table to false. There are three main methods in the philosopher class:

• dine() is the thread function; this is implemented as a simple loop of thinking and eating.

• think() is the method that represents the thinking period. To model this the thread sleeps for a random period of time.

• eat() is the method that models the eating. The left and right forks are acquired in a deadlock-free manner using std::lock. After the forks, i.e. mutexes, are acquired, their ownership is transferred to std::lock_guard objects, so that the mutexes are correctly released when the function returns. Eating is simulated with a sleep.

To see how this works, we create a table object and an array of philosophers. Upon creating the philosopher objects, their own worker threads are started, but nothing happens until the table is signaled as being ready. The philosophers then compete for the forks (i.e. mutexes), eat and think until the dinner is signaled as finished by setting the ready flag of the table object back to false.

The whole implementation is shown below:

The output for this program (that varies with each execution) has the following form:

Though the problem is usually described in terms of five philosophers, any number of philosophers can be present at the table (of course, at least two are necessary for the problem to make sense). Adding more philosophers does not require any changes to the implementation.


## Beware of parallelization

A recent question on Stack Overflow raised the problem of a fast algorithm to count the unique colors in an image (Faster algorithm to check the colors in a image). My answer was that this kind of problem is well suited for parallelization. However, parallelization only helps when used judiciously.

To demonstrate the point, I have written a pretty simple implementation in VC++ (with C++11) that uses a std::set to store the unique colors. The size of the set is the number of colors. The implementation simply iterates over the pixels of the image and adds them to the set.

There are probably better solutions than the one presented in this article. The code shown here merely serves to show how parallelization can help, though not everything that runs in parallel is faster than sequential code.

There are some things to note:

• I used CImage class to load an image from disk
• images can have 1, 4, 8, 16, 24 or 32 bits per pixel, but in the code presented here, for simplicity, I assume the image loaded from disk has 24 bits per pixel
• Windows device-independent bitmaps can be stored in memory in two ways: top-down (as you would naturally expect) or bottom-up. The processing of the image varies based on the storage. You can find details here: Top-Down vs. Bottom-Up DIBs.
• the CImage::GetPitch method returns the pitch of the bitmap, which is the distance in bytes between the memory addresses of the beginnings of two consecutive lines of the bitmap. This is important because the bitmap can be stored (aligned) with padding bytes at the end of a line, and the pitch and the width are needed to determine the number of padding bytes. If the pitch is positive, the bitmap is stored top-down; if the pitch is negative, the image is stored bottom-up.
• the pointer returned by CImage::GetBits points to the beginning of the buffer if the image is top-down, and to the end of the buffer if the image is bottom-up.
• since the image data is stored in a contiguous array, the position of each pixel is given by the following algorithm:

Since the pitch is positive when the bitmap is stored top-down and negative when the image is stored bottom-up, the above algorithm can be simplified to:

My simple implementation of the sequential algorithm looks like this:

Loading the image from disk, and timing the execution looks like this:

The result on my machine (Intel Core i7 2.67GHz, 6 GB RAM) on four random bitmaps with sizes 200×150, 1680×1050, 3360×1065 and 3360×2100 look like this (obviously with small variations):

The simplest parallelization you can think of is using parallel_for from the Parallel Patterns Library, especially because the conversion from a sequential implementation to a parallel one is pretty trivial.

When you run this code you get some pretty surprising results.

It’s not that surprising after all, if you look at how the insertion is performed. Access to the std::set is guarded with a std::mutex, allowing only one thread at a time to insert a new element into the set. Because of this synchronization, it takes much longer, on the order of 50 to 100 times, than the sequential implementation. Of course, the bigger the image, the smaller this performance loss.

Obviously, parallelization by means of parallel_for is not a solution in this case. Using threads directly, however, can help, if we can get rid of the syncing. This is possible by using a separate std::set for each thread and merging the results at the end.

A few considerations:

• the std::thread constructor did not (yet, in the compiler I used) accept an arbitrary number of arguments, so I had to pack the arguments into a structure
• the std::set with the colors is passed by reference, and therefore it has to be wrapped in a std::ref
• I create N threads, call the thread function with the arguments and then join them all
• when all threads have finished I aggregate the individual sets in a single one

Running this code with 4 threads yields results that look like this:

As you can see the performance is better than the sequential code in each case (and implicitly far better than the parallel_for version).

As I said at the beginning, one can imagine better algorithms (both sequential and parallel), but the straightforward conclusion is that you have to beware of parallelization, as not every parallel version of an algorithm runs faster than the sequential version. The more syncing for shared access takes place, the more performance is affected, and the results can get much worse than the sequential version.
