In my previous post, Dining Philosophers in C++11, I have provided an implementation for the dining philosophers problem using modern C++ features, such as threads and mutexes. However, it was noted in the comments that the implementation did not prevent the philosophers starving to death when you remove the waiting times.

An algorithm that prevents the philosophers from starving was proposed by Mani Chandy and J. Misra and is known as the Chandy/Misra solution. This is a bit different than the original problem because it requires the philosophers to communicate with each other. The algorithm, as described on Wikipedia, is the following:

  1. For every pair of philosophers contending for a resource, create a fork and give it to the philosopher with the lower ID (n for agent Pn). Each fork can either be dirty or clean. Initially, all forks are dirty.
  2. When a philosopher wants to use a set of resources (i.e. eat), said philosopher must obtain the forks from their contending neighbors. For all such forks the philosopher does not have, they send a request message.
  3. When a philosopher with a fork receives a request message, they keep the fork if it is clean, but give it up when it is dirty. If the philosopher sends the fork over, they clean the fork before doing so.
  4. After a philosopher is done eating, all their forks become dirty. If another philosopher had previously requested one of the forks, the philosopher that has just finished eating cleans the fork and sends it.

In order to implement this, we must make several changes to the solution proposed in the previous post:

  • forks and philosophers must have identifiers
  • there is an initial setup of both forks and philosophers
  • use std::condition_variable to communicate between threads
  • increase the number of philosophers

Because it has been also argued that string_view is only available in C++17 and this implementation is supposed to work in C++11, I have replaced that with std::string const&.

In this implementation, philosophers, i.e. threads, need to communicate with each other to request the forks, i.e. resources. For this, we will use a std::condition_variable, which is a synchronization primitive that enables the blocking of one or more threads until another thread notifies it. A std::condition_variable requires a std::mutex to protect access to a shared variable. The following class, sync_channel, contains both a condition variable and a mutex and provides two methods: one that waits on the condition variable, blocking the calling thread(s), and one that notifies the condition variable, unblocking all the threads that are waiting for a signal.

The table class from the previous implementation is modified: the forks are no longer defined here, but a sync_channel is used to prevent philosophers start dining until the table setup is completed. Its name has been changed to table_setup.

The fork class is no longer a wrapper for a mutex. It has an identifier, an owner, a flag to indicate whether it is dirty or clean, a mutex, and a sync_channel that enables owners to request used forks. It has two methods:

  • request() that enables a philosopher to request the fork. If the fork is dirty, it is set to clean, and the ownership is given to the philosopher that asked for it. If the fork is clean (i.e. the current owner is eating), than the philosopher that asked for it will block, waiting for it to become dirty (i.e. the current owner has finished eating).

  • done_using() a philosopher indicates that has finished eating and notifies other philosopher that is waiting for the fork that it can have it.

There are less changes to the philosopher class: it has an identifier, and there are no more waiting times to simulate eating and thinking. There are some small changes to the following methods:

  • dine(): each philosopher only starts eating after the entire table has been setup. A condition variable, from the table_setup object is used for this.

  • eat(): each philosopher first requests the left and right fork. When they are available, they are locked using std::lock() to avoid possible deadlocks, and then their ownership is transfered to a std::lock_guard object, so they are properly released when done. After eating, the fork is set as dirty and other philosophers waiting for it are notified of this.

According to the initial setup, each fork is given to the philosopher with the lower ID. That means fokm 1, placed between philosopher 1 and N, goes to philosopher 1. Fork 2, placed between philosophers 2 and 3 is given to philosopher 2. Eventually, fork N, placed between philosophers N and 1, is given to philosopher 1. Overall, this means all philosophers have initially 1 fork, except for the first one that has two, and the last philosopher, that has none.

Put all together, the code looks like this:

The output of the program looks like this:

, , , , , , Hits for this post: 1909 .

Dining Philosophers in C++11

UPDATE: for an implementation of the Chandy/Misra solution see Dining philosophers in C++11: Chandy-Misra algorithm

The problem of the dining philosophers, first proposed by Edsger Dijkstra and reformulated by Tony Hoare, is a famous problem for concurrent programming that illustrates problems with synchronizing access to data. The description of the problem, taken from Wikipedia, is the following:

Five silent philosophers sit at a round table with bowls of spaghetti. Forks are placed between each pair of adjacent philosophers.

Each philosopher must alternately think and eat. However, a philosopher can only eat spaghetti when they have both left and right forks. Each fork can be held by only one philosopher and so a philosopher can use the fork only if it is not being used by another philosopher. After an individual philosopher finishes eating, they need to put down both forks so that the forks become available to others. A philosopher can take the fork on their right or the one on their left as they become available, but cannot start eating before getting both forks.

Eating is not limited by the remaining amounts of spaghetti or stomach space; an infinite supply and an infinite demand are assumed.

The idea is to find a solution so that none of the philosophers would starve, i.e. never have the chance to acquire the forks necessary for him to eat.

Below I propose a simple implementation to this problem using C++11 language and library features. The following classes are defined:

  • fork represents a fork at the table; the only member of this structure is a std::mutex that will be locked when the philosopher picks up the fork and unlocked when he puts it down.

  • table represents the round table where the philosophers are dining. It has an array of forks, but also an atomic boolean that indicates that the table is ready for the philosophers to start thinking and eating.

  • philosopher represents a philosopher dining at the table. It has a name and a reference to the forks on his left and right.

Most of the implementation of the solution is part of the philosopher class. When an object of this class is instantiated, a thread is started. This thread is joined when the object is destroyed. The thread runs a loop of thinking and eating until the dinner is signaled to end by setting the ready member of the table to false. There are three main methods in the philosopher class:

  • dine() is the thread function; this is implemented as a simple loop of thinking and eating.

  • think() is the method that represents the thinking period. To model this the thread sleeps for a random period of time.

  • eat() is the method that models the eating. The left and right forks are acquired in a deadlock free manner using std::lock. After the forks, i.e. mutexes, are acquired, their ownership is transfered to a std::lock_guard object, so that the mutexes are correctly released when the function returns. Eating is simulated with a sleep.

To see how this works, we create a table object and an array of phylosophers. Upon creating the philosopher objects their own working thread is started, but nothing happens until the table is signaled as being ready. Philosophers then compete for the forks (i.e. mutexes), eat and think until the dinner is signaled as finished by setting the ready flag of the table object back to false.

The whole implementation is show below:

The output for this program (that varies with each execution) has the following form:

Though the problem is usually described in terms of five philosophers, any number of philoposhers can be present at the table (of course, at least two are necessary for the problem to make sense). Adding more philosophers does not require any changes in the implementation.

, , , , , , Hits for this post: 1905 .

C++11 has provided support for range-based for loops. They allow iterating over the elements of a range without using an index.

However, if you try the following MFC code you get some errors because the compiler is looking for a begin() and end() function that provides access to the first and last element of the range:

1>error C3312: no callable ‘begin’ function found for type ‘CStringArray’
1>error C3312: no callable ‘end’ function found for type ‘CStringArray’

MFC does not define such functions for its containers.

Enter MFC Collection Utilities

Tom Kirby-Green and I have developed a small open-source library that enables the use of all MFC collection types in range-based for loops. The library is called MFC Collection Utilities and is available on codeplex.

The library consists of a single header, called mfciterators.h, that you include in your MFC projects.

Compiler and collections support

The library works in Visual Studio 2012 (the first version of the C++ compiler that supports range-based for loops) or a newer version.

The library enables all the MFC collections, both template and non-template, to be used in range-based for loops. This means arrays, lists and maps. For maps you get access to the content through a key-value pair that has two fields: key and value.

Supported template collections

Arrays Lists Maps
CArray CList CMap
CTypedPtrArray CTypedPtrList CTypedPtrMap

Supported non-template collections

Arrays Lists Maps
CObArray CObList CMapPtrToWord
CByteArray CPtrList CMapPtrToPtr
CDWordArray CStringList CMapStringToOb
CPtrArray CMapStringToPtr
CStringArray CMapStringToString
CWordArray CMapWordToOb
CUIntArray CMapWordToPtr



Version 1.0 can be downloaded from codeplex from here.

For simpler installation you can use the available nuget package.


Let us know if you encounter any issues.

, , , , , , , Hits for this post: 25178 .

Many years ago I published on my blog a helper class for working with the Windows console that was wrapping the Windows console API. Looking back at it I realized it was a pretty naive implementation. So I decided to start a new and make something more flexible and easier to use. Hopefully, I was more successful. The result is a small C++ template library called cppconlib, available on codeplex.

cppconlib is built with C++11 features and requires Visual Studio 2012 or newer. The library is available in a single header called conmanip.h and provides a set of helper classes, functions and constants for manipulating a Windows console (using the Windows console functions). The library features the following components:

  • console_context<T>: represents a context object for console operations; its main purpose is restoring console settings; typedefs for the three consoles are available (console_in_context, console_out_context and console_err_context)
  • console<T>: represents a console objects providing operations such as changing the foreground and background colors, the input mode, screen buffer size, title, and others; typedefs for the three consoles are available (console_in, console_out and console_err)
  • manipulating functions that can be used with cout/wcout and cin/wcin: settextcolor()/restoretextcolor(), setbgcolor()/restorebgcolor(), setcolors(), setmode()/clearmode(), setposx()/setposy()/setpos().

The library can be downloaded from here. Detailed documentation is available here.



The following example prints some text in custom colors and then reads text in a different set of colors.


The following code prints a rhomb to the console:


For more details and updates check the project at codeplex:

UPDATE: A NuGet package for cppconlib is available.

, , , , , , , Hits for this post: 35084 .

A recent question on stackoverflow raised the problem of a fast algorithm to count the unique colors in an image (Faster algorithm to check the colors in a image). My answer what that this kind of problems are suited for parallelization. However, parallelization may only help when used judiciously.

To demonstrate the point I have written a pretty simple implementation in VC++ (with C++11), that uses a std::set to store the unique colors. The size of the set is the number of the colors. The implementation simply iterates over the pixels of the image and adds them to the set.

There are probably better solutions than the one presented in this article. The code shown here is merely for the purpose of showing how parallelization can help, though not everything that runs in parallel is faster that a sequential code.

There are some things to note:

  • I used CImage class to load an image from disk
  • images can be 1,4,8,16,24 or 32 bit per pixel, but in the code presented here, for simplicity, I assume the image loaded from disk is 24 bits per pixel
  • Windows device-independent bitmaps can be stored in memory in two ways: top-down (as you would naturally expect) or bottom-up. The processing of the image varies based on the storage. You can find details here: Top-Down vs. Bottom-Up DIBs.
  • CImage::GetPitch method returns the pitch of the bitmap, which is the distance in bytes between the two memory addresses representing the beginning of two consecutive lines of the bitmap. This is important because the bitmap can be stored (align) with padding bytes at the end of a line, and the pitch and the width are used to determine the number of padding bytes. If the pitch is positive, then the bitmap is stored top-down. If the pitch is negative then the image is stored bottom-up.
  • the pointer returned by CImage::GetBits points to the beginning of the buffer if the image is top-down, and to the end of the buffer if the image is bottom-up.
  • since the image data is stored in contiguous array, the position of each pixel is given by the following algorithm:

    Since the pitch is positive when the bitmap is stored top-down and negative when then image is stored bottom-up, the above algorithm can be simplified to:

My simple implementation of the sequential algorithm looks like this:

Loading the image from disk, and timing the execution looks like this:

The result on my machine (Intel Core i7 2.67GHz, 6 GB RAM) on four random bitmaps with sizes 200×150, 1680×1050, 3360×1065 and 3360×2100 look like this (obviously with small variations):

The simplest parallelization you can think of is using parallel_for from the Parallel Patterns Library. Especially, because the conversion from a sequential implementation to a parallel one is pretty trivial.

When you run this code you get some pretty surprising results.

It’s not that surprising after all, if you look at how the insertion is performed. The access to the std::set is guarded with std::mutex, allowing only one thread to insert a new element into the set. Because of this synchronization it takes much more, in the order of 50-100 times, than the sequential implementation. Of course, the bigger the image is, the smaller this performance lost.

Obviously, parallelization through the means of parallel_for is not a solution in this case. Using threads directly however, can help, if we can get rid of syncing. This is possible by using a separate std::set for each thread, and merging the results at the end.

A few considerations:

  • the std::thread constructor does not take (yet) any number of parameters, so I had to pack the arguments in a structure
  • the std::set with the colors is passed by reference, and therefore it has to be packed in a std::ref
  • I create N threads, call the thread function with the arguments and then join them all
  • when all threads have finished I aggregate the individual sets in a single one

Running this code with 4 threads yields results that look like this:

As you can see the performance is better than the sequential code in each case (and implicitly far better than the parallel_for version).

As I said at the beginning, one can imagine better algorithms (both sequential and parallel), but the straight forward conclusion is that you have to beware of parallelization, as not every parallel version of an algorithm can run faster than the sequential version. The more syncing for shared-access takes place, the more performance is affected and the results can get much worse than the sequential version.

, , , , , Hits for this post: 25104 .

The Oxford meeting of the ISO C++ standards committee between 15-20 April resulted in new features beeing into the draft paper of the C++0x standard.

One of the features refer to Unicode support: a new header, called <cuchar>, was introduced. This header makes available new built-in types char16_t and char32_t, as well as new prefixes u and U to designate UTF-16 and UTF-32 encoded characters and strings.

A list of all papers submited before the Oxford meeting can be found here.

More about the results of the Oxford meeting can be found in:

, , , Hits for this post: 17745 .