ParallelFX Saves!

After the MVP Summit in Seattle, I started to dig into the Parallel FX framework (currently under a CTP available here). In just a few words, the framework is composed of:

  • Task Parallel Library (TPL), that provides means to manage task (i.e. units of execution), and exposes a
  • TPL API, represented by the static function in the Parallel class: For, ForEach and Do
  • Parallel LINQ (PLINQ): allows parallelization of LINQ queries (AsParallel)

Since I have a 2 codes CPU (AMP Athlon 64 X2 Dual) I wanted to see how much performance do I actually gain by parallelizing loops. So I benchmarked matrices multiplication and arrays sorting.

Matrices Multiplication

Having this Matrix class:

I could write the multiplication routine like this:

Modifying the function to use Parallel.For was pretty simple, since this is very straight forward:

I used matrices of various sizes, started from 100×100 to 1000×1000 and compared the execution time for the sequential version or the parallel version.

The results were:

They show that the time for executing the parallel version was only half the time for the sequential multiplication (remember that I have 2 cores). That means reducing the execution time, not 20 or 30%, but 50%, basically the maximum possible.

Array Sorting

For sorting the arrays I tested with the wost sorting algorithm: bubblesort. Here is the implementation:

First, I tested that with small array, 100 to 1000 elements.

And the results were:

Then, I changed the for loop, so that it creates array from 5000 to 50000 elements. The results were:

The conclusion is that the parallel sorting is not efficient when the array size is less than 1000 elements; in other words there is no performance gain, but perhaps performance lost due to overhead of creating tasks and executing them and managing the context switching. The same conclusion is expressed in the framework’s documentation:

Target areas of your program where algorithms are computationally expensive and/or data sets are large (e.g. > 1000 elements). In such cases, there will likely be benefit to using parallelism.

That means, that when you parallelize an algorithm, you should take into consideration the size of the data structures it performs on, and unless a given threshold is not reached, a non-parallelized version should be used.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.