<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Marius Bancila's Blog &#187; Parallel Programming</title>
	<atom:link href="http://mariusbancila.ro/blog/category/it/software/parallel-programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://mariusbancila.ro/blog</link>
	<description>Sharing my opinions and ideas!</description>
	<lastBuildDate>Sun, 08 Aug 2010 09:36:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Questionnaire on concurrency</title>
		<link>http://mariusbancila.ro/blog/2010/03/10/questionaire-on-concurrency/</link>
		<comments>http://mariusbancila.ro/blog/2010/03/10/questionaire-on-concurrency/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 10:20:48 +0000</pubDate>
		<dc:creator>Marius Bancila</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[concurrency]]></category>
		<category><![CDATA[multithreads]]></category>
		<category><![CDATA[parallel]]></category>
		<category><![CDATA[survey]]></category>

		<guid isPermaLink="false">http://mariusbancila.ro/blog/?p=473</guid>
		<description><![CDATA[I think application development faces two challenges nowadays: 64-bit and multi-core/many-core hardware. Switching from 32 to 64 bit is just another step in the evolution of processors. There was a time when we switched from 8 to 16, and then from 16 to 32. There are problems that arise every time, but probably in 10-20 [...]]]></description>
			<content:encoded><![CDATA[<p>I think application development faces two challenges nowadays: 64-bit and multi-core/many-core hardware. Switching from 32 to 64 bit is just another step in the evolution of processors. There was a time when we switched from 8 to 16 bits, and then from 16 to 32. Problems arise with every such switch, but in 10-20 years we will probably have 128-bit platforms. Multi-core/many-core, on the other hand, is a different kind of shift, not only in development but also in thinking. We now run either single-core or multi-core processors. My workstation has 8 cores (4 physical and 4 virtualized) and so does my laptop. Still, I don&#8217;t see an 8 times improvement in the applications running on these machines. I know that 8 times is not what I should expect, but I don&#8217;t even see 4 times. For instance, the time for building from scratch the application I&#8217;m working on with Visual Studio 2008 dropped from 20 minutes to 8 minutes; that&#8217;s a 2.5 times improvement.</p>
<p>I know that having N cores doesn&#8217;t mean that applications can run N times faster, because not everything can run in several threads, and then there are problems with resource access, synchronization and others. All these prevent applications from running N times faster. But the real problem is that we are not thinking in parallel. We are still used to programming for a single thread. By that I mean not only that most developers are not used to parallelizing routines to boost their performance, but also that many operations are run in the main (UI) thread, making the GUI freeze.</p>
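<p>The gap between core count and speedup can be estimated with Amdahl&#8217;s law. The sketch below is only an illustration: it assumes that the 20-minute build ran on a single core and the 8-minute build on all 8 logical cores, which the figures above do not actually state, and the function names are made up for this example.</p>
<pre class="prettyprint">
#include &lt;cassert&gt;

// Amdahl's law: expected speedup on `cores` cores when a fraction `p`
// (between 0 and 1) of the work can run in parallel and the rest is serial.
double amdahl_speedup(double p, int cores)
{
	assert(cores &gt; 0);
	return 1.0 / ((1.0 - p) + p / cores);
}

// The inverse: the parallel fraction implied by an observed speedup.
// Solving S = 1 / ((1 - p) + p / N) for p gives p = (1 - 1/S) / (1 - 1/N).
double parallel_fraction(double speedup, int cores)
{
	assert(cores &gt; 1);
	return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores);
}
</pre>
<p>Under those assumptions, parallel_fraction(2.5, 8) is roughly 0.69. In other words, about two thirds of the build would be parallelizable, and no number of cores could push the speedup past 1 / (1 - 0.69), or about 3.2 times. That is one more reason to look at how much of a program can actually run in parallel.</p>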
<p>Therefore I would like to get some feedback from people working in application development, to get an idea about the awareness of concurrency, the issues it raises and the solutions people use. Please take several minutes to answer the questions in this survey.</p>
<p>I&#8217;d like to quote James Reinders, lead evangelist and Director of Marketing and Business Development for Intel Software Development Products, who said that:</p>
<blockquote><p>I am still confident that software development in 2016 will not be kind to programmers who have not learned to &#8220;Think Parallel.&#8221;</p></blockquote>
<blockquote><p>The &#8220;not parallel&#8221; era we are now exiting will appear to be a very primitive time in the history of computers when people look back in a hundred years. The world works in parallel, and it is time for computer programs to do the same.</p></blockquote>
<blockquote><p>Doing one thing at a time is &#8220;so yesterday.&#8221;</p></blockquote>
<p>The questionnaire below is also available <a href="https://spreadsheets.google.com/viewform?hl=en&#038;formkey=dE5mRlc2ejhXMVpaSUtPZmkzaHZkanc6MA" target="_blank">here</a>.</p>
<p><iframe src="https://spreadsheets.google.com/embeddedform?formkey=dE5mRlc2ejhXMVpaSUtPZmkzaHZkanc6MA" width="700" height="3400" frameborder="0" marginheight="0" marginwidth="0">Loading&#8230;</iframe></p>
<p>Thank you for answering the questionnaire.</p>
]]></content:encoded>
			<wfw:commentRss>http://mariusbancila.ro/blog/2010/03/10/questionaire-on-concurrency/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Helpers For Multithreading in C++</title>
		<link>http://mariusbancila.ro/blog/2010/02/01/helpers-for-multithreading-cpp/</link>
		<comments>http://mariusbancila.ro/blog/2010/02/01/helpers-for-multithreading-cpp/#comments</comments>
		<pubDate>Mon, 01 Feb 2010 20:32:17 +0000</pubDate>
		<dc:creator>Marius Bancila</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[C++ multithreads threads concurrency]]></category>

		<guid isPermaLink="false">http://mariusbancila.ro/blog/?p=466</guid>
		<description><![CDATA[One of the most important challenges nowadays in programming is concurrency. If we don&#8217;t learn to write programs that are able to run on multiple cores the progress in hardware will be pointless. But when you run multiple threads for various processing you might face the situation when you have to write over and over [...]]]></description>
			<content:encoded><![CDATA[<p>One of the most important challenges in programming nowadays is concurrency. If we don&#8217;t learn to write programs that are able to run on multiple cores, the progress in hardware will be pointless. But when you run multiple threads for various kinds of processing, you might find yourself writing the same or similar code over and over again for creating the threads, setting up their parameters, joining them, checking the result, cleaning up, etc.</p>
<p>In this post I will show how you can create some helpers in C++ to simplify this process. This is not going to be a full solution, nor one that fits all needs, but it can be a start.</p>
<p>What I would like to have is a helper class that will take care of:</p>
<ul>
<li>finding how many threads can run (considering each available core can run a thread)</li>
<li>creating and starting the threads</li>
<li>joining the threads</li>
<li>checking the result of the threads execution</li>
<li>cleaning up</li>
</ul>
<p>The class shown below does just that.</p>
<pre class="prettyprint">
#include &lt;windows.h&gt;

class ThreadHelper
{
	LPVOID* m_Params;
	int m_ThreadsNo;

private:
	int GetProcessorsCount()
	{
		SYSTEM_INFO info;
		::GetSystemInfo(&amp;info);
		return info.dwNumberOfProcessors;
	}

public:
	ThreadHelper()
	{
		m_ThreadsNo = GetProcessorsCount();

		m_Params = new LPVOID[m_ThreadsNo];
		for(int i = 0; i < m_ThreadsNo; ++i)
			m_Params[i] = NULL;
	}

	ThreadHelper(int threadsNo)
	{
		if(threadsNo < 1)
			m_ThreadsNo = GetProcessorsCount();
		else
			m_ThreadsNo = threadsNo;

		m_Params = new LPVOID[m_ThreadsNo];
		for(int i = 0; i < m_ThreadsNo; ++i)
			m_Params[i] = NULL;
	}

	~ThreadHelper()
	{
		delete [] m_Params;
	}

	int GetThreadsNo() const {return m_ThreadsNo;}
	bool SetThreadParams(int threadIndex, LPVOID lpData)
	{
		if(threadIndex >= 0 &amp;&amp; threadIndex < m_ThreadsNo)
		{
			m_Params[threadIndex] = lpData;
			return true;
		}

		return false;
	}

	bool Run(LPTHREAD_START_ROUTINE threadProc, BOOL startImmediatelly, DWORD timeout = INFINITE)
	{
		bool success = false;

		HANDLE* hThreads = new HANDLE[m_ThreadsNo];
		DWORD* dwThreadIds = new DWORD[m_ThreadsNo];

		bool allThreadsOK = true;

		// create the threads
		for(int i = 0; i < m_ThreadsNo &amp;&amp; allThreadsOK; ++i)
		{
			hThreads[i] = ::CreateThread(
				NULL,
				0,
				threadProc,
				m_Params[i],
				startImmediatelly ? 0 : CREATE_SUSPENDED,
				&amp;dwThreadIds[i]);

			if(hThreads[i] == NULL)
			{
				for(int j = 0; j < i; ++j)
				{
					::CloseHandle(hThreads[j]);
				}

				allThreadsOK = false;
			}
		}

		if(allThreadsOK)
		{
			// start the threads if they were suspended first
			if(!startImmediatelly)
			{
				for(int i = 0; i < m_ThreadsNo; ++i)
				{
					::ResumeThread(hThreads[i]);
				}
			}

			// wait for all threads
			DWORD joinret = ::WaitForMultipleObjects(
				m_ThreadsNo,
				hThreads,
				TRUE,
				timeout);

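			if(joinret == WAIT_FAILED)
			{
				// the wait itself failed; GetLastError() has the details
			}
			else if(joinret == WAIT_TIMEOUT)
			{
				// some threads were still running when the timeout elapsed
			}
			else if(joinret >= WAIT_OBJECT_0 &amp;&amp; joinret < WAIT_OBJECT_0 + m_ThreadsNo)
			{
				success = true;
			}
			else if(joinret >= WAIT_ABANDONED_0 &amp;&amp; joinret < WAIT_ABANDONED_0 + m_ThreadsNo)
			{
				// only reported for abandoned mutexes, not for thread handles
			}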

			// close the thread handles
			for(int i = 0; i < m_ThreadsNo; ++i)
			{
				::CloseHandle(hThreads[i]);
			}
		}

		delete [] hThreads;
		delete [] dwThreadIds;

		return success;
	}
};
</pre>
<p>This helper class contains:</p>
<ul>
<li>one parameter-less constructor that identifies the number of available processors and sets the threads count equal to the processors count</li>
<li>one constructor that takes the number of threads that should be created</li>
<li>one method (SetThreadParams) for setting the parameters for each thread that will be created</li>
<li>one method (Run) that creates and runs the threads, waits for them and checks the result of the execution</li>
</ul>
<p>As you can see, the Run() method is simplistic. For instance, it does not handle timed-out or abandoned thread executions. It also joins all threads, waiting until every one of them has finished. A more flexible method could wait only until the first thread finishes and then maybe stop the other threads. But as I said, this is a sample and not a complete solution.</p>
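<p>For comparison, the same create-and-join pattern can be sketched with std::thread from C++11 instead of the Win32 API. This is an alternative under stated assumptions, not a port of the class above: run_in_threads and its signature are made up for illustration, and each thread receives only its index rather than an LPVOID parameter.</p>
<pre class="prettyprint">
#include &lt;cstddef&gt;
#include &lt;thread&gt;
#include &lt;vector&gt;

// Hypothetical helper, not part of the ThreadHelper class above.
// Runs task(index) on `count` threads and waits for all of them.
// count == 0 means "one thread per core". Returns the number of threads used.
template &lt;typename Task&gt;
std::size_t run_in_threads(Task task, std::size_t count = 0)
{
	if (count == 0)
	{
		// hardware_concurrency() may return 0 when the value is unknown
		count = std::thread::hardware_concurrency();
		if (count == 0)
			count = 1;
	}

	std::vector&lt;std::thread&gt; threads;
	threads.reserve(count);

	try
	{
		for (std::size_t i = 0; i &lt; count; ++i)
			threads.emplace_back(task, i);
	}
	catch (...)
	{
		// a thread could not be created: join the ones already running
		for (std::thread&amp; t : threads)
			t.join();
		throw;
	}

	// join: wait until every thread has finished
	for (std::thread&amp; t : threads)
		t.join();

	return count;
}
</pre>
<p>A call such as run_in_threads(work, 4) starts four threads that each receive their own index, which plays the role of SetThreadParams followed by Run in the Win32 version. There is no timeout in this sketch; join() always waits indefinitely.</p>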
<p>Having this helper set up, I will start several threads to find the prime numbers in a sequence and print them to the console.</p>
<p>The following function checks whether a number is prime:</p>
<pre class="prettyprint">
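#include &lt;cmath&gt;

bool IsPrime(int number)
{
	// 0, 1 and negative numbers are not prime
	if (number < 2) return false;

	const int max = static_cast< int >(
		std::sqrt(static_cast< double >(number))) + 1;

	// trial division by every candidate up to the square root
	for (int i = 2; i < max; ++i)
	{
		if (number % i == 0) return false;
	}

	return true;
}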
</pre>
<p>The thread procedure will run through a sub-sequence of a vector of integers and verify if each element is prime. I will use the following structure to pass the sequence bounds to the thread procedure:</p>
<pre class="prettyprint">
#include &lt;vector&gt;

// half-open range [begin, end) of the numbers one thread should check
struct vector_bounds
{
	std::vector< int >::const_iterator begin;
	std::vector< int >::const_iterator end;
};
</pre>
<p>The thread procedure could look like this:</p>
<pre class="prettyprint">
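#include &lt;iostream&gt;

// guards std::cout; initialized in main() before the threads are started
static CRITICAL_SECTION cs;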

DWORD WINAPI FindPrimes(LPVOID lpData)
{
	vector_bounds* bounds = static_cast< vector_bounds* >(lpData);
	if(bounds == NULL)
		return 1;

	for(std::vector< int >::const_iterator cit = bounds->begin;
		cit != bounds->end; ++cit)
	{
		if(IsPrime(*cit))
		{
			EnterCriticalSection(&amp;cs);

			std::cout << *cit << std::endl;

			LeaveCriticalSection(&amp;cs);
		}
	}

	return 0;
}
</pre>
<p>To print to the console, a locking mechanism is necessary; otherwise output from two different threads could get interleaved. The critical section will be initialized before the threads are started.</p>
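<p>The same locking idea can be sketched with std::mutex and std::lock_guard from C++11 instead of a CRITICAL_SECTION. This is an alternative, not what the thread procedure above uses; LockedPrinter and PrintFromThreads are hypothetical names, and the demo writes to a string stream rather than the console so that the result can be inspected.</p>
<pre class="prettyprint">
#include &lt;mutex&gt;
#include &lt;ostream&gt;
#include &lt;sstream&gt;
#include &lt;string&gt;
#include &lt;thread&gt;
#include &lt;vector&gt;

// Hypothetical wrapper: lets several threads write whole lines to one stream.
class LockedPrinter
{
	std::ostream&amp; m_Out;
	std::mutex m_Lock;

public:
	explicit LockedPrinter(std::ostream&amp; out) : m_Out(out) {}

	void PrintLine(int value)
	{
		// lock_guard unlocks when it goes out of scope, even on an exception
		std::lock_guard&lt;std::mutex&gt; guard(m_Lock);
		m_Out &lt;&lt; value &lt;&lt; '\n';
	}
};

// Demo: each of `threads` threads prints `perThread` numbers; returns the text.
std::string PrintFromThreads(int threads, int perThread)
{
	std::ostringstream out;
	LockedPrinter printer(out);

	std::vector&lt;std::thread&gt; workers;
	for (int t = 0; t &lt; threads; ++t)
	{
		workers.emplace_back([&amp;printer, t, perThread]
		{
			for (int i = 0; i &lt; perThread; ++i)
				printer.PrintLine(t * perThread + i);
		});
	}

	for (std::thread&amp; worker : workers)
		worker.join();

	return out.str();
}
</pre>
<p>Because the guard is released automatically, there is no counterpart of LeaveCriticalSection to forget. The order of the lines still depends on how the threads are scheduled; the lock only guarantees that each line is written in one piece.</p>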
<p>What remains to be done is generating a sequence of integers, setting up the parameters with the sequence bounds for each thread and running the threads using the helper.</p>
<pre class="prettyprint">
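#include &lt;algorithm&gt;
#include &lt;cstdlib&gt;
#include &lt;ctime&gt;
#include &lt;iterator&gt;

int main()
{
	// generate some random numbers
	srand(static_cast< unsigned int >(time(NULL)));
	std::vector< int > numbers;
	std::generate_n(std::back_inserter(numbers), 1000, rand);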

	// create the thread helper
	ThreadHelper helper(4);
	int threads = helper.GetThreadsNo();

	// create the parameters for the threads
	std::vector< vector_bounds > params;
	std::vector< int >::const_iterator begin = numbers.begin();
	size_t partitionsize = numbers.size()/threads;

	for(int i = 0; i < threads; ++i)
	{
		vector_bounds bound;
		bound.begin = begin;
		bound.end = (i == threads - 1) ? numbers.end() : begin + partitionsize;
		params.push_back(bound);

		begin = bound.end;
	}

	for(int i = 0; i < threads; ++i)
		helper.SetThreadParams(i, &amp;params[i]);

	// run the threads
	InitializeCriticalSection(&amp;cs);

	std::cout << "start running..." << std::endl;

	bool success = helper.Run(FindPrimes, FALSE);

	std::cout << "finished " << (success? "successfully" : "failed") << std::endl;

	DeleteCriticalSection(&amp;cs);

	return 0;
}
</pre>
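<p>The partitioning loop in main() gives any remainder to the last thread. As a variation, and only as a sketch, the remainder can be spread over the first few ranges so that no thread gets more than one extra element. SplitRange is a hypothetical helper that works on indexes rather than iterators.</p>
<pre class="prettyprint">
#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;

// Hypothetical helper: splits [0, size) into `parts` consecutive index
// ranges [first, second). The first size % parts ranges get one extra element.
std::vector&lt;std::pair&lt;std::size_t, std::size_t&gt;&gt;
SplitRange(std::size_t size, std::size_t parts)
{
	assert(parts &gt; 0);

	std::vector&lt;std::pair&lt;std::size_t, std::size_t&gt;&gt; ranges;
	ranges.reserve(parts);

	const std::size_t base = size / parts;
	const std::size_t extra = size % parts;

	std::size_t begin = 0;
	for (std::size_t i = 0; i &lt; parts; ++i)
	{
		const std::size_t length = base + (i &lt; extra ? 1 : 0);
		ranges.emplace_back(begin, begin + length);
		begin += length;
	}

	return ranges;
}
</pre>
<p>With 1000 numbers and 4 threads both schemes produce four ranges of 250 elements. They only differ when the size is not a multiple of the thread count: for 10 elements and 4 threads, the loop in main() yields ranges of 2, 2, 2 and 4 elements, while SplitRange yields 3, 3, 2 and 2.</p>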
<p>With this thread helper class, all I need to do to run some processing in several threads is:</p>
<ul>
<li>set up the thread parameters (if there are any)</li>
<li>write the thread procedure</li>
<li>create a ThreadHelper object and initialize it</li>
<li>run the threads and collect the results</li>
</ul>
<p>The helper class saves me from writing the same code over and over again and lets me focus on the most important task: writing the thread procedure. As I said earlier, it is not a full solution, nor one that fits all scenarios, but you can develop it to suit your needs.</p>
]]></content:encoded>
			<wfw:commentRss>http://mariusbancila.ro/blog/2010/02/01/helpers-for-multithreading-cpp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>STM.NET</title>
		<link>http://mariusbancila.ro/blog/2009/07/29/stm-net/</link>
		<comments>http://mariusbancila.ro/blog/2009/07/29/stm-net/#comments</comments>
		<pubDate>Wed, 29 Jul 2009 11:19:59 +0000</pubDate>
		<dc:creator>Marius Bancila</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[CTPs & Betas]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[beta]]></category>
		<category><![CDATA[STM.NET]]></category>

		<guid isPermaLink="false">http://mariusbancila.ro/blog/?p=334</guid>
		<description><![CDATA[Microsoft has made available a first beta version of an experimental version of .NET 4.0, called .NET Framework 4.0 Beta 1 Enabled for Software Transactional Memory v1.0. Since that is quite a long name, the short one is STM.NET. This is a special version of .NET 4.0 that enables software transactional memory for C#. It [...]]]></description>
			<content:encoded><![CDATA[<p>Microsoft has made available a first beta of an experimental version of .NET 4.0, called .NET Framework 4.0 Beta 1 Enabled for Software Transactional Memory v1.0. Since that is quite a long name, the short one is STM.NET. This is a special version of .NET 4.0 that enables software transactional memory for C#. It allows programmers to mark regions of code as running in an atomic transaction, isolated from other code running concurrently. This is done either by passing a delegate to a method called Atomic.Do, or with try-catch blocks. It may be that an &#8216;atomic&#8217; block will be added to the language(s) in the future.</p>
<p>This first version of the framework also comes with additional features:</p>
<ul>
<li>tooling (debugging, ETW tracing)</li>
<li>lock interoperability</li>
<li>interoperability with traditional transactions</li>
<li>annotations (how methods run in transactions, suppressed transactions on methods, etc.)</li>
<li>static and dynamic checking of annotations</li>
</ul>
<p>On the other hand there are some limitations:</p>
<ul>
<li>only works for C# for now</li>
<li>cannot be installed on a machine that has VS 2010, and VS 2010 cannot be installed on a machine that has STM.NET</li>
<li>there is only a 32-bit version</li>
</ul>
<p>More information about it can be found at the <a href="http://blogs.msdn.com/stmteam/">STM team blog</a> or <a href="http://msdn.microsoft.com/en-us/devlabs/ee334183.aspx">MSDN DevLabs</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://mariusbancila.ro/blog/2009/07/29/stm-net/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Axum, A Language for Building Parallel Applications</title>
		<link>http://mariusbancila.ro/blog/2009/05/13/axum-a-language-for-building-parallel-applications/</link>
		<comments>http://mariusbancila.ro/blog/2009/05/13/axum-a-language-for-building-parallel-applications/#comments</comments>
		<pubDate>Wed, 13 May 2009 06:26:09 +0000</pubDate>
		<dc:creator>Marius Bancila</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[CTPs & Betas]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[agents]]></category>
		<category><![CDATA[axum]]></category>
		<category><![CDATA[isolation]]></category>
		<category><![CDATA[parallel]]></category>

		<guid isPermaLink="false">http://mariusbancila.ro/blog/?p=268</guid>
		<description><![CDATA[Last week Microsoft published on DevLabs a .NET language for building parallel applications, called Axum and previously known as Maestro. This new language is built on the architecture of the web, on the principles of isolation, message-passing, fault-tolerance and loose-coupling. It is said to have a more succinct syntax than Erlang, and to have the isolation advantage [...]]]></description>
			<content:encoded><![CDATA[<p>Last week Microsoft published on <a href="http://msdn.microsoft.com/en-us/devlabs/dd795202.aspx">DevLabs</a> a .NET language for building parallel applications, called Axum and previously known as Maestro. This new language is built on the architecture of the web, on the principles of isolation, message-passing, fault-tolerance and loose-coupling. It is said to have a more succinct syntax than Erlang, and to have the isolation advantage over MPI, CCR and Asynchronous Agents.</p>
<p>Isolation is key in this architecture and is achieved with:</p>
<ul>
<li>domains, which limit the runtime scope of data to its compile-time scope (objects don&#8217;t escape domains);</li>
<li>agents, active components that provide access to domains and live in a thread of their own, different from the caller&#8217;s; their methods are not accessible from outside;</li>
<li>channels, the means of communicating with agents; they are established by the runtime when agents are created. The most important parts of a channel are its ports (input or output), which can be viewed as queues in which data is placed.</li>
</ul>
<p>Here is more reading on these topics:</p>
<ul>
<li><a href="http://blogs.msdn.com/maestroteam/archive/2009/02/27/we-haven-t-forgotten-about-other-models-honest.aspx">Axum core architecture principles</a></li>
<li><a href="http://blogs.msdn.com/maestroteam/archive/2009/02/27/isolation-in-maestro.aspx">Isolation in Axum</a></li>
<li><a href="http://blogs.msdn.com/maestroteam/archive/2009/03/02/channels.aspx">Channels</a></li>
</ul>
<p>You can find more about Axum at:</p>
<ul>
<li><a href="http://msdn.microsoft.com/en-us/devlabs/dd795202.aspx">Home page at Microsoft DevLabs</a></li>
<li><a href="http://blogs.msdn.com/maestroteam/">Axum Team Blog</a></li>
<li><a href="http://social.msdn.microsoft.com/Forums/en-US/axum/threads">Axum Forum</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://mariusbancila.ro/blog/2009/05/13/axum-a-language-for-building-parallel-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Word Reducing Puzzle</title>
		<link>http://mariusbancila.ro/blog/2008/06/05/word-reducing-puzzle/</link>
		<comments>http://mariusbancila.ro/blog/2008/06/05/word-reducing-puzzle/#comments</comments>
		<pubDate>Thu, 05 Jun 2008 11:52:35 +0000</pubDate>
		<dc:creator>Marius Bancila</dc:creator>
				<category><![CDATA[F#]]></category>
		<category><![CDATA[Parallel Programming]]></category>

		<guid isPermaLink="false">http://mariusbancila.ro/blog/?p=124</guid>
		<description><![CDATA[I recently found an interesting problem on the web, about reducing words, letter by letter until only one letter remains. Here is a formal definition: We define word reduction as removing a single letter from a word while leaving the remaining letters in their original order so that the resulting sequence of characters is also [...]]]></description>
			<content:encoded><![CDATA[<p>I recently found an interesting problem on the web, about reducing words, letter by letter until only one letter remains. Here is a formal definition:</p>
<blockquote><p>
We define word reduction as removing a single letter from a word while leaving the remaining letters in their original order so that the resulting sequence of characters is also a word. A good word can be reduced step-by-step until all that is left is a or i. Your program will answer the question: what is the biggest word in the given dictionary that can be reduced?
</p></blockquote>
<p>A typical example, though not a very long one, is this:</p>
<pre class="prettyprint">
planets
plants
pants
pant
ant
an
a
</pre>
<p>So I thought this could be a good exercise for F#. Several good dictionaries, both small and big, can be found <a href="http://cs.millersville.edu/~katz/cs362/examples/dictionaries/" target="_blank">here</a>.</p>
<p>My approach was to read the dictionary and build a list for each word length: one for 1-letter words, one for 2-letter words, etc. This could be a Dictionary&lt;int, list&lt;string&gt;&gt;; let&#8217;s simply call it dictionary. Then these lists can be traversed to create a second set of lists (let&#8217;s call it reducedwords) that contains only the words that meet the reduction rule.</p>
<p>A good approach would be to take each list of words from the initial dictionary, starting with the list of 2-letter words, and for each word delete one letter at a time. Then check whether the resulting word already exists in the list from reducedwords that holds the words one letter shorter. If it&#8217;s there, then the original word can be reduced and should be added to the list from reducedwords corresponding to its length. In other words, the algorithm would be:</p>
<pre class="prettyprint">
copy dictionary[1] to reducedwords[1]

for length = 2 to maxwordlength do
  foreach word in dictionary[length]
    foreach letter in word
      * delete the letter
      * if new word exists in reducedwords[length-1] then
          * add word to reducedwords[length]
</pre>
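<p>For readers who prefer C++ to pseudocode, the same level-by-level idea can be sketched as follows. This is not the F# implementation described in this post: FindReducibleWords is a hypothetical function, the accepted 1-letter words are passed in as an assumption, and sets are used so that a word with several reduction paths is stored only once.</p>
<pre class="prettyprint">
#include &lt;cstddef&gt;
#include &lt;map&gt;
#include &lt;set&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical C++ rendering of the pseudocode above. `seeds` are the
// accepted 1-letter words. Returns the reducible words grouped by length.
std::map&lt;std::size_t, std::set&lt;std::string&gt;&gt;
FindReducibleWords(const std::vector&lt;std::string&gt;&amp; words,
                   const std::set&lt;std::string&gt;&amp; seeds)
{
	// group the dictionary by word length
	std::map&lt;std::size_t, std::vector&lt;std::string&gt;&gt; byLength;
	for (const std::string&amp; word : words)
		byLength[word.size()].push_back(word);

	std::map&lt;std::size_t, std::set&lt;std::string&gt;&gt; reduced;
	reduced[1] = seeds;

	for (std::size_t length = 2; ; ++length)
	{
		auto level = byLength.find(length);
		if (level == byLength.end())
			break; // the dictionary has no words of this length

		const std::set&lt;std::string&gt;&amp; shorter = reduced[length - 1];
		std::set&lt;std::string&gt; current;

		for (const std::string&amp; word : level-&gt;second)
		{
			for (std::size_t j = 0; j &lt; word.size(); ++j)
			{
				std::string trimmed = word;
				trimmed.erase(j, 1); // delete one letter
				if (shorter.count(trimmed) != 0)
				{
					current.insert(word); // the word itself is reducible
					break;
				}
			}
		}

		if (current.empty())
			break; // no chain can grow beyond the previous length

		reduced[length] = current;
	}

	return reduced;
}
</pre>
<p>The largest key in the returned map is the length of the longest reducible word. Like the pseudocode, the loop stops at the first length for which no word can be reduced, since longer words would have nothing to reduce to.</p>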
<p>Here is a function for reading a dictionary file:</p>
<pre class="prettyprint">
open System
open System.IO
open System.Collections.Generic

let readWords (filename:string) =
   let dictionary = new Dictionary< int, list< string > >()
   // 'use' closes the reader when the function returns
   use reader = new StreamReader(filename)
   let fileend = ref false
   while not !fileend do
      let word = reader.ReadLine()
      if word = null then
         fileend := true
      else
         // append the word to the list for its length
         let len = String.length word
         let ok, words = dictionary.TryGetValue(len)
         if ok then dictionary.[len] <- words@[word]
         else dictionary.[len] <- [word]
   // the 1-letter words every reduction must end with
   dictionary.[1] <- ["a";"e";"i";"o";"u"]
   dictionary
</pre>
<p>Once I have the dictionary read, I can apply the algorithm and generate a second Dictionary structure. The following function also returns the length of the longest word(s) in the reduced dictionary. This is useful for printing.</p>
<pre class="prettyprint">
let findReducedWords (dictionary:Dictionary< int, list< string > >) =
   let reducedwords = new Dictionary< int, list< string > >()
   reducedwords.[1] <- dictionary.[1]

   let notdone = ref true
   let i = ref 2
   while !notdone do
      let ok, words = dictionary.TryGetValue(!i)
      if not ok then
         // no words of this length in the dictionary
         notdone := false
      else
         let added = ref false
         let _, reducedpre = reducedwords.TryGetValue(!i - 1)
         reducedwords.[!i] <- []
         words |> List.iter (fun word ->
            for j = 0 to word.Length-1 do
               let trimmedword = word.Remove(j, 1)
               if List.exists (fun x -> x = trimmedword) reducedpre then
                  reducedwords.[!i] <- reducedwords.[!i]@[word]
                  added := true
         )
         // a word with several reduction paths was added several times
         reducedwords.[!i] <- reducedwords.[!i] |> Set.ofList |> Set.toList
         if !added then
            i := !i + 1
         else
            reducedwords.Remove(!i) |> ignore
            notdone := false

   (reducedwords, !i-1)
</pre>
<p>Since the problem is about printing only the longest such reductions, I'll only start from the list of reduced words that holds the longest words. That's why I needed findReducedWords to return the length of the longest reducible word.</p>
<p>To print these paths I apply the same algorithm as before. The only difference is that I build a list with the words in the reducing path, starting with the longest word and ending with a 1-letter word.</p>
<pre class="prettyprint">
let rec printSequence
   (word:string)
   (reducedwords:Dictionary< int, list< string > >)
   (path:list< string >) =
   match word.Length with
   | 1 ->
      path@[word] |> List.iter (fun x -> printf "%s " x);
      printfn ""
   | _ ->
      let ok, reducedpre = reducedwords.TryGetValue(word.Length-1)
      if ok then
         for j = 0 to word.Length-1 do
            let trimmedword = word.Remove(j, 1)
            if List.exists (fun x -> x = trimmedword) reducedpre then
               printSequence trimmedword reducedwords (path@[word])

let printSequences
   (reducedwords:Dictionary< int, list< string > >)
   (maxlen:int) =
   let ok, words = reducedwords.TryGetValue(maxlen)
   if ok then
      words |> List.iter (fun word ->
         printSequence word reducedwords [])
</pre>
<p>The only thing left to do is to call all these functions:</p>
<pre class="prettyprint">
let main()=
   printfn "reading dictionary..."
   let dictionary = readWords "huge_dict.txt"

   printfn "building structures..."
   let reducedwords, max = findReducedWords dictionary

   printfn "printing matches..."
   printSequences reducedwords max

   Console.WriteLine("Press any key to continue...")
   Console.ReadKey() |> ignore

main()
</pre>
<p>My results for <a href="http://cs.millersville.edu/~katz/cs362/examples/dictionaries/huge.dict" target="_blank">this dictionary</a> were:</p>
<pre class="prettyprint">
complecting completing competing compting comping coping oping ping pig pi i
complecting completing competing compting comping coping oping ping pin in i
complecting completing competing compting comping coping oping ping pin pi i
</pre>
]]></content:encoded>
			<wfw:commentRss>http://mariusbancila.ro/blog/2008/06/05/word-reducing-puzzle/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Applying File Transformations with F#</title>
		<link>http://mariusbancila.ro/blog/2008/05/06/applying-file-transformations-with-f/</link>
		<comments>http://mariusbancila.ro/blog/2008/05/06/applying-file-transformations-with-f/#comments</comments>
		<pubDate>Tue, 06 May 2008 20:10:57 +0000</pubDate>
		<dc:creator>Marius Bancila</dc:creator>
				<category><![CDATA[F#]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[files]]></category>
		<category><![CDATA[parallel]]></category>
		<category><![CDATA[pattern]]></category>
		<category><![CDATA[record]]></category>
		<category><![CDATA[recursive]]></category>
		<category><![CDATA[regex]]></category>

		<guid isPermaLink="false">http://mariusbancila.ro/blog/?p=121</guid>
		<description><![CDATA[In this post I&#8217;ll show some F# constructs, all put together in a simple application that modifies the names of files that match a criterion. This is an application that is started from a console with the following command line options: filesmod.exe -f < folder > [-r] -p < pattern > [-pre < prefix >] [-suf [...]]]></description>
			<content:encoded><![CDATA[<p>In this post I&#8217;ll show some F# constructs, all put together in a simple application that modifies the names of files that match a criterion. This is an application that is started from a console with the following command line options:</p>
<pre class="console">
filesmod.exe -f &lt;folder&gt; [-r] -p &lt;pattern&gt; [-pre &lt;prefix&gt;] [-suf &lt;suffix&gt;]
  -f &lt;folder&gt;    specifies the folder where the files are located
  -r             indicates that the specified folder should be parsed
                 recursively
  -p &lt;pattern&gt;   indicates a pattern used for filtering files
  -pre &lt;prefix&gt;  indicates a prefix to be added to all files
                 that match the criteria
  -suf &lt;suffix&gt;  indicates a suffix to be added to all files
                 that match the criteria
</pre>
<h3>Reading command line</h3>
<p>The command line arguments can be retrieved using the Environment class from the .NET framework. This class has a static method called GetCommandLineArgs() that returns an array of the passed arguments.<br />
We can define a type that contains all the parsed arguments.</p>
<pre class="prettyprint">
type CommandOptions =
    { mutable Folder : string;
      mutable Recursive : bool;
      mutable Pattern : string;
      mutable Prefix : string;
      mutable Suffix : string;}
</pre>
<p>This mutable record can be instantiated, and the value can be mutated while parsing the arguments. This is how you instantiate it:</p>
<pre class="prettyprint">
   let cmdops =
      { Folder = String.Empty
        Recursive = false
        Pattern = String.Empty
        Prefix = String.Empty
        Suffix = String.Empty }
</pre>
<p>Parsing the command line arguments can be done with pattern matching. This is the equivalent of switch statements in C++/C#/Java, only more powerful.<br />
Basically, I&#8217;m checking each argument, and if it&#8217;s a flag from the command line (-f, -r, -p, -pre, -suf) I set the appropriate property of the record: -r simply sets Recursive, while for the other flags I take the next argument as the value.</p>
<pre class="prettyprint">
   try
      let args = Environment.GetCommandLineArgs()
      for i = 0 to args.Length-1 do
         match args.[i] with
            | "-f" when i+1 <= args.Length-1 -> cmdops.Folder <- args.[i+1]
            | "-r" -> cmdops.Recursive <- true
            | "-p" when i+1 <= args.Length-1 -> cmdops.Pattern <- args.[i+1]
            | "-pre" when i+1 <= args.Length-1 -> cmdops.Prefix <- args.[i+1]
            | "-suf" when i+1 <= args.Length-1 -> cmdops.Suffix <- args.[i+1]
            | _ -> ()
   with e -> printfn "%s" e.Message
</pre>
<p>There are two things to notice here. The first is the try &#8230; with block, which makes sure any exception is caught.<br />
The second is guarding the rules with the condition that the current argument is not the last one in the list (-f should be followed by a folder, -suf by a suffix, etc.).<br />
You can see that in the when clause:</p>
<pre class="prettyprint">
when i+1 <= args.Length-1
</pre>
<h3>Getting the files in a directory</h3>
<p>We can get all files in a folder, using the following algorithm:</p>
<ul>
<li>get all the files in the current folder</li>
<li>get all the sub-folders in the current folder and for each of them apply the algorithm again</li>
</ul>
<p>That is spelled "recursion"! Our function should take several arguments: the path of a folder, a pattern for matching filenames and a flag indicating whether sub-folders should be parsed or not.</p>
<pre class="prettyprint">
let rec allFiles dir pattern r =
    seq
        { for file in Directory.GetFiles(dir, pattern) do
            yield file
          if r then
            for subdir in Directory.GetDirectories(dir) do
                for file in allFiles subdir pattern r do
                    yield file }
</pre>
<p>The above function is recursive and returns a sequence of file names. Sequences are lazy, which means that successive elements are computed and returned on demand, when they are needed.<br />
That is the opposite of a list or array, whose elements are created at once. The keyword 'yield' here is used to return a new value as the sequence is iterated.</p>
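<p>The on-demand behaviour can be observed with a small experiment. This is a sketch for a current F# compiler; the names produced, numbers and firstTwo are made up for the illustration. A counter records how many elements were actually computed.</p>
<pre class="prettyprint">
let produced = ref 0

let numbers =
    seq {
        for i in 1 .. 3 do
            // count how many elements were computed
            produced.Value <- produced.Value + 1
            yield i
    }

// Defining the sequence computes nothing yet.
let before = produced.Value

// Asking for two elements runs the loop body exactly twice.
let firstTwo = numbers |> Seq.take 2 |> Seq.toList
let after = produced.Value
</pre>
<p>Here before is 0 and after is 2: the third element is never computed because nobody asked for it.</p>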
<h3>Processing the files</h3>
<p>To process the files, we simply iterate over the sequence of files from the specified folder, match each file name against the provided pattern, and if there is a match, apply the prefix and/or suffix transformation.</p>
<pre class="prettyprint">
   for name in (allFiles cmdops.Folder "*.*" cmdops.Recursive) do
      let file = new FileInfo(name)
      if(Regex.IsMatch(file.Name, cmdops.Pattern, RegexOptions.Singleline)) then
        // Path helpers also handle file names that have no extension
        let filename = Path.GetFileNameWithoutExtension(file.Name)
        let newname = Path.Combine(file.Directory.FullName, cmdops.Prefix+filename+cmdops.Suffix+file.Extension)
        System.IO.File.Move(file.FullName, newname)
        printfn "%s -> %s" file.FullName newname
   done
</pre>
<p>Well, I have two cores on my machine, and since the Parallel FX framework is available, I like to use it. So here is the parallel version of that:</p>
<pre class="prettyprint">
       try
          Parallel.ForEach(allFiles cmdops.Folder "*.*" cmdops.Recursive, fun name ->
             let file = new FileInfo(name)
             if(Regex.IsMatch(file.Name, cmdops.Pattern, RegexOptions.Singleline)) then
                // Path helpers also handle file names that have no extension
                let filename = Path.GetFileNameWithoutExtension(file.Name)
                let newname = Path.Combine(file.Directory.FullName, cmdops.Prefix+filename+cmdops.Suffix+file.Extension)
                System.IO.File.Move(file.FullName, newname)
                printfn "%s -> %s" file.FullName newname)
       with e -> printfn "%s" e.InnerException.Message
</pre>
<p>The pattern provided on the command line is a regular expression, not a file system wildcard. That is why all the files are first enumerated with "*.*", and only then is each file name matched against the regular expression.</p>
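<p>Before renaming anything for real, it can be useful to preview what a given regular expression, prefix and suffix would do. The sketch below assumes a current F# compiler; newName and preview are hypothetical helpers that are not part of the program in this post, and they never touch the file system.</p>
<pre class="prettyprint">
open System.IO
open System.Text.RegularExpressions

// Build the new path for a file: prefix + name + suffix + extension.
let newName (prefix : string) (suffix : string) (path : string) =
    let dir = Path.GetDirectoryName(path)
    let stem = Path.GetFileNameWithoutExtension(path)
    let ext = Path.GetExtension(path)
    Path.Combine(dir, prefix + stem + suffix + ext)

// Keep only the paths whose file name matches the regular expression
// and pair each of them with the name it would receive.
let preview (pattern : string) prefix suffix paths =
    paths
    |> Seq.filter (fun (p : string) -> Regex.IsMatch(Path.GetFileName(p), pattern))
    |> Seq.map (fun p -> p, newName prefix suffix p)
    |> Seq.toList

let plan = preview @"\.txt$" "old_" "" [ "notes.txt"; "photo.jpg"; "README" ]
</pre>
<p>With these inputs only notes.txt matches, and it is paired with old_notes.txt. Note that the pattern \.txt$ is a regular expression; the wildcard *.txt would not mean the same thing here.</p>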
<p>As I was saying in a previous post, if you use PFX, you have to add a reference to the System.Threading.dll assembly, which requires a reference to the System.Core.dll assembly.<br />
That should be specified in the project's properties.</p>
<blockquote><p>
-r C:\WINDOWS\assembly\GAC_MSIL\System.Core\3.5.0.0__b77a5c561934e089\System.Core.dll -r "C:\Program Files\Microsoft Parallel Extensions Dec07 CTP\System.Threading.dll"
</p></blockquote>
<h3>Putting it all together</h3>
<p>All of that put together looks like this:</p>
<pre class="prettyprint">
#light

open System
open System.IO
open System.Text.RegularExpressions

open System.Threading

let rec allFiles dir pattern r =
    seq
        { for file in Directory.GetFiles(dir, pattern) do
            yield file
          if r then
            for subdir in Directory.GetDirectories(dir) do
                for file in allFiles subdir pattern r do
                    yield file }

let showUsage() =
    printfn "filesmod.exe -f < folder > [-r] -p < pattern > [-pre < prefix >] [-suf < suffix >]"
    printfn "  -f < folder >\tspecifies the folder where the files are located"
    printfn "  -r\t\tindicates that the specified folder should be parsed\n\t\trecursively"
    printfn "  -p < pattern >\tindicates a pattern used for filtering files"
    printfn "  -pre < prefix >\tindicates a prefix to be added to all files\n\t\tthat match the criteria"
    printfn "  -suf < suffix >\tindicates a suffix to be added to all files\n\t\tthat match the criteria"

type CommandOptions =
    { mutable Folder : string;
      mutable Recursive : bool;
      mutable Pattern : string;
      mutable Prefix : string;
      mutable Suffix : string;}

let main()=
   let cmdops =
      { new CommandOptions
        with Folder = String.Empty
        and Recursive = false
        and Pattern = String.Empty
        and Prefix = String.Empty
        and Suffix = String.Empty }

   try
      let args = Environment.GetCommandLineArgs()
      for i = 0 to args.Length-1 do
         match args.(i) with
            | "-f" when i+1 <= args.Length-1 -> cmdops.Folder <- args.(i+1)
            | "-r" -> cmdops.Recursive <- true
            | "-p" when i+1 <= args.Length-1 -> cmdops.Pattern <- args.(i+1)
            | "-pre" when i+1 <= args.Length-1 -> cmdops.Prefix <- args.(i+1)
            | "-suf" when i+1 <= args.Length-1 -> cmdops.Suffix <- args.(i+1)
            | _ -> ()
      done
   with e -> printfn "%s" e.Message

   if ((String.IsNullOrEmpty(cmdops.Prefix) &#038;&#038; String.IsNullOrEmpty(cmdops.Suffix)) ||
        String.IsNullOrEmpty(cmdops.Pattern) ||
        String.IsNullOrEmpty(cmdops.Folder)) then
        showUsage()
   else
       try
          Parallel.ForEach(allFiles cmdops.Folder "*.*" cmdops.Recursive, fun name ->
             let file = new FileInfo(name)
             if(Regex.IsMatch(file.Name, cmdops.Pattern, RegexOptions.Singleline)) then
                // Path helpers also handle file names that have no extension
                let filename = Path.GetFileNameWithoutExtension(file.Name)
                let newname = Path.Combine(file.Directory.FullName, cmdops.Prefix+filename+cmdops.Suffix+file.Extension)
                System.IO.File.Move(file.FullName, newname)
                printfn "%s -> %s" file.FullName newname)
       with e -> printfn "%s" e.InnerException.Message

   Console.WriteLine("Press any key to continue...")
   Console.ReadKey()

main()
</pre>
<p>Of course, the options available in this application (on file name changes) are pretty limited, but that can be extended at will.</p>
]]></content:encoded>
			<wfw:commentRss>http://mariusbancila.ro/blog/2008/05/06/applying-file-transformations-with-f/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Parallelization in F#</title>
		<link>http://mariusbancila.ro/blog/2008/04/30/parallelization-in-f/</link>
		<comments>http://mariusbancila.ro/blog/2008/04/30/parallelization-in-f/#comments</comments>
		<pubDate>Wed, 30 Apr 2008 21:20:01 +0000</pubDate>
		<dc:creator>Marius Bancila</dc:creator>
				<category><![CDATA[F#]]></category>
		<category><![CDATA[Parallel Programming]]></category>

		<guid isPermaLink="false">http://mariusbancila.ro/blog/?p=120</guid>
		<description><![CDATA[In my last post I was writing about parallelizing loops with Parallel.For in C#. Today I thought it would be nice to try that in F#. So, here is the benchmarking of the matrix multiplication and the bubblesort algorithm in F#. Matrices Multiplication I started with a create_matrix function that creates and randomly initializes a [...]]]></description>
			<content:encoded><![CDATA[<p>In my last post I was writing about parallelizing loops with Parallel.For in C#. Today I thought it would be nice to try that in F#. So, here is the benchmarking of the matrix multiplication and the bubblesort algorithm in F#.</p>
<h3>Matrices Multiplication</h3>
<p>I started with a create_matrix function that creates and randomly initializes a matrix of doubles.</p>
<pre class="prettyprint">
let create_matrix rows columns =
   let rnd = System.Random()
   Array2.init rows columns (fun i j -> rnd.NextDouble())
</pre>
<p>Sequential multiplication could look like this:</p>
<pre class="prettyprint">
let multiply_sequential (m1:float[,]) (m2:float[,]) =
   let rows1 = Array2.length1 m1
   let cols1 = Array2.length2 m1
   let rows2 = Array2.length1 m2
   let cols2 = Array2.length2 m2
   let result = Array2.create rows1 cols2 0.0

   if(cols1 <> rows2) then
      failwith "Matrices size incorrect for multiplication!"

   for i = 0 to rows1-1 do
      for j = 0 to cols2-1 do
         for k = 0 to cols1-1 do
            result.[i,j] <- result.[i,j] + m1.[i,k] * m2.[k,j]
         done
      done
   done
   result
</pre>
<p>Parallelizing it only requires replacing the outer loop with Parallel.For. This is safe because each iteration of the outer loop writes to a different row of the result.</p>
<pre class="prettyprint">
let multiply_parallel (m1:float[,]) (m2:float[,]) =
   let rows1 = Array2.length1 m1
   let cols1 = Array2.length2 m1
   let rows2 = Array2.length1 m2
   let cols2 = Array2.length2 m2
   let result = Array2.create rows1 cols2 0.0

   if(cols1 <> rows2) then
      failwith "Matrices size incorrect for multiplication!"

   Parallel.For(0, rows1, (fun i->
      for j = 0 to cols2-1 do
         for k = 0 to cols1-1 do
            result.[i,j] <- result.[i,j] + m1.[i,k] * m2.[k,j]))
   result
</pre>
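<p>For readers on a current toolchain, here is a hedged sketch of the same idea. It assumes a recent F# compiler and .NET, where the Array2 module is called Array2D and Parallel lives in System.Threading.Tasks; the function name multiplyRows is made up for this example. It accumulates each cell in a local variable, and every iteration writes only to its own row.</p>
<pre class="prettyprint">
open System.Threading.Tasks

let multiplyRows (m1 : float[,]) (m2 : float[,]) =
    let rows1 = Array2D.length1 m1
    let cols1 = Array2D.length2 m1
    let cols2 = Array2D.length2 m2
    if cols1 <> Array2D.length1 m2 then
        invalidArg "m2" "Matrices size incorrect for multiplication!"
    let result : float[,] = Array2D.zeroCreate rows1 cols2
    // Iteration i writes only to row i, so no two iterations
    // ever touch the same element of result.
    Parallel.For(0, rows1, fun i ->
        for j in 0 .. cols2 - 1 do
            let mutable sum = 0.0
            for k in 0 .. cols1 - 1 do
                sum <- sum + m1.[i, k] * m2.[k, j]
            result.[i, j] <- sum)
    |> ignore
    result

let a = array2D [ [ 1.0; 2.0 ]; [ 3.0; 4.0 ] ]
let b = array2D [ [ 0.0; 1.0 ]; [ 1.0; 0.0 ] ]
let product = multiplyRows a b
</pre>
<p>Multiplying by b swaps the two columns of a, so product holds the rows 2, 1 and 4, 3.</p>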
<p>We can test those functions and get the same kind of output as in my previous post with this:</p>
<pre class="prettyprint">
let main() =
   let step = 100
   let size = ref step
   while (!size <= step*10) do
      let m1 = create_matrix !size !size
      let m2 = create_matrix !size !size
      printfn "Matrices size: %dx%d" !size !size

      printf "Sequential...\t"
      let starts = DateTime.Now
      let ms = multiply_sequential m1 m2
      printfn "%a" output_any (DateTime.Now - starts)

      printf "Parallel...\t"
      let startp = DateTime.Now
      let ms = multiply_parallel m1 m2
      printfn "%a" output_any (DateTime.Now - startp)

      size := !size + step
   done   

main()
</pre>
<p>Before running, you have to make sure you add System.Threading.dll to the referenced assemblies. And since this one depends on System.Core.dll, you also have to add that one. If you are using Visual Studio and an F# project, you can add the two references from the project properties.</p>
<blockquote><p>-r c:\Windows\assembly\GAC_MSIL\System.Core\3.5.0.0__b77a5c561934e089\System.Core.dll -r "C:\Program Files\Microsoft Parallel Extensions Dec07 CTP\System.Threading.dll"</p></blockquote>
<p>The results are shown below:</p>
<pre class="console">
Matrices size: 100x100
Sequential...   00:00:00.1250000
Parallel...     00:00:00.1250000
Matrices size: 200x200
Sequential...   00:00:00.9218750
Parallel...     00:00:00.6093750
Matrices size: 300x300
Sequential...   00:00:03.1093750
Parallel...     00:00:01.9375000
Matrices size: 400x400
Sequential...   00:00:07.5000000
Parallel...     00:00:04.7343750
Matrices size: 500x500
Sequential...   00:00:15.1562500
Parallel...     00:00:09.3125000
Matrices size: 600x600
Sequential...   00:00:25.7031250
Parallel...     00:00:16.5312500
Matrices size: 700x700
Sequential...   00:00:41.9843750
Parallel...     00:00:26.4375000
Matrices size: 800x800
Sequential...   00:01:03.5781250
Parallel...     00:00:39.8281250
Matrices size: 900x900
Sequential...   00:01:32.1093750
Parallel...     00:00:57.3125000
Matrices size: 1000x1000
Sequential...   00:02:07.0468750
Parallel...     00:01:18.9687500
</pre>
<h3>Array Sorting</h3>
<p>First, I created a function, create_array, that creates and randomly initializes an array of doubles.</p>
<pre class="prettyprint">
let create_array size =
    let rnd = new Random()
    let arr = Array.create size 0.0
    for i = 0 to arr.Length-1 do
        arr.(i) <- rnd.NextDouble()
    arr
</pre>
<p>The sequential bubblesort implementation is quite straightforward, of course.</p>
<pre class="prettyprint">
let bubblesort_seq (arr : double array) =
    for i = 0 to arr.Length-1 do
        for j = 0 to arr.Length-1 do
            if (arr.(i).CompareTo(arr.(j)) < 0) then
                let temp = arr.(j)
                arr.(j) <- arr.(i)
                arr.(i) <- temp
    arr
</pre>
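<p>Parallelizing it, again, only means replacing the outer for loop with Parallel.For. Unlike the matrix multiplication, though, the iterations are not independent here: different values of i read and swap the same elements without any synchronization, so this version is not guaranteed to produce a correctly sorted array. It is used below only to compare execution times.</p>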
<pre class="prettyprint">
let bubblesort_parallel (arr : double array) =
    Parallel.For(0, arr.Length, (fun i ->
            for j = 0 to arr.Length-1 do
                if (arr.(i).CompareTo(arr.(j)) < 0) then
                    let temp = arr.(j)
                    arr.(j) <- arr.(i)
                    arr.(i) <- temp))

    arr
</pre>
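<p>Since all the parallel iterations work on the same array, it is worth checking the result of a run before trusting it. The following helpers are a sketch for a current F# compiler; isSorted and sortedCorrectly are hypothetical names, and the built-in Array.sort serves as the reference.</p>
<pre class="prettyprint">
// True when the elements are in non-decreasing order.
let isSorted (arr : float[]) =
    arr |> Array.pairwise |> Array.forall (fun (a, b) -> a <= b)

// True when result holds exactly the elements of original, in sorted order.
// Array.sort returns a new array and leaves original untouched.
let sortedCorrectly (original : float[]) (result : float[]) =
    Array.sort original = result

let original = [| 3.0; 1.0; 2.0 |]
let good = sortedCorrectly original [| 1.0; 2.0; 3.0 |]
let lostElement = sortedCorrectly original [| 1.0; 1.0; 3.0 |]
</pre>
<p>The second check matters because an array can be in order and still be wrong, for example when an unsynchronized swap duplicated one value and lost another. To use the helpers with an in-place sort, keep a copy of the input (Array.copy) before sorting.</p>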
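<p>And this is how the two functions were used. Note that bubblesort_seq sorts the array in place, so bubblesort_parallel is then timed on an array that is already sorted:</p>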
<pre class="prettyprint">
let main()=
    let step = 5000
    let size = ref step
    while (!size <= step*10) do
        let arr = create_array !size
        printfn "Array size: %d" arr.Length

        printf "Sequential...\t"
        let starts = DateTime.Now
        let arrs = bubblesort_seq arr
        printfn "%a" output_any (DateTime.Now - starts)

        printf "Parallel...\t"
        let startp = DateTime.Now
        let arrp = bubblesort_parallel arr
        printfn "%a" output_any (DateTime.Now - startp)

        size := !size + step
    done

main()
</pre>
<p>The output for the program is:</p>
<pre class="console">
Array size: 5000
Sequential...   00:00:00.2343750
Parallel...     00:00:00.1562500
Array size: 10000
Sequential...   00:00:00.8593750
Parallel...     00:00:00.5156250
Array size: 15000
Sequential...   00:00:01.9531250
Parallel...     00:00:01.1718750
Array size: 20000
Sequential...   00:00:03.3125000
Parallel...     00:00:02.1562500
Array size: 25000
Sequential...   00:00:05.4062500
Parallel...     00:00:03.5312500
Array size: 30000
Sequential...   00:00:07.4062500
Parallel...     00:00:05.0312500
Array size: 35000
Sequential...   00:00:10.6562500
Parallel...     00:00:06.8906250
Array size: 40000
Sequential...   00:00:13.2343750
Parallel...     00:00:08.9375000
Array size: 45000
Sequential...   00:00:17.6406250
Parallel...     00:00:11.4687500
Array size: 50000
Sequential...   00:00:20.8281250
Parallel...     00:00:14.2187500
</pre>
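<p>If you compare the sorting output with the one from C#, you'll notice that the times are smaller (about 21 seconds versus 53 seconds for the sequential sort of 50000 elements), although the matrix multiplication above was slower than in C#. For this sort, at least, F# looks faster than C#. Of course it can get even faster if I replace the call to CompareTo() with the < operator.</p>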
<pre class="prettyprint">
if (arr.(i).CompareTo(arr.(j)) < 0) then
</pre>
<pre class="prettyprint">
if (arr.(i) < arr.(j)) then
</pre>
<p>In this case the results look like this:</p>
<pre class="console">
Array size: 5000
Sequential...   00:00:00.1093750
Parallel...     00:00:00.1093750
Array size: 10000
Sequential...   00:00:00.4843750
Parallel...     00:00:00.2343750
Array size: 15000
Sequential...   00:00:01.1093750
Parallel...     00:00:00.4687500
Array size: 20000
Sequential...   00:00:01.9062500
Parallel...     00:00:00.8437500
Array size: 25000
Sequential...   00:00:03.0156250
Parallel...     00:00:01.2500000
Array size: 30000
Sequential...   00:00:04.3437500
Parallel...     00:00:01.8906250
Array size: 35000
Sequential...   00:00:05.9062500
Parallel...     00:00:02.4375000
Array size: 40000
Sequential...   00:00:07.7656250
Parallel...     00:00:03.3593750
Array size: 45000
Sequential...   00:00:09.8281250
Parallel...     00:00:04.0312500
Array size: 50000
Sequential...   00:00:12.2031250
Parallel...     00:00:05.2343750
</pre>
]]></content:encoded>
			<wfw:commentRss>http://mariusbancila.ro/blog/2008/04/30/parallelization-in-f/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ParallelFX Saves!</title>
		<link>http://mariusbancila.ro/blog/2008/04/29/parallelfx-saves/</link>
		<comments>http://mariusbancila.ro/blog/2008/04/29/parallelfx-saves/#comments</comments>
		<pubDate>Tue, 29 Apr 2008 12:12:38 +0000</pubDate>
		<dc:creator>Marius Bancila</dc:creator>
				<category><![CDATA[C#]]></category>
		<category><![CDATA[CTPs & Betas]]></category>
		<category><![CDATA[Parallel Programming]]></category>

		<guid isPermaLink="false">http://mariusbancila.ro/blog/?p=119</guid>
		<description><![CDATA[After the MVP Summit in Seattle, I started to dig into the Parallel FX framework (currently under a CTP available here). In just a few words, the framework is composed of: Task Parallel Library (TPL), that provides means to manage tasks (i.e. units of execution), and exposes a TPL API, represented by the static function [...]]]></description>
			<content:encoded><![CDATA[<p>After the MVP Summit in Seattle, I started to dig into the Parallel FX framework (currently under a CTP available <a href="http://www.microsoft.com/downloads/details.aspx?FamilyID=e848dc1d-5be3-4941-8705-024bc7f180ba&#038;displaylang=en" target="_blank">here</a>). In just a few words, the framework is composed of:</p>
<ul>
<li>Task Parallel Library (TPL), which provides means to manage tasks (i.e. units of execution) and exposes the TPL API, represented by the static functions in the Parallel class: For, ForEach and Do</li>
<li>Parallel LINQ (PLINQ), which allows parallelization of LINQ queries (AsParallel)</li>
</ul>
<p>Since I have a dual-core CPU (AMD Athlon 64 X2 Dual) I wanted to see how much performance I actually gain by parallelizing loops. So I benchmarked matrix multiplication and array sorting.</p>
<h3>Matrices Multiplication</h3>
<p>Having this Matrix class:</p>
<pre class="prettyprint">
class Matrix< T >
{
    private T[,] _values;

    public int Rows { get; private set; }
    public int Columns { get; private set; }

    public T this[int row, int column]
    {
        get { return _values[row, column]; }
        set { _values[row, column] = value; }
    }

    public Matrix(int rows, int columns)
    {
        if (rows < 1) throw new ArgumentOutOfRangeException("rows");
        if (columns < 1) throw new ArgumentOutOfRangeException("columns");
        Rows = rows;
        Columns = columns;
        _values = new T[rows, columns];
    }
}
</pre>
<p>I could write the multiplication routine like this:</p>
<pre class="prettyprint">
public static Matrix< double > MultiplySequential(Matrix< double> m1, Matrix< double > m2)
{
    if (m1.Columns != m2.Rows)
         throw new ArgumentException("incorrect matrix size", "m2");

    Matrix< double > result = new Matrix< double >(m1.Rows, m2.Columns);
    for (int i = 0; i < m1.Rows; i++)
    {
        for (int j = 0; j < m2.Columns; j++)
        {
            result[i, j] = 0;
            for (int k = 0; k < m1.Columns; k++)
            {
                result[i, j] += m1[i, k] * m2[k, j];
            }
         }
    }
    return result;
}
</pre>
<p>Modifying the function to use Parallel.For was straightforward:</p>
<pre class="prettyprint">
public static Matrix< double > MultiplyParallel(Matrix< double> m1, Matrix< double > m2)
{
    if (m1.Columns != m2.Rows)
        throw new ArgumentException("incorrect matrix size", "m2");

    Matrix< double > result = new Matrix< double >(m1.Rows, m2.Columns);
    Parallel.For(
        0,
        m1.Rows,
        i =>
        {
            for (int j = 0; j < m2.Columns; j++)
            {
                result[i, j] = 0;
                for (int k = 0; k < m1.Columns; k++)
                {
                    result[i, j] += m1[i, k] * m2[k, j];
                }
            }
        });

    return result;
}
</pre>
<p>I used matrices of various sizes, ranging from 100x100 to 1000x1000, and compared the execution times of the sequential and the parallel versions.</p>
<pre class="prettyprint">
for(int SIZE = 100; SIZE <= 1000; SIZE += 100)
{
    Console.WriteLine("Matrices size: {0}x{0}", SIZE);

    Matrix< double > m1 = MatrixOperations.GenerateRandom(SIZE, SIZE);
    Matrix< double > m2 = MatrixOperations.GenerateRandom(SIZE, SIZE);

    Console.Write("Sequential...\t");
    DateTime starts = DateTime.Now;
    Matrix< double > ms = MatrixOperations.MultiplySequential(m1, m2);
    Console.WriteLine("{0}", DateTime.Now - starts);

    Console.Write("Parallel...\t");
    DateTime startp = DateTime.Now;
    Matrix< double > mp = MatrixOperations.MultiplyParallel(m1, m2);
    Console.WriteLine("{0}", DateTime.Now - startp);
}
</pre>
<p>The results were:</p>
<pre class="console">
Matrices size: 100x100
Sequential...   00:00:00.0625000
Parallel...     00:00:00.0937500
Matrices size: 200x200
Sequential...   00:00:00.4531250
Parallel...     00:00:00.2343750
Matrices size: 300x300
Sequential...   00:00:01.4843750
Parallel...     00:00:00.7500000
Matrices size: 400x400
Sequential...   00:00:03.7812500
Parallel...     00:00:02.0625000
Matrices size: 500x500
Sequential...   00:00:07.8750000
Parallel...     00:00:04.1406250
Matrices size: 600x600
Sequential...   00:00:14.3437500
Parallel...     00:00:07.2656250
Matrices size: 700x700
Sequential...   00:00:23.4531250
Parallel...     00:00:11.7812500
Matrices size: 800x800
Sequential...   00:00:36.1250000
Parallel...     00:00:18.4218750
Matrices size: 900x900
Sequential...   00:00:51.5312500
Parallel...     00:00:25.8906250
Matrices size: 1000x1000
Sequential...   00:01:11.5156250
Parallel...     00:00:35.6875000
</pre>
<p>They show that, except for the smallest matrices, the parallel version took only about half the time of the sequential multiplication (remember that I have 2 cores). That means the execution time was reduced not by 20 or 30%, but by about 50%, which is basically the maximum possible on two cores.</p>
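<p>The gain can be put into numbers using the 1000x1000 timings from the output above (71.515625 and 35.6875 seconds). The sketch below is written in F#; speedup and efficiency are hypothetical helper names, and efficiency is simply the speedup divided by the number of cores.</p>
<pre class="prettyprint">
// How many times faster the parallel run was.
let speedup (sequentialSeconds : float) (parallelSeconds : float) =
    sequentialSeconds / parallelSeconds

// Fraction of the ideal speedup reached on the given number of cores.
let efficiency (cores : int) sequentialSeconds parallelSeconds =
    speedup sequentialSeconds parallelSeconds / float cores

// 1000x1000 matrices: 00:01:11.5156250 sequential, 00:00:35.6875000 parallel
let s = speedup 71.515625 35.6875
let e = efficiency 2 71.515625 35.6875
</pre>
<p>For these two values the speedup comes out at roughly 2, that is an efficiency of roughly 1 on two cores. A single pair of timings is noisy, so small deviations from the ideal in either direction should not be over-interpreted.</p>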
<h3>Array Sorting</h3>
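<p>For sorting the arrays I tested with the worst sorting algorithm: bubblesort. Keep in mind that in the parallel version the iterations are not independent: they swap elements of the same array without synchronization, so it is not guaranteed to produce a correctly sorted array and is used here only to compare execution times. Here is the implementation:</p>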
<pre class="prettyprint">
interface ISort< T >
{
    void SortSequential(T[] array);
    void SortParallel(T[] array);
}

class BubbleSort< T > : ISort< T >
    where T: IComparable
{
    public void SortSequential(T[] array)
    {
        for(int i = 0; i < array.Length; ++i)
        {
            for(int j = 0; j < array.Length; ++j)
            {
                if(array[i].CompareTo(array[j]) < 0)
                {
                    T aux = array[i];
                    array[i] = array[j];
                    array[j] = aux;
                }
            }
        }
    }

    public void SortParallel(T[] array)
    {
        Parallel.For(
            0,
            array.Length,
            i =>
            {
                for (int j = 0; j < array.Length; ++j)
                {
                    if (array[i].CompareTo(array[j]) < 0)
                    {
                        T aux = array[i];
                        array[i] = array[j];
                        array[j] = aux;
                    }
                }
            });
    }
}
</pre>
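<p>First, I tested that with small arrays, 100 to 2000 elements. Note that SortSequential sorts the array in place, so SortParallel is then timed on an array that is already sorted.</p>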
<pre class="prettyprint">
int STEP = 100;
for (int SIZE = STEP; SIZE <= STEP*20; SIZE += STEP)
{
    double[] array = GenerateRandomDoubles(SIZE);

    ISort< double > sorter = new BubbleSort< double >();

    Console.WriteLine("Array size: {0}", SIZE);

    Console.Write("Sequential...\t");
    DateTime starts = DateTime.Now;
    sorter.SortSequential(array);
    Console.WriteLine("{0}", DateTime.Now - starts);

    Console.Write("Parallel...\t");
    DateTime startp = DateTime.Now;
    sorter.SortParallel(array);
    Console.WriteLine("{0}", DateTime.Now - startp);
}
</pre>
<p>And the results were:</p>
<pre class="console">
Array size: 100
Sequential...   00:00:00
Parallel...     00:00:00.0781250
Array size: 200
Sequential...   00:00:00
Parallel...     00:00:00
Array size: 300
Sequential...   00:00:00
Parallel...     00:00:00
Array size: 400
Sequential...   00:00:00
Parallel...     00:00:00
Array size: 500
Sequential...   00:00:00.0156250
Parallel...     00:00:00
Array size: 600
Sequential...   00:00:00
Parallel...     00:00:00.0156250
Array size: 700
Sequential...   00:00:00
Parallel...     00:00:00.0156250
Array size: 800
Sequential...   00:00:00.0156250
Parallel...     00:00:00
Array size: 900
Sequential...   00:00:00.0156250
Parallel...     00:00:00.0156250
Array size: 1000
Sequential...   00:00:00.0312500
Parallel...     00:00:00.0156250
Array size: 1100
Sequential...   00:00:00.0156250
Parallel...     00:00:00.0156250
Array size: 1200
Sequential...   00:00:00.0312500
Parallel...     00:00:00.0312500
Array size: 1300
Sequential...   00:00:00.0312500
Parallel...     00:00:00.0156250
Array size: 1400
Sequential...   00:00:00.0468750
Parallel...     00:00:00.0312500
Array size: 1500
Sequential...   00:00:00.0468750
Parallel...     00:00:00.0312500
Array size: 1600
Sequential...   00:00:00.0468750
Parallel...     00:00:00.0312500
Array size: 1700
Sequential...   00:00:00.0625000
Parallel...     00:00:00.0468750
Array size: 1800
Sequential...   00:00:00.0625000
Parallel...     00:00:00.0468750
Array size: 1900
Sequential...   00:00:00.0781250
Parallel...     00:00:00.0468750
Array size: 2000
Sequential...   00:00:00.0937500
Parallel...     00:00:00.0468750
</pre>
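<p>All the timings above are multiples of 0.0156250 seconds, which suggests that they are limited by the resolution of DateTime.Now rather than by the work itself. For short runs, System.Diagnostics.Stopwatch is usually a better fit. Here is a minimal sketch in F#, where time is a hypothetical helper name:</p>
<pre class="prettyprint">
open System
open System.Diagnostics

// Run f once and return its result together with the elapsed time.
let time f =
    let sw = Stopwatch.StartNew()
    let result = f ()
    sw.Stop()
    result, sw.Elapsed

let sorted, elapsed = time (fun () -> Array.sort [| 3.0; 1.0; 2.0 |])
printfn "Sorted %d elements in %O" sorted.Length elapsed
</pre>
<p>For inputs this small a single run still says little; repeating the measurement several times and comparing the runs gives a more reliable picture.</p>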
<p>Then, I changed the for loop so that it creates arrays from 5000 to 50000 elements. The results were:</p>
<pre class="console">
Array size: 5000
Sequential...   00:00:00.5312500
Parallel...     00:00:00.4062500
Array size: 10000
Sequential...   00:00:02.0468750
Parallel...     00:00:01.2500000
Array size: 15000
Sequential...   00:00:04.5781250
Parallel...     00:00:02.7187500
Array size: 20000
Sequential...   00:00:08.1562500
Parallel...     00:00:04.7187500
Array size: 25000
Sequential...   00:00:12.7343750
Parallel...     00:00:07.5625000
Array size: 30000
Sequential...   00:00:18.5468750
Parallel...     00:00:10.6406250
Array size: 35000
Sequential...   00:00:25.4687500
Parallel...     00:00:14.5468750
Array size: 40000
Sequential...   00:00:33.6562500
Parallel...     00:00:18.9531250
Array size: 45000
Sequential...   00:00:42.7968750
Parallel...     00:00:23.8906250
Array size: 50000
Sequential...   00:00:52.9218750
Parallel...     00:00:29.4687500
</pre>
<p>The conclusion is that the parallel sorting is not efficient when the array size is less than 1000 elements; in other words there is no performance gain, and perhaps even a performance loss, due to the overhead of creating and executing tasks and of managing the context switching. The same conclusion is expressed in the framework's documentation:</p>
<p><cite>Target areas of your program where algorithms are computationally expensive and/or data sets are large (e.g. > 1000 elements). In such cases, there will likely be benefit to using parallelism.</cite></p>
<p>That means that when you parallelize an algorithm, you should take into consideration the size of the data structures it operates on, and unless a given threshold is reached, the non-parallelized version should be used.</p>
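<p>Such a threshold can be built directly into the routine. The sketch below is in F# and uses Array.Parallel.map from the F# core library; the name mapAdaptive and the threshold of 1000 are assumptions made for the illustration (the value is taken from the documentation quote above), and the right value should be found by measuring on the target machine.</p>
<pre class="prettyprint">
// Assumed cut-off below which parallel overhead is not worth paying.
let threshold = 1000

// Apply f to every element, in parallel only when the input is large enough.
let mapAdaptive (f : float -> float) (input : float[]) =
    if input.Length < threshold then
        Array.map f input            // small input: plain sequential map
    else
        Array.Parallel.map f input   // large input: spread the work across cores

let small = mapAdaptive (fun x -> x * 2.0) [| 1.0; 2.0; 3.0 |]
let large = mapAdaptive sqrt (Array.init 5000 float)
</pre>
<p>Both branches return the same values for a function without side effects, so callers do not need to know which one was taken.</p>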
]]></content:encoded>
			<wfw:commentRss>http://mariusbancila.ro/blog/2008/04/29/parallelfx-saves/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
