One of the new features in .NET 4.0 is the BigInteger class. This was suppose to be part of 3.5 framework, but after the CTP it was gone. Now it’s back. This big integer is a really large integer representation. However, I was not able to find the precision documented anywhere.

The class is available in the System.Numerics namespace and provides overloaded operators, parsing methods and other utility features for bit integer numbers.

Not very long ago, on codexpert.ro we launched a contest for finding the biggest pair of amicable numbers. One of the members (crystyce) that submited a solution found this big pair (2724918040393706557785752240819405848576, 2724918040396184856306258038787235905536) with a C++ implementation using a 3rd party BigInt class. I decided to put that in C# 4.0 and use BigInteger. Maybe you can find things to optimize in the below code, but that’s not the point here. I’m showing you this code just to get a feeling of BitInteger.

using System.Numerics;

namespace cs_test
{
   static class Amicable
   {
      public static BigInteger Pow2(BigInteger exp)
      {
         BigInteger res = 1;

         for (BigInteger i = 0; i < exp; i++)
            res *= 2;

         return res;
      }

      public static BigInteger ExpMod(BigInteger value, BigInteger exp, BigInteger mod)
      {
         BigInteger ullResult = 1;
         BigInteger ullValue = value;

         while (exp > 0)
         {
            if (exp % 2 != 0)
            { // odd
               ullResult *= ullValue;
               ullResult %= mod;
            }

            ullValue *= ullValue;
            ullValue %= mod;
            exp /= 2;
         }

         return (ullResult);
      }

      static public bool IsMillerRabinPrime(BigInteger prime)
      {
         // randomWitness = witness that the "prime" is NOT composite
         // 1 < randomWitness < prime-1
         long totalWitness = 15;
         BigInteger [] randomWitness = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 51001, 351011};
         BigInteger primeMinusOne = prime - 1;
         BigInteger oddMultiplier;
         long bitLength;
         long i, j;
         BigInteger result;
         // find oddMultiplier, defined as "prime - 1 = (2^B) * M"
         // get bitLength (B) and find the oddMultiplier (M)

         // init value multiplier
         oddMultiplier = primeMinusOne;

         bitLength = 0; // reset
         while(oddMultiplier % 2 == 0)
         {
            oddMultiplier /= 2;
            bitLength++;
         }
         for(i = 0; i < totalWitness; i++)
         {
            if (randomWitness[i] == prime)
               return true;

            // init value of result = (randomWitness ^ oddMultiplier) mod prime
            result = ExpMod(randomWitness[i], oddMultiplier, prime);

            // is y = 1 ?
            if(result == 1)
               continue; // maybe prime

            // is y = prime-1 ?
            if(result == primeMinusOne)
               continue; // maybe prime

            // loop to get AT LEAST one "result = primeMinusOne"
            for(j = 1; j <= bitLength; j++)
            {
               // square of result
               result = ExpMod(result, 2, prime);

               // is result = primeMinusOne ?
               if(result == primeMinusOne)
                  break; // maybe prime
            }

            if(result != primeMinusOne)
               return false; // COMPOSITE
         }

         // it may be pseudoprime/prime
         return true;
      }

   }

   class Program
   {
      const int STARTM = 1;
      const int STOPM = 100;
      const int STOPN = 30;

      static void Main(string[] args)
      {
         DateTime start = DateTime.Now;

         for (BigInteger m = STARTM ; m <= STOPM ; m++)
         {
            for (BigInteger n = m+1 ; n <= m+STOPN ; n++)
            {
               BigInteger p = Amicable.Pow2(n) + Amicable.Pow2(m) - 1;
               if (!Amicable.IsMillerRabinPrime(p)) continue;

               BigInteger q = Amicable.Pow2(2 * n - m) + Amicable.Pow2(n) - 1;
               if (!Amicable.IsMillerRabinPrime(q)) continue;

               BigInteger r = Amicable.Pow2(3 * n - m) +
                              Amicable.Pow2(n + m) +
                              Amicable.Pow2(2 * n + 1) - 1;
               if (!Amicable.IsMillerRabinPrime(r)) continue;

               BigInteger f1 = Amicable.Pow2(n) * p * q;
               BigInteger f2 = Amicable.Pow2(n) * r;

               DateTime end = DateTime.Now;
               Console.WriteLine(
                  "{2} ... {0}, {1}",
                  f1,
                  f2,
                  (end - start).Milliseconds / 1000.0);
            }
         }
      }
   }
}
0.017 ... 220, 284
0.021 ... 2172649216, 2181168896
0.028 ... 17296, 18416
0.040 ... 9363584, 9437056
0.138 ... 2.7249180403937065577857522408E+39, 2.7249180403961848563062580388E+39

As you can see, it was very fast finding the same big pair. Well it was at least on my 2 core machine.

The only thing I don't understand about the implementation is the presence of some helper functions in the BigInteger class.

public static BigInteger Abs(BigInteger value);
public static BigInteger Add(BigInteger left, BigInteger right);
public static BigInteger GreatestCommonDivisor(BigInteger left, BigInteger right);
public static BigInteger Divide(BigInteger dividend, BigInteger divisor);
public static BigInteger DivRem(BigInteger dividend, BigInteger divisor, out BigInteger remainder);
public static double Log(BigInteger value);
public static double Log(BigInteger value, double baseValue);
public static double Log10(BigInteger value);
public static BigInteger Max(BigInteger left, BigInteger right);
public static BigInteger Min(BigInteger left, BigInteger right);
public static BigInteger ModPow(BigInteger value, BigInteger exponent, BigInteger modulus);
public static BigInteger Multiply(BigInteger left, BigInteger right);
public static BigInteger Negate(BigInteger value);
public static BigInteger Pow(BigInteger value, int exponent);
public static BigInteger Remainder(BigInteger dividend, BigInteger divisor);
public static BigInteger Subtract(BigInteger left, BigInteger right);

That belongs to System.Math, or maybe to a System.Numerics.Math class. They are all helper functions, not specific to the implementation of the big integer; and the implementation does not depend on the existing of these functions. So they are clearly utility functions. They should go somewhere else. I have submitted this as feedback to Microsoft Connect. If you agree with it, vote for it here. Of course, if you have arguments for the current implementation, you can vote against it too.

, , , Hits for this post: 11482 .

So far, C#, unlike C++, did not support optional arguments. For instance, suppose you need a function to print log a message, that can add a new line or not after writing the message. Most of the times you want a new line, so you don’t want to specify that for most of the calls. Until now, the only possibility was using overloaded functions, with different parameters.

class Logger
{
   public void LogMessage(string error, bool addNewLine)
   {
      Console.Write(error);
      if (addNewLine) Console.WriteLine();
   }

   public void LogMessage(string error)
   {
      LogMessage(error, true);
   }
};

class Program
{
   static void Main(string[] args)
   {
      Logger log = new Logger();
      log.LogMessage("trying to connect...", false);
      // do more stuff here
      log.LogMessage("SUCCESS");
   }
}
trying to connect...SUCCESS
Press any key to continue . . .

C# 4.0, that will be released with Visual Studio 2010, and is for now available with the Visual Studio 2010 CTP, supports, just like C++, optional arguments. So instead of using overloaded functions, you can specify a default value for a parameter. When doing the call, if you don’t provide a value for the argument, the default one will be used.

The Logger class below is identical from the functionality point of view with the one above. (The previous Main() function doesn’t have to change.)

class Logger
{
   public void LogMessage(string error, bool addNewLine = true)
   {
      Console.Write(error);
      if (addNewLine) Console.WriteLine();
   }
};

The only restriction is that optional parameters have to appear in the function after all required parameters. In other words, this is not legal:

class foo
{
   public void func(int a, int b = 0, int c)
   {
   }
}
error CS1737: Optional parameters must appear after all required parameters

Considering this implementation of foo

class foo
{
   public void func(int a, int b = 0, int c = 1)
   {
   }
}

the following calls can be made:

var f = new foo();
f.func(42, 3, 4);          // f.func(42, 3, 4)
f.func(42);                // f.func(42, 0, 1)
f.func(42, 100);           // f.func(42, 100, 1)

But this is not all. C# 4.0 brings another feature: named parameters (and this does not exist in C++). It means that when you make a call, you can specify an argument by its name, not by position. In this case you use the parameter’s name followed by ‘:’ and the value. Here are several examples:

var f = new foo();
f.func(42, c: 4);          // f.func(42, 0, 4)
f.func(42, c: 4, b: 3);    // f.func(42, 3, 4)
f.func(c: 3, a: 1, b: 2);  // f.func(1, 2, 3)

Of course, this can be used regardless the function has optional parameters or not. However, I would not use this feature for anything else than specifying an optional parameter when I would have to first suply values for other several optional parameters. In other words:

var f = new foo();
f.func(42, c: 4);          // this is good, skip b, provide value for c
f.func(42, c: 4, b: 3);    // don't like this, rather call f.func(42, 3, 4)
f.func(c: 3, a: 1, b: 2);  // don't like this, rather call f.func(1, 2, 3)

, , , , , Hits for this post: 19282 .

The new C++0x standard provides support for type inference. The auto keyword that was doing nothing in C++ was given a new meaning: a placeholder for a type inferred by the compiler. For those familiar with C#’s var keyword, this is basically the same.

Here is a comparison between auto in C++ and var in C#.

C++ C#
int _tmain(int argc, _TCHAR* argv[])
{
   auto i = 42;
   auto s = "Hello world!";
   auto d = 12.50;
   auto l = [](int n) {return n * n;};

   cout << "i = " << i << endl;
   cout << "s = " << s << endl;
   cout << "d = " << d << endl;
   cout << "l(i) = " << l(i) << endl;

   return 0;
}
class Program
{
   static void Main(string[] args)
   {
      var i = 42;
      var s = "hello world";
      var d = 12.50;
      //var l = (n) => n * n; // error CS0815

      Console.WriteLine("i = {0}", i);
      Console.WriteLine("s = {0}", s);
      Console.WriteLine("d = {0}", d);
   }
}

with the output

C++ C#
i = 42
s = Hello world!
d = 12.5
l(i) = 1764
i = 42
s = Hello world!
d = 12.5

As you can see, the use is very similar, except that in C++ you can use auto for a lambda expression, while in C# you cannot. The C# compiler raises an error, CS0815, when you try to use var with an anonymous function expressions, a lambda expression, a method group expressions, or the null literal expression, because these don't have a type.

Just like with var, you can only use auto locally, and not as return type from a function, or parameter to a function.

auto power(int n)
{
   return n * n;
}
error C3532: 'auto (int)': incorrect usage of 'auto'
int power(auto n)
{
   return n * n;
}
error C3533: 'auto': a parameter cannot have a type that contains 'auto'

While it might not be of that much help for ints or strings, using auto for instead of vector< int >::iterator, or map< string, list< string >>::const_iterator comes in handy. The auto type inference helps us write less verbose code.

vector< string > words;
words.push_back("hello");
words.push_back("world");

for(auto it = words.begin(); it != words.end(); ++it)
   cout << *it << endl;

In addition to auto, the C++0x standard introduces a new keyword, called decltype (think of it as 'typeof'), that can be used to determine the type of an expression at compile time.

auto i = 42;
decltype(i) i2 = 44;

However, this new keyword was not yet introduced in Visual Studio 2010 CTP. According to Stephan T. Lavavej from the VC++ team, it might be possible to be introduced in a next CTP or beta version.

Implementing auto doesn't implement decltype (C++0x's name for what was previously called typeof) for free.

Paraphrasing our libraries and compiler front-end program manager Damien Watkins, the CTP is only the first look at our VC10 functionality and there is definitely more to come. We understand that certain features "go together", and that adding complementary features is a good thing in general. As always, customer feedback is a vital resource in forming our plans.

, , , , , , , , Hits for this post: 23540 .

In my previous post I have provided some background theory for evaluating expressions with abstract syntax trees. As I was mentioning, the first step towards this goal is to parse the expression, make sure it is correct syntactically. This is what I’ll show you in this post.

Having the grammar defined, we’ll make one function for each non-terminal symbol (EXP, EXP1, TERM, TERM1, FACTOR).

Simply put the code will look like this:

   void Expression()
   {
      Term();
      Expression1();
   }

   void Expression1()
   {
      switch(current_token)
      {
      case '+':
         GetNextToken();
         Term();
         Expression1();
         break;

      case '-':
         GetNextToken();
         Term();
         Expression1();
         break;
      }
   }

However, I want to make it a little bit more organized, so the first thing to do will be defining a Token structure that will indicate the type of last extracted token and if the case its value (for numbers). A token is basically a symbol extracted (one at a time) from the input text. The possible tokens will be the arithmetical operators (‘+’, ‘-’, ‘/’, ‘*’), the parentheses (‘(‘ and ‘)’), numbers and the end of the text.

Here is how I defined the token type and the token:

enum TokenType
{
   Error,
   Plus,
   Minus,
   Mul,
   Div,
   EndOfText,
   OpenParenthesis,
   ClosedParenthesis,
   Number
};

struct Token
{
   TokenType	Type;
   double		Value;
   char		Symbol;

   Token():Type(Error), Value(0), Symbol(0)
   {}
};

To be able to do the parsing, we’ll need some helper functions:

  • SkipWhitespaces(), skips all whitespaces between two tokens:
       void SkipWhitespaces()
       {
          while(isspace(m_Text[m_Index])) m_Index++;
       }
    
  • GetNextToken(), extracts the next token from the text; if an illegal token appears it throws an exception
       void GetNextToken()
       {
          // ignore white spaces
          SkipWhitespaces();
    
          m_crtToken.Value = 0;
          m_crtToken.Symbol = 0;
    
          // test for the end of text
          if(m_Text[m_Index] == 0)
          {
             m_crtToken.Type = EndOfText;
             return;
          }
    
          // if the current character is a digit read a number
          if(isdigit(m_Text[m_Index]))
          {
             m_crtToken.Type = Number;
             m_crtToken.Value = GetNumber();
             return;
          }
    
          m_crtToken.Type = Error;
    
          // check if the current character is an operator or parentheses
          switch(m_Text[m_Index])
          {
          case '+': m_crtToken.Type = Plus; break;
          case '-': m_crtToken.Type = Minus; break;
          case '*': m_crtToken.Type = Mul; break;
          case '/': m_crtToken.Type = Div; break;
          case '(': m_crtToken.Type = OpenParenthesis; break;
          case ')': m_crtToken.Type = ClosedParenthesis; break;
          }
    
          if(m_crtToken.Type != Error)
          {
             m_crtToken.Symbol = m_Text[m_Index];
             m_Index++;
          }
          else
          {
             std::stringstream sstr;
             sstr << "Unexpected token '" << m_Text[m_Index] << "' at position " << m_Index;
             throw ParserException(sstr.str(), m_Index);
          }
       }
    
  • GetNumber() extracts a number from the input text from the current position; the purpose of this tutorial is didactical, so this function is quite simple: it reads integers and doubles with '.' As the decimal point; it doesn't read numbers in a format like 123.3E+2.
       double GetNumber()
       {
          SkipWhitespaces();
    
          int index = m_Index;
          while(isdigit(m_Text[m_Index])) m_Index++;
          if(m_Text[m_Index] == '.') m_Index++;
          while(isdigit(m_Text[m_Index])) m_Index++;
    
          if(m_Index - index == 0)
             throw ParserException("Number expected but not found!", m_Index);
    
          char buffer[32] = {0};
          memcpy(buffer, &m_Text[index], m_Index - index);
    
          return atof(buffer);
       }
    

With these defined, we can build the parser for the specified grammar.

class Parser
{
   Token m_crtToken;
   const char* m_Text;
   size_t m_Index;

private:

   void Expression()
   {
      Term();
      Expression1();
   }

   void Expression1()
   {
      switch(m_crtToken.Type)
      {
      case Plus:
         GetNextToken();
         Term();
         Expression1();
         break;

      case Minus:
         GetNextToken();
         Term();
         Expression1();
         break;
      }
   }

   void Term()
   {
      Factor();
      Term1();
   }

   void Term1()
   {
      switch(m_crtToken.Type)
      {
      case Mul:
         GetNextToken();
         Factor();
         Term1();
         break;

      case Div:
         GetNextToken();
         Factor();
         Term1();
         break;
      }
   }

   void Factor()
   {
      switch(m_crtToken.Type)
      {
      case OpenParenthesis:
         GetNextToken();
         Expression();
         Match(')');
         break;

      case Minus:
         GetNextToken();
         Factor();
         break;

      case Number:
         GetNextToken();
         break;

      default:
         {
            std::stringstream sstr;
            sstr << "Unexpected token '" << m_crtToken.Symbol << "' at position " << m_Index;
            throw ParserException(sstr.str(), m_Index);
         }
      }
   }

   void Match(char expected)
   {
      if(m_Text[m_Index-1] == expected)
         GetNextToken();
      else
      {
         std::stringstream sstr;
         sstr << "Expected token '" << expected << "' at position " << m_Index;
         throw ParserException(sstr.str(), m_Index);
      }
   }

   void SkipWhitespaces()
   {
      while(isspace(m_Text[m_Index])) m_Index++;
   }

   void GetNextToken()
   {
      // ignore white spaces
      SkipWhitespaces();

      m_crtToken.Value = 0;
      m_crtToken.Symbol = 0;

      // test for the end of text
      if(m_Text[m_Index] == 0)
      {
         m_crtToken.Type = EndOfText;
         return;
      }

      // if the current character is a digit read a number
      if(isdigit(m_Text[m_Index]))
      {
         m_crtToken.Type = Number;
         m_crtToken.Value = GetNumber();
         return;
      }

      m_crtToken.Type = Error;

      // check if the current character is an operator or parentheses
      switch(m_Text[m_Index])
      {
      case '+': m_crtToken.Type = Plus; break;
      case '-': m_crtToken.Type = Minus; break;
      case '*': m_crtToken.Type = Mul; break;
      case '/': m_crtToken.Type = Div; break;
      case '(': m_crtToken.Type = OpenParenthesis; break;
      case ')': m_crtToken.Type = ClosedParenthesis; break;
      }

      if(m_crtToken.Type != Error)
      {
         m_crtToken.Symbol = m_Text[m_Index];
         m_Index++;
      }
      else
      {
         std::stringstream sstr;
         sstr << "Unexpected token '" << m_Text[m_Index] << "' at position " << m_Index;
         throw ParserException(sstr.str(), m_Index);
      }
   }

   double GetNumber()
   {
      SkipWhitespaces();

      int index = m_Index;
      while(isdigit(m_Text[m_Index])) m_Index++;
      if(m_Text[m_Index] == '.') m_Index++;
      while(isdigit(m_Text[m_Index])) m_Index++;

      if(m_Index - index == 0)
         throw ParserException("Number expected but not found!", m_Index);

      char buffer[32] = {0};
      memcpy(buffer, &m_Text[index], m_Index - index);

      return atof(buffer);
   }

public:
   void Parse(const char* text)
   {
      m_Text = text;
      m_Index = 0;
      GetNextToken();

      Expression();
   }
};

The exception class is defined like this:

class ParserException : public std::exception
{
   int m_Pos;

public:
   ParserException(const std::string& message, int pos):
      std::exception(message.c_str()),
      m_Pos(pos)
   {
   }
};

As you can see, the code for the grammar production is quite simple and straight forward. Now, let's put it to the test.

void Test(const char* text)
{
   Parser parser;
   try
   {
      parser.Parse(text);
      std::cout << """ << text << ""t OK" << std::endl;
   }
   catch(ParserException& ex)
   {
      std::cout << """ << text << ""t " << ex.what() << std::endl;
   }
}

int main()
{
   Test("1+2+3+4");
   Test("1*2*3*4");
   Test("1-2-3-4");
   Test("1/2/3/4");
   Test("1*2+3*4");
   Test("1+2*3+4");
   Test("(1+2)*(3+4)");
   Test("1+(2*3)*(4+5)");
   Test("1+(2*3)/4+5");
   Test("5/(4+3)/2");
   Test("1 + 2.5");
   Test("125");
   Test("-1");
   Test("-1+(-2)");
   Test("-1+(-2.0)");

   Test("   1*2,5");
   Test("   1*2.5e2");
   Test("M1 + 2.5");
   Test("1 + 2&5");
   Test("1 * 2.5.6");
   Test("1 ** 2.5");
   Test("*1 / 2.5");

   return 0;
}

The output for this testing program is:

"1+2+3+4"        OK
"1*2*3*4"        OK
"1-2-3-4"        OK
"1/2/3/4"        OK
"1*2+3*4"        OK
"1+2*3+4"        OK
"(1+2)*(3+4)"    OK
"1+(2*3)*(4+5)"  OK
"1+(2*3)/4+5"    OK
"5/(4+3)/2"      OK
"1 + 2.5"        OK
"125"    OK
"-1"     OK
"-1+(-2)"        OK
"-1+(-2.0)"      OK
"   1*2,5"       Unexpected token ',' at position 6
"   1*2.5e2"     Unexpected token 'e' at position 8
"M1 + 2.5"       Unexpected token 'M' at position 0
"1 + 2&5"        Unexpected token '&' at position 5
"1 * 2.5.6"      Unexpected token '.' at position 7
"1 ** 2.5"       Unexpected token '*' at position 4
"*1 / 2.5"       Unexpected token '*' at position 1

Which is exactly what we expected: it validates correct expressions and throws an exception when the exception is incorrect.

In the next post I'll show how to modify this code to build an abstract syntax tree.

, , , , , , , Hits for this post: 23696 .

This is the record number of error I ever got in a project: 50917.

After compiling that VC# project (with T4), Visual Studio 2008 was a little bit overwhelmed. It started to report out of memory errors and failed to build after fixing the error. A restart was, of course, the cure.

, Hits for this post: 14445 .

A few days ago Anders Hejlsberg delivered a presentation about the Future of C# at the Professional Developers Conveference 2008. He focused on the new things that will be available in C# 4.0:

  • dynamic typed objects
  • optional and named parameters
  • improved COM interoperability
  • co- and contra-variance

The recording of the presentation is available on Channel9 and worth watching.

A walk-through of these new features, together with several samples for dynamic typed objects, and simple variance, are also available at MSDN.

If you want to try them you can download the VS 2010 September CTP. Notice that it is available only as a VM image.

Hits for this post: 10429 .

After the MVP Summit in Seattle, I started to dig into the Parallel FX framework (currently under a CTP available here). In just a few words, the framework is composed of:

  • Task Parallel Library (TPL), that provides means to manage task (i.e. units of execution), and exposes a
  • TPL API, represented by the static function in the Parallel class: For, ForEach and Do
  • Parallel LINQ (PLINQ): allows parallelization of LINQ queries (AsParallel)

Since I have a 2 codes CPU (AMP Athlon 64 X2 Dual) I wanted to see how much performance do I actually gain by parallelizing loops. So I benchmarked matrices multiplication and arrays sorting.

Matrices Multiplication

Having this Matrix class:

class Matrix< T >
{
    private T[,] _values;

    public int Rows { get; private set; }
    public int Columns { get; private set; }

    public T this[int row, int column]
    {
        get { return _values[row, column]; }
        set { _values[row, column] = value; }
    }

    public Matrix(int rows, int columns)
    {
        if (rows < 1) throw new ArgumentOutOfRangeException("rows");
        if (columns < 1) throw new ArgumentOutOfRangeException("columns");
        Rows = rows;
        Columns = columns;
        _values = new T[rows, columns];
    }
}

I could write the multiplication routine like this:

public static Matrix< double > MultiplySequential(Matrix< double> m1, Matrix< double > m2)
{
    if (m1.Columns != m2.Rows)
         throw new ArgumentException("incorrect matrix size", "m2");

    Matrix< double > result = new Matrix< double >(m1.Rows, m2.Columns);
    for (int i = 0; i < m1.Rows; i++)
    {
        for (int j = 0; j < m2.Columns; j++)
        {
            result[i, j] = 0;
            for (int k = 0; k < m1.Columns; k++)
            {
                result[i, j] += m1[i, k] * m2[k, j];
            }
         }
    }
    return result;
}

Modifying the function to use Parallel.For was pretty simple, since this is very straight forward:

public static Matrix< double > MultiplyParallel(Matrix< double> m1, Matrix< double > m2)
{
    if (m1.Columns != m2.Rows)
        throw new ArgumentException("incorrect matrix size", "m2");

    Matrix< double > result = new Matrix< double >(m1.Rows, m2.Columns);
    Parallel.For(
        0,
        m1.Rows,
        i =>
        {
            for (int j = 0; j < m2.Columns; j++)
            {
                result[i, j] = 0;
                for (int k = 0; k < m1.Columns; k++)
                {
                    result[i, j] += m1[i, k] * m2[k, j];
                }
            }
        });

    return result;
}

I used matrices of various sizes, started from 100x100 to 1000x1000 and compared the execution time for the sequential version or the parallel version.

for(int SIZE = 100; SIZE <= 1000; SIZE += 100)
{
    Console.WriteLine("Matrices size: {0}x{0}", SIZE);

    Matrix< double > m1 = MatrixOperations.GenerateRandom(SIZE, SIZE);
    Matrix< double > m2 = MatrixOperations.GenerateRandom(SIZE, SIZE);

    Console.Write("Sequential...\t");
    DateTime starts = DateTime.Now;
    Matrix< double > ms = MatrixOperations.MultiplySequential(m1, m2);
    Console.WriteLine("{0}", DateTime.Now - starts);

    Console.Write("Parallel...\t");
    DateTime startp = DateTime.Now;
    Matrix< double > mp = MatrixOperations.MultiplyParallel(m1, m2);
    Console.WriteLine("{0}", DateTime.Now - startp);
}

The results were:

Matrices size: 100x100
Sequential...   00:00:00.0625000
Parallel...     00:00:00.0937500
Matrices size: 200x200
Sequential...   00:00:00.4531250
Parallel...     00:00:00.2343750
Matrices size: 300x300
Sequential...   00:00:01.4843750
Parallel...     00:00:00.7500000
Matrices size: 400x400
Sequential...   00:00:03.7812500
Parallel...     00:00:02.0625000
Matrices size: 500x500
Sequential...   00:00:07.8750000
Parallel...     00:00:04.1406250
Matrices size: 600x600
Sequential...   00:00:14.3437500
Parallel...     00:00:07.2656250
Matrices size: 700x700
Sequential...   00:00:23.4531250
Parallel...     00:00:11.7812500
Matrices size: 800x800
Sequential...   00:00:36.1250000
Parallel...     00:00:18.4218750
Matrices size: 900x900
Sequential...   00:00:51.5312500
Parallel...     00:00:25.8906250
Matrices size: 1000x1000
Sequential...   00:01:11.5156250
Parallel...     00:00:35.6875000

They show that the time for executing the parallel version was only half the time for the sequential multiplication (remember that I have 2 cores). That means reducing the execution time, not 20 or 30%, but 50%, basically the maximum possible.

Array Sorting

For sorting the arrays I tested with the wost sorting algorithm: bubblesort. Here is the implementation:

interface ISort< T >
{
    void SortSequential(T[] array);
    void SortParallel(T[] array);
}

class BubbleSort< T > : ISort< T >
    where T: IComparable
{
    public void SortSequential(T[] array)
    {
        for(int i = 0; i < array.Length; ++i)
        {
            for(int j = 0; j < array.Length; ++j)
            {
                if(array[i].CompareTo(array[j]) < 0)
                {
                    T aux = array[i];
                    array[i] = array[j];
                    array[j] = aux;
                }
            }
        }
    }

    public void SortParallel(T[] array)
    {
        Parallel.For(
            0,
            array.Length,
            i =>
            {
                for (int j = 0; j < array.Length; ++j)
                {
                    if (array[i].CompareTo(array[j]) < 0)
                    {
                        T aux = array[i];
                        array[i] = array[j];
                        array[j] = aux;
                    }
                }
            });
    }
}

First, I tested that with small array, 100 to 1000 elements.

int STEP = 100;
for (int SIZE = STEP; SIZE <= STEP*20; SIZE += STEP)
{
    double[] array = GenerateRandomDoubles(SIZE);

    ISort< double > sorter = new BubbleSort< double >();

    Console.WriteLine("Array size: {0}", SIZE);

    Console.Write("Sequential...\t");
    DateTime starts = DateTime.Now;
    sorter.SortSequential(array);
    Console.WriteLine("{0}", DateTime.Now - starts);

    Console.Write("Parallel...\t");
    DateTime startp = DateTime.Now;
    sorter.SortParallel(array);
    Console.WriteLine("{0}", DateTime.Now - startp);
}

And the results were:

Array size: 100
Sequential...   00:00:00
Parallel...     00:00:00.0781250
Array size: 200
Sequential...   00:00:00
Parallel...     00:00:00
Array size: 300
Sequential...   00:00:00
Parallel...     00:00:00
Array size: 400
Sequential...   00:00:00
Parallel...     00:00:00
Array size: 500
Sequential...   00:00:00.0156250
Parallel...     00:00:00
Array size: 600
Sequential...   00:00:00
Parallel...     00:00:00.0156250
Array size: 700
Sequential...   00:00:00
Parallel...     00:00:00.0156250
Array size: 800
Sequential...   00:00:00.0156250
Parallel...     00:00:00
Array size: 900
Sequential...   00:00:00.0156250
Parallel...     00:00:00.0156250
Array size: 1000
Sequential...   00:00:00.0312500
Parallel...     00:00:00.0156250
Array size: 1100
Sequential...   00:00:00.0156250
Parallel...     00:00:00.0156250
Array size: 1200
Sequential...   00:00:00.0312500
Parallel...     00:00:00.0312500
Array size: 1300
Sequential...   00:00:00.0312500
Parallel...     00:00:00.0156250
Array size: 1400
Sequential...   00:00:00.0468750
Parallel...     00:00:00.0312500
Array size: 1500
Sequential...   00:00:00.0468750
Parallel...     00:00:00.0312500
Array size: 1600
Sequential...   00:00:00.0468750
Parallel...     00:00:00.0312500
Array size: 1700
Sequential...   00:00:00.0625000
Parallel...     00:00:00.0468750
Array size: 1800
Sequential...   00:00:00.0625000
Parallel...     00:00:00.0468750
Array size: 1900
Sequential...   00:00:00.0781250
Parallel...     00:00:00.0468750
Array size: 2000
Sequential...   00:00:00.0937500
Parallel...     00:00:00.0468750

Then, I changed the for loop, so that it creates array from 5000 to 50000 elements. The results were:

Array size: 5000
Sequential...   00:00:00.5312500
Parallel...     00:00:00.4062500
Array size: 10000
Sequential...   00:00:02.0468750
Parallel...     00:00:01.2500000
Array size: 15000
Sequential...   00:00:04.5781250
Parallel...     00:00:02.7187500
Array size: 20000
Sequential...   00:00:08.1562500
Parallel...     00:00:04.7187500
Array size: 25000
Sequential...   00:00:12.7343750
Parallel...     00:00:07.5625000
Array size: 30000
Sequential...   00:00:18.5468750
Parallel...     00:00:10.6406250
Array size: 35000
Sequential...   00:00:25.4687500
Parallel...     00:00:14.5468750
Array size: 40000
Sequential...   00:00:33.6562500
Parallel...     00:00:18.9531250
Array size: 45000
Sequential...   00:00:42.7968750
Parallel...     00:00:23.8906250
Array size: 50000
Sequential...   00:00:52.9218750
Parallel...     00:00:29.4687500

The conclusion is that the parallel sorting is not efficient when the array size is less than 1000 elements; in other words there is no performance gain, but perhaps performance lost due to overhead of creating tasks and executing them and managing the context switching. The same conclusion is expressed in the framework's documentation:

Target areas of your program where algorithms are computationally expensive and/or data sets are large (e.g. > 1000 elements). In such cases, there will likely be benefit to using parallelism.

That means, that when you parallelize an algorithm, you should take into consideration the size of the data structures it performs on, and unless a given threshold is not reached, a non-parallelized version should be used.

Hits for this post: 12991 .

There are two way of converting strings to numerical values in .NET. The first one, supported since the first version of the platform, is the use of System.Convert class. It has methods like ToInt32, ToChar, ToDouble, ToDataTime, ToDecimal, etc.

Here is an example of converting string “123″ to numerical value 123.

string text = "123";
int number = Convert.ToInt32(text);
Console.WriteLine(number);

The catch with this method (and all the others from Convert) is that an exception is thrown, if the conversion failed.

 Unhandled Exception: System.FormatException: Input string was not in a correct f
 ormat.
    at System.Number.StringToNumber(String str, NumberStyles options, NumberBuffer&
        number, NumberFormatInfo info, Boolean parseDecimal)
    at System.Number.ParseInt32(String s, NumberStyles style, NumberFormatInfo info)
    at System.Convert.ToInt32(String value)
    at vs2008_cs_test.Program.Main(String[] args) in [...]\Program.cs:line 14

The correct use of ToIn32 is:

string text = "123s";
try
{
    int number = Convert.ToInt32(text);
    Console.WriteLine(number);
}
catch(FormatException e1)
{
    Console.WriteLine(e1.Message);
}
catch(OverflowException e2)
{
    Console.WriteLine(e2.Message);
}

Probably because too many programmers failed to try-catch these ToXXX methods and run into trouble, beginning with version 2.0 of the framework, a second method of converting string to numerical values was provided. All built-in numerical types (such as char, boolean, int, double, decimal, etc.) and others (such as DateTime, TimeSpan, IPAddress, etc.) have a static method called TryParse, that takes an in parameter the string to convert and and out parameter, the parsed value, and returns a boolean value to indicate success or failure.

string text = "123";
int number;
if(int.TryParse(text, out number))
{
    Console.WriteLine(number);
}
else
{
    Console.WriteLine("Could not parse string!");
}

Between the two, the second is recommended because is less error-prone. If an error occurs during parsing, you don’t get an exception so your program won’t crash. Moreover, the out value is set to the default value (false, 0, etc.).

Hits for this post: 11401 .

Suppose you need a ListBox that has to display items with the text spread across multiple lines. With the default implementation that would look like this:

Normal listbox

The CR LF characters are displayed as squares and the item has only one line of text. To make the list box items have variable height depending on the number of lines of the text you need an owner draw ListBox.

That requires two this:

  • set the DrawMode property to OwnerDrawVariable
  • handle the MeasureItem and DrawItem event; the first one is fired when the size of the item (height and width) must be computed, and the second when the item is to be displayed
listBox1.DrawMode = System.Windows.Forms.DrawMode.OwnerDrawVariable;
listBox1.DrawItem += new System.Windows.Forms.DrawItemEventHandler(this.listBox1_DrawItem);
listBox1.MeasureItem += new System.Windows.Forms.MeasureItemEventHandler(this.listBox1_MeasureItem);

A simple implementation for these handlers is:

private void listBox1_DrawItem(object sender, DrawItemEventArgs e)
{
        e.DrawBackground();
        e.DrawFocusRectangle();
        e.Graphics.DrawString(
             (string)listBox1.Items[e.Index],
             e.Font,
             new SolidBrush(e.ForeColor),
             e.Bounds);
}

private void listBox1_MeasureItem(object sender, MeasureItemEventArgs e)
{
        e.ItemHeight = 13 * GetLinesNumber((string)listBox1.Items[e.Index]);
}

private int GetLinesNumber(string text)
{
        int count = 1;
        int pos = 0;
        while ((pos = text.IndexOf("\r\n", pos)) != -1)  { count++; pos += 2; }
        return count;
}

With those changes made the items should now look like this:
Owner-draw list box
Owner-draw list box

And that was fast and easy!

Hits for this post: 14276 .

When I delivered the LINQ presentation at the RONUA meeting in April, I was asked how does LINQ perform on big data sets. To answer that I decided to test LINQ to XML against a 100+MB file. I decided to extract three different sets of data from this XML file:

  • 1 set representing about 0.5MB of the XML file,
  • 1 set representing about 10MB of the XML file, and
  • 1 set representing about 80MB of the XML file

Of course, I designed some data structures to map on the data from the XML file and run three queries against this file that would project instances of those data structures. The result was that all the three (different) queries took about same time to execute and generate my internal objects. Each time the entire file was re-parsed. The time for each query was about 3.5 seconds. Thus, I can draw two conclusions:

  • LINQ is very performant: it took less than 12 seconds to extract 90% of the data from a 100MB file; the performance is several times greater than the one I get in C++ for parsing the file; not to mention that the code is more than several times simpler;
  • there was’t too much difference between extracting 0.5MB or 100 times that;

I am quite confident that LINQ to SQL is as performant as LINQ to XML. If I’ll find a really big data base, I will query it.

Hits for this post: 31074 .