A few days ago, Microsoft, through Jason Zander, General Manager of Visual Studio, has revealed the new look of Visual Studio 2010 at the VSLive conference. Several screenshots and description of the new look you can find on Zander’s blog.

The new editor is using WPF and makes extensive usage of .NET 4.0. Here is a screenshot.

Visual Studio 2010 new WPF look

One of the important new features is support for multiple monitors. One can now put code or designer windows onto multiple monitors.

The New Project dialog allows you to view online templates and provides search options. Moreover there is an Extension Manager that allows you to search for online templates and tools, like add-ins or macros and install them into Visual Studio.

Other features offer support for working with multiple UML digrams and will allow arhitects to enforce the intended architecture. As Doug Holland explains,

When a developer checks in code within Visual Studio 2010 to the Team Foundation Server the developers code will be parsed to ensure that the developers code complies with the layer diagram. Within the layer diagram, for example, architects can prescribe that the UI code must communicate with the business layer and not directly with the data layer. If developers use types from the data layer within the UI layer then the checkin will not be allowed to proceed.

I think that will be a quite cool feature, and I’m looking forward to testing that.

Here you can read more:

, , Hits for this post: 11430 .

The new C++ standard defines a new keyword, static_assert, that is already available in Visual Studio 2010 CTP. This new feature allows introducing compile time asserts. It takes an expression that can evaluate to bool and a string. If the expression evaluates to false, the compiler issues an error with the given string literal. If the expression evaluates to true, static_assert has no effect.

Here is an example for using static_assert. Suppose you want to create a template vector class, but you don’t want to allow vectors with a size smaller than 4. Then you can use a static assertion to enforce that.

template < class T, int Size >
class Vector
{
   static_assert(Size > 3, "Vector size is too small!");

   T m_values[Size];
};

int _tmain(int argc, _TCHAR* argv[])
{
   Vector< int, 4 > four;
   Vector< short, 2 > two;

   return 0;
}

Compiling this program triggers an error at the second declaration.

c:\projects\cpp_demo\cpp_demo.cpp(17) : error C2338: Vector size is too small!
  c:\projects\cpp_demo\cpp_demo.cpp(33) : see reference to class template instantiation 'Vector< T,Size >'
  being compiled
  with
  [
     T=short,
     Size=2
  ]

Most of the use cases for this feature, in my opinion, would be to test on the size of different types. For instance this sample is taken from the C++0x draft.

static_assert(sizeof(long) >= 8, "64-bit code generation required for this library.");

To me, static_assert looks like a niche feature. It would have been great if this could be used together with some other features to enfore compile time constraints on types. For instance, I want to restrict the possible types for a template class or function to only those that are derived from IListener (a fictive class).

template < class T >
class foo
{
   static_assert(convertible_from< IListener >(decltype(T)), "Type is not a correct type!");
};

Perhaps a future standard version will offer support for such things.

, , , , , Hits for this post: 16258 .

The new C++0x standard adds lambda expressions to the language. Visual Studio 2010 CTP is already supporting this new feature that brings functional techniques to C++ too.

What is a lambda expression? It’s basically a function. In F# it’s an anonymous function, in C# it’s an anonymous delegate; in C++ it’s in fact an anonymous function object. Whenever you create a new lambda function, the compiler creates a function object for you.

int main()
{
   auto l_pow2 = [](int n) {return n*n;};

   std::cout << "5 pow 2 = " << l_pow2(5) << std::endl;
}

That is equivalent to:

struct LambdaFunctor
{
   int operator()(int n) const
   {
      return n * n;
   }
};

int main()
{
   LambdaFunctor l_pow2;
   std::cout << "5 pow 2 = " << l_pow2(5) << std::endl;
}

Of course, the lambda functor can be more complicated, when the lambda function captures state from the local scope. However, that is beyond the scope of my post. I recommend that you read more about lambdas in C++ on the VC++ blog.

Question is, what are these lambdas good for? Well, they mostly come in handly with algorithms that take predicates (function objects) as arguments. I'll try to give you some examples in this post.

Let's first consider a filter function, that takes a sequence (vector of T) and a predicate that indicates what values should be filtered, and returns a new sequence. That would look like this:

template < class T >
std::vector< T > Filter(const std::vector< T >& sequence,
                        std::tr1::function< bool (T) > predicate)
{
   std::vector< T > result;

   for(auto it = sequence.begin(); it != sequence.end(); ++it)
      if(predicate(*it))
         result.push_back(*it);

   return result;
}

We can use this filter function to extract the odds numbers from a sequence (vector).

#include < iostream >
#include < vector >
#include < algorithm >
#include < functional >

using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
   std::vector< int > nums;
   for(int i = 0; i < 10; ++i)
      nums.push_back(i);

   // get the odds numbers
   std::vector< int > odds = Filter< int >(nums, [](int i) {return (i % 2) == 1;});

   // print the new sequence
   for_each(odds.begin(), odds.end(), [](int n){std::cout << n << std::endl;});

   return 0;
}
1
3
5
7
9

You could see in the above example that a second lambda function is used for printing the numbers to the console.

Since the Filter function is a template function it could be used with other types too. In the next example we'll see how to filter words that have at least 4 letters from a sequence of words.

#include < iostream >
#include < vector >
#include < algorithm >
#include < functional >
#include < string >

using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
   std::vector< string > snums;
   snums.push_back("one");
   snums.push_back("two");
   snums.push_back("three");
   snums.push_back("four");
   snums.push_back("five");

   // filter the words, notice the new lambda
   std::vector< string > bigwords = Filter< string >(snums, [](string w) {return w.length() > 3;});

   // print the selected words
   for_each(bigwords.begin(), bigwords.end(), [](string s){std::cout << s << std::endl;});

   return 0;
}
three
four
five

Let's consider a second example, a Find (template) function, that takes a sequence and a predicate (that checks a condition for an element), and returns the first element in the sequence for which the predicate returned true.

template < class T >
T Find(const std::vector< T >& sequence,
       std::tr1::function< bool (T) > predicate)
{
   for(auto it = sequence.begin(); it != sequence.end(); ++it)
      if(predicate(*it))
         return *it;

   throw std::runtime_error("Item not found");
}

We'll use this function to find the first element in a sequence that is greater than a given value.

int _tmain(int argc, _TCHAR* argv[])
{
   std::vector< int > nums;
   nums.push_back(1);
   nums.push_back(3);
   nums.push_back(5);
   nums.push_back(7);
   nums.push_back(9);

   int min;
   cout << "find first after: ";
   cin >> min;

   try
   {
      int val = Find< int >(odds, [min](int i){return i > min;});
      cout << val << endl;
   }
   catch(std::runtime_error& ex)
   {
      cout << ex.what() << endl;
   }

   return 0;
}

If you input 4 for instance, it will return 5. If you input 10, an exception will be thrown. You can see that this time the lambda function is [min](int i){return i > min;}. This means that it captures by value the min variable from the local scope, so that it can compare each element with that given value.

The last example I'm going to show is an accumulator function (also known as aggregate or fold). This function takes a sequence of elements, a seed (or initial value) and a function that specifies how to aggregate the elements, and returns the aggregate.

template < class TSource, class TAccumulate >
TAccumulate Aggregate(const std::vector< TSource >& sequence,
                      TAccumulate seed,
                      std::tr1::function< TAccumulate (TSource, TAccumulate) > func)
{
   TAccumulate acc = seed;
   for(auto it = sequence.begin(); it != sequence.end(); ++it)
      acc = func(acc, *it);

   return acc;
}

First, we can use it to compute the sum of all elements in a sequence.

int _tmain(int argc, _TCHAR* argv[])
{
   std::vector< int > nums;
   for(int i = 1; i <= 10; ++i)
      nums.push_back(i);

   int sum = Aggregate< int, int >(nums, 0, [](int e, int acc) {return e + acc;});
   cout << "sum = " << sum << endl;

   int prod = Aggregate< int, int >(nums, 1, [](int e, int acc) {return e * acc;});
   cout << "prod = " << prod << endl;

   return 0;
}
sum = 55
prod = 3628800

The first lambda function above sums the current element with the previous sum, which initially is given as 0. The result is 55. The second lambda function multiplies the current element with the previous product, which initially is 1. The result is 3628800.

But the Aggregate function can be used with other types too. Here is a last example with strings.

int _tmain(int argc, _TCHAR* argv[])
{
   std::vector< string > words;
   words.push_back("the");
   words.push_back("quick");
   words.push_back("brown");
   words.push_back("fox");
   words.push_back("jumps");
   words.push_back("over");
   words.push_back("the");
   words.push_back("lazy");
   words.push_back("dog");

   string sentence = Aggregate< string, string >(
      words,
      "",
      [](string workingSentence, string next){return next + " " + workingSentence;});

   cout << sentence << endl;

   return 0;
}
dog lazy the over jumps fox brown quick the

These were several examples of how lambda functions help us write more generic, and less verbose code. I suggest you read more about lambdas in C++ here.

, , , , , , , , Hits for this post: 14819 .

So far, C#, unlike C++, did not support optional arguments. For instance, suppose you need a function to print log a message, that can add a new line or not after writing the message. Most of the times you want a new line, so you don’t want to specify that for most of the calls. Until now, the only possibility was using overloaded functions, with different parameters.

class Logger
{
   public void LogMessage(string error, bool addNewLine)
   {
      Console.Write(error);
      if (addNewLine) Console.WriteLine();
   }

   public void LogMessage(string error)
   {
      LogMessage(error, true);
   }
};

class Program
{
   static void Main(string[] args)
   {
      Logger log = new Logger();
      log.LogMessage("trying to connect...", false);
      // do more stuff here
      log.LogMessage("SUCCESS");
   }
}
trying to connect...SUCCESS
Press any key to continue . . .

C# 4.0, that will be released with Visual Studio 2010, and is for now available with the Visual Studio 2010 CTP, supports, just like C++, optional arguments. So instead of using overloaded functions, you can specify a default value for a parameter. When doing the call, if you don’t provide a value for the argument, the default one will be used.

The Logger class below is identical from the functionality point of view with the one above. (The previous Main() function doesn’t have to change.)

class Logger
{
   public void LogMessage(string error, bool addNewLine = true)
   {
      Console.Write(error);
      if (addNewLine) Console.WriteLine();
   }
};

The only restriction is that optional parameters have to appear in the function after all required parameters. In other words, this is not legal:

class foo
{
   public void func(int a, int b = 0, int c)
   {
   }
}
error CS1737: Optional parameters must appear after all required parameters

Considering this implementation of foo

class foo
{
   public void func(int a, int b = 0, int c = 1)
   {
   }
}

the following calls can be made:

var f = new foo();
f.func(42, 3, 4);          // f.func(42, 3, 4)
f.func(42);                // f.func(42, 0, 1)
f.func(42, 100);           // f.func(42, 100, 1)

But this is not all. C# 4.0 brings another feature: named parameters (and this does not exist in C++). It means that when you make a call, you can specify an argument by its name, not by position. In this case you use the parameter’s name followed by ‘:’ and the value. Here are several examples:

var f = new foo();
f.func(42, c: 4);          // f.func(42, 0, 4)
f.func(42, c: 4, b: 3);    // f.func(42, 3, 4)
f.func(c: 3, a: 1, b: 2);  // f.func(1, 2, 3)

Of course, this can be used regardless the function has optional parameters or not. However, I would not use this feature for anything else than specifying an optional parameter when I would have to first suply values for other several optional parameters. In other words:

var f = new foo();
f.func(42, c: 4);          // this is good, skip b, provide value for c
f.func(42, c: 4, b: 3);    // don't like this, rather call f.func(42, 3, 4)
f.func(c: 3, a: 1, b: 2);  // don't like this, rather call f.func(1, 2, 3)

, , , , , Hits for this post: 17398 .

The new C++0x standard provides support for type inference. The auto keyword that was doing nothing in C++ was given a new meaning: a placeholder for a type inferred by the compiler. For those familiar with C#’s var keyword, this is basically the same.

Here is a comparison between auto in C++ and var in C#.

C++ C#
int _tmain(int argc, _TCHAR* argv[])
{
   auto i = 42;
   auto s = "Hello world!";
   auto d = 12.50;
   auto l = [](int n) {return n * n;};

   cout << "i = " << i << endl;
   cout << "s = " << s << endl;
   cout << "d = " << d << endl;
   cout << "l(i) = " << l(i) << endl;

   return 0;
}
class Program
{
   static void Main(string[] args)
   {
      var i = 42;
      var s = "hello world";
      var d = 12.50;
      //var l = (n) => n * n; // error CS0815

      Console.WriteLine("i = {0}", i);
      Console.WriteLine("s = {0}", s);
      Console.WriteLine("d = {0}", d);
   }
}

with the output

C++ C#
i = 42
s = Hello world!
d = 12.5
l(i) = 1764
i = 42
s = Hello world!
d = 12.5

As you can see, the use is very similar, except that in C++ you can use auto for a lambda expression, while in C# you cannot. The C# compiler raises an error, CS0815, when you try to use var with an anonymous function expressions, a lambda expression, a method group expressions, or the null literal expression, because these don't have a type.

Just like with var, you can only use auto locally, and not as return type from a function, or parameter to a function.

auto power(int n)
{
   return n * n;
}
error C3532: 'auto (int)': incorrect usage of 'auto'
int power(auto n)
{
   return n * n;
}
error C3533: 'auto': a parameter cannot have a type that contains 'auto'

While it might not be of that much help for ints or strings, using auto for instead of vector< int >::iterator, or map< string, list< string >>::const_iterator comes in handy. The auto type inference helps us write less verbose code.

vector< string > words;
words.push_back("hello");
words.push_back("world");

for(auto it = words.begin(); it != words.end(); ++it)
   cout << *it << endl;

In addition to auto, the C++0x standard introduces a new keyword, called decltype (think of it as 'typeof'), that can be used to determine the type of an expression at compile time.

auto i = 42;
decltype(i) i2 = 44;

However, this new keyword was not yet introduced in Visual Studio 2010 CTP. According to Stephan T. Lavavej from the VC++ team, it might be possible to be introduced in a next CTP or beta version.

Implementing auto doesn't implement decltype (C++0x's name for what was previously called typeof) for free.

Paraphrasing our libraries and compiler front-end program manager Damien Watkins, the CTP is only the first look at our VC10 functionality and there is definitely more to come. We understand that certain features "go together", and that adding complementary features is a good thing in general. As always, customer feedback is a vital resource in forming our plans.

, , , , , , , , Hits for this post: 20548 .

Urmatoarea conferinta europeana PASS (Professional Association of SQL Server) va avea loc intre 22-24 Aprilie 2009 la Neuss in Germania. AXTI a obtinut un promo-code pentru membrii RONUA si DboDev (RO16F6). Cei interesati de a participa si a beneficia de aceasta reducere, vizitati blog-ul lui Aurelian, unde puteti afla mai multe informatii.

European PASS Conference 2009
, , Hits for this post: 6440 .

After blogging for two years I have decided to make a change in the look of my blog. So I have changed the appearance theme from ‘indigo’ to ‘coogee’. I think it looks better and hopefully will prove to be a better “reader experience”. I tested the new theme on IE7, Firefox 3, Google Chrome and Safari (hopefully I won’t be blamed for not having Opera on my machine). They all look good starting with a resolution of 1024 x 768 (I didn’t actually checked lower resolution because I think nobody uses less than that anyway).

And since I’m talking about updates, I must say that the WordPress Automatic Upgrade plugin for WordPress works just great. It saved me a lot of time, and I recommend it to anyone using WordPress.

, , Hits for this post: 7015 .

Evaluate Expressions – Part 1: The Approaches
Evaluate Expressions – Part 2: Parse the Expression
Evaluate Expressions – Part 3: Building the Abstract Syntax Tree
Evaluate Expressions – Part 4: Evaluate the Abstract Syntax Tree

So far we have managed to parse the text representing an expression and build an abstract syntax tree. The only thing left, and the simplest of them all, is traversing this abstract syntax tree and evaluating the expression is represents.

In pseudo code, this would look like this:

double Evaluate(subtree)
{
   if(subtree is numeric)
      return value;
   else
   {
      op = subtree.operator
      v1 = Evaluate(subtree.left)
      v2 = Evaluate(subtree.right)
      return v1 op v2;
   }
}

We actually have to check for one more type of node, the one representing an unary expression. However, the evaluation function is as simple as this:

class Evaluator
{
   double EvaluateSubtree(ASTNode* ast)
   {
      if(ast == NULL)
         throw EvaluatorException("Incorrect syntax tree!");

      if(ast->Type == NumberValue)
         return ast->Value;
      else if(ast->Type == UnaryMinus)
         return -EvaluateSubtree(ast->Left);
      else
      {
         double v1 = EvaluateSubtree(ast->Left);
         double v2 = EvaluateSubtree(ast->Right);
         switch(ast->Type)
         {
         case OperatorPlus:  return v1 + v2;
         case OperatorMinus: return v1 - v2;
         case OperatorMul:   return v1 * v2;
         case OperatorDiv:   return v1 / v2;
         }
      }

      throw EvaluatorException("Incorrect syntax tree!");
   }

public:
   double Evaluate(ASTNode* ast)
   {
      if(ast == NULL)
         throw EvaluatorException("Incorrect abstract syntax tree");

      return EvaluateSubtree(ast);
   }
};

The exception class is defined as:

class EvaluatorException : public std::exception
{
public:
   EvaluatorException(const std::string& message):
      std::exception(message.c_str())
      {
      }
};

So let’s try it:

void Test(const char* text)
{
   Parser parser;

   try
   {
      ASTNode* ast = parser.Parse(text);

      try
      {
         Evaluator eval;
         double val = eval.Evaluate(ast);

         std::cout << text << " = " << val << std::endl;
      }
      catch(EvaluatorException& ex)
      {
		  std::cout << text << " t " << ex.what() << std::endl;
      }

      delete ast;
   }
   catch(ParserException& ex)
   {
      std::cout << text << " t " << ex.what() << std::endl;
   }
}

int main()
{
   Test("1+2+3+4");
   Test("1*2*3*4");
   Test("1-2-3-4");
   Test("1/2/3/4");
   Test("1*2+3*4");
   Test("1+2*3+4");
   Test("(1+2)*(3+4)");
   Test("1+(2*3)*(4+5)");
   Test("1+(2*3)/4+5");
   Test("5/(4+3)/2");
   Test("1 + 2.5");
   Test("125");
   Test("-1");
   Test("-1+(-2)");
   Test("-1+(-2.0)");

   Test("   1*2,5");
   Test("   1*2.5e2");
   Test("M1 + 2.5");
   Test("1 + 2&5");
   Test("1 * 2.5.6");
   Test("1 ** 2.5");
   Test("*1 / 2.5");

   return 0;
}

The output of this test program is:

1+2+3+4 = 10
1*2*3*4 = 24
1-2-3-4 = -8
1/2/3/4 = 0.0416667
1*2+3*4 = 14
1+2*3+4 = 11
(1+2)*(3+4) = 21
1+(2*3)*(4+5) = 55
1+(2*3)/4+5 = 7.5
5/(4+3)/2 = 0.357143
1 + 2.5 = 3.5
125 = 125
-1 = -1
-1+(-2) = -3
-1+(-2.0) = -3
   1*2,5         Unexpected token ',' at position 6
   1*2.5e2       Unexpected token 'e' at position 8
M1 + 2.5         Unexpected token 'M' at position 0
1 + 2&5          Unexpected token '&' at position 5
1 * 2.5.6        Unexpected token '.' at position 7
1 ** 2.5         Unexpected token '*' at position 4
*1 / 2.5         Unexpected token '*' at position 1

And that's it. Define the grammar, build the parser, insert semantic actions and build the abstract syntax tree and then traverse it and evaluate the expression. If you are interested in understanding the grammar, and the parsing in a deeper manner than I presented in this posts, I suggest you read more articles. The purpose was not to teach compilers theory, but put it to a practical purpose.

Here you can download a Visual Studio 2008 project with the code contained in this tutorial.

, , , , Hits for this post: 16320 .

In my previous post we’ve parsed an exception verifying whether it’s correct or not syntactically. But we still have to evaluate it. To be able to do that we’ll have to build an abstract syntax tree. This can be done by modifying the previous code and inserting semantic action. That means we do something more when we match productions.

An abstract syntax tree is a binary tree. The inner nodes will represent operators and leafs will be numerical values.

Here is how a node in the AST will look:

AST node

It is defined like this:

enum ASTNodeType
{
   Undefined,
   OperatorPlus,
   OperatorMinus,
   OperatorMul,
   OperatorDiv,
   UnaryMinus,
   NumberValue
};

class ASTNode
{
public:
   ASTNodeType Type;
   double      Value;
   ASTNode*    Left;
   ASTNode*    Right;

   ASTNode()
   {
      Type = Undefined;
      Value = 0;
      Left = NULL;
      Right = NULL;
   }

   ~ASTNode()
   {
      delete Left;
      delete Right;
   }
};

For the expression 1+2*3, the AST will be:

AST example

We’ll build this tree by inserting semantic actions and adding nodes according to the following rules:

AST semantic rules

You’ll probably notice that based on these rules the AST shown above will be modified a little bit, with some additional nodes for operators + and *, having on the left a leaf node with the neutral element for the operation (zero for + and 1 for *), and on the right a node corresponding to a TERM or FACTOR. This won’t affect the evaluation.

The Parser class will change so that the functions corresponding to the non-terminal symbols EXP, EXP1, TERM, TERM1 and FACTOR will return an ASTNode* instead of void. That is the node created as a semantic action.

class Parser
{
   Token m_crtToken;
   const char* m_Text;
   size_t m_Index;

private:

   ASTNode* Expression()
   {
      ASTNode* tnode = Term();
      ASTNode* e1node = Expression1();

      return CreateNode(OperatorPlus, tnode, e1node);
   }

   ASTNode* Expression1()
   {
      ASTNode* tnode;
      ASTNode* e1node;

      switch(m_crtToken.Type)
      {
      case Plus:
         GetNextToken();
         tnode = Term();
         e1node = Expression1();

         return CreateNode(OperatorPlus, e1node, tnode);

      case Minus:
         GetNextToken();
         tnode = Term();
         e1node = Expression1();

         return CreateNode(OperatorMinus, e1node, tnode);
      }

      return CreateNodeNumber(0);
   }

   ASTNode* Term()
   {
      ASTNode* fnode = Factor();
      ASTNode* t1node = Term1();

      return CreateNode(OperatorMul, fnode, t1node);
   }

   ASTNode* Term1()
   {
      ASTNode* fnode;
      ASTNode* t1node;

      switch(m_crtToken.Type)
      {
      case Mul:
         GetNextToken();
         fnode = Factor();
         t1node = Term1();
         return CreateNode(OperatorMul, t1node, fnode);

      case Div:
         GetNextToken();
         fnode = Factor();
         t1node = Term1();
         return CreateNode(OperatorDiv, t1node, fnode);
      }

      return CreateNodeNumber(1);
   }

   ASTNode* Factor()
   {
      ASTNode* node;
      switch(m_crtToken.Type)
      {
      case OpenParenthesis:
         GetNextToken();
         node = Expression();
         Match(')');
         return node;

      case Minus:
         GetNextToken();
		 node = Factor();
         return CreateUnaryNode(node);

      case Number:
         {
            double value = m_crtToken.Value;
            GetNextToken();
            return CreateNodeNumber(value);
         }

      default:
         {
            std::stringstream sstr;
            sstr << "Unexpected token '" << m_crtToken.Symbol << "' at position " << m_Index;
            throw ParserException(sstr.str(), m_Index);
         }
      }
   }

   ASTNode* CreateNode(ASTNodeType type, ASTNode* left, ASTNode* right)
   {
      ASTNode* node = new ASTNode;
      node->Type = type;
      node->Left = left;
      node->Right = right;

      return node;
   }

   ASTNode* CreateUnaryNode(ASTNode* left)
   {
      ASTNode* node = new ASTNode;
      node->Type = UnaryMinus;
      node->Left = left;
      node->Right = NULL;

      return node;
   }

   ASTNode* CreateNodeNumber(double value)
   {
      ASTNode* node = new ASTNode;
      node->Type = NumberValue;
      node->Value = value;

      return node;
   }

   void Match(char expected)
   {
      if(m_Text[m_Index-1] == expected)
         GetNextToken();
      else
      {
         std::stringstream sstr;
         sstr << "Expected token '" << expected << "' at position " << m_Index;
         throw ParserException(sstr.str(), m_Index);
      }
   }

   void SkipWhitespaces()
   {
      while(isspace(m_Text[m_Index])) m_Index++;
   }

   void GetNextToken()
   {
      SkipWhitespaces();

	  m_crtToken.Value = 0;
	  m_crtToken.Symbol = 0;

      if(m_Text[m_Index] == 0)
      {
         m_crtToken.Type = EndOfText;
         return;
      }

      if(isdigit(m_Text[m_Index]))
      {
         m_crtToken.Type = Number;
		 m_crtToken.Value = GetNumber();
         return;
      }

      m_crtToken.Type = Error;

      switch(m_Text[m_Index])
      {
      case '+': m_crtToken.Type = Plus; break;
      case '-': m_crtToken.Type = Minus; break;
      case '*': m_crtToken.Type = Mul; break;
      case '/': m_crtToken.Type = Div; break;
      case '(': m_crtToken.Type = OpenParenthesis; break;
      case ')': m_crtToken.Type = ClosedParenthesis; break;
      }

      if(m_crtToken.Type != Error)
	  {
         m_crtToken.Symbol = m_Text[m_Index];
         m_Index++;
	  }
      else
      {
         std::stringstream sstr;
         sstr << "Unexpected token '" << m_Text[m_Index] << "' at position " << m_Index;
         throw ParserException(sstr.str(), m_Index);
      }
   }

   double GetNumber()
   {
      SkipWhitespaces();

      int index = m_Index;
      while(isdigit(m_Text[m_Index])) m_Index++;
      if(m_Text[m_Index] == '.') m_Index++;
      while(isdigit(m_Text[m_Index])) m_Index++;

      if(m_Index - index == 0)
         throw ParserException("Number expected but not found!", m_Index);

      char buffer[32] = {0};
      memcpy(buffer, &m_Text[index], m_Index - index);

      return atof(buffer);
   }

public:
   ASTNode* Parse(const char* text)
   {
      m_Text = text;
      m_Index = 0;
      GetNextToken();

      return Expression();
   }
};

Now the Parse() method will return the created abstract syntax tree. We will see how to evaluate the expression by traversing this tree in the next post.

, , , , Hits for this post: 18961 .

In my previous post I have provided some background theory for evaluating expressions with abstract syntax trees. As I was mentioning, the first step towards this goal is to parse the expression, make sure it is correct syntactically. This is what I’ll show you in this post.

Having the grammar defined, we’ll make one function for each non-terminal symbol (EXP, EXP1, TERM, TERM1, FACTOR).

Simply put the code will look like this:

   void Expression()
   {
      Term();
      Expression1();
   }

   void Expression1()
   {
      switch(current_token)
      {
      case '+':
         GetNextToken();
         Term();
         Expression1();
         break;

      case '-':
         GetNextToken();
         Term();
         Expression1();
         break;
      }
   }

However, I want to make it a little bit more organized, so the first thing to do will be defining a Token structure that will indicate the type of last extracted token and if the case its value (for numbers). A token is basically a symbol extracted (one at a time) from the input text. The possible tokens will be the arithmetical operators (‘+’, ‘-’, ‘/’, ‘*’), the parentheses (‘(‘ and ‘)’), numbers and the end of the text.

Here is how I defined the token type and the token:

enum TokenType
{
   Error,
   Plus,
   Minus,
   Mul,
   Div,
   EndOfText,
   OpenParenthesis,
   ClosedParenthesis,
   Number
};

struct Token
{
   TokenType	Type;
   double		Value;
   char		Symbol;

   Token():Type(Error), Value(0), Symbol(0)
   {}
};

To be able to do the parsing, we’ll need some helper functions:

  • SkipWhitespaces(), skips all whitespaces between two tokens:
       void SkipWhitespaces()
       {
          while(isspace(m_Text[m_Index])) m_Index++;
       }
    
  • GetNextToken(), extracts the next token from the text; if an illegal token appears it throws an exception
       void GetNextToken()
       {
          // ignore white spaces
          SkipWhitespaces();
    
          m_crtToken.Value = 0;
          m_crtToken.Symbol = 0;
    
          // test for the end of text
          if(m_Text[m_Index] == 0)
          {
             m_crtToken.Type = EndOfText;
             return;
          }
    
          // if the current character is a digit read a number
          if(isdigit(m_Text[m_Index]))
          {
             m_crtToken.Type = Number;
             m_crtToken.Value = GetNumber();
             return;
          }
    
          m_crtToken.Type = Error;
    
          // check if the current character is an operator or parentheses
          switch(m_Text[m_Index])
          {
          case '+': m_crtToken.Type = Plus; break;
          case '-': m_crtToken.Type = Minus; break;
          case '*': m_crtToken.Type = Mul; break;
          case '/': m_crtToken.Type = Div; break;
          case '(': m_crtToken.Type = OpenParenthesis; break;
          case ')': m_crtToken.Type = ClosedParenthesis; break;
          }
    
          if(m_crtToken.Type != Error)
          {
             m_crtToken.Symbol = m_Text[m_Index];
             m_Index++;
          }
          else
          {
             std::stringstream sstr;
             sstr << "Unexpected token '" << m_Text[m_Index] << "' at position " << m_Index;
             throw ParserException(sstr.str(), m_Index);
          }
       }
    
  • GetNumber() extracts a number from the input text from the current position; the purpose of this tutorial is didactical, so this function is quite simple: it reads integers and doubles with '.' As the decimal point; it doesn't read numbers in a format like 123.3E+2.
       double GetNumber()
       {
          SkipWhitespaces();
    
          int index = m_Index;
          while(isdigit(m_Text[m_Index])) m_Index++;
          if(m_Text[m_Index] == '.') m_Index++;
          while(isdigit(m_Text[m_Index])) m_Index++;
    
          if(m_Index - index == 0)
             throw ParserException("Number expected but not found!", m_Index);
    
          char buffer[32] = {0};
          memcpy(buffer, &m_Text[index], m_Index - index);
    
          return atof(buffer);
       }
    

With these defined, we can build the parser for the specified grammar.

class Parser
{
   Token m_crtToken;
   const char* m_Text;
   size_t m_Index;

private:

   void Expression()
   {
      Term();
      Expression1();
   }

   void Expression1()
   {
      switch(m_crtToken.Type)
      {
      case Plus:
         GetNextToken();
         Term();
         Expression1();
         break;

      case Minus:
         GetNextToken();
         Term();
         Expression1();
         break;
      }
   }

   void Term()
   {
      Factor();
      Term1();
   }

   void Term1()
   {
      switch(m_crtToken.Type)
      {
      case Mul:
         GetNextToken();
         Factor();
         Term1();
         break;

      case Div:
         GetNextToken();
         Factor();
         Term1();
         break;
      }
   }

   void Factor()
   {
      switch(m_crtToken.Type)
      {
      case OpenParenthesis:
         GetNextToken();
         Expression();
         Match(')');
         break;

      case Minus:
         GetNextToken();
         Factor();
         break;

      case Number:
         GetNextToken();
         break;

      default:
         {
            std::stringstream sstr;
            sstr << "Unexpected token '" << m_crtToken.Symbol << "' at position " << m_Index;
            throw ParserException(sstr.str(), m_Index);
         }
      }
   }

   void Match(char expected)
   {
      if(m_Text[m_Index-1] == expected)
         GetNextToken();
      else
      {
         std::stringstream sstr;
         sstr << "Expected token '" << expected << "' at position " << m_Index;
         throw ParserException(sstr.str(), m_Index);
      }
   }

   void SkipWhitespaces()
   {
      while(isspace(m_Text[m_Index])) m_Index++;
   }

   void GetNextToken()
   {
      // ignore white spaces
      SkipWhitespaces();

      m_crtToken.Value = 0;
      m_crtToken.Symbol = 0;

      // test for the end of text
      if(m_Text[m_Index] == 0)
      {
         m_crtToken.Type = EndOfText;
         return;
      }

      // if the current character is a digit read a number
      if(isdigit(m_Text[m_Index]))
      {
         m_crtToken.Type = Number;
         m_crtToken.Value = GetNumber();
         return;
      }

      m_crtToken.Type = Error;

      // check if the current character is an operator or parentheses
      switch(m_Text[m_Index])
      {
      case '+': m_crtToken.Type = Plus; break;
      case '-': m_crtToken.Type = Minus; break;
      case '*': m_crtToken.Type = Mul; break;
      case '/': m_crtToken.Type = Div; break;
      case '(': m_crtToken.Type = OpenParenthesis; break;
      case ')': m_crtToken.Type = ClosedParenthesis; break;
      }

      if(m_crtToken.Type != Error)
      {
         m_crtToken.Symbol = m_Text[m_Index];
         m_Index++;
      }
      else
      {
         std::stringstream sstr;
         sstr << "Unexpected token '" << m_Text[m_Index] << "' at position " << m_Index;
         throw ParserException(sstr.str(), m_Index);
      }
   }

   double GetNumber()
   {
      SkipWhitespaces();

      int index = m_Index;
      while(isdigit(m_Text[m_Index])) m_Index++;
      if(m_Text[m_Index] == '.') m_Index++;
      while(isdigit(m_Text[m_Index])) m_Index++;

      if(m_Index - index == 0)
         throw ParserException("Number expected but not found!", m_Index);

      char buffer[32] = {0};
      memcpy(buffer, &m_Text[index], m_Index - index);

      return atof(buffer);
   }

public:
   void Parse(const char* text)
   {
      m_Text = text;
      m_Index = 0;
      GetNextToken();

      Expression();
   }
};

The exception class is defined like this:

class ParserException : public std::exception
{
   int m_Pos;

public:
   ParserException(const std::string& message, int pos):
      std::exception(message.c_str()),
      m_Pos(pos)
   {
   }
};

As you can see, the code for the grammar production is quite simple and straight forward. Now, let's put it to the test.

void Test(const char* text)
{
   Parser parser;
   try
   {
      parser.Parse(text);
      std::cout << """ << text << ""t OK" << std::endl;
   }
   catch(ParserException& ex)
   {
      std::cout << """ << text << ""t " << ex.what() << std::endl;
   }
}

int main()
{
   Test("1+2+3+4");
   Test("1*2*3*4");
   Test("1-2-3-4");
   Test("1/2/3/4");
   Test("1*2+3*4");
   Test("1+2*3+4");
   Test("(1+2)*(3+4)");
   Test("1+(2*3)*(4+5)");
   Test("1+(2*3)/4+5");
   Test("5/(4+3)/2");
   Test("1 + 2.5");
   Test("125");
   Test("-1");
   Test("-1+(-2)");
   Test("-1+(-2.0)");

   Test("   1*2,5");
   Test("   1*2.5e2");
   Test("M1 + 2.5");
   Test("1 + 2&5");
   Test("1 * 2.5.6");
   Test("1 ** 2.5");
   Test("*1 / 2.5");

   return 0;
}

The output for this testing program is:

"1+2+3+4"        OK
"1*2*3*4"        OK
"1-2-3-4"        OK
"1/2/3/4"        OK
"1*2+3*4"        OK
"1+2*3+4"        OK
"(1+2)*(3+4)"    OK
"1+(2*3)*(4+5)"  OK
"1+(2*3)/4+5"    OK
"5/(4+3)/2"      OK
"1 + 2.5"        OK
"125"    OK
"-1"     OK
"-1+(-2)"        OK
"-1+(-2.0)"      OK
"   1*2,5"       Unexpected token ',' at position 6
"   1*2.5e2"     Unexpected token 'e' at position 8
"M1 + 2.5"       Unexpected token 'M' at position 0
"1 + 2&5"        Unexpected token '&' at position 5
"1 * 2.5.6"      Unexpected token '.' at position 7
"1 ** 2.5"       Unexpected token '*' at position 4
"*1 / 2.5"       Unexpected token '*' at position 1

Which is exactly what we expected: it validates correct expressions and throws an exception when the exception is incorrect.

In the next post I'll show how to modify this code to build an abstract syntax tree.

, , , , , , , Hits for this post: 21124 .