.NET allows you to expose components as COM and consume them from unmanaged code. There are many references on how to do this (you can start with MSDN), and I will not talk about that part. What I want to explain here is something different. Suppose you have this interface:

[Guid("2F8433FE-4771-4037-B6B2-ED5F6585ED04")]
[InterfaceType(ComInterfaceType.InterfaceIsIDispatch)]
public interface IAccounts
{
      [DispId(1)]
      string[] GetUsers();
}

Method GetUsers() returns an array of strings representing the user names. But what if you also wanted the user passwords or addresses? Since this is exposed as COM, you cannot return an array of User. But you can return multiple arrays of strings. So, how would you deal with out string[]? This is what I want to show you in this tutorial.

This is a .NET interface exposed to COM. It has two methods: GetUsers(), which returns an array of strings representing user names, and GetUsers2(), which returns an array of strings as an output parameter and a bool as the return value, indicating whether any user was found.

namespace SampleLibrary
{
   [Guid("2F8433FE-4771-4037-B6B2-ED5F6585ED04")]
   [InterfaceType(ComInterfaceType.InterfaceIsIDispatch)]
   public interface IAccounts
   {
      [DispId(1)]
      string[] GetUsers();

      [DispId(2)]
      bool GetUsers2(out string [] users);
   }
}

And this is the implementation:

namespace SampleLibrary
{
   [Guid("C4713144-5D29-4c65-BF9C-188B1B7CD2B6")]
   [ClassInterface(ClassInterfaceType.None)]
   [ProgId("SampleLibrary.DataQuery")]
   public class Accounts : IAccounts
   {
      List< string > m_users;

      public Accounts()
      {
         m_users = new List< string > {
            "marius.bancila",
            "john.doe",
            "anna.kepler"
         };
      }

      #region IAccounts Members

      public string[] GetUsers()
      {
         return m_users.ToArray();
      }

      public bool GetUsers2(out string[] users)
      {
         users = m_users.ToArray();

         return users.Length > 0;
      }

      #endregion
   }
}

Note: If you are trying this example, make sure you set the ComVisible attribute to true, either for each type or for the whole assembly (in AssemblyInfo.cs):

[assembly: ComVisible(true)]

Second, you have to check the “Register for COM interop” setting in the Build page of the project properties.

The first thing to do in C++ is to import the .TLB file that was generated by regasm.exe.

#import "SampleLibrary.tlb"
using namespace SampleLibrary;

If we look in the .tlh file that #import generates from the .TLB file, we can see what the IAccounts interface looks like:

struct __declspec(uuid("2f8433fe-4771-4037-b6b2-ed5f6585ed04"))
IAccounts : IDispatch
{
    //
    // Wrapper methods for error-handling
    //

    // Methods:
    SAFEARRAY * GetUsers ( );
    VARIANT_BOOL GetUsers2 (
        SAFEARRAY * * users );
};

The following C++ function, GetUsers1(), retrieves the list of users using method GetUsers() from IAccounts. It puts the users in a CStringArray (notice that this container does not have an assignment operator, so the only way to return such an array is through a reference in the parameter list).

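void GetUsers1(CStringArray& arrUsers)
{
   IAccountsPtr pAccounts(__uuidof(Accounts));

   SAFEARRAY* sarrUsers = pAccounts->GetUsers();

   // The _variant_t takes ownership of the array: its destructor calls
   // VariantClear(), which destroys the SAFEARRAY, so an additional
   // SafeArrayDestroy() here would destroy it twice.
   _variant_t varUsers;
   varUsers.parray = sarrUsers;
   varUsers.vt = VT_ARRAY | VT_BSTR;

   UnpackBstrArray(varUsers, arrUsers);

   // No explicit Release(): the IAccountsPtr smart pointer releases the
   // interface when it goes out of scope.
}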

UnpackBstrArray() is a function (see below) that extracts the elements of a SAFEARRAY and adds them to a CStringArray.

Function GetUsers2() uses the second method of IAccounts, GetUsers2(). This method needs the address of a pointer to a SAFEARRAY (i.e. SAFEARRAY**) that will hold the values returned by the COM method. This time we create an empty SAFEARRAY and then pass its address to the COM method. The rest is similar to the previous case.

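void GetUsers2(CStringArray& arrUsers)
{
   IAccountsPtr pAccounts(__uuidof(Accounts));

   SAFEARRAYBOUND aDim[1];
   aDim[0].lLbound = 0;
   aDim[0].cElements = 0;

   SAFEARRAY* sarrUsers = SafeArrayCreate(VT_BSTR, 1, aDim);

   VARIANT_BOOL ret = pAccounts->GetUsers2(&sarrUsers);

   // The _variant_t takes ownership of the returned array and destroys it
   // (through VariantClear()) when it goes out of scope, so an additional
   // SafeArrayDestroy() would destroy it twice.
   _variant_t varUsers;
   varUsers.parray = sarrUsers;
   varUsers.vt = VT_ARRAY | VT_BSTR;

   if(ret != VARIANT_FALSE)
   {
      UnpackBstrArray(varUsers, arrUsers);
   }

   // No explicit Release(): the IAccountsPtr smart pointer releases the
   // interface when it goes out of scope.
}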

The helper function UnpackBstrArray() used previously looks like this:

void UnpackBstrArrayHelper(VARIANT* pvarArrayIn, CStringArray* pstrarrValues)
{
   if (!pstrarrValues || !pvarArrayIn || pvarArrayIn->vt == VT_EMPTY)
      return;

   pstrarrValues->RemoveAll();

   VARIANT* pvarArray = pvarArrayIn;
   SAFEARRAY* parrValues = NULL;

   SAFEARRAYBOUND arrayBounds[1];
   arrayBounds[0].lLbound = 0;
   arrayBounds[0].cElements = 0;

   if((pvarArray->vt & (VT_VARIANT|VT_BYREF|VT_ARRAY)) == (VT_VARIANT|VT_BYREF) &&
      NULL != pvarArray->pvarVal &&
      (pvarArray->pvarVal->vt & VT_ARRAY))
   {
      pvarArray = pvarArray->pvarVal;
   }

   if (pvarArray->vt & VT_ARRAY)
   {
      if (VT_BYREF & pvarArray->vt)
         parrValues = *pvarArray->pparray;
      else
         parrValues = pvarArray->parray;
   }
   else
      return;

   if (parrValues != NULL)
   {
      HRESULT hr = SafeArrayGetLBound(parrValues, 1, &arrayBounds[0].lLbound);
      hr = SafeArrayGetUBound(parrValues, 1, (long*)&arrayBounds[0].cElements);
      arrayBounds[0].cElements -= arrayBounds[0].lLbound;
      arrayBounds[0].cElements += 1;
   }

   if (arrayBounds[0].cElements > 0)
   {
      for (ULONG i = 0; i < arrayBounds[0].cElements; i++)
      {
         LONG lIndex = (LONG)i;
         CString strValue = _T("");

         VARTYPE vType;
         BSTR bstrItem;

         ::SafeArrayGetVartype(parrValues, &vType);
         HRESULT hr = ::SafeArrayGetElement(parrValues, &lIndex, &bstrItem);

         if(SUCCEEDED(hr))
         {
            switch(vType)
            {
            case VT_BSTR:
               strValue = (LPCTSTR)bstrItem;
               break;
            }

            ::SysFreeString(bstrItem);
         }

         pstrarrValues->Add(strValue);
      }
   }
}

void UnpackBstrArray( const _variant_t &var, CStringArray &strarrValues  )
{
   UnpackBstrArrayHelper( &(VARIANT)const_cast< _variant_t & >(var), &strarrValues );
}

Attached you can find a demo project (C# and C++) with the complete example shown in this tutorial.


Project Tuva is an enhanced video player created by Microsoft Research to freely host the lectures given by Richard Feynman at Cornell University in the ’60s. Bill Gates saw the lectures two decades ago, was impressed with them and wanted to make them freely available. Now, it has finally happened. You can watch them at Microsoft Research.

The seven lectures given by Professor Feynman are:

  • Law of Gravity
  • The Relation of Mathematics and Physics
  • The Great Conservation Principles
  • Symmetry in Physical Law
  • The Distinction of Past and Future
  • Probability and Uncertainty – The Quantum Mechanical View of Nature
  • Seeking New Laws

These are great lectures given by one of the greatest physicists of the 20th century. They are really worth watching.


Evaluate Expressions – Part 1: The Approaches
Evaluate Expressions – Part 2: Parse the Expression
Evaluate Expressions – Part 3: Building the Abstract Syntax Tree
Evaluate Expressions – Part 4: Evaluate the Abstract Syntax Tree

So far we have managed to parse the text representing an expression and build an abstract syntax tree. The only thing left, and the simplest of them all, is traversing this abstract syntax tree and evaluating the expression it represents.

In pseudocode, this would look like this:

double Evaluate(subtree)
{
   if(subtree is numeric)
      return value;
   else
   {
      op = subtree.operator
      v1 = Evaluate(subtree.left)
      v2 = Evaluate(subtree.right)
      return v1 op v2;
   }
}

We actually have to check for one more type of node, the one representing a unary expression. Even so, the evaluation function is as simple as this:

class Evaluator
{
   double EvaluateSubtree(ASTNode* ast)
   {
      if(ast == NULL)
         throw EvaluatorException("Incorrect syntax tree!");

      if(ast->Type == NumberValue)
         return ast->Value;
      else if(ast->Type == UnaryMinus)
         return -EvaluateSubtree(ast->Left);
      else
      {
         double v1 = EvaluateSubtree(ast->Left);
         double v2 = EvaluateSubtree(ast->Right);
         switch(ast->Type)
         {
         case OperatorPlus:  return v1 + v2;
         case OperatorMinus: return v1 - v2;
         case OperatorMul:   return v1 * v2;
         case OperatorDiv:   return v1 / v2;
         }
      }

      throw EvaluatorException("Incorrect syntax tree!");
   }

public:
   double Evaluate(ASTNode* ast)
   {
      if(ast == NULL)
         throw EvaluatorException("Incorrect abstract syntax tree");

      return EvaluateSubtree(ast);
   }
};

The exception class is defined as:

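// requires #include <stdexcept>
class EvaluatorException : public std::runtime_error
{
public:
   // std::runtime_error is used as the base class because the
   // std::exception(const char*) constructor is a Microsoft extension
   EvaluatorException(const std::string& message):
      std::runtime_error(message)
      {
      }
};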

So let’s try it:

void Test(const char* text)
{
   Parser parser;

   try
   {
      ASTNode* ast = parser.Parse(text);

      try
      {
         Evaluator eval;
         double val = eval.Evaluate(ast);

         std::cout << text << " = " << val << std::endl;
      }
      catch(EvaluatorException& ex)
      {
         std::cout << text << " \t " << ex.what() << std::endl;
      }

      delete ast;
   }
   catch(ParserException& ex)
   {
      std::cout << text << " \t " << ex.what() << std::endl;
   }
}

int main()
{
   Test("1+2+3+4");
   Test("1*2*3*4");
   Test("1-2-3-4");
   Test("1/2/3/4");
   Test("1*2+3*4");
   Test("1+2*3+4");
   Test("(1+2)*(3+4)");
   Test("1+(2*3)*(4+5)");
   Test("1+(2*3)/4+5");
   Test("5/(4+3)/2");
   Test("1 + 2.5");
   Test("125");
   Test("-1");
   Test("-1+(-2)");
   Test("-1+(-2.0)");

   Test("   1*2,5");
   Test("   1*2.5e2");
   Test("M1 + 2.5");
   Test("1 + 2&5");
   Test("1 * 2.5.6");
   Test("1 ** 2.5");
   Test("*1 / 2.5");

   return 0;
}

The output of this test program is:

1+2+3+4 = 10
1*2*3*4 = 24
1-2-3-4 = -8
1/2/3/4 = 0.0416667
1*2+3*4 = 14
1+2*3+4 = 11
(1+2)*(3+4) = 21
1+(2*3)*(4+5) = 55
1+(2*3)/4+5 = 7.5
5/(4+3)/2 = 0.357143
1 + 2.5 = 3.5
125 = 125
-1 = -1
-1+(-2) = -3
-1+(-2.0) = -3
   1*2,5         Unexpected token ',' at position 6
   1*2.5e2       Unexpected token 'e' at position 8
M1 + 2.5         Unexpected token 'M' at position 0
1 + 2&5          Unexpected token '&' at position 5
1 * 2.5.6        Unexpected token '.' at position 7
1 ** 2.5         Unexpected token '*' at position 4
*1 / 2.5         Unexpected token '*' at position 1

And that's it. Define the grammar, build the parser, insert semantic actions to build the abstract syntax tree, and then traverse it and evaluate the expression. If you are interested in understanding the grammar and the parsing in more depth than I presented in these posts, I suggest you read more articles. The purpose was not to teach compiler theory, but to put it to practical use.

Here you can download a Visual Studio 2008 project with the code contained in this tutorial.


In my previous post we’ve parsed an expression, verifying whether it’s syntactically correct or not. But we still have to evaluate it. To be able to do that we’ll have to build an abstract syntax tree. This can be done by modifying the previous code and inserting semantic actions. That means we do something more when we match productions.

Our abstract syntax tree is a binary tree. The inner nodes represent operators and the leaves are numerical values.

Here is how a node in the AST will look:

[Figure: AST node]

It is defined like this:

enum ASTNodeType
{
   Undefined,
   OperatorPlus,
   OperatorMinus,
   OperatorMul,
   OperatorDiv,
   UnaryMinus,
   NumberValue
};

class ASTNode
{
public:
   ASTNodeType Type;
   double      Value;
   ASTNode*    Left;
   ASTNode*    Right;

   ASTNode()
   {
      Type = Undefined;
      Value = 0;
      Left = NULL;
      Right = NULL;
   }

   ~ASTNode()
   {
      delete Left;
      delete Right;
   }
};

For the expression 1+2*3, the AST will be:

[Figure: AST example]

We’ll build this tree by inserting semantic actions and adding nodes according to the following rules:

[Figure: AST semantic rules]

You’ll probably notice that based on these rules the AST shown above will be modified a little bit, with some additional nodes for operators + and *, having on the left a leaf node with the neutral element for the operation (zero for + and 1 for *), and on the right a node corresponding to a TERM or FACTOR. This won’t affect the evaluation.

The Parser class will change so that the functions corresponding to the non-terminal symbols EXP, EXP1, TERM, TERM1 and FACTOR will return an ASTNode* instead of void. That is the node created as a semantic action.

class Parser
{
   Token m_crtToken;
   const char* m_Text;
   size_t m_Index;

private:

   ASTNode* Expression()
   {
      ASTNode* tnode = Term();
      ASTNode* e1node = Expression1();

      return CreateNode(OperatorPlus, tnode, e1node);
   }

   ASTNode* Expression1()
   {
      ASTNode* tnode;
      ASTNode* e1node;

      switch(m_crtToken.Type)
      {
      case Plus:
         GetNextToken();
         tnode = Term();
         e1node = Expression1();

         return CreateNode(OperatorPlus, e1node, tnode);

      case Minus:
         GetNextToken();
         tnode = Term();
         e1node = Expression1();

         return CreateNode(OperatorMinus, e1node, tnode);
      }

      return CreateNodeNumber(0);
   }

   ASTNode* Term()
   {
      ASTNode* fnode = Factor();
      ASTNode* t1node = Term1();

      return CreateNode(OperatorMul, fnode, t1node);
   }

   ASTNode* Term1()
   {
      ASTNode* fnode;
      ASTNode* t1node;

      switch(m_crtToken.Type)
      {
      case Mul:
         GetNextToken();
         fnode = Factor();
         t1node = Term1();
         return CreateNode(OperatorMul, t1node, fnode);

      case Div:
         GetNextToken();
         fnode = Factor();
         t1node = Term1();
         return CreateNode(OperatorDiv, t1node, fnode);
      }

      return CreateNodeNumber(1);
   }

   ASTNode* Factor()
   {
      ASTNode* node;
      switch(m_crtToken.Type)
      {
      case OpenParenthesis:
         GetNextToken();
         node = Expression();
         Match(')');
         return node;

      case Minus:
         GetNextToken();
         node = Factor();
         return CreateUnaryNode(node);

      case Number:
         {
            double value = m_crtToken.Value;
            GetNextToken();
            return CreateNodeNumber(value);
         }

      default:
         {
            std::stringstream sstr;
            sstr << "Unexpected token '" << m_crtToken.Symbol << "' at position " << m_Index;
            throw ParserException(sstr.str(), m_Index);
         }
      }
   }

   ASTNode* CreateNode(ASTNodeType type, ASTNode* left, ASTNode* right)
   {
      ASTNode* node = new ASTNode;
      node->Type = type;
      node->Left = left;
      node->Right = right;

      return node;
   }

   ASTNode* CreateUnaryNode(ASTNode* left)
   {
      ASTNode* node = new ASTNode;
      node->Type = UnaryMinus;
      node->Left = left;
      node->Right = NULL;

      return node;
   }

   ASTNode* CreateNodeNumber(double value)
   {
      ASTNode* node = new ASTNode;
      node->Type = NumberValue;
      node->Value = value;

      return node;
   }

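   void Match(char expected)
   {
      // compare against the current token, not the previous character of
      // the text; otherwise input such as "((1)" is wrongly accepted
      if(m_crtToken.Symbol == expected)
         GetNextToken();
      else
      {
         std::stringstream sstr;
         sstr << "Expected token '" << expected << "' at position " << m_Index;
         throw ParserException(sstr.str(), m_Index);
      }
   }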

   void SkipWhitespaces()
   {
      while(isspace(m_Text[m_Index])) m_Index++;
   }

   void GetNextToken()
   {
      SkipWhitespaces();

      m_crtToken.Value = 0;
      m_crtToken.Symbol = 0;

      if(m_Text[m_Index] == 0)
      {
         m_crtToken.Type = EndOfText;
         return;
      }

      if(isdigit(m_Text[m_Index]))
      {
         m_crtToken.Type = Number;
         m_crtToken.Value = GetNumber();
         return;
      }

      m_crtToken.Type = Error;

      switch(m_Text[m_Index])
      {
      case '+': m_crtToken.Type = Plus; break;
      case '-': m_crtToken.Type = Minus; break;
      case '*': m_crtToken.Type = Mul; break;
      case '/': m_crtToken.Type = Div; break;
      case '(': m_crtToken.Type = OpenParenthesis; break;
      case ')': m_crtToken.Type = ClosedParenthesis; break;
      }

      if(m_crtToken.Type != Error)
      {
         m_crtToken.Symbol = m_Text[m_Index];
         m_Index++;
      }
      else
      {
         std::stringstream sstr;
         sstr << "Unexpected token '" << m_Text[m_Index] << "' at position " << m_Index;
         throw ParserException(sstr.str(), m_Index);
      }
   }

   double GetNumber()
   {
      SkipWhitespaces();

      int index = m_Index;
      while(isdigit(m_Text[m_Index])) m_Index++;
      if(m_Text[m_Index] == '.') m_Index++;
      while(isdigit(m_Text[m_Index])) m_Index++;

      if(m_Index - index == 0)
         throw ParserException("Number expected but not found!", m_Index);

      char buffer[32] = {0};
      memcpy(buffer, &m_Text[index], m_Index - index);

      return atof(buffer);
   }

public:
   ASTNode* Parse(const char* text)
   {
      m_Text = text;
      m_Index = 0;
      GetNextToken();

      return Expression();
   }
};

Now the Parse() method will return the created abstract syntax tree. We will see how to evaluate the expression by traversing this tree in the next post.


In my previous post I provided some background theory for evaluating expressions with abstract syntax trees. As I mentioned, the first step towards this goal is to parse the expression and make sure it is syntactically correct. This is what I’ll show you in this post.

Having the grammar defined, we’ll make one function for each non-terminal symbol (EXP, EXP1, TERM, TERM1, FACTOR).

Simply put, the code will look like this:

   void Expression()
   {
      Term();
      Expression1();
   }

   void Expression1()
   {
      switch(current_token)
      {
      case '+':
         GetNextToken();
         Term();
         Expression1();
         break;

      case '-':
         GetNextToken();
         Term();
         Expression1();
         break;
      }
   }

However, I want to make it a little bit more organized, so the first thing to do is to define a Token structure that indicates the type of the last extracted token and, where applicable, its value (for numbers). A token is basically a symbol extracted (one at a time) from the input text. The possible tokens are the arithmetical operators (‘+’, ‘-’, ‘/’, ‘*’), the parentheses (‘(‘ and ‘)’), numbers and the end of the text.

Here is how I defined the token type and the token:

enum TokenType
{
   Error,
   Plus,
   Minus,
   Mul,
   Div,
   EndOfText,
   OpenParenthesis,
   ClosedParenthesis,
   Number
};

struct Token
{
   TokenType Type;
   double    Value;
   char      Symbol;

   Token():Type(Error), Value(0), Symbol(0)
   {}
};

To be able to do the parsing, we’ll need some helper functions:

  • SkipWhitespaces(), skips all whitespaces between two tokens:
       void SkipWhitespaces()
       {
          while(isspace(m_Text[m_Index])) m_Index++;
       }
    
  • GetNextToken(), extracts the next token from the text; if an illegal token appears it throws an exception
       void GetNextToken()
       {
          // ignore white spaces
          SkipWhitespaces();
    
          m_crtToken.Value = 0;
          m_crtToken.Symbol = 0;
    
          // test for the end of text
          if(m_Text[m_Index] == 0)
          {
             m_crtToken.Type = EndOfText;
             return;
          }
    
          // if the current character is a digit read a number
          if(isdigit(m_Text[m_Index]))
          {
             m_crtToken.Type = Number;
             m_crtToken.Value = GetNumber();
             return;
          }
    
          m_crtToken.Type = Error;
    
          // check if the current character is an operator or parentheses
          switch(m_Text[m_Index])
          {
          case '+': m_crtToken.Type = Plus; break;
          case '-': m_crtToken.Type = Minus; break;
          case '*': m_crtToken.Type = Mul; break;
          case '/': m_crtToken.Type = Div; break;
          case '(': m_crtToken.Type = OpenParenthesis; break;
          case ')': m_crtToken.Type = ClosedParenthesis; break;
          }
    
          if(m_crtToken.Type != Error)
          {
             m_crtToken.Symbol = m_Text[m_Index];
             m_Index++;
          }
          else
          {
             std::stringstream sstr;
             sstr << "Unexpected token '" << m_Text[m_Index] << "' at position " << m_Index;
             throw ParserException(sstr.str(), m_Index);
          }
       }
    
  • GetNumber() extracts a number from the input text starting at the current position; the purpose of this tutorial is didactical, so this function is quite simple: it reads integers and doubles with '.' as the decimal point; it doesn't read numbers in a format like 123.3E+2.
       double GetNumber()
       {
          SkipWhitespaces();
    
          int index = m_Index;
          while(isdigit(m_Text[m_Index])) m_Index++;
          if(m_Text[m_Index] == '.') m_Index++;
          while(isdigit(m_Text[m_Index])) m_Index++;
    
          if(m_Index - index == 0)
             throw ParserException("Number expected but not found!", m_Index);
    
          char buffer[32] = {0};
          memcpy(buffer, &m_Text[index], m_Index - index);
    
          return atof(buffer);
       }
    

With these defined, we can build the parser for the specified grammar.

class Parser
{
   Token m_crtToken;
   const char* m_Text;
   size_t m_Index;

private:

   void Expression()
   {
      Term();
      Expression1();
   }

   void Expression1()
   {
      switch(m_crtToken.Type)
      {
      case Plus:
         GetNextToken();
         Term();
         Expression1();
         break;

      case Minus:
         GetNextToken();
         Term();
         Expression1();
         break;
      }
   }

   void Term()
   {
      Factor();
      Term1();
   }

   void Term1()
   {
      switch(m_crtToken.Type)
      {
      case Mul:
         GetNextToken();
         Factor();
         Term1();
         break;

      case Div:
         GetNextToken();
         Factor();
         Term1();
         break;
      }
   }

   void Factor()
   {
      switch(m_crtToken.Type)
      {
      case OpenParenthesis:
         GetNextToken();
         Expression();
         Match(')');
         break;

      case Minus:
         GetNextToken();
         Factor();
         break;

      case Number:
         GetNextToken();
         break;

      default:
         {
            std::stringstream sstr;
            sstr << "Unexpected token '" << m_crtToken.Symbol << "' at position " << m_Index;
            throw ParserException(sstr.str(), m_Index);
         }
      }
   }

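   void Match(char expected)
   {
      // compare against the current token, not the previous character of
      // the text; otherwise input such as "((1)" is wrongly accepted
      if(m_crtToken.Symbol == expected)
         GetNextToken();
      else
      {
         std::stringstream sstr;
         sstr << "Expected token '" << expected << "' at position " << m_Index;
         throw ParserException(sstr.str(), m_Index);
      }
   }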

   void SkipWhitespaces()
   {
      while(isspace(m_Text[m_Index])) m_Index++;
   }

   void GetNextToken()
   {
      // ignore white spaces
      SkipWhitespaces();

      m_crtToken.Value = 0;
      m_crtToken.Symbol = 0;

      // test for the end of text
      if(m_Text[m_Index] == 0)
      {
         m_crtToken.Type = EndOfText;
         return;
      }

      // if the current character is a digit read a number
      if(isdigit(m_Text[m_Index]))
      {
         m_crtToken.Type = Number;
         m_crtToken.Value = GetNumber();
         return;
      }

      m_crtToken.Type = Error;

      // check if the current character is an operator or parentheses
      switch(m_Text[m_Index])
      {
      case '+': m_crtToken.Type = Plus; break;
      case '-': m_crtToken.Type = Minus; break;
      case '*': m_crtToken.Type = Mul; break;
      case '/': m_crtToken.Type = Div; break;
      case '(': m_crtToken.Type = OpenParenthesis; break;
      case ')': m_crtToken.Type = ClosedParenthesis; break;
      }

      if(m_crtToken.Type != Error)
      {
         m_crtToken.Symbol = m_Text[m_Index];
         m_Index++;
      }
      else
      {
         std::stringstream sstr;
         sstr << "Unexpected token '" << m_Text[m_Index] << "' at position " << m_Index;
         throw ParserException(sstr.str(), m_Index);
      }
   }

   double GetNumber()
   {
      SkipWhitespaces();

      int index = m_Index;
      while(isdigit(m_Text[m_Index])) m_Index++;
      if(m_Text[m_Index] == '.') m_Index++;
      while(isdigit(m_Text[m_Index])) m_Index++;

      if(m_Index - index == 0)
         throw ParserException("Number expected but not found!", m_Index);

      char buffer[32] = {0};
      memcpy(buffer, &m_Text[index], m_Index - index);

      return atof(buffer);
   }

public:
   void Parse(const char* text)
   {
      m_Text = text;
      m_Index = 0;
      GetNextToken();

      Expression();
   }
};

The exception class is defined like this:

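// requires #include <stdexcept>
class ParserException : public std::runtime_error
{
   int m_Pos;

public:
   // std::runtime_error is used as the base class because the
   // std::exception(const char*) constructor is a Microsoft extension
   ParserException(const std::string& message, int pos):
      std::runtime_error(message),
      m_Pos(pos)
   {
   }
};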

As you can see, the code for the grammar productions is quite simple and straightforward. Now, let's put it to the test.

void Test(const char* text)
{
   Parser parser;
   try
   {
      parser.Parse(text);
      std::cout << "\"" << text << "\"\t OK" << std::endl;
   }
   catch(ParserException& ex)
   {
      std::cout << "\"" << text << "\"\t " << ex.what() << std::endl;
   }
}

int main()
{
   Test("1+2+3+4");
   Test("1*2*3*4");
   Test("1-2-3-4");
   Test("1/2/3/4");
   Test("1*2+3*4");
   Test("1+2*3+4");
   Test("(1+2)*(3+4)");
   Test("1+(2*3)*(4+5)");
   Test("1+(2*3)/4+5");
   Test("5/(4+3)/2");
   Test("1 + 2.5");
   Test("125");
   Test("-1");
   Test("-1+(-2)");
   Test("-1+(-2.0)");

   Test("   1*2,5");
   Test("   1*2.5e2");
   Test("M1 + 2.5");
   Test("1 + 2&5");
   Test("1 * 2.5.6");
   Test("1 ** 2.5");
   Test("*1 / 2.5");

   return 0;
}

The output for this testing program is:

"1+2+3+4"        OK
"1*2*3*4"        OK
"1-2-3-4"        OK
"1/2/3/4"        OK
"1*2+3*4"        OK
"1+2*3+4"        OK
"(1+2)*(3+4)"    OK
"1+(2*3)*(4+5)"  OK
"1+(2*3)/4+5"    OK
"5/(4+3)/2"      OK
"1 + 2.5"        OK
"125"    OK
"-1"     OK
"-1+(-2)"        OK
"-1+(-2.0)"      OK
"   1*2,5"       Unexpected token ',' at position 6
"   1*2.5e2"     Unexpected token 'e' at position 8
"M1 + 2.5"       Unexpected token 'M' at position 0
"1 + 2&5"        Unexpected token '&' at position 5
"1 * 2.5.6"      Unexpected token '.' at position 7
"1 ** 2.5"       Unexpected token '*' at position 4
"*1 / 2.5"       Unexpected token '*' at position 1

Which is exactly what we expected: it validates correct expressions and throws an exception when the expression is incorrect.

In the next post I'll show how to modify this code to build an abstract syntax tree.


A few days ago I was discussing evaluating expressions, and I decided to explain how you can build an evaluator. I will do this in a series of posts, going one step further in each post. I will use C++, but the approaches are the same regardless of the language.

Let’s consider this expression: 1+2*3. The value of this expression is 7. But how do you evaluate it in a language like C++ if you get it as a string? First of all, this is the so-called “infix” notation. There are also prefix and postfix notations. The terms infix, prefix and postfix refer to the position of the operator relative to the operands:

  • Prefix: operator operand1 operand2 (ex: + 1 2)
  • Infix: operand1 operator operand2 (ex: 1 + 2)
  • Postfix: operand1 operand2 operator (ex: 1 2 +)

The notation humans understand most easily is infix. But it turns out that parsing a string with an infix expression from left to right and evaluating it as you go is not straightforward: you cannot know in advance what comes next, operators have different precedence, and there are parentheses too.

To solve the problem you’d have to build a helper structure representing the infix expression. There are two possibilities:

  • Reverse Polish Notation (RPN) implies transforming the infix expression into a postfix expression and then evaluating it from left to right. 1 + 2*3 is transformed into 1 2 3 * +. You go from left to right, pushing operands on a stack; when you find an operator, you pop its operands, evaluate the operation and push the result back on the stack.
  • Abstract Syntax Tree (AST) is an abstract representation of an expression, with inner nodes representing operators and leaves representing numbers.

    [Figure: Abstract Syntax Tree]

The RPN is harder to build and evaluate in my opinion, so I will focus on the approach with the AST.

We build an AST while parsing the expression. First, we’ll have to define the grammar for the expression. Otherwise we wouldn’t know what to parse.

EXP -> EXP + EXP | EXP - EXP | EXP * EXP | EXP / EXP | - EXP | (EXP) | number

First, this grammar is recursive, as you can see, but another important problem is that it does not represent the precedence of the operators. For this reason, a better grammar is this:

EXP    -> EXP + TERM |
          EXP - TERM |
          TERM
TERM   -> TERM * FACTOR |
          TERM / FACTOR |
          FACTOR
FACTOR -> ( EXP ) | - FACTOR | number

These rules written above are called productions. The symbols used are:

  • EXP, TERM, FACTOR are called non-terminal symbols
  • +, -, /, *, (, ) and number are called terminal symbols
  • EXP is the start symbol

While the grammar has the correct operator precedence, it’s still recursive, or more precisely, left-recursive. You can see that EXP goes into EXP, then operator +, then TERM. You never get to match operator + because you have to start again and again with a new expression. There are techniques for eliminating this recursion, and the result is:

EXP    -> TERM EXP1
EXP1   -> + TERM EXP1 |
          - TERM EXP1 |
          epsilon
TERM   -> FACTOR TERM1
TERM1  -> * FACTOR TERM1 |
          / FACTOR TERM1 |
          epsilon
FACTOR -> ( EXP ) | - FACTOR | number

‘epsilon’ here means ‘nothing’.

With the theory in place (well, this is just the tip of the iceberg, but it should be a good start for you), we’ll have to do three things:

  • Parse the expression
  • Build the abstract syntax tree
  • Evaluate the abstract syntax tree

The first two steps will be done at the same time, but I’ll take them one at a time and explain them in detail.

Before you continue with the implementation details, I suggest you read more about both RPN and AST and grammars.


Yesterday I wrote about lists in F#. Today I’ll write about arrays, which, unlike lists, are mutable, flat storage and cannot be resized. That means you have to create a new array if you want to remove or add elements. Advantages include constant look-up time and the fact that they can store a large amount of data.

You can create a literal array in a similar way to lists, placing the elements between [| |]:

let data1 = [|1;2;3;4|]
printfn "data1: %a" output_any data1

data1: [|1; 2; 3; 4|]

The empty literal array is [||].

To create an array you can use either Array.create or Array.init. Both create and initialize an array, but the second takes a function (typically a lambda expression) that computes each element from its index, which allows more advanced initialization. The following creates an array with 10 elements initialized to 1:

let data2 = Array.create 10 1
printfn "data2: %a" output_any data2

Here is the output:

data2: [|1; 1; 1; 1; 1; 1; 1; 1; 1; 1|]

The same can be achieved using Array.init:

let data3 = Array.init 10 (fun x -> 1)
printfn "data3: %a" output_any data3
data3: [|1; 1; 1; 1; 1; 1; 1; 1; 1; 1|]

But we can use Array.init to initialize the elements from 1 to N for instance:

let data4 = Array.init 10 (fun x -> x+1)
printfn "data4: %a" output_any data4
data4: [|1; 2; 3; 4; 5; 6; 7; 8; 9; 10|]

Arrays are mutable data structures. Elements are accessed with .[] or .(). The following code shows how to set the elements of an array:

let data5 = Array.create 10 0
for i = 0 to (Array.length data5)-1 do
   data5.[i] <- i+1

printfn "data5: %a" output_any data5
data5: [|1; 2; 3; 4; 5; 6; 7; 8; 9; 10|]

You can iterate over the elements of an array with Array.iter and Array.iteri, the second also providing access to the index of the elements.

data4 |> Array.iter (fun x -> printf "%d " x)
printfn ""

data4 |> Array.iteri (fun i x -> printfn "data4(%d) = %d" i x)
1 2 3 4 5 6 7 8 9 10
data4(0) = 1
data4(1) = 2
data4(2) = 3
data4(3) = 4
data4(4) = 5
data4(5) = 6
data4(6) = 7
data4(7) = 8
data4(8) = 9
data4(9) = 10

Retrieving the length of the array can either be done with Array.length arr or with arr.Length.

for i = 0 to data4.Length-1 do
   printfn "data4(%d) = %d" i data4.(i)
data4(0) = 1
data4(1) = 2
data4(2) = 3
data4(3) = 4
data4(4) = 5
data4(5) = 6
data4(6) = 7
data4(7) = 8
data4(8) = 9
data4(9) = 10

Like lists, arrays provide mapping, which creates a new array by applying a function to all the elements of an array (with Array.map) or to the elements of two arrays taken pairwise (with Array.map2).

let data6 = data4 |> Array.map (fun x -> x*2)
printfn "data6: %a" output_any data6

let data7 = Array.map2 (fun x y -> x+y) data4 data6
printfn "data7: %a" output_any data7
data6: [|2; 4; 6; 8; 10; 12; 14; 16; 18; 20|]
data7: [|3; 6; 9; 12; 15; 18; 21; 24; 27; 30|]

An array can be copied with Array.copy.

let data7 = Array.copy data6
printfn "data7: %a" output_any data7
data7: [|2; 4; 6; 8; 10; 12; 14; 16; 18; 20|]

Appending elements to an array is also possible with Array.append, but the result is a new array, created by concatenating two arrays.

let data8 = Array.append data7 [|100|]
printfn "data8: %a" output_any data8
data8: [|2; 4; 6; 8; 10; 12; 14; 16; 18; 20; 100|]

The last operation on arrays I'm going to mention here is folding, which applies a function to all the elements of an array, threading an accumulator argument through the process. The following example shows how to compute the sum of the elements of an array.

let data9 = [|1;2;3;4|]
let sum1 = (Array.fold_left (fun acc x -> x + acc) 0 data9)
let sum2 = (Array.fold_right (fun x acc -> x + acc) data9 0)
printfn "sum1 = %d" sum1
printfn "sum2 = %d" sum2
sum1 = 10
sum2 = 10


In this post I will talk about lists in F#, one of the fundamental concepts of the language. What should be said from the very beginning is that lists are immutable, singly linked lists. That means whenever you change a list, a new list is created.

You can declare a list in the following ways:

let list1 = [1;2;3;4]
let list2 = 5::6::7::8::[]

To print the content of the list you can do this:

printfn "list1: %a" output_any list1
printfn "list2: %a" output_any list2

list1: [1; 2; 3; 4]
list2: [5; 6; 7; 8]

You can concatenate two lists with operator @:

let list3 = list1 @ list2
printfn "list3: %a" output_any list3

list3: [1; 2; 3; 4; 5; 6; 7; 8]

and you can append elements to the beginning of the list with operator ::

let list4 = -1::0::list3
printfn "list4: %a" output_any list4

list4: [-1; 0; 1; 2; 3; 4; 5; 6; 7; 8]

You can also use the functionality of the List module (defined in Microsoft.FSharp.Collections) to print a list by iterating over its elements:

List.iter (fun x -> printf "%d " x) list1

1 2 3 4

The same can be achieved using the pipe operator:

list1 |> List.iter (fun x -> printf "%d " x)

You can also iterate and get the index of the list elements, with List.iteri:

list1 |> List.iteri (fun i x -> printfn "list1[%d] : %d " i x)

list1[0] : 1
list1[1] : 2
list1[2] : 3
list1[3] : 4

Lists have a special representation: a head followed by a tail, which is in turn another list (possibly the empty list []). Let's consider the list [1;2;3]. It has the head 1 and the tail [2;3]. The tail, in turn, has the head 2 and the tail [3]. This tail has the head 3 and the tail [], which is the empty list.
You can see the head and tail of a list with List.hd and List.tl:

printfn "head list1: %a" output_any (List.hd list1)
printfn "tail list1: %a" output_any (List.tl list1)

The output for list1 [1;2;3] is:

head list1: 1
tail list1: [2; 3]

Enough with basic things. Let's try working with lists.

1. Minimum and maximum from a list

We can compute the maximum (or minimum) of a list using the following algorithm:

  • if the list is empty, indicate error
  • if the list has only one element, that is the maximum (or minimum)
  • if the list has at least two elements, compute the maximum between the first element and the maximum of the rest of the list

That sounds like a recursive operation, which can be simply put in F# like this:

let rec greatest_element l =
    match l with
    | [] -> failwith "empty list"
    | [x] -> x
    | x::rest -> max x (greatest_element rest)

let rec smallest_element l =
    match l with
    | [] -> failwith "empty list"
    | [x] -> x
    | x::rest -> min x (smallest_element rest)

We can use that like this:

let list1 = [1;2;3;4;-4;-3;-2;-1]
let list2 = []   

try
   printfn "maximum from list1: %d" (greatest_element list1)
   printfn "minimum from list1: %d" (smallest_element list1)

   printfn "maximum from list2: %d" (greatest_element list2)
   printfn "minimum from list2: %d" (smallest_element list2)
with
   Failure msg ->
      printfn "Error: %s" msg

and the output would be:

maximum from list1: 4
minimum from list1: -4
Error: empty list

2. Reversing a list

How would we reverse a list? We can reverse the tail of the list and then append the head at the end of the result. The reverse of an empty list is the empty list. That again sounds recursive.

let rec revert_list l =
   match l with
   | [] -> []
   | x::rest -> (revert_list rest) @ [x]

let list1 = [1;2;3;4;-4;-3;-2;-1]

printfn "list1: %a" output_any list1
printfn "list2: %a" output_any (revert_list list1)

And here is the output:

list1: [1; 2; 3; 4; -4; -3; -2; -1]
list2: [-1; -2; -3; -4; 4; 3; 2; 1]

3. Inserting in a list

So how could we insert an element in a list, before or after a specified element? We can use the following algorithm:

  • if the list is empty, the new list has one element (the one to insert)
  • else, if the head is the element we are looking for, create a list, with the new element either before the head, or between the head and the tail
  • else, if the head is not the element we are looking for, prepend the head to the list created by inserting the new element in the tail.

You got that right, recursion again.

let rec insert_after elem newelem l =
    match l with
    | [] -> [newelem]
    | x::rest -> if x = elem then
                    (x::newelem::rest)
                 else
                     x::(insert_after elem newelem rest)

let rec insert_before elem newelem l =
    match l with
    | [] -> [newelem]
    | x::rest -> if x = elem then
                    (newelem::x::rest)
                 else
                    x::(insert_before elem newelem rest)        

let list1 = [1;2;3;4;-4;-3;-2;-1]
let list2 = insert_after 4 6 list1
let list3 = insert_before 6 5 list2

printfn "list1: %a" output_any list1
printfn "list2: %a" output_any list2
printfn "list3: %a" output_any list3

And the output is:

list1: [1; 2; 3; 4; -4; -3; -2; -1]
list2: [1; 2; 3; 4; 6; -4; -3; -2; -1]
list3: [1; 2; 3; 4; 5; 6; -4; -3; -2; -1]

4. Removing elements from a list

As a last exercise, let's consider removing elements from a list. The following steps can be used to remove elements:

  • if the list is empty, return an empty list
  • if the list is not empty and the head meets the removing criteria, return a list obtained by reiterating the algorithm on the tail of the list
  • if the list is not empty and the head does not meet the removing criteria, return a list obtained by prepending the head to the list obtained by reiterating the algorithm on the tail of the list

let rec remove_if l predicate =
    match l with
    | [] -> []
    | x::rest -> if predicate(x) then
                    (remove_if rest predicate)
                 else
                     x::(remove_if rest predicate)

The great thing about this implementation is that we can pass a lambda expression as the predicate and use it to specify the criteria for removing elements. That way we can remove, for instance, the odd elements, the even elements, or the negative elements. Here is some sample code:

let list1 = [1;2;3;4;-4;-3;-2;-1]

let list2 = remove_if list1 (fun x -> (abs x &&& 1) = 1)
let list3 = remove_if list1 (fun x -> (abs x &&& 1) = 0)
let list4 = remove_if list1 (fun x -> x < 0)

printfn "%a" output_any list1
printfn "%a" output_any list2
printfn "%a" output_any list3
printfn "%a" output_any list4

The output for this sample is:

[1; 2; 3; 4; -4; -3; -2; -1]
[2; 4; -4; -2]
[1; 3; -3; -1]
[1; 2; 3; 4]

I hope this will help you to get a grip on how you can work on lists in F#.


I was writing today about the movie Pirates of the Caribbean, naming the post “Pirates at their Worst”. In the meantime I found this article on the APC Magazine web site, called Romania: a global hotspot for eBay fraud. It reported that the Australian office of eBay organized a news conference to talk about the frauds committed by Romanians. According to a 2006 report of the IC3.GOV Internet Crime Complaint Center, a Romanian crime ring scammed eBay users out of $5,000,000. All of a sudden, pirates are no longer romantic heroes, looking for adventure, driven by honor, and the best role models our kids can have.

Mat Henley, of the eBay global fraud investigation team said that:

We discovered that Romania had a huge technology gap between generations. It was enormous: 25-30 year old criminals were some of the brightest people we’ve dealt with, but when you mix in the prosecutors, law enforcement and magistrates, some of them had never been on a computer – period.

To tackle the perpetrators, eBay had to buy computers and digital cameras for the Romanian police and involve US agents working in Romania. The result was “hundreds” of arrests, though he refused to give any figures about the actual impact on internet fraud.

Now, what I have to say about that: yes, Romanians were involved in many frauds, but I doubt that eBay bought computers for the Romanian Police and they made hundreds of arrests in Romania. The news conference was rather a way to make publicity for the company and calm down the unrest among Australian users about frauds on eBay. If you read the report I mentioned above, you’ll see that Romania takes only the 5th place, accounting for only 1.6% of fraud.

Top Ten Countries – Perpetrator
1. United States – 60.9%
2. United Kingdom – 15.9%
3. Nigeria – 5.9%
4. Canada – 5.6%
5. Romania – 1.6%
6. Italy – 1.2%
7. Netherlands – 1.2%
8. Russia – 1.1%
9. Germany – 0.7%
10. South Africa – 0.6%

Most of the Romanians operated from countries such as the US or the UK. I don’t think eBay had to buy computers for the police in these countries. So, without a real report from eBay or the Romanian Police about the frauds made from Romania, I call the eBay announcements a scam.

And as an interesting thing, I read in the report that:

Pruette is a Romanian national and is currently being held by the Immigration and Naturalization Service. He is believed to be part of a multi-state cell which sends funds to Romania and other countries to support terrorist efforts.

So, according to people that investigate internet crime, Romania might be a terrorist haven. Let’s only hope that after the Middle East, they won’t focus on us to counter terrorism and bring democracy to the area.


I have decided to make a comparison for file IO operations on Win32, CRT, STL and MFC.

For all four libraries/APIs I have done the profiling in the following way:

  • open the file
  • allocate the buffer used for reading
  • start the timer
  • read/write from/to the file
  • stop the timer
  • close the file
  • release the memory

This way, the profiling applies only to the read or write operations, not to other tasks such as opening and closing files, or allocating and releasing memory.
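
The steps above can be sketched for the CRT case roughly as follows. This is only an illustration, not the benchmark project itself: the function name timed_crt_read and the ReadResult type are hypothetical, and std::chrono::steady_clock is my assumption for the timer.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical result type for this sketch.
struct ReadResult {
    bool ok;                // true if the file was opened and read without error
    std::size_t bytesRead;  // total number of bytes read
    double seconds;         // time spent in the read loop only
};

// Reads the whole file with the CRT, using chunks of bufferSize bytes,
// and times only the read loop.
ReadResult timed_crt_read(const char* path, std::size_t bufferSize) {
    ReadResult result = {false, 0, 0.0};
    if (bufferSize == 0) return result;

    std::FILE* file = std::fopen(path, "rb");             // open the file
    if (!file) return result;
    std::vector<char> buffer(bufferSize);                 // allocate the buffer

    const auto start = std::chrono::steady_clock::now();  // start the timer
    std::size_t n = 0;
    while ((n = std::fread(buffer.data(), 1, buffer.size(), file)) > 0)
        result.bytesRead += n;                            // read from the file
    const auto stop = std::chrono::steady_clock::now();   // stop the timer

    result.ok = (std::ferror(file) == 0);
    std::fclose(file);                                    // close the file
    result.seconds = std::chrono::duration<double>(stop - start).count();
    return result;                 // the buffer is released when the vector is destroyed
}
```

Calling timed_crt_read(path, 32), timed_crt_read(path, 64) and so on for each buffer size, and averaging over several runs, would give one column of timings comparable to the tables in this post.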

What I’ve used:

  • Win32: functions CreateFile, ReadFile, WriteFile and CloseHandle
  • CRT: type FILE and functions fopen, fread, fwrite and fclose
  • STL: for reading, class std::ifstream and methods open(), read() and close(); for writing, class std::ofstream and methods open(), write() and close()
  • MFC: class CFile, and methods Open(), Read(), Write() and Close()

I have performed the reading with different buffer sizes: 32, 64, 128, 256, 512 bytes and 1KB, 2KB, 4KB, 8KB, 16KB, 32KB, as well as with a buffer accommodating the entire file. The same buffer sizes were used for writing. For testing the write operation I also wrote the file at once. In all cases, I generated a 16MB file.

To decide which one is better overall, I have associated a score with each result. For each buffer size, the fastest got 4 points, and the next ones 3, 2, and 1 points. The bigger the sum, the better the overall performance.
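
As an illustration of this scoring rule, here is a small C++ sketch. The function name score_row is hypothetical, the times are assumed to be given in the order CRT, Win32, STL, MFC, and ties are not treated specially (the original scoring may have handled them differently).

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Gives 4 points to the smallest (fastest) time, then 3, 2 and 1.
// Returns the scores in the same order as the input times.
std::array<int, 4> score_row(const std::array<double, 4>& times) {
    // Indices of the four results, sorted from fastest to slowest.
    std::array<std::size_t, 4> order = {{0, 1, 2, 3}};
    std::sort(order.begin(), order.end(),
              [&times](std::size_t a, std::size_t b) { return times[a] < times[b]; });

    std::array<int, 4> scores = {{0, 0, 0, 0}};
    for (std::size_t rank = 0; rank < order.size(); ++rank)
        scores[order[rank]] = 4 - static_cast<int>(rank);
    return scores;
}
```

For the 32-byte row of the first file (0.01917630, 0.063093700, 0.02123180, 0.064283700) this yields the scores 4, 2, 3, 1. Summing the scores of all rows gives the totals used to rank the four APIs.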

I have run the program on two files for reading on an Intel(R) Pentium(R) 4 CPU at 3.20GHz, 1 GB RAM, running Windows XP, SP2. The results, representing an average of 15 runs, are shown below:

File 1: size 2,131,287 bytes

            Time                                                Score
Buffer size CRT          Win32        STL          MFC          CRT   Win32  STL  MFC
32          0.01917630   0.063093700  0.02123180   0.064283700  4     2      3    1
64          0.01474360   0.031909200  0.01460960   0.032482700  3     2      4    1
128         0.01118370   0.016183700  0.01164060   0.016426700  4     2      3    1
256         0.00929148   0.008573490  0.01063090   0.008840810  2     4      1    3
512         0.01071420   0.004684040  0.00985086   0.004745970  1     4      2    3
1024        0.00883909   0.002584480  0.00907385   0.002486950  2     3      1    4
2048        0.00847502   0.001531440  0.00894887   0.001477660  2     3      1    4
4096        0.00776395   0.000981391  0.00891128   0.001009350  2     4      1    3
8192        0.00740465   0.000744340  0.00913489   0.000749145  2     4      1    3
16384       0.00740928   0.000604900  0.00936410   0.000673978  2     4      1    3
32768       0.00736531   0.000657141  0.00837419   0.000610040  2     3      1    4
file size   0.00955846   0.002496180  0.00981464   0.002428280  2     3      1    4
Total                                                           28    38     20   34

File 2: size 110,999,662 bytes

            Time                                                Score
Buffer size CRT          Win32        STL          MFC          CRT   Win32  STL  MFC
32          1.011360     3.3216500    2.47695      3.2822700    4     1      3    2
64          0.742683     1.6815600    0.804563     1.6836300    4     2      3    1
128         0.600344     0.8697840    0.639113     0.8750610    4     2      3    1
256         0.521233     0.4661430    0.586376     0.4751340    2     4      1    3
512         0.501420     0.2734540    0.532212     0.2653010    2     3      1    4
1024        0.474670     0.1532950    0.510266     0.1587330    2     4      1    3
2048        0.458538     0.1012430    0.479981     0.1067980    2     4      1    3
4096        0.432552     0.0715536    0.488251     0.0774886    2     4      1    3
8192        0.417481     0.0607284    0.467426     0.0674372    2     4      1    3
16384       0.400320     0.0510897    0.458111     0.0602826    2     4      1    3
32768       0.406497     0.0503835    0.461796     0.0572124    2     4      1    3
file size   0.523950     0.1867240    0.583327     0.1828440    2     3      1    4
Total                                                           30    39     18   33

The first conclusion is that overall Win32 is the fastest, followed by MFC, then by CRT, the slowest being the STL.

The second conclusion is that CRT is the fastest with buffer sizes smaller than 256 bytes, while from 256 bytes up Win32 and MFC are the fastest.

The results for writing were quite similar. Of course, running the benchmark several times can produce slight variations in the results (both for read and write).

File 3: size 16,809,984 bytes

            Time                                                Score
Buffer size CRT          Win32        STL          MFC          CRT   Win32  STL  MFC
32          0.273796     0.890973     0.335245     0.877301     4     1      3    2
64          0.219715     0.465254     0.259597     0.450076     4     1      3    2
128         0.181927     0.24715      0.201949     0.245169     4     1      3    2
256         0.178976     0.141146     0.189154     0.143666     2     4      1    3
512         0.153816     0.0872411    0.172239     0.0851424    2     3      1    4
1024        0.148846     0.0608282    0.159186     0.0601419    2     3      1    4
2048        0.139997     0.0493811    0.150503     0.0496117    2     4      1    3
4096        0.125797     0.0705146    0.15275      0.0508061    2     3      1    4
8192        0.126708     0.15708      0.1459       0.0655567    3     1      2    4
16384       0.121919     0.0282886    0.14662      0.158024     3     4      2    1
32768       0.124429     0.0247259    0.145496     0.0267301    2     4      1    3
16809984    0.148424     0.47066      0.146321     0.513205     3     2      4    1
Total                                                           33    31     23   33

You can download the project I used for the benchmark from here.
