Evaluate Expressions – Part 1: The Approaches
Evaluate Expressions – Part 2: Parse the Expression
Evaluate Expressions – Part 3: Building the Abstract Syntax Tree
Evaluate Expressions – Part 4: Evaluate the Abstract Syntax Tree

So far we have managed to parse the text representing an expression and build an abstract syntax tree. The only thing left, and the simplest step of all, is to traverse this abstract syntax tree and evaluate the expression it represents.

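In pseudo code, evaluating a node comes down to this: if the node is a number, return its value; otherwise evaluate the left and right subtrees and apply the node’s operator to the two results.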

We actually have to check for one more type of node: the one representing a unary expression (for example, a negation such as -3, which has a single operand). Even with that extra case, the evaluation function remains very simple.
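Here is a minimal C++ sketch of such a function. The node layout (a NodeType enumeration and an ASTNode structure owning its children through std::unique_ptr), the MakeNumber/MakeNode helpers and the EvaluatorException class are assumptions made for illustration, not the original code. Reporting division by zero as an error is also an assumption.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical node kinds; the names are illustrative only.
enum class NodeType { Number, Plus, Minus, Mul, Div, UnaryMinus };

struct ASTNode {
    NodeType type = NodeType::Number;
    double value = 0;                // used only by Number leaves
    std::unique_ptr<ASTNode> left;   // also the single child of UnaryMinus
    std::unique_ptr<ASTNode> right;
};

// Raised when the tree cannot be evaluated.
class EvaluatorException : public std::runtime_error {
public:
    explicit EvaluatorException(const std::string& message)
        : std::runtime_error(message) {}
};

std::unique_ptr<ASTNode> MakeNumber(double value) {
    auto node = std::make_unique<ASTNode>();
    node->value = value;
    return node;
}

std::unique_ptr<ASTNode> MakeNode(NodeType type, std::unique_ptr<ASTNode> left,
                                  std::unique_ptr<ASTNode> right = nullptr) {
    auto node = std::make_unique<ASTNode>();
    node->type = type;
    node->left = std::move(left);
    node->right = std::move(right);
    return node;
}

// Post-order traversal: evaluate the operands first, then apply the operator.
double Evaluate(const ASTNode* node) {
    if (node == nullptr)
        throw EvaluatorException("Incorrect syntax tree");

    switch (node->type) {
    case NodeType::Number:
        return node->value;
    case NodeType::UnaryMinus:
        return -Evaluate(node->left.get());
    case NodeType::Plus:
        return Evaluate(node->left.get()) + Evaluate(node->right.get());
    case NodeType::Minus:
        return Evaluate(node->left.get()) - Evaluate(node->right.get());
    case NodeType::Mul:
        return Evaluate(node->left.get()) * Evaluate(node->right.get());
    case NodeType::Div: {
        double divisor = Evaluate(node->right.get());
        if (divisor == 0)
            throw EvaluatorException("Division by zero");
        return Evaluate(node->left.get()) / divisor;
    }
    }
    throw EvaluatorException("Unknown node type");
}
```

For the tree of 1+2*3 (a Plus node whose right child is a Mul node), Evaluate returns 7. A UnaryMinus node simply negates the value of its only child.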

The exception class is defined as:

So let’s try it:

The output of this test program is:

And that’s it: define the grammar, build the parser, insert semantic actions to build the abstract syntax tree, and then traverse the tree to evaluate the expression. If you are interested in understanding grammars and parsing in more depth than I presented in these posts, I suggest you read more on the subject. The purpose was not to teach compiler theory, but to put it to practical use.

Here you can download a Visual Studio 2008 project with the code contained in this tutorial.


In my previous post we parsed an expression, verifying whether it is syntactically correct. But we still have to evaluate it. To be able to do that, we have to build an abstract syntax tree. This can be done by modifying the previous code to insert semantic actions; that is, we do something more whenever a production is matched.

An abstract syntax tree is a binary tree. The inner nodes represent operators and the leaves are numerical values.

Here is how a node in the AST will look:

[Figure: AST node]

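In essence, a node holds its type (a number or one of the operators), the numerical value for number leaves, and pointers to its left and right children.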

For the expression 1+2*3, the AST will be:

[Figure: AST example]
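As a sketch, such a node and the tree for 1+2*3 can be written in C++ as follows. The names NodeType, ASTNode, MakeNumber, MakeNode and BuildExample, as well as the use of std::unique_ptr for the children, are illustrative assumptions:

```cpp
#include <memory>
#include <utility>

// Hypothetical node kinds: a number leaf or one of the operators.
enum class NodeType { Number, Plus, Minus, Mul, Div, UnaryMinus };

struct ASTNode {
    NodeType type = NodeType::Number;
    double value = 0;                // meaningful only for Number leaves
    std::unique_ptr<ASTNode> left;   // operand subtrees; null for leaves
    std::unique_ptr<ASTNode> right;
};

std::unique_ptr<ASTNode> MakeNumber(double value) {
    auto node = std::make_unique<ASTNode>();
    node->value = value;
    return node;
}

std::unique_ptr<ASTNode> MakeNode(NodeType type, std::unique_ptr<ASTNode> left,
                                  std::unique_ptr<ASTNode> right) {
    auto node = std::make_unique<ASTNode>();
    node->type = type;
    node->left = std::move(left);
    node->right = std::move(right);
    return node;
}

// The tree for 1+2*3: the '*' node sits below the '+' node.
std::unique_ptr<ASTNode> BuildExample() {
    return MakeNode(NodeType::Plus, MakeNumber(1),
                    MakeNode(NodeType::Mul, MakeNumber(2), MakeNumber(3)));
}
```

Since the Mul node sits below the Plus node, a traversal that evaluates the children before their parent computes 2*3 first and only then adds 1.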

We’ll build this tree by inserting semantic actions and adding nodes according to the following rules:

[Figure: AST semantic rules]

You’ll probably notice that, based on these rules, the AST shown above will be modified a little: it gets some additional nodes for the operators + and *. Each of these has, on the left, a leaf node with the neutral element of the operation (0 for + and 1 for *) and, on the right, a node corresponding to a TERM or a FACTOR. This doesn’t affect the evaluation, because adding 0 or multiplying by 1 leaves a value unchanged.

The Parser class changes so that the functions corresponding to the non-terminal symbols EXP, EXP1, TERM, TERM1 and FACTOR return an ASTNode* instead of void. The returned pointer is the node created by the semantic action.
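A minimal C++ sketch of such a parser follows. The semantic actions are an assumption consistent with the rules described above:

- EXP and TERM create a + and a * node joining their two parts.
- EXP1 and TERM1 put the rest of the chain on the left and the new TERM or FACTOR on the right.
- The empty productions return the leaves 0 and 1.

The sketch also assumes that unary minus is parsed as - FACTOR. It reads characters directly instead of using a separate tokenizer.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical node kinds and layout, for illustration only.
enum class NodeType { Number, Plus, Minus, Mul, Div, UnaryMinus };

struct ASTNode {
    NodeType type;
    double value;                          // meaningful only for Number leaves
    std::unique_ptr<ASTNode> left, right;  // UnaryMinus uses only left
};
using NodePtr = std::unique_ptr<ASTNode>;

NodePtr MakeNode(NodeType type, NodePtr left, NodePtr right, double value = 0) {
    return NodePtr(new ASTNode{type, value, std::move(left), std::move(right)});
}
NodePtr MakeNumber(double value) { return MakeNode(NodeType::Number, nullptr, nullptr, value); }

class Parser {
public:
    // Returns the root of the AST; throws std::runtime_error on a syntax error.
    NodePtr Parse(const std::string& text) {
        text_ = text;
        pos_ = 0;
        NodePtr root = Expression();
        if (Peek() != '\0') Fail();
        return root;
    }

private:
    std::string text_;
    std::size_t pos_ = 0;

    // Skips whitespace and returns the current character, or '\0' at the end.
    char Peek() {
        while (pos_ < text_.size() && std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
        return pos_ < text_.size() ? text_[pos_] : '\0';
    }
    bool AtDigit() const {
        return pos_ < text_.size() && std::isdigit(static_cast<unsigned char>(text_[pos_]));
    }
    [[noreturn]] void Fail() {
        throw std::runtime_error("Syntax error at position " + std::to_string(pos_));
    }

    // EXP -> TERM EXP1; semantic action: a '+' node (TERM, EXP1)
    NodePtr Expression() {
        NodePtr term = Term();
        NodePtr rest = Expression1();
        return MakeNode(NodeType::Plus, std::move(term), std::move(rest));
    }
    // EXP1 -> + TERM EXP1 | - TERM EXP1 | epsilon; epsilon yields the leaf 0
    NodePtr Expression1() {
        char op = Peek();
        if (op != '+' && op != '-') return MakeNumber(0);
        ++pos_;
        NodePtr term = Term();
        NodePtr rest = Expression1();
        // Rest of the chain on the left, TERM on the right: the value is rest op TERM.
        return MakeNode(op == '+' ? NodeType::Plus : NodeType::Minus, std::move(rest), std::move(term));
    }
    // TERM -> FACTOR TERM1; semantic action: a '*' node (FACTOR, TERM1)
    NodePtr Term() {
        NodePtr factor = Factor();
        NodePtr rest = Term1();
        return MakeNode(NodeType::Mul, std::move(factor), std::move(rest));
    }
    // TERM1 -> * FACTOR TERM1 | / FACTOR TERM1 | epsilon; epsilon yields the leaf 1
    NodePtr Term1() {
        char op = Peek();
        if (op != '*' && op != '/') return MakeNumber(1);
        ++pos_;
        NodePtr factor = Factor();
        NodePtr rest = Term1();
        return MakeNode(op == '*' ? NodeType::Mul : NodeType::Div, std::move(rest), std::move(factor));
    }
    // FACTOR -> ( EXP ) | - FACTOR | number
    NodePtr Factor() {
        char c = Peek();
        if (c == '(') {
            ++pos_;
            NodePtr inner = Expression();
            if (Peek() != ')') Fail();
            ++pos_;
            return inner;
        }
        if (c == '-') {
            ++pos_;
            return MakeNode(NodeType::UnaryMinus, Factor(), nullptr);
        }
        if (!AtDigit()) Fail();
        std::size_t start = pos_;
        while (AtDigit()) ++pos_;                          // integer part
        if (pos_ < text_.size() && text_[pos_] == '.') {   // optional fraction
            ++pos_;
            while (AtDigit()) ++pos_;
        }
        return MakeNumber(std::stod(text_.substr(start, pos_ - start)));
    }
};
```

For 1+2*3, the returned tree, written in prefix form, is (+ (* 1 1) (+ 0 (* 2 (* 1 3)))). The 0 and 1 leaves are the neutral elements, and the tree still evaluates to 7. Ignoring the * 1 nodes, 1-2-3 is built as 1 + ((0 - 3) - 2), which still evaluates to -4.

The functions return std::unique_ptr<ASTNode> instead of a raw ASTNode*. That way, partially built subtrees are freed if a syntax error throws halfway through. Call release() on the result if a raw pointer is needed.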

Now the Parse() method will return the created abstract syntax tree. We will see how to evaluate the expression by traversing this tree in the next post.


In my previous post I provided some background theory for evaluating expressions with abstract syntax trees. As I mentioned, the first step towards this goal is to parse the expression and make sure it is syntactically correct. This is what I’ll show you in this post.

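With the grammar defined, we’ll make one function for each non-terminal symbol (EXP, EXP1, TERM, TERM1, FACTOR).

Simply put, each function matches the right-hand side of its production. It consumes the expected terminal symbols and calls the functions of the non-terminal symbols in the order in which they appear.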
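A minimal sketch of this idea in C++ follows, assuming the left-recursion-free grammar shown in the comments, with unary minus handled as FACTOR → - FACTOR. The class name Recognizer, the helper IsValid and the character-level scanning are illustrative choices, not the original code. This version only checks the syntax, and it reads characters directly rather than through tokens.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical recognizer: one member function per non-terminal symbol.
// It only checks the syntax; it does not compute anything yet.
class Recognizer {
public:
    explicit Recognizer(std::string text) : text_(std::move(text)) {}

    // Throws std::runtime_error if the text is not a valid expression.
    void Parse() {
        Expression();
        if (Peek() != '\0') Fail();
    }

private:
    std::string text_;
    std::size_t pos_ = 0;

    // Skips whitespace and returns the current character, or '\0' at the end.
    char Peek() {
        while (pos_ < text_.size() && std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
        return pos_ < text_.size() ? text_[pos_] : '\0';
    }
    bool AtDigit() const {
        return pos_ < text_.size() && std::isdigit(static_cast<unsigned char>(text_[pos_]));
    }
    [[noreturn]] void Fail() {
        throw std::runtime_error("Syntax error at position " + std::to_string(pos_));
    }

    void Expression() { Term(); Expression1(); }   // EXP -> TERM EXP1

    void Expression1() {                            // EXP1 -> + TERM EXP1 | - TERM EXP1 | epsilon
        char c = Peek();
        if (c == '+' || c == '-') { ++pos_; Term(); Expression1(); }
    }

    void Term() { Factor(); Term1(); }              // TERM -> FACTOR TERM1

    void Term1() {                                  // TERM1 -> * FACTOR TERM1 | / FACTOR TERM1 | epsilon
        char c = Peek();
        if (c == '*' || c == '/') { ++pos_; Factor(); Term1(); }
    }

    void Factor() {                                 // FACTOR -> ( EXP ) | - FACTOR | number
        char c = Peek();
        if (c == '(') {
            ++pos_;
            Expression();
            if (Peek() != ')') Fail();
            ++pos_;
        } else if (c == '-') {
            ++pos_;
            Factor();
        } else if (AtDigit()) {
            while (AtDigit()) ++pos_;
            if (pos_ < text_.size() && text_[pos_] == '.') {
                ++pos_;
                while (AtDigit()) ++pos_;
            }
        } else {
            Fail();
        }
    }
};

bool IsValid(const std::string& text) {
    try {
        Recognizer recognizer(text);
        recognizer.Parse();
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

IsValid("1+2*3") and IsValid(" (1 + 2) * -3.5 ") return true, while IsValid("1+") and IsValid("(1+2") return false.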

However, I want to make it a little more organized. The first thing to do is to define a Token structure that indicates the type of the last extracted token and, where applicable, its value (for numbers). A token is basically a symbol extracted (one at a time) from the input text. The possible tokens are the arithmetic operators (‘+’, ‘-’, ‘/’, ‘*’), the parentheses (‘(’ and ‘)’), numbers, and the end of the text.

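I defined the token type as an enumeration with one value for each kind of token listed above, and the token as a structure holding that type and, for numbers, the value.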

To be able to do the parsing, we’ll need some helper functions:

  • SkipWhitespaces() skips all whitespace between two tokens;
  • GetNextToken() extracts the next token from the text; if an illegal token appears, it throws an exception;
  • GetNumber() extracts a number from the input text at the current position. Since the purpose of this tutorial is didactic, this function is quite simple: it reads integers and decimal numbers with ‘.’ as the decimal point, and it doesn’t read numbers in exponent notation such as 123.3E+2.
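Here is a sketch of the token definition and of these helpers in C++. The names of the token kinds and of the TokenType, Token and Tokenizer types are hypothetical; the helper names are the ones used above.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical token kinds for the symbols listed above.
enum class TokenType { Plus, Minus, Mul, Div, OpenParen, CloseParen, Number, EndOfText };

struct Token {
    TokenType type = TokenType::EndOfText;
    double value = 0;   // meaningful only when type == TokenType::Number
};

class Tokenizer {
public:
    explicit Tokenizer(std::string text) : text_(std::move(text)) {}

    // Extracts the next token; throws std::runtime_error on an illegal character.
    Token GetNextToken() {
        SkipWhitespaces();
        Token token;                          // EndOfText by default
        if (pos_ == text_.size()) return token;

        char c = text_[pos_];
        if (std::isdigit(static_cast<unsigned char>(c))) {
            token.type = TokenType::Number;
            token.value = GetNumber();
            return token;
        }
        switch (c) {
        case '+': token.type = TokenType::Plus; break;
        case '-': token.type = TokenType::Minus; break;
        case '*': token.type = TokenType::Mul; break;
        case '/': token.type = TokenType::Div; break;
        case '(': token.type = TokenType::OpenParen; break;
        case ')': token.type = TokenType::CloseParen; break;
        default:
            throw std::runtime_error("Unexpected character '" + std::string(1, c) +
                                     "' at position " + std::to_string(pos_));
        }
        ++pos_;
        return token;
    }

private:
    std::string text_;
    std::size_t pos_ = 0;

    bool AtDigit() const {
        return pos_ < text_.size() && std::isdigit(static_cast<unsigned char>(text_[pos_]));
    }

    void SkipWhitespaces() {
        while (pos_ < text_.size() && std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
    }

    // Reads an integer or a decimal number with '.' as the decimal point;
    // exponent notation such as 123.3E+2 is not recognized.
    double GetNumber() {
        std::size_t start = pos_;
        while (AtDigit()) ++pos_;
        if (pos_ < text_.size() && text_[pos_] == '.') {
            ++pos_;
            while (AtDigit()) ++pos_;
        }
        return std::stod(text_.substr(start, pos_ - start));
    }
};
```

Calling GetNextToken() repeatedly on " 12 + 3.5*(4) " yields Number(12), Plus, Number(3.5), Mul, OpenParen, Number(4), CloseParen and finally EndOfText.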

With these defined, we can build the parser for the specified grammar.

The exception class is defined like this:
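As a hedged illustration, assume the class derives from std::runtime_error and records the position of the offending character. The name ParserException and its Position() member are hypothetical. Such an exception could look like this:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical exception for syntax errors; it remembers where the
// offending character was found so the caller can point at it.
class ParserException : public std::runtime_error {
public:
    ParserException(const std::string& message, std::size_t position)
        : std::runtime_error(message + " at position " + std::to_string(position)),
          position_(position) {}

    std::size_t Position() const noexcept { return position_; }

private:
    std::size_t position_;
};
```

Because it derives from std::runtime_error, callers can catch it either specifically or as std::exception.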

As you can see, the code for the grammar productions is quite simple and straightforward. Now, let’s put it to the test.

The output for this testing program is:

This is exactly what we expected: it validates correct expressions and throws an exception when the expression is incorrect.

In the next post I’ll show how to modify this code to build an abstract syntax tree.


A few days ago I was discussing how to evaluate expressions, and I decided to explain how you can build an evaluator. I will do this in a series of posts, going one step further in each post. I will use C++, but the approaches are the same regardless of the language.

Let’s consider this expression: 1+2*3. The value of this expression is 7. But how do you evaluate it in a language like C++ if you get it as a string? First of all, this is the so-called “infix” notation. There are also the prefix and postfix notations. The terms infix, prefix and postfix refer to the position of the operator relative to the operands:

  • Prefix: operator operand1 operand2 (ex: + 1 2)
  • Infix: operand1 operator operand2 (ex: 1 + 2)
  • Postfix: operand1 operand2 operator (ex: 1 2 +)

The notation humans find easiest to read is infix. But it turns out that evaluating an infix expression in a single left-to-right pass, applying each operator as soon as you read it, does not work. You cannot know in advance what comes next, operators have different precedence, and there are parentheses too.
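To make this concrete, here is a deliberately naive C++ sketch (a hypothetical function that understands only single-digit operands) that applies each operator as soon as it reads it:

```cpp
#include <cstddef>
#include <string>

// Deliberately naive and hypothetical: applies each operator as soon as it
// is read, ignoring precedence. It only understands single-digit operands,
// the operators + - * /, and no spaces or parentheses; text must be non-empty.
double NaiveLeftToRight(const std::string& text) {
    double result = text[0] - '0';
    for (std::size_t i = 1; i + 1 < text.size(); i += 2) {
        double operand = text[i + 1] - '0';
        switch (text[i]) {
        case '+': result += operand; break;
        case '-': result -= operand; break;
        case '*': result *= operand; break;
        case '/': result /= operand; break;
        }
    }
    return result;
}
```

For 1+2*3 it computes (1+2)*3 = 9 instead of 7, because the multiplication should have been applied before the addition. For 2*3+1 it gives the right answer only because the operators already appear in precedence order.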

To solve the problem, you have to build a helper structure that represents the infix expression. There are two possibilities:

  • Reverse Polish Notation (RPN) implies transforming the infix expression into a postfix expression and then evaluating it from left to right. 1 + 2*3 is transformed into 1 2 3 * +. You go from left to right, pushing numbers onto a stack. When you find an operator, you pop its two operands, evaluate the operation and push the result back onto the stack.
  • Abstract Syntax Tree (AST) is an abstract representation of an expression, with inner nodes representing operators and leaves representing numbers.

    [Figure: Abstract Syntax Tree]
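For comparison with the AST approach, here is a minimal sketch of the RPN route. The conversion step uses Dijkstra’s shunting-yard algorithm, a standard technique chosen here for illustration. The function names are hypothetical. The sketch assumes non-negative integer literals, the binary operators + - * / and parentheses, with no unary minus.

```cpp
#include <cctype>
#include <cstddef>
#include <stack>
#include <stdexcept>
#include <string>
#include <vector>

// Sketch only: non-negative integer literals, binary + - * /, parentheses,
// no unary minus.
int Precedence(char op) { return (op == '*' || op == '/') ? 2 : 1; }

// Infix to postfix with the shunting-yard algorithm.
std::vector<std::string> ToPostfix(const std::string& infix) {
    std::vector<std::string> output;
    std::stack<char> operators;
    for (std::size_t i = 0; i < infix.size(); ++i) {
        char c = infix[i];
        if (std::isspace(static_cast<unsigned char>(c))) continue;
        if (std::isdigit(static_cast<unsigned char>(c))) {
            std::size_t end = i;
            while (end < infix.size() && std::isdigit(static_cast<unsigned char>(infix[end]))) ++end;
            output.push_back(infix.substr(i, end - i));
            i = end - 1;  // the loop increment moves to the first non-digit
        } else if (c == '(') {
            operators.push(c);
        } else if (c == ')') {
            while (!operators.empty() && operators.top() != '(') {
                output.push_back(std::string(1, operators.top()));
                operators.pop();
            }
            if (operators.empty()) throw std::runtime_error("Mismatched parentheses");
            operators.pop();  // discard the '('
        } else if (c == '+' || c == '-' || c == '*' || c == '/') {
            // All four operators are left-associative: first pop operators of
            // greater or equal precedence.
            while (!operators.empty() && operators.top() != '(' &&
                   Precedence(operators.top()) >= Precedence(c)) {
                output.push_back(std::string(1, operators.top()));
                operators.pop();
            }
            operators.push(c);
        } else {
            throw std::runtime_error("Unexpected character");
        }
    }
    while (!operators.empty()) {
        if (operators.top() == '(') throw std::runtime_error("Mismatched parentheses");
        output.push_back(std::string(1, operators.top()));
        operators.pop();
    }
    return output;
}

// Postfix evaluation: numbers are pushed; an operator pops its two operands
// and pushes the result back.
double EvaluatePostfix(const std::vector<std::string>& postfix) {
    std::stack<double> values;
    for (const std::string& token : postfix) {
        if (std::isdigit(static_cast<unsigned char>(token[0]))) {
            values.push(std::stod(token));
            continue;
        }
        if (values.size() < 2) throw std::runtime_error("Malformed expression");
        double right = values.top(); values.pop();
        double left = values.top(); values.pop();
        switch (token[0]) {
        case '+': values.push(left + right); break;
        case '-': values.push(left - right); break;
        case '*': values.push(left * right); break;
        case '/': values.push(left / right); break;
        }
    }
    if (values.size() != 1) throw std::runtime_error("Malformed expression");
    return values.top();
}
```

ToPostfix("1 + 2*3") yields the tokens 1 2 3 * +, which EvaluatePostfix reduces to 7.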

In my opinion, RPN is harder to build and evaluate, so I will focus on the approach with the AST.

We build an AST while parsing the expression. First, we’ll have to define the grammar for the expression. Otherwise we wouldn’t know what to parse.
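As an illustration, a first grammar of this kind could be written as follows. The exact productions, including the unary minus alternative, are an assumption:

```latex
\text{EXP} \rightarrow \text{EXP} + \text{EXP} \mid \text{EXP} - \text{EXP} \mid \text{EXP} * \text{EXP} \mid \text{EXP} / \text{EXP} \mid -\,\text{EXP} \mid (\,\text{EXP}\,) \mid \text{number}
```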

First, this grammar is recursive, as you can see. Another important problem is that it does not represent the precedence of the operators, so 1+2*3 could be read either as (1+2)*3 or as 1+(2*3). For these reasons, a better grammar is this:
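Written out, assuming that unary minus binds tightest and is therefore handled in FACTOR, such a grammar reads:

```latex
\begin{aligned}
\text{EXP}    &\rightarrow \text{EXP} + \text{TERM} \mid \text{EXP} - \text{TERM} \mid \text{TERM} \\
\text{TERM}   &\rightarrow \text{TERM} * \text{FACTOR} \mid \text{TERM} / \text{FACTOR} \mid \text{FACTOR} \\
\text{FACTOR} &\rightarrow (\,\text{EXP}\,) \mid -\,\text{FACTOR} \mid \text{number}
\end{aligned}
```

Because * and / only appear in TERM, one level below EXP, a multiplication always ends up deeper in the tree than an addition next to it, so it is evaluated first.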

These rules written above are called productions. The symbols used are:

  • EXP, TERM, FACTOR are called non-terminal symbols
  • +, -, /, *, (, ) and number are called terminal symbols
  • EXP is the start symbol

While the grammar has the correct operator precedence, it’s still recursive, or more precisely, left-recursive. You can see that EXP goes into EXP, then the operator +, then TERM. A parser that tries to match EXP must first match EXP again without consuming any input, so it never gets to match the operator +; it keeps starting over with a new expression. There are techniques for eliminating this recursion, and the result is:
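The standard transformation replaces a left-recursive rule A → A α | β with A → β A′ and A′ → α A′ | ε. Applied to the precedence grammar (EXP, TERM, FACTOR), with the same assumed FACTOR rule, it gives:

```latex
\begin{aligned}
\text{EXP}    &\rightarrow \text{TERM}\;\text{EXP1} \\
\text{EXP1}   &\rightarrow +\;\text{TERM}\;\text{EXP1} \mid -\;\text{TERM}\;\text{EXP1} \mid \varepsilon \\
\text{TERM}   &\rightarrow \text{FACTOR}\;\text{TERM1} \\
\text{TERM1}  &\rightarrow *\;\text{FACTOR}\;\text{TERM1} \mid /\;\text{FACTOR}\;\text{TERM1} \mid \varepsilon \\
\text{FACTOR} &\rightarrow (\,\text{EXP}\,) \mid -\,\text{FACTOR} \mid \text{number}
\end{aligned}
```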

‘epsilon’ here means ‘nothing’.

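With the theory in place (well, this is just the tip of the iceberg, but it should be a good start for you), we have to do three things:

  • parse the expression, checking that it is syntactically correct;
  • build the abstract syntax tree;
  • evaluate the expression by traversing the tree.

The first two steps will be done at the same time, but I’ll take them one at a time and explain them in detail.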

Before you continue with the implementation details, I suggest you read more about RPN, ASTs and grammars.
