<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Marius Bancila's Blog &#187; expression</title>
	<atom:link href="http://mariusbancila.ro/blog/tag/expression/feed/" rel="self" type="application/rss+xml" />
	<link>http://mariusbancila.ro/blog</link>
	<description>Sharing my opinions and ideas!</description>
	<lastBuildDate>Fri, 06 Apr 2012 13:45:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Evaluating Expressions &#8211; Part 3: Building the AST</title>
		<link>http://mariusbancila.ro/blog/2009/02/05/evaluating-expressions-part-3-building-the-ast/</link>
		<comments>http://mariusbancila.ro/blog/2009/02/05/evaluating-expressions-part-3-building-the-ast/#comments</comments>
		<pubDate>Thu, 05 Feb 2009 06:00:14 +0000</pubDate>
		<dc:creator>Marius Bancila</dc:creator>
				<category><![CDATA[Articles & Tutorials]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[AST]]></category>
		<category><![CDATA[expression]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[tree]]></category>

		<guid isPermaLink="false">http://mariusbancila.ro/blog/?p=150</guid>
		<description><![CDATA[In my previous post we&#8217;ve parsed an expression, verifying whether it&#8217;s syntactically correct or not. But we still have to evaluate it. To be able to do that we&#8217;ll have to build an abstract syntax tree. This can be done by modifying the previous code and inserting semantic actions. That means we do something more [...]]]></description>
			<content:encoded><![CDATA[<p>In my <a href="http://mariusbancila.ro/blog/?p=149">previous post</a> we&#8217;ve parsed an expression, verifying whether it&#8217;s syntactically correct or not. But we still have to evaluate it. To be able to do that we&#8217;ll have to build an abstract syntax tree. This can be done by modifying the previous code and inserting semantic actions, meaning that we do something more when we match productions.</p>
<p>For this grammar, the abstract syntax tree is a binary tree. The inner nodes represent operators and the leaves are numerical values.</p>
<p>Here is how a node in the AST will look:</p>
<p><img style="vertical-align: middle;" src="/blog/wp-content/uploads/2009/02/ast_node.png" alt="AST node" width="245" height="51" /></p>
<p>It is defined like this:</p>
<pre class="prettyprint">
enum ASTNodeType
{
   Undefined,
   OperatorPlus,
   OperatorMinus,
   OperatorMul,
   OperatorDiv,
   UnaryMinus,
   NumberValue
};

class ASTNode
{
public:
   ASTNodeType Type;
   double      Value;
   ASTNode*    Left;
   ASTNode*    Right;

   ASTNode()
   {
      Type = Undefined;
      Value = 0;
      Left = NULL;
      Right = NULL;
   }

   ~ASTNode()
   {
      delete Left;
      delete Right;
   }
};
</pre>
<p>For the expression 1+2*3, the AST will be:</p>
<p><img style="vertical-align: middle;" src="/blog/wp-content/uploads/2009/02/ast_example.png" alt="AST example" width="665" height="246" /></p>
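<p>To make the picture concrete, here is a minimal sketch that assembles the same tree by hand. The ASTNode class is repeated (in compact form) only so that the snippet compiles on its own; the helper names MakeNumber, MakeOperator and BuildExample are made up for this illustration and are not part of the parser.</p>
<pre class="prettyprint">
#include &lt;cstddef&gt;

enum ASTNodeType
{
   Undefined, OperatorPlus, OperatorMinus, OperatorMul, OperatorDiv, UnaryMinus, NumberValue
};

class ASTNode
{
public:
   ASTNodeType Type;
   double      Value;
   ASTNode*    Left;
   ASTNode*    Right;

   ASTNode() : Type(Undefined), Value(0), Left(NULL), Right(NULL) {}
   ~ASTNode() { delete Left; delete Right; }
};

// Hypothetical helper: creates a leaf holding a number.
ASTNode* MakeNumber(double value)
{
   ASTNode* node = new ASTNode;
   node->Type = NumberValue;
   node->Value = value;
   return node;
}

// Hypothetical helper: creates an inner node for a binary operator.
ASTNode* MakeOperator(ASTNodeType type, ASTNode* left, ASTNode* right)
{
   ASTNode* node = new ASTNode;
   node->Type = type;
   node->Left = left;
   node->Right = right;
   return node;
}

// Builds the tree for 1+2*3: '*' binds tighter, so it sits below '+'.
ASTNode* BuildExample()
{
   return MakeOperator(OperatorPlus,
                       MakeNumber(1),
                       MakeOperator(OperatorMul, MakeNumber(2), MakeNumber(3)));
}
</pre>
<p>Deleting the root returned by BuildExample() releases the whole tree, because the destructor of each node deletes its children.</p>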
<p>We&#8217;ll build this tree by inserting semantic actions and adding nodes according to the following rules:</p>
<p><img style="vertical-align: middle;" src="/blog/wp-content/uploads/2009/02/ast_semanticrules.png" alt="AST semantic rules" width="659" height="229" /></p>
<p>You&#8217;ll probably notice that based on these rules the AST shown above will be modified a little bit, with some additional nodes for operators + and *, having on the left a leaf node with the neutral element for the operation (zero for + and 1 for *), and on the right a node corresponding to a TERM or FACTOR. This won&#8217;t affect the evaluation.</p>
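<p>As an illustration, tracing the parser code shown below by hand for 1+2*3 (a hand trace, so worth double-checking against the rules) gives roughly this tree, written with children indented under their parent, left child first:</p>
<pre class="prettyprint">
+
   *
      1
      1          (neutral element, from TERM1 -> epsilon)
   +
      0          (neutral element, from EXP1 -> epsilon)
      *
         2
         *
            1    (neutral element, from TERM1 -> epsilon)
            3
</pre>
<p>Evaluated bottom-up this is (1*1) + (0 + (2 * (1*3))) = 1 + 6 = 7, the same value as for the compact tree.</p>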
<p>The Parser class will change so that the functions corresponding to the non-terminal symbols EXP, EXP1, TERM, TERM1 and FACTOR will return an ASTNode* instead of void. That is the node created as a semantic action.</p>
<pre class="prettyprint">
class Parser
{
   Token m_crtToken;
   const char* m_Text;
   size_t m_Index;

private:

   ASTNode* Expression()
   {
      ASTNode* tnode = Term();
      ASTNode* e1node = Expression1();

      return CreateNode(OperatorPlus, tnode, e1node);
   }

   ASTNode* Expression1()
   {
      ASTNode* tnode;
      ASTNode* e1node;

      switch(m_crtToken.Type)
      {
      case Plus:
         GetNextToken();
         tnode = Term();
         e1node = Expression1();

         return CreateNode(OperatorPlus, e1node, tnode);

      case Minus:
         GetNextToken();
         tnode = Term();
         e1node = Expression1();

         return CreateNode(OperatorMinus, e1node, tnode);
      }

      return CreateNodeNumber(0);
   }

   ASTNode* Term()
   {
      ASTNode* fnode = Factor();
      ASTNode* t1node = Term1();

      return CreateNode(OperatorMul, fnode, t1node);
   }

   ASTNode* Term1()
   {
      ASTNode* fnode;
      ASTNode* t1node;

      switch(m_crtToken.Type)
      {
      case Mul:
         GetNextToken();
         fnode = Factor();
         t1node = Term1();
         return CreateNode(OperatorMul, t1node, fnode);

      case Div:
         GetNextToken();
         fnode = Factor();
         t1node = Term1();
         return CreateNode(OperatorDiv, t1node, fnode);
      }

      return CreateNodeNumber(1);
   }

   ASTNode* Factor()
   {
      ASTNode* node;
      switch(m_crtToken.Type)
      {
      case OpenParenthesis:
         GetNextToken();
         node = Expression();
         Match(')');
         return node;

      case Minus:
         GetNextToken();
         node = Factor();
         return CreateUnaryNode(node);

      case Number:
         {
            double value = m_crtToken.Value;
            GetNextToken();
            return CreateNodeNumber(value);
         }

      default:
         {
            std::stringstream sstr;
            sstr << "Unexpected token '" << m_crtToken.Symbol << "' at position " << m_Index;
            throw ParserException(sstr.str(), m_Index);
         }
      }
   }

   ASTNode* CreateNode(ASTNodeType type, ASTNode* left, ASTNode* right)
   {
      ASTNode* node = new ASTNode;
      node->Type = type;
      node->Left = left;
      node->Right = right;

      return node;
   }

   ASTNode* CreateUnaryNode(ASTNode* left)
   {
      ASTNode* node = new ASTNode;
      node->Type = UnaryMinus;
      node->Left = left;
      node->Right = NULL;

      return node;
   }

   ASTNode* CreateNodeNumber(double value)
   {
      ASTNode* node = new ASTNode;
      node->Type = NumberValue;
      node->Value = value;

      return node;
   }

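   void Match(char expected)
   {
      // compare against the current token; checking m_Text[m_Index-1] would
      // wrongly accept input such as "((1)" once the end of text is reached
      if(m_crtToken.Symbol == expected)
         GetNextToken();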
      else
      {
         std::stringstream sstr;
         sstr << "Expected token '" << expected << "' at position " << m_Index;
         throw ParserException(sstr.str(), m_Index);
      }
   }

   void SkipWhitespaces()
   {
      while(isspace(m_Text[m_Index])) m_Index++;
   }

   void GetNextToken()
   {
      SkipWhitespaces();

      m_crtToken.Value = 0;
      m_crtToken.Symbol = 0;

      if(m_Text[m_Index] == 0)
      {
         m_crtToken.Type = EndOfText;
         return;
      }

      if(isdigit(m_Text[m_Index]))
      {
         m_crtToken.Type = Number;
         m_crtToken.Value = GetNumber();
         return;
      }

      m_crtToken.Type = Error;

      switch(m_Text[m_Index])
      {
      case '+': m_crtToken.Type = Plus; break;
      case '-': m_crtToken.Type = Minus; break;
      case '*': m_crtToken.Type = Mul; break;
      case '/': m_crtToken.Type = Div; break;
      case '(': m_crtToken.Type = OpenParenthesis; break;
      case ')': m_crtToken.Type = ClosedParenthesis; break;
      }

      if(m_crtToken.Type != Error)
      {
         m_crtToken.Symbol = m_Text[m_Index];
         m_Index++;
      }
      else
      {
         std::stringstream sstr;
         sstr << "Unexpected token '" << m_Text[m_Index] << "' at position " << m_Index;
         throw ParserException(sstr.str(), m_Index);
      }
   }

   double GetNumber()
   {
      SkipWhitespaces();

      int index = m_Index;
      while(isdigit(m_Text[m_Index])) m_Index++;
      if(m_Text[m_Index] == '.') m_Index++;
      while(isdigit(m_Text[m_Index])) m_Index++;

      if(m_Index - index == 0)
         throw ParserException("Number expected but not found!", m_Index);

      char buffer[32] = {0};
      memcpy(buffer, &amp;m_Text[index], m_Index - index);

      return atof(buffer);
   }

public:
   ASTNode* Parse(const char* text)
   {
      m_Text = text;
      m_Index = 0;
      GetNextToken();

      return Expression();
   }
};
</pre>
<p>Now the Parse() method will return the created abstract syntax tree. We will see how to evaluate the expression by traversing this tree in the next post.</p>
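<p>Until then, a quick way to check the tree that Parse() returns is to print it. The following is only a rough sketch of such a debugging helper: ToText is a hypothetical name, and the ASTNode definition is repeated in compact form just to keep the snippet self-contained.</p>
<pre class="prettyprint">
#include &lt;cstddef&gt;
#include &lt;sstream&gt;
#include &lt;string&gt;

enum ASTNodeType
{
   Undefined, OperatorPlus, OperatorMinus, OperatorMul, OperatorDiv, UnaryMinus, NumberValue
};

class ASTNode
{
public:
   ASTNodeType Type;
   double      Value;
   ASTNode*    Left;
   ASTNode*    Right;

   ASTNode() : Type(Undefined), Value(0), Left(NULL), Right(NULL) {}
   ~ASTNode() { delete Left; delete Right; }
};

// Hypothetical debugging helper: renders a tree as a fully parenthesized string.
std::string ToText(const ASTNode* node)
{
   if(node == NULL) return "";

   std::ostringstream out;
   switch(node->Type)
   {
   case NumberValue:   out << node->Value; break;
   case UnaryMinus:    out << "(-" << ToText(node->Left) << ")"; break;
   case OperatorPlus:  out << "(" << ToText(node->Left) << " + " << ToText(node->Right) << ")"; break;
   case OperatorMinus: out << "(" << ToText(node->Left) << " - " << ToText(node->Right) << ")"; break;
   case OperatorMul:   out << "(" << ToText(node->Left) << " * " << ToText(node->Right) << ")"; break;
   case OperatorDiv:   out << "(" << ToText(node->Left) << " / " << ToText(node->Right) << ")"; break;
   default:            out << "?"; break;   // Undefined or unknown node type
   }
   return out.str();
}
</pre>
<p>Because every operator node is wrapped in parentheses, the printed text makes the extra neutral-element nodes visible. Remember that the caller owns the tree returned by Parse() and has to delete its root when done.</p>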
]]></content:encoded>
			<wfw:commentRss>http://mariusbancila.ro/blog/2009/02/05/evaluating-expressions-part-3-building-the-ast/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Evaluating Expressions &#8211; Part 2: Parse the Expression</title>
		<link>http://mariusbancila.ro/blog/2009/02/04/evaluating-expressions-part-2-parse-the-expression/</link>
		<comments>http://mariusbancila.ro/blog/2009/02/04/evaluating-expressions-part-2-parse-the-expression/#comments</comments>
		<pubDate>Wed, 04 Feb 2009 13:32:25 +0000</pubDate>
		<dc:creator>Marius Bancila</dc:creator>
				<category><![CDATA[Articles & Tutorials]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[analyzer]]></category>
		<category><![CDATA[AST]]></category>
		<category><![CDATA[expression]]></category>
		<category><![CDATA[factor]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[term]]></category>
		<category><![CDATA[tree]]></category>

		<guid isPermaLink="false">http://mariusbancila.ro/blog/?p=149</guid>
		<description><![CDATA[In my previous post I have provided some background theory for evaluating expressions with abstract syntax trees. As I mentioned, the first step towards this goal is to parse the expression and make sure it is syntactically correct. This is what I&#8217;ll show you in this post. Having the grammar defined, we&#8217;ll make one function [...]]]></description>
			<content:encoded><![CDATA[<p>In my <a href="http://mariusbancila.ro/blog/?p=148">previous post</a> I have provided some background theory for evaluating expressions with abstract syntax trees. As I mentioned, the first step towards this goal is to parse the expression and make sure it is syntactically correct. This is what I&#8217;ll show you in this post.</p>
<p>Having the grammar defined, we&#8217;ll make one function for each non-terminal symbol (EXP, EXP1, TERM, TERM1, FACTOR).</p>
<p>Simply put, the code will look like this:</p>
<pre class="prettyprint">
   void Expression()
   {
      Term();
      Expression1();
   }

   void Expression1()
   {
      switch(current_token)
      {
      case '+':
         GetNextToken();
         Term();
         Expression1();
         break;

      case '-':
         GetNextToken();
         Term();
         Expression1();
         break;
      }
   }
</pre>
<p>However, I want to make it a little bit more organized, so the first thing to do is to define a <tt>Token</tt> structure that indicates the type of the last extracted token and, where applicable, its value (for numbers). A token is basically a symbol extracted (one at a time) from the input text. The possible tokens are the arithmetic operators (&#8216;+&#8217;, &#8216;-&#8217;, &#8216;/&#8217;, &#8216;*&#8217;), the parentheses (&#8216;(&#8217; and &#8216;)&#8217;), numbers and the end of the text.</p>
<p>Here is how I defined the token type and the token:</p>
<pre class="prettyprint">
enum TokenType
{
   Error,
   Plus,
   Minus,
   Mul,
   Div,
   EndOfText,
   OpenParenthesis,
   ClosedParenthesis,
   Number
};

struct Token
{
   TokenType	Type;
   double		Value;
   char		Symbol;

   Token():Type(Error), Value(0), Symbol(0)
   {}
};
</pre>
<p>To be able to do the parsing, we&#8217;ll need some helper functions:</p>
<ul>
<li><b>SkipWhitespaces()</b>, skips all whitespaces between two tokens:
<pre class="prettyprint">
   void SkipWhitespaces()
   {
      while(isspace(m_Text[m_Index])) m_Index++;
   }
</pre>
</li>
<li><b>GetNextToken()</b>, extracts the next token from the text; if an illegal token appears it throws an exception:
<pre class="prettyprint">
   void GetNextToken()
   {
      // ignore white spaces
      SkipWhitespaces();

      m_crtToken.Value = 0;
      m_crtToken.Symbol = 0;

      // test for the end of text
      if(m_Text[m_Index] == 0)
      {
         m_crtToken.Type = EndOfText;
         return;
      }

      // if the current character is a digit read a number
      if(isdigit(m_Text[m_Index]))
      {
         m_crtToken.Type = Number;
         m_crtToken.Value = GetNumber();
         return;
      }

      m_crtToken.Type = Error;

      // check if the current character is an operator or parentheses
      switch(m_Text[m_Index])
      {
      case '+': m_crtToken.Type = Plus; break;
      case '-': m_crtToken.Type = Minus; break;
      case '*': m_crtToken.Type = Mul; break;
      case '/': m_crtToken.Type = Div; break;
      case '(': m_crtToken.Type = OpenParenthesis; break;
      case ')': m_crtToken.Type = ClosedParenthesis; break;
      }

      if(m_crtToken.Type != Error)
      {
         m_crtToken.Symbol = m_Text[m_Index];
         m_Index++;
      }
      else
      {
         std::stringstream sstr;
         sstr << "Unexpected token '" << m_Text[m_Index] << "' at position " << m_Index;
         throw ParserException(sstr.str(), m_Index);
      }
   }
</pre>
</li>
<li><b>GetNumber()</b> extracts a number from the input text, starting at the current position. The purpose of this tutorial is didactic, so this function is quite simple: it reads integers and doubles with '.' as the decimal point; it doesn't read numbers in a format like 123.3E+2.
<pre class="prettyprint">
   double GetNumber()
   {
      SkipWhitespaces();

      int index = m_Index;
      while(isdigit(m_Text[m_Index])) m_Index++;
      if(m_Text[m_Index] == '.') m_Index++;
      while(isdigit(m_Text[m_Index])) m_Index++;

      if(m_Index - index == 0)
         throw ParserException("Number expected but not found!", m_Index);

      char buffer[32] = {0};
      memcpy(buffer, &amp;m_Text[index], m_Index - index);

      return atof(buffer);
   }
</pre>
</li>
</ul>
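<p>If you do want to accept numbers such as 123.3E+2, one possible direction is to let the standard library find the end of the number. The sketch below is an assumption of mine, not the code used in this series: ReadNumber is a hypothetical free function built on <tt>std::strtod</tt>, and it expects the caller to have already checked that a digit starts at the given position, as GetNextToken() does.</p>
<pre class="prettyprint">
#include &lt;cstddef&gt;
#include &lt;cstdlib&gt;

// Hypothetical alternative to GetNumber(): lets std::strtod find the end of the
// number, so forms such as 123.3E+2 are accepted as well.
// Returns false (leaving index and value untouched) when no number starts at
// text[index]. Note that std::strtod on its own also accepts a leading sign,
// leading whitespace and forms like "inf" or hexadecimal, and that the decimal
// point it expects depends on the current C locale.
bool ReadNumber(const char* text, std::size_t&amp; index, double&amp; value)
{
   char* end = NULL;
   double parsed = std::strtod(text + index, &amp;end);

   if(end == text + index)
      return false;   // nothing was consumed, so there is no number here

   value = parsed;
   index = static_cast&lt;std::size_t&gt;(end - text);   // first character after the number
   return true;
}
</pre>
<p>With such a function, an input like &#8220;1*2.5e2&#8221; would be accepted instead of being reported as an error, so the expected results of the tests further down would change accordingly.</p>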
<p>With these defined, we can build the parser for the specified grammar.</p>
<pre class="prettyprint">
class Parser
{
   Token m_crtToken;
   const char* m_Text;
   size_t m_Index;

private:

   void Expression()
   {
      Term();
      Expression1();
   }

   void Expression1()
   {
      switch(m_crtToken.Type)
      {
      case Plus:
         GetNextToken();
         Term();
         Expression1();
         break;

      case Minus:
         GetNextToken();
         Term();
         Expression1();
         break;
      }
   }

   void Term()
   {
      Factor();
      Term1();
   }

   void Term1()
   {
      switch(m_crtToken.Type)
      {
      case Mul:
         GetNextToken();
         Factor();
         Term1();
         break;

      case Div:
         GetNextToken();
         Factor();
         Term1();
         break;
      }
   }

   void Factor()
   {
      switch(m_crtToken.Type)
      {
      case OpenParenthesis:
         GetNextToken();
         Expression();
         Match(')');
         break;

      case Minus:
         GetNextToken();
         Factor();
         break;

      case Number:
         GetNextToken();
         break;

      default:
         {
            std::stringstream sstr;
            sstr << "Unexpected token '" << m_crtToken.Symbol << "' at position " << m_Index;
            throw ParserException(sstr.str(), m_Index);
         }
      }
   }

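   void Match(char expected)
   {
      // compare against the current token; checking m_Text[m_Index-1] would
      // wrongly accept input such as "((1)" once the end of text is reached
      if(m_crtToken.Symbol == expected)
         GetNextToken();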
      else
      {
         std::stringstream sstr;
         sstr << "Expected token '" << expected << "' at position " << m_Index;
         throw ParserException(sstr.str(), m_Index);
      }
   }

   void SkipWhitespaces()
   {
      while(isspace(m_Text[m_Index])) m_Index++;
   }

   void GetNextToken()
   {
      // ignore white spaces
      SkipWhitespaces();

      m_crtToken.Value = 0;
      m_crtToken.Symbol = 0;

      // test for the end of text
      if(m_Text[m_Index] == 0)
      {
         m_crtToken.Type = EndOfText;
         return;
      }

      // if the current character is a digit read a number
      if(isdigit(m_Text[m_Index]))
      {
         m_crtToken.Type = Number;
         m_crtToken.Value = GetNumber();
         return;
      }

      m_crtToken.Type = Error;

      // check if the current character is an operator or parentheses
      switch(m_Text[m_Index])
      {
      case '+': m_crtToken.Type = Plus; break;
      case '-': m_crtToken.Type = Minus; break;
      case '*': m_crtToken.Type = Mul; break;
      case '/': m_crtToken.Type = Div; break;
      case '(': m_crtToken.Type = OpenParenthesis; break;
      case ')': m_crtToken.Type = ClosedParenthesis; break;
      }

      if(m_crtToken.Type != Error)
      {
         m_crtToken.Symbol = m_Text[m_Index];
         m_Index++;
      }
      else
      {
         std::stringstream sstr;
         sstr << "Unexpected token '" << m_Text[m_Index] << "' at position " << m_Index;
         throw ParserException(sstr.str(), m_Index);
      }
   }

   double GetNumber()
   {
      SkipWhitespaces();

      int index = m_Index;
      while(isdigit(m_Text[m_Index])) m_Index++;
      if(m_Text[m_Index] == '.') m_Index++;
      while(isdigit(m_Text[m_Index])) m_Index++;

      if(m_Index - index == 0)
         throw ParserException("Number expected but not found!", m_Index);

      char buffer[32] = {0};
      memcpy(buffer, &amp;m_Text[index], m_Index - index);

      return atof(buffer);
   }

public:
   void Parse(const char* text)
   {
      m_Text = text;
      m_Index = 0;
      GetNextToken();

      Expression();
   }
};
</pre>
<p>The exception class is defined like this:</p>
<pre class="prettyprint">
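// std::runtime_error (from the stdexcept header) stores the message portably;
// a std::exception constructor taking a string is a Microsoft-specific extension
class ParserException : public std::runtime_error
{
   int m_Pos;

public:
   ParserException(const std::string&amp; message, int pos):
      std::runtime_error(message),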
      m_Pos(pos)
   {
   }
};
</pre>
<p>As you can see, the code for the grammar productions is quite simple and straightforward. Now, let's put it to the test.</p>
<pre class="prettyprint">
void Test(const char* text)
{
   Parser parser;
   try
   {
      parser.Parse(text);
      std::cout << "\"" << text << "\"\t OK" << std::endl;
   }
   catch(ParserException&amp; ex)
   {
      std::cout << "\"" << text << "\"\t " << ex.what() << std::endl;
   }
}

int main()
{
   Test("1+2+3+4");
   Test("1*2*3*4");
   Test("1-2-3-4");
   Test("1/2/3/4");
   Test("1*2+3*4");
   Test("1+2*3+4");
   Test("(1+2)*(3+4)");
   Test("1+(2*3)*(4+5)");
   Test("1+(2*3)/4+5");
   Test("5/(4+3)/2");
   Test("1 + 2.5");
   Test("125");
   Test("-1");
   Test("-1+(-2)");
   Test("-1+(-2.0)");

   Test("   1*2,5");
   Test("   1*2.5e2");
   Test("M1 + 2.5");
   Test("1 + 2&amp;5");
   Test("1 * 2.5.6");
   Test("1 ** 2.5");
   Test("*1 / 2.5");

   return 0;
}
</pre>
<p>The output for this testing program is:</p>
<pre class="prettyprint">
"1+2+3+4"        OK
"1*2*3*4"        OK
"1-2-3-4"        OK
"1/2/3/4"        OK
"1*2+3*4"        OK
"1+2*3+4"        OK
"(1+2)*(3+4)"    OK
"1+(2*3)*(4+5)"  OK
"1+(2*3)/4+5"    OK
"5/(4+3)/2"      OK
"1 + 2.5"        OK
"125"    OK
"-1"     OK
"-1+(-2)"        OK
"-1+(-2.0)"      OK
"   1*2,5"       Unexpected token ',' at position 6
"   1*2.5e2"     Unexpected token 'e' at position 8
"M1 + 2.5"       Unexpected token 'M' at position 0
"1 + 2&amp;5"        Unexpected token '&amp;' at position 5
"1 * 2.5.6"      Unexpected token '.' at position 7
"1 ** 2.5"       Unexpected token '*' at position 4
"*1 / 2.5"       Unexpected token '*' at position 1
</pre>
<p>Which is exactly what we expected: it validates correct expressions and throws an exception when the expression is incorrect.</p>
<p>In the next post I'll show how to modify this code to build an abstract syntax tree.</p>
]]></content:encoded>
			<wfw:commentRss>http://mariusbancila.ro/blog/2009/02/04/evaluating-expressions-part-2-parse-the-expression/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Evaluating Expressions &#8211; Part 1: The Approaches</title>
		<link>http://mariusbancila.ro/blog/2009/02/03/evaluating-expressions-part-1/</link>
		<comments>http://mariusbancila.ro/blog/2009/02/03/evaluating-expressions-part-1/#comments</comments>
		<pubDate>Tue, 03 Feb 2009 14:56:12 +0000</pubDate>
		<dc:creator>Marius Bancila</dc:creator>
				<category><![CDATA[Articles & Tutorials]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[AST]]></category>
		<category><![CDATA[expression]]></category>
		<category><![CDATA[RPN]]></category>

		<guid isPermaLink="false">http://mariusbancila.ro/blog/?p=148</guid>
		<description><![CDATA[A few days ago I was discussing how to evaluate expressions, and I decided to explain how you can build an evaluator. I will do this in a series of posts, going one step further in each post. I will use C++, but the approaches are the same regardless of the language. Let&#8217;s consider this expression: 1+2*3. [...]]]></description>
			<content:encoded><![CDATA[<p>A few days ago I was discussing how to evaluate expressions, and I decided to explain how you can build an evaluator. I will do this in a series of posts, going one step further in each post. I will use C++, but the approaches are the same regardless of the language.</p>
<p>Let&#8217;s consider this expression: 1+2*3. The value of this expression is 7. But how do you evaluate it in a language like C++ if you get it as a string? First of all, this is the so-called &#8220;infix&#8221; notation. There are also prefix and postfix notations. The terms infix, prefix and postfix refer to the position of the operator relative to the operands:</p>
<ul>
<li><b>Prefix</b>: <i>operator</i> operand1 operand2 (ex: + 1 2)</li>
<li><b>Infix</b>: operand1 <i>operator</i> operand2 (ex: 1 + 2)</li>
<li><b>Postfix</b>: operand1 operand2 <i>operator</i> (ex: 1 2 +)</li>
</ul>
<p>The notation people are used to reading is infix. But it turns out that evaluating an infix expression directly, in a single pass from left to right, is not straightforward: you cannot know in advance what comes next, operators have different precedence, and there are parentheses too.</p>
<p>To solve the problem you&#8217;d have to build a helper structure representing the infix expression. There are two possibilities:</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Reverse_Polish_notation" target="_blank">Reverse Polish Notation</a> (RPN) implies transforming the infix expression into a postfix expression and then evaluating it from left to right. 1 + 2*3 is transformed into 1 2 3 * +. You go from left to right, pushing operands onto a stack; when you find an operator, you pop its two operands, apply the operator and push the result back onto the stack.</li>
<li><a href="http://en.wikipedia.org/wiki/Abstract_syntax_tree" target="_blank">Abstract Syntax Tree</a> (AST) is an abstract representation of an expression, with inner nodes representing operators and leaves representing numbers.
<p><img style="vertical-align: middle;" src="/blog/wp-content/uploads/2009/02/ast.png" alt="Abstract Syntax Tree" width="177" height="154" /></p>
</li>
</ul>
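<p>To give an idea of the stack-based evaluation mentioned for RPN, here is a minimal sketch. It assumes the postfix expression is already available as a string with tokens separated by spaces (the conversion from infix is not shown), and EvaluateRpn is a name made up for this illustration.</p>
<pre class="prettyprint">
#include &lt;cstddef&gt;
#include &lt;cstdlib&gt;
#include &lt;sstream&gt;
#include &lt;stack&gt;
#include &lt;stdexcept&gt;
#include &lt;string&gt;

// Hypothetical helper: evaluates a space-separated postfix expression.
double EvaluateRpn(const std::string&amp; text)
{
   std::istringstream input(text);
   std::stack&lt;double&gt; operands;
   std::string token;

   while(input >> token)
   {
      if(token == "+" || token == "-" || token == "*" || token == "/")
      {
         if(operands.size() < 2)
            throw std::runtime_error("operator without two operands");

         // the operand pushed last is the right-hand side
         double right = operands.top(); operands.pop();
         double left = operands.top(); operands.pop();

         switch(token[0])
         {
         case '+': operands.push(left + right); break;
         case '-': operands.push(left - right); break;
         case '*': operands.push(left * right); break;
         case '/': operands.push(left / right); break;
         }
      }
      else
      {
         // anything that is not an operator must be a complete number
         char* end = NULL;
         double value = std::strtod(token.c_str(), &amp;end);
         if(*end != 0)
            throw std::runtime_error("not a number: " + token);
         operands.push(value);
      }
   }

   if(operands.size() != 1)
      throw std::runtime_error("malformed expression");

   return operands.top();
}
</pre>
<p>For the example above, EvaluateRpn("1 2 3 * +") first replaces 2 and 3 with 6, then 1 and 6 with 7.</p>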
<p>The RPN is harder to build and evaluate in my opinion, so I will focus on the approach with the AST.</p>
<p>We build an AST while parsing the expression. First, we&#8217;ll have to define the grammar for the expression. Otherwise we wouldn&#8217;t know what to parse. </p>
<pre class="prettyprint">
EXP -> EXP + EXP | EXP - EXP | EXP * EXP | EXP / EXP | - EXP | (EXP) | number
</pre>
<p>This grammar is recursive, as you can see, and, more importantly, it is ambiguous: it does not encode the precedence of the operators, so 1+2*3 could be derived either as (1+2)*3 or as 1+(2*3). For these reasons, a better grammar is this:</p>
<pre class="prettyprint">
EXP    -> EXP + TERM |
          EXP - TERM |
          TERM
TERM   -> TERM * FACTOR |
          TERM / FACTOR |
          FACTOR
FACTOR -> ( EXP ) | - FACTOR | number
</pre>
<p>These rules written above are called productions. The symbols used are:</p>
<ul>
<li>EXP, TERM, FACTOR are called non-terminal symbols</li>
<li>+, -, /, *, (, ) and number are called terminal symbols</li>
<li>EXP is the start symbol</li>
</ul>
<p>While the grammar has the correct operator precedence, it&#8217;s still recursive, or more precisely, left-recursive. You can see that EXP goes into EXP then operator + then TERM. A top-down parser that implements this rule directly never gets to match the operator +, because matching EXP starts by matching EXP again, and so on without end. There are techniques for eliminating this recursion and the result is:</p>
<pre class="prettyprint">
EXP    -> TERM EXP1
EXP1   -> + TERM EXP1 |
          - TERM EXP1 |
          epsilon
TERM   -> FACTOR TERM1
TERM1  -> * FACTOR TERM1 |
          / FACTOR TERM1 |
          epsilon
FACTOR -> ( EXP ) | - FACTOR | number
</pre>
<p>&#8216;epsilon&#8217; here means &#8216;nothing&#8217;.</p>
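<p>For reference, the usual textbook transformation behind this step can be sketched as follows, where A is a non-terminal, alpha and beta stand for arbitrary sequences of symbols (with beta not starting with A), and A1 is a newly introduced helper non-terminal:</p>
<pre class="prettyprint">
A  -> A alpha | beta

becomes

A  -> beta A1
A1 -> alpha A1 | epsilon
</pre>
<p>For EXP, beta is TERM and alpha is either &#8220;+ TERM&#8221; or &#8220;- TERM&#8221;, which gives the EXP and EXP1 productions above; TERM and TERM1 follow the same pattern with FACTOR.</p>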
<p>With the theory (well, this is just the tip of the iceberg, but should be a good start for you) in place we&#8217;ll have to do three things:</p>
<ul>
<li><a href="http://mariusbancila.ro/blog/?p=149">Parse the expression</a></li>
<li><a href="http://mariusbancila.ro/blog/?p=150">Build the abstract syntax tree</a></li>
<li><a href="http://mariusbancila.ro/blog/?p=151">Evaluate the abstract syntax tree</a></li>
</ul>
<p>The first two steps will be done at the same time, but I&#8217;ll take them one at a time and explain them in detail.</p>
<p>Before you continue with the implementation details, I suggest you read more about RPN, ASTs and grammars.</p>
<p>Here are several references:</p>
<ul>
<li><a href="http://www.csse.monash.edu.au/~lloyd/tildeProgLang/Grammar/" target="_blank">Syntax, Grammar</a></li>
<li><a href="http://www.csse.monash.edu.au/~lloyd/tildeProgLang/Grammar/Arith-Exp/" target="_blank">Arithmetic Expressions</a></li>
<li><a href="http://www.csse.monash.edu.au/~lloyd/tildeProgLang/Grammar/Abstract/" target="_blank">Abstract Syntax</a></li>
<li><a href="http://www.csse.monash.edu.au/~lloyd/tildeProgLang/Grammar/Top-Down/" target="_blank">Top-Down Parsing</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://mariusbancila.ro/blog/2009/02/03/evaluating-expressions-part-1/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

