Top Down or Bottom Up for Expression Evaluation ?

Last time I wrote about Top Down parsing, a technique for parsing that can be easilly implemented manually using function recursion.

Today it is time to talk about one of the other ways of parsing, called Bottom Up parsing. This technique try to find the most important units first and then, based on a language grammar and a set of rules, try to infer higher order structures starting from them. Bottom Up parsers are the common output of Parser Generators (you can find a good comparison of parser generators here) as they are easier to generate automatically then recursive parser because they can be implemented using sets of tables and simple state based machines.

Writing down manually a bottom up parser is quite a tedious task and require quite a lot of time; moreover, this kind of parsers are difficult to maintain and are usually not really expandable or portable. This is why for my tests I decided to port bYacc (a parser generator that usually generates C code) and edit it so it generates ActionScript code starting from Yacc-compatible input grammars. Having this kind of tool makes things a lot easier, because maintaining a grammar (that usually is composed by a few lines) is less time expensive than working with the generated code (that usually is many lines long).
I will not release today the port because actually I had not time to make sure it is bugfree and I’ve only a working version for Mac, but I plan to release it shortly if you will ever need it for your own tasks. My goal for today was to compare the speed of the parser I wrote with an automatically generated bottom up parser, to see which is the faster approach.

I created a bottom up parser which is able to execute the same expressions accepted by the expression evaluator I wrote last time. There are anyways some differences that – as you will probably and hopefully understand in the future – that make those parsers really different. Some will be disussed shortly here.

To do that I created a Yacc grammar and some support classes.
The parser grammar is really simple and really readable:
%{
%}
%token NUMBER SYMBOL
%left '+' '-'
%left '*' '/'
%left NEG
%%
program
: expr { Vars.result = $1; }
;
expr
: expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
| expr '*' expr { $$ = $1 * $3; }
| expr '/' expr { $$ = $1 / $3; }
| '-' expr %prec NEG { $$ = -$2; }
| '(' expr ')' { $$ = $2; }
| SYMBOL '(' expr ')' {
if( Vars.SYMBOL_TABLE[ $1 ] )
{
$$ = Vars.SYMBOL_TABLE[ $1 ]( $3 );
} else
{
trace( "can't find function" );
}
}
| SYMBOL {
if( Vars.SYMBOL_TABLE[ $1 ] )
{
$$ = Vars.SYMBOL_TABLE[ $1 ];
} else
{
trace( "can't find symbol" );
}
}
| NUMBER { $$ = yyval; }
;
%%

Continue reading the extended entry to see the results.

It defined a simple expression grammar which includes addition, subtraction, multiplication, negation, division, variables, function calls and sub expressions. The tokens accepted are numbers (integers or simple floating point numbers), symbols and all the punctuation and operands needed.

The grammar uses a simple support class (called Vars) which is used to store a global symbol table and the result of the evaluation.
As you might have noted there is a really big difference about how the expression will be evaluated here and how do I evaluate it in my original expression parser: the expression is executed as soon as a rule is matched, and it is not compiled into an intermediate abstract syntax tree.
Usually those kind of tools are used in conjunction with a lexer generator (Flex is a good example of lexer generator – don’t confuse it with Adobe Flex!! :D), that is able to automatically generate the code for a scanner based in a set of rules usually defined by regular expressions. I didn’t ported this kind of tool because I was already having a working scanner for my needs that is a porting of the scanner used in the expression evaluator library:


package it.sephiroth.expr2
{
public class Lexer
{
private var _pos: int;
private var _pool: Array;
private var _source: String;
public function Lexer( source: String )
{
_pos = 0;
_source = source;
_pool = new Array();
}
public function scan(): Object
{
if( _pool.length > 0 )
{
return _pool.shift();
}
skipWhites();
if( isEOF() )
{
return null;
}
var c: String = _source.charAt( _pos++ );
switch( c )
{
case '+':
case '-':
case '*':
case '/':
case '(':
case ')':
case ',': return { type: c.charCodeAt( 0 ), val: c };
default:
var buf: String = "";
var code: int = c.charCodeAt( 0 );
if( isNumber( code ) )
{
var num: Number;
while( isNumber( code ) )
{
buf += c;
if( isEOF() )
{
++_pos;
break;
}
c = _source.charAt( _pos++ );
code = c.charCodeAt( 0 );
}
if( c == '.' )
{
buf += c;
c = _source.charAt( _pos++ );
code = c.charCodeAt( 0 );
while( isNumber( code ) )
{
buf += c;
if( isEOF() )
{
++_pos;
break;
}
c = _source.charAt( _pos++ );
code = c.charCodeAt( 0 );
}
num = parseFloat( buf );
}else
{
num = parseInt( buf );
}
--_pos;
return { type: Token.NUMBER, val: num };
}
if( isAlpha( code ) || ( c == '_' ) )
{
while( isAlpha( code ) || ( c == '_' ) || isNumber( code ) )
{
buf += c;
if( isEOF() )
{
++_pos;
break;
}
c = _source.charAt( _pos++ );
code = c.charCodeAt( 0 );
}
--_pos;
return { type: Token.SYMBOL, val: buf };
}
break;
}
return null;
}
public function pushBack( token: Token ): void
{
_pool.push( token );
}
protected function isNumber( c: int ): Boolean
{
return ( c >= 48 ) && ( c < = 57 ); } protected function isAlpha( c: int ): Boolean { return ( ( c >= 97 ) && ( c < = 122 ) ) || ( ( c >= 65 ) && ( c < = 90 ) ); } protected function isEOF(): Boolean { return ( _pos >= _source.length );
}
protected function skipWhites(): void
{
while( !isEOF() )
{
var c: String = _source.charAt( _pos++ );
if( ( c != " " ) && ( c != "\t" ) )
{
--_pos;
break;
}
}
}
}
}

After compiling the grammar with my parser generator and having written some support code we can use to do our tests, I finally had my small comparison test suite. A quick note about the parser: It should be used only for testing and if you want to have a fast and safe expressions evaluator I suggest you to build your own or – if you can wait – wait a bit more 😉 The test can be easilly run from the command line without the need fo having Flex Builder (but you need the Flex SDK that you can freely download from the Adobe Open Source Website); I choose a quite long expression to make the execution and parsing time a bit high. Here is the test I ran:


package {
import flash.display.Sprite;
import flash.utils.getTimer;
import it.sephiroth.expr.CompiledExpression;
import it.sephiroth.expr.Parser;
import it.sephiroth.expr.Scanner;
import it.sephiroth.expr2.Vars;
public class TestExprEval extends Sprite
{
public static const ITERATIONS: int = 10000;
public static const EXPRESSION: String = "sin( x / ( 4 / 2.0 * 2 + (- 2.12345 + 2.12345) * x / x ) ) * 100 - ( sin( x / ( 4 / 2.0 * 2 + (- 2.12345 + 2.12345) * x / x ) ) * 100 )";
public function TestExprEval()
{
runTests();
}
private function runTests(): void
{
trace( "Running tests (this may take a while)" );
runTopDown();
runBottomUp();
}
private function runTopDown(): void
{
var sum: Number = 0;
for( var i: int = 0; i < ITERATIONS; ++i ) { var t: Number = getTimer(); var scanner: Scanner = new Scanner( EXPRESSION ); var parser: Parser = new Parser( scanner ); var compiled: CompiledExpression = parser.parse(); compiled.execute( { 'x' : Math.random(), 'sin' : Math.sin, 'cos' : Math.cos } ); sum += ( getTimer() - t ); } trace( "Top Down parser:", sum / ITERATIONS ); } private function runBottomUp(): void { var sum: Number = 0; Vars.symbolTable.sin = Math.sin; for( var i: int = 0; i < ITERATIONS; ++i ) { Vars.symbolTable.x = Math.random(); var t: Number = getTimer(); var myparser: TestBottomUpParser = new TestBottomUpParser(); myparser.yyparse() sum += ( getTimer() - t ); } trace( "Bottom Up parser:", sum / ITERATIONS ); } } } import it.sephiroth.expr2.BottomUpParser; import it.sephiroth.expr2.Lexer; class TestBottomUpParser extends BottomUpParser { private var _lexer: Lexer; public function TestBottomUpParser() { super(); _lexer = new Lexer( TestExprEval.EXPRESSION ); } override public function yylex(): int { var value: Object = _lexer.scan(); if( value == null ) { return 0; } set_yylval( value.val ); return value.type; } }

As you can see the test includes also the instantiation of all the objects needed. Running that from the command line, I obtained this result:
$ mxmlc -debug=true TestExprEval.as
Loading configuration file /Applications/Adobe Flex Builder 3 Plug-in/sdks/3.0.0/frameworks/flex-config.xml
/Users/gabriele/Desktop/TestExprParsers/src/TestExprEval.swf (11647 bytes)
$ fdb TestExprEval.swf
Adobe fdb (Flash Player Debugger) [build 3.0.0.477]
Copyright (c) 2004-2007 Adobe, Inc. All rights reserved.
Attempting to launch and connect to Player using URL
TestExprEval.swf
Player connected; session starting.
Set breakpoints and then type 'continue' to resume the session.
[SWF] Users:gabriele:Desktop:TestExprParsers:src:TestExprEval.swf - 25.942 bytes after decompression
(fdb) continue
[trace] Running tests (this may take a while)
[trace] Top Down parser: 0.6979
[trace] Bottom Up parser: 0.9063
(fdb) quit

This is not totally unexpected for such a simple grammar, but the execution time is much more better (around 20%) using a manually written top down parser than using an autogenerated bottom up parser; I say it is not totally unexpected because the performances should change as the grammar becames more complex, but in our situation this is what we get. And we can also achive a better result if we precompile the expression into an AST and the execute it. Add this code to your test and add the function call inside the runTests method:

private function runTopDownCompiled(): void
{
var sum: Number = 0;
var scanner: Scanner = new Scanner( EXPRESSION );
var parser: Parser = new Parser( scanner );
var compiled: CompiledExpression = parser.parse();
for( var i: int = 0; i < ITERATIONS; ++i ) { var t: Number = getTimer(); compiled.execute( { 'x': Math.random(), 'sin' : Math.sin, 'cos' : Math.cos } ); sum += ( getTimer() - t ); } trace( "Top Down parser (precompiled):", sum / ITERATIONS ); }

And then run again to see the result:
$ mxmlc -debug=true TestExprEval.as
Loading configuration file /Applications/Adobe Flex Builder 3 Plug-in/sdks/3.0.0/frameworks/flex-config.xml
/Users/gabriele/Desktop/TestExprParsers/src/TestExprEval.swf (11794 bytes)
$ fdb TestExprEval.swf
Adobe fdb (Flash Player Debugger) [build 3.0.0.477]
Copyright (c) 2004-2007 Adobe, Inc. All rights reserved.
Attempting to launch and connect to Player using URL
TestExprEval.swf
Player connected; session starting.
Set breakpoints and then type 'continue' to resume the session.
[SWF] Users:gabriele:Desktop:TestExprParsers:src:TestExprEval.swf - 26.272 bytes after decompression
(fdb) continue
[trace] Running tests (this may take a while)
[trace] Top Down parser: 0.698
[trace] Bottom Up parser: 0.9001
[trace] Top Down parser (precompiled): 0.0315

This is really a good thing: as we can see, the most of the time is taken by the parser and the scanner, and if we remove their execution from the loop and just evaluate the generated AST, we get a great speed improvment. As far as I know this is quite good if you have a set of expressions that does not vary and do you want to execute them many times.

And this is why quite all the languages precompile their source code into bytecode before execution.

Unfortunately it is time to go now, but I’ve really a lot of stuff to show you next week (the parser generator, a way to – try to – write an expression evaluator that evaluate expressions at near-native speed …) so stay tuned as usual!

Share with...