AsYacc – first alpha release

Before going further with discussing how to add runtime scripting support to a Flash applications, I’d like to share with you the source code of a porting of Yacc I did a while ago. I did some simple modification to make Yacc able to generate ActionScript 3.0 source code instead of C. It works quite well and supports a lot standard Yacc features. There might be some issues – report them to me and I’ll try to fix them 🙂

You can easilly find documentation about Yacc searching Google (you can start here for example).

Usually Yacc is used in conjunction with a Scanner generator (like Lex/Flex), but I didn’t to any porting of commonly used Scanner generators yet.

Here you can download the sources. The source code should be portable and compilable on all the most common platform, but I didn’t tested it on Windows yet. To compile the source code on Mac or Linux, cd to the source code directory and type use:

gcc *.c -o AsYacc

Once you have the compiled binary file, you can run the Parser generator using:

./AsYacc -Hpackage=it.sephiroth.test grammar.y

Where grammar.y is a text file that contains the grammar of a language defined using the proper syntax (see the docs online for all the detailed information you may need).

Here you can download a simple calculator example that uses the RegexLexer described previously to implement the Scanner. For the ones who might be interested, here is the grammar used:
/* Infix notation calculator. */
%{
%}
%token NUM
%left '-' '+'
%left '*' '/'
%left NEG
%right '^' /* exponentiation */
%%
input:
exp { trace( $1 ); }
;
exp:
NUM { $$ = $1; }
| exp '+' exp { $$ = $1 + $3; }
| exp '-' exp { $$ = $1 - $3; }
| exp '*' exp { $$ = $1 * $3; }
| exp '/' exp { $$ = $1 / $3; }
| '-' exp %prec NEG { $$ = -$2; }
| exp '^' exp { $$ = Math.pow( $1, $3 ); }
| '(' exp ')' { $$ = $2; }
;
%%

Scripting Flash apps: scanning the input file

Here we go. I know that probably I should have started this new topic talking about the grammar of the language we are going to implement, but as long as I see grammar strictly related to parsing, I’ve preferred to talk about scanning first.

We will go back to the grammar the next time, when talking about how to parse an input file.

I wrote about scanning (or lexing if you prefer) a while ago, when blogging about expression evalution in ActionScript. Scanning an input file is quite always the same (although some languages might require unusual features), and what I wrote about expressions works also for a general language.

The goal of the scanning process is to group some characters together, skipping the ones that have no meaning for the language like spaces. Each group of characters is usually called token. So a Scanner converts a textual input into a stream of tokens, each one rappresenting a possible valid word for our language. While scanning you don’t take care about the meaning of what you are grouping or about the fact that a sequence of tokens is meaningful. This is a task for the Parser as we will see.

When talking about expressions, I showed a manual implementation for a Lexer. Now I want to take a different approach and show you a possible implementation for a simple dynamic scanner. This scanner will be based on regular expressions: each regular expression will rappresent a given token, and we will be able to assign callbacks to the scanner that will be executed each time a given token is extracted from the input.
Writing a general and reusable scanner is usually a good practice. A common approach is to use some scanner generators, that are usually regular expression based too but are able to generate the code for a scanner at compile time. Our approach is different (and generates slower scanners) because regular expressions are evaluated at runtime; but it is fine for a test project.

You can download the source code for the regular expression based lexer (RegexLexer) here, as long as a simple usage example. Let’s see this example together so we can briefly discuss it (PBLexer.as):

Continue reading

Top Down or Bottom Up for Expression Evaluation ?

Last time I wrote about Top Down parsing, a technique for parsing that can be easilly implemented manually using function recursion.

Today it is time to talk about one of the other ways of parsing, called Bottom Up parsing. This technique try to find the most important units first and then, based on a language grammar and a set of rules, try to infer higher order structures starting from them. Bottom Up parsers are the common output of Parser Generators (you can find a good comparison of parser generators here) as they are easier to generate automatically then recursive parser because they can be implemented using sets of tables and simple state based machines.

Writing down manually a bottom up parser is quite a tedious task and require quite a lot of time; moreover, this kind of parsers are difficult to maintain and are usually not really expandable or portable. This is why for my tests I decided to port bYacc (a parser generator that usually generates C code) and edit it so it generates ActionScript code starting from Yacc-compatible input grammars. Having this kind of tool makes things a lot easier, because maintaining a grammar (that usually is composed by a few lines) is less time expensive than working with the generated code (that usually is many lines long).
I will not release today the port because actually I had not time to make sure it is bugfree and I’ve only a working version for Mac, but I plan to release it shortly if you will ever need it for your own tasks. My goal for today was to compare the speed of the parser I wrote with an automatically generated bottom up parser, to see which is the faster approach.

I created a bottom up parser which is able to execute the same expressions accepted by the expression evaluator I wrote last time. There are anyways some differences that – as you will probably and hopefully understand in the future – that make those parsers really different. Some will be disussed shortly here.

To do that I created a Yacc grammar and some support classes.
The parser grammar is really simple and really readable:
%{
%}
%token NUMBER SYMBOL
%left '+' '-'
%left '*' '/'
%left NEG
%%
program
: expr { Vars.result = $1; }
;
expr
: expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
| expr '*' expr { $$ = $1 * $3; }
| expr '/' expr { $$ = $1 / $3; }
| '-' expr %prec NEG { $$ = -$2; }
| '(' expr ')' { $$ = $2; }
| SYMBOL '(' expr ')' {
if( Vars.SYMBOL_TABLE[ $1 ] )
{
$$ = Vars.SYMBOL_TABLE[ $1 ]( $3 );
} else
{
trace( "can't find function" );
}
}
| SYMBOL {
if( Vars.SYMBOL_TABLE[ $1 ] )
{
$$ = Vars.SYMBOL_TABLE[ $1 ];
} else
{
trace( "can't find symbol" );
}
}
| NUMBER { $$ = yyval; }
;
%%

Continue reading the extended entry to see the results. Continue reading

ActionScript Parsing, the YACC revenge :)

After my first attempts with ANTLR scanners in python/java I decided to start back with Bison/Flex again to see the difference in performances.
So first I need to wrote from scratch the grammar/lexer files using only the ECMAScript 4 specifications and much patience (the elastic grammar file help me a lot too).

After finishing a first version of the parser I tested it on the same file (75Kb actionscript file) which both java and python parsed in more than 1 second.
The result was unbelievable: 0.02 seconds for that file!

Then I tested it on multiple files, and for about 320 files of the whole adobe corelib library it took 220ms

Ok, the parser it’s not yet complete and doesn’t care about regexp and xml syntax, but its performance convinced me enough…
Now, the next step is to finish and test the parser and finally create a python library using pyrex, then a benchmark test again.

If someone is interested in testing the parser, download it (use “parser –help” form the command line for usage help), but remember this is only a first test.. not really helpful right now (I just wanted to share my text/parsing experiences).