AsYacc – first alpha release

Before going further with discussing how to add runtime scripting support to a Flash applications, I’d like to share with you the source code of a porting of Yacc I did a while ago. I did some simple modification to make Yacc able to generate ActionScript 3.0 source code instead of C. It works quite well and supports a lot standard Yacc features. There might be some issues – report them to me and I’ll try to fix them 🙂

You can easilly find documentation about Yacc searching Google (you can start here for example).

Usually Yacc is used in conjunction with a Scanner generator (like Lex/Flex), but I didn’t to any porting of commonly used Scanner generators yet.

Here you can download the sources. The source code should be portable and compilable on all the most common platform, but I didn’t tested it on Windows yet. To compile the source code on Mac or Linux, cd to the source code directory and type use:

gcc *.c -o AsYacc

Once you have the compiled binary file, you can run the Parser generator using:

./AsYacc -Hpackage=it.sephiroth.test grammar.y

Where grammar.y is a text file that contains the grammar of a language defined using the proper syntax (see the docs online for all the detailed information you may need).

Here you can download a simple calculator example that uses the RegexLexer described previously to implement the Scanner. For the ones who might be interested, here is the grammar used:
/* Infix notation calculator. */
%{
%}
%token NUM
%left '-' '+'
%left '*' '/'
%left NEG
%right '^' /* exponentiation */
%%
input:
exp { trace( $1 ); }
;
exp:
NUM { $$ = $1; }
| exp '+' exp { $$ = $1 + $3; }
| exp '-' exp { $$ = $1 - $3; }
| exp '*' exp { $$ = $1 * $3; }
| exp '/' exp { $$ = $1 / $3; }
| '-' exp %prec NEG { $$ = -$2; }
| exp '^' exp { $$ = Math.pow( $1, $3 ); }
| '(' exp ')' { $$ = $2; }
;
%%

Top Down or Bottom Up for Expression Evaluation ?

Last time I wrote about Top Down parsing, a technique for parsing that can be easilly implemented manually using function recursion.

Today it is time to talk about one of the other ways of parsing, called Bottom Up parsing. This technique try to find the most important units first and then, based on a language grammar and a set of rules, try to infer higher order structures starting from them. Bottom Up parsers are the common output of Parser Generators (you can find a good comparison of parser generators here) as they are easier to generate automatically then recursive parser because they can be implemented using sets of tables and simple state based machines.

Writing down manually a bottom up parser is quite a tedious task and require quite a lot of time; moreover, this kind of parsers are difficult to maintain and are usually not really expandable or portable. This is why for my tests I decided to port bYacc (a parser generator that usually generates C code) and edit it so it generates ActionScript code starting from Yacc-compatible input grammars. Having this kind of tool makes things a lot easier, because maintaining a grammar (that usually is composed by a few lines) is less time expensive than working with the generated code (that usually is many lines long).
I will not release today the port because actually I had not time to make sure it is bugfree and I’ve only a working version for Mac, but I plan to release it shortly if you will ever need it for your own tasks. My goal for today was to compare the speed of the parser I wrote with an automatically generated bottom up parser, to see which is the faster approach.

I created a bottom up parser which is able to execute the same expressions accepted by the expression evaluator I wrote last time. There are anyways some differences that – as you will probably and hopefully understand in the future – that make those parsers really different. Some will be disussed shortly here.

To do that I created a Yacc grammar and some support classes.
The parser grammar is really simple and really readable:
%{
%}
%token NUMBER SYMBOL
%left '+' '-'
%left '*' '/'
%left NEG
%%
program
: expr { Vars.result = $1; }
;
expr
: expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
| expr '*' expr { $$ = $1 * $3; }
| expr '/' expr { $$ = $1 / $3; }
| '-' expr %prec NEG { $$ = -$2; }
| '(' expr ')' { $$ = $2; }
| SYMBOL '(' expr ')' {
if( Vars.SYMBOL_TABLE[ $1 ] )
{
$$ = Vars.SYMBOL_TABLE[ $1 ]( $3 );
} else
{
trace( "can't find function" );
}
}
| SYMBOL {
if( Vars.SYMBOL_TABLE[ $1 ] )
{
$$ = Vars.SYMBOL_TABLE[ $1 ];
} else
{
trace( "can't find symbol" );
}
}
| NUMBER { $$ = yyval; }
;
%%

Continue reading the extended entry to see the results. Continue reading

ActionScript Parsing, the YACC revenge :)

After my first attempts with ANTLR scanners in python/java I decided to start back with Bison/Flex again to see the difference in performances.
So first I need to wrote from scratch the grammar/lexer files using only the ECMAScript 4 specifications and much patience (the elastic grammar file help me a lot too).

After finishing a first version of the parser I tested it on the same file (75Kb actionscript file) which both java and python parsed in more than 1 second.
The result was unbelievable: 0.02 seconds for that file!

Then I tested it on multiple files, and for about 320 files of the whole adobe corelib library it took 220ms

Ok, the parser it’s not yet complete and doesn’t care about regexp and xml syntax, but its performance convinced me enough…
Now, the next step is to finish and test the parser and finally create a python library using pyrex, then a benchmark test again.

If someone is interested in testing the parser, download it (use “parser –help” form the command line for usage help), but remember this is only a first test.. not really helpful right now (I just wanted to share my text/parsing experiences).

An experience with antlr, java and python

I just wanted to share a little experience with generating an AS3 parser using antlr and python.
I was trying first to create the parser using GNU Flex and Bison in C, probably the best way for a very performancing parser.
Yeah, that’s right.. but looking at the antlr syntax I realized that’s easier and easier.
Moreover I start using this very useful eclipse plugin for antlr debugging which made my life easier!

The grammar file I created is a compromise between the asdt grammar file and the ECMA-262 grammar specification.

Once finished working on my eclipse project I’ve managed to parse succesfully all the adobe corelibs files using this java test file:
package org.sepy.core.parsers.as3;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import antlr.CommonAST;
import antlr.RecognitionException;
import antlr.TokenStreamException;
public class Application {
public static void main(String argv[])
{
if(argv.length > 0)
{
File file = new File(argv[0]);
if(file.exists())
{
FileInputStream is = null;
try {
is = new FileInputStream(file);
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
AS3Lexer L = new AS3Lexer(is);
AS3Parser P = new AS3Parser(L);
try {
P.compilationUnit();
} catch (RecognitionException e) {
// TODO Auto-generated catch block
System.out.println(" line=" + e.line + ", column="+ e.column);
System.out.println(e.getMessage());
e.printStackTrace(System.err);
} catch (TokenStreamException e) {
// TODO Auto-generated catch block
System.out.println(" line=" + L.getLine() + ", column="+ L.getColumn());
System.out.println(L.getGuessInfo());
System.out.println(e.getMessage());
e.printStackTrace(System.err);
}
CommonAST.setVerboseStringConversion(false, P.getTokenNames());
CommonAST ast = (CommonAST) P.getAST();
System.out.println("Tree:");
System.out.println(ast.toStringTree());
}
}
}
}

Ok, done that I decided to export the grammar file for python (thanks to antrl python export feature).
Everything works fine also for python, but I realized that the python script were so much slower than the java one!
import sys
import antlr
import AS3Parser
import AS3Lexer
L = AS3Lexer.Lexer(filename);
P = AS3Parser.Parser(L);
P.setFilename(filename)
try:
P.compilationUnit();
ast = P.getAST();
except:
pass

On a 75Kb actionscript file the python script took about 7 seconds to run, while the java application only 2 seconds. I know python interpreter caould be slower than many other languages, but I never thought so much slower.
So I run the python hotshot profiler to see which could be the bottleneck in the python script and I found most of the problems were due to unuseless antlr (the python module) method’s calls.
After making corrections to the antlr.py file the same script took exactly half of the time. Now 3 seconds. Wow 🙂
But not fast enough.
So I enabled for the antlr python script psyco module and this time the same script took only 1.6 seconds.
Now the python script is fast enough, even if I’m sure I can make more optimizations in the antlr module…