AsYacc – first alpha release

Before going further with discussing how to add runtime scripting support to a Flash applications, I’d like to share with you the source code of a porting of Yacc I did a while ago. I did some simple modification to make Yacc able to generate ActionScript 3.0 source code instead of C. It works quite well and supports a lot standard Yacc features. There might be some issues – report them to me and I’ll try to fix them 🙂

You can easilly find documentation about Yacc searching Google (you can start here for example).

Usually Yacc is used in conjunction with a Scanner generator (like Lex/Flex), but I didn’t to any porting of commonly used Scanner generators yet.

Here you can download the sources. The source code should be portable and compilable on all the most common platform, but I didn’t tested it on Windows yet. To compile the source code on Mac or Linux, cd to the source code directory and type use:

gcc *.c -o AsYacc

Once you have the compiled binary file, you can run the Parser generator using:

./AsYacc -Hpackage=it.sephiroth.test grammar.y

Where grammar.y is a text file that contains the grammar of a language defined using the proper syntax (see the docs online for all the detailed information you may need).

Here you can download a simple calculator example that uses the RegexLexer described previously to implement the Scanner. For the ones who might be interested, here is the grammar used:
/* Infix notation calculator. */
%{
%}
%token NUM
%left '-' '+'
%left '*' '/'
%left NEG
%right '^' /* exponentiation */
%%
input:
exp { trace( $1 ); }
;
exp:
NUM { $$ = $1; }
| exp '+' exp { $$ = $1 + $3; }
| exp '-' exp { $$ = $1 - $3; }
| exp '*' exp { $$ = $1 * $3; }
| exp '/' exp { $$ = $1 / $3; }
| '-' exp %prec NEG { $$ = -$2; }
| exp '^' exp { $$ = Math.pow( $1, $3 ); }
| '(' exp ')' { $$ = $2; }
;
%%

Scripting Flash apps: scanning the input file

Here we go. I know that probably I should have started this new topic talking about the grammar of the language we are going to implement, but as long as I see grammar strictly related to parsing, I’ve preferred to talk about scanning first.

We will go back to the grammar the next time, when talking about how to parse an input file.

I wrote about scanning (or lexing if you prefer) a while ago, when blogging about expression evalution in ActionScript. Scanning an input file is quite always the same (although some languages might require unusual features), and what I wrote about expressions works also for a general language.

The goal of the scanning process is to group some characters together, skipping the ones that have no meaning for the language like spaces. Each group of characters is usually called token. So a Scanner converts a textual input into a stream of tokens, each one rappresenting a possible valid word for our language. While scanning you don’t take care about the meaning of what you are grouping or about the fact that a sequence of tokens is meaningful. This is a task for the Parser as we will see.

When talking about expressions, I showed a manual implementation for a Lexer. Now I want to take a different approach and show you a possible implementation for a simple dynamic scanner. This scanner will be based on regular expressions: each regular expression will rappresent a given token, and we will be able to assign callbacks to the scanner that will be executed each time a given token is extracted from the input.
Writing a general and reusable scanner is usually a good practice. A common approach is to use some scanner generators, that are usually regular expression based too but are able to generate the code for a scanner at compile time. Our approach is different (and generates slower scanners) because regular expressions are evaluated at runtime; but it is fine for a test project.

You can download the source code for the regular expression based lexer (RegexLexer) here, as long as a simple usage example. Let’s see this example together so we can briefly discuss it (PBLexer.as):

Continue reading

Top Down or Bottom Up for Expression Evaluation ?

Last time I wrote about Top Down parsing, a technique for parsing that can be easilly implemented manually using function recursion.

Today it is time to talk about one of the other ways of parsing, called Bottom Up parsing. This technique try to find the most important units first and then, based on a language grammar and a set of rules, try to infer higher order structures starting from them. Bottom Up parsers are the common output of Parser Generators (you can find a good comparison of parser generators here) as they are easier to generate automatically then recursive parser because they can be implemented using sets of tables and simple state based machines.

Writing down manually a bottom up parser is quite a tedious task and require quite a lot of time; moreover, this kind of parsers are difficult to maintain and are usually not really expandable or portable. This is why for my tests I decided to port bYacc (a parser generator that usually generates C code) and edit it so it generates ActionScript code starting from Yacc-compatible input grammars. Having this kind of tool makes things a lot easier, because maintaining a grammar (that usually is composed by a few lines) is less time expensive than working with the generated code (that usually is many lines long).
I will not release today the port because actually I had not time to make sure it is bugfree and I’ve only a working version for Mac, but I plan to release it shortly if you will ever need it for your own tasks. My goal for today was to compare the speed of the parser I wrote with an automatically generated bottom up parser, to see which is the faster approach.

I created a bottom up parser which is able to execute the same expressions accepted by the expression evaluator I wrote last time. There are anyways some differences that – as you will probably and hopefully understand in the future – that make those parsers really different. Some will be disussed shortly here.

To do that I created a Yacc grammar and some support classes.
The parser grammar is really simple and really readable:
%{
%}
%token NUMBER SYMBOL
%left '+' '-'
%left '*' '/'
%left NEG
%%
program
: expr { Vars.result = $1; }
;
expr
: expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
| expr '*' expr { $$ = $1 * $3; }
| expr '/' expr { $$ = $1 / $3; }
| '-' expr %prec NEG { $$ = -$2; }
| '(' expr ')' { $$ = $2; }
| SYMBOL '(' expr ')' {
if( Vars.SYMBOL_TABLE[ $1 ] )
{
$$ = Vars.SYMBOL_TABLE[ $1 ]( $3 );
} else
{
trace( "can't find function" );
}
}
| SYMBOL {
if( Vars.SYMBOL_TABLE[ $1 ] )
{
$$ = Vars.SYMBOL_TABLE[ $1 ];
} else
{
trace( "can't find symbol" );
}
}
| NUMBER { $$ = yyval; }
;
%%

Continue reading the extended entry to see the results. Continue reading

Runtime expression evaluation in ActionScript

Finally I have a bit of free time to write down this post.
Some days ago a guy on the forum asked about how to evaluate at runtime simple mathematical expressions with support for symbols and function calls.

Now that the eval function have been removed from the language for reasons I’ve not enought time to talk about, we need to parse and execute those expressions manually.

I built a simple Actionscript library that can be used to parse mathematical expressions: it has support for function calls, for variables and it is able to convert the expression into a postfix rappresentation if you may need it. The expression parser is quite simple and have been built manually following some concepts related to programming language compilers; it includes a Scanner (or lexer – however you want to call it) to tokenize the expression, a Parser that convert the expression into an Abstract Syntax Tree and a simple routine that evaluates that AST using a Symbol Table as evaluation context.

There are no comments inside the code right now; I hope to find a little bit of time to write an in depth discussion about this topic. In the mean time you can continue reading the entry for a general explanation about how does the code works and for some examples that may be useful.

Here is a simple example that shows how does the code works:

import it.sephiroth.expr.CompiledExpression;
import it.sephiroth.expr.Parser;
import it.sephiroth.expr.Scanner;
public class Example
{
public static function run(): void
{
var expression: String = "sin( x / ( 8 / 2 + (-0.12 + 2.3) * x / x ) ) * 100";
var scanner: Scanner = new Scanner( expression );
var parser: Parser = new Parser( scanner );
var compiled: CompiledExpression = parser.parse();
var context: Object = {
x: 100,
sin: Math.sin,
cos: Math.cos
};
trace( 'Postfix:', compiled.toString() );
trace( 'Result:', compiled.execute( context ) );
}
}

Before I forgot, you can download the source code here with a simple example included.

Continue reading

Actionscript parsing experiences: PyBison & PLY

My experiments with text-parsing continue..

PyBison
Last day I founded a python library (pybison) which runs the generated python parser at near the speed of C-based parsers, due to direct hooks into bison-generated C code.
Cool, unfortunately I couldn’t compiled it for Windows and so I made my test on Ubuntu only. What I did was just to export the already written lexer/grammar using bison2py (boundled with pybison) and run it.

If you want to take a look at the python parser try it by downloading the source code here.
The run.py file accept these parameters:
usage: run.py [options]
options:
-h, --help show this help message and exit
-v, --verbose Turn on verbose mode
-o OUTPUT_FILE, --output=OUTPUT_FILE
Don't print to stdout but to the output file
-i INPUT_FILE, --input=INPUT_FILE
Input file to be parsed
-x, --to-xml Returns a human-readable xml representation of the
parse tree
-b, --bison-debug Print the Bison/Flex debug

You can download the source code here

PLY
The second test I did was using PLY, an implementation of lex and yacc parsing tools for Python. Being implemented entirely in python it should be much more slower that pybison, but I didn’t find any difference with the pybison parser version. In fact PLY , like the traditional bison, creates tables starting from the grammar syntax.
Ok, Both of the implementations are slower that the pure C parser, but extremely faster that antlr!
(They took more or less 0.02 to 0.5 secs for parsing and generating the AST.)
Unlike pybison PLY is still mantained and offers more features and a better error handling.. even if the whole grammar has to be rewritten in python, and it can be compiled in Windows too.

To run the test just write:
python run.py {filename}
P.S. Unfortunately the yacc parser isn’t yet complete because I still need to find a way for parsing correctly E4X and XML syntax..

ActionScript Parsing, the YACC revenge :)

After my first attempts with ANTLR scanners in python/java I decided to start back with Bison/Flex again to see the difference in performances.
So first I need to wrote from scratch the grammar/lexer files using only the ECMAScript 4 specifications and much patience (the elastic grammar file help me a lot too).

After finishing a first version of the parser I tested it on the same file (75Kb actionscript file) which both java and python parsed in more than 1 second.
The result was unbelievable: 0.02 seconds for that file!

Then I tested it on multiple files, and for about 320 files of the whole adobe corelib library it took 220ms

Ok, the parser it’s not yet complete and doesn’t care about regexp and xml syntax, but its performance convinced me enough…
Now, the next step is to finish and test the parser and finally create a python library using pyrex, then a benchmark test again.

If someone is interested in testing the parser, download it (use “parser –help” form the command line for usage help), but remember this is only a first test.. not really helpful right now (I just wanted to share my text/parsing experiences).

An experience with antlr, java and python

I just wanted to share a little experience with generating an AS3 parser using antlr and python.
I was trying first to create the parser using GNU Flex and Bison in C, probably the best way for a very performancing parser.
Yeah, that’s right.. but looking at the antlr syntax I realized that’s easier and easier.
Moreover I start using this very useful eclipse plugin for antlr debugging which made my life easier!

The grammar file I created is a compromise between the asdt grammar file and the ECMA-262 grammar specification.

Once finished working on my eclipse project I’ve managed to parse succesfully all the adobe corelibs files using this java test file:
package org.sepy.core.parsers.as3;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import antlr.CommonAST;
import antlr.RecognitionException;
import antlr.TokenStreamException;
public class Application {
public static void main(String argv[])
{
if(argv.length > 0)
{
File file = new File(argv[0]);
if(file.exists())
{
FileInputStream is = null;
try {
is = new FileInputStream(file);
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
AS3Lexer L = new AS3Lexer(is);
AS3Parser P = new AS3Parser(L);
try {
P.compilationUnit();
} catch (RecognitionException e) {
// TODO Auto-generated catch block
System.out.println(" line=" + e.line + ", column="+ e.column);
System.out.println(e.getMessage());
e.printStackTrace(System.err);
} catch (TokenStreamException e) {
// TODO Auto-generated catch block
System.out.println(" line=" + L.getLine() + ", column="+ L.getColumn());
System.out.println(L.getGuessInfo());
System.out.println(e.getMessage());
e.printStackTrace(System.err);
}
CommonAST.setVerboseStringConversion(false, P.getTokenNames());
CommonAST ast = (CommonAST) P.getAST();
System.out.println("Tree:");
System.out.println(ast.toStringTree());
}
}
}
}

Ok, done that I decided to export the grammar file for python (thanks to antrl python export feature).
Everything works fine also for python, but I realized that the python script were so much slower than the java one!
import sys
import antlr
import AS3Parser
import AS3Lexer
L = AS3Lexer.Lexer(filename);
P = AS3Parser.Parser(L);
P.setFilename(filename)
try:
P.compilationUnit();
ast = P.getAST();
except:
pass

On a 75Kb actionscript file the python script took about 7 seconds to run, while the java application only 2 seconds. I know python interpreter caould be slower than many other languages, but I never thought so much slower.
So I run the python hotshot profiler to see which could be the bottleneck in the python script and I found most of the problems were due to unuseless antlr (the python module) method’s calls.
After making corrections to the antlr.py file the same script took exactly half of the time. Now 3 seconds. Wow 🙂
But not fast enough.
So I enabled for the antlr python script psyco module and this time the same script took only 1.6 seconds.
Now the python script is fast enough, even if I’m sure I can make more optimizations in the antlr module…

E4X, ActionScript 3 new “sublanguage”

Since I’ve heard the voices about the upcoming E4x in actionscript 3 I awaited this moment…
Finally ActionScript supports E4X and we can kick away the old XML object (always supported for backward compatibility)! we can stop using all the various xml2object classes created in these years, we finally have a validating parser, so we will see finally well formatted XML files from flash users 🙂
But finally we have a powerful “language” with a very strong syntax.

I’ve written a small tutorial with a quick guide  about actionscript3 and e4x basic syntax (without including Namespaces and QNames for the moment)

http://www.sephiroth.it/tutorials/flashPHP/E4X/index.php

P.s. I forgot to say that E4X is also available in the Geko1.8 browser (such as Firefox 1.5). For more info see this article: http://developer.mozilla.org/en/docs/E4X. Unfortunately E4X ins’t supported by python, neither there’s a module for it 🙁