Here we go. I know that probably I should have started this new topic talking about the grammar of the language we are going to implement, but as long as I see grammar strictly related to parsing, I’ve preferred to talk about scanning first.
We will go back to the grammar the next time, when talking about how to parse an input file.
I wrote about scanning (or lexing if you prefer) a while ago, when blogging about expression evalution in ActionScript. Scanning an input file is quite always the same (although some languages might require unusual features), and what I wrote about expressions works also for a general language.
The goal of the scanning process is to group some characters together, skipping the ones that have no meaning for the language like spaces. Each group of characters is usually called token. So a Scanner converts a textual input into a stream of tokens, each one rappresenting a possible valid word for our language. While scanning you don’t take care about the meaning of what you are grouping or about the fact that a sequence of tokens is meaningful. This is a task for the Parser as we will see.
When talking about expressions, I showed a manual implementation for a Lexer. Now I want to take a different approach and show you a possible implementation for a simple dynamic scanner. This scanner will be based on regular expressions: each regular expression will rappresent a given token, and we will be able to assign callbacks to the scanner that will be executed each time a given token is extracted from the input.
Writing a general and reusable scanner is usually a good practice. A common approach is to use some scanner generators, that are usually regular expression based too but are able to generate the code for a scanner at compile time. Our approach is different (and generates slower scanners) because regular expressions are evaluated at runtime; but it is fine for a test project.
You can download the source code for the regular expression based lexer (RegexLexer) here, as long as a simple usage example. Let’s see this example together so we can briefly discuss it (PBLexer.as):
package test
{
import it.sephiroth.scripting.utils.RegexLexer;
import it.sephiroth.scripting.utils.RegexLexerContext;
import it.sephiroth.scripting.utils.Token;
public class PBLexer extends RegexLexer
{
private static const keyword_map: Object = {
languageVersion: TokensTable.LANGUAGE_VERSION,
kernel: TokensTable.KERNEL,
parameter: TokensTable.PARAMETER,
input: TokensTable.INPUT,
image1: TokensTable.IMAGE1,
image2: TokensTable.IMAGE2,
image3: TokensTable.IMAGE3,
image4: TokensTable.IMAGE4,
"void": TokensTable.VOID,
output: TokensTable.OUTPUT,
evaluatePixel: TokensTable.EVALUATE_PIXEL,
"if": TokensTable.IF,
"else": TokensTable.ELSE,
"const": TokensTable.CONST,
"int": TokensTable.INT,
int2: TokensTable.INT2,
int3: TokensTable.INT3,
int4: TokensTable.INT4,
float: TokensTable.FLOAT,
float2: TokensTable.FLOAT2,
float3: TokensTable.FLOAT3,
float4: TokensTable.FLOAT4,
float2x2: TokensTable.FLOAT2X2,
float3x3: TokensTable.FLOAT3X3,
float4x4: TokensTable.FLOAT4X4,
pixel1: TokensTable.PIXEL1,
pixel2: TokensTable.PIXEL2,
pixel3: TokensTable.PIXEL3,
pixel4: TokensTable.PIXEL4 };
public function PBLexer( source_name: String, input: String )
{
super( source_name, input );
addRule( "nl", "\n", -1, RegexLexer.NL );
addRule( "space", "[ \t]+", -1, RegexLexer.SKIP );
addRule( "comment", "//.*", -1, RegexLexer.SKIP );
addRule( "start_cmt", "/\\*", -1, RegexLexer.START_EAT );
addRule( "stop_cmt", "\\*/", -1, RegexLexer.STOP_EAT );
addRule( "ident", "[a-zA-Z_][a-zA-Z_0-9]*", TokensTable.IDENT, onIdent );
addRule( "number", "0|([1-9][0-9]*(\\.[0-9]*)?)", TokensTable.NUMBER, onNumber );
addRule( "string", "\"(\\\\\"|[^\"])+\"", TokensTable.STRING, onString );
addRule( "symbol", "\\+\\+|\\-\\-|< =|>=|==|" +
"!=|&&|\\|\\||\\*=|/=|" +
"\\+=|\\-=|\\(|\\)|\\*|" +
"\\+|,|\\-|\\.|/|:|;|#|" +
"\\< |=|\\>|\\{|\\}|\\[|" +
"\\]|\\?|!", -1, onSymbol );
}
private function onSymbol( context: RegexLexerContext ): void
{
var token: Token;
token = new Token(
context.token.value.charCodeAt( 0 ),
context.token.value,
context.token.sourceName,
context.token.charno,
context.token.lineno,
context.token.columnno );
context.token = token;
}
private function onIdent( context: RegexLexerContext ): void
{
var type: int;
var token: Token;
type = context.token.type;
if( keyword_map[ context.token.value ] )
{
type = keyword_map[ context.token.value ];
}
token = new Token(
type,
context.token.value,
context.token.sourceName,
context.token.charno,
context.token.lineno,
context.token.columnno );
context.token = token;
}
private function onNumber( context: RegexLexerContext ): void
{
var token: Token;
var value: Number;
value = parseFloat( context.token.value );
token = new Token(
context.token.type,
value,
context.token.sourceName,
context.token.charno,
context.token.lineno,
context.token.columnno );
context.token = token;
}
private function onString( context: RegexLexerContext ): void
{
var token: Token;
var value: String;
value = context.token.value.substr( 1, context.token.value.length - 2 );
token = new Token(
context.token.type,
value,
context.token.sourceName,
context.token.charno,
context.token.lineno,
context.token.columnno );
context.token = token;
}
}
}
As you may imagine, this example is a simple Lexer that is able to scan Pixel Bender input files. The Lexer is composed by a serie of rules registered using the function addRule( name: String, regex: String, id: int, callback: Function ). The callback function can be a user defined callback or a standard callback.
There are some standard callbacks:
- RegexLexer.NL: tells the Lexer that a new line token have been matched. This action increase an internal counter that keeps track of the line number;
- RegexLexer.SKIP: skip the current token;
- RegexLexer.START_EAT: start eating the tokens;
- RegexLexer.STOP_EAT: stop eating the tokens;
Custom actions instead can be used to change the state of the Lexer or to generate a custom Token; take for instance the onIdent calback: each time an identifier is matched, we check if it is a keyword and if so, we generate the proper.
Here is an simple usage example:
package {
import flash.display.Sprite;
import it.sephiroth.scripting.utils.Token;
import test.PBLexer;
public class test extends Sprite
{
private static var INPUT: XML = < ![CDATA[
/*****************************************************************************
*
* Copyright (C) 2008 Adobe Systems Incorporated
* All Rights Reserved.
*
* NOTICE: All information contained herein is, and remains the property of
* Adobe Systems Incorporated and its suppliers, if any. The intellectual and
* technical concepts contained herein are proprietary to Adobe Systems
* Incorporated and its suppliers and may be covered by U.S. and Foreign
* Patents, patents in process, and are protected by trade secret or copyright
* law. Dissemination of this information or reproduction of this material is
* strictly forbidden unless prior written permission is obtained from Adobe
* Systems Incorporated.
*
*****************************************************************************/
// basicBoxBlur: A simple example to demonstrate how to implement a 1
// iteration box blur. The kernel is a good example to
// demonstrate how to implement a convolution using for loops.
kernel basicBoxBlur
< namespace : "AIF";
vendor : "Adobe Systems";
version : 2;
description : "variable Radius Box Blur"; >
{
// A parameter that represents the radius of the blur in both the X and Y
// directions.
parameter float blurRadius
< minValue:float(0.0);
maxValue:float(25.0);
defaultValue:float(5.0);
>;
// A dependent value. These are calculated in the evaluateDependents()
// function and save us from having to perform repetitive calculations
// for each pixel.
dependent int radiusAsInt;
// evaluateDependents(): Built-in function to do any pre-processing of the
// parameters. Usage of this function helps avoid
// repeating any necessary instructions per pixel.
// In a filter as simple as this, the extra cycles
// are probably not necessary, but it serves as a
// good example.
void
evaluateDependents()
{
// Convert the dimension into a integer value for use in the iteration
// over the kernel window. All type conversions are required to be
// explicit.
radiusAsInt = int(ceil(blurRadius));
}
input image4 src;
output float4 dst;
// Region functions are not available in Flash targets, so we only define
// the functions if we are executing on a different backend.
#if !AIF_FLASH_TARGET
// needed(): Indicates what area of the input is needed to fulfill the
// requested output region. In this case, we consider a
// radius sized area of the input for each output pixel, so we
// need to outset the area by one.
region needed(region outputRegion, imageRef inputRef)
{
float2 singlePixel = pixelSize(src);
return outset(outputRegion, float2(singlePixel.x * ceil(blurRadius), singlePixel.y * ceil(blurRadius)));
}
// changed(): Indicates what area of the output is affected by the
// specified input. In this case, we will blur out the image
// by the size of the radius, so the output is the input outset by
// one.
region changed(region inputRegion, imageRef inputRef)
{
float2 singlePixel = pixelSize(src);
return outset(inputRegion, float2(singlePixel.x * ceil(blurRadius), singlePixel.y * ceil(blurRadius)));
}
#endif
// evaluatePixel(): The function of the filter that actually does the
// processing of the image. This function is called once
// for each pixel of the output image.
void
evaluatePixel()
{
// Containers for both the accumulated color as well as the number of
// pixels touched.
float denominator = 0.0;
float4 colorAccumulator = float4(0.0, 0.0, 0.0, 0.0);
float2 singlePixel = pixelSize(src);
// Iterate through the kernel window, adding up the samples in
// that location.
for(int i = -radiusAsInt; i < = radiusAsInt; i++)
{
for(int j = -radiusAsInt; j <= radiusAsInt; j++)
{
colorAccumulator += sampleNearest(src,
outCoord() + float2(float(i) * singlePixel.x, float(j) * singlePixel.y));
denominator++;
}
}
// calculate the resultant pixel value which is the accumulated color
// divided by the total number of pixels.
// In this case, the denominator is 9.
dst = colorAccumulator / denominator;
}
}
]]>
public function test()
{
super();
var token: Token;
var lexer: PBLexer;
lexer = new PBLexer( "
while( token = lexer.next() )
{
if( token.type == Token.EOF )
{
break;
}
trace( token );
}
}
}
}
Next week we will move further and start discussing the grammar, implementing a simple Parser using a custom version of Yacc that generates ActionScript code.