Reentrant scanner and parser with Aflex and Ayacc

By Stephane Carrez

Version 1.6 of Aflex, the lexical analyzer generator, and version 1.4 of Ayacc, the Ada parser generator, provide numerous improvements for customizing the generated scanner and parser. The major change is support for writing reentrant scanners and parsers. Let's have a look at it.

Aflex and Ayacc

What's new in Aflex 1.6

  • Support the flex options %option output, %option nooutput, %option yywrap, %option noinput, %option noyywrap, %option unput, %option nounput, %option bufsize=NNN to better control the generated _IO package.
  • Aflex templates give more control over code generation; they are embedded with Advanced Resource Embedder
  • Support defining Ada code blocks in the scanner file that are inserted into the generated scanner
  • New option -P to generate a private Ada package for DFA and IO
  • New directives %option reentrant and %yyvar to generate a reentrant scanner
  • New directive %yydecl to allow passing parameters to YYLex or changing the default function name

Example of %option directives telling Aflex not to generate several functions and procedures, and to set the buffer size:

%option nounput
%option noinput
%option nooutput
%option noyywrap
%option bufsize=1024

Aflex can inject code blocks at various places in the generated scanner. A code block has the following syntax, where <block-name> is the name of the block:

%<block-name> {
  -- Put Ada code here
}

The %yytype code block can contain type, function, and procedure declarations; it is injected into the declarative part of the YYLex function. The %yyinit code block contains statements executed at the beginning of the YYLex function. The %yyaction code block contains statements executed before any rule action runs. The %yywrap code block contains statements executed when the end of the current file is reached, so that scanning can continue with the next input.
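For illustration, the following aflex fragment declares a helper in a %yytype block and calls it from a rule action. It is a sketch under assumptions: the block is placed in the definitions section next to the %option directives, the scanner is not reentrant so the action can call YYText directly, and Is_Keyword, KEYWORD and IDENTIFIER are hypothetical names chosen for this example, not part of Aflex:

```
%yytype {
   --  Injected in the declarative part of YYLex, so rule actions can call it.
   --  Is_Keyword is a hypothetical helper used for illustration only.
   function Is_Keyword (Text : String) return Boolean is
     (Text = "begin" or else Text = "end");
}

%%

[a-z]+   { if Is_Keyword (YYText) then return KEYWORD; else return IDENTIFIER; end if; }
```

Since the helper lives inside YYLex, it needs no extra package and is visible to every action of the scanner.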

What's new in Ayacc 1.4

  • Support the Bison %define variable value option to configure the parser generator
  • Support the Bison %code name { ... } directive to insert code verbatim into the output parser
  • Recognize the Bison variables api.pure, api.private, parse.error, parse.stacksize, parse.name, parse.params, parse.yyclearin and parse.yyerrok
  • New option -S skeleton to allow using an external skeleton file for the parser generator
  • Ayacc templates give more control over code generation; they are embedded with Advanced Resource Embedder
  • New option -P to generate a private Ada package for the tokens package
  • Improvement to allow passing parameters to YYParse for the grammar rules
  • New %lex directive to control the call of YYLex function
  • Fix #6: ayacc gets stuck creating an infinitely large file after encountering a comment in an action

The generator supports two code block injections, with a syntax borrowed from Bison: the decl block is injected in the declarative part of the YYParse procedure, and the init block is injected as the first statements, executed once each time the procedure is called:

%code decl {
   -- Put Ada declarations
}
%code init {
   -- Put Ada statements
}

Other Bison-like directives have been introduced to control the generation of the parser code:

%define parse.error true
%define parse.stacksize 256
%define parse.yyclearin false
%define parse.yyerrok false
%define parse.name MyParser

How to use

The easiest way to get Aflex and Ayacc is to use Alire to fetch the sources, build them, and install them as follows:

alr get aflex
cd aflex_1.6.0_b3c21d99
alr build
alr install
alr get ayacc
cd ayacc_1.4.0_c06f997f
alr build
alr install

UPDATE: the alr install command is only available starting with Alire 2.0.

Using these tools involves two steps:

  1. call the aflex or ayacc command on the scanner file or grammar file,
  2. call gnatchop to split the generated file into separate Ada files.

For example, with a calc_lex.l scanner file, you would use:

aflex calc_lex.l
gnatchop -w calc_lex.ada

And with a calc.y grammar file:

ayacc calc.y
gnatchop -w calc.ada

To learn more about writing a scanner file or grammar file, have a look at Aflex 1.5 and Ayacc 1.3.0, which explain some of these aspects in more detail.

Highlight on reentrancy

By default, Aflex and Ayacc generate a scanner and a parser that use global variables declared in generated Ada packages. These global variables hold the scanner state, such as the current file being scanned. On its side, Ayacc generates two global variables, YYLVal and YYVal.

Using global variables imposes strong restrictions on the generated scanner and parser: only one file can be scanned and parsed at a time. They cannot be used in a multi-threaded environment unless scanning and parsing are protected against concurrent access. Grammars that need to recurse and parse another file, such as an included file, are also difficult to support.

Reentrant scanner

A reentrant scanner is created by using the -R option or the %option reentrant directive. The scanner then needs a specific declaration with a context parameter that holds the scanner state and variables. The type of this context parameter is generated in the Lexer_IO package. The %yydecl directive in the scanner file must be used to declare the YYLex function with its parameters. By default, the context variable is named Context, but you can choose another name with the %yyvar directive.

%option reentrant
%yyvar Context
%yydecl function YYLex (Context : in out Lexer_IO.Context_Type) return Token

When the reentrant option is activated, Aflex generates a first limited Context_Type type in the Lexer_DFA package and another one in the Lexer_IO package. A future version of the generator could probably provide a single package with a single type declaration. The Lexer_DFA package contains the internal data structures the scanner uses to maintain its state, while the Lexer_IO package holds the input file as well as the YYLVal and YYVal values.
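The difference between global state and a context parameter can be shown with a small hand-written Ada program. This is not the code Aflex generates: Context_Type here is a hypothetical record holding only an input string and a read position, and Next_Token is a hypothetical function that splits its input on spaces. Because each call receives its context explicitly, two inputs can be scanned in an interleaved way, which a single set of package-level variables would not allow:

```ada
with Ada.Text_IO;

procedure Reentrant_Demo is

   --  Hypothetical scanner state: everything the scanner needs lives in
   --  this record instead of in package-level (global) variables.
   type Context_Type (Length : Natural) is record
      Input    : String (1 .. Length);
      Position : Positive := 1;  --  index of the next character to read
   end record;

   --  Build a context that scans the given text from its first character.
   function Make (Text : String) return Context_Type is
   begin
      return (Length => Text'Length, Input => Text, Position => 1);
   end Make;

   --  Return the next space-separated word, or "" at end of input.
   --  The read position is updated in the context passed by the caller.
   function Next_Token (Context : in out Context_Type) return String is
      First : Positive;
   begin
      --  Skip blanks before the token.
      while Context.Position <= Context.Length
        and then Context.Input (Context.Position) = ' '
      loop
         Context.Position := Context.Position + 1;
      end loop;

      --  Advance until the next blank or the end of input.
      First := Context.Position;
      while Context.Position <= Context.Length
        and then Context.Input (Context.Position) /= ' '
      loop
         Context.Position := Context.Position + 1;
      end loop;

      return Context.Input (First .. Context.Position - 1);
   end Next_Token;

   --  Two independent scanner contexts.
   A : Context_Type := Make ("let x = 1");
   B : Context_Type := Make ("print x");

begin
   --  Interleave the two scans: each context keeps its own position.
   Ada.Text_IO.Put_Line (Next_Token (A));
   Ada.Text_IO.Put_Line (Next_Token (B));
   Ada.Text_IO.Put_Line (Next_Token (A));
   Ada.Text_IO.Put_Line (Next_Token (B));
end Reentrant_Demo;
```

The program prints let, print, x and x on separate lines: the second call on A resumes after "let" even though B was scanned in between. The generated Context_Type plays the same role for the real scanner state.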

Reentrant parser

On its side, Ayacc uses the YYLVal and YYVal variables. By default, it generates them in the _tokens package that contains the list of parser symbols. For a reentrant parser, it must not generate them and must instead use the scanner Context_Type, which holds them together with the scanner internal state. The setup requires several steps:

  1. The reentrant parser is activated with the %define api.pure directive, as with Bison.
  2. The %lex directive must be used to define how the YYLex function is called, since it now takes a context parameter.
  3. The scanner context variable must be declared somewhere, either as a parameter of the YYParse procedure or as a local variable of YYParse. This is done with the new %code decl directive, which customizes the local declaration part of the generated YYParse procedure.
  4. The YYLVal and YYVal values defined in the scanner context variable must be made visible to the parser. Again, this can be done within the %code decl directive.

A simple reentrant parser could be defined by using:

%define api.pure true
%lex YYLex (Scanner)
%code decl {
      Scanner : Lexer_IO.Context_Type;
      YYLVal  : YYSType renames Scanner.YYLVal;
      YYVal   : YYSType renames Scanner.YYVal;
}

However, this simple form is not really useful, as you usually need to open the file and set up the scanner to read from it. It is better to pass the scanner context as a parameter to the YYParse procedure. For this, the %define parse.params directive controls the procedure parameters. The reentrant parser is declared as follows:

%lex YYLex (Scanner)
%define api.pure true
%define parse.params "Scanner : in out Lexer_IO.Context_Type"
%code decl {
      YYLVal : YYSType renames Scanner.YYLVal;
      YYVal  : YYSType renames Scanner.YYVal;
}

To use the reentrant parser and scanner, we only need to declare the scanner context, open the file with the Lexer_IO.Open_Input procedure, and call the YYParse procedure as follows:

  Scanner : Lexer_IO.Context_Type;
  ...
    Lexer_IO.Open_Input (Scanner, "file-to-scan");
    YYParse (Scanner);
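Because each parse now owns its state, two files can, in principle, be parsed at the same time from separate Ada tasks. The sketch below assumes a grammar built with the parse.params setup shown above; Calc_Parser is a hypothetical name for the package in which your grammar places YYParse, the file names are placeholders, and it also assumes your own rule actions do not update shared global data:

```ada
with Lexer_IO;
with Calc_Parser;  --  hypothetical: the package where your grammar puts YYParse

procedure Parse_In_Parallel is

   --  Each worker task owns its scanner context, so the two parses
   --  share no scanner or parser state.
   task type Parse_Worker is
      entry Start (Path : String);
   end Parse_Worker;

   task body Parse_Worker is
      Scanner : Lexer_IO.Context_Type;
   begin
      --  Open the input during the rendezvous, then parse independently.
      accept Start (Path : String) do
         Lexer_IO.Open_Input (Scanner, Path);
      end Start;
      Calc_Parser.YYParse (Scanner);
   end Parse_Worker;

   First, Second : Parse_Worker;

begin
   First.Start ("first.calc");
   Second.Start ("second.calc");
   --  Parse_In_Parallel returns only once both worker tasks have terminated.
end Parse_In_Parallel;
```

With the default, non-reentrant scanner and parser, the same program would need both tasks to serialize their calls, since they would share the global YYLVal, YYVal and input state.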

Grammar examples:

For a more complete example of a reentrant parser, have a look at the following files:
