Aflex 1.5 and Ayacc 1.3.0

By Stephane Carrez

Aflex is a lexical analyzer generating tool similar to the Unix tool lex (1). Ayacc is an Ada parser generator in the style of yacc (1). New releases are available for these two tools and they bring a number of small improvements to these tools written in Ada 83 in the 1990s by John Self, David Taback and Deepak Tolani at the University of California, Irvine.

Aflex Version 1.5.2021

The new release provides the following changes:

  • Fix crash when the scanner file uses characters in range 128..255,
  • Fixed various compilation warnings,
  • Use gprbuild to build and support alr,
  • Reduced number of style compilation warnings in generated code

To install Aflex, you can use the Alire tool by using:

  alr get aflex
  cd aflex_1.5.2021_33198b8f
  alr build

You can also build and install it as easily with the following commands:

  git clone https://github.com/Ada-France/aflex.git
  cd aflex
  make
  sudo make install

You can also install the following Debian package: aflex_1.5.2021.

Ayacc Version 1.3.0

The Ayacc release provides a little bit more improvements:

  • New option -C to disable the generation of yyclearin procedure,
  • New option -E to disable the generation of yyerrok procedure,
  • New option -D to write the generated files to the directory specified,
  • New option -k to keep the case of grammar symbols,
  • Fixed various compilation warnings,
  • Generate constants for shift reduce and goto arrays,
  • Better strong typing in the generated state tables,
  • Reduced number of style compilation warnings in generated code

To install Ayacc with Alire tool use:

  alr get ayacc
  cd ayacc_1.3.0_9b8bf854
  alr build

You can also build and install it as easily with the following commands:

  git clone https://github.com/Ada-France/ayacc.git
  cd ayacc
  make
  sudo make install

You can also install the following Debian package: ayacc_1.3.0.

Development with Aflex and Ayacc

Aflex is a tool for generating a scanner that recognizes lexical patterns in text. Ayacc uses a BNF-like specification of a grammar to generate an Ada parser for that grammar. These two tools are tightly coupled together. First the scanner's role is to identify tokens that are used by the grammar file. Second, the parser will use the scanner to ask for tokens as it parses the input content.

The generated parser provides a main procedure YYParse which is the entry point to parse an input content. For this, it calls the YYLex function provided by the generated scanner.

The development model is that you write a scanner file that describes the lexical patterns and the grammar file in the BNF-like specification. Aflex reads the scanner file and generates the scanner. Ayacc reads the grammar file and produces the parser. The scanner and parser are composed of several Ada packages and they are used by each other to exchange some types and operations.

Aflex and Ayacc Development Model

Ayacc and Aflex need a number of Ada types for their work and the important types are controlled by the Ayacc grammar. They also share some global variables to hold a token that was parsed or describe the current parser state. The list of possible tokens is generated by Ayacc in the <Parser>_Tokens package and represented by the Token type.

package <Parser>_Tokens
  type Token is (T_AND, T_OR, T_STRING, ...);
end <Parser>_Tokens;

Aflex will generate the YYLex function that returns a Token type:

  function YYLex return <Parser>_Tokens.Token;

On its side Ayacc generates the YYParse procedure in the generated package body. It is up to you and your grammar to decide whether this procedure must be exported as is or encapsulated to another operation.

  procedure YYParse;

Writing a scanner with Aflex

The scanner file describes regular expressions and Ada code which are executed when some input text matches the regular expression. The scanner file contains basically three areas separated by a line containing only %%. The global layout is:

definitions
%%
rules
%%
user code

The scanner extracts below are taken from a CSS parser written in Ada, the complete scanner file is css-parser-lexer.l. I highlight here some interesting parts of that lexer.

The definitions section contains a set of configuration options that tells the Aflex generator how to generate the final Ada scanner. First, the %unit directive allows to control the Ada package name used by the final scanner. The scanner package can be a child package which gives a lot of flexibility and control for the program organization.

%unit CSS.Parser.Lexer
%option case-insensitive
%option yylineno

The %option case-insensitive tells Aflex to ignore case when generating the scanner and the %option yylineno instructs it to maintain a variable yylineno that indicates the current line number. It also maintains the yylinecol variable that indicates the column number of the current token. This is useful to keep track of token position in order to report them when an error occurs.

Because writing regular expressions is difficult, Aflex allows you to create named definitions where you give a name to a regular expression. A regular expression can use another named definition, hence simplifying the creation of complex regular expressions.

h               [0-9a-f]
nonascii        [\240-\377]
unicode         \\{h}{1,6}(\r\n|[ \t\r\n\f])?
escape          {unicode}|\\[^\r\n\f0-9a-f]
string1         \"([^\n\r\f\\"]|\\{nl}|{escape})*\"
string2         \'([^\n\r\f\\']|\\{nl}|{escape})*\'
string          {string1}|{string2}
nl              \n|\r\n|\r|\f

When all named definitions are written, the next section separated by the %% marker defines the rules that describe a pattern and an associated action. The pattern is a regular expression and the action corresponds to some Ada code that is executed. Aflex is not aware of the Ada grammar and does not check its validity. It uses the action code as is and write it in the final Ada file. When an action is complex and must be written in several lines, the action block can be enclosed by { and } (this syntax comes from the C lexer, the two symbols are of course omitted in the final program).

The extract below shows several pattern and actions:

%%
\-                      return '-';
\+                      return '+';
"and"                   { return T_AND; }
"or"                    { return T_OR; }
"not"                   { return T_NOT; }
{string}		{ Set_String (YYLVal, YYText, yylineno, yylinecol); return T_STRING; }

Here, when the scanner finds the - or + symbols, it returns the - and + token. If it finds the word 'and', it returns the token T_AND.

The last rule of this example will match a CSS string such as "red" or 'blue' and we will get these results when the YYText function is called. The action will execute the Set_String procedure whose purpose is to populate the YYLVal variable (not shown in the example, and for the curious, that procedure is provided by the private part of the parent package CSS.Parser, hence it is visible by the generated scanner package CSS.Parser.Lexer).

The last section of the lexer file contains the user Ada code that will be used for the Ada scanner code generation. This section should contain a definition of the scanner package specification with any necessary with units. It must also provide the declaration of the YYLex function that should return a Token type.

%%
with CSS.Parser.Parser_Tokens;
package CSS.Parser.Lexer is

   use CSS.Parser.Parser_Tokens;

   function YYLex return Token;

end CSS.Parser.Lexer;

with Ada.Text_IO;
with CSS.Parser.Lexer_dfa;
with CSS.Parser.Lexer_io;
package body CSS.Parser.Lexer is

   use CSS.Parser.Lexer_dfa;
   use CSS.Parser.Lexer_io;

   pragma Style_Checks (Off);
   pragma Warnings (Off);
##

end CSS.Parser.Lexer;

The package body must also be written and it should contain a ## marker which is the place where the Aflex generator will write the scanner code with the YYLex function body.

Because Aflex does not parse the Ada action code in the scanner file, it is not able to indent correctly the final program. The pragma Style_Checks (Off); is useful to remove the indentation warnings that the compiler could emit.

Aflex also generates a <Lexer>_io package and a <Lexer>_dfa package which provides helper operations for the generated scanner. The <Lexer>_dfa package defines two important functions that give access to the current token as a text.

package CSS.Parser.Lexer_dfa is
  function YYText return String;
  function YYLength return Integer;
end CSS.Parser.Lexer_dfa;

For example, if the scanner finds the input string "red", the YYLex function will return the T_STRING value and the YYText function will contain the string "red".

The <Lexer>_io package defines the Open_Input procedure that opens the file that must be parsed. The package also defines several other procedures and functions but most of them are dedicated to the scanner.

package CSS.Parser.Lexer_io is
  procedure Open_Input (Fname : in String);
end CSS.Parser.Lexer_io;

Code generation

The Ada scanner is generated by running the aflex tool. The -m option tells the tool to avoid emitting Ada with clauses to the Text_IO package, the -s option disables the default lex rule that emits an echo of unmatched scanner input to the standard output and the -L option disables the #line directives.

  aflex -ms -L css-parser-lexer.l
  gnatchop -w css-parser-lexer.ada

The tool generates a single Ada file that contains the specification and the body and it must be split by using gnatchop to produce the css-parser-lexer.ads specification and css-parser-lexer.adb package body.

Writing a parser with Ayacc

Ayacc uses the BNF-like grammar definition to generate the parser written in Ada. The grammar file is similar to a grammar file created for Yacc or Bison but it contains Ada code for the grammar actions. The grammar file starts with a set of definitions that enumerates the available tokens and declares the main type to describe a parser state.

Similar to the scanner file, the %unit directive allows to control the name of the generated Ada package. I've found very convenient to provide a parent Ada package that defines various types and operations that the generated parser can use. By using the %unit and specifying a child package, the generated parser can easily access those operations and types even if they are declared in the private part.

%unit CSS.Parser.Parser
%token T_STRING
%token T_AND
%token T_OR
%token T_NOT

The Ayacc generated parser uses a stack to keep track and maintain the state while parsing a content. Each stack entry represents a grammar rule that was recognized and it is possible to also record some specific information. For this, the declaration part must declare the YYSType Ada type. That type describes the state and information about a grammar rule. The parser pushes or pops values on the stack according to the grammar rules that are recognized (after a shift or reduce operation). The YYSType cannot be a limited type nor can it be abstract because the value must be copied to or from the YYLVal global variable.

In the following definition, the YYSType is in fact declared in the private part of the parent package CSS.Parser (again, that type is visible to the parser because it is one of its child package).

{
   subtype YYSType is CSS.Parser.YYstype;
}

After the declaration part, the %% separator introduces a list of BNF-like rules. Some grammar rules can be associated with some actions which are executed when the grammar rule is recognized. With Ayacc, the action is an Ada code which can contain $n constructs which give access to values of the rule elements. In fact, the $n values refers to values in the parser stack and the $$ value is the YYVal variable. When an action is executed, this is known as a reduce action and all the rule elements are replaced by the $$ result.

%%
expr :
    expr operator term
       { CSS.Parser.Set_Expr ($$, $1, $3); }
  |
    expr term
       { CSS.Parser.Set_Expr ($$, $1, $2); }
  |
    term  
  ;

In this grammar rule extract, the $$ refers to the current rule value (YYVal), the $1 contains the value of the first element and so on. It is not possible to exceed the number of elements described by the rule. These values are of type YYSType. When the rule action is finished, the elements matched by the rule are removed from the stack and the current value $$ (YYVal) is pushed on top of the stack.

It is possible to write complex action code in the grammar file but this is not very convenient. In many cases, it is easier to provide procedures or functions that the generated parser can use for the action to perform its work.

After the grammar rules, a last %% separator introduces the final Ada code used by the parser generator. That last section must contain the ## marker which represents the place where Ayacc will emit the body of the parser. The Ada code should also declare the yyerror procedure that is called by the generated parser when a syntax error is detected.

Below is a partial example of such final Ada code:

%%
with CSS.Core.Sheets;
package CSS.Parser.Parser is
   Error_Count : Natural := 0;

   function Parse (Content  : in String;
                   Document : in CSS.Core.Sheets.CSSStylesheet_Access) return Integer;

end CSS.Parser.Parser;

with CSS.Parser.Parser_Goto;
with CSS.Parser.Parser_Tokens;
with CSS.Parser.Parser_Shift_Reduce;
with CSS.Parser.Lexer_io;
with CSS.Parser.Lexer;
with CSS.Parser.Lexer_dfa;
with Ada.Text_IO;
package body CSS.Parser.Parser is

   procedure YYParse;

   procedure yyerror (Message : in String := "syntax error");

   procedure yyerror (Message : in String := "syntax error") is
   begin
      Error_Count := Error_Count + 1;
   end yyerror;

   function Parse (Content  : in String;
                   Document : in CSS.Core.Sheets.CSSStylesheet_Access) return Integer is
   begin
      Error_Count := 0;
      CSS.Parser.Lexer_Dfa.yylineno  := 1;
      CSS.Parser.Lexer_Dfa.yylinecol := 1;
      CSS.Parser.Lexer_IO.Open_Input (Content);
      CSS.Parser.Parser.Document := Document;
      YYParse;
      return Error_Count;
   end Parse;

##

end CSS.Parser.Parser;

To prepare the scanner to read a content, you will have to call the Open_Input procedure with the file name as parameter. Then, all you have to do is call the YYParse procedure that will call the YYLex function and proceed with the scanning and parsing of the input file.

Code generation

The Ada parser is generated by running the ayacc tool. The -n option controls the size of the parser stack (the default is 8192). The -k option tells the generator to keep the case of tokens as they are written in the grammar file (the default is that it converts the token using mixed case). The -s option prints some statistics about the grammar. This is useful to understand and check the shift/reduce and reduce/reduce conflicts that could arise in the grammar. The -e option controls the extension of the generated file.

  ayacc -n 256 -k -s -e .ada css-parser-parser.y
  gnatchop -w css-parser-parser.ada

The tool generates a single Ada file that contains the specification and the body and it must be split by using gnatchop to produce the css-parser-parser.ads specification and css-parser-parser.adb package body.

Going further

Both tools are very close to the lex/flex and yacc/bison Unix tools which means that most tips and documentation that you will find on Lex and Yacc are common and applicable to Aflex and Ayacc. Writing the grammar file is probably the most difficult task and the GNU Bison documentation seriously helps in that task.

Looking at how scanner and grammar files are written in some other projects can help. Below is a non exhaustive list of scanner and grammar files that can be processed by Aflex and Ayacc:

Scanner examples

Grammar examples

Add a comment

To add a comment, you must be connected. Login