Aflex Version 1.5.2021
The new release provides the following changes:
- Fix crash when the scanner file uses characters in range 128..255,
- Fixed various compilation warnings,
- Use gprbuild to build and support alr,
- Reduced number of style compilation warnings in generated code
To install Aflex, you can use the Alire tool by using:
alr get aflex
cd aflex_1.5.2021_33198b8f
alr build
You can also build and install it as easily with the following commands:
git clone https://github.com/Ada-France/aflex.git
cd aflex
make
sudo make install
You can also install the following Debian package: aflex_1.5.2021.
Ayacc Version 1.3.0
The Ayacc release provides a little bit more improvements:
- New option -C to disable the generation of yyclearin procedure,
- New option -E to disable the generation of yyerrok procedure,
- New option -D to write the generated files to the directory specified,
- New option -k to keep the case of grammar symbols,
- Fixed various compilation warnings,
- Generate constants for shift reduce and goto arrays,
- Better strong typing in the generated state tables,
- Reduced number of style compilation warnings in generated code
To install Ayacc with Alire tool use:
alr get ayacc
cd ayacc_1.3.0_9b8bf854
alr build
You can also build and install it as easily with the following commands:
git clone https://github.com/Ada-France/ayacc.git
cd ayacc
make
sudo make install
You can also install the following Debian package: ayacc_1.3.0.
Development with Aflex and Ayacc
Aflex is a tool for generating a scanner that recognizes lexical patterns in text.
Ayacc uses a BNF-like specification of a grammar to generate an Ada parser for that grammar.
These two tools are tightly coupled together. First the scanner's role is to identify tokens that are used by the grammar file.
Second, the parser will use the scanner to ask for tokens as it parses the input content.
The generated parser provides a main procedure YYParse which is the entry point to parse an input
content. For this, it calls the YYLex function provided by the generated scanner.
The development model is that you write a scanner file that describes the lexical patterns
and the grammar file in the BNF-like specification. Aflex reads the scanner file and generates the
scanner. Ayacc reads the grammar file and produces the parser. The scanner and parser are composed of
several Ada packages and they are used by each other to exchange some types and operations.

Ayacc and Aflex need a number of Ada types for their work and the important types are controlled by the Ayacc grammar.
They also share some global variables to hold a token that was parsed or describe the current parser state.
The list of possible tokens is generated by Ayacc in the <Parser>_Tokens package and represented by the Token type.
package <Parser>_Tokens
type Token is (T_AND, T_OR, T_STRING, ...);
end <Parser>_Tokens;
Aflex will generate the YYLex function that returns a Token type:
function YYLex return <Parser>_Tokens.Token;
On its side Ayacc generates the YYParse procedure in the generated package body. It is up to you and your grammar
to decide whether this procedure must be exported as is or encapsulated to another operation.
procedure YYParse;
Writing a scanner with Aflex
The scanner file describes regular expressions and
Ada code which are executed when some input text matches the regular expression.
The scanner file contains basically three areas separated by a line containing only %%.
The global layout is:
definitions
%%
rules
%%
user code
The scanner extracts below are taken from a CSS parser written in Ada, the complete
scanner file is css-parser-lexer.l.
I highlight here some interesting parts of that lexer.
The definitions section contains a set of configuration options that tells the Aflex generator
how to generate the final Ada scanner. First, the %unit directive allows to control the Ada package
name used by the final scanner. The scanner package can be a child package which gives a lot of flexibility
and control for the program organization.
%unit CSS.Parser.Lexer
%option case-insensitive
%option yylineno
The %option case-insensitive tells Aflex to ignore case when generating the scanner
and the %option yylineno instructs it to maintain a variable yylineno that indicates
the current line number. It also maintains the yylinecol variable that indicates the
column number of the current token. This is useful to keep track of token position in
order to report them when an error occurs.
Because writing regular expressions is difficult, Aflex allows you to create named definitions
where you give a name to a regular expression. A regular expression can use another named
definition, hence simplifying the creation of complex regular expressions.
h [0-9a-f]
nonascii [\240-\377]
unicode \\{h}{1,6}(\r\n|[ \t\r\n\f])?
escape {unicode}|\\[^\r\n\f0-9a-f]
string1 \"([^\n\r\f\\"]|\\{nl}|{escape})*\"
string2 \'([^\n\r\f\\']|\\{nl}|{escape})*\'
string {string1}|{string2}
nl \n|\r\n|\r|\f
When all named definitions are written, the next section separated by the %% marker
defines the rules that describe a pattern and an associated action. The pattern is
a regular expression and the action corresponds to some Ada code that is executed. Aflex is not
aware of the Ada grammar and does not check its validity. It uses the action code as
is and write it in the final Ada file. When an action is complex and must be written in several
lines, the action block can be enclosed by { and } (this syntax comes from the C
lexer, the two symbols are of course omitted in the final program).
The extract below shows several pattern and actions:
%%
\- return '-';
\+ return '+';
"and" { return T_AND; }
"or" { return T_OR; }
"not" { return T_NOT; }
{string} { Set_String (YYLVal, YYText, yylineno, yylinecol); return T_STRING; }
Here, when the scanner finds the - or + symbols, it returns the - and + token.
If it finds the word 'and', it returns the token T_AND.
The last rule of this example will match a CSS string such as "red" or 'blue' and we will
get these results when the YYText function is called. The action will execute the Set_String procedure
whose purpose is to populate the YYLVal variable (not shown in the example, and for the curious,
that procedure is provided by the private part of the parent package CSS.Parser,
hence it is visible by the generated scanner package CSS.Parser.Lexer).
The last section of the lexer file contains the user Ada code that will be used for the
Ada scanner code generation. This section should contain a definition of the scanner package
specification with any necessary with units. It must also provide the declaration of the YYLex function
that should return a Token type.
%%
with CSS.Parser.Parser_Tokens;
package CSS.Parser.Lexer is
use CSS.Parser.Parser_Tokens;
function YYLex return Token;
end CSS.Parser.Lexer;
with Ada.Text_IO;
with CSS.Parser.Lexer_dfa;
with CSS.Parser.Lexer_io;
package body CSS.Parser.Lexer is
use CSS.Parser.Lexer_dfa;
use CSS.Parser.Lexer_io;
pragma Style_Checks (Off);
pragma Warnings (Off);
##
end CSS.Parser.Lexer;
The package body must also be written and it should contain a ## marker which is the place where
the Aflex generator will write the scanner code with the YYLex function body.
Because Aflex does not parse the Ada action code in the scanner file, it is not able to indent correctly the
final program. The pragma Style_Checks (Off); is useful to remove the indentation warnings that the
compiler could emit.
Aflex also generates a <Lexer>_io package and a <Lexer>_dfa package which provides helper
operations for the generated scanner. The <Lexer>_dfa package defines two important functions
that give access to the current token as a text.
package CSS.Parser.Lexer_dfa is
function YYText return String;
function YYLength return Integer;
end CSS.Parser.Lexer_dfa;
For example, if the scanner finds the input string "red", the YYLex function will return the T_STRING value
and the YYText function will contain the string "red".
The <Lexer>_io package defines the Open_Input procedure that opens the file that must be parsed.
The package also defines several other procedures and functions but most of them are dedicated to the scanner.
package CSS.Parser.Lexer_io is
procedure Open_Input (Fname : in String);
end CSS.Parser.Lexer_io;
Code generation
The Ada scanner is generated by running the aflex tool.
The -m option tells the tool to avoid emitting Ada with clauses to the Text_IO package,
the -s option disables the default lex rule that emits an echo of unmatched scanner input
to the standard output and the -L option disables the #line directives.
aflex -ms -L css-parser-lexer.l
gnatchop -w css-parser-lexer.ada
The tool generates a single Ada file that contains the specification and the body and
it must be split by using gnatchop to produce the css-parser-lexer.ads specification
and css-parser-lexer.adb package body.
Writing a parser with Ayacc
Ayacc uses the BNF-like grammar definition to generate the parser written in Ada.
The grammar file is similar to a grammar file created for Yacc or Bison but it contains Ada code for the grammar actions. The grammar file starts with
a set of definitions that enumerates the available tokens and declares the main type to describe a parser state.
Similar to the scanner file, the %unit directive allows to control the name of the generated Ada package.
I've found very convenient to provide a parent Ada package that defines various types and operations that
the generated parser can use. By using the %unit and specifying a child package, the generated parser
can easily access those operations and types even if they are declared in the private part.
%unit CSS.Parser.Parser
%token T_STRING
%token T_AND
%token T_OR
%token T_NOT
The Ayacc generated parser uses a stack to keep track and maintain the state while parsing a content.
Each stack entry represents a grammar rule that was recognized and it is possible to also record some
specific information. For this, the declaration part must declare the YYSType Ada type.
That type describes the state and information about a grammar rule.
The parser pushes or pops values on the stack according to the grammar rules that are recognized
(after a shift or reduce operation).
The YYSType cannot be a limited type nor can it be abstract because the value must be copied
to or from the YYLVal global variable.
In the following definition, the YYSType is in fact declared in the private part of the parent
package CSS.Parser (again, that type is visible to the parser because it is one of its child package).
{
subtype YYSType is CSS.Parser.YYstype;
}
After the declaration part, the %% separator introduces a list of BNF-like rules.
Some grammar rules can be associated with some actions which are executed when
the grammar rule is recognized. With Ayacc, the action is an Ada code which can
contain $n constructs which give access to values of the rule elements.
In fact, the $n values refers to values in the parser stack and
the $$ value is the YYVal variable. When an action is executed, this is known as
a reduce action and all the rule elements are replaced by the $$ result.
%%
expr :
expr operator term
{ CSS.Parser.Set_Expr ($$, $1, $3); }
|
expr term
{ CSS.Parser.Set_Expr ($$, $1, $2); }
|
term
;
In this grammar rule extract, the $$ refers to the current rule value (YYVal), the $1 contains the value
of the first element and so on. It is not possible to exceed the number of elements described by the rule.
These values are of type YYSType.
When the rule action is finished, the elements matched by the rule are removed from the stack
and the current value $$ (YYVal) is pushed on top of the stack.
It is possible to write complex action code in the grammar file but this is not very convenient.
In many cases, it is easier to provide procedures or functions that the generated parser can use
for the action to perform its work.
After the grammar rules, a last %% separator introduces the final Ada code used by the parser generator.
That last section must contain the ## marker which represents the place where Ayacc will emit the body of
the parser. The Ada code should also declare the yyerror procedure that is called by the generated
parser when a syntax error is detected.
Below is a partial example of such final Ada code:
%%
with CSS.Core.Sheets;
package CSS.Parser.Parser is
Error_Count : Natural := 0;
function Parse (Content : in String;
Document : in CSS.Core.Sheets.CSSStylesheet_Access) return Integer;
end CSS.Parser.Parser;
with CSS.Parser.Parser_Goto;
with CSS.Parser.Parser_Tokens;
with CSS.Parser.Parser_Shift_Reduce;
with CSS.Parser.Lexer_io;
with CSS.Parser.Lexer;
with CSS.Parser.Lexer_dfa;
with Ada.Text_IO;
package body CSS.Parser.Parser is
procedure YYParse;
procedure yyerror (Message : in String := "syntax error");
procedure yyerror (Message : in String := "syntax error") is
begin
Error_Count := Error_Count + 1;
end yyerror;
function Parse (Content : in String;
Document : in CSS.Core.Sheets.CSSStylesheet_Access) return Integer is
begin
Error_Count := 0;
CSS.Parser.Lexer_Dfa.yylineno := 1;
CSS.Parser.Lexer_Dfa.yylinecol := 1;
CSS.Parser.Lexer_IO.Open_Input (Content);
CSS.Parser.Parser.Document := Document;
YYParse;
return Error_Count;
end Parse;
##
end CSS.Parser.Parser;
To prepare the scanner to read a content, you will have to call the Open_Input procedure with the file name as parameter.
Then, all you have to do is call the YYParse procedure that will call the YYLex function and proceed with the scanning
and parsing of the input file.
Code generation
The Ada parser is generated by running the ayacc tool.
The -n option controls the size of the parser stack (the default is 8192).
The -k option tells the generator to keep the case of tokens as they are written in the grammar file
(the default is that it converts the token using mixed case).
The -s option prints some statistics about the grammar. This is useful to understand and check the
shift/reduce and reduce/reduce conflicts that could arise in the grammar.
The -e option controls the extension of the generated file.
ayacc -n 256 -k -s -e .ada css-parser-parser.y
gnatchop -w css-parser-parser.ada
The tool generates a single Ada file that contains the specification and the body and
it must be split by using gnatchop to produce the css-parser-parser.ads specification
and css-parser-parser.adb package body.
Going further
Both tools are very close to the lex/flex and yacc/bison Unix tools which means that most tips and documentation that you will
find on Lex and Yacc are common and applicable to Aflex and Ayacc. Writing the grammar file is probably the most
difficult task and the GNU Bison documentation
seriously helps in that task.
Looking at how scanner and grammar files are written in some other projects can help. Below is a non exhaustive list
of scanner and grammar files that can be processed by Aflex and Ayacc:
Scanner examples
Grammar examples
Add a comment
To add a comment, you must be connected. Login