Aflex is a lexical analyzer generating tool similar to the Unix tool lex (1). Ayacc is an Ada parser generator in the style of yacc (1). New releases are available for these two tools and they bring a number of small improvements to these tools written in Ada 83 in the 1990s by John Self, David Taback and Deepak Tolani at the University of California, Irvine.
Aflex 1.5 and Ayacc 1.3.0
By Stephane Carrez2021-12-19 13:03:00
Aflex Version 1.5.2021
The new release provides the following changes:
- Fix crash when the scanner file uses characters in range 128..255,
- Fixed various compilation warnings,
- Use
gprbuild
to build and supportalr
, - Reduced number of style compilation warnings in generated code
To install Aflex, you can use the Alire tool by using:
alr get aflex
cd aflex_1.5.2021_33198b8f
alr build
You can also build and install it as easily with the following commands:
git clone https://github.com/Ada-France/aflex.git
cd aflex
make
sudo make install
You can also install the following Debian package: aflex_1.5.2021.
Ayacc Version 1.3.0
The Ayacc release provides a little bit more improvements:
- New option
-C
to disable the generation ofyyclearin
procedure, - New option
-E
to disable the generation ofyyerrok
procedure, - New option
-D
to write the generated files to the directory specified, - New option
-k
to keep the case of grammar symbols, - Fixed various compilation warnings,
- Generate constants for shift reduce and goto arrays,
- Better strong typing in the generated state tables,
- Reduced number of style compilation warnings in generated code
To install Ayacc with Alire tool use:
alr get ayacc
cd ayacc_1.3.0_9b8bf854
alr build
You can also build and install it as easily with the following commands:
git clone https://github.com/Ada-France/ayacc.git
cd ayacc
make
sudo make install
You can also install the following Debian package: ayacc_1.3.0.
Development with Aflex and Ayacc
Aflex is a tool for generating a scanner that recognizes lexical patterns in text. Ayacc uses a BNF-like specification of a grammar to generate an Ada parser for that grammar. These two tools are tightly coupled together. First the scanner's role is to identify tokens that are used by the grammar file. Second, the parser will use the scanner to ask for tokens as it parses the input content.
The generated parser provides a main procedure YYParse
which is the entry point to parse an input content. For this, it calls the YYLex
function provided by the generated scanner.
The development model is that you write a scanner file that describes the lexical patterns and the grammar file in the BNF-like specification. Aflex reads the scanner file and generates the scanner. Ayacc reads the grammar file and produces the parser. The scanner and parser are composed of several Ada packages and they are used by each other to exchange some types and operations.
Ayacc and Aflex need a number of Ada types for their work and the important types are controlled by the Ayacc grammar. They also share some global variables to hold a token that was parsed or describe the current parser state. The list of possible tokens is generated by Ayacc in the <Parser>_Tokens
package and represented by the Token
type.
package <Parser>_Tokens
type Token is (T_AND, T_OR, T_STRING, ...);
end <Parser>_Tokens;
Aflex will generate the YYLex
function that returns a Token
type:
function YYLex return <Parser>_Tokens.Token;
On its side Ayacc generates the YYParse
procedure in the generated package body. It is up to you and your grammar to decide whether this procedure must be exported as is or encapsulated to another operation.
procedure YYParse;
Writing a scanner with Aflex
The scanner file describes regular expressions and Ada code which are executed when some input text matches the regular expression. The scanner file contains basically three areas separated by a line containing only %%
. The global layout is:
definitions
%%
rules
%%
user code
The scanner extracts below are taken from a CSS parser written in Ada, the complete scanner file is css-parser-lexer.l. I highlight here some interesting parts of that lexer.
The definitions section contains a set of configuration options that tells the Aflex generator how to generate the final Ada scanner. First, the %unit
directive allows to control the Ada package name used by the final scanner. The scanner package can be a child package which gives a lot of flexibility and control for the program organization.
%unit CSS.Parser.Lexer
%option case-insensitive
%option yylineno
The %option case-insensitive
tells Aflex to ignore case when generating the scanner and the %option yylineno
instructs it to maintain a variable yylineno
that indicates the current line number. It also maintains the yylinecol
variable that indicates the column number of the current token. This is useful to keep track of token position in order to report them when an error occurs.
Because writing regular expressions is difficult, Aflex allows you to create named definitions where you give a name to a regular expression. A regular expression can use another named definition, hence simplifying the creation of complex regular expressions.
h [0-9a-f]
nonascii [\240-\377]
unicode \\{h}{1,6}(\r\n|[ \t\r\n\f])?
escape {unicode}|\\[^\r\n\f0-9a-f]
string1 \"([^\n\r\f\\"]|\\{nl}|{escape})*\"
string2 \'([^\n\r\f\\']|\\{nl}|{escape})*\'
string {string1}|{string2}
nl \n|\r\n|\r|\f
When all named definitions are written, the next section separated by the %%
marker defines the rules that describe a pattern and an associated action. The pattern is a regular expression and the action corresponds to some Ada code that is executed. Aflex is not aware of the Ada grammar and does not check its validity. It uses the action code as is and write it in the final Ada file. When an action is complex and must be written in several lines, the action block can be enclosed by {
and }
(this syntax comes from the C lexer, the two symbols are of course omitted in the final program).
The extract below shows several pattern and actions:
%%
\- return '-';
\+ return '+';
"and" { return T_AND; }
"or" { return T_OR; }
"not" { return T_NOT; }
{string} { Set_String (YYLVal, YYText, yylineno, yylinecol); return T_STRING; }
Here, when the scanner finds the -
or +
symbols, it returns the -
and +
token. If it finds the word 'and
', it returns the token T_AND
.
The last rule of this example will match a CSS string such as "red"
or 'blue'
and we will get these results when the YYText
function is called. The action will execute the Set_String
procedure whose purpose is to populate the YYLVal
variable (not shown in the example, and for the curious, that procedure is provided by the private part of the parent package CSS.Parser
, hence it is visible by the generated scanner package CSS.Parser.Lexer
).
The last section of the lexer file contains the user Ada code that will be used for the Ada scanner code generation. This section should contain a definition of the scanner package specification with any necessary with
units. It must also provide the declaration of the YYLex
function that should return a Token
type.
%%
with CSS.Parser.Parser_Tokens;
package CSS.Parser.Lexer is
use CSS.Parser.Parser_Tokens;
function YYLex return Token;
end CSS.Parser.Lexer;
with Ada.Text_IO;
with CSS.Parser.Lexer_dfa;
with CSS.Parser.Lexer_io;
package body CSS.Parser.Lexer is
use CSS.Parser.Lexer_dfa;
use CSS.Parser.Lexer_io;
pragma Style_Checks (Off);
pragma Warnings (Off);
##
end CSS.Parser.Lexer;
The package body must also be written and it should contain a ##
marker which is the place where the Aflex generator will write the scanner code with the YYLex
function body.
Because Aflex does not parse the Ada action code in the scanner file, it is not able to indent correctly the final program. The pragma Style_Checks (Off);
is useful to remove the indentation warnings that the compiler could emit.
Aflex also generates a <Lexer>_io
package and a <Lexer>_dfa
package which provides helper operations for the generated scanner. The <Lexer>_dfa
package defines two important functions that give access to the current token as a text.
package CSS.Parser.Lexer_dfa is
function YYText return String;
function YYLength return Integer;
end CSS.Parser.Lexer_dfa;
For example, if the scanner finds the input string "red"
, the YYLex
function will return the T_STRING
value and the YYText
function will contain the string "red"
.
The <Lexer>_io
package defines the Open_Input
procedure that opens the file that must be parsed. The package also defines several other procedures and functions but most of them are dedicated to the scanner.
package CSS.Parser.Lexer_io is
procedure Open_Input (Fname : in String);
end CSS.Parser.Lexer_io;
Code generation
The Ada scanner is generated by running the aflex
tool. The -m
option tells the tool to avoid emitting Ada with
clauses to the Text_IO
package, the -s
option disables the default lex rule that emits an echo of unmatched scanner input to the standard output and the -L
option disables the #line
directives.
aflex -ms -L css-parser-lexer.l
gnatchop -w css-parser-lexer.ada
The tool generates a single Ada file that contains the specification and the body and it must be split by using gnatchop
to produce the css-parser-lexer.ads specification and css-parser-lexer.adb package body.
Writing a parser with Ayacc
Ayacc uses the BNF-like grammar definition to generate the parser written in Ada. The grammar file is similar to a grammar file created for Yacc or Bison but it contains Ada code for the grammar actions. The grammar file starts with a set of definitions that enumerates the available tokens and declares the main type to describe a parser state.
Similar to the scanner file, the %unit
directive allows to control the name of the generated Ada package. I've found very convenient to provide a parent Ada package that defines various types and operations that the generated parser can use. By using the %unit
and specifying a child package, the generated parser can easily access those operations and types even if they are declared in the private part.
%unit CSS.Parser.Parser
%token T_STRING
%token T_AND
%token T_OR
%token T_NOT
The Ayacc generated parser uses a stack to keep track and maintain the state while parsing a content. Each stack entry represents a grammar rule that was recognized and it is possible to also record some specific information. For this, the declaration part must declare the YYSType
Ada type. That type describes the state and information about a grammar rule. The parser pushes or pops values on the stack according to the grammar rules that are recognized (after a shift or reduce operation). The YYSType
cannot be a limited type nor can it be abstract because the value must be copied to or from the YYLVal
global variable.
In the following definition, the YYSType
is in fact declared in the private part of the parent package CSS.Parser
(again, that type is visible to the parser because it is one of its child package).
{
subtype YYSType is CSS.Parser.YYstype;
}
After the declaration part, the %%
separator introduces a list of BNF-like rules. Some grammar rules can be associated with some actions which are executed when the grammar rule is recognized. With Ayacc, the action is an Ada code which can contain $n
constructs which give access to values of the rule elements. In fact, the $n
values refers to values in the parser stack and the $$
value is the YYVal
variable. When an action is executed, this is known as a reduce action and all the rule elements are replaced by the $$
result.
%%
expr :
expr operator term
{ CSS.Parser.Set_Expr ($$, $1, $3); }
|
expr term
{ CSS.Parser.Set_Expr ($$, $1, $2); }
|
term
;
In this grammar rule extract, the $$
refers to the current rule value (YYVal
), the $1
contains the value of the first element and so on. It is not possible to exceed the number of elements described by the rule. These values are of type YYSType
. When the rule action is finished, the elements matched by the rule are removed from the stack and the current value $$
(YYVal
) is pushed on top of the stack.
It is possible to write complex action code in the grammar file but this is not very convenient. In many cases, it is easier to provide procedures or functions that the generated parser can use for the action to perform its work.
After the grammar rules, a last %%
separator introduces the final Ada code used by the parser generator. That last section must contain the ##
marker which represents the place where Ayacc will emit the body of the parser. The Ada code should also declare the yyerror
procedure that is called by the generated parser when a syntax error is detected.
Below is a partial example of such final Ada code:
%%
with CSS.Core.Sheets;
package CSS.Parser.Parser is
Error_Count : Natural := 0;
function Parse (Content : in String;
Document : in CSS.Core.Sheets.CSSStylesheet_Access) return Integer;
end CSS.Parser.Parser;
with CSS.Parser.Parser_Goto;
with CSS.Parser.Parser_Tokens;
with CSS.Parser.Parser_Shift_Reduce;
with CSS.Parser.Lexer_io;
with CSS.Parser.Lexer;
with CSS.Parser.Lexer_dfa;
with Ada.Text_IO;
package body CSS.Parser.Parser is
procedure YYParse;
procedure yyerror (Message : in String := "syntax error");
procedure yyerror (Message : in String := "syntax error") is
begin
Error_Count := Error_Count + 1;
end yyerror;
function Parse (Content : in String;
Document : in CSS.Core.Sheets.CSSStylesheet_Access) return Integer is
begin
Error_Count := 0;
CSS.Parser.Lexer_Dfa.yylineno := 1;
CSS.Parser.Lexer_Dfa.yylinecol := 1;
CSS.Parser.Lexer_IO.Open_Input (Content);
CSS.Parser.Parser.Document := Document;
YYParse;
return Error_Count;
end Parse;
##
end CSS.Parser.Parser;
To prepare the scanner to read a content, you will have to call the Open_Input
procedure with the file name as parameter. Then, all you have to do is call the YYParse
procedure that will call the YYLex
function and proceed with the scanning and parsing of the input file.
Code generation
The Ada parser is generated by running the ayacc
tool. The -n
option controls the size of the parser stack (the default is 8192). The -k
option tells the generator to keep the case of tokens as they are written in the grammar file (the default is that it converts the token using mixed case). The -s
option prints some statistics about the grammar. This is useful to understand and check the shift/reduce and reduce/reduce conflicts that could arise in the grammar. The -e
option controls the extension of the generated file.
ayacc -n 256 -k -s -e .ada css-parser-parser.y
gnatchop -w css-parser-parser.ada
The tool generates a single Ada file that contains the specification and the body and it must be split by using gnatchop
to produce the css-parser-parser.ads specification and css-parser-parser.adb package body.
Going further
Both tools are very close to the lex/flex and yacc/bison Unix tools which means that most tips and documentation that you will find on Lex and Yacc are common and applicable to Aflex and Ayacc. Writing the grammar file is probably the most difficult task and the GNU Bison documentation seriously helps in that task.
Looking at how scanner and grammar files are written in some other projects can help. Below is a non exhaustive list of scanner and grammar files that can be processed by Aflex and Ayacc:
Scanner examples
- Aflex example example.l
- Aflex example test95.l
- Aflex example options.l
- Ayacc calc example calc_lex.l
- Ayacc Ada83 scanner example ada_lex.l
- Ada CSS, CSS 3 scanner css-parser-lexer.l
- Ada CSS, CSS 3 definition rule scanner css-analysis-parser-lexer.l
- Memory Analysis Tool, expression scanner mat-expressions-lexer.l
Grammar examples
- Ayacc calc example calc.y
- Ayacc Ada83 grammar example ada.y
- Ada CSS, CSS 3 parser css-parser-parser.y
- Ada CSS, CSS 3 definition rule parser css-analysis-parser-parser.y
- Memory Analysis Tool, expression parser mat-expressions-parser.y
Tags
- Facelet
- NetBSD
- framework
- Mysql
- generator
- files
- application
- gcc
- ReadyNAS
- Security
- binutils
- ELF
- JSF
- Java
- bacula
- Tutorial
- Apache
- COFF
- collaboration
- planning
- project
- upgrade
- AWA
- C
- EL
- J2EE
- UML
- php
- symfony
- Ethernet
- Ada
- FreeBSD
- Go
- KVM
- MDE
- Proxy
- STM32
- Servlet
- backup
- lvm
- multiprocessing
- web
- Bean
- Jenkins
- release
- OAuth
- ProjectBar
- REST
- Rewrite
- Sqlite
- Storage
- USB
- Ubuntu
- bison
- cache
- crash
- Linux
- firefox
- performance
- interview
Add a comment
To add a comment, you must be connected. Login