Leniel Maccaferri's blog: compilers construction

Showing posts with label compilers construction. Show all posts

Fortran Numerical Constants Recognizer

Posted by Leniel Maccaferri on 2/09/2009 01:19:00 AM

Compilers construction paper
I and a dear brother in faith of mine called Wellington Magalhães Leite wrote a paper titled: Fortran Numerical Constants Recognizer

It exercises concepts related to state transition diagram, finite state machine and automata theory.

Needless to say, compilers construction was one of the best disciplines or even the best discipline I had in the computer engineering course. I really like this kind of stuff. State machines and automata theory really rocks! Automata theory is other discipline I had that really got my attention.

State Transition Diagram
This is the lexical state transition diagram in which our work is based:

Fortran Numerical Constants - State Transition Diagram

Alphabet = {d, +, - , . , E} (d is any digit)

The arches of any vertex to an Error vertex aren’t showed but they exist.

Expressions like the following are recognized: 13, +13, -13, 13., 1.25, .25, -.25, -32.43, 13E-15, 13.E-15, -13.25E+72, .75E5, etc.

Expressions like the following aren’t recognized: ++13, .E13, etc.

The vertex with a 0 (zero) value is the initial state.

The vertexes with bold circles are the final states.

Note: It’s important to note that lexical state transition diagrams are finite automatons.

Abstract
A didactic method for the construction of a compiler front-end is the one substantiated in transition diagrams. So, its construction helps with the familiarization regarding the tasks of a compiler project.

This paper presents a Fortran numerical constants recognizer. It is developed based on a state transition diagram and its implementation follows the standards of the C# programming language.

Keywords: fortran, compiler construction, state transition diagram, C# programming language

CONTENTS
1 INTRODUCTION 6
  1.1 Objective	6
  1.2 Definition 6
2 DEVELOPMENT 7
  2.1 Mapping the constants 7
  2.2 Mapping the functions 7
  2.3 Application main entry point 10
3 APPLICATION 12
  3.1 Validating expressions 12
      3.1.1 Single expression 12
      3.1.2 Set of expressions 13
4 CONCLUSION 14
5 REFERENCES 15
6 ADDENDUM 16

Resumo
Um método muito didático para a construção do front-end de um compilador é aquele fundamentado em diagramas de transição. Logo, seu estudo ajuda na familiarização com as atividades envolvidas no projeto de um compilador.

Esse trabalho apresenta um sistema reconhecedor de constantes numéricas em Fortran. O mesmo é desenvolvido a partir de um diagrama de transição de estados e sua codificação segue os padrões e moldes da linguagem de programação C#.

Palavras-chave: fortran, construção de compiladores, diagrama de transição de estados, linguagem de programação C#

SUMÁRIO
1 Introdução 5
   1.1 Objetivo	5
   1.2 Definição 5
2 Desenvolvimento 6
   2.1 Mapeamento das constantes 6
   2.2 Mapeamento das funções 6
   2.3 Mapeamento da chamada principal 10
3 Aplicação 12
   3.1 Validação de expressões 12
       3.1.1 Uma única expressão 13
       3.1.2 Uma lista de expressões 14
4 Conclusão 15
5 Bibliografia	16
6 Anexo 17

Source code

class StateTransitionSystem
{
  public static void Main(string[] args)
  {
    Console.Title = "State Transition System for recognizing numeric constants in FORTRAN";

    Console.BackgroundColor = ConsoleColor.White;
    Console.ForegroundColor = ConsoleColor.Black;

    char ch;

    do
    {
      Console.Clear();

      // Print startup banner
      Console.Write("\nState Transition System C# Sample Application\n");
      Console.Write("Copyright ©2006 Leniel Braz de Oliveira Macaferi & Wellington Magalhães Leite.\n\n");
      Console.Write("UBM COMPUTER ENGINEERING - 7TH SEMESTER [http://www.ubm.br/]\n\n");

      // Describes program function
      Console.Write("This program example demonstrates the State Transition Diagram's algorithm for\n");
      Console.Write("numeric constants validation in FORTRAN.\n\n");

      // Describes program's options     
      Console.Write("You can validate expressions by two different ways as follow:\n\n");
      Console.Write("1 - A single expression by providing an entry.\n\n");
      Console.Write("2 - A set of expressions by providing an input file.\n");
      Console.Write("  * Notice: the expressions must be separeted in-lines.\n");

      Console.Write("\n\nEnter your choice: ");

      ch = Console.ReadKey().KeyChar;

      switch(ch)
      {
        case '1':
          {
            SingleExpression();
            break;
          }
        case '2':
          {
            SetOfExpressions();
            break;
          }
      }
    }
    while(ch == '1' || ch == '2');
  }

  /// <summary>
  /// Enumeration where each item corresponds to one state in the State Transition Diagram.
  /// </summary>
  public enum PossibleStates
  {
    s0 = 0,
    s1,
    s2,
    s3,
    s4,
    s5,
    s6,
    s7,
    error
  }

  /// <summary>
  /// Array of type PossibleStates, which contains the finalStates acceptable by the
  /// State Transition Diagram.
  /// </summary>
  public static PossibleStates[] finalStates = { PossibleStates.s2, PossibleStates.s3, PossibleStates.s6 };

  /// <summary>
  /// Verifies if the last state achieved by the machine belongs to the finalStates
  /// array.
  /// </summary>
  public static bool IsFinalState(PossibleStates currentState)
  {
    for(int i = 0; i < finalStates.Length; i++)
      if(currentState == finalStates[i])
        return true;

    return false;
  }

  /// <summary>
  /// Recognizes the current state and the character “label” being analysed, values
  /// passed as parameters. After, the function does the transition of state case some
  /// condition is satisfied, otherwise, the function will return an error flag.
  /// </summary>
  public static PossibleStates Recognizer(PossibleStates currentState, char c)
  {
    switch(currentState)
    {
      case PossibleStates.s0:
        {
          if(c == '+' || c == '-')
            return PossibleStates.s1;

          if(char.IsDigit(c))
            return PossibleStates.s2;

          if(c == '.')
            return PossibleStates.s4;

          break;
        }

      case PossibleStates.s1:
        {
          if(char.IsDigit(c))
            return PossibleStates.s2;

          if(c == '.')
            return PossibleStates.s4;

          break;
        }

      case PossibleStates.s2:
        {
          if(char.IsDigit(c))
            return PossibleStates.s2;

          if(c == '.')
            return PossibleStates.s3;

          if(c == 'E')
            return PossibleStates.s5;

          break;
        }

      case PossibleStates.s3:
        {
          if(char.IsDigit(c))
            return PossibleStates.s3;

          if(c == 'E')
            return PossibleStates.s5;

          break;
        }

      case PossibleStates.s4:
        {
          if(char.IsDigit(c))
            return PossibleStates.s3;

          break;
        }

      case PossibleStates.s5:
        {
          if(char.IsDigit(c))
            return PossibleStates.s6;

          if(c == '+' || c == '-')
            return PossibleStates.s7;

          break;
        }

      case PossibleStates.s6:
        {
          if(char.IsDigit(c))
            return PossibleStates.s6;

          break;
        }

      case PossibleStates.s7:
        {
          if(char.IsDigit(c))
            return PossibleStates.s6;

          break;
        }
    }
    return PossibleStates.error;
  }

  /// <summary>
  /// Reads an input expression, recognizes its characters, changes the states
  /// accordingly to those characters and hence validates the entry.
  /// </summary>
  public static void SingleExpression()
  {
    do
    {
      // The machine points to the initial state of the State Transition Diagram.
      PossibleStates currentState = PossibleStates.s0;

      Console.Write("\n\nEnter the expression to be evaluated: ");

      // strExpression receives the entry typed by the user.
      string strExpression = Console.ReadLine();

      /* For each string's character (label), calls the function Recognizer that
      on the other hand changes the machine state accordingly. */
      for(int i = 0; strExpression.Length > i; ++i)
        if(currentState != PossibleStates.error)
          currentState = Recognizer(currentState, strExpression[i]);
        else
          break;

      /* Calls the function IsFinalState to verify if the state where the machine
       stopped is a final state or not. */
      if(IsFinalState(currentState))
        Console.WriteLine("\n Valid expression.\n");
      else
        Console.WriteLine("\n Invalid expression!\n");

      Console.Write("Do you wanna try again? (y\\n) ");
    }
    while(Console.ReadKey().KeyChar == 'y');
  }

  /// <summary>
  /// Reads an input file, recognizes its lines, expression by expression and changes
  /// the states accordingly to each expression. In other words, validates the entire
  /// list.
  /// </summary>
  public static void SetOfExpressions()
  {
    do
    {
      Console.Write("\n\nEnter the file path: ");

      // Obtains the file name.
      string fileName = Console.ReadLine();

      // Verifies if the file exists.
      if(!File.Exists(fileName))
        Console.Write("\n File not found!\n\n");
      else
      {
        // Reads all the file's lines and stores them.
        StreamReader sReader = new StreamReader(fileName);

        string expression;

        // Evaluates each line until achieve the EOF (end of file).
        while((expression = sReader.ReadLine()) != null)
        {
          // The machine points to the initial state of the State Transition Diagram.
          PossibleStates currentState = PossibleStates.s0;

          /* For each expression's character (label), calls the function Recognizer that
          on the other hand changes the machine state accordingly. */
          for(int i = 0; expression.Length > i; ++i)
            if(currentState != PossibleStates.error)
              currentState = Recognizer(currentState, expression[i]);
            else
              break;

          /* Calls the function IsFinalState to verify if the state where the machine
           stopped for the expression is a final state or not. */
          if(IsFinalState(currentState))
            Console.WriteLine("\n{0} is a valid expression.", expression);
          else
            Console.WriteLine("\n{0} is an invalid expression!", expression);
        }
        sReader.Close();
      }
      Console.Write("\nDo you wanna try again? (y\\n) ");
    }
    while(Console.ReadKey().KeyChar == 'y');
  }
}

Screenshots
See screenshots of Fortran numerical constants recognizer in action:

Fortran Numerical Constants Recognizer - Single Expression

Fortran Numerical Constants Recognizer - Set Of Expressions

You can get a PDF copy of the paper and the accompanying Fortran Numerical Constants Recognizer files in a ZIP file at:

English version
http://leniel.googlepages.com/FortranNumericalConstantsRecognizer.pdf

Portuguese version
http://leniel.googlepages.com/SisReconhecedorConstNumericasFortran.pdf

Microsoft Visual Studio C# 2008 Console Application project
http://leniel.googlepages.com/FortranNumericalConstantsRecognizer.zip

Note: The program’s source code is in a Microsoft Visual Studio C# 2008 Console Application project. If you don’t have Visual Studio, you can get a free copy of its express edition at http://www.microsoft.com/express/download.

Source code Indenter-Capitalizer

Posted by Leniel Maccaferri on 5/01/2008 06:17:00 PM

Following the coursework related to the Compilers Construction discipline I attended during the Computer Engineering course, I was asked to indent and capitalize the reserved words (keywords) of a source code file. More specifically, to do this work I should use the Syntactic Analyzer built with Flex and YACC that was created in a previous coursework task.

The source code file in question is the one being analyzed by the syntactic analyzer. This way at the same time it analyses the syntactic structure of the file it also indents and capitalizes the keywords.

The following is the content of the syntactic analyzer file named sinan.yacc:

%{
   #include <stdio.h>
   #include <stdlib.h>

   int c;
%}

%token PROGRAM
%token ID
%token SEMIC
%token DOT
%token VAR
%token COLON
%token INTEGER
%token REAL
%token RCONS
%token BOOL
%token OCBRA
%token CCBRA
%token IF
%token THEN
%token ELSE
%token WHILE
%token DO
%token READ
%token OPAR
%token CPAR
%token WRITE
%token COMMA
%token STRING
%token ATRIB
%token RELOP
%token ADDOP
%token MULTOP
%token NEGOP
%token CONS
%token TRUE
%token FALSE

%%

prog : PROGRAM {printf("PROGRAM "); c = 0;}

       ID {printf("%s", yytext);}

       SEMIC {printf(";\n\n");}

       decls

       compcmd

       DOT {printf(".");}
       {
         printf("\n Syntactical Analisys done without erros!\n");

         return 0;
       }
     ;

decls : VAR {printf("VAR ");} decl_list

      ;

decl_list : decl_list decl_type
            decl_type
          ;

decl_type : id_list COLON {printf(":");} type SEMIC {printf(";\n");}
          ;

id_list : id_list COMMA {printf(", ");} ID {printf("%s", yytext);}
          ID {printf("%s", yytext);}
        ;

type : INTEGER {printf(" INTEGER");}
       REAL {printf(" REAL");}
       BOOL {printf(" BOOL");}
     ;

compcmd : OCBRA {int i; for(i = 0; i < c; i++)printf(" "); printf("{\n"); c = c + 2;} cmd_list CCBRA {printf("\n"); int i; for(i = 2; i < c; i++)printf(" "); printf("}"); c = c - 2;}
        ;

cmd_list : cmd_list SEMIC {printf(";\n\n");} cmd
           cmd
         ;

cmd : {int i; for(i = 0; i < c; i++)printf(" ");} If_cmd
      {int i; for(i = 0; i < c; i++)printf(" ");} While_cmd
      {int i; for(i = 0; i < c; i++)printf(" ");} Read_cmd
      {int i; for(i = 0; i < c; i++)printf(" ");} Write_cmd
      {int i; for(i = 0; i < c; i++)printf(" ");} Atrib_cmd
      compcmd
    ;

If_cmd : IF {printf("IF ");} exp THEN {printf(" THEN\n");} cmd Else_cmd
       ;

Else_cmd : ELSE {printf("\n"); int i; for(i = 0; i < c; i++)printf(" "); printf("ELSE\n"); c = c + 2;} cmd {c = c - 2;}

         ;

While_cmd : WHILE {printf("WHILE ");} exp DO {printf(" DO\n");} cmd
          ;

Read_cmd : READ {printf("READ");} OPAR {printf("(");} id_list CPAR {printf(")");}
         ;

Write_cmd : WRITE {printf("WRITE");} OPAR {printf("(");} w_list CPAR {printf(")");}
          ;

w_list : w_list COMMA {printf(", ");} w_elem
         w_elem
       ;
w_elem : exp
         STRING {printf("%s", yytext);}
       ;

Atrib_cmd : ID {printf("%s ", yytext);} ATRIB {printf("= ");} exp
          ;

exp : simple_exp
      simple_exp RELOP {printf(" %s ", yytext);} simple_exp
    ;

simple_exp : simple_exp ADDOP {printf(" %s ", yytext);} term
             term
           ;

term : term MULTOP {printf(" %s ", yytext);} fac
       fac
     ;

fac : fac NEGOP {printf(" %s", yytext);}
      CONS {printf(" %s", yytext);}
      RCONS {printf(" %s", yytext);}
      OPAR exp CPAR
      TRUE {printf("TRUE ");}
      FALSE {printf("FALSE ");}
      ID {printf("%s", yytext);}
    ;


%%

#include "lex.yy.c"

As you can see there is an integer variable named c right beneath the #include section. This variable is incremented and decremented according to the section of code being analyzed (parsed). Such variable is then used inside the for loops. According to its current value blank spaces are written on the screen so that the next token parsed is printed on the right position (indented).

Each keyword parsed is capitalized and printed on the screen through a simple printf command.

Let's run a simple test case with the following source code file named test.txt. The code is intentionally not indented and it doesn't do much. It's just for a testing purpose.

program factorial;

var n, fact, i: integer;
{
read(n);

fact = 1;

i = 1;

while i <= n do
{
fact = fact * i;

i = i + 1
};

if i >= fact then
{
i = fact;

fact = fact + 1
}
else
i = fact + 1;



if i < fact then
{
i = fact;

fact = fact + 1
};

write("The factorial of ", n, " is: ", fact)
}.

In blue are the keywords, so the output should present them in CAPITALIZED letters. The blocks of code should also be presented with indentation according to the logic specified above. I use 2 white spaces to indent the blocks of code. That's why I use c = c + 2 (to increment) and c = c - 2 (to decrement) the c variable is responsible for controlling the indentation.

To run the test case it's necessary to build the syntactic analyzer. I won't show here all the steps involved since it's already done in the paper described in the post Syntactic Analyzer built with Flex and YACC. So I'll just compile the sinan.yacc file since it's the only file that was modified to accomplish this task. The other necessary files to generate the executable file - such as the lexical analyzer ones - are included in the download package at the end of this post.

For a better understanding of the commands used in this post and to set up the development environment I recommend that you read at least section 3.3 of the paper Syntactic Analyzer built with Flex and YACC. That said, let's do the job. :-)

To run this test case, follow the steps listed bellow:

Create a folder called IndentCapCode on the root directory C:\, which will contain all the files created henceforward.
Open a command prompt and type: path=C:\MinGW\bin;%PATH%. I consider that the MinGW installation was done in the folder C:\MinGW. Change it accordingly. After completing this step the tools to build the syntactic analyzer will be available in the command prompt.
Generate the file y.tab.c in the command prompt with following command: yacc sinan.yacc.
Compile the files with GCC in the command prompt with the following command: gcc y.tab.c yyerror.c main.c -osinan -lfl.

The result file for the syntactic analyzer is sinan.exe. To use it just type sinan < test.txt. The file test.txt contains the source code to be analyzed by the syntactic analyzer.

You can see the commands listed above in action through the following screenshot:

In a next post related to compilers construction I'll show the implementation of a semantic analyzer. This implementation was another coursework task. I used the same principle shown here to indent the code and even to colorize the keywords. But this is for another day.

You can get the .ZIP package that contains the files used in this post: http://leniel.googlepages.com/CodeIndenterCapitalizerFlexYACC.zip

Note: if you want, you can use a batch file (.bat) to automate the steps listed above. A .bat file named ini.bat is included in the .ZIP package. For more information about batch files, read my previous post Programming-Constructing a Batch file.

Syntactic Analyzer built with Flex and YACC

Posted by Leniel Maccaferri on 4/21/2008 07:29:00 AM

Compilers construction paper
I and a dear brother in faith of mine called Wellington Magalhães Leite wrote a paper titled: A Syntactic Analyzer built with the CASE tools Flex and YACC

See the paper's abstract below:

The function of a syntactic analyzer in a compiler is to verify the syntactic structure of a program’s source code. It then detects, signalize and handle the syntactic errors found in the source code and at the same time servers as the framework for the front-end (user interface) of the program. So, its construction helps with the familiarization regarding the tasks included in the compiler project.

The language used in this paper does not have left recursion. The language neither presents subprograms, nor indexed variables and its only loop command is while for the sake of simplicity. The syntactic analyzer implemented uses the bottom-up parsing approach.

This paper presents the steps to the construction of a syntactic analyzer, which serves as a base element for a compiler implementation.

Keywords: syntactic analyzer, syntactical analysis, compiler construction, case tools, flex, yacc

CONTENTS
1 INTRODUCTION 7
  1.1 Objective 7
  1.2 Definition 7
      1.2.1 Grammar 7
            1.2.1.1 Grammar productions 8
            1.2.1.2 Lexical specifications 8
            1.2.1.3 Reserved words or keywords 10
            1.2.1.4 Operator types and attributes 11
            1.2.1.5 Separator types 11
2 DEVELOPMENT 13
  2.1 Lexical analysis 13
      2.1.1 Sample source code of a factorial program 13
      2.1.2 Flex 16
  2.2 Syntactical analysis 16
      2.2.1 Sample syntactic tree 16
      2.2.2 YACC 20
3 APPLICATION 21
  3.1 Constructing the file for the lexical analysis (lexan.lex) 21
  3.2 Constructing the file for the syntactic analysis (sinan.yacc) 21
  3.3 Guide to implementation 24
  3.4 Using a batch file to avoid boilerplate work 28
4 CONCLUSION 30
5 REFERENCES 31
6 ADDENDUM 32

See a screenshot of the syntactic analyzer in action:

SyntacticAnalyzerFlexYACCSyntaxError