Dévai Gergely @ ELTE

===== Tutorial on semantic analysis =====

==== Step 1: Preparation ====

  * In this tutorial we use the [[http://deva.web.elte.hu/compilers/lab/while-language.html|While language]]. It is possible to use your own lexer and parser. If you have not completed them, use [[http://deva.web.elte.hu/compilers/lab/parser.zip|this implementation]].
  * If you are not working with your own parser, please take the time to understand the grammar.
  * Look into the ''Parser.ih'' file.
    * The function called ''lex'' retrieves the current line number and saves it into a field of the ''d_loc<nowiki>__</nowiki>'' attribute of the ''Parser'' class. Then it asks the lexer for the next token and returns it to the parser.
    * The line number information is used in the ''error'' function to print location information in error messages.
  * Make sure you can compile the parser and that it accepts all the correct [[http://deva.web.elte.hu/compilers/lab/while-tests.zip|test files]] and rejects the ones with lexical errors. Try out the semantic error test cases as well: at this point they are accepted. The goal of the tutorial is to detect those errors too.

==== Step 2: Symbol table ====

The symbol table will be implemented as a simple C++ map. The keys are variable names, each mapped to the type of the variable and the line number of its declaration.

  * Create a new header file called ''semantics.h''.
  * Include the standard headers ''iostream'', ''string'' and ''map''.
  * Create an enumeration type to represent the two types of the //While// language: <code>
enum type { natural, boolean };
</code>
  * Create a class called ''var_data'' to represent the data that variable names will be mapped to. It has two attributes:
    * ''decl_row'': the line number of the declaration.
    * ''var_type'': the type of the variable.
  * Create two constructors for the ''var_data'' class:
    * One with two parameters, initializing both attributes.
    * One with no parameters and an empty body. It is needed to store objects of this class in a map (for example, ''operator[]'' default-constructs the value of a missing key).
  * In ''Parser.h'', add the symbol table as a private attribute of the ''Parser'' class: <code>
std::map<std::string,var_data> symbol_table;
</code>
  * In the ''while.y'' file, in the ''%baseclass-preinclude'' option, change ''<iostream>'' to ''"semantics.h"'', so that the newly created header file becomes part of the project.
  * Compile the project and fix the compilation errors, if any.

==== Step 3: Passing variable names from the lexer to the parser ====

In order to insert variable names and the corresponding data into the symbol table, the parser needs the names of the variables. We will obtain this information from the lexer and attach it to each terminal symbol that represents an identifier.

In //bisonc++//, every terminal and non-terminal can have an associated //semantic value//. (Remember //semantic values// or //attributes// from the lectures!) Since different symbols can have semantic values of different types, we need to create a //union// listing the possibilities. Our union will have only one option at first, as we want to associate variable names with the //identifier// terminals. //bisonc++// has its own syntax for creating this union, which is turned into a real C++ union when the //bisonc++// command is run.

  * In ''while.y'', before the first ''%%'' line, add the following: <code>
%union
{
  std::string *name;
}
</code>
  * The names of the options in this union can be used to define the type of the semantic value assigned to a given grammar symbol. Extend the declaration of the //identifier// terminal as follows (''TOKEN_ID'' is the terminal you use for variable names in your project): <code>
%token <name> TOKEN_ID
</code>
  * Now //identifiers// can have semantic values of type pointer to ''std::string''. This value can be set in the ''lex'' function.
  * Extend the ''lex'' function with the following code (before the ''return'' instruction): <code>
if( ret == TOKEN_ID )
{
  d_val__.name = new std::string(lexer.YYText());
}
</code>

Some explanation: The ''YYText()'' function of the lexer returns the text of the current token. We create a C++ string from it. The ''d_val<nowiki>__</nowiki>'' attribute of the parser is used to assign a semantic value to the current token. Its type is the union that we have defined in the ''while.y'' file. We store a pointer to the name of the identifier in the ''name'' option of this union.

Now we are able to access the variable names in the C++ code fragments following the grammar rules. If you have a rule ''a: A b C'', then the semantic values of ''A'', ''b'' and ''C'' are ''$1'', ''$2'' and ''$3'' respectively.

  * Modify the grammar rules for declarations so that they write the name of the declared variable to the standard output. (Note that your actual rule might look a bit different from the following example.) <code>
declaration:
  TOKEN_INTEGER TOKEN_ID TOKEN_SEMICOLON
  {
    std::cout << "declared variable: " << *$2 << std::endl;
  }
</code>
  * Run your compiler on a correct test file and check that the declared variable names appear in the output.
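
For reference, the ''semantics.h'' header described in Step 2 might look roughly like the sketch below. The names ''type'', ''var_data'', ''decl_row'' and ''var_type'' come from the tutorial. The ''symbol_table_t'' alias and the ''declaration_line'' helper are assumptions added only for illustration and are not part of the original project.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <string>

// The two types of the While language, as given in the tutorial.
enum type { natural, boolean };

// Data that a variable name is mapped to in the symbol table.
class var_data {
public:
    int decl_row;   // line number of the declaration
    type var_type;  // type of the variable

    // Initializes both attributes.
    var_data(int row, type t) : decl_row(row), var_type(t) {}

    // Empty default constructor: std::map::operator[] uses it to create
    // the value of a missing key. The attributes stay uninitialized here.
    var_data() {}
};

// Assumed alias (not in the tutorial) for the type of the symbol table.
typedef std::map<std::string, var_data> symbol_table_t;

// Hypothetical helper: returns the declaration line of a variable
// that is expected to be present in the table.
inline int declaration_line(const symbol_table_t &table,
                            const std::string &name) {
    symbol_table_t::const_iterator it = table.find(name);
    assert(it != table.end());
    return it->second.decl_row;
}
```

With such a header, an entry could be stored as ''table["x"] = var_data(3, natural);''. The assignment through ''operator[]'' is the place where the parameterless constructor is required.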

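
The hand-off of Step 3 can also be tried outside the generated parser. The following is only a sketch under stated assumptions: ''semantic_value'', ''token_code'', ''mock_lexer'', ''attach_semantic_value'' and ''declaration_message'' are hypothetical stand-ins for the union, token codes, lexer, ''lex'' extension and semantic action of the real project, and the numeric token codes are made up. The union stores a pointer, most likely because a classic C++ union cannot directly hold a member with a non-trivial constructor such as ''std::string''.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the union generated from the %union block.
union semantic_value {
    std::string *name;
};

// Hypothetical token codes; the real ones are generated by bisonc++.
enum token_code { TOKEN_ID = 257, TOKEN_SEMICOLON = 258 };

// Hypothetical stand-in for the lexer: it only exposes the matched text.
struct mock_lexer {
    const char *text;
    const char *YYText() const { return text; }
};

// Mirrors the extension of lex(): only identifiers receive a value.
// The caller owns the allocated string and has to delete it.
inline void attach_semantic_value(int ret, const mock_lexer &lexer,
                                  semantic_value &val) {
    val.name = nullptr;
    if (ret == TOKEN_ID) {
        val.name = new std::string(lexer.YYText());
    }
}

// Mirrors the semantic action of the declaration rule:
// *$2 in the grammar corresponds to *val.name here.
inline std::string declaration_message(const semantic_value &val) {
    assert(val.name != nullptr);
    return "declared variable: " + *val.name;
}
```

Since the string is allocated with ''new'', it stays alive until it is explicitly deleted, which is worth keeping in mind when the value is later copied into the symbol table.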