semantic_tutorial

Differences

This shows you the differences between two versions of the page.

semantic_tutorial [2015/11/09 20:47]
deva [Step 4: Finding re-declared and undeclared variables]
semantic_tutorial [2016/12/01 17:07]
deva
Line 3: Line 3:
 ==== Step 1: Preparation ====
 
-  * In this tutorial we use the [[http://deva.web.elte.hu/compilers/lab/while-language.html|While language]]. It is possible to use your own lexer and parser. If you have not completed it, use [[http://deva.web.elte.hu/compilers/lab/parser.zip|this implementation]].
+  * In this tutorial we use the [[http://deva.web.elte.hu/compilers/lab/while-language.html|While language]]. It is possible to use your own lexer and parser. If you have not completed it, use [[http://deva.web.elte.hu/compilers/parser.zip|this implementation]].
   * If you are not working with your own parser, please take the time to understand the grammar.
   * Look into the ''Parser.ih'' file.
     * The function called ''lex'' asks the parser for the current line number and saves it into a field of the ''d_loc<nowiki>__</nowiki>'' attribute of the ''Parser'' class. Then it asks for the next token from the lexer and returns it to the parser.
     * The line number information is used in the ''error'' function to print location information for error messages.
-  * Make sure you can compile the parser and it accepts all the correct [[http://deva.web.elte.hu/compilers/lab/while-tests.zip|test files]] and rejects the lexical error ones. Try out the semantic error test cases! They are accepted. The goal of the tutorial is to find those errors as well.
+  * Make sure you can compile the parser and it accepts all the correct [[http://deva.web.elte.hu/compilers/while-tests.zip|test files]] and rejects the lexical error ones. Try out the semantic error test cases! They are accepted. The goal of the tutorial is to find those errors as well.
 
 ==== Step 2: Symbol table ====
Line 18: Line 18:
   * Create an enumeration type to represent the two types of the //While// language:
 <code>
-enum type { natural, boolean };
+enum type { integer, boolean };
 </code>
   * Create a class called ''var_data'' to represent the data that variable names will be mapped to. Its two attributes:
Line 80: Line 80:
     TOKEN_INTEGER TOKEN_ID TOKEN_SEMICOLON
     {
-        symbol_table[*$2] = var_data( d_loc__.first_line, natural );
+        symbol_table[*$2] = var_data( d_loc__.first_line, integer );
     }
 </code>
 
-  * Before the insertion operation shown above, check if the variable has already been declared. (The ''count'' function of the ''map'' type returns the number of the given element in the map -- ''0'' or ''1'' in our case.)
+  * Before the insertion operation shown above, check if the variable has already been declared. (The ''count'' member function of the ''map'' type returns the number of elements with the given key: ''0'' or ''1'' in our case.)
 <code>
   if( symbol_table.count(*$2) > 0 )
Line 93: Line 93:
     error( ss.str().c_str() );
   }
+</code>
+
+Explanation: Here we use a ''stringstream'' to assemble the parts of the error message. For this to work, you have to include the ''<sstream>'' standard header in ''semantics.h''. The ''str()'' member function returns the string collected in the ''stringstream''-typed ''ss''. Since the ''error'' function (see ''Parser.ih'') accepts a C-style character array instead of a C++ string, we have to use ''c_str()'' to convert.
+
+  * Implement the C++ code fragment of the rule for logical variable declarations in a similar manner. Do not forget to use the ''boolean'' type when inserting those variables into ''symbol_table''.
+  * Find or create a test file with re-declared variables and check that your compiler really finds the error.
+  * Find all other uses of the //identifier// terminal in the grammar (assignment, expression), and implement a check for undeclared variables in the corresponding C++ code fragments. Test your solution with test files containing undeclared variables in assignments and expressions.
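
The two checks of this step can be tried out in isolation, without the parser. The following is only a minimal sketch: the ''decl_row'' field name and the helper functions ''declare'' and ''check_declared'' are hypothetical and not part of the tutorial's code, and the helpers return the error message as a string instead of calling the parser's ''error'' function.

```cpp
#include <map>
#include <sstream>
#include <string>

enum type { integer, boolean };

// Hypothetical stand-in for the tutorial's var_data class:
// the line of the declaration and the type of the variable.
struct var_data {
    int decl_row;   // assumed field name
    type var_type;
    var_data() : decl_row(0), var_type(integer) {}  // needed by map::operator[]
    var_data(int row, type t) : decl_row(row), var_type(t) {}
};

std::map<std::string, var_data> symbol_table;

// Declaration rule: returns an empty string on success,
// or an error message if the name has already been declared.
std::string declare(const std::string& name, int line, type t) {
    if (symbol_table.count(name) > 0) {
        std::stringstream ss;
        ss << line << ": Re-declared variable: " << name
           << " (first declared in line " << symbol_table[name].decl_row << ")";
        return ss.str();
    }
    symbol_table[name] = var_data(line, t);
    return "";
}

// Use of an identifier in an assignment or expression:
// returns an error message if the name is not in the symbol table.
std::string check_declared(const std::string& name, int line) {
    if (symbol_table.count(name) == 0) {
        std::stringstream ss;
        ss << line << ": Undeclared variable: " << name;
        return ss.str();
    }
    return "";
}
```

In the grammar actions the same logic would take ''*$2'' (or ''*$1'') as the name and ''d_loc<nowiki>__</nowiki>.first_line'' as the line, and pass the message to ''error'' via ''c_str()''.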
+
+==== Step 5: Type checking ====
+
+In order to do type checking, we need semantic values for expressions. These will be of type ''type'' (the enumeration declared in ''semantics.h'').
+
+  * Add a new option to the //union// in ''while.y'':
+<code>
+type *expr_type;
+</code>
+  * Find out what the non-terminal symbol for expressions is called in your grammar. (It is ''expression'' in the parser downloadable at the beginning of this tutorial, but it might have a different name in your own solution.) Since this is a non-terminal, you can define the type of its semantic value as follows. (Insert it near the token declarations at the beginning of ''while.y''.)
+<code>
+%type <expr_type> expression
+</code>
+  * For each //expression rule//, do the necessary type checking and set the semantic value of the left-hand side of the rule. The semantic value of the left-hand side can be referenced by the ''$$'' symbol.
+    * For example, the rule of number literals will have the following code: ''$$ = new type(integer);''
+    * In the rule of a single variable name, we have to look up the type of the variable in the symbol table. (Note that this rule has already checked whether the variable is in the symbol table, so after that check it is safe to retrieve the variable data.) ''$$ = new type(symbol_table[*$1].var_type);''
+    * For expressions built using operators, the arguments have to be type checked. For example, the rule for addition will have this C++ code:
+<code>
+// Both operands of the addition must be integers.
+if(*$1 != integer || *$3 != integer)
+{
+   std::stringstream ss;
+   ss << d_loc__.first_line << ": Type error in addition." << std::endl;
+   error( ss.str().c_str() );
+}
+// The result of an addition is always an integer.
+$$ = new type(integer);
+</code>
+  * Type check assignments: emit an error if the left-hand and right-hand sides have different types.
+  * Test your solution: it should give error messages for all semantic error test cases.
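
The type checks of this step can also be exercised outside the parser. The sketch below mirrors the addition rule and the assignment check under some assumptions: ''check_addition'', ''check_assignment'', ''report'' and the ''errors'' vector are hypothetical names, and error messages are collected in a vector instead of being passed to the parser's ''error'' function.

```cpp
#include <sstream>
#include <string>
#include <vector>

enum type { integer, boolean };

// Hypothetical replacement for the parser's error() function:
// messages are collected here so that they can be inspected.
std::vector<std::string> errors;

void report(int line, const std::string& what) {
    std::stringstream ss;
    ss << line << ": " << what;
    errors.push_back(ss.str());
}

// Mirrors the action of the addition rule: lhs and rhs play the
// role of $1 and $3, the return value plays the role of $$.
// The caller owns the returned pointer.
type* check_addition(int line, const type* lhs, const type* rhs) {
    if (*lhs != integer || *rhs != integer) {
        report(line, "Type error in addition.");
    }
    return new type(integer);
}

// Assignment: the declared type of the variable must match
// the type of the expression on the right-hand side.
bool check_assignment(int line, type var_type, const type* rhs) {
    if (var_type != *rhs) {
        report(line, "Type error in assignment.");
        return false;
    }
    return true;
}
```

In the grammar, ''var_type'' would come from ''symbol_table[*$1].var_type'' and ''line'' from ''d_loc<nowiki>__</nowiki>.first_line''.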
+
+==== Step 6: Fixing the memory leaks ====
+
+All the semantic values were created using the ''new'' keyword of C++. This means we have to delete them to avoid memory leaks.
+  * Review all the rules and delete the semantic values of the symbols on the right-hand side at the end of the corresponding C++ code fragments (change ''i'' to the appropriate number):
+<code>
+delete $i;
 </code>
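
To see why the right-hand side values have to be deleted, the ownership rule can be sketched outside the parser. This is only an illustration: ''make_value'', ''release'', ''reduce_addition'' and the ''live_values'' counter are hypothetical helpers that stand in for ''new'', ''delete'' and a rule action.

```cpp
enum type { integer, boolean };

// Counts the semantic values that are currently allocated (illustration only).
int live_values = 0;

// Stand-in for "new type(t)" in a rule action.
type* make_value(type t) {
    ++live_values;
    return new type(t);
}

// Stand-in for "delete $i" in a rule action.
void release(type* p) {
    --live_values;
    delete p;
}

// Mirrors a rule such as "expression: expression + expression":
// the result is created for $$, then the values of $1 and $3 are
// released, because nothing refers to them after the reduction.
type* reduce_addition(type* lhs, type* rhs) {
    type* result = make_value(integer);
    release(lhs);
    release(rhs);
    return result;
}
```

After each reduction only the value of the left-hand side stays allocated; it is released in turn by the rule that later consumes it.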
  
semantic_tutorial.txt · Last modified: 2016/12/08 13:25 by deva