Parser.ih
file.
The lex function asks the lexer for the current line number and saves it into a field of the d_loc__ attribute of the Parser class. Then it asks for the next token from the lexer and returns it to the parser.
The error function uses this field to print location information for error messages.

The symbol table is implemented as a simple C++ map. The keys are variable names, each mapped to the variable's type and the line number of its declaration.
Create a header file called semantics.h.
Include iostream, string and map.
Add the enumeration of types: enum type { integer, boolean };
Create a class var_data to represent the data that variable names are mapped to. It has two attributes:
decl_row: the line number of the declaration.
var_type: the type of the variable.
Give the var_data class a default constructor (required by the [] operator of map) and a constructor that takes the line number and the type.
In Parser.h, add the symbol table as a private attribute to the Parser class: std::map<std::string,var_data> symbol_table;
In the while.y file, in the %baseclass-preinclude option, change <iostream> to "semantics.h", in order to make the newly created header file part of the project.

In order to insert variable names and the corresponding data into the symbol table, the parser needs the names of the variables. We will get this information from the lexer and bind it to each terminal symbol representing an identifier.
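For orientation, the header and class from the symbol table steps above could look roughly like the following. This is a sketch rather than the tutorial's original code: the constructor signatures are assumptions inferred from how var_data is used later (a default constructor for the map's [] operator and a two-argument one), and <cassert> is included only so the snippet can be sanity-checked on its own.

```cpp
// Possible contents of semantics.h (sketch, see assumptions above).
#include <cassert>   // only for quick sanity checks while experimenting
#include <iostream>
#include <map>
#include <string>

// The two types of the language.
enum type { integer, boolean };

// Data stored in the symbol table for each variable name.
class var_data
{
public:
    int  decl_row;   // line number of the declaration
    type var_type;   // type of the variable

    // Default constructor: std::map::operator[] needs it to create
    // an entry for a key that is not yet in the map.
    var_data() : decl_row(0), var_type(integer) {}

    // Assumed two-argument form, matching the call
    // var_data( d_loc__.first_line, integer ) used in the grammar actions.
    var_data(int row, type t) : decl_row(row), var_type(t) {}
};
```

With this in place, a declaration such as std::map<std::string,var_data> symbol_table; compiles in any file that includes the header.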
In bisonc++, every terminal and non-terminal can have an associated semantic value. (Remember semantic values or attributes from the lectures!) Since different symbols can have semantic values of different types, we need to create a union listing the possibilities. Our union will initially have only one member, as we want to associate the names of variables with the identifier terminals. bisonc++ has its own syntax for declaring this union, which is turned into a real C++ union when the bisonc++ command is run.
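To make the idea of a semantic-value union concrete, here is a small stand-alone C++ sketch. The names SemanticValue and make_name are hypothetical and exist only for this illustration; the union that bisonc++ actually generates has its own name and lives in the generated parser base class.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the union generated from
// "%union { std::string *name; }".
union SemanticValue
{
    std::string *name;   // semantic value of identifier tokens
};

// Stores a heap-allocated copy of an identifier's text in the union.
// The caller is responsible for deleting v.name later.
inline SemanticValue make_name(const char *text)
{
    SemanticValue v;
    v.name = new std::string(text);
    return v;
}
```

The member is a pointer because std::string has a non-trivial constructor and destructor, which makes it awkward (and before C++11, impossible) to store directly inside a union.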
In while.y, before the first %% line, add the following: %union { std::string *name; }
Declare the semantic value type of the identifier token (TOKEN_ID is the terminal you use for variable names in your project): %token <name> TOKEN_ID
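The semantic value of identifier tokens is now a pointer to a string. This value can be set in the lex function. Extend the lex function with the following code (before the return instruction; ret is assumed to hold the token code returned by the lexer): if( ret == TOKEN_ID ) { d_val__.name = new std::string(lexer.YYText()); }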
Some explanation: The YYText() function of the lexer returns the text of the current token. We create a C++ string from it. The d_val__ attribute of the parser can be used to assign semantic values to the current token. Its type is the union that we have defined in the while.y file. We load the name of the identifier into the name member of this union.
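The extended lex function can be tried outside the generated parser with a sketch like the one below. FakeLexer and FakeParser are hypothetical stand-ins written for this illustration, and the numeric value of TOKEN_ID is made up; in the real project the token codes, d_loc__ and d_val__ come from the code that bisonc++ generates.

```cpp
#include <cassert>
#include <string>

// Made-up token code; the real one is generated by bisonc++.
enum { TOKEN_ID = 257 };

// Stand-in for the flex-generated lexer: it always reports one
// identifier token with the text "x" on line 1.
struct FakeLexer
{
    int yylex() { return TOKEN_ID; }
    const char *YYText() const { return "x"; }
    int lineno() const { return 1; }
};

// Stand-in for the parser: only the attributes used by lex() are modelled.
struct FakeParser
{
    struct { int first_line; } d_loc__;
    union { std::string *name; } d_val__;
    FakeLexer lexer;

    int lex()
    {
        d_loc__.first_line = lexer.lineno();   // remember the location
        int ret = lexer.yylex();               // ask for the next token
        if( ret == TOKEN_ID )
        {
            // attach the identifier's text as the token's semantic value
            d_val__.name = new std::string(lexer.YYText());
        }
        return ret;
    }
};
```

After one call to lex(), d_val__.name points to a heap-allocated string holding the identifier, which is exactly what the grammar actions later see as the token's semantic value.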
Now we can access the variable names in the C++ code fragments following the grammar rules. If you have a rule a: A b C, then the semantic values of A, b and C are $1, $2 and $3 respectively.
declaration: TOKEN_INTEGER TOKEN_ID TOKEN_SEMICOLON { std::cout << "declared variable: " << *$2 << std::endl; }
Instead of writing the names of declared variables to the standard output, we now insert them into the symbol table. The C++ map datatype has the [] operator, which can be used to insert or retrieve elements.
declaration: TOKEN_INTEGER TOKEN_ID TOKEN_SEMICOLON { symbol_table[*$2] = var_data( d_loc__.first_line, integer ); }
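The behaviour of the [] operator is worth seeing in isolation. The sketch below repeats a minimal var_data so that it compiles on its own; the helper declare and the typedef symbol_table_t are hypothetical names used only here.

```cpp
#include <cassert>
#include <map>
#include <string>

enum type { integer, boolean };

// Minimal copy of the symbol table entry, with assumed constructors.
struct var_data
{
    int  decl_row;
    type var_type;
    var_data() : decl_row(0), var_type(integer) {}
    var_data(int row, type t) : decl_row(row), var_type(t) {}
};

typedef std::map<std::string, var_data> symbol_table_t;

// Inserts a declaration; if the name is already present, the old
// entry is silently overwritten.
inline void declare(symbol_table_t &table, const std::string &name,
                    int row, type t)
{
    table[name] = var_data(row, t);
}
```

Two properties follow from how std::map works: assigning through [] overwrites an existing entry without any warning, and merely reading table[name] for a missing name creates a default-constructed entry. Both are reasons to check for the key explicitly before relying on [].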
Before inserting, check whether the variable has already been declared. (The count function of the map type returns the number of elements with the given key; 0 or 1 in our case.) if( symbol_table.count(*$2) > 0 ) { std::stringstream ss; ss << "Re-declared variable: " << *$2 << ".\n" << "Line of previous declaration: " << symbol_table[*$2].decl_row << std::endl; error( ss.str().c_str() ); }
Explanation: We use the stringstream type here to collect the parts of the error message. To make this work, you have to include the <sstream> standard header in semantics.h. We can get the string collected in the stringstream-typed ss using the str() member function. Since the error function (see Parser.ih) accepts a C-style character array instead of a C++ string, we have to use c_str() to convert.
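The re-declaration check can also be written as a small stand-alone function, which makes the message easy to test. The function name redeclaration_error is hypothetical, and the sketch uses find instead of count followed by [] so that the table is searched only once; this is a variation on the code above, not the tutorial's own version.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

enum type { integer, boolean };

// Minimal copy of the symbol table entry, with assumed constructors.
struct var_data
{
    int  decl_row;
    type var_type;
    var_data() : decl_row(0), var_type(integer) {}
    var_data(int row, type t) : decl_row(row), var_type(t) {}
};

// Returns an empty string if `name` is not yet declared, otherwise
// the error message describing the earlier declaration.
inline std::string redeclaration_error(
    const std::map<std::string, var_data> &table, const std::string &name)
{
    std::map<std::string, var_data>::const_iterator it = table.find(name);
    if( it == table.end() )
        return "";

    std::stringstream ss;
    ss << "Re-declared variable: " << name << ".\n"
       << "Line of previous declaration: " << it->second.decl_row
       << std::endl;
    return ss.str();
}
```

A note on c_str(): in error( ss.str().c_str() ) the temporary string returned by str() lives until the end of the full expression, so the call is safe. Storing the pointer in a variable and using it on a later line would not be.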
Handle declarations of boolean variables the same way, using the boolean type when inserting those variables into the symbol_table.
In order to do type checking, we need semantic values for expressions. These will be of type type (the enumeration declared in semantics.h).
Add a new member to the union in while.y: type *expr_type;
(The non-terminal for expressions is called expression in the parser downloadable at the beginning of this tutorial, but it might have a different name in your own solution.) Since this is a non-terminal, you can define the type of its semantic value as follows. (Insert it near the token declarations at the beginning of while.y.) %type <expr_type> expression
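In the action of each expression rule, set the semantic value of the left-hand side through the $$ symbol.
For a number literal: $$ = new type(integer);
For a variable: $$ = new type(symbol_table[*$1].var_type);
For an addition, check the operand types first: if(*$1 != integer || *$3 != integer) { std::stringstream ss; ss << d_loc__.first_line << ": Type error in addition." << std::endl; error( ss.str().c_str() ); } $$ = new type(integer);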
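One detail of the variable case deserves care: symbol_table[*$1] creates a default entry when the variable was never declared, so an undeclared variable would silently receive a type. A possible guard is sketched below. The function name lookup_type and the choice to signal an undeclared variable with a null pointer are assumptions made for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

enum type { integer, boolean };

// Minimal copy of the symbol table entry, with assumed constructors.
struct var_data
{
    int  decl_row;
    type var_type;
    var_data() : decl_row(0), var_type(integer) {}
    var_data(int row, type t) : decl_row(row), var_type(t) {}
};

// Returns a heap-allocated copy of the variable's type, or a null
// pointer if the variable is not in the symbol table. The table is
// never modified.
inline type *lookup_type(const std::map<std::string, var_data> &table,
                         const std::string &name)
{
    std::map<std::string, var_data>::const_iterator it = table.find(name);
    if( it == table.end() )
        return 0;
    return new type(it->second.var_type);
}
```

In a grammar action, a null result would be the place to report an "undeclared variable" error before assigning anything to $$.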
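All the semantic values were created using the new keyword of C++. This means we have to delete them in order to avoid memory leaks.
At the end of each action, delete the semantic values of the right-hand side symbols that are no longer needed (change i to the appropriate number): delete $i;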
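The ownership rule can be illustrated with a plain C++ function that mirrors the action of an addition rule. The name add_action and the ok flag are hypothetical; in the grammar itself, left and right correspond to $1 and $3, and the return value plays the role of $$.

```cpp
#include <cassert>

enum type { integer, boolean };

// Mirrors the addition action: checks the operand types, frees the
// operands' semantic values, and returns a newly allocated result type.
// `ok` is set to false when the operands are not both integers.
inline type *add_action(type *left, type *right, bool &ok)
{
    assert(left != 0 && right != 0);
    ok = (*left == integer && *right == integer);

    // The operands are not needed after the check: delete $1; delete $3;
    delete left;
    delete right;

    return new type(integer);   // $$ = new type(integer);
}
```

The same applies to the strings attached to identifier tokens: once a declaration has copied the name into the symbol table, the std::string pointed to by the token's semantic value can be deleted as well.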