Tutorial on semantic analysis

Step 1: Preparation

Step 2: Symbol table

The symbol table will be implemented by a simple C++ map. Its keys are the variable names, and each name is mapped to a var_data record that holds the type of the variable and the line number of its declaration.

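enum type { integer, boolean };
// var_data must be default-constructible, because std::map::operator[] creates missing entries
struct var_data { int decl_row; type decl_type; var_data( int row = 0, type t = integer ) : decl_row(row), decl_type(t) {} };
std::map<std::string,var_data> symbol_table;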
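A self-contained sketch of the symbol table, assuming a var_data layout that the tutorial does not spell out: the field decl_row is taken from the text, while the field name decl_type and the helper declare() are hypothetical. The default arguments on the constructor matter because std::map::operator[] default-constructs a value whenever the key is missing.

```cpp
#include <cassert>
#include <map>
#include <string>

enum type { integer, boolean };

// decl_row comes from the tutorial; decl_type is an assumed field name.
struct var_data
{
    int decl_row;    // line of the declaration
    type decl_type;  // declared type of the variable
    // Default arguments make var_data default-constructible, which
    // std::map::operator[] requires when it creates a missing entry.
    var_data( int row = 0, type t = integer ) : decl_row( row ), decl_type( t ) {}
};

std::map<std::string, var_data> symbol_table;

// Hypothetical helper: records (or overwrites) a declaration found on line `row`.
void declare( const std::string &name, int row, type t )
{
    assert( row > 0 );  // line numbers start at 1
    symbol_table[name] = var_data( row, t );
}
```

Calling declare("x", 3, integer) leaves one entry whose decl_row is 3; count("y") stays 0 for a name that was never declared.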

Step 3: Passing variable names from the lexer to the parser

To insert variable names and their data into the symbol table, the parser needs the names of the variables. We will obtain this information from the lexer and attach it to every terminal symbol that represents an identifier.

In bisonc++, every terminal and nonterminal symbol can have an associated semantic value. (Remember semantic values, or attributes, from the lectures!) Since different symbols can carry semantic values of different types, we declare a union listing the possible types. At first the union has only one member, because we only want to attach variable names to the identifier terminals. bisonc++ has its own syntax for declaring this union, and it turns the declaration into a real C++ union when the bisonc++ command is run.

%union
{
  std::string *name;
}
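%token <name> TOKEN_ID

The <name> tag in the token declaration tells bisonc++ that the semantic value of TOKEN_ID is stored in the name member of the union. The value itself has to be filled in when the token is read: whenever the lexer returns an identifier, the parser's lex() function (in Parser.ih) stores a copy of the matched text.

inline int Parser::lex()
{
    int ret = lexer.yylex();   // token code returned by the lexer
    if( ret == TOKEN_ID )
        d_val__.name = new std::string(lexer.YYText());
    return ret;
}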

Some explanation: the lexer's YYText() function returns the text matched for the current token, which for an identifier is the name of the variable. We create a C++ string from it on the heap. The parser's d_val__ data member holds the semantic value of the current token; its type is the union that we declared in the while.y file. We store the name of the identifier in the name member of this union.
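The following sketch imitates in plain C++ what the %union declaration and the lex() fragment do; it is an illustration, not the code bisonc++ actually generates. The union semantic_value, the constant TOKEN_ID = 258 and the function attach_value() are hypothetical stand-ins. The members are pointers because in C++03 a class with a non-trivial constructor, such as std::string, cannot be a union member at all; C++11 permits it, but then the union's constructors and destructor have to be written by hand.

```cpp
#include <cassert>
#include <string>

enum type { integer, boolean };

// Plays the role of the %union: all members share the same storage,
// and only the member matching the symbol's declared tag is meaningful.
union semantic_value
{
    std::string *name;
    type *expr_type;
};

// Hypothetical token code standing in for the generated TOKEN_ID constant.
const int TOKEN_ID = 258;

// Mimics the lex() fragment: only identifier tokens get a semantic value,
// a heap-allocated copy of the matched text.
void attach_value( int ret, const char *text, semantic_value &val )
{
    assert( text != nullptr );
    if( ret == TOKEN_ID )
    {
        val.name = new std::string( text );
    }
}
```

Whoever receives val.name owns the string and must eventually delete it.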

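Now we can access the variable names in the C++ code fragments (actions) attached to the grammar rules. In a rule a: A b C, the semantic values of A, b and C are $1, $2 and $3 respectively, and the semantic value of the left-hand side a is $$. For example, the following action prints the name of each declared integer variable; since $2 is a pointer to a string, it has to be dereferenced: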

declaration:
    TOKEN_INTEGER TOKEN_ID TOKEN_SEMICOLON
    {
        std::cout << "declared variable: " << *$2 << std::endl;
    }
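As a simplified model (not bisonc++'s generated code), the semantic values of a rule's right-hand side can be pictured as the top entries of a value stack at the moment the rule is reduced. The hypothetical function dollar() returns $k for a rule with n right-hand-side symbols:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Conceptual model only: when a rule with n right-hand-side symbols is
// reduced, their values are the top n stack entries, and $k is the k-th
// of them counted from the left (so $n is the topmost entry).
template <typename T>
T &dollar( std::vector<T> &value_stack, std::size_t n, std::size_t k )
{
    assert( 1 <= k && k <= n && n <= value_stack.size() );
    return value_stack[value_stack.size() - n + ( k - 1 )];
}
```

For the declaration rule n is 3, so $2 (the identifier) is the entry just below the top of the stack.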

Step 4: Finding re-declared and undeclared variables

Instead of writing the names of declared variables to the standard output, we now insert them into the symbol table. std::map provides operator[], which inserts a new element or retrieves an existing one. The line number of the declaration is taken from d_loc__.first_line, the parser's current location.

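    TOKEN_INTEGER TOKEN_ID TOKEN_SEMICOLON
    {
        symbol_table[*$2] = var_data( d_loc__.first_line, integer );
    }

To detect re-declared variables, the action must first check whether the name is already in the table, because operator[] would silently overwrite the earlier entry. The complete action therefore reads:

    TOKEN_INTEGER TOKEN_ID TOKEN_SEMICOLON
    {
        // report a re-declaration before the entry is written
        if( symbol_table.count(*$2) > 0 )
        {
            std::stringstream ss;
            ss << "Re-declared variable: " << *$2 << ".\n"
               << "Line of previous declaration: " << symbol_table[*$2].decl_row << std::endl;
            error( ss.str().c_str() );
        }
        symbol_table[*$2] = var_data( d_loc__.first_line, integer );
    }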

Explanation: we use a std::stringstream to assemble the parts of the error message. For this to work, the <sstream> standard header has to be included in semantics.h. The str() member function returns the string collected in ss. Since the error function (see Parser.ih) expects a C-style string rather than a std::string, we convert it with c_str().
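The re-declaration check, together with the check for undeclared variables that this step's title also mentions, can be sketched as two standalone functions. They are hypothetical helpers that return the error message instead of calling error(), so they can be tried outside the parser; the field name decl_type is again an assumption. The check for uses relies on count() rather than operator[], because operator[] would insert a default entry for an undeclared name and hide the error on its next use.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

enum type { integer, boolean };

struct var_data
{
    int decl_row;
    type decl_type;  // assumed field name
    var_data( int row = 0, type t = integer ) : decl_row( row ), decl_type( t ) {}
};

// Hypothetical helper for declarations: returns "" on success, otherwise the
// message the parser action would pass to error(). The first declaration is kept.
std::string declare( std::map<std::string, var_data> &table,
                     const std::string &name, int row, type t )
{
    assert( !name.empty() );
    auto it = table.find( name );
    if( it != table.end() )
    {
        std::stringstream ss;
        ss << "Re-declared variable: " << name << ".\n"
           << "Line of previous declaration: " << it->second.decl_row;
        return ss.str();
    }
    table[name] = var_data( row, t );
    return "";
}

// Hypothetical helper for uses of a variable (e.g. inside an expression):
// count() does not modify the table, unlike operator[].
std::string check_declared( const std::map<std::string, var_data> &table,
                            const std::string &name, int row )
{
    assert( !name.empty() );
    if( table.count( name ) == 0 )
    {
        std::stringstream ss;
        ss << row << ": Undeclared variable: " << name;
        return ss.str();
    }
    return "";
}
```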

Step 5: Type checking

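For type checking we need semantic values for expressions as well. These values are of type type (the enumeration declared in semantics.h), and, as with the names, the union stores a pointer to them. We add a new member to the %union and declare that the expression nonterminal uses it: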

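%union
{
  std::string *name;
  type *expr_type;
}
%type <expr_type> expression

The %type directive declares that the semantic value of the expression nonterminal is stored in expr_type, just as the <name> tag did for TOKEN_ID. In the action of the addition rule, $1 and $3 point to the types of the two operands; both must be integers, and the sum is always an integer:

if( *$1 != integer || *$3 != integer )
{
   std::stringstream ss;
   ss << d_loc__.first_line << ": Type error in addition." << std::endl;
   error( ss.str().c_str() );
}
$$ = new type(integer);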
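A standalone sketch of the addition check, with the parser's error() replaced by a hypothetical report_error() that collects messages in a vector, so the behaviour can be tested without bisonc++. As in the action above, the result type is integer even when an operand has the wrong type.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

enum type { integer, boolean };

// Hypothetical stand-in for the parser's error(): records the messages.
std::vector<std::string> reported_errors;

void report_error( const char *msg )
{
    reported_errors.push_back( msg );
}

// Sketch of the addition action: $1 and $3 become lhs and rhs, the returned
// pointer becomes $$. The caller owns both the operands and the result.
type *check_addition( const type *lhs, const type *rhs, int line )
{
    assert( lhs != nullptr && rhs != nullptr );
    if( *lhs != integer || *rhs != integer )
    {
        std::stringstream ss;
        ss << line << ": Type error in addition.";
        report_error( ss.str().c_str() );
    }
    return new type( integer );
}
```

Returning a valid type after an error lets the enclosing expression be checked as well, assuming error() does not stop the parse.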

Step 6: Fixing the memory leaks

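All semantic values were created with the new operator of C++, so each of them has to be freed with delete once it is no longer needed; otherwise the program leaks memory. At the end of an action, delete the semantic value $i of every right-hand-side symbol that has one and is not passed on:

delete $i;

For example, the declaration action can delete $2 after the name has been inserted into the symbol table, because the map stores its own copy of the string, and the addition action can delete $1 and $3 once their types have been checked. The value assigned to $$ must not be deleted in the same action: it is handed on to the enclosing rule, which becomes responsible for it.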
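To make the ownership rule concrete, here is a sketch in which a hypothetical counter, live_values, tracks how many type values are currently allocated. The hypothetical reduce_addition() plays the role of the addition action: it frees the operand values ($1 and $3) and allocates a fresh result for $$.

```cpp
#include <cassert>

enum type { integer, boolean };

// Hypothetical counter: number of type values currently on the heap.
int live_values = 0;

type *make_type( type t )
{
    ++live_values;
    return new type( t );
}

void free_type( type *p )
{
    assert( p != nullptr );
    --live_values;
    delete p;
}

// Sketch of the addition action with cleanup. The type check from Step 5
// would run first; afterwards the operands are no longer needed.
type *reduce_addition( type *lhs, type *rhs )
{
    free_type( lhs );   // delete $1
    free_type( rhs );   // delete $3
    return make_type( integer );  // becomes $$, owned by the enclosing rule
}
```

The rule that receives the result frees it in the same way, so after the whole expression has been processed the counter returns to zero.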