联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp

您当前位置:首页 >> Java编程Java编程

日期:2024-01-07 06:33

LW1

DA53: COMPILERS AND LANGUAGE

THEORY

Lexical Analysis

by hand

00 | Goal

The goal of this on-computer tutorial session is to write a small lexical analyzer (or lexer) by hand.

The development could be done in Java, C# or Python.

Remember that the lexical analyzer takes a source program, and scans it, and replies the recognized

tokens when they are requested by the parser.

Definitions :

• Lexeme : Sequence of characters that is recognized by the parser (usually expressed by

regular expression)

• Token : Symbolic name of a lexeme

During this on-computer tutorial, we must :

Define the language of the source program, its syntax and its semantic.

Write a symbol table

Write the lexer

01 | Language Description

The language to implement in this Labwork (LW) is a part of the Tiny Basic dialect of the BASIC

language.

01.1 Statements

The language is composed of statements, one per line. If this ST we do not recognize

multiple statements on the same line. A line must follow one of the following formats :

statement

number statement

where number is the number of the line, and statement is the instruction of the language.

Lexical

Analyzer Parser

Symbol

Table

tokens

syntax

tree

source

program

getNextToken

The supported statements are :

PRINT expression

▪ Print the given expression on the standard output

IF expression relational_operator expression THEN statement

▪ The statement is executed only if the given relational operator on the two

given expressions replies true.

▪ The relational operators are :

= equality

< less than

> greater than

<= less or equal

>= greater or equal

<>, >< not equal

GOTO expression

▪ Run the statement at the line de)ned by the expression.

INPUTvariable [, variable ...]

▪ Read variable values from the standard input .

LET variable = expression

▪ Assign the value of the expression to the variable.

GOSUB expression

▪ Run the statement at the line de)ned by the expression, but in opposite

than GOTO, remember the line of the GOSUB and run the statement just

after that line when the statement RETURN is encountered.

RETURN

▪ Run the statement at the line following the last GOSUB.

END

▪ Stop the program.

REM text

▪ Insert a comment in the source code.

01.2 Expressions

Many of these statement types involve the use of expressions. An expression is the

combination of one or more numbers, strings or variables, joined by operators, and

possibly grouped by parentheses.

There are four operators: +, -, *, /

Note that the operators + and – have two versions : the binary (a+b) and the unary (+a).

A string of characters is enclosed by the quote characters (").

01.3 Variable Definition and Scope

The variables do not need to be declared. The scope of all the variables is global. So that

a variable, when it is used, could be undened.

All the variables are of numerical type or of string type, depending on the type of the

expression assigned to the variable.

01.4 Example

Let the following Tiny Basic program :

5 LET S = 0

10 INPUT NUM

15 INPUT V

20 LET N = NUM

30 IF N <= 0 THEN GOTO 99

40 LET S = S + V

50 LET N = N - 1

70 GOTO 30

99 PRINT S/NUM

100 END

Your lexer must reply the following tokens, in that order :

<NUM,5> <LET> <ID,S> <=> <NUM,0>

<NUM,10> <INPUT> <ID,NUM>

<NUM,15> <INPUT> <ID,V>

<NUM,20> <LET> <ID,N> <=> <ID,NUM>

<NUM,30> <IF> <ID,N> <RELOP,<=> <NUM,0> <THEN> <GOTO> <NUM,99>

<NUM,40> <LET> <ID,S> <=> <ID,S> <OP,+> <ID,V>

<NUM,50> <LET> <ID,N> <=> <ID,N> <OP,-> <NUM,1>

<NUM,70> <GOTO> <NUM,30>

<NUM,99> <PRINT> <ID,S> <OP,/> <ID,NUM>

<NUM,100> <END>

02 | Exercices

02.1 Exercice 1

 What is the alphabet of the language ?

 What are the lexemes (as regular expressions) of the language ?

 Write the table that is matching the lexemes, the tokens, and the attributes of the tokens.

02.2 Symbol Table

 Create the class SymbolTable that permits to store several informations about the variables :

name, lexeme, line of the first occurrence, etc.

 This symbol table is a kind of map with the name of the variable as key and the description of

the variable as value.

02.2 Tokens

 Define the hierarchy of classes that is describing all the tokens of the language.

 Note that the token ID should point to an entry of the symbol table.

02.3 Scanner

 Create the Scanner class that takes an input stream (Reader is prefered in Java) as

parameter of its constructor.

 The scanner is able to read the characters from the stream, to compact white spaces when

possible.

 The scanner is able to reply the line in the source program at which the read character is

located.

 We recommend to provides to functions : char peek(), and char get(). The first replies the next

available character but does not consume it. The second replies the next available character

and consumes it to pass to the next available character.

02.4 Lexer

 Create the Lexer class and a function inside that is able to read a character from an input

stream.

 Create a Finite Automata (FA) or an ad-hoc (from-scratch) code that is recognizing the

language. If you are encountering problems to do the FA, you may take the FA at the end of

this document.

 Implements the FA or the ad-hoc in the function getNextSymbol() of your Lexer, and it should

use the Scanner.

 Remember that the Lexer should not reply a token for the comment statements.

02.5 Main Program

 Write the main program that creates the Symbol Table, and the Lexer on an input file.

 The main program invokes the getNextSymbol() function of the Lexer to retreive the tokens.

 The tokens are displayed on the standard output.

03 | Finite Automata

A Finite Automata (explained in Chapter 2) is able to recognized one or more tokens. When a state is

drawn with two circles, it means that a token is recognized and the Lexer could return it. If a star is

drawn on the side of a two-circle state, it means that it does not consume the character from the input

stream that has permits to reach this star-two-circle state.

03.1 Alphabet

The alphabet of the language is :

 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

 +, -, /, *

 a, b, c, d, …., x, y, z

 A, B, C, D, ….., X, Y, Z

 ., <, >, =, (, ), "

03.2 Finite Automata for Token ID

1 2 3 start letter other *

letter or digit

03.3 Finite Automata for Token RELOP

03.4 Finite Automata for Token NUM


LW2 | LW3

DA53: COMPILERS AND LANGUAGE

THEORY

Lexical Analysis

and

Syntax Analysis

00 | Goal

The goal of this on-computer tutorial session is to write a small interpreter for the Tiny Basic language.

The development must be done in Java.

JavaCC (https://javacc.github.io/javacc/) should be used. This library is also available on Teams.

You must download and complete a skeleton of Java project, downloadable from Teams.

Remember that the lexical analyzer takes a source program, and scans it, and replies the recognized

tokens when they are requested by the parser. The parser generates a syntax tree that can be

executed (interpretation mode).

During this on-computer tutorial, you will :

1. Define the grammar and the Syntax-Directed Definition (SDD) of the Tiny Basic language.

2. Write a symbol table

3. Write the lexer

4. Write the interpretation rules

Lexical

Analyzer Parser

Symbol

Table

tokens

syntax

tree

source

program

getNextToken

01 | Language Definition

The language to implement in this labwork is a part of the Tiny Basic dialect of the BASIC language.

01.1 Statements

The language is composed of statements, one per line. If this labwork we do not

recognize multiple statements on the same line. A line must follow one of the following

formats :

statement

number statement

where number is the number of the line, and statement is the instruction of the language.

The supported statements are :

PRINT expression

▪ Print the given expression on the standard output

IF expression relational_operator expression THEN statement

▪ The statement is executed only if the given relational operator on the two

given expressions replies true.

▪ The relational operators are :

= equality

< less than

> greater than

<= less or equal

>= greater or equal

<>, >< not equal

GOTO expression

▪ Run the statement at the line de(ned by the expression.

INPUTvariable [, variable ...]

▪ Read variable values from the standard input .

LET variable = expression

▪ Assign the value of the expression to the variable.

GOSUB expression

▪ Run the statement at the line de(ned by the expression, but in opposite

than GOTO, remember the line of the GOSUB and run the statement just

after that line when the statement RETURN is encountered.

RETURN

▪ Run the statement at the line following the last GOSUB.

END

▪ Stop the program.

REM text

▪ Insert a comment in the source code.

01.2 Expressions

Many of these statement types involve the use of expressions. An expression is the

combination of one or more numbers, strings or variables, joined by operators, and

possibly grouped by parentheses.

There are four operators: +, -, *, /

Note that the operators + and – have two versions : the binary (a+b) and the unary (+a).

A string of characters is enclosed by the quote characters (").

01.3 Variable Definition and Scope

The variables do not need to be declared. The scope of all the variables is global. So that

a variable, when it is used, could be undened.

All the variables are of numerical type or of string type, depending on the type of the

expression assigned to the variable.

01.4 Example

Let the following Tiny Basic program :

5 LET S = 0

10 INPUT NUM

15 INPUT V

20 LET N = NUM

30 IF N <= 0 THEN GOTO 99

40 LET S = S + V

50 LET N = N - 1

70 GOTO 30

99 PRINT S/NUM

100 END

02 | JavaCC

JavaCC is a Java library and a collection of CLI tools that is generating a lexer and a parser

written in Java.

02.1 Input file

The input file of JavaCC is a file with extension « .jj », named here «tinybasic.jj ».

The input file must start with the Java definition of the Parser (named TinyBasicParser). You may

put in this part all the functions and attributes that are invoked in the grammar rules.

After the Java definition, you must put the lexer rules and the grammar (BNF) rules.

options {

IGNORE_CASE = true;

STATIC = false;

}

PARSER_BEGIN(TinyBasicParser)

package fr.utbm.gi.lo46.tp2.parser;

import java.util.SortedMap;

import java.util.Map;

import java.util.List;

import java.util.TreeMap;

import java.util.ArrayList;

import fr.utbm.gi.lo46.tp2.context.*;

import fr.utbm.gi.lo46.tp2.symbol.*;

import fr.utbm.gi.lo46.tp2.syntaxtree.*;

public class TinyBasicParser {

private int basicLineNumber = 1;

private final SymbolTable symbolTable = new SymbolTable();

/** Replies the symbol table used by the parser.

* @return the symbol table.

*/

public SymbolTable getSymbolTable() {

return this.symbolTable;

}

}

PARSER_END(TinyBasicParser)

02.2 Lexer rule syntax

 Each lexer rule defines the lexeme that is recognized by the lexer. The recognized lexemes

are named tokens.

 Different types of lexer rules are available : SKIP, TOKEN, …

 The best practice in JavaCC is to define tokens only when they are associated to complex

regular expressions, ie. then an regular expression operator is used. For example, the lexeme

« <= » may not be defined in a token rule. Rather is should be directed typed in the BNF rules.

 The SKIP rule tells to the lexer to ignore several sequences of characters :

SKIP :

{

" "

| "\t"

| "\r"

}

 The TOKEN rule tells to the lexer the token to reply to the syntax analyzer :

TOKEN :

{

< CR: "\n" >

}

02.3 Parser rule syntax

 The grammar (BNF) rules defines the syntactic rules of the language.

 Each rule is defined like a kind of Java function (with a similar but different syntax) :

void ruleName() :

{

}

{

}

 The prototype of the rule has the same syntax as the prototype of a Java function. Return type

and parameters are possible. Indeed, a rule could return a value or take values as parameters

when it is invoked.

 The first block after the column character must contains any declaration of Java variable.

 The second block after the column character must contains the BNF rules (separated by the

« | » character).

 The BNF rule is a sequence of one of :

1. a token

• the value of the token could be retreived with an assign symbol (see

example below).

2. a BNF rule invokation

• it is a call to one of the BNF-rule functions, using the Java syntax.

• the value returned by the BNF-rule could be retreived with an assign symbol (see

example below).

• Parameters could be passed to the BNF-rules function.

3. a Java block

The following example is rule that is recognizing a parenthezed numerical expression, a

number literal, or an identifier.

private Number factor() :

{

Number value;

Token t;

}

{

"(" value = expression() ")"

{ return value; }

| t = <NUMBER>

{ return NumberUtil.parse(t.image); }

| t = <IDENTIFIER>

{ this.symbolTable.add(t.image,t.beginLine);

return this.symbolTable.get(t.image);

}

}

02.4 Run JavaCC

 On the CLI, type :

$> cd src/fr/utm/gi/lo46/tp2/parser

$> javacc tinybasic.jj

02 | Working Steps

02.1 Skeleton

 Download the skeleton, and install it into your Eclipse.

 The skeleton contains : a symbol table (SymbolTable), a context of interpretation

(ExecutionContext), an interpreter (Interpreter, and SyntaxTreeInterpreter), the main function

(TinyBasicInterpreter), and several abstract classes that correspond to the tree nodes of the

syntax tree.

 Takes a look on the code.

02.2 JavaCC Scanner and Lexer Definition

 Write the JavaCC input file that is corresponding to the Lexer (the token).

02.3 Complete the JavaCC Definition

 Write the grammar inside your JavaCC input file (the BNF rules with the Java blocks empty).

02.4 Parse Tree Node

 Create classes for all the nodes that are required in a parse tree dedicated to the Tiny Basic

language.

 Each node of the parse tree must corresponds to an element of the language.

 Each statement node should be linked to child nodes that are corresponding to the

parameters of the statement.

02.5 Write the SDD in JavaCC

 Update your JavaCC input file with the SDD rules that permit to :

 check the types

 generate the parse tree for the Tiny Basic program.

02.6 Interpretation of the Parse Tree

 Add in each node of your parse tree, the function run(...) and evaluate(...) which may be

invoked to run the parse tree as in an interpreter, and to compute the value of an expression.

02.7 Extension of the Language

 Update your compiler implementation to support the new statements :

 Statement : for id = num to num step num statements next id

 Arrays as id ( expression )

 The new keyword that is replying the size of an array : LEN ( id )

 Statement : while ( expression relop expression ) do statements wend

LW4

DA53: COMPILERS AND LANGUAGE

THEORY

Intermediate

Code

Generation

00 | Goal

The goal of this on-computer tutorial session is to write a small compiler that is able to compile a Tiny

Basic program and generate three-address code program.

The development must be done in Java.

You must reuse your works from the previous on-computer tutorial sessions.

01 | Working Steps

01.1 Three-Address Code

 Select the three-address instructions to support and put them in the ThreeAddressInstruction

enumeration.

 Write the class ThreeAddressRecord. It is a record that contains the three-address instruction,

the parameters (String), and the result (String). A record may also be associated to a label,

and optionally to a comment. Note that during this tutorial, you should not compute the

addresses of the instructions and of the variables (this computation will be done during

the tutorial TU5).

 Write the class ThreeAddressCode. It is a list of three-address records in one hand. In the

other hand, it should permit to :

1. create temporary variables ;

2. maintain a mapping between a TinyBasic line and a three-address code line ;

3. access to the symbol table ;

4. create labels.

01.2 Three-Address Code Generation

 Update classes of the syntax tree with the generate() function. This function put in the given

ThreeAddressCode all the three-address statements that may corresponds to the syntax tree

node.

 Note that the gosub and goto statements take a TinyBasic line number. You must,

dynamically, retreive the three-address code line that is corresponding to a given TinyBasic

line. This mapping is a part of the generated code by the ThreeAddressCode class.

 Note that the symbol table may contains string literals (constants). You may define the threeaddress variables that are corresponding to these constants prior to any execution of the

program.


版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp