Assignment 2 Description
Assignment 2 - Writing an Assembler
Weighting and Due Dates
? Marks for this assignment contribute 5% of the overall course
mark.
? Marks for functionality will be awarded automatically by the
web submission system.
? Due dates: Milestone - 11:55pm Tuesday of week 9, Final -
11:55pm Friday of week 9.
? Late penalties: For each part, the maximum mark awarded will
be reduced by 25% per day / part day late. If your mark is
greater than the maximum, it will be reduced to the maximum.
? Core Body of Knowledge (CBOK) Areas: abstraction, design,
hardware and software, data and information, and programming.
Project Description
In this assignment you will complete a variation of project 6 in the
nand2tetris course. A detailed description of Nand2Tetris Project 6
tailored to this course is shown below. In this assignment the
assembler will be written as two separate programs. The executable
program, parser, will read a Hack Assembly Language program from
standard input and produce an abstract syntax tree on standard
output. The executable program, translator, will read the abstract
syntax tree and assemble a machine code representation of the
original Hack Assembly Language program. The assembled code will be
formatted as sixteen zeros or ones per line and it will be written to
standard output.
SVN Repository
You must create a directory in your svn repository named:
<year>/<semester>/cs/assignment2. This directory must only contain
the following files and directories - the web submission system will
check this:
? Makefile - this file is used by make to compile your
programs - do not modify this file.
? parser.cpp - C++ source file
? translator.cpp - C++ source file
? my*.cpp C++ source files with names that start with my
? my*.h C++ include files with names that start with my
? bin - this directory contains precompiled programs and scripts
- do not modify this directory.
? lib - this directory contains precompiled components - do not
modify this directory.
? includes - this directory contains .h files for precompiled
classes - do not modify this directory.
? tests - this directory contains test data, you can add your
own tests here
Note: if the lib/lib.a file does not get added to your svn
repository you will need to explicitly add it using:
% svn add lib/lib.a
Submission and Marking Scheme
This assignment has two assignments in the web submission system
named: Assignment 2 - Milestone Submissions and Assignment 2 -
Final Submissions. The assessment is based on "Assessment of
Programming Assignments".
Assignment 2 - Milestone Submissions: due 11:55pm Tuesday of
week 9
The marks awarded by the web submission system for the milestone
submission contribute up to 20% of your marks for assignment
2. Your milestone submission mark, after the application of late
penalties, will be posted to the myuni gradebook when the assignment
marking is complete.
Your programs must be written in C++ and will be tested using Hack
Assembly Language programs that that may or may not be syntactically
correct. Although a wide range of tests may be run, including a
number of secret tests, marks will only be recorded for those tests
that are syntactically correct and do not use symbols. Your programs
will be compiled using the Makefile included in the zip file
attached below. The parser program will be compiled using the file
parser.cpp and the translator program will be compiled using the
file translator.cpp file. In both cases any .cpp files with names
starting with my will be also included together with the precompiled
library functions.
Assignment 2 - Final Submissions: due 11:55pm Friday of week
9
The marks awarded for the final submission contribute up to 80% of
your marks for assignment 2.
Your final submission mark will be the geometric mean of the marks
awarded by the web submission system, a mark for your logbook and a
mark for your code. It will be limited to 20% more than the marks
awarded by the web submission system. See "Assessment - Mark
Calculations" for examples of how the marks are combined.
Your final submission mark, after the application of late penalties,
will be posted to the myuni gradebook when the assignment marking is
complete.
Automatic Marking
The automatic marking will compile and test both of your programs in
exactly the same way as for the milestone submission. The difference
is that marks will be recorded for all of the tests including
the secret tests. Note: if your programs fail any of these
secret tests you will not receive any feedback about these
secret tests, even if you ask!
Logbook Marking
Important: the logbook must have entries for all work in this
assignment, including your milestone submissions. See "Assessment -
Logbook Review" for details of how your logbook will be assessed.
Code Review Marking
For each of your programming assignments you are expected to submit
well written code. See "Assessment - Code Review" for details of
how your code will be assessed.
Nand2Tetris Project 6: The Assembler
Background
Low-level machine programs are rarely written by humans. Typically,
they are generated by compilers. Yet humans can inspect the
translated code and learn important lessons about how to write their
high-level programs better, in a way that avoids low-level pitfalls
and exploits the underlying hardware better. One of the key players
in this translation process is the assembler -- a program designed to
translate code written in a symbolic machine language into code
written in binary machine language.
This project marks an exciting landmark in our Nand to Tetris
odyssey: it deals with building the first rung up the software
hierarchy, which will eventually end up in the construction of a
compiler for a Java/C++ like high-level language. The relevant
reading for this project is Chapter 6. Some of the useful tools
available include, the Hack Assembler, the CPU Emulator and working
versions of the two programs, bin/working-parser and bin/workingtranslator.
Objective
The Hack assembler is a relatively simple program however, so that
you can gain experience with the tools used in other workshops and
assignments, you will build your assembler from three parts. This
will involve using a precompiled tokeniser for the Hack assembly
language to implement a parser that recognises labels, A-instructions
and C-instructions using tokens returned by the tokeniser. The
parser will construct a tree representation of the program that the
translator will walk over in order to assemble the final machine
code.
If you wish to create additional source files that can be used by
your programs the .cpp and .h files must have names that start
with my. All my*.cpp files will be compiled as part of both
you parser and translator programs.
Compiling and Running Your Programs
You programs can be compiled with one of the following three
commands. To compile both:
% make
To compile just parser:
% make parser
To compile just translator:
% make translator
To test your programs you can use the following command:
% make
Testing student assembler-m against hack files
Checking "./assembler-m < tests/AddL.asm | diff - tests/AddL.hack" -
test failed
...
The test scripts do not show the program outputs, just passed or
failed, but they do show you the commands being used to run each
test. You can copy-paste these commands if you want to run a
particular test yourself and see all of the output.
If you wish to add new tests, create appropriately named files in the
tests directory and then use the command:
% make test-add
Note: Do not modify the Makefile or the subdirectories
includes, bin and lib. They will be replaced during
testing by the web submission system.
The Tokeniser
You must use the precompiled tokeniser functions provided by the
library in the zip file attached below. The tokeniser functions are
described in the files includes/a-tok.h and include/tokeniser.h.
This tokeniser returns the followings tokens:
Token
Class
Definition Example Token
ak_address ::= '@' ( name | number ) @hello ak_name
ak_label ::= '(' name ')' (END) ak_label
ak_dest ::= 'MD' | 'AM' | 'AD' | 'AMD' Am ak_dest_AM
ak_register ::= 'A' | 'M' | 'D' a ak_register_A
ak_alu_op ::= '0' | '1' | '-1' | '!D' | '!A' |
'-D' | '-A' | 'D+1' | 'A+1' | 'D-
1' | 'A-1' | 'D+A' | 'D-A' |'A-D'
| 'D&A' | 'D|A' | '!M' | '-M' |
'M+1' | 'M-1' | 'D+M' | 'D-M' |
'M-D' | 'D&M' | 'D|M'
1 ak_alu_1
ak_jump ::=
'JMP' | 'JLT' | 'JLE' | 'JGT' |
'JGE' | 'JEQ' | 'JNE' JGT ak_jgt
ak_assign ::= '=' = ak_assign
ak_semi ::= ';' ; ak_semi
ak_null ::= 'NULL' nuLl ak_null
Additional
Rules
Definition Example Text
name ::= letter ( letter | digit )* "_he:82m.Uch$"
number ::= digit digit* "99"
letter ::=
'a'-'z' | 'A'-'Z' | '$' | '_' |
':' | '.' "$"
digit ::= '0'-'9' "1"
Notes:
? if an error occurs or the end of input is reached, the token
ak_eoi is returned
? it is an error to find a character that cannot be part of
token or is not a space " ", tab "\t", carriage return "\r" or
newline "\n"
? single line comments that start with "//" and finish at the
next newline character "\n"
? when searching for the start of the next token all spaces,
tabs, carriage returns, newlines and comments are ignored
? newlines are not significant to the tokeniser so more than one
instruction can be on the same line or an instruction could be
split over multiple lines
? name, number, letter and digit are never returned as token
classes
? letters in a name are case sensitive, all other occurrences of
letters are converted to uppercase
? in a definition the or operator | separates
alternative components
? in a definition the round brackets ( ) which are not inside
single quotes are for grouping components of token
? in a definition the star character * indicates that the
preceding component of a token may appear 0 or more times
The Assembler Parser
The following table describes the structure of an assembly language
program that must be translated into machine code. Note: there are no
references to lines in these rules. All whitespace, including newline
characters, and comments will have been removed by the
precompiled tokeniser. Therefore these rules are defined purely in
terms of the token classes in the previous table.
Term Definition
program ::=
( ak_label | ak_address | Cinstr
)* ak_eoi
C-instr ::= [destination] aluop [ajump]
destination ::=
( ak_null | ak_dest | ak_register )
ak_assign
aluop ::= ak_register | ak_alu_op
ajump ::= ak_semi ( ak_null | ak_jump )
Notes:
? in a definition eof indicates the end of the input
? in a definition the or operator | separates
alternative components
? in a definition the round brackets ( ) are for grouping
components
? in a definition the square brackets [ ] which indicates that
the enclosed components may appear 0 or once
? in a definition the star character * indicates that the
preceding component may appear 0 or more times
Abstract Syntax Tree
An abstract syntax tree is just a tree like data structure that holds
a representation of a program. In this case the parser program must
construct an object of class an_program_node and append all labels
or instructions to it. Each tree class can only be manipulated by
pointers and has a typedef for a pointer to the class so the *
operator does not need to be used very often. Each class for which
you are allowed to create objects also has a create function, for
example to create an an_program_node you would call the function
an_program_node_create(). To accommodate situations where pointers
of different classes can be used, each class has a function to
convert the pointer type after checking that it is appropriate. For
example, a pointer to an an_c_instruction can be used anywhere a
pointer to an an_instruction is required and the conversion function
is named to_an_instruction(). If the checks fails your program will
exit immediately.
Parser
The skeleton parser.cpp file has most of the parsing structure
already written, you just need to complete the object creation and
parsing of labels and individual instructions. The main() function of
the parser program prints out an XML representation of the abstract
syntax tree using the precompiled library function an_print_as_xml().
You do not need to know how to write out XML. XML is produced by
the parser because it is easy to read by humans.
If your parser encounters any errors, it must exit immediately and
have not produced any output. Examples of errors include, multiple
definitions of a label, an A-instruction with a number that is too
large, or a C-instruction with multiple destination components,
multiple jump components or no ALU component. In this two program
structure, the parser will not be able to directly detect the first
two kinds of errors. Multiple definitions of a label will not be
detectable until the translator walks the abstract syntax tree and a
number that is too large will cause the tokeniser to report that it
is at the end of the input. The other errors can be detected by the
parser and should result in an immediate exit with no output.
The iobuffer.h include file gives details of functions you can use
to manage when output is produced by your programs.
Translator
The skeleton translator.cpp file includes a loop that walks over the
abstract syntax tree produced by the parser. The translator program
uses the precompiled function an_parse_xml(), to read the output of
the parser program. You do not need to know how to read in and
interpret XML. The later workshops and assignments will use this
technique to allow their compilers to be written as separate
programs.
One advantage of this two program approach is that walking over the
tree more than once is really simple. This is important in the
translator when it comes to resolving symbols. When a program
includes an A-instruction such as @somename you cannot tell if
somename is a label or a variable until the entire program has been
inspected because the label definition may come later. This will be
true for jumps to a later part of a program. We solve this problem by
walking over the abstract syntax tree and record all the labels that
we can find. This is Pass 1. When we walk over the abstract syntax
tree a second time, Pass 2, every label name will already have been
defined. Therefore, all undefined names found in Pass 2 must be
variable names.
If your translator encounters any errors, it must exit immediately
and have not produced any output. It may be interesting to reflect on
why we did not include missing label definitions amongst the list of
errors to be detected.
Lookup Tables
Your translator program will need a way of generating the correct set
of bits for each part of a C-instruction. For example, "JMP" needs to
be mapped to "111". To do this you will need some sort of lookup
table to record these mappings. The translator program also needs a
lookup table so that it can implement a symbol table to record label
addresses and the memory locations for variables. You should use the
precompiled symbol table classes described in the includes/symbols.h
file to create any lookup tables that you require.
The includes/symbols.h file includes examples of how to use these
lookup tables.
You will need several lookup tables, one for the symbol table and one
for each component of a C-instruction. Although most parts of a Cinstruction
are unique, the M, A and D registers need to be mapped to
different sets of bits depending on whether they are used as an alu
operation or a destination. For example, "A" as a destination would
be "100" but as an alu operation it would be "0110000".
Test Programs
In addition to the tests in tests directory, we will use a number of
secret tests that may contain illegal tokens and / or multiple labels
and instructions on the same line and / or syntactic errors. Note:
these tests are secret, if your programs fail any of these
secret tests you will not receive any feedback about these
secret tests, even if you ask!
The Pong program supplied in the tests directory was written in the
Java-like high-level Jack language and translated into the Hack
assembly language by the Jack compiler and a Hack Virtual Machine
translator. (The Virtual Machine, Jack and the Jack compiler are
described in Chapters 7-8, 9 and 10-11, respectively). Although the
original Jack program is only about 300 lines of Jack code, the
executable Pong code is naturally much longer. Running this
interactive program in the supplied CPU Emulator is a slow affair, so
don't expect a high-powered Pong game. This slowness is actually a
virtue, since it enables your eye to track the graphical behavior of
the program. And don't worry! as we continue to build the software
platform in the next few projects, Pong and and other games will run
much faster.
Startup Files
版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。