联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp

您当前位置:首页 >> C/C++编程C/C++编程

日期:2023-06-22 11:32

13:26 COMP9319 2023T2 Assignment 1

1/7

COMP9319 2023T2 Assignment 1: LZW

Encoding and Decoding

Your task in this assignment is to implement an LZW encoder and its decoder with 15-bit

32768 dictionary entries (excluding those entries for the individual ASCII characters),

called lencode and ldecode, in C or C++. After the dictionary is full, no new entries can be

added. You may assume the source file may contain any 7-bit ASCII characters.

Both lencode and ldecode accept two commandline arguments: a path to the input file,

followed by a path to the output file. The output file generated by ldecode should be

identical to the corresponding source file of its encoded input, similarly, its encoded input

file should be obtainable and generated by lencode using that corresponding source file as

its input file. This is illustrated by the following example:

%grieg> lencode ~cs9319/a1/test1.txt test1.encoded

%grieg> ldecode test1.encoded test1.decoded

%grieg> diff ~cs9319/a1/test1.txt test1.decoded

%grieg>

In the encoded file, each ASCII character is stored in one byte with the most significant

bit marked as 0, and each dictionary index address is represented as two bytes with the

most significant bit marked as 1. When an index is read from the encoded file, the most

13:26 COMP9319 2023T2 Assignment 1

2/7

significant byte will be read first (so we can determine if we are reading an ASCII or an

index). The example used in the next section will elaborate on this further.

Examples and Testing

Some example source files and their encoded files are available at ~cs9319/a1 of any CSE

Linux machine. A sanity test script is also available there.

For example, consider a source file test1.txt:

cs9319@grieg:~/a1$ cat test1.txt

^WED^WE^WEE^WEB^WET

You can inspect its expected LZW encoded file test1.lzw using xxd:

cs9319@grieg:~/a1$ xxd -b test1.txt

00000000: 01011110 01010111 01000101 01000100 01011110 01010111 ^WED^W

00000006: 01000101 01011110 01010111 01000101 01000101 01011110 E^WEE^

0000000c: 01010111 01000101 01000010 01011110 01010111 01000101 WEB^WE

00000012: 01010100 T

cs9319@grieg:~/a1$ xxd -b test1.lzw

00000000: 01011110 01010111 01000101 01000100 01011110 01010111 ^WED^W

00000006: 01000101 10000000 00000100 01000101 01011110 01010111 E..E^W

0000000c: 01000101 01000010 10000000 00000100 01010100 EB..T

cs9319@grieg:~/a1$

13:26 COMP9319 2023T2 Assignment 1

3/7

I inspected both the source file and its encoded file using xxd to show you their

differences. In particular, the binary value 10000000 00000100 that corresponds to the index

value of 4 replaces ^WE twice in the encoded file.

You can figure out the corresponding index values by tracing the LZW algorithm

presented in the Slide 7-17 of Lecture Week 2. Following the same variable tracing as the

Slides, i.e., (p, c, output, index, symbol), the content of test1.lzw can be derived similarly

and illustrated in the following table:

NIL ^

^ W ^ 0 ^W

W E W 1 WE

E D E 2 ED

D ^ D 3 D^

^ W

^W E ^W 4 ^WE

E ^ E 5 E^

^ W

^W E

^WE E 4 6 ^WEE

E ^

E^ W E^ 7 E^W

W E

WE B WE 8 WEB

B ^ B 9 B^

^ W

^W E

13:26 COMP9319 2023T2 Assignment 1

4/7

^WE T 4 10 ^WET

T EOF T

When compared with the Slide 7-17 of Lecture Week 2, the above table has two

differences. Firstly, the dictionary index starts from 0 instead of 256, as the ASCII

character code and dictionary index no longer share the same address space. Secondly,

note the output of row 7 is the actual characters instead of a dictionary index (i.e., 256 in

the Lecture Slide or 0 in the above table). Since two bytes are used to represent an index,

storing an index instead of two characters is not space saving. Therefore, an index to a

dictionary entry will only be output when its entry is holding at least 3 characters. As a

result, the encoded file is always smaller than or equal to the size of its source file. When

their sizes are equal, the encoded file is in fact identical to its source file (for example,

refer to test2.txt and test2.lzw in ~cs9319/a1).

To run the sanity test script on grieg, simply go inside the folder that contains your

program source files and type: ~cs9319/a1/autotest that will run tests based on example

files provided there. You should run the sanity test script before you submit your solution.

Documentation and Code Readability

Your source code may be manually inspected. Marks may be deducted if your code is

very poor on readability and ease of understanding.

13:26 COMP9319 2023T2 Assignment 1

5/7

Marking

This assignment is worth 15 points, all based on auto marking.

Your solution will be compiled and tested on grieg (i.e., grieg.cse.unsw.edu.au), a particular

CSE Linux machine. It may call and use any C or C++ functions or libraries available on

grieg.

You should submit totally two C or C++ files (and must be named using the file extension

.c and .cpp respectively), one for encoding and one for decoding. Your submitted files will

be compiled using the following commands depending on their file extensions:

%wagner> gcc -o lencode lencode.c

%wagner> gcc -o ldecode ldecode.c

or:

%wagner> g++ -o lencode lencode.cpp

%wagner> g++ -o ldecode ldecode.cpp

Any solution that fails to compile on grieg with the above commands will receive zero

points for the entire assignment. Your submission will not be tested with huge files, but it

is expected to work properly and correctly for files up to a few hundred kilobytes.

13:26 COMP9319 2023T2 Assignment 1

6/7

Performance

Any single test (e.g., lencode test1.txt test1.enc) that takes more than 5 seconds on grieg

will be terminated, and your solution will receive zero points for that test. So, your

dictionary lookup cannot be too slow (e.g., linear search may not make it).

Submission

Deadline: Monday 26th June 12:00pm AEST (noon). The penalty for late submission of

assignments will be 5% (of the worth of the assignment) subtracted from the raw mark

per day of being late. In other words, earned marks will be lost. No assignments will be

accepted later than 5 days after the deadline. Please read the Assignments section in

the course outline for details.

Use the give command below to submit the assignment or submit via WebCMS3:

give cs9319 a1 lencode.c ldecode.c

or:

give cs9319 a1 lencode.cpp ldecode.cpp

Please use classrun to make sure that you have submitted the correct file(s).

13:26 COMP9319 2023T2 Assignment 1

7/7

9319 classrun -check a1

Note: Unfortunately the give and classrun commands are not available on grieg, but are

on any other CSE linux machine, so you'll have to use these other machines to run the

give/classrun command.

Plagiarism

The work you submit must be your own work. Submission of work partially or completely

derived from any other person or sources is not permitted. The penalties for such an

offence may include negative marks, automatic failure of the course and possibly other

academic discipline. Assignment submissions will be examined both automatically and

manually for such submissions.

Please read the Student Conduct section in the course outline for details.


版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp