联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp

您当前位置:首页 >> C/C++编程C/C++编程

日期:2020-11-22 04:54

Project CS3103 Operating Systems Version 2.1 (20201021)

1 CS3103 - Operating Systems

Project: Parallel Zip

Due Date: Tuesday, November 24, 2020 at 8 PM HKT. Late submissions will be penalized as per syllabus.

I. Project Instructions

Overview

In the earlier programming assignment, you implemented a simple compression tool based on run-length

encoding, known simply as czip. For this project, you'll implement something similar, except you'll use

threads to make a parallel version of czip. We'll call this version pzip.

There are three specific objectives to this project:

• To familiarize yourself with the Linux pthreads library for writing multi-threaded programs.

• To learn how to parallelize a program.

• To learn how to program for performance.

Project Organization

Each group should do the following pieces to complete the project. Each piece is explained below:

• Design

• Implementation

• Evaluation

30 points

30 points

40 points

You’re required to submit a project report which concisely describes your design, implementation, and

experimental results.

Design

The design space of building an effective compression tool is large. A practical zip that achieve a better

performance will require you to address (at least) the following issues (see part II. Project Description for

more details):

• How to parallelize the compression.

• How to determine the number of threads to create.

• How to efficiently compress multiple files in parallel.

• How to access the input file efficiently.

In your project report, please describe the main techniques or mechanisms proposed to parallelize the

compression. List and describe all the functions used in this project.

2 CS3103 - Operating Systems

Implementation

Your code should be nicely formatted with plenty of comments. Each function should be preceded by a

header comment that describes what the function does. The code should be easy to read, properly indented,

employ good naming standards, good structure, and should correctly implement the design. Your code

should match your design.

Evaluation

We provide 10 test cases for you to test your code. Before submitting your work, please run your pzip on the

test cases and check if your pzip zips input files correctly. Time limitation is set for each test case, and if this

limit is exceeded, your test will fail. For each test case, your grade will be calculated as follows:

• Correctness. Your code will first be measured for correctness, ensuring that it zips input files correctly.

You will receive full points if your solution passes the correctness tests performed by the test script. You

will receive zero points for this test case if your code is buggy.

• Performance. If you pass the correctness tests, your code will be tested for performance; Test script will

record the running time of your program for performance evaluation. Shorter time/higher performance

will lead to better scores.

In your project report, summary and analyze the results. You can also compare your solution with the provided

baseline implementation.

Tips: Keep a log of work you have done. You may wish to list optimizations you tried, what failed, etc.

Keeping a good log will make it easy to put together your final writeup.

Bonus

You’re encouraged to be creative and innovative, and this project award bonus points (up to 10 points) for

additional and/or exemplary work.

• New ideas/designs are welcome to fully explore the parallelism of the process of zipping files.

Additional test cases will be used to test your solution for performance evaluation.

• To encourage healthy competition and desire to improve, we'll provide a scoreboard that shows scores

for each group. The latest scores are displayed, rank ordered, on the scoreboard. We will award bonus

points for top 10 groups. Further details will be posted on Canvas soon.

Language/Platform

The project should be written in ANSI standard C.

This project can be done on Linux (recommended), MacOS, or Windows using Cygwin. Since grading of

this project will be done using gateway Linux server, students who choose to develop their code on any

other machine are strongly encouraged to run their programs on the gateway Linux server before turning it

in. There will be no points for programs that do not compile and run on gateway Linux server, even if they

run somewhere else.

3 CS3103 - Operating Systems

Handing In

The project can be done individually, in pairs, or in groups, where each group can have a maximum of three

members. All students are required to join one project group in Canvas: "People" section > "Groups" tab >

“Project” Group > Group 01–Group 40 or Individual 41–Individual 50. Contact TA to add additional groups

if necessary. Self sign-up is enabled for these groups. Instead of all students having to submit a solution to the

project, Canvas allows one person from each group to submit on behalf of their team. If you work with

partner(s), both you and your partner(s) will receive the same grade for the entire project unless you explicitly

specify each team member's contribution in your report. Please be sure to indicate who are in your group when

submitting the project report.

Before you hand in, make sure to add the requested identifying information about your project group, which

contains project group number, full name and e-mail address for each group member.

When you're ready to hand in your solution, go to the course site in Canvas, choose the "Assignments" section >

"Project" group > "Project" item > "Submit Assignment" button and upload your files, including the following:

1) A PDF document which concisely describes your design, implementation, and experimental results;

If you are working in a team, please also describe each team member's contribution.

2) The source file, i.e., pzip.c;

Academic Honesty

All work must be developed by each group separately. Please write your own code. All submitted source

code will be scanned by anti-plagiarism software. If the code does not work, please indicate in the report

clearly.

Questions?

If you have questions, please first post them on Canvas so others can get the benefit of the TA's answer.

Avoid posting code that will give away your solution or allow others to cheat. If this does not resolve your

issue, contact the TA (Mr. LI Jinheng<jinheng.li@my.cityu.edu.hk>).

Acknowledgements

This project is taken and modified from the OSTEP book written by Remzi and Andrea Arpaci-Dusseau at

the University of Wisconsin. This free OS book is available at http://www.ostep.org. Automated testing scripts

are from Kyle C. Hale at the Illinois Institute of Technology.

Disclaimer

The information in this document is subject to change with notice via Canvas. Be sure to download the latest

version from Canvas.

4 CS3103 - Operating Systems

II. Project Description

For this project, you will implement a parallel version of zip using threads. First, recall how zip works by

reading the description in Assignment 1 Part II. You'll use the same basic specification, with run-length

encoding (RLE) as the basic technique.

RLE is quite simple: when you encounter n characters of the same type in a row, the compression tool (pzip)

will turn that into the number n and a single instance of the character.

Thus, if we had a file with the following contents:

aaaaaaaaaabbbb

The tool would turn it (logically) into:

10a4b

However, the exact format of the compressed file is quite important; here, you will write out a 4-byte integer

in binary format followed by the single character in ASCII. Thus, a compressed file will consist of some

number of 5-byte entries, each of which is comprised of a 4-byte integer (the run length) and the single

character. To write out an integer in binary format (not ASCII), you should use fwrite(). Read the man

page for more details. For pzip, all output should be written to standard output (the stdout file stream).

Your pzip will externally look the same as czip. However, internally, the program will use POSIX threads

to parallelize the compression process. The general usage from the command line will be as follows:

$ ./pzip file.txt > file.z

Doing so effectively and with high performance will require you to address (at least) the following issues:

• How to parallelize the compression. The central challenge of this project is to parallelize the

compression process. You are required to think about whether the compression process can be separated

into several sub-processes, what sub-processes can be done in parallel, and what sub-processes must be

done serially by a single thread. Then, you are required to design your parallel zip as appropriate. For

example, does it possible to zip several small sub-files using multiple threads instead of zipping a large

file using only one thread? If it's possible, how to divide the large file? How to zip those small sub-files

using multiple threads? How to merge zipped results of several small sub-files? One interesting issue that

the "best" implementations will handle is this: what happens if one thread runs much slower than another?

Does the compression give more work to faster threads? This issue is often referred to as the straggler

problem.

• How to determine the number of threads to create. On Linux, the determination of the number of

threads may refer to some interfaces like get_nprocs() and get_nprocs_conf(); You are suggested

to read the man pages for more details. Then, you are required to create an appropriate number of threads

to match the number of CPUs available on whichever system your program is running.

5 CS3103 - Operating Systems

• How to efficiently compress multiple files in parallel. In previous issues, you may have completed the

parallel compression for one large file. Now you are required to think about how to parallelize the

compression processes of multiple files. A naïve way is to sequentially process the parallel compression

process of each file. However, this method cannot fully explore the parallelism of the compression

processes of multiple files. You are required to explore the parallelism between the compression processes

of multiple files and design an efficient and fast parallel method to compress multiple files. Note that

when the input contains directories, you can first obtain the paths of all files in the directories recursively

using readdir(), then compress them as multiple files.

• How to access the input file efficiently. On Linux, there are many ways to read from a file, including C

standard library calls like fread() and raw system calls like read(). One particularly efficient way is

to use memory-mapped files, available via mmap(). By mapping the input file into the address space, you

can then access bytes of the input file via pointers and do so quite efficiently.

To understand how to make tackle these problems, you should first understand the basics of thread creation,

and perhaps locking and signaling via mutex locks and condition variables. Review the tutorials and read the

following chapters from OSTEP book carefully in order to prepare yourself for this project.

• Intro to Threads

• Threads API

• Locks

• Using Locks

• Condition Variables

6 CS3103 - Operating Systems

III. Project Guidelines

1. Getting Started

The project is to be done on the CSLab SSH gateway server, to which you should already be able to log in.

As before, follow the same copy procedure as you did in the previous tutorials to get the project files (code

and test files). They are available in /public/cs3103/project/ on the gateway server. project.zip contains

the following files/directories:

/project

├── Makefile

├── pzip <- A sample solution (executable file).

├── pzip.c <- Modify and hand in pzip.c file.

├── README.txt

└── tests

├── bin

│ ├── generator.py

│ ├── generic-tester.py

│ ├── serialized_runner.py

│ └── test-pzip.csh

├── config

│ ├── 1.json

│ ├── ...

│ └── 10.json

├── stdout

│ ├── 1.out

│ ├── 1.rc

│ ├── ...

│ └── 10.rc

└── tests-pzip

├── 1

│ └── 1-0.in

├── 2

│ ├── 2-0.in

│ ├── 2-1.in

│ ├── ...

│ └── 2-11.in

├── ...

└── 10

├── 10-0

│ ├── 10-0-0.in

│ ├── 10-0-1.in

7 CS3103 - Operating Systems

│ └── 10-0-2.in

├── 10-1

│ ├── 10-1-0.in

│ ├── 10-1-1.in

│ └── 10-1-2.in

├── 10-2.in

├── ...

└── 10-7.in

Start by copying the provided files to a directory in which you plan to do your work. For example, copy

/public/cs3103/project/project.zip to your home directory, extract the files from the ZIP file with the unzip

command. Note that the file size of project.zip is only 10MB, but the uncompressed project directory has a

size of 5GB. It takes about 90 seconds to unzip the project.zip file on our gateway server. After the unzip

command extracts all files to the current directory, change to the project directory and take a look at the

directory contents:

$ cd ~

$ cp /public/cs3103/project/project.zip .

$ unzip project.zip

$ cd project

$ ls

A sample pzip is also provided (we only provide a single executable file without source code). This pzip

uses pthread to support the parallel compression of multiple files or directories. When the input files of the

compression process contain directories, the pzip first obtains the paths of all files in the directories

recursively using readdir(), then treats the compression process as the compression of multiple files. For

the multiple files’ compression, the pzip uses mmap to map files into multiple pages and compress all pages

in parallel. The parallel compression process can be treated as producer-consumer problem. The pzip uses

one thread of producer to map files and multiple threads of consumers to compress pages.

You can run and test pzip by using the make run command.

$ make run

TEST 1 - single file test, a small file of 10 MB (2 sec timeout)

Test finished in 0.045 seconds

RESULT passed

(content removed for brevity)

TEST 10 - mixed test 3, two directories that contain three small files of

10 MB, 20 MB, 10 MB and three large files of 200 MB, 100 MB, 200 MB, and

six small files outside directory of 10 MB, 20 MB, 10 MB, 20 MB, 10 MB,

20 MB (2 sec timeout)

Test finished in 0.252 seconds

RESULT passed

8 CS3103 - Operating Systems

You can also use this one as a baseline implementation for performance evaluation, which means you can

compare the execution time of your pzip with that of the provided one in final report. Note that after building

your own pzip (using make or make test), the provided pzip file will be overwritten. But don’t worry,

you can always copy it from /public/cs3103/project/pzip.

2. Writing your pzip program

The pzip.c is the file that you will be handing in and is the only file you should modify. Write your code from

scratch or simply borrow the code from your czip to implement this parallel version of zip. Again, it's a good

chance to learn (as a side effect) how to use a proper code editor such as vim1,2.

3. Building your program

A simple makefile that describes how to compile pzip is provided for you.

To compile your pzip.c and to generate the executable file, use the make command within the directory that

contains your project. It will display the command used to compile the pzip.

$ make

gcc -Wall -Werror -pthread -O pzip.c -o pzip

Note that the -Werror compiler flag is specified. It causes all warnings to be treated as build errors. It would

be better to fix the compiling issue instead of disabling -Werror flag.

If everything goes well, there would an executable file pzip in it:

$ ls

Makefile pzip pzip.c README.txt tests

If you make some changes in pzip.c later, you should re-compile the project by running make command again.

To remove any files generated by the last make, use the make clean command.

$ make clean

rm -f pzip

$ ls

Makefile pzip.c README.txt tests

4. Testing your C program

We also provide 10 test cases for you to test your code. You can find them in the directory tests/tests-pzip/.

The makefile could also trigger automated testing scripts, type make run (run testing only) or make test

(build your program and run testing).

1 Interactive Vim tutorial, https://www.openvim.com/

2 Vim Cheat Sheet, https://vim.rtorr.com

9 CS3103 - Operating Systems

$ make test

TEST 0 - clean build (program should compile without errors or warnings)

Test finished in 0.189 seconds

RESULT passed

TEST 1 - single file test, a small file of 10 MB (2 sec timeout)

Test finished in 0.045 seconds

RESULT passed

TEST 2 - multiple files test, twelve small files of 10 MB, 20 MB, 30 MB,

10 MB, 20 MB, 30 MB, 10 MB, 20 MB, 30 MB, 10 MB, 20 MB, 30 MB (2 sec

timeout)

Test finished in 0.101 seconds

RESULT passed

TEST 3 - empty file test (2 sec timeout)

Test finished in 0.013 seconds

RESULT passed

TEST 4 - no file test (2 sec timeout)

Test finished in 0.014 seconds

RESULT passed

TEST 5 - single large file test, a large file of 100 MB (2 sec timeout)

Test finished in 0.074 seconds

RESULT passed

TEST 6 - multiple large files test, six large files of 100 MB, 200 MB, 300

MB, 100 MB, 200 MB, 300 MB (2 sec timeout)

Test finished in 0.435 seconds

RESULT passed

TEST 7 - directory test, a directory that contains twelve small files of

10 MB, 20 MB, 30 MB, 40 MB, 10 MB, 20 MB, 30 MB, 40 MB, 10 MB, 20 MB, 30

MB, 40 MB (2 sec timeout)

Test finished in 0.135 seconds

RESULT passed

TEST 8 - mixed test 1, a directory that contains six small files of 10 MB,

20 MB, 10 MB, 20 MB, 10 MB, 20 MB and six large files outside directory

of 100 MB, 200 MB, 300 MB, 100 MB, 200 MB, 300 MB (2 sec timeout)

10 CS3103 - Operating Systems

Test finished in 0.410 seconds

RESULT passed

TEST 9 - mixed test 2, a directory that contains six large files of 100

MB, 200 MB, 100 MB, 200 MB, 100 MB, 200 MB, and six small files outside

directory of 30 MB, 40 MB, 30 MB, 40 MB, 30 MB, 40 MB (2 sec timeout)

Test finished in 0.374 seconds

RESULT passed

TEST 10 - mixed test 3, two directories that contain three small files of

10 MB, 20 MB, 10 MB and three large files of 200 MB, 100 MB, 200 MB, and

six small files outside directory of 10 MB, 20 MB, 10 MB, 20 MB, 10 MB,

20 MB (2 sec timeout)

Test finished in 0.252 seconds

RESULT passed

The job of those automated scripts is to orderly run your pzip on the test cases and check if your pzip zips

input files correctly. TEST 0 (available for make test) will fail if your program is compiled with errors or

warnings. Time limitation is set for each test case, and if this limit is exceeded, your test will fail. Besides, the

script will record the running time of your program for performance evaluation. Shorter time/higher

performance will lead to better scores.

Below is a brief description of each test case:

• Test case 1-6: Test cases 1, 2, are small files. For test case 3, the file is empty. For test case 4, there is no

file. if no files are specified, the program should exit with return code 1 and print "pzip: file1 [file2 ...]"

(followed by a newline).

Test case 1) single file test – a small file.

Test case 2) multiple files test – twelve small files of different file size.

Test case 3) empty file test.

Test case 4) no files test.

Test case 5) single large file test – a large file.

Test case 6) multiple large files test – six large files of different file size.

• Test case 7-10: Some files are stored in a directory, and you are required to compress the directory and

other files. Do note that if multiple files are passed to pzip, they are compressed into a single compressed

output. The information that multiple files were originally input into pzip is lost.

Test case 7) directory test – a directory that contains twelve small files of different file size.

Test case 8) mixed test 1 – a directory that contains six small files, and six large files outside

directory.

Test case 9) mixed test 2 – a directory that contains six large files, and six small files outside

directory.

11 CS3103 - Operating Systems

Test case 10) mixed test 3 – two directories that contain three large files and three small files, and

six small files outside directory.

Each test consists of 5 files with different filename extension:

• n.json (in tests/config/ directory): The configuration of test case n.

§ binary: Indicates the data type of input and output is binary.

§ filename: The test case number.

§ filesize: The file size of each file. All numbers are included in a list. If an item is a list of

numbers, it indicates it is a directory.

§ timeout: Time limitation (seconds).

§ seed: The seed used to generate the content of input files.

§ description: A short text description of the test.

§ preparation: Code to run before the test, to set something up.

• n.rc (in tests/stdout/ directory): The return code the program should return (usually 0 or 1).

• n.out (in tests/stdout/ directory): The standard output expected from the test.

• n.in (in tests/tests-pzip/ directory): The input file of the test case.

Just like in Assignment 1, you can run and test your pzip manually. For example, to run your pzip to

compress the input file 1-0.in in tests/tests-pzip and save the compressed file as 1.out, enter:

$ ./pzip ./tests/tests-pzip/1/1-0.in > 1.out

To run your pzip to compress multiple input files (the input files 2-0.in, 2-1.in, 2-2.in) in tests/tests-pzip, and

save the compressed file as 2.out, enter:

$ ./pzip tests/tests-pzip/2/2-0.in tests/tests-pzip/2/2-1.in tests/testspzip/2/2-2.in

> 2.out

To run your pzip to compress the input directory 7-0 in tests/tests-pzip and save the compressed file as 7.out,

enter:

$ ./pzip tests/tests-pzip/7/7-0 > 7.out

To run your pzip to compress the input directory 8-0 and some input files (the input files 8-0.in, 8-1.in) in

tests/tests-pzip and save the compressed file as 8.out, enter:

$ ./pzip tests/tests-pzip/8/8-0 tests/tests-pzip/8/8-0.in tests/tests-pzip/8/8-

1.in > 8.out

To run other test cases, please run as follows:

$ ./pzip tests/tests-pzip/3/3-0.in > 3.out

$ ./pzip tests/tests-pzip/5/5-0.in > 5.out

$ ./pzip tests/tests-pzip/9/9-0 tests/tests-pzip/9/9-0.in tests/tests-pzip/9/9-

1.in > 9.out


版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp