Date: 2019-04-06 09:05

COMP SCI 4094/4194/7094 - Distributed Databases and Data Mining

Assignment 1

DUE: 9pm Monday 29th April 2019

Important Notes

Handins:

– The deadline for submission of your assignment is 9pm Monday 29th April, 2019.

– You must do this assignment individually and make individual submissions.

– Your program should be written in C++ and pass test runs on the two test files. The sample input and output files can be downloaded from “Assignments” on the course home page (https://cs.adelaide.edu.au/users/honours/dddm/19s1-dddm-adelaide/).

– You need to use svn to upload your source code and run it in the web submission system, following the web-submission instructions at the end of this sheet. Include your name and student number in your submission.

– Late submissions will attract a penalty: the maximum mark you can obtain will be reduced by 25% per day (or part thereof) past the due date or any extension you are granted.

Marking scheme:

– 12 marks for testing on 4 randomly generated tests: 3 marks per test, where 1 mark is for the affinity matrix AA and 2 marks are for the clustered affinity matrix CA.

– 3 marks for the structure of your code.

If you have any questions, please send them to the student discussion forum. This way you can all help each other and everyone gets to see the answers.

The assignment

In this assignment you are required to implement the Bond Energy Algorithm (BEA) for vertical fragmentation.

Your code should contain two separate procedures, AA Generator and CA Generator. AA Generator takes as input all attributes of a relation, a set of queries, and their access frequencies at different sites, and produces an affinity matrix AA. CA Generator takes an affinity matrix AA as input and produces a clustered affinity matrix CA. For a description of the BEA and the definitions of AA and CA, please see the lecture slides/textbook.

In this assignment, attribute affinity is measured by the extended Otsuka-Ochiai coefficient (https://en.wikipedia.org/wiki/Yanosuke_Otsuka) instead of the traditional method described in the textbook. The following equations show the details of the computation, where n is the number of attributes, m is the number of sites, and Aik is the number of times attribute Ai is accessed by query qk, summed over all sites. Round the result of each division to four decimal places.
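As a rough illustration of the Aik definition and the rounding rule, the sketch below sums each query's access frequencies over all sites and multiplies the total by a 0/1 usage flag. The helper names `accessCounts` and `round4`, and the `use[k][i]` usage matrix (1 if query qk references attribute Ai), are hypothetical; the sketch stops short of the extended Otsuka-Ochiai formula itself.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: A[i][k] = number of times attribute Ai is accessed by
// query qk, summed over all sites.
//   use[k][i] : 1 if query qk references attribute Ai, else 0 (assumed input)
//   acc[k][j] : access frequency of query qk at site Sj
std::vector<std::vector<int>> accessCounts(const std::vector<std::vector<int>>& use,
                                           const std::vector<std::vector<int>>& acc) {
    const std::size_t queries = use.size();
    const std::size_t n = queries == 0 ? 0 : use[0].size();  // number of attributes
    std::vector<std::vector<int>> A(n, std::vector<int>(queries, 0));
    for (std::size_t k = 0; k < queries; ++k) {
        int total = 0;
        for (int f : acc[k]) total += f;  // sum over the m sites
        for (std::size_t i = 0; i < n; ++i)
            A[i][k] = use[k][i] * total;
    }
    return A;
}

// Round a division result to four decimal places (std::round: halves away from zero).
double round4(double x) {
    return std::round(x * 10000.0) / 10000.0;
}
```

With the usage flags and ACC matrix of the PROJ example, A1 is accessed 45 times by q1 (15 + 20 + 10) and A4 is accessed 75 times by q3. Because doubles are binary, a value that looks like an exact half in decimal may round either way, so it is worth checking results against the sample outputs.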

Example

For AA Generator:

Input

The relation, called PROJ, has the following attributes Ai:

Label  Name
A1     PNO
A2     PNAME
A3     BUDGET
A4     LOC

Queries (qi):

q1: SELECT BUDGET FROM PROJ WHERE PNO=Value
q2: SELECT PNAME, BUDGET FROM PROJ
q3: SELECT PNAME FROM PROJ WHERE LOC=Value
q4: SELECT SUM(BUDGET) FROM PROJ WHERE LOC=Value
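Before any affinity can be computed, each query must be mapped to the attributes it uses. One simple approach, assumed here rather than specified by the assignment, is to split the query text into identifier tokens and check which attribute names appear as whole tokens; whole-token matching avoids false hits when an attribute name occurs inside a longer word. The functions `identifiers` and `usageMatrix` are hypothetical, and the format of the system's query file is not given in this sheet.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: split a query into identifier tokens
// (letters, digits, underscore); every other character is a separator.
std::set<std::string> identifiers(const std::string& query) {
    std::set<std::string> tokens;
    std::string cur;
    for (char c : query) {
        if (std::isalnum(static_cast<unsigned char>(c)) || c == '_') {
            cur += c;
        } else if (!cur.empty()) {
            tokens.insert(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) tokens.insert(cur);
    return tokens;
}

// use[k][i] == 1 if query k mentions attribute name i as a whole token.
std::vector<std::vector<int>> usageMatrix(const std::vector<std::string>& queries,
                                          const std::vector<std::string>& attrNames) {
    std::vector<std::vector<int>> use;
    for (const std::string& q : queries) {
        std::set<std::string> tokens = identifiers(q);
        std::vector<int> row;
        for (const std::string& a : attrNames)
            row.push_back(tokens.count(a) ? 1 : 0);
        use.push_back(row);
    }
    return use;
}
```

For the four example queries this yields the rows {1,0,1,0}, {0,1,1,0}, {0,1,0,1} and {0,0,1,1} over (PNO, PNAME, BUDGET, LOC). Matching is case-sensitive as written.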

Access frequency matrix ACC, where Si denotes the i-th site:

     S1  S2  S3
q1   15  20  10
q2    5   0   0
q3   25  25  25
q4    5   0   0

Output

The attribute affinity matrix AA:

     A1  A2  A3  A4
A1   45   0  45   0
A2    0  80   5  75
A3   45   5  53   3
A4    0  75   3  78

For CA Generator:

Input

The attribute affinity matrix AA:

     A1  A2  A3  A4
A1   45   0  45   0
A2    0  80   5  75
A3   45   5  53   3
A4    0  75   3  78

Output

The clustered affinity matrix CA:

     A1  A3  A2  A4
A1   45  45   0   0
A3   45  53   5   3
A2    0   5  80  75
A4    0   3  75  78
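The CA Generator step can be sketched with the bond/contribution formulation of BEA commonly given in textbooks: bond(Ax, Ay) = sum over z of aff(Az, Ax) * aff(Az, Ay), and cont(Ai, Ak, Aj) = 2 bond(Ai, Ak) + 2 bond(Ak, Aj) - 2 bond(Ai, Aj), where an empty boundary column contributes 0. The first two columns are placed as-is, each further column is inserted at the position of maximum contribution, and the rows are then reordered to match. The function names, 0-based indexing, and the tie-break (leftmost maximum wins) are assumptions; check them against the lecture slides.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// bond(x, y) = sum_z AA[z][x] * AA[z][y]; index -1 stands for the empty
// boundary column (A0 or A(n+1)), whose bond with anything is 0.
double bond(const Matrix& aa, int x, int y) {
    if (x < 0 || y < 0) return 0.0;
    double s = 0.0;
    for (std::size_t z = 0; z < aa.size(); ++z) s += aa[z][x] * aa[z][y];
    return s;
}

// cont(Ai, Ak, Aj) = 2 bond(Ai,Ak) + 2 bond(Ak,Aj) - 2 bond(Ai,Aj)
double cont(const Matrix& aa, int i, int k, int j) {
    return 2 * bond(aa, i, k) + 2 * bond(aa, k, j) - 2 * bond(aa, i, j);
}

// Hypothetical helper: column order chosen by this BEA sketch (0-based indices).
std::vector<int> beaOrder(const Matrix& aa) {
    const int n = static_cast<int>(aa.size());
    std::vector<int> order;
    for (int a = 0; a < n && a < 2; ++a) order.push_back(a);  // place A1, A2
    for (int k = 2; k < n; ++k) {
        int bestPos = 0;
        double best = 0.0;
        const int placed = static_cast<int>(order.size());
        // Try every gap: before the first column, between columns, after the last.
        for (int pos = 0; pos <= placed; ++pos) {
            int left = pos == 0 ? -1 : order[pos - 1];
            int right = pos == placed ? -1 : order[pos];
            double c = cont(aa, left, k, right);
            if (pos == 0 || c > best) { best = c; bestPos = pos; }  // leftmost max wins
        }
        order.insert(order.begin() + bestPos, k);
    }
    return order;
}

// Permute rows and columns of AA into the same order to obtain CA.
Matrix clusteredMatrix(const Matrix& aa, const std::vector<int>& order) {
    Matrix ca(order.size(), std::vector<double>(order.size()));
    for (std::size_t r = 0; r < order.size(); ++r)
        for (std::size_t c = 0; c < order.size(); ++c)
            ca[r][c] = aa[order[r]][order[c]];
    return ca;
}
```

On the example AA this sketch produces the order A1, A3, A2, A4 and the same CA as the example output: inserting A3 between A1 and A2 gives cont = 10150, and appending A4 after A2 gives cont = 23730, the largest option at each step.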

Web-submission instructions

First, type the following command, all on one line (replacing xxxxxxx with your student ID):

svn mkdir --parents -m "DDDM" https://version-control.adelaide.edu.au/svn/axxxxxxx/2019/s1/dddm/assignment1

Then, check out this directory and add your files:

svn co https://version-control.adelaide.edu.au/svn/axxxxxxx/2019/s1/dddm/assignment1

cd assignment1

svn add AAGenerator.cpp

svn add CAGenerator.cpp

svn commit -m "assignment1 solution"

Next, go to the web submission system at:

https://cs.adelaide.edu.au/services/websubmission/

Navigate to 2019, Semester 1, Distributed Databases and Data Mining, Assignment 1.

Then click the “Make Submission” tab for this assignment and indicate that you agree to the declaration. The automark script will then check whether your code compiles. You can make as many resubmissions as you like. If your final solution does not compile, you will not get any marks for it.

Notes:

i. The auto-marker script compiles and runs the two .cpp files named “AAGenerator.cpp” and “CAGenerator.cpp” one by one.

ii. The auto-marker script will compile your AAGenerator.cpp and CAGenerator.cpp with the following commands:

g++ -std=c++11 AAGenerator.cpp -o runAA
g++ -std=c++11 CAGenerator.cpp -o runCA

iii. Your AAGenerator.cpp should accept three input text files, randomly generated by the system, in the order Attributes (att), Queries (query) and Access Frequencies (acc), and then print the required attribute affinity matrix (AA). Your CAGenerator.cpp should accept an input affinity matrix AA provided by the system, rather than reading the AA output by your AAGenerator, and then print the clustered affinity matrix (CA). Testing AA and CA separately in this way maximizes your marks: you will receive marks for a correct CAGenerator even if your AAGenerator produces an incorrect AA.
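Because CAGenerator.cpp reads an AA supplied by the system, it needs a small reader and printer. The sketch below assumes the file is laid out like the example: a header line of labels, then one line per row with the row label followed by the values. The names `readAA` and `printMatrix`, this layout, and the single-space output format are assumptions, since this sheet does not specify the exact file or output format.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical reader: the first line holds the column labels; each later line
// holds a row label (read but not validated) followed by one value per column.
bool readAA(std::istream& in, std::vector<std::string>& labels,
            std::vector<std::vector<double>>& aa) {
    std::string line;
    if (!std::getline(in, line)) return false;
    std::istringstream header(line);
    labels.clear();
    for (std::string lab; header >> lab;) labels.push_back(lab);
    aa.clear();
    while (std::getline(in, line)) {
        std::istringstream row(line);
        std::string rowLabel;
        if (!(row >> rowLabel)) continue;  // skip blank lines
        std::vector<double> values;
        for (double v; row >> v;) values.push_back(v);
        if (values.size() != labels.size()) return false;  // malformed row
        aa.push_back(values);
    }
    return aa.size() == labels.size();  // matrix must be square
}

// Print a labelled matrix in the same space-separated layout.
// Precision 10 keeps values rounded to four decimals intact (e.g. 0.6667).
void printMatrix(std::ostream& out, const std::vector<std::string>& labels,
                 const std::vector<std::vector<double>>& m) {
    std::streamsize oldPrecision = out.precision(10);
    for (std::size_t c = 0; c < labels.size(); ++c)
        out << (c ? " " : "") << labels[c];
    out << '\n';
    for (std::size_t r = 0; r < m.size(); ++r) {
        out << labels[r];
        for (double v : m[r]) out << ' ' << v;
        out << '\n';
    }
    out.precision(oldPrecision);  // restore the caller's setting
}
```

A caller might open the input with `std::ifstream in(argv[1]);` (assuming the file name arrives as the first command-line argument), call `readAA(in, labels, aa)`, compute the clustered order, and print the permuted labels and matrix to `std::cout`. Taking streams rather than file names keeps the functions independent of how the auto-marker supplies its files.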

iv. File paths and file names from your local machine will not work in our web submission system.
