联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp

您当前位置:首页 >> Java编程Java编程

日期:2019-04-05 10:21

COSC 2123/1285 Algorithms and Analysis

Semester 1, 2019

Assignment 1: Closest Associates

Due date: 11:59pm Sunday, 14th of April, 2019

(Check Canvas and annoucements for latest due dates)

Weight: 15%

Pair (Group of 2) Assignment

1 Objectives

There are three key objectives for this assignment:

Understand how a real problem can be mapped to graphs and its operations.

Use a number of fundamental data structures to implement the graph abstract data type.

Evaluate and contrast the performance of the data structures with respect to different usage

scenarios and input data.

2 Background

Facebook, Twitter and LinkedIn are some examples of social networks that we commonly use everyday.

Indirect networks, such as email communications between people, can also reveal inherent relationships.

There are many interesting questions that can be asked from these social-related networks, e.g., who

are my closest friends, who is influential in spreading news (or gossip) and what are my relationship

groups and who belongs to them. All these questions can be studied when we abstract and represent

these social networks as graphs and perform analysis using graph operations and techniques from

social network analysis and network science.

When we represent these networks as a graph, the vertices in such a graph represent people and

edges can represent relationships or communications. There are many types of social-related networks,

in this assignment, we are interested in communication graphs. In a communication graph, the edges

are typically directed and represent one person communicating with another. They may also be

weighted, representing the level of communication between two people and implicitly, how close or

significant their relationship are. This can be determined via finding the k-nearest neighbours on such

graphs, which is described in more details below.

In class, we studied three methods to represent the graph, the adjacency list, adjacency matrix and

vertex/edge list representations. There is a fourth type of representation called incident matrix (see

below for details). The performance of each representation varies depending on the characteristics of

the graph. In this assignment, we will implement the adjacency list and incident matrix representations,

and evaluate on how well they perform when representing a communication graph. This will

assist with your understanding of the tradeoffs between data structure representations and its effect

when implementing operations or solving problems using algorithms and data structures.

2.1 Incidence Matrix Representation

The incidence matrix represents a graph as a set of vertices and list of edges incident to each vertex

as a 2D array/matrix. Incidence matrices are not generally used to represent weighted graphs, but in

this assignment we will use the following convention to do so. More formally, let the incidence matrix

of a directed graph is an n x m matrix A with n and m are the number of vertices and edges of the

graph respectively such that:

Ai,k = wk (entry in this matrix) if edge ek is a directed edge from vi to vj (meaning vi

is the

source of the edge) and ek has weight of wk.

Aj,k = wk (entry in this matrix) if edge ek is a directed edge from vi to vj (meaning vj is the

target of the edge) and ek has weight of wk.

Ai,k = 0 for all other non-incident edges.

For example, the following graph:

has its incidence matrix as below:

AB AC AD BC CD

A 2 1 4 0 0

B 2 0 0 3 0

C 0 1 0 3 2

D 0 0 4 0 2

Some extra resources about incidence matrix:

https://en.wikipedia.org/wiki/Incidence_matrix (note we use the opposite sign convention

to wikipedia for directed graphs)

http://mathonline.wikidot.com/incidence-matrices (note in their directed graph example,

there is a typo in columns 3 and 4 and column 7 doesn’t need to exist)

2.2 K-nearest Neighbours

In class, we discussed the concept of a neighbours of a vertex – all the vertices that are incident to

it. For directed graphs, recall there are two types, in-neighbours and out-neighbours. In-neighbours

of a vertex vj are all vertices that are connected (incident) to an edge coming into the vertex vj .

Out-neighbours of a vertex vi are all vertices that are connected (incident) to an edge coming out from

vertex vi

. Because the graphs are weighted, we can also have the concept of nearest in/out-neighbours,

where we order all the in/out neighbours and choose the k ones with highest edge weights.

As an example, in the example above for incidence matrix, we have the following neighbours:

In-neighbourhood of C is {A, B}.

Out-neighbourhood of C is {D}.

? In-neighbourhood of A is { } (empty set or no in-neighbours).

Out-neighbourhood of A is {D, C, B}.

1-nearest in-neighbourhood of C is {B}.

2-nearest out-neighbourhood of C is {D} (the number of neighbours less than k, so we return all).

2-nearest out-neighbourhood of A is {D, B}.

3 Tasks

The assignment is broken up into a number of tasks, to help you progressively complete the project.

Task A: Implement the Graph Representations and their Operations (8 marks)

In this task, you will implement the directed, weighted graph using the adjacency list and incidence

matrix representations. Each representations will be implemented by a data structure. Your implementation

should support the following operations:

Create an empty directed graph (implemented as a constructor that takes zero arguments).

Add a vertex to the graph.

Add an edge to the graph.

Get the weight of an edge in the graph.

Update weight of edge in the graph.

Delete a vertex from the graph.

Compute the k-nearest in-neighbours of a vertex in the graph.

Compute the k-nearest out-neighbours of a vertex in the graph.

Print out the set of vertices of the graph.

Print out the set of edges and their weights of the graph.

Data Structure Details

Graphs can be implemented using a number of data structures. You are to implement the graph

abstract data type using the following data structures:

Adjacency list, using an array of linked lists.

Incidence matrix, using a 2D array (an array of arrays).

For the above data structures, you must program your own implementation, and not use the LinkedList

or Matrix type of data structures in java.utils or any other libraries. You must implement your own

nodes and methods to handle the operations. If you use java.utils or other implementation from

libraries, this will be considered as an invalid implementation and attract 0 marks for that data

structure. The only exception is if you choose to implement a map of vertex labels to a row or column

index for the incidence matrix, you may use one of the existing Map classes to do this.

Operations Details

Operations to perform on the implemented graph abstract data type are specified on the command

line. They are in the following format:

<operation> [arguments]

where operation is one of {AV, AE, W, U, RV, IN, ON, PV, PE, Q} and arguments is for optional

arguments of some of the operations. The operations take the following form:

AV <vertLabel> – add a vertex with label ’vertLabel’ into the graph.

AE <srcLabel> <tarLabel> <weight> – add an edge with source vertex ’srcLabel’, target vertex

’tarLabel’ and edge weight ’weight’ into the graph.

W <srcLabel> <tarLabel> – return weight of edge. If edge doesn’t exist, return -1.

U <srcLabel> <tarLabel> <newWeight> – Update the weight of edge (’srcLabel’, ’tarLabel) to

’newWeight’ value. If ’newWeight’ = 0, then delete the edge.

RV <vertLabel> – remove vertex ’vertLabel’ from the graph.

IN <k> <vertLabel> – Return a set of k nearest in-neighbours for vertex ’vertLabel’. The

ordering of the neighbours does not matter. If k = -1, then all neighbours should be returned.

See below for the required format.

ON <k> <vertLabel> – Return a set of k nearest out-neighbours for vertex ’vertLabel’. The

ordering of the neighbours does not matter. If k = -1, then all neighbours should be returned.

See below for the required format.

PV – prints the vertex set of the graph. See below for the required format. The vertices can be

printed in any order.

PE – prints the edge set of the graph. See below for the required format. The edges can be

printed in any order.

Q – quits the program.

The format of the output of a neighbour operation for vertex ’A’ should take the form:

A <(neighbour1,weight1) (neighbour2, weight2) ...>

Each neighbour has its associated edge weight printed with it, e.g, neighbour1 and weight 1. If a

vertex has no neighbours, then the neighbour list should be empty.

The print vertex operation output the vertices in the graph in a single line. The line should specifies

all the valid vertex (indices) in the graph.

<vertex1> <vertex2> <vertex3> ...

The print edge operation output the edges in the graph in over a number of lines. Each line

specifies an edge in the graph, and should be in the following format:

<srcVertex> <tarVertex> <weight>

As an example of the operations, consider the output from the following list of operations:

AV A

AV B

AV C

AV D

AV E

AV F

AE A B 1

AE C B 1

AE B D 1

AE A E 3

AE D C 5

AE F A 2

ON 1 A

IN 1 F

W C B

W B C

W A D

U C B 4

U A B 0

RV D

AV G

PV

PE

Q

The output from the two neighbour operations (‘ON -1 A’, ‘IN -1 F’) should be:

A (B, 2 ) (E, 3 )F

The output from operations to retrieve edge weights (‘W C B’, ‘W B C’, ’W D C’) should be:

The output from the print vertices operation (PV) could be (remember that the order doesn’t

matter):

A B C E F G

The output from the print edges operation (P E) could be (remember that the order doesn’t

matter):

A E 3

C B 4

F A 2

Testing Framework

We provide Java skeleton code (see Table 1) to help you get started and automate the correctness

testing. You may add your own Java files to your final submission, but please ensure that they work

with the supplied Python testing script (see below).

In addition, we provide a Python script that automates testing, based on input files of operations

(such as example above). These are fed into the Java framework which calls your implementations.

file description

GraphEval.java Code that reads in operation commands from stdin then executes those

on the selected graph implementation. Also will format the output as

required. No need to modify this file.

AssociationGraph.java Interface class for the graph representations. It contains the common

interface/methods that you’ll need to implement for the various representations.

No need to modify this file.

AbstractAssocGraph.java Abstract class, that you can use to implement common functionality

among the representations.

AdjList.java Code that implements the adjacency list implementation of a graph.

Complete the implementation (implement parts labelled “Implement

me!”).

IncidenceMatrix.java Code that implements the incidence matrix implementation of a graph.

Complete the implementation (implement parts labelled “Implement

me!”).

MyPair.java Code that implements a pair class, as javafx is not available on the core

teaching server. No need to modify this file.

Table 1: Table of Java files.

The outputs resulting from any print operations are stored, then compared with the expected output.

We have provided two sample input and expected files for your testing and examination.

For our evaluation of the correctness of your implementation, we will use the same Python script

and input/expected files that are in the same format as the provide examples. To avoid unexpected failures,

please do not change the Python script nor GraphEval.java. If you wish to use the script for your

timing evaluation, make a copy and use the unaltered script to test the correctness of your implementations,

and modify the copy for your timing evaluation. Same suggestion applies for GraphEval.java.

Instructions on how the python script runs are available within the header of the script, on Canvas

and discussed in lectures.

Notes

We will run the supplied test script on your implementation on the university’s core teaching

servers. If you develop on your own machines, please ensure your code compiles and runs on

these machines. You don’t want to find out last minute that your code doesn’t compile on

these machines. If your code doesn’t run on these machines, we unfortunately do not have the

resources to debug each one and cannot award marks for testing.

All submissions should compile with no warnings on Oracle Java 1.8.

Test Data

For the next part, we provide an email graph of about 2000 vertices in a file called “assocGraph.csv”.

This is real, network of the amount of emial correspondence between people but to protect privacy we

do not state what organisation nor have the people names.

Task B: Evaluate your Data Structures (7 marks)

In this second task, you will evaluate your two implemented structures in terms of their time complexities

for the different operations and different use case scenarios. Scenarios arise from the possible

use cases of a social/communication network.

Write a report on your analysis and evaluation of the different implementations. Consider and

recommend in which scenarios each type of implementation would be most appropriate. The report

should be 8 pages or less, in font size 12. See the assessment rubric (Appendix A) for the criteria

we are seeking in the report.

Use Case Scenarios

Typically, you use real usage data to evaluate your data structures. However, for this assignment,

you will write data generators to enable testing over different scenarios of interest. We are also

interested in the effect of the density of the graph1 on these scenarios. There are many possibilities,

but for this assignment, consider the following scenarios:

Scenario 1 Shrinking graph (Removals): Although not often occurring for association graph,

people do defriend/de-associate with each other. In this scenario, you are to evaluate the performance

of your implementations in terms of:

vertex removal

edge removal

Assume the graph that you start with is the one that we provided you with. You are to evaluate the

performance the vertex and removal operations as the density of the initial graph is varied.

Scenario 2 Nearest Neighbours: In this scenario, the graph is not changing, but important operations

such as neighbourhood are requested.

Assume the graph that you start with is the one that we provided you with. You are to evaluate the

performance of the the nearest neighbourhood implementations, for both in and out neighbourhoods,

as the density of the initial graph and k are varied.

Scenario 3 Changing associations (Update edge weights): In this scenario, associates between

people are fluctuating and the corresponding edge weights are changing. In this scenario, you are to

evaluate the performance of your implementations in terms of:

edge weight changes, both increases and decreases (but never enough to cause a weight to be 0

or less and subsequently be deleted).

Assume the graph that you start with is the one that we provided you with. You are to evaluate

the performance the edge weight operations as the density of the initial graph is varied.

Data Generation

When generating the vertices and edges to remove, find neighbourhood and update edge weights for,

the distribution of these elements, compared to what is in the graph already, will have an effect on

the timing performance. However, without the usage and query data, it is difficult to specify what

this distributions might be. Instead, in this assignment, uniformly sample from a fixed range, e.g.,

1Density = number of edges

number of vertices2

0 to max vertex index of your graph when generating the vertices and edges for removing, nearest

neighbourhoods and updating weights, and a different range when adding vertices (we do not want to

repeatingly add vertices that are in the graph already).

For generating graphs with different initial densities, you may want to either generate a series of

add edge operations (’AE’) to grow the graph to the desired density from the one we supplied, then

evaluate for the appropriate scenario. Alternatively, you can consider writing a data generator within

Java to insert directly into the data structures. Or can use one of the many great graph generators2

Whichever method you decide to use, remember to generate graphs of different densities to evaluate

on. Due to the randomness of the data, you may wish to generate a few datasets with the same

parameters settings (same graph density and a scenario) and take the average across a number of

runs.

Analysis

In your analysis, you should evaluate each of your representations and data structures in terms of the

different scenarios outlined above.

Note, you may be generating and evaluating a significant number of datasets, hence we advise you to

get started on this part relatively early.

4 Report Structure

As a guide, the report could contain the following sections:

Explain your data generation and experimental setup. Things to include are (brief) explanations

of the generated data you decide to evaluate on, the density parameters you tested on, describe

how the scenarios were generated (a paragraph and perhaps a figure or high level pseudo code

suffice), which approach(es) you decide to use for measuring the timing results, and briefly

describe the fixed set(s) you used to generate the elements for vertex addition.

Evaluation of the data structures using the generated data. Analyse, compare and discuss your

results across different densities, representations and scenarios. Provide your explanation on

why you think the results are as you observed. You may consider using the known theoretical

time complexities of the operations of each data structure to help in your explanation.

Summarise your analysis as recommendations, e.g., for this certain data scenario of this density,

I recommend to use this data structure because... We suggest you refer to your previous analysis

to help.

5 Submission

The final submission will consist of three parts:

Your Java source code of your implementations. Your source code should be placed into in a

flat structure, i.e., all the files should be in the same directory/folder, and that directory/folder

should be named as Assign1-<partner 1 student number>-<partner 2 student number>.

Specifically, if your student numbers are s12345 and s67890, then all the source code files should

be in folder Assign1-s12345-s67890.

2See https://networkx.github.io/documentation/stable/reference/generators.html, under heading Random Graphs,

select one of them. Note this is in Python.

? All folder (and files within) should be zipped up and named as Assign1-<partner 1 student

number>-<partner 2 student number>.zip. E.g., if your student numbers are s12345 and

s67890, then your submission file should be called Assign1-s12345-s67890.zip, and when we

unzip that zip file, then all the submission files should be in the folder Assign1-s12345-s67890.

Your written report for part B in PDF format, called “assign1.pdf”. Place this pdf within

the Java source file directory/folder, e.g., Assign1-s12345-s67890.

Your data generation code. Create a sub-directory/sub-folder called “generation” within the

Java source file directory/folder. Place your generation code within that folder. We will not run

the code, but will examine their contents.

? Your group’s contribution sheet. See the following ‘Team Structure’ section for more details.

This sheet should also be placed within your source file folder.

Note: submission of the report and code will be done via Canvas. More detailed instructions

will be provided closer to submission date.

6 Assessment

The project will be marked out of 15. Late submissions will incur a deduction of 1.5 marks per day,

and no submissions will be accepted 7 days beyond the due date.

The assessment in this project will be broken down into two parts. The following criteria will be

considered when allocating marks.

Implementation (8/15):

You implementation will be assessed on whether they are adjacency lists and incidence matrices,

respectively, and on the number of tests it passes in our automated testing.

While the emphasis of this project is not programming, we would like you to maintain decent

coding design, readability and commenting, hence these factors will contribute towards your

marks.

Report (7/15):

The marking sheet in Appendix A outlines the criteria that will be used to guide the marking of

your evaluation report. Use the criteria and the suggested report structure (Section 4) to inform you

of how to write the report.

7 Team Structure

This project should be done in pairs (group of two). If you have difficulty in finding a partner, post

on the discussion forum or contact your lecturer. If there are issues with work division and workload

in your group, please contact your lecture as soon as possible.

In addition, please submit what percentage each partner made to the assignment (a contribution

sheet will be made available for you to fill in), and submit this sheet in your submission. The contributions

of your group should add up to 100%. If the contribution percentages are not 50-50, the

partner with less than 50% will have their marks reduced. Let student A has contribution X%, and

student B has contribution Y%, and X > Y . The group is given a group mark of M. Student A will

get M for assignment 1, but student B will get M

.

8 Plagiarism Policy

University Policy on Academic Honesty and Plagiarism: You are reminded that all submitted project

work in this subject is to be the work of you and your partner. It should not be shared with other

groups. Multiple automated similarity checking software will be used to compare submissions.

It is University policy that cheating by students in any form is not permitted, and that work

submitted for assessment purposes must be the independent work of the students concerned. Plagiarism

of any form may result in zero marks being given for this assessment and result in disciplinary

action.

For more details, please see the policy at http://www1.rmit.edu.au/students/academic-integrity.

9 Getting Help

There are multiple venues to get help. There are weekly consultation hours (see Canvas for time and

location details). In addition, you are encouraged to discuss any issues you have with your Tutor

or Lab Demonstrator. We will also be posting common questions on the project 1 Q&A section on

Canvas and we encourage you to check and participate in the discussion forum on Canvas. However,

please refrain from posting solutions.

A Marking Guide for the Report

Design of Evaluation Analysis of Results Report Clarity and Structure

(Maximum = 1.5 marks) (Maximum = 4 marks) (Maximum = 1.5 marks)

1.5 marks 4 marks 1.5 marks

Data generation is well

designed, systematic and well

explained. All suggested

scenarios, data structures and a

reasonable range of densities

were evaluated. Each type of

test was run over a number of

runs and results were averaged.

Analysis is thorough and demonstrates

understanding and critical

analysis. Well-reasoned explanations

and comparisons are

provided for all the data structures,

scenarios and densities.

All analysis, comparisons and

conclusions are supported by

empirical evidence and possibly

theoretical complexities. Well

reasoned recommendations are

given.

Very clear, well structured and

accessible report, an undergraduate

student can pick up the report

and understand it with no

difficulty.

1 marks 3 marks 1 marks

Data generation is reasonably

designed, systematic and

explained. There are at least

one obvious missing suggested

scenarios, data structures or

reasonable densities. Each type

of test was run over a number of

runs and results were averaged.

Analysis is reasonable and

demonstrates good understanding

and critical analysis.

Adequate comparisons and

explanations are made and

illustrated with most of the suggested

scenarios and densities.

Most analysis and comparisons

are supported by empirical

evidence and possibly theoretical

analysis. Reasonable

recommendations are given.

Clear and structured for the

most part, with a few unclear minor

sections.

0.5 mark 2 marks 0.5 mark

Data generation is somewhat

adequately designed, systematic

and explained. There are

several obvious missing

suggested scenarios, data

structures or reasonable

densities. Each type of test may

only have been run once.

Analysis is adequate and demonstrates

some understanding and

critical analysis. Some explanations

and comparisons are given

and illustrated with one or two

scenarios and densities. A portion

of analysis and comparisons

are supported by empirical evidence

and possibly theoretical

analysis. Adequate recommendations

are given.

Generally clear and well structured,

but there are notable gaps

and/or unclear sections.

0 marks 1 mark 0 marks

Data generation is poorly

designed, systematic and

explained. There are many

obvious missing suggested

scenarios, data structures or

reasonable densities. Each type

of test has only have been run

once.

Analysis is poor and demonstrates

minimal understanding

and critical analysis. Few explanations

or comparisons are

made and illustrated with one

scenario and density setting. Little

analysis and comparisons are

supported by empirical evidence

and possibly theoretical analysis.

Poor or no recommendations are

given.

The report is unclear on the

whole and the reader has to work

hard to understand.


版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp