Date: 2023-04-20 09:22

COMP8410 Data Mining S1 2023


Assignment 2

Maximum marks: 100

Weight: 25% of the total marks for the course

Length: Maximum of 10 pages excluding cover sheet, bibliography and appendices.

Layout: A4 margin, at least 11-point type size, use of typeface, margins and headings consistent with a professional style.

Submission deadline: 9:00am, Monday, 8 May

Submission mode: Electronic, via Wattle

Estimated time: 15 hours

Penalty for lateness: 100% after the deadline has passed

First posted: 27th March, 1:00 AM

Last modified: 27th March, 1:00 AM

Questions to: Wattle Discussion Forum


This assignment specification may be updated to reflect clarifications and modifications after it is first issued.

It is strongly suggested that you start working on the assignment right away. You can submit as many times

as you like. Only the most recent submission at the due date will be assessed.

In this assignment, you are required to submit a single report in the form of a PDF file. You may also attach

supporting information (appendices) as one or more identified sections at the end of the same PDF file.

Appendices will not be marked but may be treated as supporting information to your report. Please use a

cover sheet at the front that identifies you as author of the work using your u-number and name and

identifies this as your submission for COMP8410 Assignment 2. The cover sheet and appendices do not

contribute to the page limit.

You are expected to write in a style appropriate to a professional report. You may refer to

http://www.anu.edu.au/students/learning-development/writing-assessment/report-writing for some

stylistic advice. You are expected to have both an introduction and a conclusion in your report.

No particular layout is specified, but you should use no smaller than 11-point typeface and stay within

the maximum specified page count. Page margins, heading sizes, paragraph breaks and so forth are not

specified but a professional style must be maintained. Text beyond the page limit will be treated as

non-existent.

This is a single-person assignment and should be completed on your own. Make certain you carefully

reference all the material that you use, although the nature of this assignment suggests few references will

be needed. It is unacceptable to cut and paste another author's work and pass it off as your own. Anyone

found doing this, from whatever source, will get a mark of zero for the assignment and, in addition, CECC

procedures for plagiarism will apply.


No particular referencing style is required. However, you are expected to reference conventionally,

conveniently, and consistently. References are not included in the page limit. Due to the context in which

this assignment is placed, you may refer to the course notes or course software where appropriate (e.g. “For

this experiment Rattle was used”), without formal reference to original sources, unless you copy text which

always requires a formal reference to the source. You do not need to reference this specification.

An assessment rubric is provided. The rubric will be used to mark your assignment. You are advised to use it

to supplement your understanding of what is expected for the assignment and to direct your effort towards

the most rewarding parts of the work.

Your submission will be treated confidentially. It will be available to ANU staff involved in the course for the

purposes of marking. It may be shared, de-identified, as an exemplar for other students.


Task

You are to study the supplied data set and to apply data mining processes and techniques to discover

interesting things about the data. You are to write a short report that justifies and explains your methods in

detail, presents your results, and evaluates and interprets the results you find. In the following, the task is

described in terms of what your report should contain, not in terms of the steps you should take to carry out

the assignment. In your report, similarly, you should describe the methods used in terms of the language of

data mining, not in the terms of commands you typed or buttons you selected.

1. Introduce the problem

You must provide some context to the data mining project you are working on. You may refer to the purpose

of learning and assessment for COMP8410, but in addition you should set some goals for the exercise – what

do you expect to learn from the data? What are you looking for? It is possible that you may not achieve the

goals you set here, but it should be possible to trace the results you present back to the goals as motivating

questions. Furthermore, you should review the goals you state here in your conclusion.

2. Describe your data

You must

  • identify the source of the data and the population over which the data is sampled,
  • broadly describe the attributes in the data,
  • offer a cursory assessment of data quality, and
  • include a basic statistical summary of the data you have.

This should comprise a brief description of the data necessary to explain the context for the work presented

here in a self-contained way, although for more detail it might refer to information provided with this

assignment specification or elsewhere.
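
As an illustration of the kind of basic statistical summary and cursory data-quality check described above, here is a minimal sketch in Python using only the standard library. The record layout, the attribute names (age, income) and the helper summarise are hypothetical and are not part of the supplied data set or the course software; Rattle or R produce equivalent summaries directly.

```python
import statistics


def summarise(rows, numeric_columns):
    """Return count, missing count and basic statistics for each numeric column."""
    summary = {}
    for col in numeric_columns:
        # Keep only observed values; None is treated as a missing value.
        values = [r[col] for r in rows if r.get(col) is not None]
        summary[col] = {
            "count": len(values),
            "missing": len(rows) - len(values),
            "mean": statistics.mean(values) if values else None,
            "median": statistics.median(values) if values else None,
            "stdev": statistics.stdev(values) if len(values) > 1 else None,
            "min": min(values) if values else None,
            "max": max(values) if values else None,
        }
    return summary


# Hypothetical records for illustration only; None marks a missing value.
records = [
    {"age": 34, "income": 52000},
    {"age": 41, "income": None},
    {"age": 29, "income": 48000},
    {"age": None, "income": 61000},
]

summary = summarise(records, ["age", "income"])
for column, stats in summary.items():
    print(column, stats)
```

The missing-value count per attribute doubles as a simple data-quality indicator that can be reported alongside the summary statistics.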

3. Describe your methods

You are encouraged to use Rattle or R for this assignment. You may use external tools instead for part or all

of the work (e.g. you might prefer to use python for data pre-processing). Use of alternative tools may make

your explanations of methods more wordy, your methods more difficult to reproduce, and your assignment

harder to mark, so take this into account. You will not be awarded marks for methods where your method

cannot be understood.

You must use at least two clearly distinct data mining methods as taught in this course. The distinct

methods should be diverse with respect to both (i) the data mining problem addressed, such as classification, numeric

prediction or association rule mining, and (ii) the algorithm used, such as NN or DT. You may additionally use

other methods taught in this course. Further, you may choose to use some methods not addressed in this

course. You must justify your choice of methods with reference to the data types involved, the questions

you are looking to answer, the benefit of application to practice, computational feasibility, experimentation

experience, or other reasons.


Application of some methods, or addressing particular questions, may require you to pre-process the data in

some way. For example, if you are looking to predict outcomes independently of time, you could consider

removing time attributes from the dataset. You must include either a statement that no such pre-processing

was performed or else brief information on any

  • removal of provided data from consideration,
  • imputation or other transformation, or
  • differences in the basic data summary from that you provided for the original data.

Data pre-processing can be a never-ending task. Be careful to exercise your judgement on how much you do

here, taking account of the marking rubric.
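
The two kinds of pre-processing mentioned above, removing an attribute from consideration and imputing missing values, might be sketched as follows. This is only an illustrative sketch in standard-library Python: the attribute names (timestamp, income), the helpers drop_columns and impute_median, and the choice of median imputation are all assumptions made for the example, not requirements of the assignment.

```python
import statistics


def drop_columns(rows, columns):
    """Return copies of the rows without the named attributes."""
    return [{k: v for k, v in r.items() if k not in columns} for r in rows]


def impute_median(rows, column):
    """Replace missing (None) values in one column with the observed median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = statistics.median(observed)
    imputed_rows = [
        {**r, column: fill if r[column] is None else r[column]} for r in rows
    ]
    return imputed_rows, fill


# Hypothetical records for illustration only.
records = [
    {"timestamp": "2023-01-05", "income": 52000},
    {"timestamp": "2023-01-06", "income": None},
    {"timestamp": "2023-01-09", "income": 48000},
    {"timestamp": "2023-01-12", "income": 61000},
]

# Remove the time attribute so that outcomes are modelled independently of time.
without_time = drop_columns(records, {"timestamp"})

# Impute the missing income and keep the fill value so it can be reported.
imputed, fill_value = impute_median(without_time, "income")
print(fill_value)
print(imputed)
```

Keeping the fill value and the list of removed attributes makes it straightforward to state in the report exactly what was changed relative to the original data.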


Your description must be sufficient for a reasonably competent professional in the field to reproduce your

major results. You may choose to attach detailed specifications or configuration parameters as an appendix

(which does not contribute to the page limit). If you are using methods that were not taught in the course

it would normally be necessary to provide extra detail beyond what can be assumed for methods taught in

the course. Extended technical detail may be included in an appendix or by well-chosen references that

contain enough information to implement the technique.

4. Present your results

You must explain what you found. This should not be a complete listing of everything you found. You should

select results that are interesting, surprising, explanatory, answer your initial questions, or are otherwise

meaningful, and explain why they are meaningful. Your selected results must be supported by appropriate

objective quality measures and must be subjectively interpreted within the problem context

you gave. Your interpretation must be pitched towards an expert in the field related to the data source and

business problem but who may not be an expert in data mining. You might consider using diagrams to assist

but use your judgement about any added value of diagrams.
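
For a classification result, the objective quality measures referred to above could include accuracy, precision, recall and F1. The sketch below computes them from actual and predicted labels in standard-library Python. The labels and the helper quality_measures are hypothetical, and which measures are appropriate depends on the method and the problem; association rules, for example, would normally be reported with measures such as support and confidence instead.

```python
def quality_measures(actual, predicted, positive):
    """Compute accuracy, precision, recall and F1 for a binary classification."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # correctly predicted positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # correctly predicted negative
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
    }


# Hypothetical test-set labels for illustration only.
actual = ["yes", "yes", "no", "no", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes", "no", "no", "yes"]

measures = quality_measures(actual, predicted, positive="yes")
print(measures)
```

Reporting the confusion counts alongside the derived measures lets a reader who is not a data mining expert see how many cases were misclassified in each direction.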

5. Conclude with opportunities for application of your results and identification of further work

Here you should write about the significance of your results and the challenge (or not) of using the results to

make changes in the practice for which your data was collected. This analysis should be made in the context

of the goals you set in your introduction, and you can afford to speculate about possible impacts of what you

found.

You are not expected to be an expert in the area of application, nor to solve challenges you might raise with

putting your results into practice. Identifying further work may include identifying additional data that could

be used to refine the results you found, or alternative methods that should be tried with additional

resources.


Assessment Rubric

This rubric will be used to mark your assignment. You are advised to use it to supplement your understanding of what is expected for the assignment and to

direct your effort towards the most rewarding parts of the work. Your assignment will be marked out of 100, and marks will be scaled back to contribute to

the defined weighting for assessment of the course.

