Date: 2019-10-02 10:09

STAT2020 PREDICTIVE ANALYTICS – PROJECT S2/2019

OVERVIEW

This assessment involves writing a report that summarises a statistical learning investigation that you have conducted on data that you have chosen yourself. The investigation must involve the main topics covered in the course, most notably supervised and/or unsupervised learning, using R/RStudio.

It builds upon the practical knowledge that you must have acquired in the labs and other activities throughout the course. However, neither the dataset nor the detailed steps to be carried out are provided here; you have to make independent choices and decisions.

You will need to find your own data using good practices. Your dataset must contain at least 1000 observations of at least 5 variables.

Do not use data from textbooks or from R packages. Do not use data for which statistical learning results and analyses can be found online. This will likely be the case for many datasets from websites such as the UCI Machine Learning Repository, Kaggle, OpenML and other popular data science repositories, so these should be avoided unless you make sure that something similar or equivalent to your project is not readily available, fully or partially. You can use public data, but the data should be appropriate for addressing a relevant statistical learning problem, and a solution to a similar problem for the same data should not be available.

You don’t need to solve this entire statistical learning problem in your investigation, but you need to clearly indicate what the targeted problem would be about and how your project can reasonably contribute towards addressing it.

You have to write a report with details about the problem in question, the data, the methods, results, analyses and findings. You might like to look online for research papers for examples of how to shape your report. Obviously, many of these papers will have involved extensive work to collect their data; we don’t expect that from you. We also don’t expect you to win a Nobel prize with this assessment. Ideally, you will be able to demonstrate that: (a) you have grasped important concepts associated with this course, most notably supervised and unsupervised learning; and (b) you can communicate your investigation in a formal written manner.

Regarding (a), we expect that your investigation will include at least three of the following topics:

1. Decision trees for classification

2. Ensembles of decision trees for regression (bagging and/or random forests)

3. Principal Component Analysis (PCA)

4. Cluster Analysis

5. Unsupervised Outlier Detection

6. Support Vector Machines

7. Formal Quantitative Assessment of Results (e.g., cross-validation for model assessment and selection in regression or classification, clustering evaluation and model selection, etc.)

8. Qualitative Assessment through Visualisation of Results (e.g. visualisation and interpretation of clustering hierarchies, visual interpretation of PCA, etc.)
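
For topic 7, a minimal sketch of k-fold cross-validation in base R is given below. The simulated data, the linear model and the choice k = 5 are assumptions made purely for illustration; they are not prescribed by the course, and a real project would use its own data and models.

```r
set.seed(0)

# Hypothetical toy data: y depends linearly on x plus noise.
n <- 200
toy <- data.frame(x = runif(n))
toy$y <- 2 + 3 * toy$x + rnorm(n, sd = 0.5)

# Assign each observation to one of k folds at random.
k <- 5
fold <- sample(rep(1:k, length.out = n))

# For each fold: fit on the other k - 1 folds, then compute the mean
# squared error on the held-out fold.
fold_mse <- sapply(1:k, function(i) {
  fit  <- lm(y ~ x, data = toy[fold != i, ])
  pred <- predict(fit, newdata = toy[fold == i, ])
  mean((toy$y[fold == i] - pred)^2)
})

# The cross-validated error estimate is the average over the folds.
cv_mse <- mean(fold_mse)
print(cv_mse)
```

The same loop can be reused to compare candidate models: compute cv_mse for each one and prefer the model with the lower estimate.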

Regarding (b), we strongly encourage you to prepare your report using R Markdown, with all relevant code chunks disclosed. If you do, you have to deliver (in Blackboard) both the R Markdown source file (.Rmd) and the resulting file (.html or .pdf) that is produced by compiling/knitting the source code. If you don’t use R Markdown, you will have to provide both your report in PDF format and your complete R code as an R script source file (.R). Either way, make sure you include set.seed(0) at the beginning of your code, along with information about the version of R/RStudio that you have used. The markers are not expected to run your code or R Markdown file, but they may refer to these files while marking your report if some clarification is needed.
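
As a minimal sketch of the reproducibility preamble described above (not an official template), the opening lines of a script or first code chunk could look like the following. The variable names r_version and rstudio_version are illustrative only, and the guard around RStudio.Version() is an assumption made because that function is available only inside an RStudio session.

```r
# Fix the random seed first so that every random step (sampling, fold
# assignment, random forests, etc.) gives the same result on each run.
set.seed(0)

# Record the version of R used to produce the report.
r_version <- R.version.string
print(r_version)

# RStudio.Version() is defined only inside an RStudio session, so guard
# the call; under plain Rscript it does not exist.
rstudio_version <- if (exists("RStudio.Version")) {
  as.character(RStudio.Version()$version)
} else {
  "not running inside RStudio"
}
print(rstudio_version)

# Optionally list loaded packages and their versions as well.
print(sessionInfo())
```

If the guard reports that RStudio is not available (for example when the code runs outside an interactive session), the version can be read by running RStudio.Version() in the RStudio console and quoted in the text instead.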

REPORT FORMAT

The main body of the report (containing title, abstract, introduction, data, methods, results and discussion, and conclusions) cannot exceed 8 (eight) single-column pages when printed in A4 format using R Markdown default settings for font, font size, line spacing, margins, etc. A maximum of 2 (two) additional pages are allowed for bibliographic references and appendices containing any supporting material that you may want to include. Therefore, your report cannot exceed 10 (ten) A4 printed single-column pages in total. If you prepare your report using a text editor such as MS Word or LaTeX, make sure you follow as closely as possible (visual inspection and common sense should suffice) the R Markdown default format in terms of font, spacing, margins, etc., respecting the same limits on the number of pages. NOTE: Only the main body and references will be formally assessed for grading, though additional material in appendices (if any) may help clarify issues that arise during the marking process. Note that relevant pieces of R code can and should be displayed throughout the report, interwoven with the presentation, discussions and corresponding results, just like in the weekly course materials. Further details about the report structure are provided in the following section.

REPORT STRUCTURE

The report should have the following sections marked clearly:

• Title: In today’s busy world, it is very important to make the most of your title. Make the title ‘eye-catching’, informative and an accurate representation of the contents of the report.

• Abstract: The abstract provides a short, sharp overview of the contents of the report and should be around 200–300 words. The abstract has five parts:

i. Introductory statement: background to the study and the important issue(s) the report addresses (approximately 1-2 sentences).

ii. Purpose of the report: state the objectives (1-2 sentences).

iii. Methodological approach: give an overview of the data and methods (2-3 sentences).

iv. Findings or Achievements: list one or two of the main findings or achievements from your investigation (1-2 sentences).

v. Conclusions and Implications: what conclusions can be drawn from your investigation? How can the findings/achievements in your report deliver a benefit to people, things, systems or processes? (1-2 sentences)

• Introduction: The introduction sets the scene for the investigative efforts. It provides motivation for the work and relevant background information and references that will enable the reader to put in context the key objectives and achievements in your report. Address the important issues that have motivated your investigation. At the end of the introduction clearly state the objectives of the report. Do not put any results from your investigation in the introduction. Do not discuss details about the data and methods in this section. Do not discuss your conclusions or key findings in the introduction.

• Data: This section should provide details about how the data were obtained and pre-processed (if applicable), and what the data represent. You should include information such as:

i. What the source of the data is.

ii. How the data were originally collected (e.g. from an experiment or observational study).

iii. The sample size.

iv. The number and types of variables.

v. Any known interventions or pre-processing that precede the ones described in your report.

vi. Any interventions or pre-processing that you did as part of your report. NOTE: it is part of your work to consider and possibly make interventions (e.g. variable rescaling / normalisation, etc.) that may be required or recommended prior to application of a given statistical learning method.

vii. Any other information that is relevant to the understanding and assessment of your work/report.
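
To illustrate the kind of intervention mentioned in item vi, here is a minimal sketch of two common rescaling options in base R. The data frame toy, its columns and the helper min_max are hypothetical and exist only for this example; which rescaling (if any) is appropriate depends on the method and the data.

```r
set.seed(0)

# Hypothetical toy data: two numeric variables on very different scales.
toy <- data.frame(
  income = rnorm(100, mean = 50000, sd = 15000),
  age    = rnorm(100, mean = 40, sd = 12)
)

# z-score standardisation: each column ends up with mean 0 and
# standard deviation 1.
toy_z <- as.data.frame(scale(toy))

# Min-max normalisation to the interval [0, 1], written out by hand.
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
toy_01 <- as.data.frame(lapply(toy, min_max))

# Quick checks of the rescaled columns.
print(round(colMeans(toy_z), 10))
print(sapply(toy_z, sd))
print(sapply(toy_01, range))
```

Distance-based and variance-based methods such as cluster analysis, PCA and support vector machines are sensitive to the scale of the variables, which is why a step like this is often considered before applying them.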

• Methods: This section should summarise the statistical learning methods that were used to process and analyse the data, as well as the software version used to generate the results and report. To cite RStudio, run RStudio.Version() in the console. The methods should be appropriate to ensure that the objectives of the report are met. You are strongly encouraged to interleave your text with key calls to R functions that generated relevant results that you may want to highlight, just like the weekly course notes and labs. This can be achieved straightforwardly using R Markdown. You can use R Markdown display control settings (e.g. echo = FALSE) to hide chunks of code that you judge less relevant, but these must still be present in the source code for verification, if necessary. In the textual description, it is important to provide a sufficient level of detail so that your methodology could be repeated by an independent person, while being presented clearly and objectively so that it can be understood without the need to check your complete R code.

• Results and Discussion: This section presents and discusses the results. The discussion centres on the outputs from the statistical learning procedures that you have performed. For example, what are the main outcomes? Why are they useful and what for? How are they interesting and why? Etc. In particular, how do the results align with the goals set in the introduction? What are the main achievements and their implications?

• Conclusions: Final remarks about the key achievements of the investigation and what makes them “interesting” or “useful”, right now or for future work. Achievements or findings should be contrasted with the original objectives or hypotheses of the project. Make sure that you mention any limitations of your work here. Limit the conclusions to no more than two or three paragraphs.

• References: List the sources your investigation has drawn from. Note that all references should be cited in the text.

• Appendices (optional): Add any supporting materials that might be useful to help assess your work.

FORMAT SUMMARY

The main body of the report must be presented in HTML or PDF using R Markdown default settings or equivalent for font, font size, line spacing, margins, etc., on no more than 8 (eight) A4 single-column pages.

References and appendices can be listed on at most 2 (two) additional pages.

In total, the report cannot exceed 10 (ten) A4 printed single-column pages, to be uploaded in Blackboard.

WARNING: only the main body and the references will be formally assessed and graded.

IMPORTANT NOTE

The entire project must be accomplished using R/RStudio. Any calculations, visualisations, results, etc. produced using software other than R/RStudio (e.g. Excel, Tableau, etc.) are not accepted and therefore will not be assessed. Failure to comply with these requirements will result in your work being considered as not delivered. Use of R Markdown is not compulsory but you are encouraged to adopt it to prepare your report.

A WORD ON PLAGIARISM AND SELF-PLAGIARISM:

Plagiarism is the act of using another’s words, works or ideas from any source as one’s own. Plagiarism has no place in a University. Student work containing plagiarised material will be subject to formal university processes. If significant portions of your own previous work (e.g. a report for a related course you did in this or any other university) are recycled in a way that they could be fully or partially graded twice (“double-dipping”), this is considered self-plagiarism and will not be tolerated.

MARKING SCHEME

Please adhere to the strict formatting requirements. The report will not be assessed if it is not formatted appropriately.

Total marks possible 120.

Each dimension below is graded at one of three levels: Sophisticated [100% marks], Competent [50% marks] or Needs Work [0% marks].

Title [2 marks]

• Sophisticated: The title is a concise (less than 20 words) and accurate reflection of the contents of the report. Author is listed below the title.

• Competent: The title is concise (less than 20 words) and moderately reflects the contents of the report. Author is listed.

• Needs Work: The title is not informative or exceeds the word length or Author not listed.

Abstract [6 marks]

• Sophisticated: Clearly addresses the five parts of the abstract so that the reader has a clear overview of the report.

• Competent: Partially addresses the five parts of the abstract and/or addresses all five parts but the writing is not clear in places.

• Needs Work: Unclear, does not overview the report, or the writing is poor overall and mostly unclear.

Introduction [16 marks]

• Sophisticated: Position and exceptions, if any, are clearly stated. Organization of the argument is completely and clearly outlined and implemented.

• Competent: Position is clearly stated. Organization of argument is clear in parts or only partially described and mostly implemented.

• Needs Work: Position is vague. Organization of argument is missing, vague, or not consistently maintained.

Data [20 marks]

• Sophisticated: Data are suitable, the report explains how the data were obtained, and all of the following information items (whenever applicable) are clearly explained:

i. What the source of the data is.

ii. How the data were originally collected (e.g. from an experiment or observational study).

iii. The sample size.

iv. The number and types of variables.

v. Any known interventions or pre-processing that precede the ones described in your report.

vi. Any interventions or pre-processing that you did as part of your report.

vii. Any other information that is relevant to the understanding and assessment of your work/report.

• Competent: Data are suitable, the report explains how the data were obtained, and most of the applicable data information items are addressed and reasonably explained.

• Needs Work: Little information/explanation about the data is provided and/or the grammar structure is difficult to follow and/or the data do not meet the minimum requirements.

Methods [28 marks]

• Sophisticated: Lists all the steps in the order in which they were performed to explore, analyse and obtain patterns and/or models from the data. These steps, if executed appropriately and interpreted appropriately, will ensure that the objectives of the report are clearly met. At least 3 of the following targeted key topics from the course have been explored and explained in depth:

1. Decision trees for classification

2. Ensembles of decision trees for regression (bagging and/or random forests)

3. Principal Component Analysis (PCA)

4. Cluster Analysis

5. Unsupervised Outlier Detection

6. Support Vector Machines

7. Formal Quantitative Assessment of Results

8. Qualitative Assessment through Visualisation of Results

• Competent: Most of the steps are listed and explained, but some details are a little hazy or questionable. At least 2 of the targeted key topics from the course (listed under Sophisticated) have been reasonably explored and explained.

• Needs Work: The methods clearly will not allow the objectives of the report to be met and/or the details of methodological steps and procedures are very difficult to follow and/or the listed key topics from the course have been poorly or not appropriately explored.

Results and Discussion [22 marks]

• Sophisticated: The results and discussion are explained correctly, clearly, and in sufficient detail. The results and discussion clearly follow from the data collection and the methods.

• Competent: The results and discussion are explained correctly, clearly and in sufficient detail most of the time. There exists a connection of some type between the results/discussion and the data collection and methods.

• Needs Work: One or more of the items described under Competent are missing.

Conclusion [10 marks]

• Sophisticated: The original objectives and/or hypotheses are restated and contrasted against the obtained achievements and/or findings. The conclusion summarizes and draws a clear, effective conclusion of the investigation and enhances the impact of the report – e.g., it provides a recommendation or action that should be undertaken in the future. It may also highlight unavoidable limitations of the investigation.

• Competent: Conclusion is clearly stated and connections to the original objectives and/or hypotheses are mostly clear.

• Needs Work: Conclusion may not be clear and/or the connections to the work reported are incorrect or unclear or just a repetition of the findings without a suitable summarisation and interpretation and/or the underlying logic has major flaws.

Writing [16 marks]

• Sophisticated: Report is coherently organized and the logic is easy to follow. There are no spelling or grammatical errors and terminology is clearly defined. Writing is clear, concise and persuasive. Each Figure/Table is numbered, followed by a caption, and referred to in the body of the text, most notably in the results and/or discussion section. The Figures/Tables provided reinforce the most relevant achievements of the work. All references have been referred to in the appropriate places in the body of the text and listed at the end of the report. At least 3 references have been provided.

• Competent: Report is generally well organized and most of the argument is easy to follow. There are only a few minor spelling or grammatical errors, or terms are not clearly defined. Writing is mostly clear but may lack conciseness. Each Figure/Table is numbered, followed by a caption, and referred to in the body of the text, most notably in the results and/or discussion section. Most references have been referred to in the appropriate places in the body of the text and listed at the end of the report. At least 3 references have been provided.

• Needs Work: Report is poorly organized and difficult to read – does not flow logically from one part to another. There are several spelling and/or grammatical errors; technical terms may not be defined or are poorly defined; figures/tables and/or references are sloppy or missing. Writing lacks clarity and conciseness.

