Contact

  • QQ: 99515681
  • Email: 99515681@qq.com
  • Working hours: 8:00-21:00
  • WeChat: codinghelp


Date: 2023-10-21 10:47

BUSS6002 Assignment

October 10, 2023

Instructions

• Due: at 23:59 on Friday, October 27, 2023 (end of week 12).

• You must submit a written report (in PDF) with the following filename format, replacing STUDENTID with your own student ID: BUSS6002 STUDENTID.pdf.

• You must also submit a Jupyter Notebook (.ipynb) file with the following filename format, replacing STUDENTID with your own student ID: BUSS6002 STUDENTID.ipynb.

• There is a limit of 6 A4-pages for your report (including equations, tables, and captions).

• All plots, computational tasks, and results must be completed using Python.

• Each section of your report must be clearly labelled with a heading.

• Do not include any Python code as part of your report.

• All figures must be appropriately sized and have readable axis labels and legends (where applicable).

• The submitted .ipynb file must contain all the code used in the development of your report.

• The submitted .ipynb file must be free of any errors, and the results must be reproducible.

• You may submit multiple times but only your last submission will be marked.

• A late penalty applies if you submit your assignment late without a successful special consideration. See the Unit Outline for more details.

• Generative AI tools (such as ChatGPT) may be used for this assignment, but you must add a statement at the end of your report specifying how generative AI was used. E.g., "Generative AI was only used for editing the final report text."

• Hint! It is highly recommended that you finish the week 10 tutorial before starting this assignment.


Description

In this assignment, you are conducting a study that compares the empirical performance of two families of basis functions for linear basis function (LBF) models: polynomial basis functions and radial basis functions. The aim is to investigate which family of basis functions is better suited for approximating highly nonlinear relationships between two scalar-valued variables.

More specifically, you are given four benchmark datasets: A, B, C, and D. Each dataset contains 5,000 observations of the response and predictor variables, which are named $y$ and $x$, respectively. A scatter plot of each dataset is shown in Figure 1. Your task is to compare the performance of polynomial and radial basis function regression models on each of the datasets.

Figure 1: Benchmark Datasets

The LBF model being considered in your study is given by

$$y = \phi(x)^\top \beta + \varepsilon,$$

where $\phi(x) := [1, \phi_1(x), \dots, \phi_p(x)]^\top$, $\beta := [\beta_0, \beta_1, \dots, \beta_p]^\top$, and $\varepsilon$ is a random noise. For the set of basis functions $\{\phi_i\}_{i=1}^{p}$, two choices are being investigated: the first choice is the family of polynomial basis functions,

$$\phi_i(x) := x^i,$$

and the second choice is the family of radial basis functions,

$$\phi_i(x) := \exp\left(-\frac{\left(x - \frac{i}{p+1}\right)^2}{2s^2}\right).$$
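As a rough illustration of the two basis function families, the sketch below builds the design matrix whose rows are $\phi(x)^\top$ for each observation. The function names `polynomial_design` and `radial_design` are hypothetical, and the radial version assumes the Gaussian form with a negative exponent and centres at $i/(p+1)$; it is one possible way to code the definitions, not a prescribed implementation.

```python
import numpy as np

def polynomial_design(x, p):
    """Return the n x (p+1) matrix with columns 1, x, x^2, ..., x^p."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=p + 1, increasing=True)

def radial_design(x, p, s):
    """Return the n x (p+1) matrix with an intercept column followed by
    p Gaussian bumps centred at i/(p+1), i = 1..p, each with width s."""
    x = np.asarray(x, dtype=float)
    centres = np.arange(1, p + 1) / (p + 1)
    # Broadcasting: (n, 1) minus (1, p) gives an (n, p) array of offsets.
    bumps = np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * s ** 2))
    return np.column_stack([np.ones_like(x), bumps])

# Tiny illustrative input (not one of the benchmark datasets).
x = np.array([0.0, 0.5, 1.0])
Phi_pol = polynomial_design(x, p=2)
Phi_rad = radial_design(x, p=1, s=0.5)
```

Both functions place the constant 1 in the first column, so the first entry of the coefficient vector plays the role of $\beta_0$.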


Before comparing the two basis function families, you must set the value of $p$ for the polynomial regression model, as well as the values of $p$ and $s$ for the radial basis function regression model. These hyperparameter values should be selected for each dataset, using a validation set, by minimising the validation mean squared error (MSE).
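A validation-based selection needs the data to be partitioned and an MSE function. The sketch below shows one way to do this; the helper names `split_indices` and `mse`, the 60/20/20 split, and the random seed are all assumptions, since the split proportions are not specified here.

```python
import numpy as np

def split_indices(n, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle 0..n-1 and cut it into train / validation / test index arrays.
    The fractions are an assumed choice, not part of the assignment text."""
    rng = np.random.default_rng(seed)  # fixed seed keeps results reproducible
    order = rng.permutation(n)
    n_train = round(frac_train * n)
    n_val = round(frac_val * n)
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted responses."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Each benchmark dataset has 5,000 observations.
train_idx, val_idx, test_idx = split_indices(5000)
```

Fixing the seed is one simple way to keep the notebook's results reproducible across runs.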

In your study, the optimal value of $p$ (for each basis function family) should be selected by exhaustively searching through an equally-spaced grid from 1 to 10, with a spacing of 1:

$$P := \{1, 2, 3, \dots, 10\}.$$

For the radial basis functions, in addition to selecting $p$, you should also select the optimal value of $s$ by exhaustively searching through another equally-spaced grid from 0.1 to 1, with a spacing of 0.1:

$$S := \{0.1, 0.2, 0.3, \dots, 1\}.$$

That is, for each dataset, the optimal values must be determined for three hyperparameters in total: $p_{\mathrm{pol}} \in P$, $p_{\mathrm{rad}} \in P$, and $s \in S$, where $p_{\mathrm{pol}}$ denotes the number of polynomial basis functions (i.e., the degree of the polynomial) and $p_{\mathrm{rad}}$ denotes the number of radial basis functions.
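The exhaustive search over the grids can be sketched as follows. This is a minimal illustration on synthetic data, assuming a predictor in [0, 1] and least-squares fitting via `np.linalg.lstsq`; the helper names (`design`, `val_mse`, `select_polynomial`, `select_radial`) are hypothetical, and the radial basis uses the Gaussian form with a negative exponent.

```python
import numpy as np

P_GRID = range(1, 11)                          # {1, 2, ..., 10}
S_GRID = np.round(np.arange(1, 11) * 0.1, 1)   # {0.1, 0.2, ..., 1.0}

def design(x, p, s=None):
    """Polynomial design matrix if s is None, otherwise radial (Gaussian)."""
    if s is None:
        return np.vander(x, N=p + 1, increasing=True)
    centres = np.arange(1, p + 1) / (p + 1)
    bumps = np.exp(-(x[:, None] - centres) ** 2 / (2 * s ** 2))
    return np.column_stack([np.ones_like(x), bumps])

def val_mse(x_tr, y_tr, x_va, y_va, p, s=None):
    """Fit beta on the training set, return the MSE on the validation set."""
    beta, *_ = np.linalg.lstsq(design(x_tr, p, s), y_tr, rcond=None)
    resid = y_va - design(x_va, p, s) @ beta
    return float(np.mean(resid ** 2))

def select_polynomial(x_tr, y_tr, x_va, y_va):
    """Return the p in P_GRID with the smallest validation MSE."""
    return min(P_GRID, key=lambda p: val_mse(x_tr, y_tr, x_va, y_va, p))

def select_radial(x_tr, y_tr, x_va, y_va):
    """Return the (p, s) pair on the full 10 x 10 grid with the smallest
    validation MSE."""
    grid = [(p, float(s)) for p in P_GRID for s in S_GRID]
    return min(grid, key=lambda ps: val_mse(x_tr, y_tr, x_va, y_va, *ps))

# Synthetic stand-in data (NOT one of the benchmark datasets).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 600)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=600)
x_tr, y_tr, x_va, y_va = x[:400], y[:400], x[400:], y[400:]

p_pol = select_polynomial(x_tr, y_tr, x_va, y_va)
p_rad, s_rad = select_radial(x_tr, y_tr, x_va, y_va)
```

The polynomial search evaluates 10 candidate models and the radial search evaluates 100, so both are cheap enough to run exhaustively.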

Once the optimal values of the hyperparameters are chosen for both basis function families, you will be able to compare the performance between the two using a test set (i.e., by comparing the test MSE between the two optimally selected models).
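The final comparison can be sketched as below. The hyperparameter values in `chosen` are placeholders for illustration only, not selected values; the data are synthetic, and refitting on the training portion alone (rather than training plus validation) is an assumption. The helper names are hypothetical.

```python
import numpy as np

def design(x, p, s=None):
    """Polynomial design matrix if s is None, otherwise radial (Gaussian)."""
    if s is None:
        return np.vander(x, N=p + 1, increasing=True)
    centres = np.arange(1, p + 1) / (p + 1)
    bumps = np.exp(-(x[:, None] - centres) ** 2 / (2 * s ** 2))
    return np.column_stack([np.ones_like(x), bumps])

def fit(x, y, p, s=None):
    """Least-squares estimate of beta for the given hyperparameters."""
    beta, *_ = np.linalg.lstsq(design(x, p, s), y, rcond=None)
    return beta

def evaluate(beta, x, y, p, s=None):
    """MSE of the fitted model on held-out data."""
    return float(np.mean((y - design(x, p, s) @ beta) ** 2))

# Synthetic stand-in data (NOT one of the benchmark datasets).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 1000)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=1000)
x_tr, y_tr, x_te, y_te = x[:700], y[:700], x[700:], y[700:]

# Placeholder hyperparameters; in the study these come from the grid search.
chosen = {"polynomial": dict(p=5), "radial": dict(p=8, s=0.2)}

results = {}
for name, hp in chosen.items():
    beta = fit(x_tr, y_tr, **hp)
    results[name] = evaluate(beta, x_te, y_te, **hp)
    print(f"{name:<12s} test MSE = {results[name]:.4f}")
```

The test set is touched only once per model here, after the hyperparameters are fixed, so the test MSE is not used for any selection decision.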

The files containing the datasets are listed in Table 1 and can be downloaded from the unit’s Canvas site. In each file, the dataset is organised as comma-separated values, with each row being an observation and each column being a variable. The response values are in the first column and the corresponding predictor values are in the second column.

File             Description
dataset-a.csv    Benchmark dataset A
dataset-b.csv    Benchmark dataset B
dataset-c.csv    Benchmark dataset C
dataset-d.csv    Benchmark dataset D

Table 1: Files Provided
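Reading a file with this column layout might look like the sketch below. Whether the files contain a header row is not stated, so the `has_header` flag is an assumption to adjust after inspecting the files; `load_dataset` is a hypothetical helper, and the in-memory sample stands in for a real file path such as dataset-a.csv.

```python
import io
import numpy as np

def load_dataset(source, has_header=True):
    """Load a two-column CSV: response y in column 0, predictor x in column 1.
    `source` may be a file path or an open file-like object."""
    data = np.loadtxt(source, delimiter=",", skiprows=1 if has_header else 0)
    y, x = data[:, 0], data[:, 1]
    return x, y

# In-memory sample with an assumed header row, used instead of a real file.
sample = io.StringIO("y,x\n0.5,0.1\n0.7,0.2\n0.2,0.9\n")
x, y = load_dataset(sample)
```

Returning the arrays in the order `(x, y)` is a convenience; the point to watch is that the response comes first in the file.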


Report Structure

Your report must contain the following four sections:

1 Introduction (0.5 pages)

– Provide a brief project background so that the reader of your report can understand the general problem that you are solving.

– Motivate your research question.

– State the aim of your project.

– Provide a short summary of each of the rest of the sections in your report (e.g., “The report proceeds as follows: Section 2 presents ...”).

2 Methodology (2 pages)

– Define and describe the LBF model.

– Define and describe the two choices of basis function families being investigated.

– Describe how the parameter vector $\beta$ is estimated given the hyperparameter value(s). Mention any potential numerical issues associated with the estimation procedure.

– Describe how the hyperparameter value(s) can be determined automatically from data (as opposed to manually setting the hyperparameters to arbitrary values).

– Describe how the performance of the two families of basis functions is compared given the optimal hyperparameter value(s).

3 Empirical Study (2.5 pages)

– Describe the benchmark datasets used in your study.

– Describe in detail the procedure that you followed to obtain the empirical results, including any computational challenges that you may have encountered. You may refer to details in Section 2 to avoid repetition in your writing.

– Present (in a table) the optimal hyperparameter values selected for each dataset and for each basis function family.

– Discuss the table of selected hyperparameters.

– Visually present (using plots) the predicted response values under each basis function family for each dataset.

– Discuss the plots of predicted values.

– Present (in a table) the test MSE under each basis function family for each dataset.

– Discuss the table of test MSE values.

4 Conclusion (0.5 pages)

– Discuss your overall findings / insights.

– Discuss any limitations of your study.

– Suggest potential directions for extending your study.
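The Methodology item on estimating $\beta$ asks for potential numerical issues. One commonly cited example, sketched below under the assumption that $\beta$ is estimated by ordinary least squares, is the ill-conditioning of the polynomial design matrix at higher degrees. The evenly spaced inputs and the coefficient values are illustrative only.

```python
import numpy as np

# Polynomial design matrix with p = 10 on evenly spaced inputs in [0, 1].
x = np.linspace(0.0, 1.0, 200)
Phi = np.vander(x, N=11, increasing=True)

# Condition numbers: forming Phi^T Phi (as in the normal equations)
# roughly squares the condition number of Phi itself.
cond_Phi = np.linalg.cond(Phi)
cond_gram = np.linalg.cond(Phi.T @ Phi)

# Noise-free response from illustrative coefficients.
beta_true = np.zeros(11)
beta_true[:3] = [1.0, -2.0, 0.5]
y = Phi @ beta_true

# lstsq works on Phi directly (via an SVD-based solver) and avoids
# explicitly forming or inverting Phi^T Phi.
beta_lstsq, *_ = np.linalg.lstsq(Phi, y, rcond=None)

print(f"cond(Phi)       = {cond_Phi:.2e}")
print(f"cond(Phi^T Phi) = {cond_gram:.2e}")
```

Solvers that work on the design matrix directly are generally less sensitive to this than explicitly inverting $\Phi^\top \Phi$, which is one reason a routine such as `np.linalg.lstsq` may be preferable.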


Rubric

This assignment is worth 30% of the unit’s marks. The assessment is designed to test your computational skills in implementing algorithms and conducting empirical experiments, as well as your communication skills in writing a concise and coherent report presenting your approach and results. The mark allocation across assessment items is given in Table 2.

Assessment Item         Goal                                     Marks
Section 1               Introduction                             4
Section 2               Methodology                              10
Section 3               Empirical Study                          16
Section 4               Conclusion                               3
Overall Presentation    Clear, concise, coherent, and correct    5
Jupyter Notebook        Reproducible results                     2
Total                                                            40

Table 2: Assessment Items and Mark Allocation
