Contact

  • QQ: 99515681
  • Email: 99515681@qq.com
  • Working hours: 8:00-21:00
  • WeChat: codinghelp

Date: 2021-10-03 11:07

STAT3006 Assignment 3—Classification

Due Date: 15th October 2021

Weighting: 30%

Instructions

The assignment consists of three (3) problems. Each problem is worth 10 marks, and each mark is equally weighted.

The mathematical elements of the assignment can be completed by hand, in LaTeX (preferably), or in Word (or other typesetting software). The mathematical derivations and manipulations should be accompanied by clear explanations in English that give the information needed to interpret the mathematical exposition.

Computation problems can be answered using your programming language of choice, although R is generally recommended, or Python if you are uncomfortable with R. As with the mathematical exposition, you may typeset your answers to the problems in whatever authoring or word processing software you wish. You should also keep a copy of any code that you have produced.

Computer generated plots and hand drawn graphs should be included together with the text

where problems are answered.

The assignment will require six (6) files containing data, which you can download from the Assignment 3 section on Blackboard. These files are: p2_1ts.csv, p2_1cl.csv, p2_2ts.csv, p3_1x.csv, p3_1y.csv, and data_bank_authentification.txt.

Submission files should include the following (whichever applies to you):

– Scans of handwritten mathematical exposition.

– Typeset mathematical exposition, outputted as a pdf file.

– Typeset answers to computational problems, outputted as a pdf file.

– Program code/scripts that you wish to submit, outputted as a txt file.

All submission files should be labeled with your name and student number and

archived together in a zip file and submitted at the TurnItIn link on Blackboard.

We suggest naming using the convention:

FirstName_LastName_STAT3006A3_[Problem_XX/Problem_XX_Part_YY].[FileExtension].

As per my.uq.edu.au/information-and-services/manage-my-program/student-integrity-and-conduct/academic-integrity-and-student-conduct, what you submit should be your own work. Even where working from sources, you should endeavour to write in your own words. You should use consistent notation throughout your assignment and define whatever is required.

Problem 1 [10 Marks]

Let $X \in \mathcal{X} = [0, 1]$ and $Y \in \{0, 1\}$. Further, suppose that

$$\pi_y = P(Y = y) = 1/2$$

for both $y \in \{0, 1\}$, and that the conditional distributions of $[X \mid Y = y]$ are characterized by the probability density functions (PDFs)

$$f(x \mid Y = 0) = 2 - 2x$$

and

$$f(x \mid Y = 1) = 2x.$$

Part a [2 Marks]

The Bayes' classifier for $Y \in \{0, 1\}$ is

$$r^*(x) = \begin{cases} 1 & \text{if } \tau_1(x) > 1/2, \\ 0 & \text{otherwise,} \end{cases}$$

where

$$\tau_1(x) = P(Y = 1 \mid X = x).$$

Derive the explicit form of $\tau_1(x)$ in the current scenario and plot $\tau_1(x)$ as a function of $x$.
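
As a numerical cross-check on a hand derivation, the posterior can be evaluated directly from Bayes' rule, $\tau_1(x) = \pi_1 f(x \mid Y = 1) / \{\pi_1 f(x \mid Y = 1) + \pi_0 f(x \mid Y = 0)\}$. The sketch below is only illustrative: the function names `posterior_class1`, `f0` and `f1` are assumptions introduced here, not part of the assignment, and the printed grid could equally be passed to any plotting library.

```python
def posterior_class1(x, pi1, f0, f1):
    """Return P(Y = 1 | X = x) by Bayes' rule for a two-class problem."""
    numerator = pi1 * f1(x)
    denominator = numerator + (1.0 - pi1) * f0(x)
    return numerator / denominator


def f0(x):
    # Class-0 density from the problem statement: f(x | Y = 0) = 2 - 2x
    return 2.0 - 2.0 * x


def f1(x):
    # Class-1 density from the problem statement: f(x | Y = 1) = 2x
    return 2.0 * x


# Evaluate the posterior on a coarse grid of x values, with pi1 = 1/2
grid = [i / 10 for i in range(11)]
tau1_values = [posterior_class1(x, 0.5, f0, f1) for x in grid]
for x, t in zip(grid, tau1_values):
    print(f"x = {x:.1f}  tau1(x) = {t:.3f}")
```

A finer grid gives a smoother curve for the plot; the closed form still has to be derived by hand.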

Part b [2 Marks]

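Define the classification loss function for a generic classifier $r : \mathcal{X} \to \{0, 1\}$ as

$$\ell(x, y, r(x)) = \llbracket r(x) \neq y \rrbracket,$$

where $\ell : \mathcal{X} \times \{0, 1\} \times \{0, 1\} \to \{0, 1\}$ and $\llbracket \cdot \rrbracket$ equals 1 when its argument is true and 0 otherwise, and consider the associated risk

$$L(r) = E(\llbracket r(X) \neq Y \rrbracket).$$

It is known that the Bayes' classifier $r^*$ is optimal in that it minimizes the classification risk, that is,

$$L(r^*) \le L(r).$$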

In the binary classification case,

$$L(r^*) = E(\min\{\tau_1(X), 1 - \tau_1(X)\}) = \frac{1}{2} - \frac{1}{2} E(|2\tau_1(X) - 1|).$$

Calculate $L(r^*)$ for the current scenario.
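
A hand calculation of the risk can be checked numerically by integrating both forms of the expectation against the marginal density of $X$, which is $\pi_0 f(x \mid Y = 0) + \pi_1 f(x \mid Y = 1)$. The following is a minimal sketch under the assumptions of Problem 1 with $\pi_1 = 1/2$; the helper `midpoint_integral` and the other names are hypothetical and chosen for illustration.

```python
PI1 = 0.5  # prior P(Y = 1) from the problem statement


def f0(x):
    return 2.0 - 2.0 * x  # f(x | Y = 0)


def f1(x):
    return 2.0 * x  # f(x | Y = 1)


def marginal(x):
    # Marginal density of X: mixture of the two class-conditional densities
    return (1.0 - PI1) * f0(x) + PI1 * f1(x)


def tau1(x):
    # Posterior P(Y = 1 | X = x) by Bayes' rule
    return PI1 * f1(x) / marginal(x)


def midpoint_integral(g, a, b, n=10_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


# Form 1: E[min{tau1(X), 1 - tau1(X)}]
risk_min_form = midpoint_integral(
    lambda x: min(tau1(x), 1.0 - tau1(x)) * marginal(x), 0.0, 1.0
)

# Form 2: 1/2 - 1/2 * E|2 tau1(X) - 1|
risk_abs_form = 0.5 - 0.5 * midpoint_integral(
    lambda x: abs(2.0 * tau1(x) - 1.0) * marginal(x), 0.0, 1.0
)

print(risk_min_form, risk_abs_form)
```

The two printed values should agree up to numerical error, which is a quick way to confirm the identity before working through the integral analytically.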

Part c [2 Marks]

Assume now that $\pi_1 \in [0, 1]$ is unknown. Derive an expression for $L(r^*)$ that depends on $\pi_1$.
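
One way to sanity-check a derived expression is to compute the Bayes risk numerically for several values of $\pi_1$. Because $\min\{\tau_1(x), 1 - \tau_1(x)\}$ multiplied by the marginal density of $X$ equals $\min\{\pi_1 f(x \mid Y = 1), \pi_0 f(x \mid Y = 0)\}$, the risk is the integral of that pointwise minimum over $[0, 1]$. The sketch below assumes the densities of Problem 1; `bayes_risk_for_prior` is a hypothetical helper name.

```python
def f0(x):
    return 2.0 - 2.0 * x  # f(x | Y = 0)


def f1(x):
    return 2.0 * x  # f(x | Y = 1)


def bayes_risk_for_prior(pi1, f0, f1, n=10_000):
    """Midpoint-rule integral over [0, 1] of min{pi1 f1(x), (1 - pi1) f0(x)}."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += min(pi1 * f1(x), (1.0 - pi1) * f0(x))
    return h * total


# Tabulate the numerical risk for a few priors to compare with a formula
for pi1 in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(f"pi1 = {pi1:.1f}  risk = {bayes_risk_for_prior(pi1, f0, f1):.6f}")
```

Comparing the table against the candidate formula at a handful of priors catches most algebra slips.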

Part d [2 Marks]

Assume again that $\pi_1 \in [0, 1]$ is unknown.

Then, assuming that $\pi_0 = \pi_1 = 1/2$,

Part e [2 Marks]

Consider now that $\pi_1 \in [0, 1]$ is unknown, as are $f(x \mid Y = 0)$ and $f(x \mid Y = 1)$. That is, we only know that $f(\cdot \mid Y = y) : \mathcal{X} \to \mathbb{R}$ is a density function on $\mathcal{X} = [0, 1]$, for each $y \in \{0, 1\}$, in the sense that $f(x \mid Y = y) \ge 0$ for all $x \in \mathcal{X}$ and that $\int_{\mathcal{X}} f(x \mid Y = y)\, dx = 1$.

Using the expressions from Part d, deduce the minimum and maximum values of $L(r^*)$ and provide conditions on $\pi_1$, $f(\cdot \mid Y = 0)$ and $f(\cdot \mid Y = 1)$ that yield these values.

Problem 2 [10 Marks]

Suppose that we observe an independent and identically distributed sample of $n = 300$ random pairs $(X_i, Y_i)$, for $i \in [n]$, where $X_i = (X_{i1}, \dots, X_{id})$ is a mean-zero time series of length $d = 100$ and $Y_i \in \{1, 2, 3\}$ is a class label. Here, $X_{it}$ is the observation of time series $i \in [n]$ at time $t \in [d]$, and we may say that $X_i \in \mathcal{X} = \mathbb{R}^d$.

We assume that the label $Y_i$, for $i \in [n]$, is such that each class occurs in the general population with unknown probability

$$\pi_y = P(Y_i = y),$$

for each $y \in \{1, 2, 3\}$, where $\sum_{y=1}^{3} \pi_y = 1$. Further, we know that $X_{it}$ is first-order autoregressive, in the sense that the distribution of $[X_i \mid Y_i = y]$ can be characterized by the conditional probability densities

$$f(x_{ir} \mid X_{i1} = x_{i1}, X_{i2} = x_{i2}, \dots, X_{i,r-1} = x_{i,r-1}, Y_i = y) = \phi(x_{ir}; \beta_y x_{i,r-1}, \sigma_y^2),$$

where $x_i = (x_{i1}, \dots, x_{id})$ is a realization of $X_i$, and for each $y \in \{1, 2, 3\}$, $\sigma_y^2 \in (0, \infty)$ and $\beta_y \in [-1, 1]$. Here,

$$\phi(x; \mu, \sigma^2) = (2\pi\sigma^2)^{-1/2} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}$$

is the univariate normal probability density function with mean $\mu \in \mathbb{R}$ and variance $\sigma^2 \in (0, \infty)$.
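
The autoregressive factorization above implies that the log-density of a whole series under class $y$ is a sum of normal log-densities, one per time point, and Bayes' rule then turns the three class log-densities into posterior class probabilities. The sketch below illustrates this under one loud assumption: the value preceding the first observation is taken to be zero, because the text above does not say how $X_{i1}$ is distributed. All function names and the example parameter values are hypothetical.

```python
import math


def normal_logpdf(x, mean, var):
    """Log of the univariate normal density phi(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def ar1_loglik(series, beta, var, x0=0.0):
    """Sum over r of log phi(x_r; beta * x_{r-1}, var).

    ASSUMPTION: the value before the first observation is x0 = 0; the
    problem text does not specify the distribution of the first point.
    """
    total = 0.0
    prev = x0
    for x in series:
        total += normal_logpdf(x, beta * prev, var)
        prev = x
    return total


def class_posteriors(series, priors, betas, variances):
    """Posterior class probabilities by Bayes' rule, via log-sum-exp."""
    log_joint = [
        math.log(p) + ar1_loglik(series, b, v)
        for p, b, v in zip(priors, betas, variances)
    ]
    m = max(log_joint)  # subtract the maximum for numerical stability
    weights = [math.exp(lj - m) for lj in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical short series and parameter values, for illustration only
example_series = [0.5, 0.4, 0.35, 0.2]
posteriors = class_posteriors(
    example_series,
    priors=[0.2, 0.3, 0.5],
    betas=[-0.5, 0.0, 0.9],
    variances=[1.0, 1.0, 0.5],
)
print(posteriors)
```

Working on the log scale matters here: with $d = 100$ time points the raw product of densities can underflow, while the log-sum-exp step keeps the normalization stable.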

Part a [2 Marks]

Let $(X, Y)$ arise from the same population distribution as $(X_1, Y_1)$. Using the information above, derive expressions for the a posteriori probabilities

$$\tau_y(x; \theta) = P(Y = y \mid X = x),$$

for each $y \in \{1, 2, 3\}$, as functions of the parameter vector

