
Date: 2019-04-12 09:44

FIT5197 2019 S1 Assignment 1 (25 marks)

18 March 2019

Contents

1 Details

2 Probabilities in Cards (2 marks)

2.1 A special flush (1 mark)

2.2 No repeats (1 mark)

3 PDF and Expectations (3 marks)

3.1 Plot (1/2 mark)

3.2 Mean (1/2 mark)

3.3 Variance (1 mark)

3.4 Skewness (1 mark)

4 Distributions (2 marks)

4.1 Model (1 mark)

4.2 Checking (1 mark)

5 Entropy (3 marks)

5.1 Conditional probabilities (1 mark)

5.2 Entropies (1 mark)

5.3 Coding (1 mark)

6 Maximum likelihood estimation of parameters (3 marks)

6.1 Model (1 mark)

6.2 Maximum likelihood fitting (2 marks)

7 Central Limit Theorem (7 marks)

7.1 Sampling distribution (2 marks)

7.2 Simulation (2 marks)

7.3 Plotting normality (3 marks)

Submission due date: by 11:59pm on Friday 12 April 2019 (end of Week 6)


1 Details

Marks

This assignment contains 6 questions. There are 25 marks in total for the assignment, and it counts for 25% of your overall grade for the unit. Of the 25 marks, 3 are awarded for code quality and 2 for presentation of results (for instance, use of plots). That leaves 20 marks for the individual answers.

You must show all working so we can see how you obtained your answer. Marks are assigned for the working as well as for the correct answer.

Your solutions

Please put your name or student number on the first page of your solutions. Do not copy the questions into your solutions; include only the question numbers. If you use any external resources to develop your results, please remember to provide a link to the source.

If an extension has been given, submission after the due date is allowed with no penalty. If no extension has been given, assignments submitted after the due date will be penalised 5% per day, up to a maximum of 10 days late.

Submitting your assignment on Moodle

Please submit your assignment through Moodle by uploading a Word or PDF document, as well as the R markdown file you used to generate the results.

If you choose to knit the Word/PDF document directly from an R markdown file, you will need to type in LaTeX equations for Questions 1, 2 and 5. Find more information about using LaTeX in R markdown files here. You may also find the R markdown cheatsheet useful.

You can also work with Word and R markdown separately. In this case you need to type your answers in Word and also copy the R code (formatted in Courier New), results and figures into the Word document.

We will mark your submission mostly using your Word/PDF document. However, you need to make sure your R markdown file is executable in case we need to check your code.

Code quality marks

Your R code will be reviewed for conciseness, efficiency, explainability and quality. Inline documentation, for instance, should demarcate key sections and explain more obscure operations, but shouldn't be overly verbose.

Out of the 25 marks, 3 will be awarded for code quality.

Presentation marks

Your presentation of results using R will be reviewed: how well you use plots or other means of ordering and conveying results. Out of the 25 marks, 2 will be awarded for presentation using R.


2 Probabilities in Cards (2 marks)

Take a regular deck of cards with no jokers (13 cards per suit, 4 suits), giving 52 cards. Suppose we draw a 5-card hand, that is, 5 cards without replacement. For each answer, write out the full calculation in R to show your working.

Note that there are $\frac{52!}{47!}$ different 5-card hands if the order of the draw is considered, and each is equally likely. If the order of the draw is ignored, there are $\binom{52}{5} = \frac{52!}{5!\,47!}$ different 5-card hands.

2.1 A special flush (1 mark)

What is the probability of getting a royal flush in which the cards, ordered by rank, alternate in colour? That is, order the cards as 10, J, Q, K, A and then check that they alternate in colour. Note that in a proper royal flush all cards are of the same suit, but here we have replaced that condition with alternating colour. So, for example, “red 10, black J, red Q, black K, red A” is OK, but “red J, black 10, red Q, black K, red A” is not OK, because once the cards are reordered by rank the colours no longer alternate. Note that the order in which the cards are drawn from the pack is not considered.

HINT: This event is defined ignoring the order of the draw, so count the number of such hands (ignoring the order of the draw) and divide by $\binom{52}{5}$.
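Here is a minimal R sketch of the counting pattern in the hint. So that it does not pre-empt the answer, it uses a different event: an ordinary royal flush (10, J, Q, K, A all of one suit). There is exactly one such hand per suit, so there are 4 in total, and the sketch divides that count by choose(52, 5).

```r
# Number of unordered 5-card hands: 52! / (5! 47!)
n_hands <- choose(52, 5)   # 2598960

# Illustrative event only: an ordinary royal flush (10, J, Q, K, A of one suit).
# There is exactly one such hand per suit, so 4 hands in total.
n_royal <- 4

p_royal <- n_royal / n_hands
p_royal
```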

2.2 No repeats (1 mark)

What is the probability that, in the sequence of cards as they are drawn, no rank occurs twice in a row? So, ignoring the suit, the following are allowed: A, 10, 4, J, 10 or A, 10, A, 4, A; but the following are not allowed: A, A, 10, 4, A (A repeated in positions 1 and 2) and A, 4, 10, 10, J (10 repeated in positions 3 and 4).

HINT: This event is defined using the order of the draw, so count the number of such hands and divide by 52!/47!.
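For events that depend on the draw order, the same idea works with ordered counts. The sketch below uses an illustrative event that is not the one asked about: the first two cards drawn are both aces. It computes that probability by counting ordered draws and dividing by 52!/47!, then checks the result by Monte Carlo simulation. Coding ranks as 1 to 13, with 1 for the ace, is an assumption of the sketch.

```r
# Number of ordered 5-card draws: 52!/47! = 52 * 51 * 50 * 49 * 48
n_ordered <- prod(52:48)   # 311875200

# Illustrative order-dependent event only (not the "no repeats" event):
# the first two cards drawn are both aces; the other three are unrestricted.
n_event <- 4 * 3 * prod(50:48)
p_exact <- n_event / n_ordered

# Monte Carlo check: ranks 1..13, four cards of each rank (1 = ace here)
set.seed(1)
deck <- rep(1:13, each = 4)
first_two_aces <- replicate(1e5, {
  hand <- sample(deck, 5)          # 5 cards without replacement, in draw order
  hand[1] == 1 && hand[2] == 1
})
p_sim <- mean(first_two_aces)
c(exact = p_exact, simulated = p_sim)
```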

3 PDF and Expectations (3 marks)

Let X have the PDF given by a piecewise function with different negative and positive parts:

f(x) = 12

You can use Wolfram Alpha to do the definite integrals, for instance

https://www.wolframalpha.com/input/?i=integral+(1-x)%5E3+from+0+to+1
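If you prefer to stay in R, the built-in integrate() function evaluates the same kind of definite integral numerically. This small sketch reproduces the Wolfram Alpha example above:

```r
# Definite integral of (1 - x)^3 over [0, 1]; the exact value is 1/4
res <- integrate(function(x) (1 - x)^3, lower = 0, upper = 1)
res$value
```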

3.1 Plot (1/2 mark)

Draw the plot in R.


3.2 Mean (1/2 mark)

Find E(X). Why is it not zero?

3.3 Variance (1 mark)

Find the variance, Var(X).

3.4 Skewness (1 mark)

Find the skewness using the formula in the lecture notes, and interpret the value.
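The sketch below shows one way to compute 3.1 to 3.4 numerically in R. The density is a made-up piecewise stand-in with different negative and positive parts. It is not the f(x) given in this question, so substitute the real f(x) and its support. Skewness is taken here as the third standardised moment E[(X - μ)^3]/σ^3, which is assumed to match the lecture-notes formula.

```r
# Hypothetical stand-in density (NOT the f(x) from this question):
#   f(x) = 1 + x            for -1 <= x < 0
#   f(x) = 1.5 * (1 - x)^2  for  0 <= x <= 1
#   f(x) = 0                otherwise
f <- function(x) {
  ifelse(x >= -1 & x < 0, 1 + x,
         ifelse(x >= 0 & x <= 1, 1.5 * (1 - x)^2, 0))
}

# Plot the density; the jump at x = 0 shows the different negative/positive parts
curve(f, from = -1.5, to = 1.5, n = 1001, xlab = "x", ylab = "f(x)")

# E[g(X)] = integral of g(x) f(x) dx, split at x = 0 where f is discontinuous
expect <- function(g) {
  integrand <- function(x) g(x) * f(x)
  integrate(integrand, -1, 0)$value + integrate(integrand, 0, 1)$value
}

total <- expect(function(x) rep(1, length(x)))  # should be 1 for a valid PDF
mu    <- expect(function(x) x)                  # mean, E(X)
v     <- expect(function(x) (x - mu)^2)         # variance, Var(X)
skew  <- expect(function(x) (x - mu)^3) / v^1.5 # third standardised moment
c(total = total, mean = mu, var = v, skew = skew)
```

For this stand-in density the exact values are a total of 1, a mean of -1/24 and a variance of 1137/8640. These are handy for checking the numerical integration before switching to the real f(x).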

4 Distributions (2 marks)

A study evaluated a number of leukaemia records in a rural area. The population of the area was 35,000. In one year, 16 leukaemia cases were identified, of which 4 were not local residents but tourists or new immigrants (of whom there are not many). In the general population, the annual rate of leukaemia is typically about one in 10,000.

4.1 Model (1 mark)

Describe the model you would recommend for the counts, and estimate its parameters using suitable point estimates.

4.2 Checking (1 mark)

Also, consider the hypothesis “the annual rate of leukaemia in the area is 1/10,000”. Assume this is the rate for the residents only. Plot the distribution over counts under this hypothesis. Where does your data lie, and do you think it is consistent with the hypothesis?
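Here is a minimal sketch for 4.2. It assumes a Poisson model in which the annual count among residents has mean population × rate, and it takes the 16 - 4 = 12 resident cases as the observed count. The plot shows the distribution over counts under the hypothesised rate, with the observed value highlighted.

```r
pop      <- 35000          # population of the area
rate     <- 1 / 10000      # hypothesised annual rate per person
lambda0  <- pop * rate     # expected annual count under the hypothesis
observed <- 16 - 4         # cases among local residents only

# Poisson probabilities over a range of counts, observed count in red
counts <- 0:20
probs  <- dpois(counts, lambda0)
barplot(probs, names.arg = counts,
        col = ifelse(counts == observed, "red", "grey"),
        xlab = "Annual leukaemia cases", ylab = "Probability")

# Upper-tail probability of seeing at least the observed count
p_tail <- ppois(observed - 1, lambda0, lower.tail = FALSE)
p_tail
```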

5 Entropy (3 marks)

In this question, we will use a modified version of the Titanic dataset from the Kaggle competition Titanic: Machine Learning from Disaster. The dataset includes information about passenger characteristics as well as whether they survived the disaster.

Import the Titanic data using the following R code:

df <- read.csv("Titanic.csv", header = TRUE, sep = ",")

Survived is a Boolean variable coded as 0/1, so convert it to a logical (TRUE/FALSE) value with:

df[['Survived']] <- df[['Survived']]==1


5.1 Conditional probabilities (1 mark)

Compute tables of the frequency estimates of P(Survived), P(Survived | Pclass = val) and P(Survived | Gender = val) for the different values val. Do the computation in R, but it is OK to present the final table as a separate Word table (since it might be hard to lay out in R). What does this tell you about survival?
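Here is a minimal sketch of the frequency-estimate tables using table() and prop.table(). So that it runs on its own, it uses a tiny made-up data frame named toy. The column names Survived, Pclass and Gender are assumptions about the Titanic.csv file; with the real data you would use df instead.

```r
# Toy data for illustration only; with the real data use df as read above
toy <- data.frame(
  Survived = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  Pclass   = c(1, 1, 2, 2, 3, 3, 3, 3),
  Gender   = c("female", "male", "female", "male", "male", "female", "male", "female")
)

# P(Survived): marginal relative frequencies
p_surv <- prop.table(table(toy$Survived))

# P(Survived | Pclass = val): normalise each row so it sums to 1 (margin = 1)
p_surv_class <- prop.table(table(toy$Pclass, toy$Survived), margin = 1)

# P(Survived | Gender = val)
p_surv_gender <- prop.table(table(toy$Gender, toy$Survived), margin = 1)

p_surv; p_surv_class; p_surv_gender
```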

5.2 Entropies (1 mark)

Calculate the entropy (using log2()) of Survived, H(Survived), and the conditional entropies of Survived given Pclass, H(Survived | Pclass), and of Survived given Gender, H(Survived | Gender). Do not use a built-in entropy function; write the code yourself. Use the R functions table() and prop.table() to gather statistics and form probabilities from the data frame. What do these three entropies tell you about Survived?
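As a sketch, the entropies can be written from scratch as H(Y) = -Σ p(y) log2 p(y) and the conditional entropy H(Y | X) = Σ_x P(X = x) H(Y | X = x). The helper names entropy(), H() and H_cond() are hypothetical, and the toy vectors only check the two edge cases: X fully determining Y, and X independent of Y.

```r
# Entropy in bits of a vector of probabilities (0 * log2(0) treated as 0)
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

# H(Y): entropy of the empirical distribution of y
H <- function(y) entropy(prop.table(table(y)))

# H(Y | X) = sum over x of P(X = x) * H(Y | X = x)
H_cond <- function(y, x) {
  px   <- prop.table(table(x))
  py_x <- prop.table(table(x, y), margin = 1)   # each row is P(Y | X = x)
  sum(px * apply(py_x, 1, entropy))
}

y  <- c(TRUE, TRUE, FALSE, FALSE)
x  <- c("a", "a", "b", "b")   # x determines y exactly
x2 <- c("a", "b", "a", "b")   # x2 carries no information about y
c(H_y = H(y), H_y_given_x = H_cond(y, x), H_y_given_x2 = H_cond(y, x2))
```

With the Titanic data loaded as above, H(df$Survived), H_cond(df$Survived, df$Pclass) and H_cond(df$Survived, df$Gender) would give the three quantities, assuming those column names.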

5.3 Coding (1 mark)

Consider the joint space (Survived, Pclass), which has six outcomes: (True, 1), (True, 2), (True, 3), (False, 1), (False, 2), (False, 3). Develop an efficient binary prefix code to transmit these outcomes. Would it be adequate to provide just the codelengths, or is a code needed too? Justify your answer.
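Some background that may help with the last part: any codelengths l_i satisfying the Kraft inequality Σ 2^(-l_i) ≤ 1 can be turned into a binary prefix code. One way to do this is the canonical construction sketched below. The lengths used here are hypothetical and are not derived from the Titanic data.

```r
# Build a canonical binary prefix code from codeword lengths
canonical_code <- function(len) {
  stopifnot(sum(2^-len) <= 1)            # Kraft inequality must hold
  ord   <- order(len)                    # assign shortest codewords first
  codes <- character(length(len))
  code  <- 0
  prev  <- len[ord[1]]
  for (k in seq_along(ord)) {
    i <- ord[k]
    # next codeword: add 1, then shift left if the codeword length grows
    if (k > 1) code <- (code + 1) * 2^(len[i] - prev)
    prev <- len[i]
    # write the integer as len[i] bits, most significant bit first
    bits <- rev(as.integer(intToBits(as.integer(code)))[seq_len(len[i])])
    codes[i] <- paste(bits, collapse = "")
  }
  codes
}

codes <- canonical_code(c(2, 2, 2, 3, 4, 4))   # hypothetical lengths
codes
```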

6 Maximum likelihood estimation of parameters (3 marks)

One of the central problems of sensory neuroscience is to separate recordings of irrelevant background physiological processes (noise) from neural responses that are of experimental interest (signal). This is by no means an easy task, as the signals that neurons produce when they fire are extremely weak and highly random. It is therefore of particular interest to examine the randomness of neural signals, as this allows researchers to study the brain at a cellular level.

Let’s assume that we have conducted one experiment and recorded the spike signals from one particular neuron for a duration of 15 seconds. After some data processing, we obtain spike signals, each given by a time in seconds and a spike size, similar to the following data and Figure 1.

n <- 30

times <- c(0.8670763, 1.2550631, 1.3463051, 2.6999393, 3.5238785, 4.8215638, 4.8502006,
           5.2372364, 5.3201143, 6.2835730, 7.6961491, 8.0164785, 8.6279902, 9.1390150,
           9.5136710, 9.9207854, 9.9795974, 10.0242579, 10.1622076, 10.5968354,
           11.6766725, 12.3441424, 12.7731282, 12.8911034, 13.0458095, 13.4280567,
           14.2443711, 14.4219672, 14.7461019, 14.7726211)

spikes <- c(0.220136914, 1.252061356, 0.943525370, 0.907732787, 1.157388806, 0.342485956,
            0.291760012, 0.556866189, 0.738992636, 0.690779640, 0.425849738, 0.876344116,
            1.248761245, 0.697514552, 0.174445203, 1.376500202, 0.731507303, 0.483036515,
            0.650835440, 1.106788259, 0.587840538, 0.978983532, 1.179754064, 0.941462421,
            0.749840071, 0.005994156, 0.664525928, 0.816033621, 0.483828371, 0.524253461)

6.1 Model (1 mark)

Let us assume that the rate of signals remains constant over time, and that the size of each signal is also independent of time. If the rate of the signals remains constant over time, which distribution would be most suitable for modelling the number of spike signals over 15 seconds? Why? Answer briefly, in a sentence or two. Also, while we don’t know enough to suggest a distribution for spike sizes, what properties should it have?


Figure 1: Spike data.

6.2 Maximum likelihood fitting (2 marks)

Using the model above, what is the log-likelihood function for the number of spike signals over the experiment period, and what is the maximum likelihood estimate of its parameters?
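Here is a minimal sketch, assuming the spike count over T seconds is Poisson with mean λT, where λ is the rate per second. For N observed spikes the log-likelihood is log L(λ) = N log(λT) - λT - log N!, which is maximised at λ = N/T. The code checks this numerically with optimize(). The duration variable is named T_ to avoid masking R's built-in T (TRUE).

```r
N  <- 30   # number of spikes recorded (n in the data above)
T_ <- 15   # recording duration in seconds

# Poisson log-likelihood of the rate lambda (spikes per second)
loglik <- function(lambda) dpois(N, lambda * T_, log = TRUE)

fit <- optimize(loglik, c(0.01, 10), maximum = TRUE)
c(numeric = fit$maximum, closed_form = N / T_)
```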

You are told that a candidate distribution for spike sizes is the Weibull with shape 0.7 and an unknown scale between 0.5 and 2. This is supported in R by the [dpqr]weibull() functions. You can fit the unknown scale by maximum likelihood using the Weibull density. Use the optimize() function for this, with something like:

optimize(fn, c(minvalue, maxvalue), maximum = TRUE, tol = .Machine$double.eps^0.25)
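Here is a minimal sketch of the optimize() approach. So that the result can be checked, it fits simulated Weibull data with shape 0.7 and a known scale of 1.2 (both chosen for illustration) over the interval [0.5, 2]. Replacing x with the spikes vector above would fit the recorded data instead. With the shape fixed at k, the scale MLE also has the closed form (mean(x^k))^(1/k), which the sketch uses as a cross-check.

```r
set.seed(42)
shape <- 0.7
x <- rweibull(2000, shape = shape, scale = 1.2)   # simulated sizes, true scale 1.2

# Log-likelihood of the unknown scale, with the shape fixed at 0.7
loglik_scale <- function(scale) {
  sum(dweibull(x, shape = shape, scale = scale, log = TRUE))
}

fit <- optimize(loglik_scale, c(0.5, 2), maximum = TRUE,
                tol = .Machine$double.eps^0.25)

# Cross-check against the closed-form MLE for a known shape
c(optimize = fit$maximum, closed_form = mean(x^shape)^(1 / shape))
```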

7 Central Limit Theorem (7 marks)

Assume that we draw random integers from a Poisson distribution with one of the rates λ1 = 1, λ2 = 5 or λ3 = 20.

7.1 Sampling distribution (2 marks)

According to the Central Limit Theorem, what is the limiting distribution of the sample mean for the three rates λ1, λ2, λ3 when the sample size is 10, 100, 1000 and 10000? Give the theory, then compute the parameter values in R.
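For i.i.d. Poisson(λ) draws the population mean and variance are both λ, so the CLT gives the approximation X̄ ≈ N(λ, λ/n) for large n. This minimal sketch tabulates these parameters for every rate and sample size:

```r
rates <- c(1, 5, 20)
sizes <- c(10, 100, 1000, 10000)

# One row per (lambda, n) combination
clt <- expand.grid(lambda = rates, n = sizes)
clt$mean <- clt$lambda              # E(Xbar) = lambda
clt$var  <- clt$lambda / clt$n      # Var(Xbar) = lambda / n
clt$sd   <- sqrt(clt$var)
clt
```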

Bonus question for HD students, worth 1 bonus mark (added to the final mark for Assignment 1 only if the final mark is 24 or less): what is the limiting distribution of the sample variance? This is not really a solvable problem, so approximate it for λ3 = 20 only.


7.2 Simulation (2 marks)

Experimentally justify the result of the CLT that the sample mean has a mean equal to the population mean and a variance equal to the population variance divided by the sample size. See the statement of the CLT in Lecture 4. Use simulation with sample sizes of 10, 100 and 1000. For each sample size, use 50000 simulations to generate samples and compute their means. From these means, compute the mean and variance of the sample means, and discuss how the results reflect the CLT. Plot the results (3 sample sizes and 3 rates, with mean and SD) to demonstrate any effects you want to discuss.
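Here is a minimal sketch of the simulation, assuming each simulated sample consists of n i.i.d. Poisson(λ) draws. It compares the mean and variance of the 50000 sample means with the CLT values λ and λ/n; plotting is left out for brevity. The helper name sim_means() is hypothetical, and the run takes a few seconds for n = 1000.

```r
set.seed(2019)
nsim <- 50000

# Means of nsim samples, each of size n, from Poisson(lambda)
sim_means <- function(lambda, n, nsim) {
  vapply(seq_len(nsim), function(i) mean(rpois(n, lambda)), numeric(1))
}

res <- expand.grid(lambda = c(1, 5, 20), n = c(10, 100, 1000))
stats <- t(mapply(function(lambda, n) {
  m <- sim_means(lambda, n, nsim)
  c(mean_of_means = mean(m), var_of_means = var(m))
}, res$lambda, res$n))

# Simulated values next to the CLT variance lambda / n
res <- cbind(res, stats, theory_var = res$lambda / res$n)
res
```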

7.3 Plotting normality (3 marks)

For rates λ1 = 1 and λ2 = 5 and sample sizes 10 and 100, obtain the z scores of the sample means (from 50000 simulations). Plot their distributions as histograms with the theoretical Gaussian curve overlaid. Note that for sample size 100 the plots overlay very nicely. But what happens with sample size 10? Explain the differences between the four plots.

For each simulation, the z score of the mean can be calculated as:

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

where X̄ is the mean of the sample, μ is the population mean, σ is the population SD and n is the sample size.
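Here is a minimal sketch of the four histograms, assuming μ = λ and σ = √λ for a Poisson(λ) population. Each panel shows the z scores of 50000 sample means with the standard normal density overlaid.

```r
set.seed(7)
nsim <- 50000

op <- par(mfrow = c(2, 2))   # 2 x 2 grid: rates by sample sizes
for (lambda in c(1, 5)) {
  for (n in c(10, 100)) {
    # nsim samples of size n, one sample per row, then the mean of each row
    xbar <- rowMeans(matrix(rpois(nsim * n, lambda), nrow = nsim))
    z <- (xbar - lambda) / (sqrt(lambda) / sqrt(n))   # mu = lambda, sigma = sqrt(lambda)
    hist(z, breaks = 50, freq = FALSE,
         main = paste0("lambda = ", lambda, ", n = ", n))
    curve(dnorm(x), add = TRUE, col = "red", lwd = 2)  # theoretical N(0, 1)
  }
}
par(op)
```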
