T2187 Business analytics, applied modelling and prediction
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2019–20. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE).
Information about the subject guide and the Essential reading
references
Unless otherwise stated, all cross-references will be to the latest version of the course (2019). You
should always attempt to use the most recent edition of any Essential reading textbook, even if the
commentary and/or online reading list and/or subject guide refer to an earlier edition. If different
editions of Essential reading are listed, please check the VLE for reading supplements – if none are
available, please use the contents list and index of the new edition to find the relevant section.
General remarks
Learning outcomes
At the end of the course and having completed the essential reading and activities you should be
able to:
apply modelling at varying levels to aid decision-making
understand basic principles of how to analyse complex multivariate datasets with the aim of
extracting the important message contained within the large amount of data which is often
available
demonstrate the wide applicability of mathematical models while, at the same time,
identifying their limitations and possible misuse.
Format of the examination
The examination is two hours long and you must answer four questions out of five. The examination
is worth 70% of the final grade. The other 30% is determined by the coursework component. (The
coursework comprised a freestyle data visualisation project using Tableau, requiring the design of a
five-dashboard story and accompanying report – see the ‘Assessment’ section in the VLE for details.)
Overall performance
The performance of candidates in the examination was very pleasing, with many excellent answers.
Use of Excel is not directly examined, rather some questions required the interpretation of Excel
1
ST2187 Business analytics, applied modelling and prediction
output and/or formulae. Some answers lacked sufficient depth of explanation – remember to
comment in detail on each argument of a function.
Although this is an applied statistics course, candidates are reminded that commercial insight is also
important. Always think about which business decisions could be taken as a consequence of the
analysis, justifying the decision(s) based on the results – the course is about business analytics after
all!
Examination revision strategy
Many candidates are disappointed to find that their examination performance is poorer than they
expected. This may be due to a number of reasons, but one particular failing is ‘question
spotting’, that is, confining your examination preparation to a few questions and/or topics which
have come up in past papers for the course. This can have serious consequences.
We recognise that candidates might not cover all topics in the syllabus in the same depth, but you
need to be aware that examiners are free to set questions on any aspect of the syllabus. This
means that you need to study enough of the syllabus to enable you to answer the required number of
examination questions.
The syllabus can be found in the Course information sheet available on the VLE. You should read
the syllabus carefully and ensure that you cover sufficient material in preparation for the
examination. Examiners will vary the topics and questions from year to year and may well set
questions that have not appeared in past papers. Examination papers may legitimately include
questions on any topic in the syllabus. So, although past papers can be helpful during your revision,
you cannot assume that topics or specific questions that have come up in past examinations will
occur again.
If you rely on a question-spotting strategy, it is likely you will find yourself in difficulties
when you sit the examination. We strongly advise you not to adopt this strategy.
2
Examiners’ commentaries 2020
Examiners’ commentaries 2020
ST2187 Business analytics, applied modelling and prediction
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2019–20. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE).
Information about the subject guide and the Essential reading
references
Unless otherwise stated, all cross-references will be to the latest version of the course (2019). You
should always attempt to use the most recent edition of any Essential reading textbook, even if the
commentary and/or online reading list and/or subject guide refer to an earlier edition. If different
editions of Essential reading are listed, please check the VLE for reading supplements – if none are
available, please use the contents list and index of the new edition to find the relevant section.
Comments on specific questions
Candidates should answer FOUR of the following FIVE questions. All questions carry equal marks.
Question 1
(a) Consider a modified version of the Monty Hall problem. In this version, there
are 8 boxes, of which 1 box contains the prize and the other 7 boxes are empty.
You select one box at first. Monty, who knows where the prize is, then opens 6
of the remaining 7 boxes, all of which are shown to be empty. If Monty has a
choice of which boxes to open (i.e. if the prize is in the box you chose at first),
he will choose at random which one of the boxes to leave unopened.
i. Suppose that you have chosen Box 1, and then Monty opens Boxes 3 to 8,
leaving Box 2 unopened. After we have observed this, what is the probability
that the prize is in Box 1, and what is the probability that it is in Box 2?
(10 marks)
ii. How should a risk-neutral decision-maker use the probabilities computed in i.
to inform their strategy?
(3 marks)
Reading for this question
Block 1 on the VLE covers the Monty Hall problem as part of decision-making under
uncertainty.
3
ST2187 Business analytics, applied modelling and prediction
Approaching the question
i. Let Bj denote the event that the prize is in Box j, for j = 1, 2, . . . , 8, and let N2 denote
the event that Monty does not open Box 2. Bayes’ theorem tells us that the conditional
probability that the prize is in Box 1 is:
P (B1 |N2) = P (N2 |B1)P (B1)8∑
j=1
P (N2 |Bj)P (Bj)
=
P (N2 |B1)P (B1)
P (N2 |B1)P (B1) + P (N2 |B2)P (B2) +
8∑
j=3
P (N2 |Bj)P (Bj)
.
According to the rules of the game, the probabilities to use here are:
? P (B1) = P (B2) = · · · = P (B8) = 1/8, since all the boxes are initially equally likely to
contain the prize.
? P (N2 |B1) = 1/7. If the prize is in Box 1, Monty has a choice, so he chooses at
random which of the remaining boxes to leave unopened.
? P (N2 |B2) = 1. If the prize is in Box 2, Monty must leave it unopened.
? P (N2 |Bj) = 0 for j = 3, 4, . . . , 8. If the prize is in either Box 3 to 8, Monty cannot
open that box. This means that he must open all the other boxes not chosen by you,
including Box 2.
Therefore, the conditional probabilities are:
P (B1 |N2) = 1/7× 1/8
1/7× 1/8 + 1× 1/8 + 6× (0× 1/8) =
1
8
and P (B2 |N2) = 7/8. Hence you have an 87.5% chance of winning the prize if you
switch to Box 2.
ii. While neither conditional probability computed in i. offers certainty, a risk-neutral
decision-maker would ‘play to the probabililties’ and choose the strategy to always switch
to the unopened box, since a long-run expected win rate of 87.5% by switching is far
better than a long-run expected win rate of 12.5% by not switching.
(b) ‘If there is uncertainty about some monetary outcome and you are concerned
about return and risk, then all you need to see are the mean and standard
deviation. The entire distribution provides no extra useful information.’ Do you
agree or disagree? Provide an example to back up your argument.
(5 marks)
Reading for this question
Block 5 on the VLE covers probability and probability distributions.
Approaching the question
The mean and standard deviation are very important measures of a probability distribution,
but the entire shape of the distribution, especially the tails, can be relevant in
decision-making. This is especially true when you are risk-averse and large losses (or gains)
are possible!
(c) It is assumed that inter-person arrival times at a bank during the peak period
covering lunchtime follow an exponential distribution with a mean of 20
seconds. An Excel analysis was conducted as follows:
4
Examiners’ commentaries 2020
Explain what the value of 0.0498 in cell B5 represents and why this function is
used.
(7 marks)
Reading for this question
Block 6 on the VLE covers common probability distributions in business applications. This
question combines the Poisson and exponential distributions.
Approaching the question
This returns the probability of having to wait at least 1 minute (equivalently 60 seconds) for
the next arrival. The Excel function returns P (X = 0), where X ~ Pois(3), which is:
P (X = 0) =
e?λλx
x!
=
e?3 × 30
0!
= e?3.
Since the inter-person arrival times, Y , follow an exponential distribution, then Y ~ Exp(λ)
with E(Y ) = 1/λ = 20, hence λ = 1/20, which represents the per-second rate parameter for
the Poisson distribution. Since this is multiplied by 60 in cell B5, we have the Poisson
parameter for 1 minute.
Question 2
A landlord is deciding whether to upgrade a building’s heating system and is willing
to do so if the cost of the upgraded system can be recovered over the next five
consecutive years of winter weather, which can be classified as one of ‘mild’,
‘normal’, ‘cold’ and ‘severe cold’, occurring with probabilities of 0.2(1 ? x),
0.5(1 ? x), 0.3(1 ? x) and x, respectively.
The predicted annual heating costs (assume no inflation), with and without the
upgrade, are:
Type of winter weather Mild Normal Cold Severe cold
Annual costs without upgrade £420 £590 £720 £900
Annual costs with upgrade £358 £503 £612 £765
Probability 0.2(1 ? x) 0.5(1 ? x) 0.3(1 ? x) x
(a) Given x = 0.2, what is the maximum amount a risk-neutral landlord would be
willing to pay for an upgraded heating system?
(6 marks)
(b) If the upgraded heating system costs £600, what is the minimum value of x
such that installing the upgraded heating system becomes economically viable?
(7 marks)
5
ST2187 Business analytics, applied modelling and prediction
(c) Given x = 0.2, calculate and interpret the expected value of perfect
information, EVPI.
Hint: You may find it helpful to draw the decision tree for this problem.
(12 marks)
Reading for this question
Block 7 on the VLE covers decision-making under uncertainty using decision trees.
Approaching the question
(a) Given x = 0.2, the expected monetary value, EMV, of annual costs without the upgrade is:∑
x p(x) = 420× 0.16 + 590× 0.40 + 720× 0.24 + 900× 0.2 = £656.00
and so for five consecutive winters is:
£656.00× 5 = £3,280.00.
Given x = 0.2, the expected monetary value, EMV, of annual costs with the upgrade is:∑
x p(x) = 358× 0.16 + 503× 0.40 + 612× 0.24 + 765× 0.2 = £558.36
and so for five consecutive winters is:
£558.36× 5 = £2,791.80.
Hence the maximum amount a risk-neutral landlord would be willing to pay for an upgraded
heating system is:
£3,280.00? £2,791.80 = £488.20.
(b) In terms of x, the expected monetary value, EMV, of annual costs without the upgrade is:∑
x p(x) = 5× (420× 0.2(1? x) + 590× 0.5(1? x) + 720× 0.3(1? x) + 900× x)
= 1,525x+ 2,975
while with the upgrade, the EMV is:∑
x p(x) = 5× (358× 0.2(1? x) + 503× 0.5(1? x) + 612× 0.3(1? x) + 765× x)
= 1,291.5x+ 2,533.50
So we require:
(1,525x+ 2,975)? (1,291.5x+ 2,533.50) > £600
which solves for:
x > 0.6788.
Hence the value of x must be at least 67.88% before the landlord decides it is economically
worthwhile to install upgraded heating system.
(c) From (a), the expected monetary value with no information is £3,280.00.
6
Examiners’ commentaries 2020
The decision tree is:
The expected monetary value with perfect information is £3,265.00.
Hence the expected value of perfect information is:
EVPI = £3,280.00? £3,265.00 = £15.00.
Question 3
A company is interested in explaining the variation in annual salaries of its
employees, in dollars, using the following variables:
Gender, coded as 1 = female, and 0 = male
Age, in years
Work experience with the company, in years
Amount of post-16 education, in years.
(a) The correlation matrix of the above variables is:
7
ST2187 Business analytics, applied modelling and prediction
What conclusions can be drawn from this correlation matrix?
(6 marks)
The results of a multiple regression, including an interaction between gender and
age are:
(b) State the estimated regression equation and comment on R2.
(5 marks)
(c) Assess the overall significance of the model, as well as the significance of the
individual regression coefficients.
(5 marks)
(d) According to this multiple regression model, how does age impact annual
salaries for males and females?
(4 marks)
(e) Calculate an approximate 99% prediction interval for the annual salary of a
26-year-old female, who has worked at the company for 8 years, with a total of 2
years of post-16 education. Use a t coefficient of 2.601.
(5 marks)
Reading for this question
Blocks 11 and 12 on the VLE cover regression analysis (estimating relationships and statistical
inference, respectively).
8
Examiners’ commentaries 2020
Approaching the question
(a) The dependent variable (annual salary) shows moderate correlation with each of the
candidate independent variables, albeit gender is categorical, suggesting performing a
multiple regression would be appropriate.
Among the independent variables the strongest correlation is only 0.269 and so we would
not expect multicollinearity to feature as a problem in a multiple regression.
(b) We have:
?Annual salary = ?6,583.78 + 9,465.53×Gender + 592.82×Age
+ 2,831.16× Experience + 8,182.46× Education
? 474.85×Gender×Age.
R2 = 0.7110 indicating that 71.10% of the variation in annual salary is explained by the
model, which is a fairly high percentage.
(c) Testing H0 : β1 = β2 = · · · = β5 = 0, the F -ratio is 97.42 with a p-value < 0.0001, hence the
test is highly significant and so we have very strong evidence that the model has significant
explanatory power. Testing H0 : βi = 0 for i = 1, 2, . . . , 5, all the variables are statistically
significant at the 5% significance level, except Gender with a p-value of 0.28 > 0.05. Note
that the interaction variable involving Gender (with Age) is significant.
(d) For females, the effect of an extra year on annual salary is:
592.82? 474.85 = 117.97
while for males the effect of an extra year on annual salary is 592.82.
Other things equal, females earn $474.85 less than males in a year.
(e) The prediction is:
?6,583.78 + 9,465.53× 1 + 592.82× 26 + 2,831.16× 8 + 8,182.46× 2? 474.85× 1× 26
which is $44,966.66. Hence an approximate 99% prediction interval is:
Y? ± t0.005, 198 × se ? 44,966.66± 2.601× 16,467.77
which gives ($2,133.99, $87,799.33).
Question 4
(a) Holt’s method, as an exponential smoothing model, assumes an additive trend.
Suppose instead that the model is changed to a multiplicative trend so that, for
example, if the current level estimate is 10, and the current trend estimate is
1.3 (i.e. the forecast increases by 30% per period), then the forecast next period
is 10 × 1.3, and for the following period is 10 × (1.3)2. Holt’s method equations
then become:
Lt = αYt + (1 ? α)(?) and Tt = β(#) + (1 ? β)Tt?1.
i. State the expressions for (?) and (#) in the above equations, briefly
explaining your answer.
(6 marks)
9
ST2187 Business analytics, applied modelling and prediction
ii. Assume you are working with monthly data, with December being month 12,
January being month 13 etc. Suppose L12 = 80 and T12 = 1.2 and you
observe Y13 = 90. Determine the forecast for Y15, assuming α = 0.3 and
β = 0.7 using a multiplicative trend.
(7 marks)
Reading for this question
Block 13 on the VLE covers time series analysis and forecasting.
Approaching the question
i. The first term is the product of the previous level and the previous trend:
Lt?1 × Tt?1.
The second term is the ratio of the current level to the previous level:
Lt
Lt?1
.
ii. We have:
L13 = α× Y13 + (1? α)× L12 × T12 = 0.3× 80 + 0.7× 90× 1.2 = 94
and:
T13 = β × L13
L12
+ (1? β)× T12 = 0.7× 94
80
+ 0.3× 1.2 = 1.18.
Therefore:
F15 = L13 × T 213 = 94× (1.18)2 = 132.11.
(b) Is it more appropriate to use an additive or a multiplicative model to forecast
seasonal data? Summarise the difference(s) between these two types of seasonal
models.
(5 marks)
Reading for this question
Block 13 on the VLE covers time series analysis and forecasting.
Approaching the question
Both can be useful, and it is hard to say that one is better than the other. The additive
model is relevant if you believe each season’s typical value is a certain amount above or
below the overall average. The multiplicative model is relevant if you believe each season’s
typical value is a certain percentage above or below the overall average.
(c) Let Yt be the sales in dollars during month t, and let Pt be the price in dollars
charged during month t. A regression of the form:
Yt = β0 + β1Yt?1 + β2Pt + εt
was run and was estimated to be:
Y?t = 972,545.98 + 0.6714 × Yt?1 ? 23,604.19 × Pt.
i. If the price during month 21 is $14, and sales in month 20 were $781,000,
what would you predict for sales in month 21?
(3 marks)
10
Examiners’ commentaries 2020
ii. The correlation coefficient between Yt and Yt?1, i.e. rYt,Yt?1 , is 0.864.
Comment on this value.
(2 marks)
iii. The correlation coefficient between Yt and Pt, i.e. rYt,Pt , is ?0.325. Comment
on this value.
(2 marks)
Reading for this question
Block 13 on the VLE covers time series analysis and forecasting.
Approaching the question
i. We have:
972,545.98 + 0.6714× 781,000? 23,604.19× 14 = $1,166,451
ii. There is fairly strong positive autocorrelation for sales – for example, high sales one
month tend to be followed by high sales the next month, which seems reasonable.
iii. There is weak-to-moderate negative correlation between sales and price, consistent with
the law of demand.
Question 5
(a) Suppose you are a financial analyst and your company runs many simulation
models to estimate the profitability of its projects. If you had to choose just two
measures of the distribution of any important output, such as net profit, to
report, which two would you choose? Why? What information would be missing
if you reported only these two measures? How could they be misleading?
(6 marks)
Reading for this question
Block 15 on the VLE covers Monte Carlo simulation models.
Approaching the question
The point here is that there is an entire distribution of any such output, and we are limited
to reporting only two of its measures. (Granted, this is unrealistic.) Anyway, you probably
want a measure of central tendency and a measure of variability. Two useful measures for a
financial analyst might be the median (half the time, net profit is above it; half the time, it
is below it) and the 5th percentile (this is nearly as bad as net profit will be).
(b) A country, with no free health service, offers a tax-free savings account for
medical expenses. At the start of the year you decide how much to pay into the
account. When medical expenses are incurred you may withdraw up to this
amount (with no tax paid) exclusively to pay for medical expenses (if the
amount invested exceeds your medical expenses, you lose this money). Any
medical expenses which exceed the amount you invested must be paid by you
using income taxed at 35%. Your annual salary is $85,000, and you assume
annual medical expenses are normally distributed with a mean of $1,700 and a
standard deviation of $350.
The following is the spreadsheet model which shows the results of one
simulation in the cell range A16:D16.
11
ST2187 Business analytics, applied modelling and prediction
i. For the simulation in the cell range A16:D16, explain how the value of
$53,861 in cell D16 would have been calculated (with reference to relevant
cells in the Excel spreadsheet model above).
(6 marks)
ii. Write down an appropriate Excel function for cell C16, i.e. to determine the
medical expenses above the amount invested. Explain your choice of function.
(5 marks)
iii. One thousand simulations were run and the distribution of outcomes
produced the following:
12
Examiners’ commentaries 2020
Explain why the maximum amount of money left to you is $54,340.
(4 marks)
iv. Based on the simulation results in iii., do you think investing $1,400 in the
tax-free savings account is appropriate? Or should you invest more? Or less?
Explain your answer.
(4 marks)
Reading for this question
Block 15 on the VLE covers Monte Carlo simulation models.
Approaching the question
i. Cell A16 returns a simulated value from N(1,700, (350)2), which is $1,879 denoting the
medical expenses of the individual.
The tax liability of 35% is levied on the difference between annual income and amount
invested:
($85,000? $1,400)× 0.35 = $29,260.
The medical expenses, $1,879, exceed the amount invested by:
$1,879? $1,400 = $479.
The amount of money left is:
($85,000? $1,400)× 0.65? $479 = $53,861.
ii. One possibility is =MAX(A16-B12,0), which returns the greater of ‘excess medical
expenses over amount invested’ and zero (when the amount invested is sufficient to cover
medical expenses).
iii. This represents the best case scenario when medical expenses do not exceed the amount
invested, i.e. do not exceed $1,400. Net income is then:
($85,000? $1,400)× 0.65? $0 = $54,340.
iv. Any reasonable comments accepted, taking into account the risk appetite of the
individual.
版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。