2018/2019
Quantitative Business Analysis and Forecasting (CB9108)
There are 7 questions with 100 marks in total.
Student name:
Login:
Student declaration:
I confirm that: This is an original assessment and is entirely my own
work. Where I have used ideas, tables, figures of other authors, I have
acknowledged the source in every case. This assignment was not
submitted previously as assessed work for any other academic
course.
Plagiarism: This is an individual assignment and must be your own work. Collusion,
copying or plagiarism may result in disciplinary action. This assignment
will be checked by the Turnitin system for plagiarism. Please refer to
the relevant university regulations and
http://www.kent.ac.uk/ai/students/whatisplagiarism.html for further
information. You are also referred to Section 2.4.2 on page 8 of the Module
Guide for information on the penalties for plagiarism.
Submission methods:
The electronic version of this assignment should be uploaded to
Moodle by 8:00 pm, 16th January 2019 (Hong Kong time).
Please upload your PDF-formatted file to Turnitin only.
Total marks: 100
Weighting: 100%
Note: (1).Any uninterpreted figures or tables will not be marked;
(2).No more than 2 figures or 2 tables are allowed in your solution to
each question.
Question 1.
Download the dataset Question1.sav from module CB9108 on moodle.kent.ac.uk. In the variable
income, “<=50k” means a salary less than or equal to $50k per annum and “>50K”
means a salary greater than $50k per annum.
(1).Provide your interpretation of the histogram of the variable Age.
[3 marks]
(2).Draw box plots of the variable hours_per_week for the male and female groups, respectively.
Interpret the box plots and compare the two groups.
[4 marks]
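As an illustration only, the sketch below computes the quantities a box plot displays: the quartiles, the whisker ends and the points flagged as outliers by the common 1.5 × IQR rule. It assumes Python 3.8+ (`statistics.quantiles`); the function name `box_plot_summary` and the sample values are hypothetical, not taken from Question1.sav. Packages use slightly different quartile definitions, so results can differ marginally from SPSS output.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, whisker ends and 1.5 * IQR outliers, as drawn in a box plot."""
    data = sorted(values)
    # 'inclusive' is one of several quartile conventions; packages differ slightly
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # smallest value within the fences
        "whisker_high": max(inside),  # largest value within the fences
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical hours_per_week values for one group (not from Question1.sav)
print(box_plot_summary([20, 35, 38, 40, 40, 40, 42, 45, 50, 99]))
```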
(3).Use one of the two methods, the Kolmogorov-Smirnov (K-S) test or the Shapiro-Wilk (S-W)
test, to test whether the variable age is normally distributed. What is your conclusion
and why?
[4 marks]
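A minimal sketch of the K-S statistic for normality, assuming the mean and standard deviation are estimated from the sample itself. Because the parameters are estimated, the classical K-S critical values make the test overly conservative, and a Lilliefors correction is normally applied (SPSS's Explore procedure, for example, reports a Lilliefors-corrected significance). The sketch computes only the statistic D, not the p-value; the function name is hypothetical.

```python
import statistics


def ks_statistic_normal(values):
    """Kolmogorov-Smirnov distance D between the empirical CDF of `values`
    and a normal CDF whose mean and SD are estimated from the same sample."""
    data = sorted(values)
    n = len(data)
    fitted = statistics.NormalDist(statistics.mean(data), statistics.stdev(data))
    d = 0.0
    for i, x in enumerate(data, start=1):
        f = fitted.cdf(x)
        # the empirical CDF jumps from (i - 1)/n to i/n at x; check both sides
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


print(ks_statistic_normal([-1, 0, 1]))  # tiny hypothetical sample
```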
(4).For each income group, test whether or not their mean ages are significantly different
between the two sub-groups. Carefully define your hypotheses, explaining whether you are
using a one-tailed or two-tailed test. Report your conclusions.
[6 marks]
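For reference, one common way to compare two group means without assuming equal variances is Welch's t statistic, sketched below. `welch_t` is a hypothetical helper name; SPSS's independent-samples t-test reports both the equal-variances and the unequal-variances (Welch) versions alongside Levene's test. The p-value, which needs the t distribution, is not computed here; for a two-tailed test it would be 2P(T > |t|) with the returned degrees of freedom.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for the
    difference in means of two independent samples (variances not assumed equal)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # squared standard errors of each sample mean
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical ages for two groups (not from Question1.sav)
print(welch_t([25, 30, 35, 40, 45], [30, 35, 40, 45, 50]))
```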
(5).Suppose that we want to use the data to decide whether or not gender is associated with
occupation. Answer the following two questions.
(5.1) Formulate the problem statistically by posing it as a hypothesis test.
[2 marks]
(5.2) Provide the result of the hypothesis test in part (5.1).
[2 marks]
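A sketch of the Pearson chi-square test of independence, one standard way to formulate (5.1): H0 states that the two categorical variables are independent. The function takes a contingency table of observed counts (the counts in the usage line are hypothetical); the statistic is compared with a chi-square distribution with (r - 1)(c - 1) degrees of freedom, and a common rule of thumb asks for expected counts of at least 5 in most cells.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table
    of observed counts (rows: categories of one variable, columns: the other)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected count under independence: row total * column total / n
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


print(chi_square_independence([[10, 20], [30, 40]]))  # hypothetical counts
```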
Question 2.
A dataset was split into two subsets: a training set and a test set. Two regression models with the same
number of independent variables were built on the training set. Their mean absolute
deviations (MADs) on the training set and the test set are given below.
Mean Absolute Deviation
               Model 1   Model 2
Training set   13.42     14.03
Test set       16.21     13.26
Which model will you select and why? Discuss.
[6 marks]
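The MAD is the average absolute error, (1/n) * sum of |y_i - yhat_i|. The sketch below defines it and computes, from the reported values only, how much each model's MAD changes from the training set to the test set, which is one indicator of how well a model generalises. The function and variable names are illustrative.

```python
def mean_absolute_deviation(actual, predicted):
    """MAD = (1/n) * sum of |actual_i - predicted_i|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Reported MADs from the table above: (training set, test set)
reported = {"Model 1": (13.42, 16.21), "Model 2": (14.03, 13.26)}
# Change in MAD from training to test (positive = worse on unseen data)
gaps = {name: round(test - train, 2) for name, (train, test) in reported.items()}
print(gaps)
```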
Question 3.
Download the dataset Question3.sav from module CB9108 on moodle.kent.ac.uk. The dataset contains
four variables: average working hours, a measure of average prices, a measure of average salaries,
and city. Based on the observations of the first three variables, answer the following questions.
(1).Use Ward’s method to determine the number of clusters and explain your reasoning.
[4 marks]
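As an illustration of the idea behind Ward's method, the deliberately naive sketch below starts with every observation in its own cluster and repeatedly merges the pair whose merger increases the total within-cluster sum of squares the least; that increase equals |A||B|/(|A|+|B|) times the squared distance between the two centroids. A large jump in the sequence of merge costs is one common signal for choosing the number of clusters. It is only suitable for small examples, and because the method is distance based, variables on very different scales are often standardised first (a modelling choice, not something the question prescribes). The function name is hypothetical.

```python
def ward_merge_costs(points):
    """Naive agglomerative clustering with Ward's criterion.

    Returns the increase in within-cluster sum of squares at each merge."""
    clusters = [[tuple(p)] for p in points]

    def centroid(cluster):
        return [sum(coord) / len(cluster) for coord in zip(*cluster)]

    def merge_cost(a, b):
        ca, cb = centroid(a), centroid(b)
        dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * dist2

    costs = []
    while len(clusters) > 1:
        # find the pair of clusters whose merger adds the least within-cluster SS
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: merge_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        costs.append(merge_cost(clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return costs


# Hypothetical 2-D points: the jump from 0.5 to 200.0 points to two clusters
print(ward_merge_costs([(0, 0), (0, 1), (10, 10), (10, 11)]))
```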
(2).Based on the number of clusters determined from the above step, use the k-means
clustering method to cluster the observations and interpret the outcome.
[4 marks]
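A minimal sketch of k-means (Lloyd's algorithm): assign each point to its nearest centroid, recompute each centroid as the mean of its points, and repeat until the assignments stop changing. The initial centroids are passed in explicitly here as an assumption; SPSS and other packages choose starting centres in their own ways, so cluster numbering, and occasionally the final solution, can differ. The function name and data are hypothetical.

```python
def k_means(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # assignment step: index of the nearest centroid (squared Euclidean distance)
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # update step: each centroid becomes the mean of its assigned points
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


print(k_means([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (0, 1)]))
```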
Question 4.
Download the dataset Question4.sav from module CB9108 on moodle.kent.ac.uk. This dataset contains
questionnaire responses on factors related to the quality of a public place. Each observation represents
the response of one user. Answer the following questions.
(1).How many factors do you select and how do you select them?
[2 marks]
(2).What is the cumulative percentage of variance accounted for by your selected factors?
Interpret it.
[3 marks]
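One common rule for selecting factors is the Kaiser criterion (retain components whose eigenvalue exceeds 1), often checked against a scree plot. With standardised items the total variance equals the number of items, so the cumulative percentage for the retained factors is their eigenvalue sum divided by the sum of all eigenvalues. The sketch assumes the eigenvalues have already been obtained (for example, from SPSS's Total Variance Explained table); the example eigenvalues are made up and the function name is hypothetical.

```python
def kaiser_selection(eigenvalues):
    """Number of components with eigenvalue > 1 and the cumulative % of total
    variance they explain (total variance = sum of all eigenvalues)."""
    total = sum(eigenvalues)
    selected = [ev for ev in sorted(eigenvalues, reverse=True) if ev > 1]
    cumulative, running = [], 0.0
    for ev in selected:
        running += ev
        cumulative.append(100 * running / total)
    return len(selected), cumulative


print(kaiser_selection([2.0, 1.2, 0.5, 0.3]))  # hypothetical 4-item example
```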
(3).If you use the extraction method “Principal component analysis” and the rotation method
“Varimax”, which variables is each of your selected factors associated with?
[3 marks]
(4).According to question (3), among the 19 items, which items have the most
significant influence on each of the factors you have selected?
[3 marks]
Question 5.
A construction equipment company, CX, started its business about 100 years ago. Recently, one
of its products, the compact excavator, has become CX’s leading product, but the sales team has
suffered from repeated overproduction and underproduction caused by inaccurate sales forecasts. The
production team needs to schedule, one month in advance, how many compact excavators to produce.
See dataset Quesion5.xlsx from module CB9108 on moodle.kent.ac.uk for the sales data.
Ms Chan, the senior manager of the sales team, is very concerned about the accuracy of the current
sales forecasting model, which uses the moving average of the last three months. The sales team
is puzzled by the large month-to-month variability in sales; the current forecasting model
seems too simplistic, and there is no justification for averaging the last three periods. Some of her
sales team have suggested that the historical sales data might contain seasonal
dependencies, but they do not know how to confirm that seasonal patterns exist or how
to model them. Others have suggested using a regression model, but they could not find
any relevant external factors to explain the behaviour of the sales time series, so at this stage
they have decided to improve the forecast using the historical time series data alone.
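The current model's forecast for next month is the average of the last three observed months, F(t+1) = (y(t) + y(t-1) + y(t-2)) / 3. The sketch below implements this and, as one hypothetical way to justify a window length rather than fixing it at three, picks the window with the lowest one-step-ahead MSE, evaluating every candidate on the same periods (it would be applied to training data only). Function names are illustrative.

```python
def sma_forecasts(series, window=3):
    """One-step-ahead SMA forecasts: the value at index i forecasts
    series[window + i]; the last value is the forecast for the next, unseen period."""
    return [sum(series[t - window:t]) / window for t in range(window, len(series) + 1)]


def best_window(series, windows=range(1, 13)):
    """Window length with the lowest one-step-ahead MSE, with every candidate
    evaluated on the same periods (those after the longest window)."""
    start = max(windows)

    def mse_for(w):
        forecasts = sma_forecasts(series, w)[start - w:-1]  # drop the unseen-period forecast
        actual = series[start:]
        return sum((a - f) ** 2 for a, f in zip(actual, forecasts)) / len(actual)

    return min(windows, key=mse_for)


print(sma_forecasts([10, 20, 30, 40]))  # hypothetical monthly sales
```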
(1).Before any modelling, visually analyse each of the systematic patterns (e.g. trend and
seasonality) in the time series and discuss whether each is present and, if so, what form it takes.
[4 marks]
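Besides a plot of the series itself, one visual check for monthly seasonality is a correlogram: a pronounced autocorrelation at lag 12 (and its multiples) suggests a yearly pattern. The sketch computes the sample autocorrelation at a given lag. The example series is synthetic and exactly periodic, so its lag-12 value is 2/3: only 24 of the 36 months have a partner 12 months later. The function name is hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation r_k = sum_t (y_t - m)(y_{t+k} - m) / sum_t (y_t - m)^2."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(d * d for d in dev)
    return sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom


# Synthetic three-year monthly series with an identical pattern each year
pattern = [5, 3, 4, 6, 8, 9, 12, 11, 7, 6, 4, 5]
print(autocorrelation(pattern * 3, 12))
```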
(2).Divide the data into a training dataset/period (up to the end of 2015) for estimating the
forecast models and a test (hold-out) dataset/period (Jan-2016 onward) for evaluating your
model forecasts. Develop a simple moving average (SMA) model, a simple exponential
smoothing (SES) model, a Holt’s exponential smoothing (HES) model, and a Holt-Winters
model. When estimating the model parameter(s), you are advised to use the MSE (mean
squared error) over the training period only.
a) Present the four models, respectively;
[6 marks]
b) Which model do you recommend finally and why?
[4 marks]
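A minimal sketch of the three exponential smoothing models and of choosing their parameters by minimising MSE on the training period. The initialisation rules (first observation for SES, first difference for the Holt trend, averages of the first two seasons for Holt-Winters), the additive form of Holt-Winters and the grid search are assumptions made for illustration; statistical packages use their own initialisation and optimisation, so fitted values will not match them exactly. Each model function returns its one-step-ahead forecasts together with the index of the first observation they forecast.

```python
from itertools import product


def mse(actual, forecasts):
    """Mean squared error of paired actual values and forecasts."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecasts)) / len(forecasts)


def ses(y, alpha):
    """Simple exponential smoothing; forecasts[i] is the forecast of y[1 + i]."""
    level = y[0]  # assumption: initial level = first observation
    forecasts = []
    for obs in y[1:]:
        forecasts.append(level)
        level = alpha * obs + (1 - alpha) * level
    return forecasts, 1


def holt(y, alpha, beta):
    """Holt's linear-trend smoothing; forecasts[i] is the forecast of y[2 + i]."""
    level, trend = y[1], y[1] - y[0]  # assumption: simple initialisation
    forecasts = []
    for obs in y[2:]:
        forecasts.append(level + trend)
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return forecasts, 2


def holt_winters(y, alpha, beta, gamma, m=12):
    """Additive Holt-Winters; forecasts[i] is the forecast of y[m + i].
    Needs at least two full seasons of data for the initialisation."""
    level = sum(y[:m]) / m  # assumption: textbook-style initial values
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / m ** 2
    season = [y[i] - level for i in range(m)]
    forecasts = []
    for t in range(m, len(y)):
        s = season[t % m]
        forecasts.append(level + trend + s)
        prev_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s
    return forecasts, m


def fit_by_mse(model, y_train, grids, **kwargs):
    """Grid-search the smoothing parameters that minimise training-period MSE."""
    best_params, best_score = None, float("inf")
    for params in product(*grids):
        forecasts, start = model(y_train, *params, **kwargs)
        score = mse(y_train[start:], forecasts)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In use, the parameters would be fitted on the data up to the end of 2015, e.g. `fit_by_mse(ses, y_train, [[i / 100 for i in range(1, 100)]])`, and the model then rerun over the whole series with those parameters so that the forecasts from Jan-2016 onward can be compared with the hold-out data. For Holt-Winters a coarser grid (e.g. steps of 0.1) keeps the three-parameter search small, and with m = 12 the sketch needs at least 24 training months.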
Question 6.
Download the dataset Question6.sav from module CB9108 on moodle.kent.ac.uk.
Bike-sharing systems are a new generation of traditional bike rentals in which the whole process,
from membership registration to rental and return, has become automatic. Through
these systems, a user can easily rent a bike at one location and return it at
another.
The dataset was sampled from a dataset in the UCI data bank, so it is a
subset of the original dataset. A description of the original dataset is available at
https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset.
Develop a linear regression model with the variable cnt as the dependent variable and the other
variables as independent variables (you may select a subset of the independent variables). Answer
the following questions.
(1).Provide the process of your model development in detail (for example, checking the
model assumptions, statistical inference, etc.).
[16 marks]
(2).Analyse the performance of your model.
[2 marks]
(3).Analyse how the dependent variable relates to the independent variables.
[2 marks]
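As a sketch of what a linear regression estimates, the code below fits ordinary least squares by solving the normal equations (X'X)b = X'y and reports R². It is illustrative only: in practice the model would be fitted in SPSS or similar software, which also provides the t-tests, residual diagnostics and collinearity statistics that (1) refers to, and categorical predictors would need dummy coding first. Perfectly collinear predictors make X'X singular, which this sketch does not handle. Function names and example data are hypothetical.

```python
def ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp,
    found by solving the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(x) for x in X]  # prepend a column of 1s for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def r_squared(X, y, coefs):
    """Share of the variance of y explained by the fitted values."""
    fitted = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (not from Question6.sav)
X = [(0, 1), (1, 0), (2, 3), (3, 1), (4, 5)]
y = [1 + 2 * a + 3 * b for a, b in X]
coefs = ols(X, y)
print(coefs, r_squared(X, y, coefs))
```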
Question 7.
Download the dataset Quesion7.sav from module CB9108 on moodle.kent.ac.uk. The dataset was sampled
from a dataset in the UCI data bank, so it is a subset of the original
dataset. A description of the original dataset is available at
https://archive.ics.uci.edu/ml/datasets/Polish+companies+bankruptcy+data.
Develop a classification model with the variable Y as the dependent variable and the other variables as
independent variables (you may select a subset of the independent variables). For this question,
you must use at least three modelling techniques (for example, the k-nearest neighbour algorithm,
decision trees, etc.) and then report the model with the best performance. Answer the following
questions.
(1).Provide the process of your model development in detail (for example, how you aimed to build
the model that generalises best).
[12 marks]
(2).Analyse the performance of your model.
[4 marks]
(3).Provide the model with the best performance.
[4 marks]
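A sketch of one of the suggested techniques, k-nearest neighbours, plus the confusion-matrix counts typically used to compare classifiers on a hold-out set. It assumes Y is a binary 0/1 label with 1 as the class of interest (an assumption; check the coding in the dataset), uses `math.dist` (Python 3.8+), and the example points are made up. Because k-NN is distance based, predictors are usually standardised first, and because bankruptcy data are often imbalanced, measures such as recall, tp / (tp + fn), are usually reported alongside accuracy.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points nearest to `query` (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def confusion_counts(actual, predicted, positive=1):
    """True/false positives and negatives for a binary target."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(a == positive and p == positive for a, p in pairs),
        "tn": sum(a != positive and p != positive for a, p in pairs),
        "fp": sum(a != positive and p == positive for a, p in pairs),
        "fn": sum(a == positive and p != positive for a, p in pairs),
    }


# Hypothetical 2-D training data with two well-separated classes
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (5.5, 5.5)))
```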