Date: 2019-01-06 10:09

2018/2019

Quantitative Business Analysis and Forecasting (CB9108)

There are 7 questions with 100 marks in total.

Student name:

Login:

Student declaration:

I confirm that: This is an original assessment and is entirely my own

work. Where I have used ideas, tables, figures of other authors, I have

acknowledged the source in every case. This assignment was not

submitted previously as assessed work for any other academic

course.

Plagiarism: This is an individual assignment and must be your own work. Collusion,

copying or plagiarism may result in disciplinary action. This assignment

will be checked by the Turnitin system for plagiarism. Please refer to

relevant university regulations and

http://www.kent.ac.uk/ai/students/whatisplagiarism.html for further

information. You are also referred to Section 2.4.2 on page 8 of the Module

Guide for information on the penalties for plagiarism.

Submission methods:

The electronic version of this assignment should be uploaded to

Moodle by 8:00 pm, 16th January 2019 (Hong Kong time).

Please upload your PDF-formatted file to Turnitin only.

Total marks: 100

Weighting: 100%

Note: (1).Any uninterpreted figures or tables will not be marked;

(2).No more than 2 figures or 2 tables are allowed in your solution to

each question.

Question 1.

Download dataset Question1.sav from module cb9108 on moodle.kent.ac.uk. In the variable

income, the value “<=50k” means a salary less than or equal to $50k per annum and “>50K”

means a salary greater than $50k per annum.

(1).Provide your interpretation of the histogram of variable Age.

[3 marks]

(2).Draw box plots for variable hours_per_week on the male and the female groups, respectively.

Provide your interpretation of the boxplots and their comparison.

[4 marks]

(3).Use one of the two methods: the Kolmogorov-Smirnov (K-S) test and the Shapiro-Wilk (SW)

test, to test whether the variable age is normally distributed. What is your conclusion

and why?

[4 marks]

(4).For each income group, test whether or not their mean ages are significantly different

between the two sub-groups. Carefully define your hypotheses, explaining whether you are

using a one-tailed or two-tailed test. Report your conclusions.

[6 marks]

(5).Suppose that we want to use the data to decide whether or not the distribution of gender

relates to the distribution of occupation. Answer the following two questions.

(5.1) Formulate the problem statistically by posing it as a hypothesis test.

[2 marks]

(5.2) Provide the result of the hypothesis test in part (5.1).

[2 marks]
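
As an illustration of the test of independence described in part (5), the following is a minimal Python sketch of Pearson's chi-square statistic for a contingency table. The counts are hypothetical placeholders, not values from Question1.sav, and the function name chi_square_independence is invented for this sketch. The null hypothesis is that gender and occupation are independent; the alternative is that they are associated.

```python
# Hypothetical counts only: rows are gender, columns are occupation groups.
# Replace with the cross-tabulation obtained from the actual dataset.
observed = [
    [30, 20, 50],  # e.g. female
    [45, 35, 20],  # e.g. male
]


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom, expected_counts) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, expected


stat, dof, expected = chi_square_independence(observed)
print(f"chi-square = {stat:.3f}, degrees of freedom = {dof}")
```

The statistic is then compared with the chi-square critical value for the given degrees of freedom (5.991 at the 5% level when there are 2 degrees of freedom), or converted to a p-value with a statistics package.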

Question 2.

A dataset was split into two subsets: training set and test set. Two regression models with the same

number of independent variables were built based on the training dataset. Their mean absolute

deviation on the training dataset and the test dataset are given below.

[6 Marks]

Mean Absolute Deviation

               Model 1    Model 2

Training set   13.42      14.03

Test set       16.21      13.26

Which model will you select and why? Discuss it.
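
To make the comparison concrete, here is a small Python sketch that defines the mean absolute deviation and compares the two models using the figures from the table in Question 2. Choosing by lowest test-set error is one common criterion and is an assumption of this sketch, not a model answer; the variable names are illustrative.

```python
def mean_absolute_deviation(actual, predicted):
    """Average of |actual - predicted| over all observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Figures taken from the table in Question 2.
mad = {
    "Model 1": {"train": 13.42, "test": 16.21},
    "Model 2": {"train": 14.03, "test": 13.26},
}

# Gap between test and training error: a large positive gap suggests overfitting.
gaps = {name: m["test"] - m["train"] for name, m in mad.items()}

# Assumed criterion: prefer the model with the lowest error on unseen data.
best = min(mad, key=lambda name: mad[name]["test"])

for name in mad:
    print(f"{name}: test MAD = {mad[name]['test']}, gap = {gaps[name]:+.2f}")
print("Lowest test MAD:", best)
```

A model that fits the training set slightly better but degrades on the test set generalises less well, which is the point the discussion is expected to address.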

Question 3:

Download dataset Question3.sav from module cb9108 on moodle.kent.ac.uk. The dataset contains

four variables: average working hours, measure of average prices, measure of average salaries, and

city. Based on the observations of the first three variables, answer the following questions.

(1).Use Ward’s method to determine the number of clusters and explain your reasoning.

[4 marks]

(2).Based on the number of clusters determined from the above step, use the k-means

clustering method to cluster the observations and interpret the outcome.

[4 marks]
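
The k-means step in part (2) can be sketched in plain Python as follows. This is a minimal illustration of Lloyd's algorithm, assuming the number of clusters k has already been chosen (for example from Ward's method). The four data rows are hypothetical (working hours, price index, salary index), not values from Question3.sav, and the sketch starts from the first k points for reproducibility, whereas statistical packages usually use other starting rules. In practice the variables would normally be standardised first so that no single variable dominates the distance.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: returns (labels, centroids)."""
    # Assumption of this sketch: the first k points are the initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical cities: (working hours, price index, salary index).
cities = [(1700, 60, 40), (1750, 65, 45), (2100, 40, 15), (2050, 45, 20)]
labels, centroids = kmeans(cities, k=2)
print(labels)
print(centroids)
```

Interpreting the outcome then amounts to describing each cluster by its centroid, for example a group with shorter hours and higher prices and salaries against a group with longer hours and lower prices and salaries.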

Question 4.

Download dataset Question4.sav from module cb9108 on moodle.kent.ac.uk. This dataset contains

questionnaire responses on factors related to the quality of a public place. Each observation represents a

response from a user. Answer the following questions.

(1).How many factors do you select and how do you select them?

[2 marks]

(2).What is the cumulative percentage of variance accounted for by your selected factors?

Interpret it.

[3 marks]

(3).If you use rotation method “Varimax” and use extraction method “Principal component

analysis”, which variables are your factor(s) associated with?

[3 marks]

(4).According to question (3), among the 19 items (numbered 1 to 19), which items have the most

significant influence on the factors you have selected, respectively?

[3 marks]

Question 5.

A construction equipment company, CX, started its business about 100 years ago. Recently, one

of its products, the compact excavator, has become the leading product of CX, but the sales team has

suffered from repeated overproduction and underproduction due to inaccurate sales forecasts. The

production team needs to schedule how many compact excavators need to be produced one month

in advance. See dataset Quesion5.xlsx from module CB9108 on moodle.kent.ac.uk for the sales data.

Ms Chan, the senior manager of the sales team, is very concerned about the accuracy of the current

sales forecasting model, which uses the moving average of the last three months. The sales team

was quite puzzled by the great variability in the sales every month, and the current forecast model

seems too simplistic, with no justification for averaging the last three periods. Some of her

sales team suggested that the historical data for the number of sales might contain seasonal

dependencies, but they have no idea how to confirm the existence of the seasonal patterns and how

to model this feature. Some others also suggested using a regression model, but they could not find

any relevant external factors that could explain the behaviour of the sales time series, so they

decided to enhance the forecast based on the historical time series data at this stage.

(1).Before any modelling, visually analyse each of the systematic patterns (e.g. trend and

seasonality) in the time series and discuss their existence or/and patterns.

[4 marks]

(2).Divide the data into the training dataset/period (up to the end of 2015) for estimating

forecast models, and the test (hold-out) dataset/period (Jan-2016 onward) to evaluate your

model forecasts. Develop a simple moving average (SMA) model, a simple exponential

smoothing (SES) model, a Holt’s exponential smoothing (HES) model, and a Holt-Winters

model. When estimating the model parameter(s), you are advised to use the MSE (mean

squared error) in the training period only.

a) Present the four models, respectively;

[6 marks]

b) Which model do you recommend finally and why?

[4 marks]
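
The two simplest models named in part (2) can be sketched in Python as below. This is an illustrative sketch, not a model answer: the monthly series is a hypothetical placeholder rather than the data in the sales file, the function names are invented for the sketch, SES is initialised with the first observation (one of several common conventions), and the smoothing constant is chosen by a coarse grid search on training-period MSE.

```python
def sma_forecasts(series, window):
    """One-step-ahead forecasts: the forecast for period t is the mean of the
    previous `window` observations. Covers periods window .. len(series)-1."""
    return [sum(series[t - window:t]) / window for t in range(window, len(series))]


def ses_forecasts(series, alpha):
    """One-step-ahead simple exponential smoothing forecasts for periods
    1 .. len(series)-1, with the level initialised to the first observation."""
    level = series[0]
    forecasts = []
    for y in series[1:]:
        forecasts.append(level)                  # forecast made before seeing y
        level = alpha * y + (1 - alpha) * level  # update the level with y
    return forecasts


def mse(actual, forecast):
    """Mean squared error between two equally long sequences."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical training-period sales (placeholder values only).
train = [120, 135, 150, 128, 140, 160, 155, 170, 149, 165, 180, 175]

# Current model: 3-month simple moving average.
sma_mse = mse(train[3:], sma_forecasts(train, 3))

# SES: pick alpha on a grid by minimising the training-period MSE.
alphas = [i / 20 for i in range(1, 20)]
best_alpha = min(alphas, key=lambda a: mse(train[1:], ses_forecasts(train, a)))
ses_mse = mse(train[1:], ses_forecasts(train, best_alpha))

print(f"SMA(3) training MSE: {sma_mse:.2f}")
print(f"SES training MSE: {ses_mse:.2f} at alpha = {best_alpha}")
```

Note that the two training MSE values here cover slightly different periods, because SMA needs three observations before its first forecast; a fair comparison uses a common evaluation window and, for the final recommendation, the hold-out period.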

Question 6.

Download dataset Question6.sav from module CB9108 on moodle.kent.ac.uk.

Bike sharing systems are a new generation of traditional bike rentals where the whole process,

from membership registration to rental and return, has become automatic. Through

these systems, a user is able to easily rent a bike from a particular place and return it at

another place.

The dataset is sampled from a dataset in the UCI data bank, which implies that the dataset is a

subset of the original dataset. You may find the description of the original dataset on

https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset.

Develop a linear regression model with variable cnt as the dependent variable and the other

variables as independent variables (you may select a subset of the independent variables). Answer

the following questions.

(1).Provide the process of your model development in detail (for example, you may check

model assumptions, statistical inference, etc).

[16 marks]

(2).Analyse the performance of your model, and

[2 marks]

(3).Analyse how the dependent variable relates to the independent variables.

[2 marks]

Question 7.

Download dataset Quesion7.sav from module CB9108 on moodle.kent.ac.uk. The dataset is sampled

from a dataset in the UCI data bank, which implies that the dataset is a subset of the original

dataset. You may find the description of the original dataset on

https://archive.ics.uci.edu/ml/datasets/Polish+companies+bankruptcy+data.

Answer the following questions.

Develop a classification model with variable Y as the dependent variable and the other variables as

independent variables (you may select a subset of the independent variables). For this question,

you must use at least three modelling techniques (for example, the k-nearest neighbour algorithm,

decision tree, etc) and then report the model with the best performance. Provide the process of

your model development in detail.

(1).Provide the process of your model development in detail (for example, you may build a

model with the best generalisation, etc).

[12 marks]

(2).Analyse the performance of your model, and

[4 marks]

(3).Provide the model with the best performance.

[4 marks]

