Date: 2023-03-02 11:23

STAT3010/6075 Statistical Methods in Insurance

Assignment 1

This assignment consists of two questions and is worth 10% of the overall mark for STAT3010/6075.

The deadline for submission is 16.00 on Thursday 2 March 2023.

Standard University policies and procedures will be followed for late submission, extensions and academic integrity (see the Module Outline for details).

Submission is via Blackboard. You must submit a report of at most six pages (in pdf format), containing your answers, and a separate R script, containing the code that you used to obtain your results.

– You should submit your report via TurnitinUK on Blackboard (see Module Outline for details) in a file called report-ID.pdf, where ID is your student ID number, for example report-12345678.pdf. In the Assignments folder, click on Assignment 1 report submission to submit your report. Please enter this file name as the Submission Title.

– You should not include R code used in your analysis in your report, but you must submit a separate R script containing your code via Blackboard. The script should be called code-ID.R, for example code-12345678.R. Please rename and use the R template code-xxx.R provided. In the Assignments folder, click on Assignment 1 code submission to submit your code.

– Please start your R script with the command set.seed(ID), for example set.seed(12345678).

– Whenever you are asked to fit a model you should present in your report the estimate, standard error and p-value corresponding to each parameter in the model.

– Whenever you are asked to perform a formal test, you should present in your report the test statistic, the p-value or critical value, and what you conclude from the test.

The page limit is strict and is easily sufficient to receive full credit. If your report is more than six pages of A4, only the first six pages will be marked.


Question 1

A health insurance company is developing a model to assess the risk of its policy holders having diabetes based on the following data from the file diabetes.csv:

Diabetes    Binary variable indicating diabetes diagnosis, either positive (pos) or negative (neg)
Age         Age of individual, recorded in years
BMI         Body mass index (weight in kg / (height in m)²)
Glucose     Plasma glucose concentration
Pressure    Diastolic blood pressure (mm Hg)
Pregnant    Number of times pregnant

1. Produce and briefly discuss appropriate tables or plots to assess the relationship between Diabetes and Age, BMI, Glucose, Pressure and Pregnant.

[7 marks]

2. Fit an appropriate generalised linear model to estimate the probability of having diabetes using Age, BMI, Glucose, Pressure and Pregnant. Explain your choice of distribution and link function.

[4 marks]

3. Formally test if the square of Age improves the fit of the model from Part 2.

[2 marks]

4. Based on the test in Part 3 and any further tests you think appropriate, select a model for the probability of having diabetes.

[8 marks]

5. Present and interpret the estimated coefficients for the model you selected in Part 4.

[4 marks]


Question 2

You have been asked by a general insurance company to create a model to predict the number of third party liability (TPL) claims for an active policy based on the available information about the policy holder. You have the following data from the file motor.csv at your disposal:

Policy        Number of the TPL policy
Claims        Number of claims made during the year
DrivAge       Age of the driver/policy holder, recorded in years
DrivGender    Gender of the driver/policy holder, either male (M) or female (F)
VehAge        Age of the vehicle, recorded in years
Region        Region of residence of the policy holder, categorical variable with levels Capital, City, Rural, Town

1. Produce and briefly discuss appropriate tables or plots to assess the relationship between Claims and DrivAge, DrivGender, VehAge and Region.

[6 marks]

2. Fit an appropriate generalised linear model to predict the expected number of claims per year using DrivAge and DrivGender. Explain your choice of distribution and link function.

[4 marks]

3. Formally test if either or both of VehAge and Region should be added to the model in Part 2. State and explain which one of the four models considered you prefer.

[8 marks]

4. Formally test if the interaction between DrivAge and DrivGender significantly improves the fit of your preferred model from Part 3.

[2 marks]

5. Fit and use your preferred model from Part 3 or 4 to predict the expected number of claims for:

(a) a female driver from the capital, aged 47 with a 4-year-old car;

(b) a male driver from the town, aged 31 with a 10-year-old car.

(Ignore the values for the variables not included in your model.)

[5 marks]

