Assignment 1

Student name and ID

Due: 23 August 2020

The goal of the assignment is to model the sales price of a bulldozer at auction based on its usage, equipment type, configuration and other details. The data is sourced from auction result postings and includes information on usage and equipment configurations.

This assignment is based on a prediction competition that took place in 2013 at kaggle.com/c/bluebook-for-bulldozers.

We will only use a subset of the variables used in the competition.

Data for 401,125 machines sold at auction are available, including the following variables:

• ID (unique identifier of a particular sale of a machine at auction);

• Sale price: cost of sale in $US;

• Year made: year of manufacture of the machine;

• Machine hours: current usage of the machine in hours at time of sale; missing or 0 means no hours have been reported for that sale;

• Sale date: year of sale;

• Product group: type of earth moving equipment;

• Enclosure: does machine have a roll-over protective structure (ROPS) and air conditioning.

Your task is to build a regression model for the sale price using the other information available. Only use variables where at least 90% of observations have non-missing values.
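
The 90% rule can be checked by computing the proportion of non-missing values in each column. The following is a minimal sketch on a small made-up data frame, not the real data: the column names (SalePrice, YearMade, MachineHours) are assumptions about how bds.csv is laid out, and recoding 0 hours as missing is one possible reading of the variable description above.

```r
# Toy data standing in for bds.csv; names and values are hypothetical.
toy <- data.frame(
  SalePrice    = c(10000, 25000, 18000, 32000, 9500,
                   41000, 27000, 15000, 22000, 30000),
  YearMade     = c(1998, 2004, 2001, 2006, 1995,
                   2008, 2005, 1999, 2003, 2007),
  MachineHours = c(NA, 3200, NA, 0, NA, 1500, NA, 0, NA, 2100)
)

# Assumption: a recorded 0 means "hours not reported", so treat it as missing.
toy$MachineHours[which(toy$MachineHours == 0)] <- NA

# Proportion of non-missing values in each column.
prop_present <- colMeans(!is.na(toy))
print(prop_present)

# Keep only the variables that are at least 90% complete.
keep <- names(prop_present)[prop_present >= 0.9]
print(keep)
```

With the real data, the same two lines applied to bds would show which predictors pass the threshold.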

Feel free to work in groups if you prefer, but each of you must submit an individual report.

To read in the data set (3 marks):

library(tidyverse)

bds <- readr::read_csv("bds.csv")

1. Use ggplot() to produce appropriate plots of Sale Price against each of the potential predictors. What do you learn? (3 marks)

2. Use lm() to fit a regression model for SalePrice with the predictors as main effects. (1 mark)

3. Find the best model you can (with smallest AIC) by adding interactions, and discuss what it tells you about bulldozer sales prices. (4 marks)

4. Use visreg to visualize the terms in the model, and describe what you learn from these. (4 marks)

5. Produce suitable diagnostic plots to check the model fit, and identify unusual or influential observations. Comment on the results. (3 marks)
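
As an illustration of the workflow in questions 2, 3 and 5, the sketch below fits a main-effects model and an interaction model with lm(), compares them by AIC, and ranks observations by Cook's distance. It uses simulated data rather than bds.csv; the predictor names YearMade and ProductGroup, the group labels and all numeric values are made up for the example, and it is not a model answer.

```r
set.seed(1)
n <- 200

# Simulated stand-in for the auction data (hypothetical names and values).
sim <- data.frame(
  YearMade     = sample(1990:2010, n, replace = TRUE),
  ProductGroup = factor(sample(c("A", "B"), n, replace = TRUE))
)

# Simulate prices whose dependence on YearMade differs between groups,
# so that an interaction term is genuinely needed.
slope <- ifelse(sim$ProductGroup == "A", 500, 1500)
sim$SalePrice <- 20000 + slope * (sim$YearMade - 2000) + rnorm(n, sd = 2000)

# Main effects only.
fit_main <- lm(SalePrice ~ YearMade + ProductGroup, data = sim)

# Main effects plus the YearMade-by-ProductGroup interaction.
fit_int <- lm(SalePrice ~ YearMade * ProductGroup, data = sim)

# Smaller AIC indicates the preferred model.
print(AIC(fit_main, fit_int))

# Cook's distance flags observations with large influence on the fit.
cd <- cooks.distance(fit_int)
print(head(sort(cd, decreasing = TRUE), 3))
```

Calling plot(fit_int) would give the standard base R diagnostic plots, and, if the visreg package is installed, visreg::visreg(fit_int, "YearMade", by = "ProductGroup") would show the fitted YearMade effect separately for each group.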

You should submit a single Rmd file that is self-contained and compiles without error. You can assume that the bds.csv file is in the same folder as the Rmd file. (2 marks)

The assignment has 20 marks in total.
