ACF5320
Assignment 2
ASSESSMENT TASK: Assignment 2
WEIGHTING: 30%
COMPLETION: Individual
GENERATIVE AI: Generative AI tools can be used in this assessment task
In this assessment, you can use generative artificial intelligence (AI) to generate the
specified content in relation to the assessment task. This material must be
acknowledged and recorded in your declaration of AI use.
DUE DATE: 11:55pm, Monday 8 April, 2024
OVERVIEW
In this assignment, you are tasked with conducting regression analysis on multiple datasets provided in Excel format. The assignment is structured around four key cases, each requiring you to apply regression techniques to predict outcomes based on various independent variables. This exercise aims to assess your proficiency in predictive modelling, data analysis, and the interpretation of results within a business analytics context.
• In the Decision case, using the "Decision.xlsx" dataset, you will analyse the impact of experience on decision-making quality among auditors, examining how it correlates with intelligence, thinking styles, and personality traits.
• The Haircut case requires you to explore the "Haircut.xlsx" database to determine the factors that significantly influence a company's revenue, employing regression analysis to identify these key predictors.
• For the Audit scenario, with the "Audit.xlsx" dataset, you are to investigate the relationship between audit delay and various descriptive variables, focusing on developing a regression model that can accurately predict delay durations.
• The Prescription Cost Analysis involves the "Prescription.xlsx" dataset, where you will model and predict drug costs based on a set of independent variables, enhancing your model's accuracy through iterative refinement.
Your submission should demonstrate a thorough understanding of regression analysis as applied to predictive analytics. This includes not only the technical execution of statistical tests but also the ability to interpret and communicate the significance of your findings in a clear, concise manner.
Through this assignment, you will showcase your capability to leverage Excel for predictive modelling and to derive actionable insights from complex datasets.
OBJECTIVES
• Understand and apply regression analysis techniques.
• Analyse relationships between dependent and independent variables.
• Interpret and evaluate regression model outputs.
• Develop predictive models based on the analysis.
• Communicate analytical findings effectively.
SUBMISSION REQUIREMENTS
Type your responses in an MS Word document and submit the Word document to Moodle. Cut and paste any relevant output from Excel into your Word document.
You do not need to clean the data. Do not delete any data.
Case 1: Audit (5 marks)
A study investigated the relationship between audit delay (the length of time from a company’s fiscal year-end to the date of the auditor’s report) and variables that describe the client and the auditor. Some of the independent variables that were included in this study are presented below.
Variable | Definition
Industry | A binary (also known as ‘dummy’) variable coded 1 if the firm was an industrial company, or 0 if the firm was a bank, savings and loan, or insurance company.
Public | A dummy variable coded 1 if the company was traded on an organized exchange or over the counter; otherwise coded 0.
Quality | A measure of overall quality of internal controls, as judged by the auditor, on a five-point scale ranging from “virtually none” (1) to “excellent” (5).
Finished | A measure ranging from 1 to 4, as judged by the auditor, where 1 indicates “all work performed subsequent to year-end” and 4 indicates “most work performed prior to year-end”.
Using data in “Audit.xlsx”, answer the following questions:
(1.1) Develop scatter charts of the data using each of the independent variables included in the data. (0.5 mark)
(1.2) Develop the estimated regression equation using all of the independent variables included in the data. (0.5 mark)
(1.3) Test for an overall regression relationship at the 0.05 level of significance. Is there a significant regression relationship? (0.5 mark)
(1.4) How much of the variation in the sample values of delay does this estimated regression equation explain? What other independent variables could you include in this regression model to improve the fit? (1 mark)
(1.5) On the basis of your observations about the relationships between the dependent variable Delay and the independent variables Quality and Finished, suggest an alternative regression equation to the model developed in your answer to question (1.2) to explain as much of the variability in Delay as possible. (2.5 marks)
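As background for question (1.3), the overall F statistic can be recovered from R-squared, the number of observations n, and the number of independent variables k. The sketch below is illustrative only: the function name `overall_f_statistic` and the input values are assumptions, not figures from Audit.xlsx. Excel's regression output reports the matching p-value as "Significance F", which is compared against the 0.05 level.

```python
def overall_f_statistic(r_squared: float, n: int, k: int) -> float:
    """Overall F statistic for a regression with k predictors and n observations."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    if k < 1 or n - k - 1 <= 0:
        raise ValueError("need k >= 1 and n > k + 1")
    explained_per_df = r_squared / k                      # regression degrees of freedom: k
    unexplained_per_df = (1.0 - r_squared) / (n - k - 1)  # residual degrees of freedom: n - k - 1
    return explained_per_df / unexplained_per_df


# Hypothetical inputs for illustration; not taken from Audit.xlsx.
f_stat = overall_f_statistic(r_squared=0.40, n=40, k=4)
print(round(f_stat, 3))
```

A larger F statistic, for the same degrees of freedom, corresponds to a smaller p-value and therefore stronger evidence of an overall regression relationship.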
Case 2: Decision (5 marks)
Using the “Decision.xlsx” dataset, analyse differences between experienced and inexperienced participants.
(2.1) Do the experienced and the inexperienced participants differ in the quality of their decisions (i.e., the Decision variable)? Cut and paste relevant statistics from Excel and explain the statistics. (2 marks)
(2.2) Do the experienced and the inexperienced participants differ in terms of any intelligence, thinking style, or personality trait variables? Identify the ones that are different and provide the relevant statistics. Cut and paste relevant statistics from Excel and explain the statistics (only for those that are different). (2 marks)
(2.3) Without using the language of statistics, what do you conclude about experienced versus inexperienced auditors? (1 mark)
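One way to frame a two-group comparison as a regression is to regress the outcome on a 0/1 dummy: the intercept then equals the mean of the group coded 0, and the slope equals the difference between the two group means. The sketch below demonstrates this with made-up scores; the helper `simple_ols` and all data values are assumptions for illustration and are not taken from Decision.xlsx.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up scores for illustration only (0 = inexperienced, 1 = experienced).
exp_dummy = [0, 0, 0, 0, 1, 1, 1, 1]
decision = [4.0, 5.0, 3.0, 4.0, 6.0, 7.0, 5.0, 6.0]

intercept, slope = simple_ols(exp_dummy, decision)

mean_inexperienced = mean([d for e, d in zip(exp_dummy, decision) if e == 0])
mean_experienced = mean([d for e, d in zip(exp_dummy, decision) if e == 1])

# The slope on the dummy reproduces the difference in group means.
print(intercept, slope, mean_experienced - mean_inexperienced)
```

The p-value on the dummy's coefficient in the regression output then indicates whether that difference in means is statistically significant.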
Decision data description
Participants consist of auditors and students. Auditors are considered experienced and students are inexperienced.
Variable | Definition
ID | Participant identification number.
Decision | Higher values indicate better performance on a task requiring professional judgment.
WPT | Number of questions correctly answered on the Wonderlic Personnel Test, an IQ test. Higher scores indicate higher IQs.
FFM_agree | Response to the measures of the agreeableness factor in the Five Factor Model.
FFM_cons | Response to the measures of the conscientiousness factor in the Five Factor Model.
FFM_ES | Response to the measures of the emotional stability factor in the Five Factor Model.
FFM_extra | Response to the measures of the extraversion factor in the Five Factor Model.
FFM_open | Response to the measures of the openness factor in the Five Factor Model.
Exp dummy | 0 = inexperienced, 1 = experienced
Case 3: Haircut (5 marks)
Use the “Haircut.xlsx” database to run regression models that explain the factors that significantly influence revenue at this company.
(3.1) Report and interpret your best model’s technical details. Cut and paste the relevant statistics from Excel and explain the statistics. (2 marks)
(3.2) Do you believe that your model is effective for explaining changes in revenue? Explain and justify your response. (2 marks)
(3.3) Explain in plain language the meaning of your findings. (1 mark)
Haircut data description
You have been provided an Excel file that contains 4 data items. Each row represents the data for one haircut at a business that operates in two countries. The business does not take appointments. Customers walk in and wait for a haircut.
Variable | Definition
Wait_time | The number of minutes the customer waited for the haircut
Chair_time | The number of minutes needed to complete the haircut
Revenue | Revenue generated from the haircut
Labour_cost | Cost of labour for the haircut
Country | Dummy variable for country 1 and country 2
Case 4: Prescription Cost Analysis (15 marks)
Assume that you are working for a government agency that is trying to determine the main causes of different drug costs for different patients. You have data (“Prescription.xlsx”) from six months of drug prescriptions. You need to model and predict drug costs. The appendix shows descriptions of the data.
(4.1) Assume that we are using this model: (3 marks)
GrossDrugCost = B0 + B1 * RiskScore + ε
i. Interpret the coefficient and the p-value for the RiskScore variable. Provide a practical explanation of the RiskScore variable for senior management. (1 mark)
ii. Explain what R-squared means in a statistical way and provide a practical explanation of the information to senior management. (1 mark)
iii. A coworker wants to know what the predicted gross drug costs would be for a new member. The new member is a 73-year-old man whom the government classifies as frail and who has a risk score of 510. Using the model above, what would you predict the gross drug costs will be? (1 mark)
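The mechanics behind the single-predictor model in (4.1) can be sketched as follows: estimate B0 and B1 by least squares, compute R-squared as the share of variation in the dependent variable explained by the fitted line, and substitute a risk score into the fitted equation to obtain a prediction. The function `fit_simple_regression` and the data values are hypothetical and chosen only for illustration; they do not come from Prescription.xlsx.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Fit y = b0 + b1 * x by least squares; return (b0, b1, r_squared)."""
    x_bar, y_bar = mean(x), mean(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    return b0, b1, 1.0 - ss_res / ss_tot


# Made-up data for illustration only; not taken from Prescription.xlsx.
risk_score = [100, 200, 300, 400, 500]
gross_drug_cost = [120, 210, 330, 390, 520]

b0, b1, r_squared = fit_simple_regression(risk_score, gross_drug_cost)

# Prediction for a risk score of 510 under the fitted single-predictor model.
predicted_cost = b0 + b1 * 510
print(b0, b1, r_squared, predicted_cost)
```

In this sketch b1 is the estimated change in gross drug cost for a one-unit increase in risk score, and only the risk score enters the prediction because it is the only independent variable in the model.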
(4.2) Assume we are using this model: (8 marks)
GrossDrugCost = B0 + B1 * RiskScore + B2 * Age + B3 * Gender + ε
iv. Provide a statistical interpretation of the coefficient and p-value for the gender variable. Provide a practical explanation of the information to senior management. (1 mark)
v. Provide a statistical interpretation of the coefficient and p-value for the age variable. Provide a practical explanation of the information for senior management. (1 mark)
vi. Provide a statistical interpretation of this model’s intercept. Provide a practical explanation of the information to senior management. (1 mark)
vii. Compare the adjusted R-squared values between Models 1 and 2. Are they the same or different? Why? What could you conclude about the differences (if any) in the adjusted R-squared values? (2 marks)
viii. Senior management wants to know the expected gross drug costs of the average customer. That is, for the median value of the RiskScore, age and gender, what would you expect the average gross drug costs to be? (2 marks)
ix. A coworker wants to know what the predicted gross drug costs would be for a new member. The new member is a 73-year-old whom the government classifies as frail and who has a risk score of 510. Using the model above, what would you predict the gross drug costs will be if the member were a man, and if the member were a woman? (1 mark)
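Two ideas from (4.2) can be sketched in code. First, adjusted R-squared penalises R-squared for the number of independent variables k, so it rises only when an added variable explains enough extra variation. Second, because Gender is a 0/1 dummy (1 = female, 0 = male), the predictions for two otherwise identical members differ by exactly the Gender coefficient. All coefficients and inputs below are hypothetical placeholders, not estimates from Prescription.xlsx, and the function names are assumptions.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k independent variables."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


def predict_cost(coefs: dict, risk_score: float, age: float, gender: int) -> float:
    """GrossDrugCost = B0 + B1 * RiskScore + B2 * Age + B3 * Gender."""
    return (coefs["intercept"]
            + coefs["risk_score"] * risk_score
            + coefs["age"] * age
            + coefs["gender"] * gender)


# Hypothetical coefficients for illustration only.
coefs = {"intercept": 50.0, "risk_score": 1.0, "age": -0.5, "gender": 12.0}

cost_man = predict_cost(coefs, risk_score=510, age=73, gender=0)
cost_woman = predict_cost(coefs, risk_score=510, age=73, gender=1)

# With the same R-squared, more predictors give a lower adjusted value.
adj_one_predictor = adjusted_r_squared(0.30, n=1000, k=1)
adj_three_predictors = adjusted_r_squared(0.30, n=1000, k=3)
print(cost_man, cost_woman, adj_one_predictor, adj_three_predictors)
```

The same `predict_cost` pattern applies to any set of inputs, such as median values of the independent variables, once the coefficients are replaced with those from the actual Excel output.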
(4.3) Create a better model (4 marks)
x. Develop a better regression model to predict gross drug costs. (2 marks)
xi. What did you learn from this model that previous models did not tell you? (2 marks)
Variables | Definition
RecordID | Primary key from the database that is a unique number for each row of MemberID; A unique ID for each different member
Month | The month to which the data pertains, listed in numeric format as 1 for January, 2 for February, etc.
GrossDrugCost | The total amount of drug costs incurred by a member during the corresponding month
NLISDummy | A dummy variable that takes the value of 1 if the member is listed as non-low income by the government and 0 otherwise
LISCHOSERDummy | A dummy variable that takes the value of 1 if the member chose a specific plan and 0 if the member automatically was assigned a plan, i.e., members automatically are assigned (thus, LISCHOSERDummy
RiskScore | A score assigned by the government based on previous government data indicating how sick someone is, higher scores indicate members are sicker
SpecialtyDummy | A dummy variable that takes the value of 1 if the member utilizes specialty drugs and 0 otherwise
AdjudicationDays | The number of non-holiday workdays in a month
Age |
Gender | A dummy variable that takes the value of 1 if the member is female and 0 if the member is male
FrailtyDummy | A dummy variable that takes the value of 1 if the government indicates the member is frail and 0 if the government indicates the member is not frail
HospiceDummy | A dummy variable that takes the value of 1 if the member is receiving hospice care and 0 if they are not
InstitutionDummy | A dummy variable that takes the value of 1 if the member is receiving institutionalized long-term care (e.g., hospital, nursing facility) and 0 if they are not
ESRDDummy | A dummy variable that takes the value of 1 if the member is receiving care for end-stage renal disease (i.e., end-stage kidney disease) and 0 if they are not
SUBMISSION DOCUMENT
Submit an MS Word file with the answers to all assignment questions, supported by screenshots of Excel output (where relevant). The submitted file should contain the student’s name, surname, and student ID.