ETF1100 Business Statistics
Semester 2, 2024: Group Assignment
Problem background:
A home loan is a type of loan provided by financial institutions, such as banks or housing finance companies, to help individuals or families purchase a house.
The dataset used for this group project contains a random sample of home loan data from Kaggle:
https://www.kaggle.com/datasets/rishikeshkonapure/home-loan-approval
It is used to analyse the home loan application trends in the different property areas based on various demographic or financial factors such as the borrower's gender, education, income, employment history, credit history, etc.
Assignment submission guidelines
• For all questions, submit one PDF answer and your Excel file with graphs and tables. Copy and paste all graphs and tables into your PDF.
• ONLY 1 attempt is allowed for the Assignment.
• If the question has sub-parts, for example, (a), (b) …, please clearly label each part.
• All graphs and tables must be appropriately labelled; otherwise, marks will be deducted. Marks will also be deducted for poor presentation and spelling errors.
Assignment marks
The maximum total mark for the assignment is 100. Your total score will be composed of two parts:
• Questions 1-6: maximum marks of 85.
• Peer evaluation via Feedback Fruits: maximum marks of 15. You will be required to fill in the peer evaluation on Feedback Fruits to be eligible for this component. The Chief Examiner reserves the right to adjust individual report marks based on the peer evaluation. Should the feedback indicate that an individual did not contribute to the group assignment, the reporting mark will be adjusted to zero, implying that the individual’s group assignment contribution to their final grade will be 0%.
Presentation requirements:
• All answers should be in font size 12pt and 1.5 spacing.
• Plots and tables must be legible, with appropriate labels to aid readers.
• Statistical results need to be summarised in succinct table formats.
• You will lose marks for poor presentation.
Data and variables:
• Open Module 14: Group Project on Moodle and look for the workbook labelled ETF1100 Group Assignment Data.xlsx.
• The data file contains both numeric and categorical data. It has already been cleaned for you, and any missing records have been removed.
• Hover your cursor over the variable names in the dataset. You will see a description of each variable.
Purpose: Your task is to analyse and report how the loan amount is associated with various factors. This assignment explores the relationship between the loan amount and the independent variables using the following statistical tools:
• Pivot Tables and Charts
• Summary Statistics
• Confidence Intervals
• Hypothesis Testing
• Regression Analysis
Assignment questions:
1. Construct two pivot tables as described below:
(i) the distribution of average loan amounts based on Loan Status and Property Area
(ii) the proportion of loans approved and rejected based on each Property Area.
Using the two tables above, discuss and compare the information for rural and urban areas.
Your answer to this question should not be longer than 1 page. (5 marks)
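For readers who want to cross-check the Excel pivot tables, the two tables can be sketched in Python. This is only a minimal sketch on made-up records: the tuple layout (area, status, amount), the status codes "Y" and "N", and every value are hypothetical and are not taken from the assignment workbook.

```python
from collections import defaultdict

# Hypothetical records: (property area, loan status, loan amount).
# All values are invented for illustration only.
records = [
    ("Rural", "Y", 120.0),
    ("Rural", "N", 150.0),
    ("Rural", "Y", 100.0),
    ("Urban", "Y", 130.0),
    ("Urban", "Y", 110.0),
    ("Urban", "N", 90.0),
    ("Urban", "N", 70.0),
]


def average_amount_table(rows):
    """Table (i): mean loan amount for each (area, status) cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for area, status, amount in rows:
        sums[(area, status)] += amount
        counts[(area, status)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def status_proportion_table(rows):
    """Table (ii): share of each status within each area (sums to 1 per area)."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for area, status, _ in rows:
        counts[(area, status)] += 1
        totals[area] += 1
    return {key: counts[key] / totals[key[0]] for key in counts}


avg_table = average_amount_table(records)
prop_table = status_proportion_table(records)
for key in sorted(avg_table):
    print(key, round(avg_table[key], 1), round(prop_table[key], 2))
```

Computing the proportions within each area, rather than over the whole sample, is what makes the approval rates of rural and urban areas directly comparable.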
2. Generate summary statistics and histograms to compare the distributions of loan amounts in rural and urban areas. In your discussion, include measures of central tendency, variability and shape. When discussing, include contextual interpretations of the measures used.
Your answer to this question should not be longer than 2 pages. (15 marks)
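The measures named in Question 2 can be illustrated with a short Python sketch. It assumes the sample standard deviation (divisor n - 1) and the adjusted sample skewness formula; the loan amounts, bin start and bin width below are hypothetical choices, not values from the assignment data.

```python
import statistics


def summary(values):
    """Central tendency, variability and shape for one sample."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divisor n - 1)
    # Adjusted sample skewness; requires n >= 3 and sd > 0.
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "min": min(values),
        "max": max(values),
        "skew": skew,
    }


def histogram_counts(values, start, width, bins):
    """Count values in each bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * bins
    for x in values:
        i = int((x - start) // width)
        if 0 <= i < bins:  # values outside the bin range are ignored
            counts[i] += 1
    return counts


# Hypothetical loan amounts, invented for illustration only.
rural = [100.0, 120.0, 150.0, 90.0, 300.0]
urban = [110.0, 130.0, 125.0, 95.0, 140.0]

for name, sample in (("Rural", rural), ("Urban", urban)):
    print(name, summary(sample), histogram_counts(sample, 50, 100, 3))
```

A mean well above the median together with a positive skewness value, as in the hypothetical rural sample, points to a right-skewed distribution driven by a few large loans.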
3. Explore the relationship between loan amounts in rural and urban areas by calculating confidence interval estimates as shown below:
a. Calculate the 95% confidence interval estimate of the true average loan amount for rural and urban areas. Use the format shown below.
Confidence Interval Estimate of Average Loan Amount for Property Areas
Property areas | Lower Boundary / Limit | Upper Boundary / Limit
Rural | |
Urban | |
b. Calculate the 95% confidence interval estimate of the true average loan amount for rural and urban areas, broken down by each of the following factors:
• Education
• Loan status
Use the format shown below for education. Please use a similar format for Loan status.
Confidence Interval Estimate of Average Loan Amount of Rural Area
Education | Lower Boundary / Limit | Upper Boundary / Limit
Graduate | |
Not Graduate | |

Confidence Interval Estimate of Average Loan Amount of Urban Area
Education | Lower Boundary / Limit | Upper Boundary / Limit
Graduate | |
Not Graduate | |
c. Discuss your results obtained in (a) and (b), based on all tables produced.
For part (c) only, the expected length of the answer should be less than a page. (20 marks)
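The lower and upper limits in the tables for Question 3 follow the pattern mean ± critical value × s / √n. The sketch below is one possible way to compute them in Python and rests on stated assumptions: it uses the normal critical value (about 1.96) as a large-sample approximation, and the sample values are hypothetical. For small groups, a t critical value with n - 1 degrees of freedom (for example from Excel's T.INV.2T) can be passed in instead and gives a wider interval.

```python
import math
import statistics


def mean_confidence_interval(values, critical_value):
    """Return (lower, upper) for mean ± critical_value * s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    # s is the sample standard deviation (divisor n - 1).
    margin = critical_value * statistics.stdev(values) / math.sqrt(n)
    return mean - margin, mean + margin


# Two-sided 95% normal critical value (large-sample assumption).
z_95 = statistics.NormalDist().inv_cdf(0.975)

# Hypothetical loan amounts per property area, invented for illustration.
samples = {
    "Rural": [100.0, 120.0, 150.0, 90.0, 300.0, 140.0],
    "Urban": [110.0, 130.0, 125.0, 95.0, 140.0, 115.0],
}

for area, values in samples.items():
    lower, upper = mean_confidence_interval(values, z_95)
    print(f"{area}: lower = {lower:.2f}, upper = {upper:.2f}")
```

The same function can be applied to each subgroup in part (b), for example to rural graduates only, by filtering the sample before calling it.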
4. We wish to disentangle the relationship between Loan Status and Education in rural and urban areas. Use your knowledge of hypothesis testing to test the following claims. State all five steps.
a. More than 70% of approved home loans in rural areas were applied for by graduate applicants.
b. More than 70% of approved home loans in urban areas were applied for by graduate applicants. (10 marks)
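Each claim in Question 4 is a statement about a single proportion, so one common approach is a one-sided z-test of H0: p ≤ 0.70 against H1: p > 0.70. The sketch below computes only the numerical parts of such a test. The 5% significance level is an assumption (the brief does not state one), the counts are hypothetical, and the five-step layout expected by the unit is not reproduced here.

```python
import math
from statistics import NormalDist


def one_sided_proportion_test(successes, n, p0=0.70, alpha=0.05):
    """Upper-tail z-test for a proportion: H0 p <= p0 versus H1 p > p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / se
    z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value
    p_value = 1 - NormalDist().cdf(z)
    return {
        "p_hat": p_hat,
        "z": z,
        "z_crit": z_crit,
        "p_value": p_value,
        "reject_h0": z > z_crit,
    }


# Hypothetical counts: 80 graduate applicants among 100 approved loans.
result = one_sided_proportion_test(successes=80, n=100)
print(result)
```

The normal approximation behind this test is usually considered reasonable when both n × p0 and n × (1 - p0) are at least 5.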
5. Estimate a multiple regression model to analyse the relationship between the loan amount and all of the variables below:
• Gender
• Applicant’s income
• Self-employed
• Marital Status
• Co-applicant’s income
• Loan Term
• Property areas
• Dependents
• Education
• Credit History
You are required to produce one multiple regression output. Based on your output, discuss the findings and include an analysis of the statistical significance of various factors in the model. Highlight the key factors that the multiple regression reveals as the driver of the loan amount.
Your answer to this question should be approximately 1 to 2 pages. (15 marks)
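To show what the coefficients of a multiple regression represent, the sketch below fits ordinary least squares through the normal equations in plain Python. It is an illustration under stated assumptions, not a substitute for Excel's regression output: the two predictors (an income figure and a 0/1 graduate dummy) and all values are hypothetical, and the sketch does not compute standard errors, t-statistics or p-values. Categorical variables need to be dummy coded with one category left out as the baseline, as done here for education.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols_coefficients(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical rows: (income, graduate dummy: 1 = Graduate, 0 = Not Graduate).
rows = [(2.0, 1), (3.0, 0), (5.0, 1), (4.0, 0), (6.0, 1), (1.0, 0)]
# Toy response built exactly as 10 + 2*income + 5*graduate, so the fit is exact.
loan = [10 + 2 * inc + 5 * grad for inc, grad in rows]

coefs = ols_coefficients(rows, loan)
print([round(c, 6) for c in coefs])  # intercept, income slope, graduate effect
```

In this toy setup the dummy coefficient is the difference in expected loan amount between graduates and non-graduates at the same income, which is the kind of interpretation the discussion of each factor calls for.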
6. Refer to your statistical analysis and results in Questions 1 to 5. Write a short summary of your findings on:
• factors associated with loan amount
• the importance of demographic or financial status
• recommendations for home loan applicants to improve their ability to obtain higher loan amounts
Your answer to this question should be approximately 1 to 1.5 pages. (20 marks)