Categorical Data Analysis
Take-home Assignment №2
Deadline: March 12, 2020, at 23:59 MT
Please submit your assignment via LINK. Make sure that you name your assignment files
clearly. You are expected to submit two files: a PDF or Word document and an R script.
Instead of a generic file name such as “Assignment 2”, use something like
“Your surname-A2-CDA-2020”.
Course Policy Reminder
∙ Working Together. Unless instructed otherwise (e.g. for the Replication
Project), students may work together on assignments. However, students have
to write up their own solutions in their own words. If a student turns in material
that is in the same words as a fellow student, the work will be considered to be
plagiarized. Plagiarism will be dealt with according to the policies of the HSE.
∙ Late Assignments. Unless otherwise instructed, assignments have to be
submitted before the beginning of the corresponding lecture. Late assignments
have a half-life of 24 hours: a student gets 50% credit if the work is handed in
late but within 24 hours of the due time, 25% credit for the next 24 hours, and
so on.
∙ Academic Fraud. Plagiarism and any other activities in which students present
work that is not their own are academic fraud. Academic fraud is a serious matter
and is reported to the Academic Supervisor and the Manager. All cases of
academic fraud will be individually discussed and resolved according to the
policies of the HSE.
Part 1: Prove Yourself!
1. Calculate $P(y_i = 0 \mid x_i)$, $P(y_i = 1 \mid x_i)$, $P(y_i = 2 \mid x_i)$ and
$P(y_i = 3 \mid x_i)$ for $y_i$, which is measured on an ordered 4-point scale. Assume that
the observed $y_i = 0$ if the latent $y_i^* \le \tau_1$; $y_i = 1$ if
$\tau_1 < y_i^* \le \tau_2$; $y_i = 2$ if $\tau_2 < y_i^* \le \tau_3$; and $y_i = 3$ if
$y_i^* > \tau_3$.
2. Assume that we have an ordered probit model. The dependent variable $y_i$ is
measured on an ordered 4-point scale. The independent variable $x_i$ is continuous and
takes values in $(-\infty, +\infty)$. If $\hat{\beta}_0 = -0.50$, $\hat{\beta}_1 = 0.052$,
$\hat{\tau}_1 = 0.75$, $\hat{\tau}_2 = 3.5$ and $\hat{\tau}_3 = 5.0$,
then calculate the predicted probabilities for the following cases:
               x = 15    x = 40    x = 80
P(y = 1 | x)   ???       ???       ???
P(y = 2 | x)   ???       ???       ???
P(y = 3 | x)   ???       ???       ???
P(y = 4 | x)   ???       ???       ???
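The calculation in task 2 can be sketched as follows. This is a minimal illustration, not a solution: it assumes the latent-variable form y* = beta0 + beta1 * x + e with a standard normal error, so that each category probability is a difference of adjacent normal CDF values. The function names and the parameter values in the usage lines are hypothetical, and Python is used only to make the arithmetic explicit (the assignment itself is to be done in R).

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def ordered_probit_probs(x, beta0, beta1, taus):
    """Category probabilities of an ordered probit for a single value of x.

    Assumption: y* = beta0 + beta1 * x + e, e ~ N(0, 1), and `taus` holds
    strictly increasing cutpoints, so there are len(taus) + 1 categories.
    """
    xb = beta0 + beta1 * x
    # Cumulative probabilities P(y* <= tau_m), padded with 0 and 1 at the ends.
    cum = [0.0] + [normal_cdf(t - xb) for t in taus] + [1.0]
    # Each category probability is the difference of adjacent cumulative values.
    return [cum[m + 1] - cum[m] for m in range(len(cum) - 1)]


# Hypothetical parameter values, chosen only for illustration.
probs = ordered_probit_probs(x=1.0, beta0=0.0, beta1=0.5, taus=[-1.0, 0.0, 1.0])
for category, p in enumerate(probs, start=1):
    print(f"P(y = {category} | x) = {p:.4f}")
```

A quick sanity check on any such table is that the probabilities in each column are non-negative and sum to one.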
Part 2: Ordered Logit and Gologit
To complete these tasks, use the European Quality of Government Index (EQI) data (France,
2017). I have already prepared a dataset with all the variables recoded. See the description
of the variables and their scales in the table below. Feel free to practice your data
management skills and use the files from the official website here; otherwise, use this file.
Code | Question Wording | Scale
Q10 | All citizens are treated equally in the public education system in my area | Ordered: 4 - Agree; 3 - Rather agree; 2 - Rather disagree; 1 - Disagree
Q1 | Have you or any of your immediate family been enrolled or employed in the public school system in your area in the past 12 months? | Binary: Yes/No
Q4 | How would you rate the quality of public education in your area? | Continuous: 1 - Very poor; 10 - Excellent quality
Q13 | Corruption is prevalent in my area’s local public school system | Continuous: 1 - Strongly disagree; 10 - Strongly agree
Q17_1 | In the past 12 months have you or anyone living in your household paid a bribe in any form to Education services? | Binary: Yes/No
D1 | Gender of respondent | Binary: M/F
D3 | Age of respondent | Continuous: 18-99
D2 | Education level of respondent | Factor: 1 - Elementary school or less; 2 - High school (but did not graduate); 3 - Graduation from high school; 4 - Graduation from college, university; 5 - Post-graduate degree (Masters, PhD)
RECODED4 | Please tell me your average total household net income per month | Factor: 1 - low; 2 - medium; 3 - high
1. Show descriptive statistics for all the variables in the dataset. Use tables, simple tests
(correlations, t-tests, chi-square tests, etc.) and visualization to describe your data. If
there are missing values, analyze whether they are missing completely at random (MCAR),
missing at random (MAR) or missing not at random (MNAR). Compare the descriptive
statistics before and after you omit the missing values.
2. Rearrange the levels of Q10 into the order ‘Agree → Rather agree → Rather disagree →
Disagree’. Build an ordered logistic regression model.
3. Use Q10 as the dependent variable and the rest of the variables in your dataset as
independent variables. Build an ordered probit model and a linear regression model. Then
create a table with three columns, one per model (ordered logit, ordered probit and linear
regression); you might use the stargazer package to create a well-formatted table. Compare
the results.
4. Test two hypotheses about linear restrictions using the Wald test. Use any variables
for which you find it relevant. Interpret the results, i.e. include in your answer $H_0$ and
$H_1$, the test statistic, the p-value and a substantive interpretation.
5. Drop the variables whose coefficients are not statistically significant. Compare this
model with the full one using the Likelihood Ratio Test. Interpret the results, i.e. include
in your answer $H_0$ and $H_1$, the test statistic, the p-value and a substantive interpretation.
6. Calculate odds ratios for one continuous and two binary variables and interpret them.
Use both statistical and substantive interpretation of your results.
7. Create one graph with predicted probabilities and one graph with cumulative probabilities
of the different categories depending on a continuous variable. Interpret the resulting
graphs. Use both statistical and substantive interpretation of your results.
8. Test your model for the parallel regression assumption. Interpret the results, i.e. include
in your answer $H_0$ and $H_1$, the test statistic, the p-value and a substantive interpretation.
9. Build (if necessary) the least constrained model (Gologit), a Partial Proportional Odds
Model and an Ordered Logit. Compare these models using the Likelihood Ratio Test. Show
the estimates of the best model in a table. Interpret the results.
Part 3: Multinomial Logit
To complete these tasks, use the data from the European Election Study on elections in the
Netherlands. I have already prepared a dataset with all the variables recoded. See the
description of the variables and their scales in the table below. Use this file.
1. Use party as the dependent variable and income, age, educ, union as independent
ones. Let’s assume that the reference party is PvdA - Social Democrats (0). Build a
multinomial logistic regression.
2. Build another model with the same set of predictors, but add relig as well. Test the
significance of its coefficients using the Wald test. Compare the two models using the
Likelihood Ratio Test. Interpret the results, i.e. include in your answer $H_0$ and $H_1$,
the test statistic, the p-value and a substantive interpretation.
3. Test whether categories of the dependent variable can be combined. Interpret the results,
i.e. include in your answer $H_0$ and $H_1$, the test statistic, the p-value and a substantive
interpretation. Combine categories if necessary.
4. Calculate odds ratios for one continuous and one binary variable. Interpret them. Use
both statistical and substantive interpretation of your results.
Code | Question Wording | Scale
party | Which party did you vote for in the last elections? | Factor: 0 - PvdA (Social Democrats - Left); 1 - CDA (Christian Democrats - Right, Religious); 2 - VVD (Liberals - Right, Secular); 3 - D66 (Democrats 66 - Social-liberals, Democrats)
income | Income of respondent’s household | Continuous: 1 - less than 21,000; 2 - 21,000-23,999; 3 - 24,000-29,999; 4 - 30,000-35,999; 5 - 36,000-43,999; 6 - 44,000-54,000; 7 - more than 54,000
age | Age of respondent | Continuous: 1 - 17-20 years; 2 - 21-25 years; 3 - 26-30 years; 4 - 31-35 years; 5 - 36-40 years; 6 - 41-45 years; 7 - 46-50 years; 8 - 51-55 years; 9 - 56-60 years; 10 - 61-65 years; 11 - 66-70 years; 12 - 71-75 years; 13 - 76 years and older
educ | Education of respondent | Continuous: 1 - low; 5 - high
relig | Is respondent religious? | Binary: 1 - yes; 2 - no
union | Is respondent a member of any labor union? | Binary: 1 - yes; 2 - no
5. Interpret the predicted probabilities for the parties depending on the values of one
binary and one continuous variable using graphs.
6. Describe individuals who are most and least likely to support Social Democrats.
7. Test the IIA assumption and interpret the results.
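The predicted probabilities in tasks 5 and 6 follow from the multinomial logit formula, which can be sketched as follows. This is a minimal illustration assuming PvdA is the reference category with all coefficients fixed at zero, as in task 1. The function name, the predictor order and every coefficient value are hypothetical, and Python is used only to make the arithmetic explicit (the assignment itself is to be done in R).

```python
from math import exp


def multinomial_probs(x, coefs, reference="PvdA"):
    """Multinomial logit probabilities for one respondent.

    x      -- list of predictor values (without the intercept)
    coefs  -- dict mapping each non-reference category to its coefficient
              list, intercept first, then one coefficient per entry of x
    The reference category has a linear predictor of zero by construction.
    """
    eta = {
        party: b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
        for party, b in coefs.items()
    }
    eta[reference] = 0.0
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(eta.values())
    weights = {party: exp(value - top) for party, value in eta.items()}
    total = sum(weights.values())
    return {party: w / total for party, w in weights.items()}


# Hypothetical coefficients (intercept, income, union), for illustration only.
example_coefs = {
    "CDA": [-0.4, 0.10, 0.3],
    "VVD": [-1.2, 0.25, 0.6],
    "D66": [-0.8, 0.15, 0.2],
}
probs = multinomial_probs(x=[4, 2], coefs=example_coefs)
for party, p in sorted(probs.items()):
    print(f"P(party = {party}) = {p:.3f}")
```

Evaluating such a function over a grid of values of one predictor, with the others held fixed, gives the curves to plot in task 5.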