Date: 2019-08-17 10:43

Final Coursework

Introduction to Quantitative Research Methods (PUBL0055)

Instructions

The coursework will be posted on Moodle on 14 December 2018 at 6pm, and is due on 7 January 2019 at 2pm. Please follow all designated SPP submission guidelines for online submission as detailed on the PUBL0055 Moodle page. Standard late submission penalties apply.

This is an assessed piece of coursework (worth 75% of your final module mark) for the PUBL0055 module; collaboration and/or discussion of the coursework with anyone is strictly prohibited. The rules for plagiarism apply and any cases of suspected plagiarism of published work or the work of classmates will be taken seriously.

As this is an assessed piece of work, you may not email/ask the course tutors or teaching fellows questions about the coursework.

Along with the coursework itself, the datasets for the coursework can be found in the PUBL0055 page on Moodle.

Coursework should be submitted via the appropriate link on the course Moodle page. You will need to click the ‘Submit Paper’ link at the bottom of the page. When presented with the ‘Submit Paper’ box, the ‘Submission Title’ should be your candidate number, and you should upload your document into the box provided.

– Please remember to state ONLY your candidate number on your coursework (your candidate number is made up of four letters and one number e.g. ABCD5). Your name and/or student number must not appear on your coursework.

Answers should be written in complete sentences. Be sure to answer all parts of the questions posed and interpret the results.

The word count for this assessment is 3000 words. This does not include the appendix, or any words (or numbers) contained within tables. Please note that any full sentences included in tables will form part of the word count.

Please submit your type-written (numbered) answers in a single document. Create an appendix section at the end which contains all the R code needed to reproduce your results (you do not need to include code that failed to run, just the cleaned-up version; your code has to work when we run it). Failure to include the R code means that the coursework will be marked incomplete.

You may assume the methods you have used (e.g. t-test, linear regression, etc.) are understood by the reader and do not need definitions, but you do need to explain the intuition of these methods.

Round all numbers to two digits after the decimal point.

Do not copy and paste any raw R output (e.g. summary(lm(y ~ x))) into your answers. Create a minimally formatted table, e.g. with the screenreg command as seen in class. If that does not work, re-create such a table by hand.
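
As a minimal sketch of the kind of table meant here, the snippet below fits a regression on R's built-in mtcars data (a stand-in, not one of the coursework datasets) and prints it with screenreg from the texreg package, with estimates shown to two decimals. It assumes texreg is installed; the guard simply skips the table if it is not.

```r
# Illustrative only: mtcars is a built-in R dataset, not coursework data.
model <- lm(mpg ~ wt, data = mtcars)

# Coefficients rounded to two digits after the decimal point.
rounded_coefs <- round(coef(model), 2)
rounded_coefs

# screenreg() comes from the texreg package (assumed to be installed).
if (requireNamespace("texreg", quietly = TRUE)) {
  texreg::screenreg(model, digits = 2)
}
```

The printed table, rather than the output of summary(), is the sort of thing that belongs in the answer text.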

Assign every table and figure a title and a number and refer to the number in the text when discussing a specific figure or table.

Datasets

Varieties of Democracy – vdem.csv

This dataset includes several variables taken from the Varieties of Democracy project (https://www.vdem.net/en/). The unit of analysis is the country-year. The data here covers 161 countries for the years 1993 to 2010. There are a total of 2898 observations in the data.

Table 1: Varieties of Democracy codebook

Variable                  Description
country_name              The name of the country
year                      The year of the observation
child_mortality           The number of deaths prior to age 1 per 1000 live births in a year
inequality_gini           A measure of income inequality, based on the GINI coefficient. Higher values indicate more inequality. The theoretical minimum is 0, where income is perfectly equal, and the theoretical maximum is 100, where one individual has all the income.
life_expectancy           Life expectancy at birth (in years)
radio_television_per_cap  Number of radio and television sets per capita
log_population            Logged population
civil_war                 1 if there was an intra-state war with at least 1,000 battle deaths in a given country-year, 0 otherwise
international_war         1 if the country participated in an international armed conflict in a given year, 0 otherwise
urban_population_pct      Percentage of population living in urban areas (in percentage points)
oil_production_per_cap    Value of petroleum produced per capita
gdp_per_cap               Gross domestic product, per capita
inflation                 Annual inflation rate
region_name               Geographic region in which the country is located (categorical)
education15               Average years of education among citizens older than 15
government_effectiveness  A continuous measure of government effectiveness based on the quality of public service provision amongst bureaucrats and government actors (higher values indicate more effective government)
political_stability       A continuous measure of political stability based on perceptions of the likelihood that the government in power will be destabilized or overthrown by possibly unconstitutional and/or violent means (higher values indicate higher levels of stability)
polity                    Score on the polity scale (higher values indicate more democratic countries, lower values indicate more autocratic countries)
healthcare                A continuous variable measuring the extent to which high quality basic healthcare is guaranteed to all (higher values indicate higher access to healthcare)
womens_civ_lib            A continuous variable indicating whether women have the ability to make meaningful decisions in key areas of their lives (higher values indicate higher levels of civil liberties for women)
media_censorship          A continuous variable indicating whether the government directly or indirectly attempts to censor the print or broadcast media (lower values indicate higher levels of censorship)
internet_access           1 if there is internet access in this country-year, 0 otherwise

You can access this data in two ways:

1. You can download the vdem.csv data file from Moodle, copy it to your working directory, and load it into R as we have been doing in class.

– or –

2. You can run the following line of code in R and this will load the data directly from the course website:

vdem <- read.csv("https://uclspp.github.io/datasets/data/vdem.csv")

These two ways of loading the data will produce identical results.
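
The two routes can be folded into one step. The helper below is a hypothetical convenience function (load_course_data is not part of the course materials): it reads a local copy from the working directory if one exists and otherwise falls back to the course URL given above.

```r
# Hypothetical helper, not part of the course materials: prefer a local copy
# of the file in the working directory, otherwise read from the course website.
load_course_data <- function(file_name,
                             base_url = "https://uclspp.github.io/datasets/data/") {
  source_path <- if (file.exists(file_name)) {
    file_name
  } else {
    paste0(base_url, file_name)
  }
  read.csv(source_path)
}
```

For example, vdem <- load_course_data("vdem.csv") followed by dim(vdem) should report the 2898 observations described above, whichever source is used.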

European Social Survey – ess.csv

This dataset includes several variables taken from the 2016 European Social Survey (https://www.europeansocialsurvey.org). The unit of analysis is individual respondents to a face-to-face survey. There are a total of 13075 observations in the data, with respondents surveyed in 17 different European countries.

Table 2: ESS codebook

Variable           Description
country_code       The country of the respondent
leave              1 if the respondent would vote to leave the European Union in a referendum, 0 otherwise
gender             Whether the respondent is male or female
age                The age of the respondent (in years)
years_education    The number of years of education the respondent has completed
unemployed         1 if the respondent is unemployed, 0 otherwise
income             1 if the respondent earns above the median income in their country, 0 otherwise
religion           Categorical variable of the religion of the respondent
trade_union        1 if the respondent is a member of a trade union, 0 otherwise
news_consumption   Amount of time the respondent spends reading newspapers/online news each week (in minutes)
trust_people       The degree to which the respondent trusts other people (0 = low trust, 10 = high trust)
trust_politicians  The degree to which the respondent trusts politicians (0 = low trust, 10 = high trust)
past_vote          1 if the respondent voted in the last general election in their country, 0 otherwise
immig_econ         The respondent’s view of the economic effects of immigration in their country (0 = Immigration is bad for the economy; 10 = Immigration is good for the economy)
immig_culture      The respondent’s view of the cultural effects of immigration in their country (0 = Immigration undermines the country’s culture; 10 = Immigration enriches the country’s culture)
country_attach     The respondent’s emotional attachment to their country (0 = Not at all emotionally attached; 10 = Very emotionally attached)
climate_change     How worried the respondent is about climate change (1 = Not at all worried; 5 = Very worried)
imp_tradition      How important the respondent feels it is to follow traditions and customs (1 = Very important; 6 = Not at all important)
imp_equality       How important the respondent feels it is people are treated equally and have equal opportunities (1 = Very important; 6 = Not at all important)
eu_integration     The respondent’s views on European unification/integration (0 = “Unification has already gone too far”; 10 = “Unification should go much further”)

Again, you can access this data in two ways:

1. You can download the ess.csv data file from Moodle, copy it to your working directory, and load it into R as we have been doing in class.

– or –

2. You can run the following line of code in R and this will load the data directly from the course website:

ess <- read.csv("https://uclspp.github.io/datasets/data/ess.csv")

Part 1: Does rain affect turnout? (25 points)

If voters are rational, then their behaviour should be responsive to changes in the costs and benefits they face in casting votes on election day. Elections in the US are typically held in November, when the weather can be very bad in some parts of the country. When the weather is bad – in particular, when it rains – this imposes additional costs on potential voters, and may lead to declines in turnout on election day. In this question, your task is to interpret regression models which analyse the relationship between inclement (i.e. bad) weather and voter turnout at the county level in US presidential elections. The models also include information about how competitive the presidential race is in different counties – where a county is considered competitive if it is located in a state where the presidential race is close between the top two candidates.

A team of researchers decided to analyse the effects of rain on turnout by collecting data from 43124 county-level elections in the US between 1948 and 2000. For each county-year observation in their data, they collected information on the following variables:

Table 3: Part 1 variables

Variable      Description
Turnout       The turnout rate in a given county in the election (measured in percentage points)
Rain          The rainfall in the county on election day (measured in inches)
Competitive   1 if the county is located in a competitive state, and 0 otherwise
Unemployment  The unemployment rate in a given county (measured in percentage points)
High_School   The high school graduation rate in the county (measured in percentage points)

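To test the effect of rain on voter turnout, the researchers ran the following linear regression models, both of which have turnout as the dependent variable.

Model 1

$$\text{Turnout}_i = \alpha + \beta_1 \text{Rain}_i + \beta_2 \text{Competitive}_i + \beta_3 \text{Unemployment}_i + \beta_4 \text{High\_School}_i + \epsilon_i$$

Model 2

$$\text{Turnout}_i = \alpha + \beta_1 \text{Rain}_i + \beta_2 \text{Competitive}_i + \beta_3 (\text{Rain}_i \cdot \text{Competitive}_i) + \beta_4 \text{Unemployment}_i + \beta_5 \text{High\_School}_i + \epsilon_i$$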
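
One way to see what the interaction term in Model 2 does is to take the derivative of expected turnout with respect to rain. This is a short derivation from the Model 2 equation only and uses no estimates; in Model 1 the corresponding quantity is simply the rain coefficient, the same for every county.

```latex
\frac{\partial\, \mathrm{E}[\text{Turnout}_i]}{\partial\, \text{Rain}_i}
  = \beta_1 + \beta_3\,\text{Competitive}_i
  = \begin{cases}
      \beta_1           & \text{if } \text{Competitive}_i = 0 \\
      \beta_1 + \beta_3 & \text{if } \text{Competitive}_i = 1
    \end{cases}
```

So in Model 2 the effect of an extra inch of rain is allowed to differ between counties in competitive and uncompetitive states, and the interaction coefficient is that difference.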

The estimates produced by these two models are presented in Table 4 below. You should use these estimates to provide answers to the following questions.

Questions

1) What is the null hypothesis for the interaction term β3 in model 2?

2) Interpret the effects of rain on turnout using the coefficients from models 1 and 2.

3) Based on model 2, what is the expected level of turnout for a county with the following characteristics?

A) A county with 0 inches of rain, in an uncompetitive state, with an unemployment rate of 6% and a high school graduation rate of 80%

B) A county with 3 inches of rain, in an uncompetitive state, with an unemployment rate of 6% and a high school graduation rate of 80%

C) A county with 0 inches of rain, in a competitive state, with an unemployment rate of 6% and a high school graduation rate of 80%

D) A county with 3 inches of rain, in a competitive state, with an unemployment rate of 6% and a high school graduation rate of 80%

4) Based on your answers to the questions above, do you conclude that rain is an important determinant of turnout in US elections?
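
As a sketch of how the four scenarios in question 3 map onto Model 2, the expressions below substitute each county's characteristics into the Model 2 equation. They assume that unemployment and high school graduation enter in percentage points (6 and 80), as Table 3 states, and they leave the coefficients symbolic.

```latex
\begin{aligned}
\text{A: } & \alpha + 6\beta_4 + 80\beta_5 \\
\text{B: } & \alpha + 3\beta_1 + 6\beta_4 + 80\beta_5 \\
\text{C: } & \alpha + \beta_2 + 6\beta_4 + 80\beta_5 \\
\text{D: } & \alpha + 3\beta_1 + \beta_2 + 3\beta_3 + 6\beta_4 + 80\beta_5
\end{aligned}
```

The difference between B and A is three times the rain coefficient, while the difference between D and C additionally includes three times the interaction coefficient.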

Table 4: Rain and turnout regressions

[Table body missing from the source. Dependent variable: Turnout. Note: Standard errors in parentheses.]

Part 2: Inequality and child mortality (40 points)

There is considerable academic debate regarding the relationship between socioeconomic inequalities and various health outcomes for children. In particular, one influential theoretical argument suggests that if income is redistributed from richer people to relatively poorer people, health outcomes for poor children should be expected to improve, but we should not expect a similar decline in the health outcomes of rich children.

At the aggregate level, and focussing on child mortality as a measure of children’s health outcomes, this argument suggests that we should observe a positive association between income inequality and child mortality. The plot below shows the bivariate association between these variables based on the data that you will use for this question. The dataset for this question is the Varieties of Democracy dataset, which can be found in the vdem.csv file described above.

[Figure: plot of child mortality per 1000 births (Y-axis) against income inequality (X-axis).]

Child mortality – here displayed on the Y-axis – measures the number of children who die before the age of 1 for every 1000 live births in a country each year. Income inequality – on the X-axis – is measured as a country’s GINI coefficient, which is a measure of how concentrated the wealth of a country is. The GINI measure can range in theory from 0 – where income is perfectly equally shared amongst all citizens – to 100 – where all of a country’s income is concentrated in the hands of a single individual.

Questions

1) Your main task in this section is to develop theoretically-grounded models of child mortality using the Varieties of Democracy data. In this subquestion, you should implement two linear regression models with child_mortality as the dependent variable.

In the first model, the only explanatory variable should be the inequality_gini variable.

For the second model, you should build a model which – in addition to the inequality_gini variable – includes six theoretically important explanatory variables that you think might be appropriate from the supplied dataset. You should explain why you think these particular variables are important to include, given that our main interest is in the relationship between inequality and child mortality. Please note that, for the second model, you should not estimate several different models and present the results, but rather you should argue theoretically why you chose certain variables. You should also consider whether any non-linear and/or interactive specifications of the variables you include in your model would be appropriate.

You should write up the results of these models as if they were to be published in a political science journal article with a focus on communicating the substantive meaning of your results. In your discussion of these models, you should focus on communicating the substantive implications of the regression that you implement, paying particular attention to the relationship between child mortality and income inequality. You may wish to focus on the following:

– Provide descriptive statistics and/or plots to provide the reader with an overview of the dependent variable and the important explanatory variables that you intend to use.

– Provide a well-formatted table of regression output which includes the key information about the two models you have estimated.

– For the second model, you must use a model which includes 7 explanatory variables (inequality_gini plus the 6 you choose) to explain child mortality. You should state an appropriate hypothesis/null hypothesis for the variables in your model.

– Discuss the statistical significance of the coefficients in the models.

– Present quantities of interest from your models that help to describe the relationship between income inequality and child mortality. You could also illustrate the relative importance of the different explanatory factors that you include in the second model. Examine the effects for sensible values of the independent variables, and focus your interpretation not just on the direction of the effects, but also the magnitude of the effects.

– Discuss the fit of your two models using appropriate statistics.

– Evaluate the models with reference to the assumptions of linear regression and, if appropriate, implement corrections when these assumptions appear to be violated.

You should not use fixed-effects in either of the models in this question.
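
The mechanics of a non-linear or interactive specification, and of turning it into quantities of interest, can be sketched as follows. The data here are simulated stand-ins: the column names mirror the codebook, but every value is invented, and the choice of gdp_per_cap as the second variable is purely for illustration, not a recommendation.

```r
set.seed(1)
# Simulated stand-in data; column names mirror the codebook, values are invented.
n <- 200
sim <- data.frame(
  inequality_gini = runif(n, 20, 60),
  gdp_per_cap     = runif(n, 500, 40000)
)
sim$child_mortality <- 90 + 0.8 * sim$inequality_gini -
  8 * log(sim$gdp_per_cap) + rnorm(n, sd = 5)

# First model: bivariate. Second model: logged control plus an interaction.
m1 <- lm(child_mortality ~ inequality_gini, data = sim)
m2 <- lm(child_mortality ~ inequality_gini * log(gdp_per_cap), data = sim)

# Predicted mortality at low and high inequality, holding GDP at its median.
new_data <- data.frame(
  inequality_gini = c(30, 50),
  gdp_per_cap     = median(sim$gdp_per_cap)
)
predicted <- predict(m2, newdata = new_data, interval = "confidence")
round(predicted, 2)
```

Predictions at chosen values, with their confidence intervals, speak to magnitude in a way that coefficients from an interacted model do not on their own.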

2) You present the results of your models to a friend who argues that, because you are dealing with panel data, you should think about adding fixed-effects to your second model. Address this point now by focussing on the following:

a) Test for the presence of unit and time fixed-effects in this data, and present results from an appropriate model specification. Interpret any changes that have resulted from this change in model specification, particularly with reference to the inequality variable.

b) Test for and correct any dependence in the error term. Do your substantive conclusions change at all?
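
One generic way to test for fixed effects, sketched here in base R on a simulated panel, is to add unit and year dummies to the pooled model and compare the nested models with F-tests. This dummy-variable approach is an assumption about method, not necessarily the one taught in class (dedicated functions exist in the plm package, for example); all variable names and values below are invented, and the sketch does not cover dependence in the error term.

```r
set.seed(2)
# Simulated balanced panel: 20 units observed over 10 years (invented values).
panel <- expand.grid(unit = factor(1:20), year = factor(2001:2010))
unit_effect <- rnorm(20, sd = 5)
unit_index  <- as.integer(panel$unit)

# x is correlated with the unit effect, so the pooled estimate is biased.
panel$x <- rnorm(nrow(panel)) + 0.5 * unit_effect[unit_index]
panel$y <- 1 + 2 * panel$x + unit_effect[unit_index] + rnorm(nrow(panel))

pooled  <- lm(y ~ x, data = panel)
unit_fe <- lm(y ~ x + unit, data = panel)
twoway  <- lm(y ~ x + unit + year, data = panel)

# F-tests: do the unit dummies, and then the year dummies, add explanatory power?
fe_test <- anova(pooled, unit_fe, twoway)
fe_test

# The coefficient on x moves once the unit effects are absorbed.
round(c(pooled = coef(pooled)[["x"]], twoway = coef(twoway)[["x"]]), 2)
```

Because the simulated x is correlated with the unit effects, the pooled and fixed-effects coefficients differ, which is the kind of change in the inequality coefficient that part a) asks you to interpret.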

Part 3: Leave or Remain? (35 points)

What determines support for the European Union? In the aftermath of the UK public’s vote to leave the EU in the 2016 referendum, much attention has been paid to whether support for the EU varies predictably across different types of individuals. In this question, you will use an appropriate binary dependent variable model to improve our understanding of which types of citizens are more or less likely to vote to leave the European Union if a referendum on membership were to be held in their country.

The data for this question comes from the 2016 European Social Survey (ESS) and includes information on the political attitudes and demographics of European citizens. The data can be found in the ess.csv file described above. In 2016, the ESS included the following question:

“Imagine there were a referendum in your country tomorrow about membership of the European Union. Would you vote for your country to remain a member of the European Union or to leave the European Union?”

The dependent variable for this analysis is leave, which equals 1 if the respondent said that they would vote to leave the EU, and 0 otherwise.

1) The primary task of this section is to implement a logistic regression model with 5 theoretically important predictors from the dataset. You should explain why you have selected the variables that you include in the model and explain – from a theoretical perspective – why you expect them to be important for determining whether an individual would decide to vote to leave the EU or not. As in Part 2, you should think carefully about your choice of variables, and consider whether it would be appropriate to include non-linear or interactive specifications of these variables.

You should focus on the following:

– Fit a model which uses 5 explanatory variables to predict referendum vote choice. You should be clear about the theoretical rationale for including each variable, and you should state the appropriate hypothesis for each of the variables in your model. Do not include the immig_econ or immig_culture variables in this model.

– Provide and discuss an appropriate fit statistic for your model.

– Interpret your model in both statistical and substantive terms. You should present predicted probabilities from the model that help to illustrate the substantive importance of the variables in your model (i.e. simply reporting estimated coefficients is not sufficient for full marks).

– Create at least one plot of predicted probabilities from your model for a continuous independent variable.

You should write up your results as if they were to be published in a political science journal article with a focus on communicating the substantive meaning of your results.
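
The mechanics of fitting a logistic regression and plotting predicted probabilities for a continuous variable can be sketched in base R as below. The data are simulated stand-ins: column names mirror the ESS codebook, but all values and effect sizes are invented, and the two predictors are chosen only to keep the example short.

```r
set.seed(3)
# Simulated stand-in for the ESS data; names mirror the codebook, values are invented.
n <- 1000
sim <- data.frame(
  age             = sample(18:90, n, replace = TRUE),
  years_education = sample(8:20, n, replace = TRUE)
)
true_prob <- plogis(-1 + 0.02 * sim$age - 0.08 * sim$years_education)
sim$leave <- rbinom(n, size = 1, prob = true_prob)

logit_model <- glm(leave ~ age + years_education, data = sim,
                   family = binomial(link = "logit"))

# Predicted probability of voting leave across age, education held at its mean.
age_grid <- data.frame(
  age             = 18:90,
  years_education = mean(sim$years_education)
)
age_grid$pred_prob <- predict(logit_model, newdata = age_grid, type = "response")

plot(age_grid$age, age_grid$pred_prob, type = "l",
     xlab = "Age", ylab = "Predicted probability of voting leave")
```

Setting type = "response" returns probabilities rather than log-odds, which is what makes the plot interpretable in substantive terms.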

2) An ongoing academic discussion focusses on whether cultural or economic concerns about immigration are more important as predictors of support for the European Union. To contribute to this debate, you will now develop your model from the first part of this section by including some additional variables.

a) Estimate a new version of your model, this time including the immig_econ and immig_culture variables.

b) Does this model provide a better fit to the data than your original model? Use a fit statistic that you have learned on this course to check.

c) How – if at all – does your interpretation of the effects of the original variables change in this new model? Why might this be the case?

d) Calculate some predicted probabilities to demonstrate the substantive effects of the immig_econ and immig_culture variables. Do you conclude that economic concerns about immigration or cultural concerns about immigration are more important in predicting opposition to the EU?
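
Steps b) and d) can be sketched on simulated data as follows. AIC is used here as one possible fit statistic, which is an assumption, since the brief does not name the statistic taught on the course. The scenario helper is hypothetical, and all values and effect sizes are invented, so the output says nothing about the real ESS data.

```r
set.seed(4)
# Simulated stand-in data; names mirror the codebook, values are invented.
n <- 1000
sim <- data.frame(
  age           = sample(18:90, n, replace = TRUE),
  immig_econ    = sample(0:10, n, replace = TRUE),
  immig_culture = sample(0:10, n, replace = TRUE)
)
sim$leave <- rbinom(n, 1, plogis(0.5 + 0.01 * sim$age -
                                   0.10 * sim$immig_econ -
                                   0.25 * sim$immig_culture))

base_model <- glm(leave ~ age, data = sim, family = binomial)
full_model <- glm(leave ~ age + immig_econ + immig_culture,
                  data = sim, family = binomial)

# Lower AIC indicates better fit after penalising the extra parameters.
fit_comparison <- AIC(base_model, full_model)
fit_comparison

# Hypothetical helper: predicted probabilities when one attitude moves from
# 0 to 10 while the other variables are held at their means.
scenario <- function(variable) {
  grid <- data.frame(age           = mean(sim$age),
                     immig_econ    = mean(sim$immig_econ),
                     immig_culture = mean(sim$immig_culture))
  grid <- grid[c(1, 1), ]
  grid[[variable]] <- c(0, 10)
  predict(full_model, newdata = grid, type = "response")
}
round(rbind(immig_econ    = scenario("immig_econ"),
            immig_culture = scenario("immig_culture")), 2)
```

With real data, the two models should be estimated on the same set of observations (for instance after dropping rows with missing values on any included variable), otherwise the fit statistics are not comparable.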
