Date: 2023-04-04 09:34

Statistical Modelling for Business

Q1 Why in OLS for SLR is the sample average error, $\bar{e} = \frac{1}{n}\sum_{i=1}^{n} e_i = 0$?

(a) Because it is an error term, it has to average 0.

(b) Because each error ei is equal to 0, therefore the average of a set of 0s is 0.

(c) Because when we do OLS, we take the 1st derivative of the RSS, and the derivative with respect to β1 is −2 × the sum of the errors times X, i.e. $-2 \times \sum_{i=1}^{n} e_i X_i$. We set this sum equal to 0 to get the LS estimate. Thus $\bar{e} = 0$.

(d) Because when we do OLS, we take the 1st derivative of the RSS, and the derivative with respect to β0 is −2 × the sum of the errors, i.e. $-2 \times \sum_{i=1}^{n} e_i$. We set this sum equal to 0 to get the OLS estimate. Thus $\bar{e} = 0$.

(e) None of the above are correct.
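The first-order conditions described in Q1 can be checked numerically. The following is a minimal sketch, assuming a small made-up dataset (the values in `x` and `y` are hypothetical and not from the course). It fits the SLR with the closed-form OLS formulas and confirms that both the sum of the residuals and the sum of the residuals times X are 0, up to floating-point rounding.

```python
# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS estimates for simple linear regression.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residuals e_i = y_i - (b0 + b1 * x_i).
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Condition from the derivative with respect to beta0: sum of residuals is 0.
sum_e = sum(residuals)
# Condition from the derivative with respect to beta1: sum of residuals times x is 0.
sum_ex = sum(e * xi for e, xi in zip(residuals, x))

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"sum(e) = {sum_e:.2e}, sum(e * x) = {sum_ex:.2e}")
```

Only the β0 condition is needed for the sample average error to be 0; the β1 condition is a separate restriction on the residuals.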

Q2 True or False? LSA 2 states that E(εi|Xi) = 0. This implies that the residual series ε

and X are uncorrelated. Answer: True

Q3 LSA 2 states that E(εi|Xi) = 0. This implies that the error series ε and X are uncorrelated, because:

(a) Other factors always exist and are implicitly affecting Y through ε, thus ε and X

must be uncorrelated.

(b) ε is an i.i.d. error series and hence must be uncorrelated with X.

(c) If they were correlated, then the slope of the regression of εi on Xi would not be 0, i.e. we could write E(εi|Xi) = γ0 + γ1Xi and γ1 ≠ 0. Thus LSA 2 would not be correct.

(d) Other factors always exist and are implicitly affecting Y through ε, thus ε and X

must be correlated. Hence, LSA 2 does not imply they are uncorrelated.

Q4 Why does RSS always decrease when you add another X variable to the regression model?

(a) Because the new X variable is always significantly related to Y

(b) Because now there is one extra parameter with which to optimize RSS, meaning a more optimal, hence lower, RSS can be found.

(c) Because the OLS estimate of the new X’s regression slope will not be exactly 0.

(d) It doesn’t, sometimes RSS increases or stays the same, e.g. if the new X variable is

not related to Y.

(e) Both (b) and (c) are true.
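The Q4 argument can be illustrated in its simplest case: moving from an intercept-only model to a model with one X variable. This is a sketch under stated assumptions: the data are hypothetical, and the general step from p to p+1 predictors is not computed here, although the same logic applies (fixing the new slope at 0 reproduces the smaller model, so the minimized RSS cannot be higher).

```python
# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Model 0: intercept only. The OLS fit is the sample mean of y.
rss_intercept_only = sum((yi - y_bar) ** 2 for yi in y)

# Model 1: intercept plus X, fitted with the closed-form OLS formulas.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
rss_with_x = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

print(f"slope on X      = {b1:.4f}")
print(f"RSS without X   = {rss_intercept_only:.4f}")
print(f"RSS with X      = {rss_with_x:.4f}")
print(f"reduction in RSS = {rss_intercept_only - rss_with_x:.4f}")
```

The reduction equals `s_xy ** 2 / s_xx`, which is strictly positive whenever the estimated slope is not exactly 0.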

Q5 Would the variable number of children cause OVB regarding the effect of Salary on

Amount Spent?

(a) Number of children would not be correlated with Salary, so: NO.

(b) Number of children is likely correlated with Salary, but it would not be a factor

determining Amount Spent, so: NO.

(c) Number of children is likely correlated with Salary. Also, number of children could

be a factor determining Amount Spent, so: YES.

(d) Even though number of children is a likely determinant of Amount Spent, it would

not be correlated with Salary, so NO.

(e) We should first look at the sample correlation between number of children

and Salary here. Then, decide whether number of children could be a

determinant of Amount Spent.

Q6 Would the variable IQ level cause OVB regarding the effect of Salary on Amount Spent?

(a) It is likely that IQ is correlated with Salary. It is unlikely that IQ is a determinant of Amount Spent for a company like Direct Marketing which sells clothing, books and sports gear. So: NO.

(b) IQ would not be correlated with Salary nor would it determine Amount Spent, so

NO.

(c) IQ would be correlated with Salary and thus also be correlated with Amount Spent,

since Salary is correlated with Amount Spent. Thus, YES.

(d) IQ would not be correlated with Salary, but it would help determine Amount Spent,

so NO.

Q7 The Mann-Whitney U test is preferred to the t-test whenever:

(a) The dataset in each group has a large enough sample size, ni, for the central limit theorem to apply, i.e. ni ≥ 30.

(b) The data has no outliers and has a symmetric shaped distribution in each group.

(c) The data has some outliers and it is unclear if E(Y^4) < ∞ in each group.

(d) The data are on the ordinal scale.

(e) Both (c) and (d) are correct.

Q8 The t-test for a mean (difference) is very popular and mostly used in practice because:

(a) It has comparatively high power and is also robust to outliers.

(b) Its properties are very well known under the LSA; e.g. BLUE, consistency, etc.

(c) It has higher power than both the Mann-Whitney and median tests, for data with

infinite 4th moments.

(d) It has lower power than both the Mann-Whitney and median tests, for data with

infinite 4th moments.

(e) None of the above.

Q9 Consider a MLR model with p predictors, estimated by OLS. If another predictor was

added to the model and then the model was re-estimated by OLS, with those p+1 predictors,

then:

(a) $R^2$ would increase and SER would decrease.

(b) $R^2_{adj}$ would increase and SER would decrease.

(c) $R^2$ would increase and RSS would decrease.

(d) $R^2_{adj}$ would increase and RSS would decrease.

(e) None of the above would occur.

Q10 Omitted variable bias usually occurs in an SLR of Y on X whenever:

(a) OLS is used, but not when LAD is used.

(b) we have observational-type data, but not with randomized experimental

data.

(c) we have experimental data, but not with observational data.

(d) we estimate the SLR.

(e) all of the above occur.

Q11 Web designers conducted an A/B test regarding a new "call to action" design on their website. Each visitor to the page was randomly assigned to see only one of the Original or the New call to action button. Whether each visitor clicked on the call to action button was then recorded. The main question of interest is: Is there a difference in button click rates between the two designs?

The observed contingency table for this dataset is given below.

              Clicked
Button     Yes     No    Total
Old        351     ??     3642
New        485     ??     3556
Total      836     ??       ??

Fill in the missing values in the contingency table.

There are four cells missing: o12, o22, then the sum of "No" and the total sample size. o12 = 3642 − 351 = 3291. o22 = 3556 − 485 = 3071. Sum of "No" = 3291 + 3071 = 6362. Total sample size = 3642 + 3556 = 7198.
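The arithmetic above can be reproduced in a few lines. This is a small sketch using only the counts given in Q11; the dictionary names are illustrative choices.

```python
# Observed counts given in Q11.
yes = {"Old": 351, "New": 485}
row_total = {"Old": 3642, "New": 3556}

# Each "No" count is the row total minus the "Yes" count.
no = {button: row_total[button] - yes[button] for button in yes}

total_yes = sum(yes.values())
total_no = sum(no.values())
grand_total = sum(row_total.values())

print("No counts:", no)
print("Column totals:", total_yes, total_no)
print("Total sample size:", grand_total)
```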

Q12 In the context of Q11, the expected values in the contingency table are calculated as:

              Clicked
Button        Yes         No
Old           ???    3219.01
New        413.01    3142.99

Calculate it for the (1,1) cell (i.e. "Yes" and "Old").

$e_{1,1} = \frac{R_1 \times C_1}{N} = \frac{3642 \times 836}{7198} = 422.99$.
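The same row total times column total over N rule gives every expected count in the table. A short sketch, using the Q11 totals (variable names are illustrative):

```python
# Row and column totals from the Q11 contingency table.
row_totals = {"Old": 3642, "New": 3556}
col_totals = {"Yes": 836, "No": 6362}
n = sum(row_totals.values())

# Expected count under independence: e_ij = R_i * C_j / N.
expected = {
    (row, col): row_totals[row] * col_totals[col] / n
    for row in row_totals
    for col in col_totals
}

for (row, col), e in expected.items():
    print(f"{row:>3} / {col:<3}: {e:.2f}")
```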

Q13 In the context of Q11, are the conditions for Pearson’s chi-squared test satisfied in this

data?

There are 4 cells here, thus we need all 4 cells to have expected values of at least 5. All expected values are ≥ 5 (the smallest is 413), as required. We also need i.i.d. data, which could be achieved by a random sample. Whilst the people coming to the website are not randomly chosen, they are randomly allocated to either the new or old call to action button. Thus, there is a reasonable chance that this is close to an i.i.d. sample, and the conditions for Pearson’s test seem well satisfied.

Q14 In the context of Q11, Pearson’s chi-squared test is conducted, giving a test statistic of 27.67 and a p-value of $1.43 \times 10^{-7}$. What are the hypotheses and conclusion of the test?

The hypotheses are: H0: Type of call to action button and whether the customer clicks the button are unrelated, or independent. H1: Type of call to action button and whether the customer clicks the button are related, or dependent. I choose α = 0.05 as standard. The test statistic is V = 27.67, which follows a $\chi^2$ distribution with (2 − 1)(2 − 1) = 1 degree of freedom under the null hypothesis. The p-value is $P(\chi^2_1 \ge 27.67) = 1.43 \times 10^{-7} \approx 0$. The p-value < α = 0.05, so we reject the null hypothesis and conclude that the variables type of button and clicking the button are significantly related to each other.
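The quoted statistic and p-value can be reproduced from the observed counts. This is a sketch under one assumption that the source does not state: the value 27.67 appears to correspond to Pearson's statistic with Yates' continuity correction, which is the default for 2×2 tables in some software. The uncorrected statistic comes out slightly higher. The p-value uses the identity $P(\chi^2_1 \ge v) = \mathrm{erfc}(\sqrt{v/2})$, which holds for 1 degree of freedom only.

```python
import math

# Observed counts from Q11: rows are Old/New, columns are Yes/No.
observed = [[351, 3291], [485, 3071]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2_plain = 0.0   # Pearson statistic without correction
chi2_yates = 0.0   # with Yates' continuity correction (assumed, see lead-in)
for i, r in enumerate(row_totals):
    for j, c in enumerate(col_totals):
        e = r * c / n                      # expected count under independence
        diff = abs(observed[i][j] - e)
        chi2_plain += diff ** 2 / e
        chi2_yates += (diff - 0.5) ** 2 / e

# Upper-tail probability of a chi-squared variable with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2_yates / 2))

print(f"uncorrected statistic = {chi2_plain:.2f}")
print(f"Yates-corrected statistic = {chi2_yates:.2f}")
print(f"p-value = {p_value:.3g}")
```

Either version of the statistic leads to the same conclusion here, since both are far above the 5% critical value of a $\chi^2_1$ distribution (about 3.84).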

Q15 LSA 5 states that homoskedasticity is assumed. This assumption implies that:

(a) the residual series ε and X are uncorrelated.

(b) the conditional variance V(Y|X) is a constant.

(c) there is omitted variable bias in the OLS estimates.

(d) the residual series ε and Y are uncorrelated.

(e) none of the above are true

Q16 When there are omitted variables in the regression, which are determinants of the

dependent variable, then

(a) the OLS estimator is biased if the omitted variable is correlated with the

included variable.

(b) you cannot measure the effect of the omitted variable, but the estimator of your

included variable(s) is (are) unaffected.

(c) this has no effect on the estimator of your included variable because the other variable

is not included.

(d) this will always bias the OLS estimator of the included variable.
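The role of correlation between the omitted and included variables in Q16 can be seen in a small deterministic sketch. Everything here is a hypothetical construction: the data, the "true" coefficients, and the helper `slope` are assumptions for illustration. Y is generated without noise so that the arithmetic is exact: the slope from the short regression of Y on X equals β1 plus β2 times the slope from regressing the omitted variable on X.

```python
def slope(x, y):
    """OLS slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    return s_xy / s_xx

# Assumed "true" coefficients in Y = beta0 + beta1 * X + beta2 * Z.
beta0, beta1, beta2 = 1.0, 2.0, 3.0
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Case 1: omitted variable Z is correlated with X.
z_corr = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y_corr = [beta0 + beta1 * xi + beta2 * zi for xi, zi in zip(x, z_corr)]
short_slope_corr = slope(x, y_corr)   # regress Y on X only
delta = slope(x, z_corr)              # slope from regressing Z on X
bias_corr = short_slope_corr - beta1

# Case 2: omitted variable has exactly zero sample covariance with X.
z_uncorr = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]
y_uncorr = [beta0 + beta1 * xi + beta2 * zi for xi, zi in zip(x, z_uncorr)]
short_slope_uncorr = slope(x, y_uncorr)
bias_uncorr = short_slope_uncorr - beta1

print(f"correlated Z:   slope = {short_slope_corr:.4f}, bias = {bias_corr:.4f}")
print(f"beta2 * delta  = {beta2 * delta:.4f}")
print(f"uncorrelated Z: slope = {short_slope_uncorr:.4f}, bias = {bias_uncorr:.4f}")
```

In the second case the omitted variable still determines Y, yet the slope on X is recovered, which is why the bias is not automatic.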

Q17 A regression diagnostic tool used to study the possible effects of collinearity is

(a) the Variance Inflation Factor

(b) the slope

(c) the Durbin-Watson statistic

(d) the standard error of the estimate

Q18 Managed funds offer investors a convenient method for diversifying their portfolios. However, there are many types of funds to choose from. The following table shows the level of return over five years for a sample of investors for various categories of managed funds.

The observed contingency table for the data is given below:

Fund type              High Ret   Med Ret   Low Ret   Total
Maximum capital gain        108        46        71     225
Long-term growth             18        12        30      60
Balanced income              35        14        26      75
Common stock                 25         7         8      40
Total                       186        79       135     400

The estimated conditional probability of high return for Maximum capital gain is

(a) 0.48

(b) 0.3

(c) 0.467

(d) 0.625
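An estimated conditional probability of a high return given the fund type is the "High Ret" count divided by that fund type's row total. A short sketch using the counts from the Q18 table (the dictionary layout is an illustrative choice):

```python
# Observed counts from Q18, as (High, Med, Low) returns per fund type.
counts = {
    "Maximum capital gain": (108, 46, 71),
    "Long-term growth": (18, 12, 30),
    "Balanced income": (35, 14, 26),
    "Common stock": (25, 7, 8),
}

# Estimated P(High return | fund type) = High count / row total.
p_high = {
    fund: high / (high + med + low)
    for fund, (high, med, low) in counts.items()
}

for fund, p in p_high.items():
    print(f"{fund:<22} {p:.3f}")
```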

Q19 Consider the following regression line: $\hat{Y} = 10 - 15X_1 + 20X_2$. You are told that the t-statistic on the slope coefficient of X1 is −3. What is the value of the standard error of the slope coefficient on X1?

(a) 5

(b) 20

(c) -15

(d) 1.96
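The relationship behind Q19 is that the t-statistic is the estimated coefficient divided by its standard error. The sketch below assumes the reported t-statistic is for the usual null hypothesis that the slope is 0; the variable names are illustrative.

```python
coef_x1 = -15.0   # estimated slope on X1 from the regression line in Q19
t_stat = -3.0     # reported t-statistic, assumed to test H0: slope = 0

# t = coefficient / standard error, so standard error = coefficient / t.
se_x1 = coef_x1 / t_stat

print(f"standard error of the X1 slope = {se_x1}")
```

A standard error is never negative, which the matching signs of the coefficient and the t-statistic guarantee here.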

Q20 Consider the following regression model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$. If any X variable has $R_j^2 > 0.80$ then

(a) $\text{VIF}_j < 20$

(b) $\text{VIF}_j > 5$

(c) $\text{VIF}_j > 20$

(d) $\text{VIF}_j > 10$
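The link between $R_j^2$ and the variance inflation factor in Q20 is $\text{VIF}_j = 1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing $X_j$ on the other predictors. A minimal sketch (the function name `vif` and the grid of values are illustrative):

```python
def vif(r2_j):
    """Variance inflation factor given R^2 from regressing X_j on the other predictors."""
    return 1.0 / (1.0 - r2_j)

# VIF grows without bound as R^2_j approaches 1.
for r2 in (0.50, 0.80, 0.90, 0.95):
    print(f"R2_j = {r2:.2f} -> VIF_j = {vif(r2):.1f}")
```

Because the function is increasing in $R_j^2$, any value above 0.80 gives a VIF above the value at 0.80.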

