Date: 2018-11-18 09:08

hw06-Copy1

November 16, 2018

1 Homework 6: Probability and Hypothesis Testing

1.1 Due Sunday November 18th, 11:59pm

Directly sharing answers is not okay, but discussing problems with the course staff or with other

students is encouraged.

You should start early so that you have time to get help if you’re stuck.

In [ ]: #: Don't change this cell; just run it.

import numpy as np

from datascience import *

%matplotlib inline

import matplotlib.pyplot as plt

plt.style.use('fivethirtyeight')

from client.api.notebook import Notebook

ok = Notebook('hw06.ok')

_ = ok.auth(inline=True)

Important: The ok tests don’t usually tell you that your answer is correct. More often, they

help catch careless mistakes. It’s up to you to ensure that your answer is correct. If you’re not

sure, ask someone (not for the answer, but for some guidance about your approach).

Once you’re finished, you must do two things:

1.1.1 a. Turn in to OK

Select "Save and Checkpoint" in the File menu and then execute the submit cell below. The result

will contain a link that you can use to check that your assignment has been submitted successfully.

If you submit more than once before the deadline, we will only grade your final submission.

In [ ]: #: turn in your notebook

_ = ok.submit()

1.1.2 b. Turn in PDF to Gradescope

In the File menu, select Download As > PDF via LaTeX. Turn in this PDF file to the

respective assignment at https://gradescope.com/. If you submit more than once before the

deadline, we will only grade your final submission.

1.2 1. Numbers in a Slot Machine

You are in front of a slot machine with three slots. Each slot in the slot machine has 10 possible

outcomes: the numbers from 0-9. When you press the "Spin" button on the slot machine, each of

the three slots spins independently and stops at a number. Assume that the slot machine always

picks a number randomly.

Question 1. Suppose you win the jackpot if you are lucky enough to encounter the following

sequence of spins, in order:

Spin 1: You see a 777 in the slot machine.

Spin 2: You see a 999 in the slot machine.

What is the probability that you win the jackpot if you press the "Spin" button twice? Assign

your answer to jackpot_chance.

In [ ]: jackpot_chance = ...

jackpot_chance

In [ ]: #: grade 1.1

_ = ok.grade('q1_1')
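
The reasoning behind Question 1 is the multiplication rule for independent events. The sketch below is one way to spell that arithmetic out, assuming (as the problem states) that each slot shows one of 10 equally likely digits and that slots and spins are independent. It uses the standard library's Fraction type so the result is exact; the variable names are illustrative and not part of the assignment.

```python
from fractions import Fraction

# Each slot shows one of 10 equally likely digits, independently of the others.
p_digit = Fraction(1, 10)

# Chance that a single spin shows one specific three-digit pattern, such as 777.
p_specific_spin = p_digit ** 3

# Two spins are independent, so the chance of one specific pattern followed by
# another specific pattern is the product of the two individual chances.
p_two_specific_spins = p_specific_spin * p_specific_spin

print(p_specific_spin)       # 1/1000
print(p_two_specific_spins)  # 1/1000000
```

Call float() on the result if you need a decimal value rather than a fraction.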

Question 2. What is the probability that you see a number greater than 700 when you press

"Spin" once? Assign your answer to greater_than_700.

In [ ]: greater_than_700 = ...

greater_than_700

In [ ]: #: grade 1.2

_ = ok.grade('q1_2')

Question 3. Write a function called simulate_one_spin. It should take no arguments, and it

should return a random number, with every outcome of the slot machine equally likely. Note that

since it is a number, the leading zeros are ignored. For example, if the slot machine shows 009,

then the corresponding return value of your function should be 9.

In [ ]: # Place your answer here. It may contain several lines of code.

In [ ]: #: grade 1.3

_ = ok.grade('q1_3')
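
One possible way to think about Question 3 is to draw the three slots one digit at a time and then combine them into a single integer. The sketch below does that with the standard library's random module rather than the notebook's numpy helpers. The function name simulate_one_spin_sketch is hypothetical and chosen so it does not clash with your own answer.

```python
import random

def simulate_one_spin_sketch():
    """Return one spin of the machine as an integer from 0 to 999."""
    # Draw each of the three slots independently: one digit from 0 to 9 per slot.
    hundreds, tens, ones = (random.randrange(10) for _ in range(3))
    # Combining the digits into a number drops leading zeros: 0, 0, 9 becomes 9.
    return 100 * hundreds + 10 * tens + ones

spins = [simulate_one_spin_sketch() for _ in range(5)]
print(spins)  # five random integers between 0 and 999
```

Because every digit triple is equally likely, this is equivalent to picking one integer uniformly from 0 to 999.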

Question 4. Call the function simulate_one_spin 100,000 times. What proportion of times

does the slot machine output 777? Assign your answer to proportion_777. Your solution may

take more than one line.

In [ ]: proportion_777 = ...

proportion_777

In [ ]: #: grade 1.4

_ = ok.grade('q1_4')
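
As a rough illustration of Question 4, the sketch below repeats a spin 100,000 times and compares the observed proportion of 777s with the exact chance of 1 in 1,000. It assumes a spin can be modelled as a uniform draw from 0 to 999, and it fixes a seed only so that the run is repeatable. None of the names are required by the assignment.

```python
import random

random.seed(0)  # fixed seed only so the sketch is repeatable
num_spins = 100_000

# random.randrange(1000) picks each outcome 0..999 with equal chance.
hits = sum(1 for _ in range(num_spins) if random.randrange(1000) == 777)
proportion = hits / num_spins

print(proportion)  # empirical proportion of 777s
print(1 / 1000)    # exact chance, for comparison
```

The empirical proportion will not match the exact chance perfectly, but with this many repetitions it should be close.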

Question 5. Compute the probability that at least one of the slots in the slot machine (out of the

three) gives out a 7. You can write it as an expression which can be evaluated by Python. Assign

your answer to at_least_one_7.

In [ ]: at_least_one_7 = ...

at_least_one_7

In [ ]: #: grade 1.5

_ = ok.grade('q1_5')
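
"At least one" questions such as Question 5 are usually easiest through the complement: first find the chance that no slot shows a 7, then subtract it from 1. The sketch below works this out exactly and cross-checks it by listing all 1,000 outcomes. The names are illustrative.

```python
from fractions import Fraction

# Chance that one slot does NOT show a 7 (9 of the 10 digits).
p_not_7_one_slot = Fraction(9, 10)

# The three slots are independent, so multiply to get "no 7 anywhere".
p_no_7_anywhere = p_not_7_one_slot ** 3

# Complement rule: P(at least one 7) = 1 - P(no 7 at all).
p_at_least_one_7 = 1 - p_no_7_anywhere
print(p_at_least_one_7)  # 271/1000

# Cross-check by brute force over all outcomes 000..999.
outcomes_with_a_7 = sum(1 for n in range(1000) if '7' in f"{n:03d}")
print(outcomes_with_a_7)  # 271
```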

1.3 2. Apples and Oranges

Suppose you are given a huge farm that yields apples and oranges.

In [ ]: #: Don't change this cell, just run it

apples = ['Apple' for _ in range(400)]

oranges = ['Orange' for _ in range(600)]

farm_table = Table().with_column(

'Fruit Type', apples + oranges

)

farm_table

Question 1. Because you like apples more, you’re interested in the proportion of apples

in the farm. Calculate the true proportion of apples in the farm. Store it in the variable

apples_true_prop.

In [ ]: apples_true_prop = ...

apples_true_prop

In [ ]: #: grade 2.1

_ = ok.grade('q2_1')

Question 2. Which of the following would create a representative sample of fruits and why?

Explain your answer.

1. farm_table.take(np.arange(200))

2. farm_table.sample(200)

Option 2 would create a representative sample of fruits because .sample chooses 200

fruits at random, so each fruit has an equal chance of being selected, whereas

.take(np.arange(200)) always returns the first 200 rows of the table, which are all apples.
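
To see the difference between the two options concretely, the sketch below rebuilds the farm as a plain Python list (400 apples followed by 600 oranges, as in the cell above). It then compares the first 200 entries with 200 entries drawn at random. It uses only the standard library as a stand-in for the Table methods, and apple_proportion is a hypothetical helper.

```python
import random

# Same layout as farm_table: 400 apples first, then 600 oranges.
farm = ['Apple'] * 400 + ['Orange'] * 600

def apple_proportion(fruits):
    """Fraction of the given fruits that are apples."""
    return fruits.count('Apple') / len(fruits)

first_200 = farm[:200]                 # analogous to taking rows 0..199
random_200 = random.sample(farm, 200)  # 200 fruits drawn at random

print(apple_proportion(first_200))   # 1.0, since every one of the first 200 is an apple
print(apple_proportion(random_200))  # varies from run to run, typically near 0.4
```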

Question 3. Let’s say we have a fruit basket that can contain at most 200 fruits. We pick 200

fruits (without replacement) from the farm and place them in our fruit basket using the sampling you

chose in Question 2 above. Write a function called pick_200_fruits that simulates this. Specifically,

the function should take no arguments and should return an array of 200 fruits selected as

per your choice in Question 2.

In [ ]: # Place your answer here. It may contain several lines of code.

In [ ]: #: grade 2.3

_ = ok.grade('q2_3')

Question 4. As we mentioned, we’re interested in knowing the true proportion of apples in the

farm. But we can pick only 200 fruits at a time in our fruit basket. Hence, we simulate this experiment

in 500 trials. For each trial, we decide to calculate the proportion of apples in our basket. Simulate

the experiment and store the array of proportions in the variable apples_empirical_props.

In [ ]: # Place your answer here. It may contain several lines of code.

In [ ]: #: grade 2.4

_ = ok.grade('q2_4')
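
The repeated-sampling idea in Question 4 can be sketched as follows: fill a basket of 200 fruits, record the proportion of apples, and repeat 500 times. This version uses a plain list and the standard library instead of the Table API. random.sample never repeats an element, which matches "without replacement". If you use the datascience Table.sample method instead, check its with_replacement argument. All names below are illustrative.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed only so the sketch is repeatable
farm = ['Apple'] * 400 + ['Orange'] * 600

def basket_apple_proportion():
    """Fill one basket of 200 fruits without replacement; return its apple proportion."""
    basket = random.sample(farm, 200)
    return basket.count('Apple') / len(basket)

# One proportion per trial, 500 trials in total.
empirical_props = [basket_apple_proportion() for _ in range(500)]

print(len(empirical_props))   # 500
print(mean(empirical_props))  # average of the simulated proportions
```

Averaging many baskets smooths out the basket-to-basket variation, which is why the mean is a reasonable estimate of the farm's proportion.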

Question 5. Now, compute the average of apples_empirical_props. You claim that this

average is a good estimate of the proportion of apples in the farm. Store your proportion in

apples_claim_prop.

In [ ]: apples_claim_prop = ...

apples_claim_prop

In [ ]: #: grade 2.5

_ = ok.grade('q2_5')

Question 6. How far away is your claim from the true proportion of apples? Compute the absolute

difference between the two and store it in the variable error. Remember that you calculated

the true proportion of apples in Question 1.

In [ ]: error = ...

error

In [ ]: #: grade 2.6

_ = ok.grade('q2_6')

1.4 3. Broken Phones

A phone manufacturing company claims that it produces phones that are 99% non-faulty. In other

words, only 1% of the phones that they manufacture have some fault in them. They open a retail

shop in the friendly neighbourhood of La Jolla. Because the phones are cheap and nice, 100 UCSD

students have bought phones at this shop. However, it is soon discovered that four of the students

had faulty phones. You’re angry and argue that the company’s claim is wrong. But the company

is adamant that they are right. You decide to investigate.

Question 1. Assign null_probabilities to a two-item array such that the first element contains

the chance of a phone being non-faulty and the second element contains the chance that the

phone is faulty under the null hypothesis.

In [ ]: null_probabilities = ...

null_probabilities

In [ ]: #: grade 3.1

_ = ok.grade('q3_1')

Question 2. Using the function you wrote above, simulate the buying of 100 phones

5,000 times, using the proportions that you assigned to null_probabilities. Create an array

simulations with the number of faulty phones in each simulation.

Note that the number of faulty phones in a simulation of sample size x is the proportion of

faulty phones in the simulation multiplied by x.

In [ ]: # Place your answer here. It may contain several lines of code.

In [ ]: #: Consider the resulting histogram of the simulations array

Table().with_column("Faulty Statistic", simulations).hist(bins=np.arange(8))

In [ ]: #: grade 3.2

_ = ok.grade('q3_2')

Question 3. Using the results of your simulation, calculate an estimate of the p-value, i.e.,

the probability of observing four or more faulty phones under the null hypothesis. Assign your

answer to p_value_3_3.

In [ ]: p_value = ...

p_value

In [ ]: #: grade 3.3

_ = ok.grade('q3_3')
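
Questions 2 and 3 together describe a simulation-based p-value. One way to sketch it, under the null hypothesis that each phone is faulty with chance 1% independently of the others, is shown below. It uses the standard library instead of sample_proportions, fixes a seed for repeatability, and uses illustrative names that are not the ones the grader expects.

```python
import random

random.seed(2)  # fixed seed only so the sketch is repeatable
p_faulty = 0.01          # null hypothesis: 1% of phones are faulty
phones_per_trial = 100   # phones bought in each simulated trial
num_trials = 5000

def count_faulty():
    """Simulate one batch of phones under the null; return the number faulty."""
    return sum(1 for _ in range(phones_per_trial) if random.random() < p_faulty)

faulty_counts = [count_faulty() for _ in range(num_trials)]

# Empirical p-value: share of simulated batches at least as extreme as the
# observed batch, which had four faulty phones.
observed_faulty = 4
p_value_estimate = sum(1 for c in faulty_counts if c >= observed_faulty) / num_trials
print(p_value_estimate)
```

A small p-value means the observed result would be unusual if the company's claim were true.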

Question 4. Given the results of your above experiment, do you reject the null hypothesis?

Explain why.

Write your answer here.

1.5 4. Bias Towards Customers

The insurance company LivLife10 classifies its customers into three categories: low-income, mid-income,

and high-income. The company claims that it treats all of its customers equally and makes

no compromises on the quality of the products that it provides. You know that the company

has 10,000 customers, 40% of which are low-income customers, 30% mid-income and 30% high-income

customers. However, over the past year, 60% of the complaints that the company has

received are from low-income customers, 30% from mid-income customers and 10% from high-income

customers.

In [ ]: #: Don't change the below three lines

type_of_customers = ["low-income", "mid-income", "high-income"]

proportion_of_customers = np.array([0.4, 0.3, 0.3])

proportion_of_complaints = np.array([0.6, 0.3, 0.1])

insurance_customers = Table().with_columns(

"Type of Customers", type_of_customers,

"Proportion of Customers", proportion_of_customers,

"Proportion of Complaints", proportion_of_complaints)

insurance_customers

You have a suspicion that the insurance company is biased towards its high-income customers.

That is, the insurance company is providing a better product to the high-income customers than

to others. A better product is one that generates fewer complaints. You decide to test your idea.

Your null hypothesis is:

Null hypothesis: The complaints are drawn from the population according to the proportion

of customers which are low-, mid-, and high-income.

Question 1. What is the expected proportion of complaints that should be heard

from the high-income customers under the null hypothesis? Assign your answer to

complaints_proportion_null.

In [ ]: complaints_proportion_null = ...

complaints_proportion_null

In [ ]: #: grade 4.1

_ = ok.grade('q4_1')

Question 2. You wish to check the bias in the insurance company towards different categories

of customers. However, there are three categories of customers: high-, mid-, and low-income.

Which of the following do you think is not a reasonable choice of test statistic for your

hypothesis? You may include more than one answer. Put all your choices in a list called

unreasonable_test_statistics. For example, if you think statistics 1, 2, and 3 are unreasonable,

you should have unreasonable_test_statistics = [1,2,3].

1. Average of the absolute difference between proportion of customers and proportion of corresponding

complaints

2. Sum of the absolute difference between proportion of customers and proportion of corresponding

complaints

3. The total number of complaints that the company has received in the past year

4. The total variation distance between the probability distribution of customers and the distribution

of complaints

5. The absolute difference between the sum of proportion of customers and the sum of proportion

of corresponding complaints

6. Average of the sum of the proportion of customers and the proportion of corresponding

complaints

In [ ]: unreasonable_test_statistics = ...

unreasonable_test_statistics

In [ ]: #: grade 4.2

_ = ok.grade('q4_2')

Question 3. Say you went ahead with the total variation distance as your test statistic.

Write a function called total_variation_distance that takes in two probability distributions

as arrays and calculates the total variation distance between them.

In [ ]: # Place your answer here. It may contain several lines of code.

In [ ]: #: Use the below code to test your function

total_variation_distance(np.array([1,0,0]), np.array([0,0,1])) # Output should be 1.0

In [ ]: #: grade 4.3

_ = ok.grade('q4_3')
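
As a reminder of the definition used in Question 3, the total variation distance between two distributions over the same categories is half the sum of the absolute differences between corresponding probabilities. The sketch below implements that definition with plain Python sequences rather than numpy arrays. The name tvd_sketch is hypothetical.

```python
def tvd_sketch(dist_a, dist_b):
    """Total variation distance between two distributions over the same categories."""
    # Half the sum of the absolute differences between matching entries.
    return sum(abs(a - b) for a, b in zip(dist_a, dist_b)) / 2

print(tvd_sketch([1, 0, 0], [0, 0, 1]))      # 1.0, the two distributions do not overlap
print(tvd_sketch([0.5, 0.5], [0.5, 0.5]))    # 0.0, the two distributions are identical
```

The halving keeps the distance between 0 and 1, because every bit of probability that one distribution has in excess somewhere, the other has in excess somewhere else.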

Question 4. Write a simulation which computes the TVD statistic 5000 times on data generated

under the null hypothesis. Save the simulated statistics in an array called empirical_tvds.

Hint: Use sample_proportions.

In [ ]: # Place your answer here. It may contain several lines of code.

In [ ]: #: grade 4.4

_ = ok.grade('q4_4')
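
The simulation in Question 4 can be sketched as follows: draw a batch of complaints from the customer proportions, compute the TVD between the sampled proportions and the null proportions, and repeat. The homework does not state how many complaints were received, so num_complaints = 100 below is purely an assumption for illustration. sample_proportions_sketch is a hypothetical standard-library stand-in for the sample_proportions function mentioned in the hint.

```python
import random

random.seed(3)  # fixed seed only so the sketch is repeatable
null_proportions = [0.4, 0.3, 0.3]  # low-, mid-, high-income shares of customers
num_complaints = 100                # ASSUMPTION: not given in the homework text
num_repetitions = 5000

def tvd_sketch(dist_a, dist_b):
    """Half the sum of absolute differences between two distributions."""
    return sum(abs(a - b) for a, b in zip(dist_a, dist_b)) / 2

def sample_proportions_sketch(sample_size, probabilities):
    """Draw sample_size category labels at random; return the share in each category."""
    labels = range(len(probabilities))
    draws = random.choices(labels, weights=probabilities, k=sample_size)
    return [draws.count(label) / sample_size for label in labels]

# One TVD per repetition, each computed on data generated under the null.
simulated_tvds = [
    tvd_sketch(sample_proportions_sketch(num_complaints, null_proportions),
               null_proportions)
    for _ in range(num_repetitions)
]
print(len(simulated_tvds))  # 5000
print(max(simulated_tvds))  # largest TVD seen under the null in this run
```

The spread of the simulated statistics depends on the assumed sample size: larger batches of complaints produce TVDs that cluster closer to 0.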

Question 5. Calculate the total variation distance in the actual scenario, that is, the observed

scenario. Save the result in observed_tvd.

In [ ]: observed_tvd = ...

observed_tvd

Let us plot a histogram of empirical_tvds and compare that to our observed_tvd.

In [ ]: #: Visualize

Table().with_column("Empirical TVDs", empirical_tvds).hist()

plt.scatter(observed_tvd, 0, color='red', s=30)

In [ ]: #: grade 4.5

_ = ok.grade('q4_5')

Question 6. Recall that the null hypothesis was that the complaints are drawn from the population

according to the proportion of customers which were low-, mid-, and high-income. Looking

at the histogram above, do you think it is likely that the null hypothesis is true? Write your answer

in the variable insurance_claim_true. The value of the boolean variable should be True if you

agree that the null hypothesis is true, and False otherwise.

In [ ]: insurance_claim_true = ...

insurance_claim_true

In [ ]: #: grade 4.6

_ = ok.grade('q4_6')

Question 7. Does rejecting the null hypothesis in this case prove (or otherwise highly suggest)

that the company is biased in its treatment of customers? Why or why not?

Write your answer here.

1.6 5. Loaded Die

... And we are back to rolling dice! A loaded die is one that is unfair, i.e., does not have equal

probability for each of the outcomes 1–6 (inclusive).

Question 1. Your friend Aby has a model that says that the die is loaded in a way such that

the probability of "1" coming up is 0.5 and all the other values have the same probabilities.

Write down Aby’s model’s distribution as an array. It should contain 6 elements, each describing

the probability of seeing the corresponding face of the die, and it should sum to 1.

In [ ]: aby_hypothesis_model_distribution = ...

aby_hypothesis_model_distribution

In [ ]: #: grade 5.1

_ = ok.grade('q5_1')

Question 2. Say we want to test Aby’s model. In particular, we wish to test if the probability

of "1" coming up is 0.5. We roll the die 10 times and get "1" a whopping 8 times. We claim that

Aby’s model is wrong. In order to substantiate our claim, we run a simulation of the die-roll.

Write a simulation and run it 5000 times, maintaining an array differences which keeps track

of the absolute difference between number of ’1’s that were seen and the expected number (5) in

each simulation.

In [ ]: # Place your answer here. It may contain several lines of code.

In [ ]: #: Visualize with a histogram

Table().with_column("Difference", differences).hist(bins=np.arange(8))

In [ ]: #: grade 5.2

_ = ok.grade('q5_2')

Question 3. Recall that we saw the die come up "1" eight times. Set the variable

null_hypothesis_boolean below to True if you think Aby’s model is plausible or False if it

should be rejected.

In [ ]: null_hypothesis_boolean = ...

null_hypothesis_boolean

In [ ]: #: grade 5.3

_ = ok.grade('q5_3')

Question 4. Now, we check the p-value of our claim. That is, compute the proportion of

times in our simulation that we saw a difference of 3 or more between the number of ’1’s and the

expected number of ’1’s. Assign your result to p_value_5_4.

In [ ]: p_value_5_4 = ...

p_value_5_4

In [ ]: #: grade 5.4

_ = ok.grade('q5_4')
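
Questions 2 and 4 of this section can be pictured with the sketch below. It assumes Aby's model puts probability 0.5 on "1" and splits the remaining 0.5 evenly over the other five faces. It simulates 10 rolls 5,000 times, records how far the count of 1s falls from the expected 5, and then estimates how often the difference is at least as large as the observed one. It uses the standard library only, and the names are illustrative rather than the ones the grader expects.

```python
import random

random.seed(4)  # fixed seed only so the sketch is repeatable
faces = [1, 2, 3, 4, 5, 6]
model = [0.5] + [0.1] * 5   # P(1) = 0.5; the other five faces share the rest equally
num_rolls = 10
expected_ones = num_rolls * model[0]  # 5.0 under the model

def one_difference():
    """Roll the modelled die num_rolls times; return |count of 1s - expected count|."""
    rolls = random.choices(faces, weights=model, k=num_rolls)
    return abs(rolls.count(1) - expected_ones)

differences_sketch = [one_difference() for _ in range(5000)]

# Observed: eight 1s in ten rolls, a difference of 3 from the expected 5.
observed_difference = abs(8 - expected_ones)
p_value_estimate = (
    sum(1 for d in differences_sketch if d >= observed_difference)
    / len(differences_sketch)
)
print(p_value_estimate)
```

Using the absolute difference makes the test two-sided: seeing far too few 1s counts as evidence against the model just as much as seeing far too many.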

To submit:

1. Select Run All from the Cell menu to ensure that you have executed all cells, including the

test cells.

2. Read through the notebook to make sure everything is fine.

3. Submit using the cell below.

4. Save PDF and submit to Gradescope.

In [ ]: #: For your convenience, you can run this cell to run all the tests at once!

import os

_ = [ok.grade(q[:-3]) for q in os.listdir('tests') if q.startswith('q')]

1.7 Before submitting, select "Kernel" -> "Restart & Run All" from the menu!

Then make sure that all of your cells ran without error.

In [ ]: #: submit your notebook

_ = ok.submit()

1.8 Don’t forget to submit to both OK and Gradescope!

