DATA1001 – Introduction to Data Science and Decisions
Assignment 3
Student details
Surname:
Given names:
Student number:
Tutorial day/time:
Date due: 4pm on 26 October 2018
Date submitted:
Declaration
I declare that this assessment item is my own work, except where acknowledged, and
acknowledge that the assessor of this item may, for the purpose of assessing this item:
1. Reproduce this assessment item and provide a copy to another member of the
University; and/or
2. Communicate a copy of this assessment item to a plagiarism checking service
(which may then retain a copy of the assessment item on its database for the
purpose of future plagiarism checking)
I certify that I have read and understood the University Rules in respect of Student
Conduct.
Signed¹:
Date:
Mark:
Comments:
¹ Sign in ink if handing in a hard copy; or type your name if handing in an electronic copy.
A bit of Differential Privacy
The Australian Taxation Office would like to find out the proportion of Australians who
cheat on their tax returns. They select a random sample (e.g. by tax file numbers) of
Australians and ask them if they have ever cheated on their tax return. Of course, the ATO
promises that there will be no repercussions if the answer is ‘yes’, and that it will not
record this information against the respondent’s identity.
If you had cheated on a tax return previously, would you answer truthfully?
Your answer is probably ‘No’, because you wouldn’t trust the ATO to securely manage
this information about you. In this assignment, you’ll explore a way in which an individual
can safely disclose this information.
Procedure: The respondent secretly flips a coin twice. If the first flip shows ‘heads’,
they answer truthfully. Otherwise, they answer ‘yes’ or ‘no’ according to the
second flip being ‘heads’ or ‘tails’.
The idea is that this adds enough ‘randomness’ to the respondent’s answer that
they cannot be identified.
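The procedure above can be sketched in a few lines of Python. This is only an illustrative simulation, not part of the assignment: the function names `randomized_response` and `simulate_survey`, the seed, and the use of Python's `random` module to stand in for a fair coin are all assumptions made here.

```python
import random

def randomized_response(is_fraud, rng):
    """Return the recorded answer (True means 'yes') for one respondent."""
    first_heads = rng.random() < 0.5   # first flip: heads -> answer truthfully
    second_heads = rng.random() < 0.5  # second flip: only used if the first is tails
    if first_heads:
        return is_fraud
    return second_heads

def simulate_survey(theta, n, seed=0):
    """Simulate n respondents with fraud proportion theta; count 'yes' answers."""
    rng = random.Random(seed)
    yes_count = 0
    for _ in range(n):
        is_fraud = rng.random() < theta  # hypothetical true status of this respondent
        if randomized_response(is_fraud, rng):
            yes_count += 1
    return yes_count

if __name__ == "__main__":
    # theta = 0.2 is a made-up value used purely for illustration.
    print(simulate_survey(theta=0.2, n=10_000))
```

Note that the recorded answer alone does not say whether the first flip was heads, which is the source of the respondent's deniability.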
Below,
the (unknown) true proportion of tax return frauds is θ ∈ [0, 1]
the variable fraud with values 0, 1 denotes whether a respondent is a fraud
the variable truth with values 0, 1 denotes whether a respondent answers truthfully
the variable yes with values 0, 1 denotes whether a respondent answers ‘yes’
according to the above procedure.
Question 1) The use for the ATO
a) [3 marks]
Draw a tree diagram for the three variables fraud, truth and yes.
b) [3 marks]
Fill in the remaining values in this probability table:
fraud  truth  yes  P
0      0      0
0      0      1    (1 − θ)/4
0      1      0
0      1      1    0
1      0      0
1      0      1
1      1      0
1      1      1
Hint: according to your tree diagram,
P((fraud = 0) ∩ (truth = 0) ∩ (yes = 1))
= P(fraud = 0) P(truth = 0 | fraud = 0) P(yes = 1 | (truth = 0) ∩ (fraud = 0))
= (1 − θ)(1/2)(1/2) = (1 − θ)/4
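The product in the hint can be checked with exact fractions. The sketch below evaluates only the single path worked out in the hint; the helper name `path_probability` and the value θ = 1/10 are assumptions chosen for illustration.

```python
from fractions import Fraction

def path_probability(p_fraud, p_truth_given_fraud, p_yes_given_both):
    """Multiply the branch probabilities along one path of the tree."""
    return p_fraud * p_truth_given_fraud * p_yes_given_both

theta = Fraction(1, 10)  # hypothetical value of theta, for illustration only

# Path from the hint: fraud = 0, truth = 0, yes = 1.
p = path_probability(1 - theta, Fraction(1, 2), Fraction(1, 2))
print(p)  # prints 9/40, which equals (1 - theta)/4 for theta = 1/10
```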
c) [2 marks]
Show that P(yes = 1) = 1/4 + θ/2.
Hint: Follow all paths in your tree diagram that lead to yes=1 and add their probabilities.
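As a sanity check on the identity in part c), one can enumerate every combination of fraud status and coin outcomes and add up the weight of those that end in a ‘yes’. This is a numerical check under the stated procedure with fair coins, not a substitute for the requested derivation; the function name `p_yes` is an assumption.

```python
from fractions import Fraction
from itertools import product

def p_yes(theta):
    """Exact P(yes = 1) by enumerating fraud status and both coin flips."""
    total = Fraction(0)
    for fraud, first, second in product((0, 1), "HT", "HT"):
        p_fraud = theta if fraud == 1 else 1 - theta
        weight = p_fraud * Fraction(1, 4)  # two fair, independent flips
        if first == "H":
            answer = fraud                       # truthful answer
        else:
            answer = 1 if second == "H" else 0   # answer dictated by second flip
        if answer == 1:
            total += weight
    return total

# Compare against 1/4 + theta/2 for a few hypothetical values of theta.
for t in (Fraction(0), Fraction(1, 3), Fraction(1)):
    assert p_yes(t) == Fraction(1, 4) + t / 2
```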
d) [1 mark]
The ATO has received 10384 responses, of which 3448 were yes=1. From these numbers,
derive an estimate of θ.
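One possible approach is to replace P(yes = 1) in the identity from part c) by the observed proportion of ‘yes’ answers and solve for θ. The sketch below does this for made-up counts, not the ATO figures above; the function name `estimate_theta` is an assumption, and the result is not clipped, so sampling noise can push it outside [0, 1].

```python
def estimate_theta(yes_count, total):
    """Solve p_hat = 1/4 + theta/2 for theta, where p_hat = yes_count / total."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = yes_count / total
    return 2 * (p_hat - 0.25)

# Hypothetical survey: 400 'yes' answers out of 1000 responses.
print(round(estimate_theta(400, 1000), 3))  # prints 0.3
```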
Question 2) Is it really safe to participate?
In this question, we’ll calculate the probability P(fraud = 1|yes = 1). If this probability
is close to 1, then by playing the game and answering yes you will reveal yourself as a
tax fraud!
a) [2 marks]
Compute all probabilities P((fraud = i) ∩ (yes = j)) for all possible values of i and
j, and collect them in a table. (This procedure is called computing the “marginal
distribution” of fraud and yes.)
Hint: the event A = (fraud = 1) ∩ (yes = 0) is the disjoint union of A ∩ (truth = 0)
and A ∩ (truth = 1). Find the probabilities for these events in your table.
b) [2 marks]
Calculate P(fraud = 1|yes = 1) and P(fraud = 1|yes = 0)
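Exact answers to part b) can be compared against a simulation. The sketch below estimates both conditional probabilities by counting, using the definition P(A | B) = P(A ∩ B)/P(B). The function name `estimate_conditionals`, the sample size, the seed and θ = 0.2 are assumptions for illustration, and the output is a Monte Carlo estimate rather than an exact value.

```python
import random

def estimate_conditionals(theta, n=200_000, seed=1):
    """Estimate P(fraud=1 | yes=1) and P(fraud=1 | yes=0) by simulation."""
    rng = random.Random(seed)
    counts = {(f, y): 0 for f in (0, 1) for y in (0, 1)}
    for _ in range(n):
        fraud = 1 if rng.random() < theta else 0
        if rng.random() < 0.5:                     # first flip heads: truthful
            yes = fraud
        else:                                      # first flip tails: second flip decides
            yes = 1 if rng.random() < 0.5 else 0
        counts[(fraud, yes)] += 1
    n_yes1 = counts[(0, 1)] + counts[(1, 1)]
    n_yes0 = counts[(0, 0)] + counts[(1, 0)]
    return counts[(1, 1)] / n_yes1, counts[(1, 0)] / n_yes0

if __name__ == "__main__":
    given_yes, given_no = estimate_conditionals(theta=0.2)  # hypothetical theta
    print(given_yes, given_no)
```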
c) [2 marks]
What would happen if a biased coin was used in the first toss? (Limit your answer to 3
normal-length sentences.)
Assignment Submission
Due Date: 4pm on 26 October 2018 (That’s Friday in Week 13.)
Hand in your assignment to the School Office of the School of Mathematics & Statistics,
Level 3, Red Centre Building (Centre Wing).
Late submissions
20% (3 marks) will be deducted at each of 0, 24, 48, 72 and 96 hours after the deadline.
Work submitted more than five days late will not be marked.
On Plagiarism
The University regards plagiarism as a form of academic misconduct, and
has very strict rules regarding plagiarism. For UNSW policies, penalties,
and information to help you avoid plagiarism see:
https://student.unsw.edu.au/plagiarism
as well as the guidelines in the online ELISE tutorials for all new UNSW students:
http://subjectguides.library.unsw.edu.au/elise