A fun question based on something one of my wife's doctors told me. It turns out the information I received from the doctor was likely incorrect, but the question the information raises is too good not to use, so I'm just going to pretend it is legit. Here goes:
Suppose that 98% of the time, the tissue extracted is from the fetus, but 2% of the time the tissue extracted, by accident, is that of the mother (this is the information I received, which appears now to be inaccurate - but for the sake of argument suppose it is correct).
Assume there are no other possible inaccuracies in the test, that gender is always well defined, etc., and that the baby has a 50/50 chance of being either gender (not entirely true either, but close enough for our purposes).
Suppose the result of the test indicates that the fetus is a female. What are the odds given the test result, expressed as a fraction, that the fetus is actually male?
Excellent. I'm trying to put together some questions for the econometrics comprehensive exam.
Posted by: Stephen Gordon | May 24, 2011 at 05:38 PM
I'll see if the doctors have any more for you. :)
Posted by: Mike Moffatt | May 24, 2011 at 05:41 PM
Anyhow, you can imagine my delight when I calculated there's an x% chance that I'd be buying a whole different set of clothes, etc.
Posted by: Mike Moffatt | May 24, 2011 at 05:46 PM
2/52, right?
Posted by: Ben from Queens | May 24, 2011 at 06:16 PM
I get 1/100
Posted by: Chris Dittmar | May 24, 2011 at 06:18 PM
No, 1/51.
Posted by: Ben from Queens | May 24, 2011 at 06:18 PM
1/51
Posted by: sw | May 24, 2011 at 06:23 PM
It's a good Bayesian stats problem. It should work out to about 2%.
The scenarios are:
is male, shows male - 49%
is male, shows mother as female 1%
is female, shows actual female - 49%
is female, shows mother female - 1%
So the probability of being in the 2nd case, given that the test shows female, is 1% / (1% + 49% + 1%) = 1.96%
Posted by: Jesse | May 24, 2011 at 06:26 PM
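For anyone who wants to check the four-scenario enumeration above, here is a minimal Python sketch. It assumes only the numbers given in the post (2% of samples come from the mother, a 50/50 prior on sex); the variable names are made up for illustration.

```python
from fractions import Fraction

# Assumptions taken from the post: 2% of samples are the mother's tissue,
# and the fetus is male or female with equal probability.
p_mother_tissue = Fraction(2, 100)
p_male = Fraction(1, 2)

# Build every (actual sex, tissue source) scenario with its joint probability
# and the sex the test would report in that scenario.
scenarios = []
for sex, p_sex in (("male", p_male), ("female", 1 - p_male)):
    for source, p_source in (("fetus", 1 - p_mother_tissue),
                             ("mother", p_mother_tissue)):
        # The mother's tissue always reads female; fetal tissue reads the true sex.
        reported = "female" if source == "mother" else sex
        scenarios.append((sex, source, reported, p_sex * p_source))

# Condition on the test reporting "female".
p_reported_female = sum(p for _, _, reported, p in scenarios
                        if reported == "female")
p_male_and_reported_female = sum(p for sex, _, reported, p in scenarios
                                 if sex == "male" and reported == "female")
p_male_given_female_result = p_male_and_reported_female / p_reported_female

print(p_male_given_female_result)  # 1/51
```

Using exact fractions rather than floats keeps the result in the same form as the answers in this thread.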
Given that the test shows female, there are two scenarios
1) sample from baby (98%)
2) sample from mother (2%)
So, probability that baby is male is
.98*(0) + .02*(.5) = .01 1/100
Or am I missing something?
Posted by: Chris Dittmar | May 24, 2011 at 06:31 PM
1/51 seems right, which is as strong as i'll go without breaking out my old texts.
[P(male) - P(male|male_on_test)xP(male_on_test)]/P(female_on_test)
Posted by: Mark M. | May 24, 2011 at 06:34 PM
2 in 100 test results will give a false result, always female. 1/2 of those (1 in 100) will be males, half females.
98 in 100 test results will give a true result, half male and half female.
Of the 51 test results that show "female", 1 in 51 will actually be a male child.
Posted by: Craig Burley | May 24, 2011 at 06:38 PM
d'accord
Posted by: Chris Dittmar | May 24, 2011 at 06:58 PM
".98*(0) + .02*(.5) = .01 1/100"
I suck at statistics and am trying to figure out why exactly this is wrong. Bear with me through the conceptual work, please?
It seems the .98 is erroneous here. While 98% of the tests are accurate, once we are given the information that the test reads female, this percentage SHOULD be different, because the test is biased and will show female more times than it will show male. Specifically, a test that shows male is ALWAYS accurate, so the chance a female-positive test will be inaccurate is high enough to "create" that 98% accuracy rating that we have in the first place.
Posted by: GVChamp | May 24, 2011 at 07:39 PM
In 200 pregnancies, we have 100 boys and 100 girls.
Of the 100 girls, we get 100 "girl" results.
Of the 100 boys, we get 2 "girl" results.
So we get 102 "girl" results, but 2 are actually boys.
So the answer is 2/102 or 1/51.
Posted by: Jon | May 24, 2011 at 08:56 PM
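The counting argument above can also be checked by brute force. The following is a rough Monte Carlo sketch, not anything from the original post: the function name, trial count, and seed are arbitrary choices, and it assumes the sex of the fetus and the tissue mix-up are independent.

```python
import random


def simulate(trials: int, seed: int = 0) -> float:
    """Estimate P(fetus is male | test reports female) by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    female_results = 0
    male_among_female_results = 0
    for _ in range(trials):
        is_male = rng.random() < 0.5          # 50/50 prior on sex
        mother_tissue = rng.random() < 0.02   # 2% chance of sampling the mother
        # The test reads female if the mother's tissue was taken,
        # or if fetal tissue was taken and the fetus is female.
        reports_female = mother_tissue or not is_male
        if reports_female:
            female_results += 1
            if is_male:
                male_among_female_results += 1
    return male_among_female_results / female_results


estimate = simulate(200_000)
print(f"simulated: {estimate:.4f}, exact 1/51: {1 / 51:.4f}")
```

With a couple hundred thousand trials the estimate should land close to 1/51, though the exact digits depend on the seed.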
Great question! Here's a finance version for Stephen's exam:
One in a thousand people have the skills to beat the market every year with a probability of 100%. The rest, who have no skill, will beat the market 50% of the time. A fund manager beats the market for 10 successive years. What are the odds the manager has skill?
Posted by: K | May 24, 2011 at 09:34 PM
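The finance version is the same calculation with a different prior. Here is a sketch in Python, assuming that an unskilled manager's years are independent coin flips (the question does not say so explicitly); the variable names are illustrative only.

```python
from fractions import Fraction

# Numbers from the question above.
p_skill = Fraction(1, 1000)                   # prior: one in a thousand has skill
p_streak_given_skill = Fraction(1)            # skilled managers always beat the market
# Assumption: unskilled years are independent 50/50 outcomes.
p_streak_given_no_skill = Fraction(1, 2) ** 10

# Law of total probability: chance of seeing a 10-year streak at all.
p_streak = (p_skill * p_streak_given_skill
            + (1 - p_skill) * p_streak_given_no_skill)

# Bayes' theorem, then convert the probability to odds.
posterior_skill = p_skill * p_streak_given_skill / p_streak
odds_skill = posterior_skill / (1 - posterior_skill)

print(posterior_skill)  # 1024/2023
print(odds_skill)       # 1024/999
```

Under these assumptions a ten-year streak leaves the manager only slightly better than even money to be skilled.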
P(male|female_result) = P(female_result|male)P(male) / P(female_result) (Bayes' Theorem)
= 0.02*0.5 / P(female_result)
and by the Law of Total Probability:
P(female_result) = P(female_result|female)P(female) + P(female_result|male)P(male)
= 1.0*0.5 + 0.02*0.5 = 0.51
so P(male|female_result) = 0.02*0.5 / 0.51 = 0.0196 ~ 2%.
Now, the question actually asked for the *odds*, so P(male|female_result)/P(female|female_result) = 0.0196/(1-0.0196) = 0.02, or nearly the same answer (since P(female|female_result) is almost 1).
Posted by: Wesley | May 24, 2011 at 09:41 PM
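The Bayes' theorem steps above, together with the probability-versus-odds distinction, can be sketched as a small Python helper. The function name and its parameter names are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction


def posterior_and_odds(prior, likelihood_if_true, likelihood_if_false):
    """Return (posterior probability, posterior odds) for a hypothesis.

    prior               -- P(hypothesis)
    likelihood_if_true  -- P(evidence | hypothesis)
    likelihood_if_false -- P(evidence | not hypothesis)
    """
    # Law of total probability gives P(evidence).
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    # Bayes' theorem.
    posterior = likelihood_if_true * prior / evidence
    # Odds are the ratio of the probability to its complement.
    odds = posterior / (1 - posterior)
    return posterior, odds


# Hypothesis: the fetus is male. Evidence: the test reports female.
posterior, odds = posterior_and_odds(
    prior=Fraction(1, 2),
    likelihood_if_true=Fraction(2, 100),  # male fetus reads female only via mother's tissue
    likelihood_if_false=Fraction(1),      # female fetus always reads female
)
print(posterior)  # 1/51
print(odds)       # 1/50
```

The probability is 1/51 and the odds are 1/50, which is why both numbers show up in this thread.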
Of 100 tests, 49 will show "male," 51 will show "female."
Of the 51 "female" results, 1 is actually male.
Answer: 1/51.
Posted by: WmT | May 24, 2011 at 09:45 PM
"It seems the .98 is erroneous here."
Yup. The 98% is really a composite: (probability the sample is from the baby) = (probability male result on test) * (probability sample is from the baby given male result) + (probability female result on test) * (probability sample is from the baby given female result) = 0.49 * 1.0 + 0.51 * 49/51 = 0.49 + 0.49 = 0.98
In other words, all the error in the test occurs with a female result. Consequently, the probability that the sample is from the baby given a female result is not 98%.
Posted by: david | May 24, 2011 at 10:31 PM
P(Baby is male|Test says it is a female)=P(male and test says it's female)/P(test says female)= (.02*.5)/(.51)=.01/.51=1/51
Posted by: Josh M | May 24, 2011 at 10:46 PM
Wow, I am an idiot, it said odds, so it is actually 1/50
Posted by: Josh M | May 24, 2011 at 10:51 PM
Heck, why not give my own method of finding the answer.
2 times out of 100 we extract wrong tissue. Of these 2 times, on average one is female and the other is male.
The other 98 times out of 100 we extract correct tissue. On average 49 of those times it is female.
So we show results of 51 females. However, of those 51 results, only 1 is actually male on average.
Hence on average 1/51 of the reported female results is actually a male, which is the droid, er result, you are looking for.
Posted by: happyjuggler0 | May 24, 2011 at 11:06 PM
1/99
odds = p / (1-p) where p is the probability the fetus is male
Given the test shows female then the only way for it to be a male is if (1) the test took DNA from the mother and (2) the child is male. So p = 2/100 * 1/2 = 1/100.
odds = 1/100 / (1-1/100)
= 1/100 / (99/100)
= 1/99
Ok, so where did I go wrong?
Posted by: Jason | May 25, 2011 at 05:55 AM
I get .01/.51 = 1.96%
Let F be the event of female, T the event the test gives female.
P(F given T) = P(F and T) / P(T)
P( F and T) = P(F) = .5 because if the fetus is female the test must return female (since, presumably, the mother is female).
P(T) is (.02 + .5*.98) = .51. The .02 is the probability that the mother's tissue is extracted, and if the fetus's tissue is extracted (probability 98%), you get a female 50% of those times.
Thus P(F given T) = .5/.51.
Thus P(male given T) = 1 - .5/.51.
Posted by: Adam P | May 25, 2011 at 06:58 AM
@Josh M, yes, by asking for "odds" this is as much a wording question as a math question. I think "idiot" is a little strong though; after all, yours is the first correct answer! Asking for odds to be expressed as a fraction is somewhat ambiguous: a kind of misdirection.
Posted by: Phil Koop | May 25, 2011 at 08:07 AM
K said: "Great question! Here's a finance version for Stephen's exam:
One in a thousand people have the skills to beat the market every year with a probability of 100%. The rest, who have no skill, will beat the market 50% of the time. A fund manager beats the market for 10 successive years. What are the odds the manager has skill?"
You might also ask the obvious follow-up question. Given the answer to the first question what are the odds that the Fund Manager is actually running a Ponzi scheme? :)
Posted by: Bob Smith | May 25, 2011 at 09:18 AM
Bob Smith: "Given the answer to the first question what are the odds that the Fund Manager is actually running a Ponzi scheme?"
Indeed: "One in one thousand has skill. One in ten has dishonest tendencies. A manager who also runs his own brokerage firm and back office, and whose accountant is a guy with a basement office on 42nd and 9th..."
Constant attention to prior probabilities is at the core of basic financial literacy. Students (all of them, not just the ones in econometrics) should be drilled with these kinds of questions until they are deeply and thoroughly cynical.
Posted by: K | May 25, 2011 at 10:45 AM
This is a great example of Bayesian probability. The interesting thing is if you play with values to approximate what is often published about diagnostic tests. Suppose you have a test that is trumpeted as "99% accurate". The main consideration is not the accuracy of the test, but the prevalence of the disease:
Probability a patient has condition X: 1/500 or 0.2%
"Accuracy" or reliability of a test: P(P) = .99 (sounds good)
Probability of a false positive: P(F) = .01
Prior probability of disease P(X) = .002
Probability of not having it P(-X) = .998
Probability that a person with a positive diagnosis actually has the disease:
[P(P)*P(X)]/[P(P)*P(X) + P(F)*P(-X)], or
(.99*.002)/[(.99*.002)+(.01*.998)]
= .1655 ~ 16.6%
In other words, for a condition with 1/500 prevalence and a 99% accurate test, there is only a 16.6% chance that a positive result is accurate, and thus an 83.4% chance that a person with a positive result does not have the disease.
Hence, if you screen 10,000 people for the disease, you will get ~120 positive results (1.196%), but ~100 of them will be false positives. You have to raise the prior accuracy of the test to an astronomically high level to really reduce the level of false positives. The prevalence (rarity) of the disease is more important to know than the accuracy of the test. The example Mike gives looks pretty good because the tested condition occurs in 50% of the population.
Posted by: Shangwen | May 25, 2011 at 10:46 AM
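The diagnostic-test arithmetic above can be reproduced with a few lines of Python. This is only a sketch using the numbers from that comment (prevalence 0.002, a 99% chance of a positive for someone with the condition, a 1% false positive rate); the function and variable names are assumptions made for the example.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(has the condition | positive test), via Bayes' theorem."""
    true_positive = sensitivity * prevalence
    false_positive = false_positive_rate * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


prevalence = 0.002           # 1 in 500 people have condition X
sensitivity = 0.99           # P(positive | has X)
false_positive_rate = 0.01   # P(positive | does not have X)

ppv = positive_predictive_value(prevalence, sensitivity, false_positive_rate)

# Expected counts when screening 10,000 people.
screened = 10_000
expected_true_positives = screened * sensitivity * prevalence
expected_false_positives = screened * false_positive_rate * (1 - prevalence)
expected_positives = expected_true_positives + expected_false_positives

print(f"P(disease | positive) = {ppv:.4f}")
print(f"positives: {expected_positives:.1f}, "
      f"of which false: {expected_false_positives:.1f}")
```

Changing `prevalence` to 0.5 while keeping the other two inputs shows how much the base rate drives the result, which is the point being made about Mike's example.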
A variation of Shangwen's example is the Prosecutor's fallacy: http://en.wikipedia.org/wiki/Prosecutor's_fallacy
In a nutshell, if you have a rare event but look hard enough, you will find evidence of the event sooner than expected because the evidence found is incorrect. (Feel free to blast that explanation with a more correct description of the stats :)
Posted by: Peter McClung | May 25, 2011 at 12:10 PM
@Phil Koop, Thank you for the kind words, good sir. Although it is not standard notation, (I have usually seen it with a colon, 1:50) it is not an illegal move. Heck, it is how wikipedia defines it, http://en.wikipedia.org/wiki/Odds, and as we all know, wikipedia is a reliable source (actually, ignoring the sarcasm, it is usually not bad when it comes to math and things of that nature.)
Posted by: Josh M | May 25, 2011 at 12:18 PM
Love the blog, been reading for a long time, please keep up the good work.
I have a stats question.
Can ECON students' answers to a statistics question be modeled as a random variable?
ECSE McGill University.
Posted by: Chris | May 25, 2011 at 03:31 PM
Peter, thanks for the reference. Very interesting indeed.
Posted by: Shangwen | May 25, 2011 at 08:26 PM
@Josh M, odds are a ratio of probabilities, and ratios are often expressed as fractions, so naturally there is nothing illegal about expressing odds as a fraction. For that matter, it is just as valid to express odds as a number; one could as well write 0.02 as 1/50.
But being legal does not make it something to be proud of. If you see a number like 1/50 or 0.02 in a discussion of probabilities, would it not be natural to assume that the numbers themselves are probabilities? That is particularly true when the numbers involved are small, as in this case, so that the probability and the odds are similar. The use of the colon disambiguates these numbers.
My view is that if the purpose of the question was to test reasoning about Bayesian inference, then asking for odds vitiates that purpose.
Posted by: Phil Koop | May 26, 2011 at 08:31 AM
Isn't it a faulty assumption that there's a 50% chance the child is a boy? If you're going to try to be as precise as, say, 1.96%, then your accuracy will be skewed by the fact that more boys than girls appear to be conceived.
Posted by: Saj Karsan | May 28, 2011 at 02:20 PM