When I was a graduate student - some 25 years ago - there was a practical reason why Bayesian methods played little or no role in the textbooks. Even though classical methods asked the wrong questions* and forced people to wade through a myriad of complicated and contradictory ways of answering them, it was at least possible to extract point estimates from the data. A practitioner who wanted to address problems more complicated than the linear normal regression model would find that Bayesian methods had very little to offer in the way of concrete advice, so she could be forgiven for concluding that spending time on them was pretty much pointless.
This is a classic example of hysteresis: the persistence of a phenomenon after its cause has been removed. Students who aren't taught Bayesian methods almost never make the effort to learn enough to teach it when they go on to become professors.
Bayesian methods are best adapted to the questions of most interest, and are easier to use. Unfortunately for the next generation of economists - and almost certainly the one after that - this perspective has yet to significantly infiltrate how econometrics is taught.
*Ask yourself which is of more interest:
a) Probability statements about an observed statistic, conditional on an unobservable feature of interest.
b) Probability statements about an unobservable feature of interest, conditional on an observed statistic.
If you answered a), then you are God - or perhaps one of the lesser deities who is bored with just knowing The Truth, and is looking to make some extra cash by betting on what statisticians will conclude from what Nature draws out of its urn of coloured balls. But if your fate is to live among mortals, knowing only what mortals can learn, then b) is the only question that is worth spending time thinking about.
What's the best way to learn Bayesian econometrics?
Posted by: A student | October 07, 2011 at 05:52 PM
I didn't know it, but I've been looking for that word for years. Thank you!
Posted by: Brett | October 07, 2011 at 05:56 PM
My own introduction to statistics and econometrics was very strange. A tiny bit of "cookbook" stats in Intro Psych. Then a course at Berkeley in the Philosophy of Probability (or something), where I became a subjectivist after reading de Finetti's "Foresight: Its Logical Laws, Its Subjective Sources" (translation of the 1937 article in French) in H. E. Kyburg and H. E. Smokler (eds), Studies in Subjective Probability, New York: Wiley, 1964. (God, but Google is good at refreshing a 35-year-old memory.)
http://en.wikipedia.org/wiki/Bruno_de_Finetti
Then MA econometrics at UWO with Robin Carter in 1977. And it was very much as you say. Robin was, I'm pretty sure, a Bayesian. The introduction was full of Bayesian thinking. But then, after a brief bit about ML estimation, and why confidence intervals didn't mean what you thought they meant, it was all classical. Probably all they could do then. But classical frequentist statistics never made any sense to me. And I couldn't do math, so gave up. But I still don't like confidence intervals.
I did teach my MA Public Admin students Bayes' Theorem, even though it was supposed to be macro. I'll be damned if I'm letting them run the public service without knowing it. Gave them that breast cancer screening example.
http://larrywillmore.net/blog/2011/09/29/bayes-theorem/
Can you explain what MCMC means, simply, please?
Posted by: Nick Rowe | October 07, 2011 at 06:04 PM
The main computational problem for Bayesian methods has been the calculation of the integrals that keep popping up. Analytical results are available only for some simple cases (e.g., normal regression), and numerical integration was subject to the curse of dimensionality. If you had a model with more than 5 or 6 parameters, you couldn't estimate it.
MCMC is Markov Chain Monte Carlo. It's a way of simulating from the posterior. Instead of sampling from the joint distribution p(x_1,x_2) directly (which is often hard), it samples from the conditionals p(x_1|x_2) and p(x_2|x_1). Iterating back and forth generates a series of (autocorrelated) draws from the joint distribution, and you can then use those draws to calculate means, standard deviations and the like. It's especially useful for latent variable models, because you basically treat the unobserved data the same way you would treat a parameter: as an unobserved part of the model.
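A minimal sketch of that back-and-forth for a toy case (a standard bivariate normal with correlation rho, where both conditionals are known exactly) might look like this:

```python
# Illustration only: Gibbs sampling for a standard bivariate normal with
# correlation rho. Each conditional is itself normal, so we can draw from
# p(x1|x2) and p(x2|x1) exactly and iterate back and forth.
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                    # assumed correlation of the target joint
cond_sd = np.sqrt(1 - rho**2)
n_draws = 20000
x1, x2 = 0.0, 0.0            # arbitrary starting values
draws = np.empty((n_draws, 2))

for i in range(n_draws):
    x1 = rng.normal(rho * x2, cond_sd)   # draw from p(x1 | x2)
    x2 = rng.normal(rho * x1, cond_sd)   # draw from p(x2 | x1)
    draws[i] = (x1, x2)

keep = draws[1000:]                      # discard burn-in draws
print(keep.mean(axis=0))                 # close to (0, 0)
print(np.corrcoef(keep.T)[0, 1])         # close to rho
```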
Posted by: Stephen Gordon | October 07, 2011 at 06:35 PM
And yes, Robin Carter was the first to teach me about Bayesian methods, but only briefly.
Posted by: Stephen Gordon | October 07, 2011 at 06:40 PM
Nick - "Can you explain what MCMC means, simply, please?" - no, but I could lend you one of my econometrics cookbooks ;-)
Posted by: Frances Woolley | October 07, 2011 at 06:54 PM
Guys need a *man*ual, just like I have a Haynes Repair Manual for the MX6. (All the cookbooks I own were given me by women: Mother, Aunt, girlfriends, and though I do use them for *ideas*, I *never* actually do what they say.)
Thanks Stephen. Good try!
Posted by: Nick Rowe | October 07, 2011 at 07:09 PM
A more detailed take on MCMC:
A Markov process is one for which the distribution of future values depends only on the current value (and not on values before the current one).
A Markov chain is a discrete-time Markov process; in MCMC it also means a process with a countable (in practice, finite) set of states. Such a process can be completely characterized by specifying the probability of the next state given the current state, for all states.
Markov chain Monte Carlo means a Monte Carlo simulation of the process; that is, of state transitions.
Suppose that one would like to estimate a probability distribution over countably many states x. If the problem is non-trivial, one does not currently know p(x) for all x. But one may know the conditional probabilities p(x1|x2) for all pairs of states; in that case, MCMC can be applied to estimate the distribution by setting the transition probabilities between states to these conditional probabilities. Under the right technical conditions, sufficiently many trials of sufficiently many steps will converge to an equilibrium distribution, which is the distribution to be estimated.
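A toy simulation of those state transitions (the 3-state transition matrix here is made up purely for illustration) shows the visit frequencies settling down to the equilibrium distribution:

```python
# Illustration only: simulate state transitions of a small 3-state Markov
# chain and compare visit frequencies with the exact equilibrium distribution.
import numpy as np

P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])  # P[i, j] = prob. of moving from state i to j

rng = np.random.default_rng(1)
n_steps = 200_000
state = 0
visits = np.zeros(3)

for _ in range(n_steps):
    state = rng.choice(3, p=P[state])    # one state transition
    visits[state] += 1

print(visits / n_steps)                  # empirical distribution of states

# Exact equilibrium: the probability vector pi solving pi = pi P
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
print(pi / pi.sum())
```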
BTW: I did not realize you were a fellow admirer of de Finetti. "PROBABILITY DOES NOT EXIST"!
Posted by: Phil Koop | October 07, 2011 at 07:19 PM
Phil Koop: There is no chance, only ignorance...
Posted by: Jacques René Giguère | October 07, 2011 at 07:33 PM
"b) Probability statements about an unobservable feature of interest, conditional on an observed statistic."
This is why I am a committed Bayesian.
Posted by: Determinant | October 07, 2011 at 07:53 PM
So you're saying God DOES play dice.
I want a real Bayesian to come into this discussion. One of those who objects strongly to the idea of Bayesian econometrics as a collection of methods, as opposed to a philosophy, a way of conceiving of uncertainty, a Zen unto itself. Because that's when the party REALLY gets started.
Posted by: jh | October 08, 2011 at 12:12 AM
I suppose that it is worth mentioning "A Treatise on Probability" by Keynes. :)
Posted by: Min | October 08, 2011 at 01:38 AM
Probability statements about an unobserved feature of interest, conditional on an observed statistic, are a meaningless notion unless we have an actual population to study. Probability statements about a parameter, which is random but was only drawn once at the beginning of time, are meaningless because they are unfalsifiable. You can defend it as a subjective judgment a la Savage, but in an academic paper the one thing I don't care about is your subjective judgments.
Bayesian econometrics is okay because it's a perfectly fine classical procedure for producing consistent estimators.
Posted by: Walt | October 08, 2011 at 03:48 AM
For any who are interested, here is an excellent lecture touching on frequentist/Bayesian differences & interplay.
http://videolectures.net/mlss09uk_jordan_bfway/
Posted by: Jared Tobin | October 08, 2011 at 05:55 AM
in an academic paper the one thing I don't care about is your subjective judgments.
All academic papers are replete with subjective judgments: functional form, choice of data set and everything else that goes into the construction of the likelihood. For instance, an identifying assumption that sets a parameter to zero is just a dogmatic prior.
Posted by: Stephen Gordon | October 08, 2011 at 07:43 AM
That's true. But that doesn't mean adding more subjectivity is a step in the right direction. And certainly much of econometrics doesn't require a likelihood, since it doesn't require an assumption of a distribution.
Posted by: Walt | October 08, 2011 at 07:56 AM
Only if you're going to assume that you have a sample whose size is "close enough" - a heroic, subjective judgment call - to infinity. ETA: And if you do, then you might as well assume normality in the first place.
The subjectivity is always there. Best to call a spade a spade.
Posted by: Stephen Gordon | October 08, 2011 at 08:30 AM
Min: yes, Keynes always seems to pop up! A very beautiful monograph founded on Keynes' approach is "The Algebra of Probable Inference" by Richard T. Cox: about 90 pages of lucid perfection.
In a related vein, there is an interesting essay by Patrick Suppes that deals with de Finetti's observation that many of our ordinary ideas about probability emerge from a purely qualitative probability structure: "Axiomatizing qualitative probability".
Posted by: Phil Koop | October 08, 2011 at 08:38 AM
"This is a classic example of hysteresis: the persistence of a phenomenon after its cause has been removed. Students who aren't taught Bayesian methods almost never make the effort to learn enough to teach it when they go on to become professors."
Haha. Almost never is correct, since it does occur sometimes ... I guess I'm part of that measure-zero crowd, as I taught myself Bayesian approaches and MCMC as an assistant professor.
Posted by: Arin Dube | October 08, 2011 at 08:57 AM
Momentum is building!
Posted by: Stephen Gordon | October 08, 2011 at 09:53 AM
It is really rare to see people promoting Bayesian econometrics. People working on it often have a hard time publishing; people don't like the subjectivity...
Posted by: John | October 08, 2011 at 10:26 AM
Determinant:
"b) Probability statements about an unobservable feature of interest, conditional on an observed statistic."
This is why I am a committed Bayesian.
"
Are not you glossing over the priors selection problem, for example, too easily ?
Not saying that Bayesian inference is "bad", just that it does not come for free as some may imagine in comparison to the "frequentist" approach, both computationally and more importantly philosophically and practically.
Besides, there are many strains of Bayesians, starting with de Finetti and going to "objectivist" Bayesians that de Finetti proponents accuse, rightfully it seems, of being incoherent. See Senn's http://www.rmm-journal.de/downloads/Article_Senn.pdf for a review.
So, what strain do you belong to :) ?
"
I find de Finetti’s subjective Bayes theory extremely beautiful and seductive (even though I must confess to also having some perhaps irrational dislike of it). The only problem with it is that it seems impossible to apply.
" Senn, supra.
Posted by: vjk | October 08, 2011 at 11:40 AM
More of a subjectivist Bayesian. When I took Probability as an undergrad I found Bayesian methods far more intuitive than classical methods. As an engineer I care about what I don't know and how I can know it. I'm far more of a principle analyzer than a brute math junkie.
Plus I love giving MATLAB a spin....
My economics philosophizing makes me a Bayesian.
And Nick, dear Nick, wherefore art thou Math limited, O Nick?
Posted by: Determinant | October 08, 2011 at 04:00 PM
Are you not glossing over the prior-selection problem, for example, too easily?
Why is this a 'problem'? Quite frankly, my thinking has evolved to the point that if you don't understand your model well enough to form priors about its parameters, you shouldn't be estimating it.
ETA: For example, if you're estimating a demand curve, are you *really* completely agnostic about the sign of the own-price elasticity? Of the income elasticity? Do you really believe that an elasticity between 1000000 and 1000000.5 is as plausible as one between 0.5 and 1?
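To put rough numbers on that, here's a tiny sketch; the normal prior is invented for illustration only, not a recommendation:

```python
# Illustration only: under a diffuse but proper prior on the elasticity
# (a normal with mean -1 and sd 10, invented for this example), the interval
# (0.5, 1) carries far more prior mass than (1000000, 1000000.5).
from scipy.stats import norm

prior = norm(loc=-1.0, scale=10.0)
mass_plausible = prior.cdf(1.0) - prior.cdf(0.5)
mass_absurd = prior.cdf(1_000_000.5) - prior.cdf(1_000_000.0)
print(mass_plausible)   # roughly 0.02
print(mass_absurd)      # numerically zero
```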
Posted by: Stephen Gordon | October 08, 2011 at 05:01 PM
I don't really know enough to comment on classical vs. Bayesian statistics. I believe that, assuming a uniform prior distribution, classical and Bayesian statistics give the same (or similar) results. In any case, I was able to get the same (low) answer for the breast-cancer screening case mentioned above, using both methods. (Bayes's Theorem was a heck of a lot simpler, when I actually did it.)
Consider large numbers n1, n2, n3, n4 (sum = N). Use maximum likelihood -- find the "probability" that maximizes the probabilities of getting those numbers, using binomial distributions. (n1 and n2 have cancer, n1 and n3 get positive tests.)
Overall cancer prob. = (n1 + n2)/N = 1%
False positive prob. = n3/(n3 + n4) = 9.6%
Correct positive prob. = n1/(n1 + n2) = 80%
Having cancer given positive = n1/(n1 + n3)
The calculation, of course, is exactly the same using both methods. Just the interpretation is different. The problem with the majority of physicians in the article is simply that they don't understand probability.
I prefer to apply the argument to criminal ID -- one criminal out of a population of thousands. There is some probability of a false positive, at least 1% (say). This leads to an enormous probability of mistaken identity.
Posted by: John H. Morrison | October 08, 2011 at 08:44 PM
John:
Not sure why one would use maximum likelihood to obtain a solution when a more intuitive answer can be obtained by a straightforward application of the Bayes rule to compute the conditional probability of having cancer given a positive test result, in the purely frequentist formulation you provided yourself:
P(C|+) = P(+|C)*P(C)/(P(+|C)*P(C) + P(+|~C)*P(~C))
The likelihood talk muddies the picture unnecessarily for the hypothetical physician, I'd speculate.
Posted by: vjk | October 08, 2011 at 11:24 PM
Stephen:
"Do you really believe that an elasticity between 1000000 and 1000000.5 is as plausible as one between 0.5 and 1?
"
The question is: how exactly would one arrive at the elasticity prior distribution function ? Purely by guessing in the de Finetti style ? ("the(a) probability of Chelsea Clinton of becoming the president of the US is 1%").
Posted by: vjk | October 08, 2011 at 11:30 PM
Determinant:
"More of a subjectivist Bayesian
"
So, your probability assignments are similar to the one I mentioned in my response to Stephen. Does it not bother you, as an engineer, that you may use subjective priors that would lead to radically wrong posteriors ?
Posted by: vjk | October 08, 2011 at 11:34 PM
VJK: Like I said, the Bayesian calculation was simpler for me. (Was your answer in response to my comment?) I tend to think Bayesian myself, but I'm not really up to the advanced issues.
Posted by: John H. Morrison | October 09, 2011 at 01:24 AM
John M:
"Was your answer in response to my comment ?"
It was.
"Like I said, the Bayesian calculation was simpler for me"
What Bayesian calculation did you have in mind when you gave your cancer incidence example?
Posted by: vjk | October 09, 2011 at 09:58 AM
I'd have to disagree with Stephen on this one... there are many times I'm extremely interested in "[p]robability statements about an observed statistic, conditional on an unobservable feature of interest"; I'm generally interested in finding out how closely my data match the assumptions of the model (linearity, etc.), or whether things like outliers can realistically be generated by the process I think is going on. In those cases, "what's the probability of seeing an observed statistic this large/small given that the model is true" is an extremely interesting question.
I'm pretty strongly opposed to the idea of subjectivist statistics*, but I'm perfectly comfortable, as a frequentist, using Bayesian model fitting to estimate parameters and confidence intervals. All it means is that I've added one extra assumption (the prior), and I should test how sensitive the model is to it before publishing.
I think Andrew Gelman and Cosma Shalizi explain it best (http://www.stat.columbia.edu/~gelman/research/unpublished/philosophy.pdf): Bayesian inference is very valuable when doing inference within a model, but that doesn't mean that you shouldn't also test the frequency properties and goodness of fits of your model.
*To paraphrase many people: if you really think that Bayesian updating captures your true belief in the probability of a parameter taking a certain value, why bother with model fitting at all? Just look at the data then draw what your updated posterior looks like. Why restrict yourself to normal priors, or standard likelihood functions?
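A toy version of that sort of check, with invented data and an invented test statistic, might look like:

```python
# Illustration only: a crude model check of the kind described above.
# Fit a normal to invented data containing a planted outlier, then ask
# how often data simulated from the fitted model produce as extreme a value.
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(0.0, 1.0, size=100)
data[0] = 5.0                        # plant a suspicious outlier

mu_hat, sigma_hat = data.mean(), data.std(ddof=1)
obs_stat = np.max(np.abs(data))      # test statistic: largest |observation|

n_rep = 10_000
rep_stats = np.array([
    np.max(np.abs(rng.normal(mu_hat, sigma_hat, size=data.size)))
    for _ in range(n_rep)
])
p_value = np.mean(rep_stats >= obs_stat)
print(p_value)   # small: the fitted model rarely generates such an outlier
```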
Posted by: Eric Pedersen | October 09, 2011 at 12:47 PM
vjk wrote: "So, your probability assignments are similar to the one I mentioned in my response to Stephen. Does it not bother you, as an engineer, that you may use subjective priors that would lead to radically wrong posteriors?"
No, it doesn't bother me. It is what I live with. When you work on jet-engine controllers as I did for a summer, you have to admit that the line between working and nonsense is very, very thin at the best of times.
Engineering teaches you to be comfortable with your own imperfections.
Posted by: Determinant | October 09, 2011 at 01:32 PM
"When you work on jet-engine controllers as I did for a summer, you have to admit that the line between working and nonsense is very, very thin at the best of time
"
If that's how jet engine reliability is modeled, based on subjective estimates, then God help us all.
Hope that is not the case.
Posted by: vjk | October 09, 2011 at 03:01 PM
VJK: "What Bayesian calculation did you have in mind when you gave your cancer incidence example ?"
Something like this: P(Cancer | Positive) = P(Positive | Cancer) P(Cancer) / P(Positive)
P(Positive) = P(Positive | Cancer) P(Cancer) + P(Positive | NoCancer) P(NoCancer)
= .8*.01 + .096*.99 = .103
So P(Cancer | Positive) = .008/.103 = .078 = 7.8%
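The same arithmetic in a few lines of code, just restating the numbers above:

```python
# Restating the numbers above: 1% prevalence, 80% true-positive rate,
# 9.6% false-positive rate.
p_cancer = 0.01
p_pos_given_cancer = 0.80
p_pos_given_no_cancer = 0.096

p_pos = p_pos_given_cancer * p_cancer + p_pos_given_no_cancer * (1 - p_cancer)
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(p_cancer_given_pos)   # about 0.078, i.e. roughly 7.8%
```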
From what little I learned and figured out, the prior probability P(Cancer) is the only problem with Bayesian statistics. In the particular problem as stated by the link in Nick Rowe's post, the P(Cancer) was given. In general, is it ever wrong to assume a uniform prior probability in a symmetric situation? For example, if we try to measure the fine structure constant, should our analysis be independent of previous measurements?
Posted by: John H. Morrison | October 09, 2011 at 03:32 PM
vjk:
Just because you can throw a complex math model at it doesn't make it true. Sure, we can look back and extract a good prior through frequentist methods (we often do), but Bayes' Rule gives us insight into the five other things that may be related.
Posted by: Determinant | October 09, 2011 at 04:32 PM
John M:
"P(Cancer | Positive) = P(Positive | Cancer) P(Cancer) / P(Positive)"
That's the same formula I have written above.
It's just the Bayes identity/rule, which is trivially derivable from the probability addition/multiplication axioms and is valid in both the frequentist and the Bayesian interpretation by virtue of being an identity.
The Bayesian interpretation starts with the way you treat the factors of the identity: P(Positive | Cancer) -- the "likelihood", P(Cancer) -- the "prior", P(Positive) -- the "evidence".
In your simple example, the frequentist interpretation is straightforward: P(Cancer) is the relative frequency of cancer cases in the population, obtained, say, by classical statistical sampling; there is no need to come up with some cryptic prior distribution, and similarly for the other terms. Thus, obtaining the conditional probability of the population subsample suffering from cancer given a positive test result is accomplished through a simple application of the Bayes rule -- just as you did. At no point was Bayesian reasoning applied to get the answer -- just the Bayes rule.
Posted by: vjk | October 09, 2011 at 06:27 PM
Determinant:
"Sure we can look back and extract a good prior through frequentist methods, we often do"
That would be cheating for an avowed subjectivist :)
"
Bayesian: One who, vaguely expecting a horse['prior'] and catching a glimpse of a donkey['evidence'], strongly concludes he has seen a mule['posterior']
" (Senn)
Posted by: vjk | October 09, 2011 at 06:33 PM
Engineering philosophy: Never let theory get in the way of a good practical solution.
Posted by: Determinant | October 09, 2011 at 06:54 PM
I probably am missing quite a bit, but I thought that in general, using Bayes's Theorem was Bayesian, almost per se. It's Bayesian to talk about the actual probability that the fine structure constant is a certain quantity. Frequentists at best used that language informally, while formally viewing the value and the uncertainty as numbers that best fit the data. (Maximum likelihood method.)
Posted by: John H. Morrison | October 10, 2011 at 10:17 AM
Stephen,
What is your take on semi-parametric and non-parametric estimation? It seems to me that it blurs the distinction between classical and Bayesian econometrics. No need to make assumptions about unobserved features or pin down priors; the computational issues are also getting increasingly manageable.
Posted by: Youcef M | October 10, 2011 at 11:18 AM
John M:
No, John, using the Bayes rule is equally legitimate in either interpretation, as I mentioned earlier.
Using the rule does not make one a Bayesian practitioner.
Posted by: vjk | October 10, 2011 at 01:28 PM