Third-year students in Laval's Baccalauréat intégré en économie et politique are required to take a seminar course on policy evaluation, and this week, I'm going to be giving a lecture on the basics of how it's done. It occurs to me that this is a lecture that many, many people should sit in on, so here is a summary.
Would you go on to conclude that in Case 1, the policy was a success and that it failed in Case 2? If so, you are in good company, along with any number of pundits and not a small number of people whose job description includes 'policy analysis'.
But you would also have joined the ranks of those who had fallen for the post hoc ergo propter hoc fallacy. In order to do a proper evaluation of the policy, you need the proper counterfactual: what would have happened without the policy?
If Star Trek were still on the air, this would be a simple enough problem to solve: just wait for an episode in which the crew stumbles across a parallel universe in which the policy of interest hadn't occurred, and compare it with the one we happen to be in. But since it's not, the policy analyst is obliged to construct the relevant parallel universe on her own.
This is what controlled experiments try to do. Two samples are constructed: the 'treatment' group, where the policy is applied, and the 'control' group, where it isn't. If the experiment is designed properly, the only difference between the two samples is the treatment, so any difference between the two outcomes can be attributed to the policy.
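To make that logic concrete, here is a minimal simulated experiment; the sample size, baseline outcome, and effect size are all invented for illustration. Because treatment is assigned at random, the simple difference in mean outcomes between the two groups recovers the policy's effect.

    # A minimal sketch of a randomized experiment; every number is invented for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    true_effect = 2.0                              # the policy's true effect on the outcome

    # Random assignment: treatment status is independent of everything else
    treated = rng.integers(0, 2, size=n).astype(bool)

    # Outcome = baseline heterogeneity + the policy effect (for the treated only)
    baseline = rng.normal(50.0, 10.0, size=n)
    outcome = baseline + true_effect * treated

    # Because assignment is random, the difference in means estimates the policy effect
    estimate = outcome[treated].mean() - outcome[~treated].mean()
    print(f"difference in means: {estimate:.2f} (true effect: {true_effect})")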
For those of us who have to deal with non-experimental data - economics is only one of many such fields - the alternate universe takes the form of a model. A well-designed model will be able to reproduce the main features of interest of the real world. More importantly, it will also be able to reproduce the main features of interest of a world in which the policy under study did not take place.
(Sometimes, if we're very lucky, we will stumble across 'natural experiments', where two more-or-less identical groups are subjected to different treatments.)
In order to do policy analysis, you have to be able to augment the above graphs to include the counterfactuals:
Post hoc ergo propter hoc would say that the effect of the policy in Case 1 was positive (A1-B1), and negative (A2-B2) in Case 2. What the model gives us is the blue line: the outcome in the parallel world in which the policy had not been applied. With the counterfactual, the effect of the policy is C1-B1 and C2-B2. It turns out that in both cases, the policy had a positive effect.
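(To attach made-up numbers to Case 2: suppose the outcome stood at 100 when the policy was introduced (A2), fell to 90 afterwards (B2), and the model says it would have fallen to 80 had nothing been done (C2). The post hoc reading is a failure of 10; the counterfactual comparison gives an effect of 90 - 80 = +10.)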
It is at this point that it is instructive to look at the link between policy analysis and forecasting. The most popular instrument in an economist's toolkit is a linear regression model. If y is the variable of interest and x is the policy instrument, then we can write the relationship (to first-order approximation) as
y = a + b x + e
where a and b are fixed coefficients, and where e is an error term that includes all the unobserved/excluded factors that influence the outcome y. (If there are other observed explanatory variables, we can include them without altering the basic story.) In the graph, the bx term represents the distance B1-C1/B2-C2, and as we've seen, this is what is relevant for policy evaluation. The e term represents variations that can't be explained by x - the lines A1-C1/A2-C2 - and aren't of primary interest.
But if the object of the exercise is to produce a forecast, then the size of e matters a great deal. If the variations in y are almost completely explained by variations in the error term, then the model is unlikely to produce accurate predictions. This is usually the case in our models, which explains why our forecasting record is so poor.
This distinction between policy evaluation and forecasting is crucial: a model that makes bad forecasts can still be useful for policy analysis.
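To illustrate that claim, here is a minimal simulation in which every parameter is invented: the error term swamps the policy term, so the model's R-squared and its point forecasts are dreadful, yet the estimated policy effect comes out close to the true b.

    # Every parameter here is invented for illustration.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    a, b = 1.0, 0.5                        # true intercept and policy effect
    x = rng.normal(0.0, 1.0, size=n)       # the policy instrument
    e = rng.normal(0.0, 10.0, size=n)      # unobserved factors dominate the outcome
    y = a + b * x + e

    # OLS by hand: slope = cov(x, y) / var(x)
    b_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    a_hat = y.mean() - b_hat * x.mean()

    y_hat = a_hat + b_hat * x
    r_squared = 1.0 - np.var(y - y_hat) / np.var(y)

    print(f"estimated policy effect: {b_hat:.2f} (true value: {b})")  # useful for policy analysis
    print(f"R-squared: {r_squared:.4f}")                              # near zero: useless for forecasting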
Update: In the comments, Adam P makes this important point:
I think you can strengthen your conclusion to say that a model that makes no forecast at all can still be useful for policy analysis.
If by 'forecast' we mean a time series model where the forecasting instruments are known at least one period ahead of what is being forecast (which is what the layman means), then a model that models y as a function of contemporaneous x makes no forecast at all. Such a model is still useful for policy evaluation.
It is also worth ... stating that this difference also applies to policy setting, not just ex-post evaluation. Such a distinction is important so that we don't confuse the lack of an ability to forecast, say, a financial crisis with a lack of understanding of what to do when one occurs.
Yes. But just something else. Looking at your examples of change and policy-induced change, I can't help but think: why bother?
Especially for policies whose outcome is dubious, and which are devilishly expensive.
Posted by: Vasile | February 01, 2011 at 01:15 AM
Very nice.
You used a = 0 in your forecasting example for simplicity?
“If the experiment is designed properly, the only difference between the two samples is the treatment, so any difference between the two outcomes can be attributed to the policy.”
Isn’t there a problem in making a clear delineation between the identification of e in the counterfactual, and the identification of the policy effect as the difference between the counterfactual and the actual? Doesn’t this problem grow with a higher e?
Posted by: anon | February 01, 2011 at 03:26 AM
Wandering just a little off-topic (sorry).
When evaluating a policy, economists will often argue like this (and it will sound bizarre to a non-economist):
"I have no idea what the optimal level of y is. I don't even have any idea what the actual level of y is. But I have good reason to believe the actual level of y is less than the optimal level of y. And I have no idea how much policy x will increase y. But I have good reason to believe that policy x will increase y by the optimal amount. So I recommend policy x."
Example: policy x removes a distorting tax, or internalises an externality.
Posted by: Nick Rowe | February 01, 2011 at 05:52 AM
Stephen, really great post.
I think you can strengthen your conclusion to say that a model that makes no forecast at all can still be useful for policy analysis.
If by 'forecast' we mean a time series model where the forecasting instruments are known at least one period ahead of what is being forecast (which is what the layman means), then a model that models y as a function of contemporaneous x makes no forecast at all. Such a model is still useful for policy evaluation.
It is also worth your stating that this difference also applies to policy setting, not just ex-post evaluation. Such a distinction is important so that we don't confuse the lack of an ability to forecast, say, a financial crisis with a lack of understanding of what to do when one occurs.
Posted by: Adam P | February 01, 2011 at 06:16 AM
You're quite right. It's presented this way because my students have an empirical project where they have to do an ex post policy evaluation, but the same logic does apply ex ante.
Posted by: Stephen Gordon | February 01, 2011 at 07:04 AM
The bottom line is that policy analysis is a species of rhetoric. The math is there for persuasion as evidence of eloquence. That is all. Very little policy analysis is disinterested, although frequently influence is masked and mediated through byzantine funding review processes. I've seen the most salient findings of policy analysis totally ignored or even reversed in the executive summary. I've participated in and observed the farcical politics involved because calling a spade a spade is not in the remit.
It's not that it's all lies. Most policy analysts sincerely believe that the aspects of an issue they've emphasized are indeed the most important ones. It is simply that ALL policy analysis is partial and interested.
Posted by: Sandwichman | February 01, 2011 at 09:28 AM
The policy recommendation can be made conditional on the model being correct. Making predictions tests whether the model is correct. They are separate issues.
A model that makes no testable predictions can certainly be used to make policy recommendations, but these recommendations have no value because there is no basis to believe the prior. The issue is deciding which model, if any, to use in making the policy recommendation.
Posted by: RSJ | February 01, 2011 at 09:32 AM
""I have no idea what the optimal level of y is. I don't even have any idea what the actual level of y is. But I have good reason to believe the actual level of y is less than the optimal level of y. And I have no idea how much policy x will increase y. But I have good reason to believe that policy x will increase y by the optimal amount. So I recommend policy x.
Example: policy x removes a distorting tax, or internalises an externality."
Yep, you can argue this if you have faith that the tax is in fact a distorting tax, etc., but again all of this rests on an assumption of the correctness of theory. But if there is *no* testable prediction, then it really is blind faith in the correctness of theory. Is the state of theory so good that we can go ahead and make these types of arguments?
Posted by: RSJ | February 01, 2011 at 09:38 AM
From the dust jacket of Evidence, Argument, & Persuasion in the Policy Process by Giandomenico Majone.
"In modern industrial democracies, the making of public policy is dependent on policy analysis—the generation, discussion, and evaluation of policy alternatives. Policy analysis is often characterized, especially by economists, as a technical, nonpartisan, objective enterprise, separate from the constraints of the political environment. however, says the eminent political scientist Giandomenico Majone, this characterization of policy analysis is seriously flawed. According to Majone, policy analysts do not engage in a purely technical analysis of alternatives open to policymakers, but instead produce policy arguments that are based on value judgments and are used in the course of public debate."
See Daniel Ellsberg's "Risk, ambiguity and the Savage axioms" for the definitive word on decision theory.
Posted by: Sandwichman | February 01, 2011 at 09:45 AM
(shorter Ellsberg: the 'wicked' policy questions are not amenable to technical policy analysis because they are ambiguous -- they exhibit unquantifiable uncertainty rather than quantifiable risk.)
Posted by: Sandwichman | February 01, 2011 at 10:14 AM
"A well-designed model will be able to reproduce the main features of interest of the real world."
This is the heart of the matter - what I think RSJ is driving at. It is not necessary for future states of the world to be previsible in the model, but it is necessary for the model to correctly describe the future state of the world conditioned on the realized values of the stochastic factors. The model need not forecast, but it must back-cast well in order to have confidence in the policy prescription. So I would take Adam P's point to be not an extension of the conclusion but the entire substance of it.
Posted by: Phil Koop | February 01, 2011 at 10:19 AM
There is another aspect to the situation I have not seen mentioned. You have given graphs for two cases, in both of which the policy is superior to the alternative. This suggests that the policy dominates the alternative in all future states. That is a luxurious but not a necessary property of a good policy, and in real life policies must sometimes be chosen from a set in which no dominant policy exists. In that case, the model forecast depicted must be interpreted as an expectation.
Now, suppose I construct a coin biased to flip heads with probability 55%. I am offered a bet on this coin at even money; my model "forecasts" heads with fairly low confidence, and one "policy" (bet) does not dominate the other, but if forced to choose I would bet on heads. I would not be willing to bet a large sum on a single toss, but if the bet were on the sum of a million tosses, then I would literally bet the house.
But suppose, more realistically, that I do not know the bias of the coin but just estimate it; perhaps I am 95% confident that the bias lies in the interval [0.45, 0.65] and the MLE is 55%. Now I would still bet the same way for a 1-shot but would be willing to bet only a small sum on the million-iteration game. So my policy prescription varies according to my confidence in the model, which in turn derives from how well it "back-casts."
Posted by: Phil Koop | February 01, 2011 at 10:34 AM
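A quick Monte Carlo sketch of Phil's coin example, for anyone who wants to see the numbers: the 55% bias and the [0.45, 0.65] interval are his; the Normal(0.55, 0.05) distribution I use to stand in for that interval is my own assumption.

    import numpy as np

    rng = np.random.default_rng(2)
    n_tosses = 1_000_000      # the 'million-iteration game'
    n_games = 2_000           # number of simulated games

    def prob_of_losing(bias_per_game):
        """Fraction of simulated games in which an even-money bet on heads loses money."""
        heads = rng.binomial(n_tosses, bias_per_game)   # heads tossed in each game
        winnings = 2 * heads - n_tosses                 # +1 per head, -1 per tail
        return (winnings < 0).mean()

    # Known 55% bias: the million-toss bet is essentially a sure thing
    print("P(lose | bias known to be 0.55):", prob_of_losing(np.full(n_games, 0.55)))

    # Bias only estimated (my assumed Normal(0.55, 0.05), clipped to [0, 1]): the bet
    # now loses roughly as often as the true bias falls below 0.5
    estimated_bias = np.clip(rng.normal(0.55, 0.05, size=n_games), 0.0, 1.0)
    print("P(lose | bias only estimated):  ", prob_of_losing(estimated_bias))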
Really good point Phil, and furthermore it shows that having a good model is itself not enough for policy evaluation. We need to agree on the loss function.
Posted by: Adam P | February 01, 2011 at 11:16 AM
I agree--they should teach this in civics instead of blathering on about first, second, third reading of bills in the House.
Off-topic aside: I was reading the C D Howe report on the renewal of the BoC mandate, where they propose lowering the inflation target. I can't see a compelling reason to do so, given that you're gaining slightly more price stability at the cost of more frequent liquidity traps. Maybe a topic for a post, but I suppose you've discussed it before.
Posted by: Andrew F | February 01, 2011 at 11:21 AM
I'm not sure I follow this argument. As someone who is a physicist by training but a policy analyst by current occupation, I would say that you're missing a discussion of the uncertainty in the model. Counterfactual models in social sciences are (inescapably) highly uncertain. Not a knock against social sciences; the system you're studying is just extremely complex. So, in Case 1, let's say that you add error bars larger than B1-C1. Now you're not sure if the policy worked anymore. Take crime policy for example. Just looking at aggregate numbers for crimes of whatever sort is unlikely to tell you whether your policy was a success; you would need to actually look in more detail at what happened.
There are cases where the method you're describing works, but in my experience (this obviously doesn't apply to all) economists tend to be a bit sloppy about justifying why they think the model they're using is appropriate and what the uncertainty is.
Posted by: Surdas M | February 01, 2011 at 11:50 AM
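Surdas's point can be seen in miniature with a variation on the earlier simulation (the sample size and noise level are again invented): with a small sample and a noisy outcome, the confidence interval around the estimated effect dwarfs the effect itself, so the evaluation is inconclusive even though the true effect is positive.

    # Sample size and noise level are invented for illustration.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 200                                # a small sample, as in many policy settings
    b_true = 0.5                           # the policy really does help...
    x = rng.normal(0.0, 1.0, size=n)
    y = 1.0 + b_true * x + rng.normal(0.0, 10.0, size=n)   # ...but the outcome is very noisy

    # OLS slope and its conventional standard error
    b_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    resid = y - (y.mean() - b_hat * x.mean()) - b_hat * x
    se = np.sqrt((resid @ resid) / (n - 2) / ((x - x.mean()) @ (x - x.mean())))

    print(f"estimated effect: {b_hat:.2f} +/- {1.96 * se:.2f}")
    # The half-width of the interval (on the order of 1.4 here) dwarfs the true effect
    # of 0.5, so the data alone cannot tell us whether the policy 'worked'.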
Um, this really is just the basics. I didn't want to get into optimal decision theory under uncertainty in a 3rd year undergrad course for people who have only a course or two in statistics. I just wanted to get past 'post hoc ergo propter hoc', the need for modeling and the distinction between policy analysis and forecasting. This isn't a PhD class.
Posted by: Stephen Gordon | February 01, 2011 at 12:33 PM
I'm teaching a graduate policy evaluation course this semester. An extensive reading list is available here if anyone is interested.
If you only have time to read one thing, I'd recommend the Imbens and Wooldridge 2009 JEL article. The 2010 Heckman JEL article gives a nice overview of his approach (which differs from I&W) to policy evaluation.
Those wanting an encyclopaedic treatment can look at the Heckman and Vytlacil 2007 Handbook of Econometrics chapters.
This is a fun course to teach because there has been a lot of progress on this topic over the last 15 years. It is just now becoming more formalized into things like textbooks and JEL surveys. The Angrist/Pischke/Imbens vs. Heckman and others disputes are fun to read too.
Posted by: Kevin Milligan | February 01, 2011 at 12:47 PM
Allow me one more comment: So much of the quasi-experimental approach (as in Stephen's example above) that was developed in economics in the mid 1990s is totally borrowed from Psychology. See Donald Campbell 1969 "Reforms as Experiments." It's not a terribly technical article, so undergrads could read it. But it makes clear the kinds of assumptions necessary to motivate the kind of analysis Stephen showed in his post. It also has Regression Discontinuity and lots of other stuff that it took economists 30 years to catch up to.
Posted by: Kevin Milligan | February 01, 2011 at 12:54 PM
Stephen: point taken. But what I think people are reacting to is: "the e term represents variations that can't be explained by x - the lines A1-C1/A2-C2 - and aren't of primary interest." This is only true under strong assumptions (e is known with certainty to have a stationary distribution with zero mean.) Given that you are teaching the basics, it would be a shame if the intended message - "a model need not provide good forecasts to be useful for policy evaluation" - were to be interpreted in an unintended way: "bad models are about as good as good models."
Posted by: Phil Koop | February 01, 2011 at 12:59 PM
"So much of the quasi-experimental approach (as in Stephen's example above) that was developed in economics in the mid 1990s is totally borrowed from Psychology."
And I really should give credit to Laval's behavioural economics specialist Sabine Kröger, who usually teaches this course, and from whose lecture notes I am borrowing heavily.
Posted by: Stephen Gordon | February 01, 2011 at 01:26 PM
Your students will have as much fun at Laval as we had there 35 years ago.
A small quibble: I would dock a few points from my cegep students' work for not labeling the axes. If policy A was about a rising crime rate and B about infant mortality, both outcomes were failures.
A rejoinder to Sandwichman:
Anyway, as much as we wish to impart technical competence to our students, this rationality will hold up to the deputy-minister level at most, if that. Most politicians (and CEOs) could not extricate themselves from a post hoc fallacy if their sorry souls' salvation depended on it. And even if they understood, voters and shareholders won't.
My first professional posting was at a provincial health dept. Our group was made up of various economists with no training at all in health, and especially not in the subject at hand (a health index where you try to compare the efficiency of saving 10 lives with attenuating the suffering of 1000, etc.). The deputy minister explained to us that in his previous dept he had worked with economists and found them intelligent and hard-working, so he requested some...
Then, on a cabinet staff during the 81-82 recession, I had to educate a group of MLAs (including a couple of Harvard MBAs) on basic macro. After a few hours on cyclical vs. structural deficits, primary deficits, and deficits with or without capital expenditures, one of the Harvard guys stood up and left the room, muttering "A deficit is a deficit, damn it!"
Makes you humble.
Do your students understand that most of their work will be shelved, shredded, misunderstood and distorted?
Ellsberg was right in his analysis of Vietnam. Fat lot of good it did him, Vietnam, and Nixon.
Posted by: Jacques René Giguère | February 01, 2011 at 03:42 PM
"A model that makes bad forecasts can still be useful for policy analysis."
But then how did we ever know that the model was correctly specified in the first place? By what manner shall we select from a menu of possible models for the purposes of policy evaluation, if not by measuring each model's ability to explain (i.e. predict) the world we observe?
I see Nick is not much of a Popperian...
Posted by: Noah | February 01, 2011 at 07:31 PM
When beating others they are always Popperian but in the quick of the night hardly ever.
Posted by: travis fast | February 02, 2011 at 11:21 PM
This piece implies that out-of-sample forecasting can be usefully accurate when most of us know very well that that is not the case.
A game theoretic model rather than an econometric model might work better for illustrative purposes. It also might be more useful for day-to-day policy questions that economist-trained civil servants might encounter.
Econometric models appear to encourage a top-down, technocratic approach to constantly fine-tuning economic behaviour. Good for creating work for economists if that is the objective. Game theory lends itself better to building institutions where, hopefully, all-seeing, all-knowing economist-trained philosopher kings can step aside and allow other people to run things.
Posted by: westslope | February 07, 2011 at 11:38 AM
"What the model gives us is the blue line: the outcome in the parallel world in which the policy had not been applied. With the counterfactual, the effect of the policy is C1-B1 and C2-B2. It turns out that in both cases, the policy had a positive effect."
One problem is that there is no such thing as "the" counterfactual. (OC, you mean the one that the model produced.) There are many counterfactuals, and their probability distribution matters. If you focus on only one, even the most likely one, given your assumptions, you can easily go astray.
Posted by: Min | February 17, 2011 at 10:38 AM