I have the 1988 and 2009 editions of Gujarati and Porter's Basic Econometrics in front of me. Chapter 1 has been updated for the 21st century:
The Internet has literally revolutionized data gathering. If you just "surf the net" with a keyword (e.g. exchange rates), you will be swamped with all kinds of data sources. In Appendix E we provide some of the frequently visited websites...You may want to bookmark the various websites that might provide you with useful economic data.
Seriously? Today's undergrads don't need to be told how to surf the net; most don't remember a world without it.
It's easy (and fun) to mock well-intentioned advice. Yet the authors have a point: greater data availability, together with advances in computing power and statistical software, is transforming both economics and econometrics. My question is: what are the implications of this transformation for undergraduate teaching?
Yet despite everything that has changed, the 1988 and 2009 editions of Basic Econometrics have the same basic structure. The 2009 edition has a couple of added chapters (on panel data and forecasting), and updated examples. Other chapters, for example, the one covering model specification, have been expanded and updated. If you've been teaching econometrics for years, you probably think it's a good, solid textbook that covers everything students need to know. Certainly no worse than others on the market.
This standard approach to econometrics - as represented by Gujarati and Porter, and other major textbooks - worked fine back in 1988. The costs of data and computer access, together with the unfriendliness of software, meant that few students had the courage to run regressions. But today students start running regressions in their high school data management classes. By the time they are advanced undergraduates, most are capable of doing some serious econometrics.
Since today's students can realistically be expected to actually do econometrics, they need to learn about research methods. For example, the methodology of econometrics, as described by Gujarati and Porter, boils down to 8 steps, the first of which is "Statement of theory or hypothesis." But how does a student go about coming up with a hypothesis? The econometricians leave theory to the theorists, the theorists leave hypothesis testing to the econometricians, and the students are caught in the middle. Type "econometrics course outline" into a search engine, and look at the topics covered in a typical econometrics course. Formulating a research hypothesis isn't one of them.
An econometrician might (justifiably) respond: this isn't our problem. Students should be learning how to come up with research hypotheses in their other courses. Fair enough. But the practical problems students encounter when actually doing econometric work also receive little or no attention in econometrics courses.
Take, for example, the problem of missing observations. Gujarati and Porter devote about half a page of the 2009 edition of the textbook to dealing with missing observations. Their advice? If "the reasons for the missing data are independent of the available observations", then the cases with missing observations are "ignorable" and can just be dropped. Otherwise, treat the problem as one of sample selection bias.
Yet the presence of missing observations is a vital diagnostic tool. Suppose, for example, a researcher is interested in the impact of husband's income on the labour supply of mothers of young children. "Husband's income" will be missing for any unmarried mother. The researcher, therefore, needs to redefine her research question. Is she only interested in the labour supply of married women? In this case, she can drop her missing observations, and acknowledge that her findings apply to only a subset of mothers. Or is she interested in the impact of other household members' incomes more generally - including the income of cohabiting partners, for example? In this case, she might wish to recode her data to include a new explanatory variable, "income of other household members", setting it equal to zero for people who are living alone. A final strategy, especially useful if the information on husband's income is in categorical form, is to model husband's income with a series of dummy variables (less than $20,000; $20,000 to $40,000, etc.), and include an additional dummy set equal to one for mothers with no husband present.
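For concreteness, here's how those three strategies might look in code - a minimal sketch in Python, with made-up data and hypothetical variable names:

```python
import numpy as np
import pandas as pd

# Hypothetical survey extract: husband_income is missing (NaN)
# for any mother with no husband present.
df = pd.DataFrame({
    "mother_id": [1, 2, 3, 4],
    "married": [True, True, False, False],
    "husband_income": [35000.0, 52000.0, np.nan, np.nan],
    "partner_income": [np.nan, np.nan, 28000.0, np.nan],
})

# Strategy 1: restrict the sample to married mothers, and say so
# when reporting results.
married_only = df[df["married"]].copy()

# Strategy 2: redefine the variable as "income of other household
# members", equal to zero for mothers with no earning partner.
df["other_hh_income"] = (
    df["husband_income"].fillna(0) + df["partner_income"].fillna(0)
)

# Strategy 3: categorical dummies, plus a "no husband present" dummy.
df["husb_inc_under_20k"] = (df["husband_income"] < 20000).astype(int)
df["husb_inc_20k_40k"] = df["husband_income"].between(20000, 40000).astype(int)
df["no_husband"] = df["husband_income"].isna().astype(int)
```

Which strategy is right depends entirely on the research question - which is the point: the missing observations force the researcher to decide what that question actually is.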
The point is that the presence of missing observations is a simple diagnostic test, one that indicates that the researcher has not yet clearly defined her sample or explanatory variables. It can't be formalized or written mathematically - but it would be pretty sad if mathematical elegance were a more important criterion than practical usefulness.
Missing observations can also arise because of the way that data is gathered (see, for example, my post on "when is a missing observation not really missing"). For example, information about savings in registered retirement savings plans might be gathered through two questions (1) "do you have a savings plan?" and, if yes, (2) "how much money is in your plan?" To record everyone who answered "no" to (1) - and thus did not answer question (2) - as "missing" is throwing away information. It seems obvious that no one would do that - but in fact it's easy to do, since software packages like Stata don't distinguish between valid skips (like this example) and refusals. And when are students ever taught how to read questionnaires and utilize data most effectively? Why is this less worthy of class time than, say, the formalities behind logit models?
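A minimal sketch of the recoding, in Python, with a hypothetical code for refusals (assuming, say, that the raw file records refusals as 9999 and valid skips as blanks):

```python
import numpy as np
import pandas as pd

# Hypothetical raw file: question 2 ("how much is in your plan?") is
# coded 9999 for a refusal and left blank (NaN) for a valid skip -
# someone who answered "no" to question 1 and was never asked question 2.
raw = pd.DataFrame({
    "has_plan": ["yes", "yes", "no", "yes"],
    "plan_amount": [5000.0, 9999.0, np.nan, 12000.0],
})

# Keep the distinction: a valid skip is a true zero (this person has
# no plan, so savings are zero), while a refusal is genuinely missing.
amount = raw["plan_amount"].where(raw["plan_amount"] != 9999.0)  # refusals -> NaN
amount = amount.mask(raw["has_plan"] == "no", 0.0)               # valid skips -> 0
raw["savings"] = amount
```

Lumping the valid skips in with the refusals would throw away exactly the zero-savings observations that matter most.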
When I discuss these issues with colleagues - econometricians or non-econometricians alike - they typically say "yes, I agree, what we need is a course in applied economics." It's the easiest solution to any pie division problem: make the pie bigger. No one has to change what they're doing, or question the worth or relevance of the material that they're teaching - though perhaps they should (see, for example, David Giles on multicollinearity).
Once I heard someone say this about teaching, "It's not how much material you get through, it's how much material the students get through." Formalities - theorems, proofs, whatever - are soon forgotten. Methods - ways of approaching or thinking about problems - may stick around a little longer.
Frances: Nice post! I agree totally - especially with your punchline. It's for that very reason that I like to teach grad. econometrics in a "thematic" way, emphasizing broad strategies. As I told my ECON546 class on day one, last week, there's no point in just having a bunch of specific tools in your econometric toolbox. The first thing that will happen when you are working on your thesis, or are "on the job", is that you'll encounter a non-textbook situation. That's when you have to have good intuition and the confidence to fall back on general methodological approaches that you understand.
It's also why it's absolutely essential to have weekly lab. classes in which students get their hands dirty with real data, at all levels of study.
Posted by: Dave Giles | January 08, 2012 at 12:08 PM
"To record everyone who answered question (2) as missing is throwing away information." Why would students record people who answer a question as missing?
Posted by: Brett | January 08, 2012 at 12:48 PM
Dave - thanks. I agree with you about having students get their hands dirty. The question is: why isn't that done more often? Partly it's a question of university resources - money for TAs, labs etc. Partly, though, it's that exercises involving real data (as opposed to canned data that comes with the textbook) are time-consuming to create and difficult to evaluate. It's easy to examine students on their ability to reproduce a proof of theorem X. It's harder to examine students' ability to formulate research hypotheses.
Brett, you're right, that didn't make sense, I meant something along the lines of "everyone who was a valid skip for question (2) as missing." I've rewritten that section.
Posted by: Frances Woolley | January 08, 2012 at 01:30 PM
Frances - you're right, it's very time-consuming, but some of us do it, nonetheless. It's a darn sight more fun/interesting for the students. On those occasions when there hasn't been enough TA support to hold lab. classes of a sufficiently small size, I've "donated" my own time so that extra classes could be held. I really do believe strongly that "realistic", regular, hands-on experience is essential when you're learning econometrics.
Posted by: Dave Giles | January 08, 2012 at 02:01 PM
I was surprised you think multicollinearity is over-taught. Please elaborate on why you feel this way. I'd estimate it's still one of the most common mistakes in regression analysis, even if it is conceptually simple to understand.
Posted by: Jon | January 08, 2012 at 02:29 PM
Jon: "I was surprised you think multicollinearity is over-taught. Please elaborate on why you feel this way. I'd estimate it's still one of the most common mistakes in regression analysis, even if it is conceptually simple to understand."
Here are some examples of the kinds of situations I have in mind:
- the researcher is trying to estimate the determinants of smoking. The literature suggests that both income and education are important influences on smoking behaviour, but income and education are highly correlated.
- the researcher is trying to estimate the determinants of female labour force participation. Economic theory suggests that both female education and husband's income will influence female labour supply, but female education and husband's income are highly correlated.
In situations like this, multicollinearity is not a mistake. It just is. There's not much you can do about it. Sure, in the smoking example, you could drop either income or education, but then you'd run the risk of mis-specifying your model. You could pool cross-sections, or do something else to get a larger data set, and more variance in education and income. You might get your hands on some panel data: if so, well done. You might get lucky and find some random policy change that affected income but not education, or vice versa - in which case, power to you, you've got a publishable paper.
But if you're running OLS regressions on cross-sectional data, you basically have to live with some amount of multicollinearity.
What I tell students in my research class to do is stepwise regression - this basically shows how sensitive the coefficient estimates are to changes in the specification. In Gujarati and Porter (which I'm picking on because it's the only basic econometrics textbook I have to hand), stepwise regression is condemned to an end-of-chapter question, where the procedure is described and students are asked "knowing what you do now about multicollinearity, would you recommend either [forward or backward] stepwise regression?"
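Here's a small simulation (made-up numbers, in Python) of the kind of sensitivity check I have in mind - watch how the education coefficient swings depending on whether income is in the equation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated cross-section: income and education highly correlated,
# as in the smoking example. All numbers are invented for illustration.
education = rng.normal(12, 2, n)
income = 3 * education + rng.normal(0, 1, n)
smoking = 5 - 0.3 * education - 0.1 * income + rng.normal(0, 2, n)

def ols_coefs(y, *regressors):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

both = ols_coefs(smoking, education, income)  # intercept, b_educ, b_income
educ_only = ols_coefs(smoking, education)     # income dropped
inc_only = ols_coefs(smoking, income)         # education dropped

# With both regressors, the individual coefficients are imprecise.
# Drop either variable and the surviving coefficient changes a lot,
# because it picks up the effect of the omitted, correlated variable.
print("education coef, both regressors:", both[1])
print("education coef, income dropped: ", educ_only[1])
print("income coef, education dropped: ", inc_only[1])
```

The combined effect (education plus its knock-on effect through income) is estimated quite precisely; the split between the two is not. That's the "absence of solid facts" in action.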
With a tip of the hat to "Mostly Harmless", let me quote one of my favourite passages from The Hitchhiker's Guide to the Galaxy. "We don't demand solid facts! What we demand is a total absence of solid facts."
That's what multicollinearity gives you - an absence of solid facts. Either income or education might be important, but it's hard to know which. Textbook treatments of multicollinearity don't really convey this - you just have to feel it.
Just to clarify: I'm not talking here about the kind of multicollinearity that arises when a dummy for "male" and a dummy for "female" are included in the same regression equation. Or when dummies for all 10 of Canada's provinces are included alongside an intercept. This is the kind of mistake that students need to be taught how to avoid.
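A quick sketch of why that's a mistake: with both dummies (or all provinces) plus an intercept, the columns of the design matrix sum to the intercept column, so the matrix is rank-deficient and the coefficients can't be separately estimated.

```python
import numpy as np

n = 6
male = np.array([1, 0, 1, 0, 1, 0])
female = 1 - male  # male + female = 1 for every observation

# Intercept + male dummy + female dummy: perfect collinearity,
# since the two dummy columns add up to the intercept column.
X_trap = np.column_stack([np.ones(n), male, female])
print(np.linalg.matrix_rank(X_trap))   # -> 2, not 3: rank-deficient

# The fix: drop one category ("female" becomes the reference group).
X_ok = np.column_stack([np.ones(n), male])
print(np.linalg.matrix_rank(X_ok))     # -> 2: full column rank
```

This "dummy variable trap" kind of collinearity is exact and fixable; the income-and-education kind is approximate and, mostly, not.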
Posted by: Frances Woolley | January 08, 2012 at 05:08 PM
Unfortunately, doing (applied) research is left as an afterthought in the typical undergraduate program in economics. While it is certainly reasonable for some "research methods" (in Frances' wide sense) to be covered in econometrics courses, the reality is that such an approach offers "too little, too late" (most undergrad students will take only one or two econometrics courses, and not until their 3rd or 4th year). If it were up to me, students would be required to take dedicated "research methods" courses in EACH year of their undergraduate program (econometrics courses would then be reserved for teaching the "meat and potatoes" of estimation and inference).
More generally, I (predictably) blame my non-econometrician colleagues for this: In what other discipline do empirical results not even warrant a mention in introductory-level textbooks? Can you imagine an introductory-level textbook in, e.g., psychology or physics that did not even mention the result of any actual experiments? (Yes, I know that intro macro texts typically include a scatterplot when trying to illustrate the Phillips curve, but I can't think of another example.)
Posted by: econometrician | January 08, 2012 at 07:17 PM
econometrician: "If it were up to me, students would be required to take dedicated "research methods" courses in EACH year of their undergraduate program"
What would the course cover, and what would be the background of the people who taught the course?
Posted by: Frances Woolley | January 08, 2012 at 08:22 PM
Actually, I wonder if we shouldn't be putting more econometrics in economics courses. For example, something along the lines of what is actually involved in applying the theory you just talked about to the data. Almost all of us already know enough about the econometrics of the basic models we teach that it shouldn't be that hard. If there are thorny theoretical issues - say, the distinction between censored and truncated data - you can say "you'll learn how to deal with this in your econometrics course".
Posted by: Stephen Gordon | January 08, 2012 at 08:55 PM
Actually, maybe that just works out to having an applied research component in *all* courses, not just dedicated ones. It's something we should always be thinking about.
Posted by: Stephen Gordon | January 08, 2012 at 08:57 PM
frances: "What would the course cover, and what would be the background of the people who taught the course?"
Again, it would not be a course, but several courses. More importantly, the point would be to integrate an "applied" approach into the broader curriculum, rather than relegate it to a few select courses (I might not go as far as Stephen and suggest that an applied component be included in ALL courses, but it could very well be incorporated into MOST courses).
Now, as for the actual "research methods" courses, I have no strong preference about what is covered exactly, so long as it is something that gets students actually working with data. I would hope that anyone who uses any data in their research would be happy to teach such courses (and also to incorporate some of that data in teaching other "field" courses that they may teach, say, environmental economics).
On a related note, isn't it sad that the word "applied" has such a negative connotation in the economics profession? Could there be some correlation between this and the fact that almost all students are repulsed by the idea of actually working with data?
Posted by: econometrician | January 08, 2012 at 10:45 PM
Econometrician - although I agree with much of what you say, e.g. about the unfortunate negative connotations of applied, and the importance of making connections to the real world in all courses (and in our research as well), I find the overall prescription discouraging.
Econometrics, like economic theory, suffers from death-through-spiral-curriculum. I took a look at the course outlines for our intro stats course and our advanced stats course. Basically both of them cover the same material as is covered in high school data management courses. It's the same material over and over again, at ever increasing levels of technical sophistication. (Ditto micro theory. Perhaps macro theory too, I don't know).
So I'm reluctant to put the applied material into the field courses, which are the courses that offer some glimpse into how economics is applied to areas like analysis of environmental policy, government policy, international trade, competition, etc. Couldn't we just teach, say, consumer theory or probability theory one less time, and make room for some research methods instead?
Posted by: Frances Woolley | January 10, 2012 at 10:17 AM
"But today students start running regressions in their high school data management classes."
Is this really true? If so, this is amazing!
Posted by: Dan Dutton | January 10, 2012 at 12:12 PM
Dan - just do a scatterplot in Excel and get it to run a line through it. It'll do a single variable regression for you, and churn out an R squared.
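Under the hood, that trendline button is just a one-variable least-squares fit - something like this (the numbers are made up):

```python
import numpy as np

# What the spreadsheet's "add trendline" button does: fit a line by
# least squares and report an R-squared.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

slope, intercept = np.polyfit(x, y, 1)
fitted = slope * x + intercept
r_squared = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
```

That's the whole of a high school regression exercise: a slope, an intercept, and an R squared.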
At my kids' high school, they used another simple package, in addition to Excel. But, yes, the kids were doing little regression exercises. Some were problematic, e.g. there was an example the teacher used with internet connections and crime rates showing a strong negative correlation, and she really didn't get into the whole spurious correlation discussion, which seemed to me an obvious possible explanation of the results.
It's easy for grey-haired ones to complain about all of the things that students don't know - how to use a telephone directory, for example, or do mental arithmetic. What we tend to ignore is the things that students can whip our butts at - like learning how to use software.
Posted by: Frances Woolley | January 10, 2012 at 12:25 PM
"...Today kids start running regressions in their high school data management classes".
More anecdotal evidence, I guess; I don't think my daughter's high school had any such courses, and certainly her math courses didn't devote any significant time to this. Does the Ontario curriculum cover this for most university-bound students?
Posted by: Linda | January 10, 2012 at 01:47 PM
Linda, actually there are three grade 12 math courses in Ontario - data management, functions, and calculus. A science-y type kid would take functions and calculus - unless she wanted to take data management for an easy A to boost her GPA.
But BC might well be different.
Posted by: Frances Woolley | January 10, 2012 at 02:07 PM
I say that's amazing because there is probably a lot of consternation among students about doing these things called "regressions" when they eventually reach this class called "econometrics". If it becomes something they are familiar with in high school then that's another layer of anxiety dealt with before they have to start thinking about things like unbiasedness of estimators.
Posted by: Dan Dutton | January 10, 2012 at 05:16 PM