Give an economist a problem such as "Why do people waste their time playing video games?" and she'll typically model it like this:
Assume that individual i has a utility function Ui that depends upon time spent gaming, tg, and income, Y:
Ui = Ui(tg, Y)
The individual is assumed to allocate all time available (T) to time spent gaming and time spent working (tw), so tg+tw=T. If the individual's wage rate is given by w and non-labour income is taken to be zero, utility maximization is subject to the budget constraint:
Y = w*tw = w*(T-tg)
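To make the set-up concrete, here is a minimal numerical sketch. The post does not specify a functional form, so the Cobb-Douglas utility below, the weight alpha, and the values of T and w are all assumptions chosen purely for illustration. Income is w times hours worked, with hours worked equal to T minus gaming time.

```python
import numpy as np

# Hypothetical numbers, chosen only for illustration (not from the post).
T = 16.0      # hours available per day
w = 20.0      # hourly wage
alpha = 0.25  # assumed weight on gaming in a Cobb-Douglas utility function

def utility(tg):
    """Assumed utility U = tg**alpha * Y**(1 - alpha), where Y = w * (T - tg)."""
    income = w * (T - tg)
    return tg**alpha * income**(1 - alpha)

# Search over feasible gaming times, leaving out the endpoints where U = 0.
grid = np.linspace(0.01, T - 0.01, 100_000)
tg_star = grid[np.argmax(utility(grid))]

print(f"gaming hours: {tg_star:.2f}, working hours: {T - tg_star:.2f}")
```

Under this particular functional form the optimum has a closed form, tg = alpha*T, so the grid search should land close to 4 hours of gaming. The wage drops out of that answer, which is a quirk of Cobb-Douglas preferences rather than a general result.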
Graphically, the individual's utility maximization problem can be represented as follows:
Now if someone from the business school were presented with the same problem, they would probably come up with some kind of flow chart, showing time spent gaming as the result of the interaction between market forces and individual preferences. It might look something like this, only prettier:
The dependent variable (the left hand side variable, the thing to be explained) goes in the middle of the chart. The explanatory (right hand side, independent) variables go on one (or both) sides of the diagram.
The first thing to think about is the relationship between the explanatory variables and the dependent variable. Does the arrow go one way - from the explanatory to the dependent - or both ways? Why?
The next step is to consider possible relationships between explanatory variables. Are the explanatory variables related to each other? Are there arrows between the various explanatory variables?
Adding in other possible causal relationships creates a diagram that looks something like this:
First, be careful about drawing conclusions. A lot of the time, the arrows go both ways. One might find, for example, that people with low wages spend more time playing video games. But does that mean low wages cause gaming, or that gaming causes low wages? From the correlation alone, it's impossible to tell.
Second, be extremely careful about model specification. Suppose, in the example above, someone ran a regression to explain time spent gaming, but omitted "urban/rural" as an explanatory variable. Because urban/rural status is correlated with the wage rate, the wage rate variable will pick up some of the effect of the urban/rural status variable if the latter is omitted. It matters a lot which variables are included in or excluded from a model, yet there are no simple rules for which variables to include or exclude.
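A small simulation shows how this omitted variable problem plays out. Everything here is made up for illustration: the data-generating process, the "true" wage effect of -0.5, the urban effect of +4, and the helper function ols are assumptions, not estimates from any real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data-generating process, invented for illustration.
urban = rng.integers(0, 2, size=n).astype(float)   # 1 = urban, 0 = rural
wage = 15 + 5 * urban + rng.normal(0, 3, size=n)   # urban workers earn more
gaming = 20 - 0.5 * wage + 4 * urban + rng.normal(0, 2, size=n)

def ols(y, *regressors):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

full = ols(gaming, wage, urban)   # correctly specified model
short = ols(gaming, wage)         # urban/rural omitted

print("wage coefficient, urban included:", round(full[1], 2))
print("wage coefficient, urban omitted: ", round(short[1], 2))
```

With urban/rural status included, the wage coefficient should come out close to the assumed -0.5. With it omitted, the wage variable absorbs part of the urban effect and the estimate is pulled toward zero (around -0.17 in expectation under these assumed parameters), so the effect of wages on gaming looks much weaker than it is.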
Third, theory matters. I could throw the number of carrots eaten per week into a regression explaining time spent video gaming, and it might well come up as statistically significant. But just throwing in explanatory variables and seeing which one fits is a surefire way of picking up spurious relationships - patterns that exist in the data, but have no causal or other significance.
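Here is a rough sketch of why the "throw it in and see" approach goes wrong. It assumes 1,000 hypothetical candidate variables (carrots eaten, and 999 others like it) that are pure noise by construction, and counts how many nonetheless pass a conventional 5% significance test when each is tried on its own. The sample sizes and the approximate critical value of 1.97 are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_candidates = 200, 1000

# Gaming hours and every candidate regressor are independent noise by construction.
gaming = rng.normal(size=n_people)
candidates = rng.normal(size=(n_people, n_candidates))

# Pearson correlation of each candidate with gaming hours.
g = (gaming - gaming.mean()) / gaming.std()
c = (candidates - candidates.mean(axis=0)) / candidates.std(axis=0)
r = (c * g[:, None]).mean(axis=0)

# t-statistic for the slope in a regression of gaming on one candidate.
t_stat = r * np.sqrt((n_people - 2) / (1 - r**2))

# 1.97 is approximately the two-sided 5% critical value with 198 degrees of freedom.
false_positives = int(np.sum(np.abs(t_stat) > 1.97))

print(f"{false_positives} of {n_candidates} unrelated variables look significant at 5%")
```

Roughly one candidate in twenty should clear the bar even though none of them has anything to do with gaming. Without a theory to say which variables belong in the model, there is no way to tell those flukes from the real thing.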
Finally, the rare and special variables are ones that only have arrows coming out of them - things like, in this example, age and gender. These are the variables that can - with luck - help identify causal relationships; the ones to treasure and savour.