I'm teaching two econometrics classes this term (master's and PhD), and I just covered the parts on asymptotic theory. In both lectures, I stumbled badly at explaining the difference between these two forms of convergence: my heart simply isn't in it.

I've never heard of an empirical study where the distinction between the two forms of convergence made a difference. Heck, I can't even come up with an example of a sequence that converges in probability but not almost surely. And don't get me started on quadratic convergence, convergence in r^{th} mean and other exotica.

Why do we force our students to learn these things?

I confess I have never heard of this distinction (must be too long ago I took econometrics, or I never took enough). What is the difference? Or, is it part of a jobs stimulus package for econometricians? ;-)

Posted by: Nick Rowe | January 28, 2010 at 05:03 PM

That's the thing: there is a difference, but I can't offer an intuitive explanation. When you look at the mathematical representations of the definitions, you can see how one is stronger than the other, but I can't make anyone care about the difference.

Especially myself.

Posted by: Stephen Gordon | January 28, 2010 at 05:22 PM

I wouldn't expect it to make any practical difference. It's more of an intellectual foundation, which can be used to cover your intellectual behind if someone has a technical question.

Posted by: adjacent | January 28, 2010 at 05:22 PM

Wikipedia ( http://en.wikipedia.org/wiki/Convergence_of_random_variables#Definition_3 ) to the rescue:

"Suppose a person takes a bow and starts shooting arrows at a target. Let Xn be his score in n-th shot. Initially he will be very likely to score zeros, but as the time goes and his archery skill increases, he will become more and more likely to hit the bullseye and score 10 points. After the years of practice the probability that he hit anything but 10 will be getting increasingly smaller and smaller. Thus, the sequence Xn converges in probability to X = 10.

Note that Xn does not converge almost surely however. No matter how professional the archer becomes, there will always be a small probability of making an error. Thus the sequence {Xn} will never turn stationary: there will always be non-perfect scores in it, even if they are becoming increasingly less frequent."

Also, almost sure convergence implies convergence in probability.

Posted by: Min | January 28, 2010 at 05:33 PM

Assume your measure is just Lebesgue measure on the unit interval, and identify the end-points so that you are on a circle, to make the notation easy.

To converge in probability means that the length of the bad sets goes to zero. Define a bad interval, I_n, to be of length 1/n. Then if your random variable sequence is just the characteristic function of the bad interval, you converge in probability, *regardless* of where that bad interval is positioned on the circle.

To converge almost surely means that you converge point-wise except at bad points of measure zero.

The difference is that if you are converging almost surely, the good points are nailed down and the tail of the interval cannot move around. If you are converging in probability, the interval can move around, hitting all the points over and over again, so at *no* point do you converge pointwise, i.e. you do not have almost sure convergence.

An example is if your random variables are just the characteristic functions of intervals determined by the angles [n, n+1/n). The set where the sequence fails to be zero has length 1/n, which goes to zero, but each given point on the circle will take the value 1 infinitely many times, so the set of points where pointwise convergence fails has measure 1.
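A quick numerical sketch of this (Python; using a contiguous "typewriter" variant in which each interval starts where the previous one ended, so that the diverging lengths 1/n force every point on the circle to be covered over and over):

```python
# X_n is the indicator of an interval of length 1/n on the unit circle,
# each interval starting where the previous one ended.  The lengths 1/n
# sum to infinity, so the intervals wrap around the circle forever and
# every point is hit infinitely often -- yet P(X_n = 1) = 1/n -> 0.
u = 0.3          # an arbitrary fixed point on the circle [0, 1)
N = 200_000
s = 0.0          # left endpoint of the n-th interval (mod 1)
hits = []        # the n's at which X_n(u) = 1
for n in range(1, N + 1):
    a, b = s, s + 1.0 / n
    if (a <= u < b) or (a <= u + 1.0 < b):   # membership mod 1 (wrap-around)
        hits.append(n)
    s = (s + 1.0 / n) % 1.0

print(len(hits))     # grows without bound (like log N) as N grows
print(1.0 / N)       # P(X_N = 1) -> 0: convergence in probability
```

The point u is hit roughly once per full wrap of the circle, and the number of wraps grows like the harmonic sum, so X_n(u) = 1 infinitely often even though the probability of a hit at stage n shrinks to zero.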

Posted by: RSJ | January 28, 2010 at 05:43 PM

The most useful intuitive understanding I've been taught is that almost sure convergence guarantees that X_n will be far from X (i.e. further than any given epsilon) only a finite number of times. Convergence in probability leaves open the possibility that X_n will be far from X an infinite number of times.

The best example I have to illustrate that is to take the Y_n as independent Bernoulli(1/n) random variables. Clearly Y_n converges to 0 in probability, but it doesn't converge almost surely: Y_n will equal 1 for an infinite number of n's. You can see this from the second Borel-Cantelli lemma, since the Y_n are independent and the probabilities 1/n sum to infinity.
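A short simulation makes this concrete (Python sketch; the seed and sample size are arbitrary choices):

```python
import numpy as np

# Independent Y_n ~ Bernoulli(1/n): P(Y_n = 1) = 1/n -> 0, so Y_n -> 0
# in probability; but the probabilities sum to infinity, so by the
# second Borel-Cantelli lemma Y_n = 1 for infinitely many n.  The
# expected number of 1's up to N grows like log N.
rng = np.random.default_rng(42)
N = 100_000
n = np.arange(1, N + 1)
y = rng.random(N) < 1.0 / n       # one draw of the whole sequence

ones = np.flatnonzero(y) + 1      # the n's at which Y_n = 1
print(len(ones))                  # roughly log(N) of them
print(ones[-5:])                  # 1's keep appearing arbitrarily late
```

However far out you look, fresh 1's keep arriving, so the realized path never settles down to 0, even though the chance of seeing a 1 at any given stage vanishes.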

Of course, I've got no idea if the distinction has any practical relevance for econometrics.

Posted by: Ryan | January 28, 2010 at 05:49 PM

Min - That's *so close* to being an actual example I could use!

What it needs is some structure to characterise the distributions of the Xn and how they satisfy one definition but not the other.

I suppose I could work on doing it for myself, but first I'd need an answer to the question in the title.

Posted by: Stephen Gordon | January 28, 2010 at 05:52 PM

Ryan - many thanks. That's something that I could probably explain fairly easily.

Posted by: Stephen Gordon | January 28, 2010 at 05:57 PM

A good way to motivate this and, I think, the proper way, is to tell them that there's little if any difference but that depending on the estimator property you're trying to prove, you'll find it a lot easier to prove one rather than the other.

Posted by: Agent Continuum | January 28, 2010 at 06:38 PM

Are there cases where almost sure convergence is easier to prove than convergence in probability?

Posted by: Stephen Gordon | January 28, 2010 at 07:07 PM

"A good way to motivate this and, I think, the proper way, is to tell them that there's little if any difference"

No! Convergence in probability is a form of weak convergence. Your students should understand the difference between convergence and weak convergence -- the difference is huge. If you have a sequence x_n, then weak convergence means that f(x_n) --> L for some f. This does not mean that x_n converges, but only that some attribute converges.

For example, you can ask, given N asset prices, if the sum of these prices converges to 1, does that mean that each individual asset price converges to something? No. Here f is the operation of taking the sum. It could be average, variance, integration against a test function, the infimum of a large set of integrations against test functions, whatever. But don't perpetuate the stereotype of people using math that they don't understand. Weak convergence, point-wise convergence, and uniform convergence are different concepts and useful ideas to understand, and they appear over and over again in different forms whatever branch of math you are studying. If you want your students to understand, then explain the underlying concepts and show them with several examples what is going on.

Posted by: RSJ | January 28, 2010 at 08:26 PM

"Why do we force our students to learn these things"

Dunno... Never did much of this type of math ... does the type of convergence have implications for modelling? e.g. rule in/out a given stochastic process?

Posted by: Patrick | January 29, 2010 at 12:49 AM

"Why do we force our students to learn these things?"

Because we want them to understand what they are doing.

Posted by: Adam P | January 29, 2010 at 11:04 AM

I've always thought that convergence in mean square was easy to state and often fairly easy to prove (show that the bias and variance both go to zero). Moreover, the verification proceeds more or less the same whether the data are generated from an i.i.d., i.n.i.d., or some sort of dependent sequence (unlike for Laws of Large Numbers). Since it implies weak convergence, I like to introduce it to my students.
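As a sketch of the bias-plus-variance recipe, here is a small Monte Carlo check for the sample mean of i.i.d. normals (an illustrative choice of estimator and distribution):

```python
import numpy as np

# Mean square convergence of the sample mean: for i.i.d. data with
# variance sigma^2, the bias is 0 and Var(Xbar_n) = sigma^2/n, so
# E[(Xbar_n - mu)^2] = sigma^2/n -> 0.
rng = np.random.default_rng(1)
mu, sigma, reps = 2.0, 3.0, 4000

for n in (10, 100, 1000):
    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    mse = np.mean((xbar - mu) ** 2)
    print(n, round(mse, 5), sigma ** 2 / n)   # empirical MSE vs. theory
```

The empirical mean squared error tracks the theoretical sigma^2/n at each sample size, which is exactly the "bias and variance both go to zero" verification.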

I agree that the distinction between strong and weak convergence isn't very important in econometric applications; it only matters if you are interested in the path properties of your random sequence (will it eventually enter into the epsilon sleeve and stay there--Ryan is spot on). But weak convergence requires less and is all we need for asymptotic approximations to distributions of estimators.

Posted by: Angelo Melino | January 29, 2010 at 12:32 PM

I'm rusty on these things, but I recall thinking that while it may not make a difference for macro, "normal sized" things, nonetheless when you're studying things broken down into 1/1 bazillionth size, then how and if they converge at that size can make a difference when you multiply by 1 bazillion to get back to "normal sized" things.

So, in other words, I recall thinking it can make a difference in continuous time finance models.

Posted by: Richard H. Serlin | January 29, 2010 at 01:29 PM

Yeah, it's as I remembered.

It matters for continuous time finance.

This is from, "An Introduction to the Mathematics of Financial Derivatives", 2nd Edition, 2000, by Salih N. Neftci:

Mean Square (m.s.) convergence is important because the Ito Integral is defined as the mean square limit of a certain sum. If one uses other definitions of convergence, this limit may not exist...Example Let St be an asset price observed at equidistant time points...It turns out that if St is a Wiener process, then Xn will not converge almost surely, but a mean square limit will exist. Hence the type of approximation one uses will make a difference. This important point is taken up during the discussion of the Ito integral in later chapters. (pages 113-114)
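One way to see the mean square limit at work is the quadratic variation of a simulated Wiener process (a sketch; the grid sizes and path count are arbitrary choices, not from Neftci):

```python
import numpy as np

# The sum of squared Wiener increments over [0, T] converges in mean
# square to T as the partition is refined: each (dW_i)^2 has mean T/n
# and variance 2(T/n)^2, so E[(sum - T)^2] = 2*T^2/n -> 0.
rng = np.random.default_rng(7)
T, paths = 1.0, 500

for n in (10, 100, 1000, 10000):
    dW = rng.normal(0.0, np.sqrt(T / n), size=(paths, n))
    qv = (dW ** 2).sum(axis=1)         # quadratic variation along each path
    ms_error = np.mean((qv - T) ** 2)
    print(n, ms_error, 2 * T ** 2 / n)  # empirical vs. theoretical 2T^2/n
```

The mean square error shrinks like 2T^2/n as the grid is refined, which is the sense of convergence underlying the Ito integral construction the quote refers to.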

Posted by: Richard H. Serlin | January 29, 2010 at 02:06 PM

Fair enough, but I'm teaching econometrics, not continuous-time finance. And yet econometrics textbooks seem inordinately concerned with these distinctions:

It's a weakly convergent estimator! It's a strongly convergent estimator! It's a dessert topping!

Posted by: Stephen Gordon | January 29, 2010 at 04:26 PM

1: "Why do we force our students to learn these things?"

2: Because we want them to understand what they are doing.

It's not what I do, and I learned it. I still understand the ideas behind it, and it has no relevance to the research I do.

Posted by: D. Watson | January 29, 2010 at 04:33 PM

Why? Barriers to entry.

Of course, that answer doesn't sound so good after reading RSJ's and Serlin's posts.

So Nick, if you weren't fumbling this material, what in the world of econometrics would you rather be teaching? Diagnostic techniques? Bootstrapping? Monte-Carlo simulation? Semi-parametric estimation?

Posted by: westslope | January 31, 2010 at 01:37 AM

See the 4th post in this topic.

http://www.econjobrumors.com/topic.php?id=4410&page=20

Posted by: Daniil | February 02, 2010 at 04:16 PM

ConvergenceConcepts : an R Package to Investigate Various Modes of Convergence

Pierre Lafaye de Micheaux and Benoit Liquet

http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Lafaye~de~Micheaux+Liquet.pdf

Posted by: mtm | February 07, 2010 at 12:24 AM

One additional reason for the distinction is the continuous mapping theorem.

It is almost trivial that if Xn converges to X almost surely and f is continuous almost everywhere, f(Xn) converges to f(X) almost surely. What we need in practice are versions of this that only assume convergence in probability or in distribution. The easiest way to prove the continuous mapping theorem for convergence in probability or in distribution seems to be via representation theorems that convert your sequence Xn into a sequence Yn that converges almost surely.
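A small simulation of the continuous mapping idea (Python sketch; f = exp and the noise sequence are illustrative choices, not part of the theorem's proof):

```python
import numpy as np

# Continuous mapping: if Xn -> X in probability and f is continuous,
# then f(Xn) -> f(X) in probability.  Here Xn = X + noise/n, so
# P(|f(Xn) - f(X)| > eps) should shrink toward 0 as n grows.
rng = np.random.default_rng(3)
m, eps = 200_000, 0.01
X = rng.normal(size=m)

ps = []
for n in (1, 10, 100, 1000):
    Xn = X + rng.normal(size=m) / n              # Xn -> X in probability
    ps.append(np.mean(np.abs(np.exp(Xn) - np.exp(X)) > eps))

print(ps)   # P(|f(Xn) - f(X)| > eps) shrinks toward 0
```

The exceedance probabilities fall steadily with n, which is the in-probability version of the statement; the almost sure version would instead pin down the behaviour along each realized path.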

Posted by: thomas | February 15, 2010 at 10:01 PM