
I confess I have never heard of this distinction (must be too long ago I took econometrics, or I never took enough). What is the difference? Or, is it part of a jobs stimulus package for econometricians? ;-)

That's the thing: there is a difference, but I can't offer an intuitive explanation. When you look at the mathematical representations of the definitions, you can see how one is stronger than the other, but I can't make anyone care about the difference.

Especially myself.

I wouldn't expect it to make any practical difference. It's more of an intellectual foundation, which you can use to cover your intellectual behind if someone asks a technical question.

Wikipedia ( http://en.wikipedia.org/wiki/Convergence_of_random_variables#Definition_3 ) to the rescue:

"Suppose a person takes a bow and starts shooting arrows at a target. Let Xn be his score in n-th shot. Initially he will be very likely to score zeros, but as the time goes and his archery skill increases, he will become more and more likely to hit the bullseye and score 10 points. After the years of practice the probability that he hit anything but 10 will be getting increasingly smaller and smaller. Thus, the sequence Xn converges in probability to X = 10.
Note that Xn does not converge almost surely however. No matter how professional the archer becomes, there will always be a small probability of making an error. Thus the sequence {Xn} will never turn stationary: there will always be non-perfect scores in it, even if they are becoming increasingly less frequent."

Also, almost sure convergence implies convergence in probability.
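For reference, here are the two definitions side by side, written in standard textbook notation (the arrow labels p and a.s. are a notational choice, not something taken from the thread). The second form of almost sure convergence makes the implication visible: the supremum over all m >= n is at least as large as |X_n - X|, so if its probability of exceeding epsilon goes to zero, so does the probability for X_n alone.

```latex
% Convergence in probability
X_n \xrightarrow{p} X
  \iff \forall \varepsilon > 0:\quad
  \lim_{n \to \infty} P\bigl(|X_n - X| > \varepsilon\bigr) = 0

% Almost sure convergence, and an equivalent "tail" form
X_n \xrightarrow{a.s.} X
  \iff P\Bigl(\lim_{n \to \infty} X_n = X\Bigr) = 1
  \iff \forall \varepsilon > 0:\quad
  \lim_{n \to \infty} P\Bigl(\sup_{m \ge n} |X_m - X| > \varepsilon\Bigr) = 0
```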

Assume your measure is just Lebesgue measure on the unit interval, and identify the endpoints so that you are on a circle, to make the notation easy.

To converge in probability means that the length of the bad sets goes to zero. Define a bad interval I_n of length 1/n. Then if your random variable sequence is just the characteristic function of the bad interval, you converge in probability to zero, *regardless* of where that bad interval is positioned on the circle.

To converge almost surely means that you converge point-wise except at bad points of measure zero.

The difference is that if you are converging almost surely, the good points are nailed down and the tail of the interval cannot move around. If you are converging in probability the interval can move around, hitting all the points over and over again, so for *no* point do you converge pointwise, i.e. you do not have almost sure convergence.

An example: let s_n = 1 + 1/2 + ... + 1/(n-1) (with s_1 = 0), and take your random variables to be the characteristic functions of the intervals [s_n, s_n + 1/n), wrapped around the circle. The bad set has length 1/n, which goes to zero, but because the harmonic series diverges the intervals keep circling forever, so each given point on the circle will be 1 infinitely many times, and the set of points where convergence does not happen has measure 1.
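Here is a minimal sketch of a moving-interval construction of this kind. It assumes one particular positioning rule, chosen for illustration: the n-th interval has length 1/n and starts where the previous one ended, so the intervals keep lapping the circle. The function name `hits` and the sample point 0.3 are hypothetical.

```python
def hits(omega, n_max):
    """Return the indices n <= n_max whose bad interval contains the point omega.

    Assumed construction: interval n has length 1/n and begins where
    interval n-1 ended, wrapping around the circle [0, 1).
    """
    hit_indices = []
    start = 0.0  # left endpoint of the current interval on the circle
    for n in range(1, n_max + 1):
        length = 1.0 / n  # measure of the bad set at step n; tends to zero
        # Distance from the interval's left endpoint to omega, going around the circle.
        offset = (omega - start) % 1.0
        if offset < length:
            hit_indices.append(n)
        start = (start + length) % 1.0
    return hit_indices


# The indicator at omega = 0.3 equals 1 at these indices.
print(hits(0.3, 10))
# The number of hits keeps growing as n_max grows (once per lap of the circle).
print(len(hits(0.3, 100)), len(hits(0.3, 10000)))
```

The bad set at step n has length 1/n, so the sequence converges to zero in probability. But the total length covered, 1 + 1/2 + ... + 1/n, grows without bound, so under this rule the intervals pass over every fixed point again and again and the indicator at that point never settles at zero.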

The most useful intuitive understanding I've been taught is that almost sure convergence guarantees that, with probability 1, X_n is far from X (i.e. further than a given epsilon) only a finite number of times. Convergence in probability leaves open the possibility that X_n will be far from X an infinite number of times.

The best example I have to illustrate that is if you take the Y_n to be independent Bernoulli(1/n) random variables. Clearly Y_n converges to 0 in probability, since P(Y_n = 1) = 1/n, but it doesn't converge almost surely: with probability 1, Y_n will be 1 for an infinite number of n's. You can see this from the second Borel-Cantelli Lemma, because the probabilities 1/n sum to infinity.
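The Bernoulli(1/n) example can be checked with exact arithmetic rather than simulation. The sketch below assumes the Y_n are independent (the second Borel-Cantelli Lemma needs this) and computes the probability that no 1 appears anywhere between two indices; the function name `prob_no_ones` is hypothetical.

```python
from fractions import Fraction


def prob_no_ones(m, M):
    """P(Y_n = 0 for every n with m < n <= M), for independent Y_n ~ Bernoulli(1/n)."""
    p = Fraction(1)
    for n in range(m + 1, M + 1):
        p *= 1 - Fraction(1, n)  # P(Y_n = 0) = (n - 1) / n
    return p


# The product telescopes to m / M.
for M in (100, 1000, 10000):
    print(M, prob_no_ones(10, M))
```

Because the product telescopes to m/M, it goes to zero as M grows for any fixed m. So no matter how far out you start, another 1 turns up with probability 1, even though each individual P(Y_n = 1) = 1/n goes to zero.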

Of course, I've got no idea if the distinction has any practical relevance for econometrics.

Min - That's *so close* to being an actual example I could use!

What it needs is some structure to characterise the distributions of the Xn and how they satisfy one definition but not the other.

I suppose I could work on doing it for myself, but first I'd need an answer to the question in the title.

Ryan - many thanks. That's something that I could probably explain fairly easily.

A good way to motivate this and, I think, the proper way, is to tell them that there's little if any difference but that depending on the estimator property you're trying to prove, you'll find it a lot easier to prove one rather than another.

Are there cases where almost sure convergence is easier to prove than convergence in probability?

"A good way to motivate this and, I think, the proper way, is to tell them that there's little if any difference"

No! Convergence in probability is a form of weak convergence. Your students should understand the difference between convergence and weak convergence -- the difference is huge. If you have a sequence x_n, then weak convergence means that f(x_n) --> L for the functions f in some chosen class. This does not mean that x_n itself converges, but only that those attributes converge.

For example, you can ask, given N asset prices, if the sum of these prices converges to 1, does that mean that each individual asset price converges to something? No. Here f is the operation of taking the sum. It could be average, variance, integration against a test function, the infimum of a large set of integrations against test functions, whatever. But don't perpetuate the stereotype of people using math that they don't understand. Weak convergence, point-wise convergence, and uniform convergence are different concepts and useful ideas to understand, and they appear over and over again in different forms whatever branch of math you are studying. If you want your students to understand, then explain the underlying concepts and show them with several examples what is going on.

"Why do we force our students to learn these things"

Dunno... Never did much of this type of math ... does the type of convergence have implications for modelling? e.g. rule in/out a given stochastic process?

"Why do we force our students to learn these things?"

Because we want them to understand what they are doing.

I've always thought that convergence in mean square was easy to state and often fairly easy to prove (show that the bias and variance both go to zero). Moreover, the verification proceeds more or less the same whether the data are generated from i.i.d., i.n.i.d., or some sort of dependent sequence (unlike for Laws of Large Numbers). Since it implies weak convergence, I like to introduce it to my students.
I agree that the distinction between strong and weak convergence isn't very important in econometric applications; it only matters if you are interested in the path properties of your random sequence (will it eventually enter into the epsilon sleeve and stay there--Ryan is spot on). But weak convergence requires less and is all we need for asymptotic approximations to distributions of estimators.
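The bias-and-variance check can be sketched numerically. Mean squared error is bias squared plus variance, and Markov's inequality applied to the squared error gives P(|estimator - target| > eps) <= MSE / eps^2, which is why mean square convergence implies convergence in probability. The estimator below, with bias 1/n and variance 4/n, is purely hypothetical, as are the function names.

```python
def mse(bias, variance):
    """Mean squared error of an estimator: bias squared plus variance."""
    return bias ** 2 + variance


def chebyshev_bound(bias, variance, eps):
    """Upper bound on P(|estimator - target| > eps), capped at 1.

    Follows from Markov's inequality applied to the squared error.
    """
    return min(1.0, mse(bias, variance) / eps ** 2)


# Hypothetical estimator: bias 1/n and variance 4/n, both going to zero.
for n in (10, 100, 1000, 10000, 100000):
    print(n, mse(1 / n, 4 / n), chebyshev_bound(1 / n, 4 / n, eps=0.1))
```

Once the MSE goes to zero, the bound goes to zero for every fixed eps. Nothing in the bound depends on how the data were generated, only on the bias and variance.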

I'm rusty on these things, but I recall thinking that while it may not make a difference for macro, "normal sized" things, nonetheless when you're studying things broken down into 1/1 bazillionth size, then how and if they converge at that size can make a difference when you multiply by 1 bazillion to get back to "normal sized" things.

So, in other words, I recall thinking it can make a difference in continuous time finance models.

Yeah, it's as I remembered.

It matters for continuous time finance.

This is from, "An Introduction to the Mathematics of Financial Derivatives", 2nd Edition, 2000, by Salih N. Neftci:

Mean Square (m.s.) convergence is important because the Ito Integral is defined as the mean square limit of a certain sum. If one uses other definitions of convergence, this limit may not exist...Example Let St be an asset price observed at equidistant time points...It turns out that if St is a Wiener process, then Xn will not converge almost surely, but a mean square limit will exist. Hence the type of approximation one uses will make a difference. This important point is taken up during the discussion of the Ito integral in later chapters. (pages 113-114)

Fair enough, but I'm teaching econometrics, not continuous-time finance. And yet econometrics textbooks seem inordinately concerned with these distinctions:

It's a weakly convergent estimator! It's a strongly convergent estimator! It's a dessert topping!

1: "Why do we force our students to learn these things?"

2: Because we want them to understand what they are doing.

It's not what I do and I learned it. I understand the ideas behind it still and it has no relevance to the research I do.

Why? Barriers to entry.

Of course, that answer doesn't sound so good after reading RSJ's and Serlin's posts.

So Nick, if you weren't fumbling this material, what in the world of econometrics would you rather be teaching? Diagnostic techniques? Bootstrapping? Monte-Carlo simulation? Semi-parametric estimation?

See the 4th post in this topic.
http://www.econjobrumors.com/topic.php?id=4410&page=20

ConvergenceConcepts : an R Package to Investigate Various Modes of Convergence
Pierre Lafaye de Micheaux and Benoit Liquet

http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Lafaye~de~Micheaux+Liquet.pdf

One additional reason for the distinction is the continuous mapping theorem.

It is almost trivial that if Xn converges to X almost surely and f is continuous almost everywhere (that is, the set of discontinuities of f has probability zero under the distribution of X), then f(Xn) converges to f(X) almost surely. What we need in practice is versions of this that only assume convergence in probability or in distribution. The easiest way to prove the continuous mapping theorem for convergence in probability or in distribution seems to be via representation theorems that convert your sequence Xn into a sequence Yn that converges almost surely.

