English Wikipedia: Glivenko–Cantelli theorem



File:Donsker theorem for uniform distributions.gif
The left diagram illustrates the Glivenko–Cantelli theorem for uniform distributions. The right diagram illustrates the Donsker–Skorokhod–Kolmogorov theorem.
File:Donsker theorem for normal distributions.gif
The same diagram for normal distributions.

In the theory of probability, the Glivenko–Cantelli theorem (sometimes referred to as the Fundamental Theorem of Statistics), named after Valery Ivanovich Glivenko and Francesco Paolo Cantelli, describes the asymptotic behaviour of the empirical distribution function as the number of independent and identically distributed observations grows.[1] Specifically, the empirical distribution function converges uniformly to the true distribution function almost surely.

The uniform convergence of more general empirical measures becomes an important property of the Glivenko–Cantelli classes of functions or sets.[2] The Glivenko–Cantelli classes arise in Vapnik–Chervonenkis theory, with applications to machine learning. Applications can be found in econometrics making use of M-estimators.

Statement

Assume that <math>X_1,X_2,\dots</math> are independent and identically distributed random variables in <math>\mathbb{R}</math> with common cumulative distribution function <math>F(x)</math>. The empirical distribution function for <math>X_1,\dots,X_n</math> is defined by

<math>F_n(x)=\tfrac{1}{n}\sum_{i=1}^n I_{[X_i, \infty)}(x) = \tfrac{1}{n}\ \biggl|\left\{\ i \ \mid X_i \leq x, \ 1 \leq i \leq n \right\}\biggr|</math>

where <math>I_C</math> is the indicator function of the set <math>\ C ~.</math> For every fixed <math>\ x\ ,</math> the sequence of random variables <math>\ F_n(x)\ </math> converges to <math>F(x)</math> almost surely by the strong law of large numbers. Glivenko and Cantelli strengthened this result by proving uniform convergence of <math>\ F_n\ </math> to <math>\ F ~.</math>

Theorem

<math>\|F_n - F\|_\infty = \sup_{x \in \mathbb{R}} \biggl|F_n(x) - F(x)\biggr| \longrightarrow 0</math> almost surely.[3]

This theorem originates with Valery Glivenko[4] and Francesco Cantelli,[5] in 1933.
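
The statement can be checked numerically. The following is a minimal sketch, assuming a Uniform(0, 1) target so that <math>F(x)=x</math> on <math>[0,1]</math>; the function names `sup_distance` and `uniform_cdf` are illustrative and do not come from any library. It computes <math>\|F_n - F\|_\infty</math> exactly from the order statistics, using the fact that <math>F_n</math> only changes at sample points.

```python
import random


def sup_distance(sample, cdf):
    """Return sup_x |F_n(x) - F(x)| for a continuous distribution function F.

    F_n jumps at each order statistic, so the supremum is approached either
    at a sample point or immediately to its left.
    """
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # F_n equals i/n at x and (i-1)/n immediately to the left of x.
        worst = max(worst, i / n - fx, fx - (i - 1) / n)
    return worst


def uniform_cdf(x):
    """Distribution function of Uniform(0, 1)."""
    return min(1.0, max(0.0, x))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily
    for n in (10, 100, 1_000, 10_000):
        sample = [rng.random() for _ in range(n)]
        print(n, round(sup_distance(sample, uniform_cdf), 4))
```

On a typical run the printed distances shrink as <math>n</math> grows, which is what the theorem predicts for almost every sample path; a single simulation illustrates the claim but does not prove it.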

Remarks

Proof

For simplicity, consider the case of a continuous random variable <math>X</math>. Fix <math>-\infty =x_0<x_1<\cdots <x_{m-1}<x_m=\infty</math> such that <math>F(x_j)-F(x_{j-1})=\frac{1}{m}</math> for <math>j=1,\dots,m</math>. Then for every <math>x \in \mathbb{R}</math> there exists <math>j \in \{1,\dots,m\}</math> such that <math>x \in [x_{j-1},x_j]</math>. Because both <math>F_n</math> and <math>F</math> are nondecreasing, it follows that

<math>\begin{align}
F_n(x)-F(x) &\leq F_n(x_j)-F(x_{j-1}) = F_n(x_j)-F(x_j)+\frac1m,\\
F_n(x)-F(x) &\geq F_n(x_{j-1})-F(x_j) = F_n(x_{j-1})-F(x_{j-1})-\frac1m.
\end{align}</math>

Therefore,

<math>\|F_n-F\|_\infty = \sup_{x\in \mathbb{R}}|F_n(x)-F(x)| \leq \max_{j\in\{1,\dots,m\}} |F_n(x_j)-F(x_j)| + \frac1m.</math>

Since <math display="inline">\max_{j\in\{1,\dots,m\}} |F_n(x_j)-F(x_j)| \to 0</math> almost surely by the strong law of large numbers (the maximum runs over finitely many points), the following holds on an event of probability one: for any <math display="inline">\varepsilon>0</math> and any integer <math display="inline">m</math> with <math display="inline">1/m<\varepsilon</math>, there exists <math display="inline">N</math>, depending on the sample path, such that <math display="inline">\max_{j\in\{1,\dots,m\}} |F_n(x_j)-F(x_j)|\leq \varepsilon-1/m</math> for all <math>n \geq N</math>. Combined with the bound above, this gives <math display="inline">\|F_n-F\|_\infty \leq \varepsilon</math> for all <math>n \geq N</math>, which is precisely almost sure convergence of <math display="inline">\|F_n-F\|_\infty</math> to zero.
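
The key inequality of the proof can be checked numerically. The sketch below assumes a Uniform(0, 1) target, for which the grid points are simply <math>x_j = j/m</math>; the helper names `ecdf_at`, `grid_bound` and `sup_distance_uniform` are hypothetical and chosen only for this illustration.

```python
import bisect
import random


def ecdf_at(sorted_sample, x):
    """F_n(x): fraction of observations that are <= x."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)


def grid_bound(sorted_sample, m):
    """max_j |F_n(x_j) - F(x_j)| + 1/m for Uniform(0, 1), with x_j = j/m."""
    # The endpoints x_0 = -inf and x_m = +inf contribute 0, so only the
    # interior grid points j = 1, ..., m-1 need to be checked.
    worst = max(
        (abs(ecdf_at(sorted_sample, j / m) - j / m) for j in range(1, m)),
        default=0.0,
    )
    return worst + 1.0 / m


def sup_distance_uniform(sorted_sample):
    """Exact sup_x |F_n(x) - x| for a Uniform(0, 1) target."""
    n = len(sorted_sample)
    return max(
        max(i / n - x, x - (i - 1) / n)
        for i, x in enumerate(sorted_sample, start=1)
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily
    sample = sorted(rng.random() for _ in range(1_000))
    for m in (2, 10, 100):
        # The left value should never exceed the right one.
        print(m, sup_distance_uniform(sample), grid_bound(sample, m))
```

Increasing <math>m</math> shrinks the <math>1/m</math> term, while the finite maximum is controlled by the strong law of large numbers; the proof balances these two contributions.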

Empirical measures

One can generalize the empirical distribution function by replacing the set <math>(-\infty,x]</math> by an arbitrary set C from a class of sets <math>\mathcal{C}</math> to obtain an empirical measure indexed by sets <math>C \in \mathcal{C}.</math>

<math>P_n(C)=\frac{1}{n} \sum_{i=1}^n I_C(X_i), C\in\mathcal{C}</math>

where <math>I_C(x)</math> is the indicator function of the set <math>C</math>.

A further generalization is the map induced by <math>P_n</math> on measurable real-valued functions f, which is given by

<math>f\mapsto P_nf=\int_Sf \, dP_n = \frac 1 n \sum_{i=1}^n f(X_i), f\in\mathcal{F}.</math>

Then it becomes an important property of these classes whether the strong law of large numbers holds uniformly on <math>\mathcal{F}</math> or <math>\mathcal{C}</math>.
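
As a concrete illustration of the two definitions, here is a small sketch with an arbitrary made-up sample; the function names `empirical_measure` and `empirical_integral` are illustrative only. Sets are represented by membership predicates.

```python
def empirical_measure(sample, in_set):
    """P_n(C): fraction of observations X_i that fall in the set C."""
    return sum(1 for x in sample if in_set(x)) / len(sample)


def empirical_integral(sample, f):
    """P_n f: average of f(X_i) over the sample."""
    return sum(f(x) for x in sample) / len(sample)


def half_line(x):
    """Membership test for C = (-inf, 2]."""
    return x <= 2.0


sample = [1.0, 2.0, 3.0, 4.0]  # illustrative data, not from the article

# With C = (-inf, 2], P_n(C) coincides with F_n(2).
print(empirical_measure(sample, half_line))          # 0.5
print(empirical_integral(sample, lambda x: x * x))   # 7.5
```

Taking <math>f = I_C</math> in the second function recovers the first, which is why the function-indexed version is the more general one.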

Glivenko–Cantelli class

Consider a set <math>\ \mathcal{S}\ </math> with a sigma algebra of Borel subsets <math>A</math> and a probability measure <math>\ \mathbb{P} ~.</math> For a class of subsets,

<math> \mathcal{C} \subset \Bigl\{ C: C \text{ is measurable subset of }\mathcal{S} \Bigr\} </math>

and a class of functions

<math> \mathcal{F} \subset \Bigl\{ f:\mathcal{S}\to \mathbb{R}, f \mbox{ is measurable}\ \Bigr\} </math>

define random variables

<math> \Bigl\| \mathbb{P}_n - \mathbb{P} \Bigr\|_{\mathcal C} = \sup_{C \in {\mathcal C}} \Bigl| \mathbb{P}_n(C) - \mathbb{P}(C) \Bigr| </math>
<math> \Bigl\| \mathbb{P}_n - \mathbb{P} \Bigr\|_{\mathcal F} = \sup_{f \in {\mathcal F}} \Bigl| \mathbb{P}_n f - \mathbb{P} f \Bigr| </math>

where <math>\ \mathbb{P}_n(C)\ </math> is the empirical measure, <math>\ \mathbb{P}_n f\ </math> is the corresponding map, and

<math>\ \mathbb{P} f = \int_\mathcal{S} f \ \mathrm{d}\mathbb{P}\ ,</math> assuming that it exists.

Definitions

  • A class <math>\ \mathcal C\ </math> is called a Glivenko–Cantelli class (or GC class, or sometimes strong GC class) with respect to a probability measure <math>\mathbb{P}</math> if
<math>\ \Bigl\| \mathbb{P}_n - \mathbb{P} \Bigr\|_\mathcal{C} \to 0\ </math> almost surely as <math>\ n \to \infty ~.</math>
  • A class <math>\ \mathcal C\ </math> is a weak Glivenko–Cantelli class with respect to <math>\mathbb{P}</math> if it instead satisfies the weaker condition
<math>\ \Bigl\| \mathbb{P}_n - \mathbb{P} \Bigr\|_\mathcal{C} \to 0\ </math> in probability as <math>\ n \to \infty ~.</math>
  • A class is called a universal Glivenko–Cantelli class if it is a GC class with respect to any probability measure <math>\mathbb{P}</math> on <math>(\mathcal{S}, A)</math>.
  • A class is a weak uniform Glivenko–Cantelli class if the convergence occurs uniformly over the set <math>\mathcal{P}(\mathcal{S}, A)</math> of all probability measures <math>\mathbb{P}</math> on <math>(\mathcal{S}, A)</math>: for every <math>\varepsilon > 0</math>,
<math>\ \sup_{\mathbb{P} \in \mathcal{P}(\mathcal{S},A)} \Pr\left(\Bigl\| \mathbb{P}_n - \mathbb{P} \Bigr\|_\mathcal{C} > \varepsilon\right) \to 0\ </math> as <math>\ n \to \infty ~.</math>
  • A class is a (strong) uniform Glivenko–Cantelli class if it satisfies the stronger condition that for every <math>\varepsilon > 0</math>,
<math>\ \sup_{\mathbb{P} \in \mathcal{P}(\mathcal{S},A)} \Pr\left(\sup_{m \geq n} \Bigl\| \mathbb{P}_m - \mathbb{P} \Bigr\|_\mathcal{C} > \varepsilon\right) \to 0\ </math> as <math>\ n \to \infty ~.</math>

Glivenko–Cantelli classes of functions (as well as their uniform and universal forms) are defined similarly, replacing all instances of <math>\mathcal{C}</math> with <math>\mathcal{F}</math>.

The weak and strong versions of the various Glivenko-Cantelli properties often coincide under certain regularity conditions. The following definition commonly appears in such regularity conditions:

  • A class of functions <math>\mathcal{F}</math> is image-admissible Suslin if there exists a Suslin space <math>\Omega</math> and a surjection <math>T:\Omega \rightarrow \mathcal{F}</math> such that the map <math>(x, y) \mapsto [T(y)](x)</math> is measurable on <math>\mathcal{S}\times\Omega</math>.
  • A class of measurable sets <math>\mathcal{C}</math> is image-admissible Suslin if the class of functions <math>\{\mathbf{1}_C \mid C\in\mathcal{C}\}</math> is image-admissible Suslin, where <math>\mathbf{1}_C</math> denotes the indicator function for the set <math>C</math>.


Theorems

The following two theorems give sufficient conditions for the weak and strong versions of the Glivenko-Cantelli property to be equivalent.

Theorem (Talagrand, 1987)[6]

Let <math>\mathcal{F}</math> be a class of functions that are integrable with respect to <math>\mathbb{P}</math>, and define <math>\mathcal{F}_0 = \{f - \mathbb{P}f \mid f\in \mathcal{F}\}</math>. Then the following are equivalent:
  • <math>\mathcal{F}</math> is a weak Glivenko-Cantelli class and <math>\mathcal{F}_0</math> is dominated by an integrable function
  • <math>\mathcal{F}</math> is a Glivenko-Cantelli class


Theorem (Dudley, Giné, and Zinn, 1991)[7]

Suppose that a function class <math>\mathcal{F}</math> is bounded. Also suppose that the set <math>\mathcal{F}_0 = \{f - \inf f \mid f\in \mathcal{F}\}</math> is image-admissible Suslin. Then <math>\mathcal{F}</math> is a weak uniform Glivenko-Cantelli class if and only if it is a strong uniform Glivenko-Cantelli class.

The following theorem is central to statistical learning of binary classification tasks.

Theorem (Vapnik and Chervonenkis, 1968)[8]

Under certain consistency conditions, a universally measurable class of sets <math>\ \mathcal{C}\ </math> is a uniform Glivenko-Cantelli class if and only if it is a Vapnik–Chervonenkis class.

Several different consistency conditions yield the equivalence of uniform Glivenko–Cantelli and Vapnik–Chervonenkis classes. In particular, either of the following conditions on a class <math>\mathcal{C}</math> suffices:[9]

  • <math>\mathcal{C}</math> is image-admissible Suslin.
  • <math>\mathcal{C}</math> is universally separable: there exists a countable subset <math>\mathcal{C}_0</math> of <math>\mathcal{C}</math> such that each set <math>C\in\mathcal{C}</math> can be written as the pointwise limit of sets in <math>\mathcal{C}_0</math>.

Examples

  • Let <math>S=\mathbb R</math> and <math>{\mathcal C}=\{(-\infty,t]:t\in {\mathbb R}\}</math>. The classical Glivenko–Cantelli theorem implies that this class is a universal GC class. Furthermore, by Kolmogorov's theorem,
<math>\sup_{P\in \mathcal{P}(S,A)}\|P_n-P\|_{\mathcal C} \sim n^{-1/2}</math>, that is, <math>\mathcal{C}</math> is a uniform Glivenko–Cantelli class.
  • Let P be a nonatomic probability measure on S and let <math>\mathcal{C}</math> be the class of all finite subsets of S. Because <math>A_n=\{X_1,\ldots,X_n\}\in \mathcal{C}</math>, <math>P(A_n)=0</math>, and <math>P_n(A_n)=1</math>, we have <math>\|P_n-P\|_{\mathcal C}=1</math>, and so <math>\mathcal{C}</math> is not a GC class with respect to P.
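
The second example can be illustrated with a short sketch. It assumes Uniform(0, 1) draws as a stand-in for a nonatomic P (floating-point numbers are only an approximation of this), and the value <math>P(A_n)=0</math> is entered by hand as a consequence of nonatomicity rather than computed. The function name `empirical_measure_of_set` is hypothetical.

```python
import random


def empirical_measure_of_set(sample, finite_set):
    """P_n(A): fraction of observations that belong to the finite set A."""
    return sum(1 for x in sample if x in finite_set) / len(sample)


rng = random.Random(0)  # fixed seed, chosen arbitrarily
for n in (10, 1_000):
    sample = [rng.random() for _ in range(n)]
    a_n = set(sample)   # A_n = {X_1, ..., X_n}, a finite set
    p_n = empirical_measure_of_set(sample, a_n)
    p = 0.0             # P(A_n) = 0 for nonatomic P (stated, not computed)
    print(n, abs(p_n - p))  # the gap is 1.0 for every n
```

Because the set <math>A_n</math> is chosen after seeing the data, the gap never shrinks, which is exactly why the class of all finite subsets is too rich to be a GC class.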
