English Wikipedia: Dvoretzky–Kiefer–Wolfowitz inequality

The above chart shows an example application of the DKW inequality in constructing confidence bounds (in purple) around an empirical distribution function (in light blue). In this random draw, the true CDF (orange) is entirely contained within the DKW bounds.

In the theory of probability and statistics, the Dvoretzky–Kiefer–Wolfowitz–Massart inequality (DKW inequality) provides a bound on the worst-case distance between an empirically determined distribution function and its associated population distribution function. It is named after Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz, who in 1956 proved the inequality

<math>

\Pr\Bigl(\sup_{x\in\mathbb R} |F_n(x) - F(x)| > \varepsilon \Bigr) \le Ce^{-2n\varepsilon^2}\qquad \text{for every }\varepsilon>0, </math>

with an unspecified multiplicative constant C in front of the exponential on the right-hand side.^[1]

In 1990, Pascal Massart proved the inequality with the sharp constant C = 2,^[2] confirming a conjecture due to Birnbaum and McCarty.^[3] In 2021, Michael Naaman proved the multivariate version of the DKW inequality and generalized Massart's tightness result to the multivariate case, which results in a sharp constant of twice the dimension k of the space in which the observations are found: C = 2k.^[4]

The DKW inequality

Given a natural number n, let X₁, X₂, …, X_n be real-valued independent and identically distributed random variables with cumulative distribution function F(·). Let F_n denote the associated empirical distribution function defined by

<math>

   F_n(x) = \frac1n \sum_{i=1}^n \mathbf{1}_{\{X_i\leq x\}},\qquad x\in\mathbb{R}.
 </math>

Here <math>F(x)</math> is the probability that a single random variable <math>X</math> is less than or equal to <math>x</math>, and <math>F_n(x)</math> is the fraction of the <math>n</math> observations that are less than or equal to <math>x</math>.
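As an illustration of the definition, the following minimal Python sketch evaluates an empirical distribution function. The function name `empirical_cdf` and the sample values are illustrative choices and do not come from the source.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return F_n, the empirical distribution function of `sample`."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def F_n(x):
        # bisect_right counts the observations X_i with X_i <= x,
        # which matches the indicator 1{X_i <= x} in the definition.
        return bisect_right(ordered, x) / n

    return F_n


# Hypothetical sample of four observations, one value repeated.
F_n = empirical_cdf([0.3, 1.2, 1.2, 2.5])
print(F_n(0.0), F_n(1.2), F_n(3.0))  # prints: 0.0 0.75 1.0
```

Sorting once and using binary search keeps each evaluation at logarithmic cost, which is convenient when the function is evaluated at many points.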

The Dvoretzky–Kiefer–Wolfowitz inequality bounds the probability that the random function F_n differs from F by more than a given constant ε > 0 anywhere on the real line. More precisely, there is the one-sided estimate

<math>

   \Pr\Bigl(\sup_{x\in\mathbb R} \bigl(F_n(x) - F(x)\bigr) > \varepsilon \Bigr) \le e^{-2n\varepsilon^2}\qquad \text{for every }\varepsilon\geq\sqrt{\tfrac{1}{2n}\ln2},
 </math>

which also implies a two-sided estimate^[5]

<math>

   \Pr\Bigl(\sup_{x\in\mathbb R} |F_n(x) - F(x)| > \varepsilon \Bigr) \le 2e^{-2n\varepsilon^2}\qquad \text{for every }\varepsilon>0.
 </math>
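One way to see why the one-sided estimate yields the two-sided one is sketched below. The sketch assumes, which is not stated above, that the same one-sided bound also holds for the deviation in the other direction, <math>\sup_{x\in\mathbb R} \bigl(F(x) - F_n(x)\bigr)</math>. For <math>\varepsilon \geq \sqrt{\tfrac{1}{2n}\ln2}</math>, a union bound over the two directions gives

<math>

   \Pr\Bigl(\sup_{x\in\mathbb R} |F_n(x) - F(x)| > \varepsilon \Bigr) \le \Pr\Bigl(\sup_{x\in\mathbb R} \bigl(F_n(x) - F(x)\bigr) > \varepsilon \Bigr) + \Pr\Bigl(\sup_{x\in\mathbb R} \bigl(F(x) - F_n(x)\bigr) > \varepsilon \Bigr) \le 2e^{-2n\varepsilon^2},
 </math>

while for <math>0 < \varepsilon < \sqrt{\tfrac{1}{2n}\ln2}</math> one has <math>2n\varepsilon^2 < \ln 2</math>, so that <math>2e^{-2n\varepsilon^2} > 1</math> and the two-sided bound holds trivially.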

The DKW inequality strengthens the Glivenko–Cantelli theorem by quantifying the rate of convergence as n tends to infinity. It also bounds the tail probability of the Kolmogorov–Smirnov statistic. The inequalities above follow from the case where F is the uniform distribution on [0,1],^[6] since F_n has the same distribution as G_n(F), where G_n is the empirical distribution function of independent Uniform(0,1) random variables U₁, U₂, …, U_n, and since

<math>

   \sup_{x\in\mathbb R} |F_n(x) - F(x)| \; \stackrel{d}{=} \; \sup_{x \in \mathbb R} | G_n (F(x)) - F(x) | \le \sup_{0 \le t \le 1} | G_n (t) -t | ,
 </math>

with equality if and only if F is continuous.
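The two-sided bound in the uniform case can be checked numerically. The following Python sketch simulates Uniform(0,1) samples, computes the supremum distance between G_n and the identity, and compares the observed exceedance frequency with the bound 2e^{-2nε²}. The function names, the seed, and the parameter values (n = 50, ε = 0.2, 2000 trials) are illustrative assumptions, and a simulation of this kind can only suggest, not prove, that the bound holds.

```python
import math
import random


def ks_distance_uniform(sample):
    """Return sup over t in [0, 1] of |G_n(t) - t| for a Uniform(0,1) sample."""
    ordered = sorted(sample)
    n = len(ordered)
    # G_n is a step function, so the supremum is attained at an order
    # statistic u: either at the jump ((i + 1)/n - u) or just before it
    # (u - i/n), where i is the zero-based rank of u.
    return max(
        max((i + 1) / n - u, u - i / n)
        for i, u in enumerate(ordered)
    )


def exceedance_frequency(n, eps, trials, seed=0):
    """Fraction of simulated samples whose sup distance exceeds eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if ks_distance_uniform(sample) > eps:
            hits += 1
    return hits / trials


n, eps = 50, 0.2
bound = 2 * math.exp(-2 * n * eps**2)
freq = exceedance_frequency(n, eps, trials=2000)
print(f"simulated frequency: {freq:.4f}, DKW bound: {bound:.4f}")
```

Because the frequency is itself a random estimate, it can occasionally exceed the bound by a small amount through sampling noise even though the underlying probability does not.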

Multivariate case

In the multivariate case, X₁, X₂, …, X_n is an i.i.d. sequence of k-dimensional vectors. If F_n is the multivariate empirical cdf, then

<math>

   \Pr\Bigl(\sup_{t\in\mathbb R^k} |F_n(t) - F(t)| > \varepsilon \Bigr) \le (n+1)ke^{-2n\varepsilon^2} 
 </math>

for every ε, n, k > 0. The (n + 1) term can be replaced with a 2 for any sufficiently large n.^[4]
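To get a feel for the two multiplicative factors, the following Python sketch evaluates the right-hand side of the multivariate bound with either the factor (n + 1)k or the factor 2k. The function name `multivariate_dkw_bound`, its `asymptotic` flag, and the capping at 1 are assumptions made for this illustration. The text states only that the factor 2 is valid for sufficiently large n, so the code does not attempt to decide when n is large enough.

```python
import math


def multivariate_dkw_bound(n, k, eps, asymptotic=False):
    """Upper bound on Pr(sup_t |F_n(t) - F(t)| > eps) for k-dimensional data.

    asymptotic=False uses the factor (n + 1) * k, stated for every n.
    asymptotic=True uses the factor 2 * k, stated only for sufficiently
    large n; choosing it is the caller's responsibility.
    """
    if n <= 0 or k <= 0 or eps <= 0:
        raise ValueError("n, k and eps must be positive")
    factor = 2 * k if asymptotic else (n + 1) * k
    # A probability bound above 1 carries no information, so cap it at 1.
    return min(1.0, factor * math.exp(-2 * n * eps**2))


# Hypothetical setting: 1000 two-dimensional observations, eps = 0.1.
finite_n = multivariate_dkw_bound(1000, 2, 0.1)
large_n = multivariate_dkw_bound(1000, 2, 0.1, asymptotic=True)
print(f"factor (n+1)k: {finite_n:.3e}, factor 2k: {large_n:.3e}")
```

For k = 1 the asymptotic variant reduces to the univariate bound 2e^{-2nε²}.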

Kaplan–Meier estimator

A Dvoretzky–Kiefer–Wolfowitz-type inequality also holds for the Kaplan–Meier estimator, which is the analog of the empirical distribution function for right-censored data:

<math>

   \Pr\Bigl(\sqrt n\sup_{t\in[0,\infty)} |(1-G(t))(F_n(t) - F(t))| > \varepsilon \Bigr) \le 2.5 e^{-2\varepsilon^2 + C\varepsilon} 
 </math>

for every <math>\varepsilon > 0</math> and for some constant <math>C <\infty</math>, where <math>F_n</math> is the Kaplan–Meier estimator, and <math>G</math> is the censoring distribution function.^[7]

Building CDF bands


The Dvoretzky–Kiefer–Wolfowitz inequality is one method for generating CDF-based confidence bounds and producing a confidence band, which is sometimes called the Kolmogorov–Smirnov confidence band. The purpose of this confidence band is to contain the entire CDF at the specified confidence level, whereas alternative approaches aim to achieve the confidence level only at each individual point, which can allow for a tighter bound. The DKW bounds run parallel to the empirical CDF, at an equal distance above and below it. Because the band has the same width everywhere, the rate of violations differs across the support of the distribution. In particular, it is more common for a CDF to fall outside the band estimated using the DKW inequality near the median of the distribution than near its endpoints.

The interval that contains the true CDF, <math>F(x)</math>, with probability <math>1-\alpha</math> is often specified as

<math>

   F_n(x) - \varepsilon \le F(x) \le F_n(x) + \varepsilon \; \text{ where } \varepsilon = \sqrt{\frac{\ln\frac{2}{\alpha}}{2n}}
 </math>

which is also a special case of the asymptotic procedure for the multivariate case,^[4] whereby one uses the following critical value

<math>

 \frac{d(\alpha,k)}{\sqrt n}  = \sqrt{\frac{\ln\frac{2k}{\alpha}}{2n}}
 </math>

for the multivariate test; one may replace 2k with k(n + 1) for a test that holds for all n; moreover, the multivariate test described by Naaman can be generalized to account for heterogeneity and dependence.
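The univariate band can be computed directly from the formula for ε. The following Python sketch is one possible implementation under stated assumptions: the names `dkw_epsilon`, `dkw_band`, and `sample_size_for_half_width` are hypothetical, the clipping of the band to [0, 1] is a refinement not mentioned in the text (it is harmless because F takes values in [0, 1]), and the sample-size helper simply rearranges the formula for ε to solve for n.

```python
import math
from bisect import bisect_right


def dkw_epsilon(n, alpha):
    """Half-width of the DKW band at confidence level 1 - alpha."""
    return math.sqrt(math.log(2 / alpha) / (2 * n))


def dkw_band(sample, alpha=0.05):
    """Return (lower, upper) functions bounding F with probability >= 1 - alpha."""
    ordered = sorted(sample)
    n = len(ordered)
    eps = dkw_epsilon(n, alpha)

    def F_n(x):
        # Fraction of observations less than or equal to x.
        return bisect_right(ordered, x) / n

    def lower(x):
        # Clipping at 0 is an added refinement, valid since F(x) >= 0.
        return max(0.0, F_n(x) - eps)

    def upper(x):
        # Clipping at 1 is an added refinement, valid since F(x) <= 1.
        return min(1.0, F_n(x) + eps)

    return lower, upper


def sample_size_for_half_width(eps, alpha):
    """Smallest n for which dkw_epsilon(n, alpha) <= eps."""
    return math.ceil(math.log(2 / alpha) / (2 * eps**2))


# Hypothetical data: ten evenly spaced observations.
lower, upper = dkw_band([i / 10 for i in range(1, 11)], alpha=0.05)
print(lower(0.5), upper(0.5))
print(sample_size_for_half_width(0.05, 0.05))  # prints: 738
```

With only ten observations the half-width is large, which shows why the band is mainly useful for moderate to large samples; the sample-size helper makes the trade-off between n and ε explicit.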

References

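[1] Dvoretzky, A.; Kiefer, J.; Wolfowitz, J. (1956), "Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator", Annals of Mathematical Statistics.

[2] Massart, P. (1990), "The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality", Annals of Probability.

[3] Birnbaum, Z. W.; McCarty, R. C. (1958), "A distribution-free upper confidence bound for Pr{Y<X}, based on independent samples of X and Y", Annals of Mathematical Statistics.

[4] Naaman, Michael (2021), "On the tight constant in the multivariate Dvoretzky–Kiefer–Wolfowitz inequality", Statistics & Probability Letters.

[5] Kosorok, M. R. (2008), Introduction to Empirical Processes and Semiparametric Inference, Springer.

[6] Shorack, G. R.; Wellner, J. A. (1986), Empirical Processes with Applications to Statistics, Wiley.

[7] Bitouzé, D.; Laurent, B.; Massart, P. (1999), "A Dvoretzky–Kiefer–Wolfowitz type inequality for the Kaplan–Meier estimator", Annales de l'Institut Henri Poincaré, Probabilités et Statistiques.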
