English Wikipedia: 68–95–99.7 rule

For an approximately normal data set, the values within one standard deviation of the mean account for about 68% of the set, values within two standard deviations for about 95%, and values within three standard deviations for about 99.7%. The percentages shown are rounded theoretical probabilities intended only to approximate the empirical data derived from a normal population.


Prediction interval (on the y-axis) as a function of the standard score (on the x-axis). The y-axis is logarithmically scaled, but the values on it are not modified.

In statistics, the 68–95–99.7 rule, also known as the empirical rule, is a shorthand used to remember the percentage of values that lie within an interval estimate in a normal distribution: 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively.

In mathematical notation, these facts can be expressed as follows, where Pr() is the probability function,^[1] X is an observation from a normally distributed random variable, μ (mu) is the mean of the distribution, and σ (sigma) is its standard deviation:

<math display="block">\begin{align}

 \Pr(\mu-1\sigma \le X \le \mu+1\sigma) & \approx 68.27\% \\
 \Pr(\mu-2\sigma \le X \le \mu+2\sigma) & \approx 95.45\% \\
 \Pr(\mu-3\sigma \le X \le \mu+3\sigma) & \approx 99.73\%

\end{align}</math>

The usefulness of this heuristic especially depends on the question under consideration.

In the empirical sciences, the so-called three-sigma rule of thumb (or 3σ rule) expresses a conventional heuristic that nearly all values are taken to lie within three standard deviations of the mean, and thus it is empirically useful to treat 99.7% probability as near certainty.^[2]

In the social sciences, a result may be considered "significant" if its confidence level is of the order of a two-sigma effect (95%), while in particle physics, there is a convention of a five-sigma effect (99.99994% confidence) being required to qualify as a discovery.

A weaker three-sigma rule can be derived from Chebyshev's inequality: even for non-normally distributed variables, at least 1 − 1/3² = 8/9 ≈ 88.9% of cases should fall within properly calculated three-sigma intervals. For unimodal distributions, the Vysochanskij–Petunin inequality raises this bound to 1 − 4/(9·3²) = 77/81 ≈ 95%. Under certain additional assumptions about the distribution, this probability can be shown to be at least 98%.^[3]
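The three lower bounds in the paragraph above can be compared numerically. The following is a minimal Python sketch; the function names are hypothetical, and the Vysochanskij–Petunin bound is used in its usual form 1 − 4/(9k²), which applies to unimodal distributions only when k > √(8/3).

```python
import math

def chebyshev_bound(k):
    """Lower bound on Pr(|X - mu| < k*sigma) for any distribution with finite variance."""
    return 1.0 - 1.0 / k**2

def vysochanskij_petunin_bound(k):
    """Lower bound for unimodal distributions; only valid for k > sqrt(8/3)."""
    if k <= math.sqrt(8.0 / 3.0):
        raise ValueError("the Vysochanskij-Petunin bound requires k > sqrt(8/3)")
    return 1.0 - 4.0 / (9.0 * k**2)

def normal_coverage(k):
    """Exact Pr(|X - mu| <= k*sigma) when X is normally distributed."""
    return math.erf(k / math.sqrt(2.0))

for k in (2, 3):
    print(f"k = {k}: Chebyshev >= {chebyshev_bound(k):.4f}, "
          f"Vysochanskij-Petunin >= {vysochanskij_petunin_bound(k):.4f}, "
          f"normal = {normal_coverage(k):.4f}")
```

The ordering Chebyshev < Vysochanskij–Petunin < normal reflects how each additional assumption (finite variance, then unimodality, then full normality) tightens the guaranteed coverage.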

Proof

We have <math display="block">\Pr(\mu -n\sigma \leq X \leq \mu + n\sigma) = \int_{\mu-n\sigma}^{\mu + n\sigma} \frac{1}{\sqrt{2\pi} \sigma} e^{-\frac{1}{2} \left(\frac{x-\mu}{\sigma}\right)^2} dx.</math> Making the change of variable <math>u = \frac{x - \mu}{\sigma}</math> (so that <math>dx = \sigma\, du</math> and the limits of integration become <math>\pm n</math>), we obtain

<math display="block">\Pr(\mu -n\sigma \leq X \leq \mu + n\sigma) = \frac{1}{\sqrt{2\pi}} \int_{-n}^{n} e^{-\frac{u^2}{2}}\,du,</math>

and this integral is independent of <math>\mu</math> and <math>\sigma</math>. It therefore suffices to evaluate it for <math>n = 1, 2, 3</math>.

<math display="block">\begin{align}

       &\Pr(\mu -1\sigma \leq X \leq \mu + 1\sigma) = \frac{1}{\sqrt{2\pi}} \int_{-1}^{1} e^{-\frac{u^2}{2}}du \approx 0.6827 \\
      &\Pr(\mu -2\sigma \leq X \leq \mu + 2\sigma) =\frac{1}{\sqrt{2\pi}}\int_{-2}^{2} e^{-\frac{u^2}{2}}du \approx 0.9545 \\
      &\Pr(\mu -3\sigma \leq X \leq \mu + 3\sigma) = \frac{1}{\sqrt{2\pi}}\int_{-3}^{3} e^{-\frac{u^2}{2}}du \approx 0.9973.

\end{align}</math>
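As a numerical check of these values, the integral can be approximated with composite Simpson's rule. This Python sketch uses only the standard library; the function names and the default of 1,000 subintervals are illustrative choices, not part of the original derivation.

```python
import math

def std_normal_pdf(u):
    """Standard normal density: exp(-u**2 / 2) / sqrt(2*pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def simpson(f, a, b, steps=1000):
    """Composite Simpson's rule for the integral of f over [a, b]; steps must be even."""
    if steps % 2:
        raise ValueError("steps must be even")
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)  # odd nodes weight 4, even nodes weight 2
    return total * h / 3.0

for n in (1, 2, 3):
    print(f"n = {n}: {simpson(std_normal_pdf, -n, n):.4f}")
```

Because the integrand is smooth, the step size of 0.002 or less used here is far finer than needed for four decimal places.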

Cumulative distribution function


Diagram showing the cumulative distribution function for the normal distribution with mean (μ) 0 and variance (σ²) 1

These numerical values "68%, 95%, 99.7%" come from the cumulative distribution function of the normal distribution.

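The probability covered by the prediction interval μ ± zσ for any standard score z corresponds numerically to <math>2\Phi(z) - 1</math>, where Φ is the cumulative distribution function of the standard normal distribution.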

For example, <math>\Phi(2) \approx 0.9772</math>, that is, <math>\Pr(X \le \mu + 2\sigma) \approx 0.9772</math>, which corresponds to the one-sided prediction interval <math>(-\infty, \mu + 2\sigma]</math>. This is not a symmetric interval: it is merely the probability that an observation is less than μ + 2σ. To compute the probability that an observation lies within two standard deviations of the mean, use the symmetry <math>\Phi(-z) = 1 - \Phi(z)</math> (small differences are due to rounding): <math display="block">\Pr(\mu-2\sigma \le X \le \mu+2\sigma)
= \Phi(2) - \Phi(-2)
\approx 0.9772 - (1 - 0.9772)
\approx 0.9545
</math>
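The same computation can be written with the error function, using the identity Φ(z) = (1 + erf(z/√2))/2. A minimal Python sketch, assuming only the standard-library `math.erf`; the helper names `phi` and `two_sided_coverage` are hypothetical:

```python
import math

def phi(z):
    """Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sided_coverage(z):
    """Pr(mu - z*sigma <= X <= mu + z*sigma) = Phi(z) - Phi(-z) = 2*Phi(z) - 1."""
    return phi(z) - phi(-z)

print(f"Phi(2)           = {phi(2):.4f}")                 # one-sided: Pr(X <= mu + 2*sigma)
print(f"Phi(2) - Phi(-2) = {two_sided_coverage(2):.4f}")  # symmetric interval
```

Working from the unrounded Φ(2) avoids the small rounding discrepancy noted in the hand calculation.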

This is related to the confidence interval as used in statistics: <math>\bar{X} \pm 2\frac{\sigma}{\sqrt{n}}</math> is approximately a 95% confidence interval for the mean when <math>\bar{X}</math> is the average of a sample of size <math>n</math> drawn from a normal population with known standard deviation <math>\sigma</math>.
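The approximate 95% coverage can be illustrated by simulation. This hedged Python sketch assumes σ is known, draws samples with `random.Random.gauss`, and counts how often X̄ ± 2σ/√n contains μ; the parameter values (μ = 10, σ = 3, n = 25), the seed, and the function name are arbitrary illustrative choices. The observed fraction should be close to 0.9545, not exactly equal to it.

```python
import math
import random

def ci_coverage(mu, sigma, n, trials, seed=0):
    """Fraction of simulated samples whose interval xbar ± 2*sigma/sqrt(n) contains mu."""
    rng = random.Random(seed)           # fixed seed so the run is reproducible
    half_width = 2.0 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) <= half_width:
            hits += 1
    return hits / trials

print(f"empirical coverage: {ci_coverage(mu=10.0, sigma=3.0, n=25, trials=20_000):.4f}")
```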

Normality tests

The "68–95–99.7 rule" is often used to quickly get a rough probability estimate of something, given its standard deviation, if the population is assumed to be normal. It is also used as a simple test for outliers if the population is assumed normal, and as a normality test if the population is potentially not normal.

To pass from a sample to a number of standard deviations, one first computes the deviation, either the error or residual depending on whether one knows the population mean or only estimates it. The next step is standardizing (dividing by the population standard deviation), if the population parameters are known, or studentizing (dividing by an estimate of the standard deviation), if the parameters are unknown and only estimated.
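A minimal sketch of the two scalings described above, in Python. The function names are hypothetical; `studentize` here divides residuals from the sample mean by the sample standard deviation (`statistics.stdev`, which uses an n − 1 denominator). This is a simplification: studentized residuals in regression also adjust for leverage.

```python
import statistics

def standardize(xs, mu, sigma):
    """Errors from the known population mean, divided by the known population sd."""
    return [(x - mu) / sigma for x in xs]

def studentize(xs):
    """Residuals from the sample mean, divided by the sample sd (simplified)."""
    m = statistics.mean(xs)
    s = statistics.stdev(xs)  # estimate of sigma with an n - 1 denominator
    return [(x - m) / s for x in xs]

sample = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2]
print(standardize(sample, mu=10.0, sigma=0.5))  # population parameters known
print(studentize(sample))                       # parameters estimated from the data
```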

To use the rule as a test for outliers or as a normality test, one computes the size of deviations in terms of standard deviations and compares the result to the expected frequency. Given a sample set, one can compute the studentized residuals and compare these to the expected frequency: points that fall more than 3 standard deviations from the mean are likely outliers (unless the sample is large enough that a few values this extreme are expected), and if there are many points more than 3 standard deviations from the mean, one likely has reason to question the assumed normality of the distribution. This holds ever more strongly for moves of 4 or more standard deviations.

One can compute this more precisely by approximating the number of extreme moves of a given magnitude or greater with a Poisson distribution. More simply, under normality a sample of size 1,000 is expected to contain only about 0.06 moves of 4 or more standard deviations, so if one observes several such moves, one has strong reason to consider them outliers or to question the assumed normality of the distribution.
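A hedged sketch of the Poisson approximation mentioned above, in Python: under normality, the number of |z| > k moves in n observations is approximately Poisson with mean λ = n · Pr(|Z| > k), where Pr(|Z| > k) = erfc(k/√2). The sample size n = 1,000 and threshold k = 4 follow the paragraph; the function names are hypothetical.

```python
import math

def tail_prob(k):
    """Two-sided normal tail probability Pr(|Z| > k) = erfc(k / sqrt(2))."""
    return math.erfc(k / math.sqrt(2.0))

def poisson_at_least(m, lam):
    """Pr(N >= m) for N ~ Poisson(lam), via the complement of the first m terms."""
    below = sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(m))
    return 1.0 - below

def count_beyond(zs, k):
    """Number of standardized values whose magnitude exceeds k."""
    return sum(1 for z in zs if abs(z) > k)

n, k = 1000, 4.0
lam = n * tail_prob(k)  # expected count of |z| > 4 moves if the data really are normal
print(f"expected number of 4-sigma moves in {n} points: {lam:.3f}")
print(f"Pr(two or more such moves): {poisson_at_least(2, lam):.4f}")
```

Given observed z-scores `zs`, the count `count_beyond(zs, 4.0)` can then be compared with `lam`; a count of two or more has a probability of only about 0.2% under the normal model.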

For example, a 6σ event corresponds to a chance of about two parts per billion. For illustration, if events are taken to occur daily, this would correspond to an event expected every 1.4 million years. This gives a simple normality test: if one witnesses a 6σ event in daily data and significantly fewer than 1 million years have passed, then a normal distribution most likely does not provide a good model for the magnitude or frequency of large deviations in this respect.
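The arithmetic behind the 6σ figure can be reproduced in a few lines of Python. This sketch assumes a two-sided event (|Z| > 6), one observation per day, and a 365.25-day year; the result should agree with the rounded 1.4 million years quoted above.

```python
import math

p = math.erfc(6 / math.sqrt(2))  # two-sided Pr(|Z| > 6), about two parts per billion
days = 1 / p                      # expected days between such events, one observation per day
years = days / 365.25             # assumption: 365.25-day year
print(f"p = {p:.3e}; about one event every {years / 1e6:.2f} million years")
```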

In The Black Swan, Nassim Nicholas Taleb gives the example of risk models according to which the Black Monday crash would correspond to a 36-σ event: the occurrence of such an event should instantly suggest that the model is flawed, i.e. that the process under consideration is not satisfactorily modeled by a normal distribution. Refined models should then be considered, e.g. by the introduction of stochastic volatility. In such discussions it is important to be aware of the problem of the gambler's fallacy, which states that a single observation of a rare event does not contradict that the event is in fact rare. It is the observation of a plurality of purportedly rare events that increasingly undermines the hypothesis that they are rare, i.e. the validity of the assumed model. A proper modelling of this process of gradual loss of confidence in a hypothesis would involve the assignment of prior probabilities not just to the hypothesis itself but to all possible alternative hypotheses. For this reason, statistical hypothesis testing works not so much by confirming a hypothesis considered to be likely, but by refuting hypotheses considered unlikely.

Table of numerical values

Because the tails of the normal distribution decrease faster than exponentially, the odds of larger deviations drop very quickly. Assuming normally distributed data and one observation per day:

Range	Expected fraction of population inside range	Expected fraction of population outside range	Approx. expected frequency outside range	Approx. frequency for daily event
μ ± 0.5σ	0.3829	6.171E-01 = 61.71 %	3 in 5	Four or five times a week
μ ± σ	0.6827^[4]	3.173E-01 = 31.73 %	1 in 3	Twice or thrice a week
μ ± 1.5σ	0.8664	1.336E-01 = 13.36 %	2 in 15	Weekly
μ ± 2σ	0.9545^[5]	4.550E-02 = 4.550 %	1 in 22	Every three weeks
μ ± 2.5σ	0.98758	1.242E-02 = 1.242 %	1 in 81	Quarterly
μ ± 3σ	0.9973^[6]	2.700E-03 = 0.270 % = 2.700 ‰	1 in 370	Yearly
μ ± 3.5σ	0.9995347	4.653E-04 = 0.04653 % = 465.3 ppm	1 in 2149	Every 6 years
μ ± 4σ	0.99993666	6.334E-05 = 63.34 ppm	1 in 1.579E+04	Every 43 years (twice in a lifetime)
μ ± 4.5σ	0.999993205	6.795E-06 = 6.795 ppm	1 in 1.472E+05	Every 403 years (once in the modern era)
μ ± 5σ	0.9999994267	5.733E-07 = 0.5733 ppm = 573.3 ppb	1 in 1.744E+06	Every 4776 years (once in recorded history)
μ ± 5.5σ	0.99999996202	3.798E-08 = 37.98 ppb	1 in 2.633E+07	Every 72,088 years (thrice in history of modern humankind)
μ ± 6σ	0.999999998027	1.973E-09 = 1.973 ppb	1 in 5.068E+08	Every 1.38 million years (twice in history of humankind)
μ ± 6.5σ	0.99999999991968	8.032E-11 = 0.08032 ppb = 80.32 ppt	1 in 1.245E+10	Every 34 million years (twice since the extinction of dinosaurs)
μ ± 7σ	0.99999999999744	2.560E-12 = 2.560 ppt	1 in 3.907E+11	Every 1.07 billion years (four occurrences in history of Earth)
μ ± 7.5σ	0.99999999999993618	6.382E-14 = 63.82 ppq	1 in 1.567E+13	Once every 43 billion years (never in the history of the Universe, twice in the future of the Local Group before its merger)
μ ± 8σ	0.999999999999998756	1.244E-15 = 1.244 ppq	1 in 8.037E+14	Once every 2.2 trillion years (never in the history of the Universe, once during the life of a red dwarf)
μ ± xσ	<math>\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)</math>	<math>1-\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)</math>	1 in <math>\tfrac{1}{1-\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)}</math>	Every <math>\tfrac{1}{1-\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)}</math> days
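The rows of this table can be regenerated from the formulas in its last row. A minimal Python sketch, assuming a 365.25-day year for the daily-event column; the function name is hypothetical. It computes the outside fraction with `math.erfc` rather than as 1 − erf, because subtracting a number close to 1 from 1 loses most of its significant digits in the far tail (a design choice, not something the table specifies).

```python
import math

DAYS_PER_YEAR = 365.25  # assumption: Julian year for the "daily event" column

def table_row(k):
    """Return (inside, outside, one_in, years) for the range mu ± k*sigma."""
    outside = math.erfc(k / math.sqrt(2.0))  # Pr(|X - mu| > k*sigma), accurate in the far tail
    inside = 1.0 - outside                   # erf(k / sqrt(2)); close to 1, so tail digits are imprecise
    one_in = 1.0 / outside                   # "1 in N" expected frequency outside the range
    years = one_in / DAYS_PER_YEAR           # mean waiting time with one observation per day
    return inside, outside, one_in, years

for i in range(1, 17):
    k = 0.5 * i
    inside, outside, one_in, years = table_row(k)
    print(f"mu ± {k}σ  inside={inside:.12f}  outside={outside:.3E}  "
          f"1 in {one_in:,.0f}  every {years:.3g} years")
```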

References

[1] Template:Cite book
[2] This usage of "three-sigma rule" entered common usage in the 2000s, e.g. cited in:
- Template:Cite book
- Template:Cite book
[3] See:
- Template:Cite book
- Template:Cite book
- Template:Cite journal
[4] Template:Cite OEIS
[5] Template:Cite OEIS
[6] Template:Cite OEIS

External links

"The Normal Distribution" by Balasubramanian Narasimhan
"Calculate percentage proportion within x sigmas" at WolframAlpha
