Expected value

In probability theory, the expected value (also called expectation, expectancy, expectation operator, mathematical expectation, mean, expectation value, or first moment) is a generalization of the weighted average. Informally, the expected value is the arithmetic mean of the possible values a random variable can take, weighted by the probabilities of those outcomes. Since it is obtained by averaging, the expected value may not itself be a possible outcome; it is not the value one would "expect" to observe in any single trial.

The expected value of a random variable with a finite number of outcomes is a weighted average of all possible outcomes. In the case of a continuum of possible outcomes, the expectation is defined by integration. In the axiomatic foundation for probability provided by measure theory, the expectation is given by Lebesgue integration.

The expected value of a random variable <math>X</math> is often denoted by <math>\operatorname{E}(X)</math>, <math>\operatorname{E}[X]</math>, or <math>\operatorname{E}X</math>, with <math>\operatorname{E}</math> also often stylized as <math>E</math> or <math>\mathbb{E}.</math>[1][2][3]


History

The idea of the expected value originated in the middle of the 17th century from the study of the so-called problem of points, which seeks to divide the stakes in a fair way between two players, who have to end their game before it is properly finished.[4] This problem had been debated for centuries. Many conflicting proposals and solutions had been suggested over the years when it was posed to Blaise Pascal by French writer and amateur mathematician Chevalier de Méré in 1654. Méré claimed that this problem could not be solved and that it showed just how flawed mathematics was when it came to its application to the real world. Pascal, being a mathematician, was provoked and determined to solve the problem once and for all.

He began to discuss the problem in the famous series of letters to Pierre de Fermat. Soon enough, they both independently came up with a solution. They solved the problem in different computational ways, but their results were identical because their computations were based on the same fundamental principle. The principle is that the value of a future gain should be directly proportional to the chance of getting it. This principle seemed to have come naturally to both of them. They were very pleased by the fact that they had found essentially the same solution, and this in turn made them absolutely convinced that they had solved the problem conclusively; however, they did not publish their findings. They only informed a small circle of mutual scientific friends in Paris about it.[5]

In his treatise "De ratiociniis in ludo aleæ" on probability theory, published in 1657 just after a visit to Paris (see Huygens (1657)), the Dutch mathematician Christiaan Huygens considered the problem of points and presented a solution based on the same principle as the solutions of Pascal and Fermat. The book extended the concept of expectation by adding rules for calculating expectations in situations more complicated than the original problem (e.g., for three or more players), and can be seen as the first successful attempt at laying down the foundations of the theory of probability.


During his visit to France in 1655, Huygens learned about de Méré's problem. From his correspondence with Carcavine a year later (in 1656), he realized his method was essentially the same as Pascal's, so he knew of Pascal's priority in this subject before his book went to press in 1657.

In the mid-nineteenth century, Pafnuty Chebyshev became the first person to think systematically in terms of the expectations of random variables.[6]

Etymology

Neither Pascal nor Huygens used the term "expectation" in its modern sense.[7]

More than a hundred years later, in 1814, Pierre-Simon Laplace published his tract "Théorie analytique des probabilités", in which the concept of expected value was defined explicitly.[8]

Notations

The use of the letter <math>\operatorname{E}</math> to denote "expected value" goes back to W. A. Whitworth in 1901.[9] The symbol has since become popular among English writers. In German, <math>\operatorname{E}</math> stands for Erwartungswert, in Spanish for esperanza matemática, and in French for espérance mathématique.[10]

When "E" is used to denote "expected value", authors use a variety of stylizations: the expectation operator can be written as <math>\operatorname{E}</math> (upright), <math>E</math> (italic), or <math>\mathbb{E}</math> (in blackboard bold), while a variety of bracket notations (such as <math>\operatorname{E}(X)</math>, <math>\operatorname{E}[X]</math>, and <math>\operatorname{E}X</math>) are all used.

Another popular notation is <math>\mu_X</math>, whereas <math>\langle X \rangle</math>, <math>\langle X \rangle_{\mathrm{av}}</math>, and <math>\overline{X}</math> are commonly used in physics, and <math>M(X)</math> in Russian-language literature.

Definition

As discussed above, there are several context-dependent ways of defining the expected value. The simplest and original definition deals with the case of finitely many possible outcomes, such as in the flip of a coin. With the theory of infinite series, this can be extended to the case of countably many possible outcomes. It is also very common to consider the distinct case of random variables dictated by (piecewise-)continuous probability density functions, as these arise in many natural contexts. All of these specific definitions may be viewed as special cases of the general definition based upon the mathematical tools of measure theory and Lebesgue integration, which provide these different contexts with an axiomatic foundation and common language.

Any definition of expected value may be extended to define an expected value of a multidimensional random variable, i.e. a random vector <math>X</math>. It is defined component by component, as <math>\operatorname{E}[X]_i = \operatorname{E}[X_i]</math>. Similarly, one may define the expected value of a random matrix <math>X</math> with components <math>X_{ij}</math> by <math>\operatorname{E}[X]_{ij} = \operatorname{E}[X_{ij}]</math>.

Random variables with finitely many outcomes

Consider a random variable <math>X</math> with a finite list <math>x_1, \ldots, x_k</math> of possible outcomes, each of which (respectively) has probability <math>p_1, \ldots, p_k</math> of occurring. The expectation of <math>X</math> is defined as

<math>\operatorname{E}[X] =x_1p_1 + x_2p_2 + \cdots + x_kp_k.</math>

Since the probabilities must satisfy <math>p_1 + \cdots + p_k = 1</math>, it is natural to interpret <math>\operatorname{E}[X]</math> as a weighted average of the <math>x_i</math> values, with weights given by their probabilities <math>p_i</math>.

In the special case that all possible outcomes are equiprobable (that is, <math>p_1 = \cdots = p_k = \tfrac{1}{k}</math>), the weighted average is given by the standard average. In the general case, the expected value takes into account the fact that some outcomes are more likely than others.
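The defining weighted average can be sketched in a few lines of Python (an illustrative sketch; the helper name `expected_value` is ours, not standard):

```python
# Expected value of a random variable with finitely many outcomes:
# the weighted average E[X] = x1*p1 + ... + xk*pk.
def expected_value(outcomes, probabilities):
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(outcomes, probabilities))

# Fair six-sided die: all outcomes equiprobable, so the weighted
# average reduces to the ordinary average (1 + 2 + ... + 6)/6.
die = expected_value([1, 2, 3, 4, 5, 6], [1/6] * 6)
print(die)  # ≈ 3.5
```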

Examples

File:Largenumbers.svg
An illustration of the convergence of sequence averages of rolls of a die to the expected value of 3.5 as the number of rolls (trials) grows
  • Let <math>X</math> represent the outcome of a roll of a fair six-sided die. More specifically, <math>X</math> will be the number of pips showing on the top face of the die after the toss. The possible values for <math>X</math> are 1, 2, 3, 4, 5, and 6, all of which are equally likely with a probability of <math>\tfrac16</math>. The expectation of <math>X</math> is
<math>\operatorname{E}[X] = 1\cdot\frac16 + 2\cdot\frac16 + 3\cdot\frac16 + 4\cdot\frac16 + 5\cdot\frac16 + 6\cdot\frac16 = 3.5.</math>
If one rolls the die <math>n</math> times and computes the average (arithmetic mean) of the results, then as <math>n</math> grows, the average will almost surely converge to the expected value, a fact known as the strong law of large numbers.
  • The roulette game consists of a small ball and a wheel with 38 numbered pockets around the edge. As the wheel is spun, the ball bounces around randomly until it settles down in one of the pockets. Suppose random variable <math>X</math> represents the (monetary) outcome of a $1 bet on a single number ("straight up" bet). If the bet wins (which happens with probability <math>\tfrac{1}{38}</math> in American roulette), the payoff is $35; otherwise the player loses the bet. The expected profit from such a bet will be
<math>\operatorname{E}[\,\text{gain from }\$1\text{ bet}\,] = -\$1 \cdot \frac{37}{38} + \$35 \cdot \frac{1}{38} = -\$\frac{1}{19}.</math>
That is, the expected value to be won from a $1 bet is −$<math>\tfrac{1}{19}</math>. Thus, in 190 bets, the net loss will probably be about $10.
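The roulette computation can be reproduced exactly with rational arithmetic (an illustrative sketch using Python's `fractions` module):

```python
from fractions import Fraction

# American roulette, $1 straight-up bet: win $35 with probability 1/38,
# lose the $1 with probability 37/38. Exact rational arithmetic:
p_win = Fraction(1, 38)
ev = Fraction(35) * p_win - Fraction(1) * (1 - p_win)
print(ev)        # -1/19 dollars per bet
print(190 * ev)  # -10: about $10 lost over 190 bets, on average
```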

Random variables with countably infinitely many outcomes

Informally, the expectation of a random variable with a countably infinite set of possible outcomes is defined analogously as the weighted average of all possible outcomes, where the weights are given by the probabilities of realizing each given value. This is to say that

<math>\operatorname{E}[X] = \sum_{i=1}^\infty x_i\, p_i,</math>

where <math>x_1, x_2, \ldots</math> are the possible outcomes of the random variable <math>X</math> and <math>p_1, p_2, \ldots</math> are their corresponding probabilities. In many non-mathematical textbooks, this is presented as the full definition of expected values in this context.

However, there are some subtleties with infinite summation, so the above formula is not suitable as a mathematical definition. In particular, the Riemann series theorem of mathematical analysis illustrates that the value of certain infinite sums involving positive and negative summands depends on the order in which the summands are given. Since the outcomes of a random variable have no naturally given order, this creates a difficulty in defining expected value precisely.

For this reason, many mathematical textbooks only consider the case that the infinite sum given above converges absolutely, which implies that the infinite sum is a finite number independent of the ordering of summands. In the alternative case that the infinite sum does not converge absolutely, one says the random variable does not have finite expectation.

Examples

  • Suppose <math>x_i = i</math> and <math>p_i = \tfrac{c}{i2^i}</math> for <math>i = 1, 2, 3, \ldots,</math> where <math>c = \tfrac{1}{\ln 2}</math> is the scaling factor which makes the probabilities sum to 1. Then we have <math display="block">\operatorname{E}[X] \,= \sum_i x_i p_i = 1(\tfrac{c}{2}) + 2(\tfrac{c}{8}) + 3 (\tfrac{c}{24}) + \cdots \,= \, \tfrac{c}{2} + \tfrac{c}{4} + \tfrac{c}{8} + \cdots \,=\,  c \,=\, \tfrac{1}{\ln  2}.</math>
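This series can be checked numerically (an illustrative sketch; the series is truncated at 59 terms, beyond which the tail is far below double precision):

```python
import math

# x_i = i with probability p_i = c/(i * 2^i), c = 1/ln 2; then
# E[X] = c * sum_i 2^(-i) = c. Truncate at 59 terms (tail is ~1e-18).
c = 1 / math.log(2)
indices = range(1, 60)
probs = [c / (i * 2 ** i) for i in indices]
partial_expectation = sum(i * p for i, p in zip(indices, probs))
print(partial_expectation)  # ≈ 1.4427 = 1/ln 2
```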

Random variables with density

Now consider a random variable <math>X</math> which has a probability density function given by a function <math>f</math> on the real number line. This means that the probability of <math>X</math> taking on a value in any given open interval is given by the integral of <math>f</math> over that interval. The expectation of <math>X</math> is then given by the integral

<math>\operatorname{E}[X] = \int_{-\infty}^\infty x f(x)\, dx.</math>

A general and mathematically precise formulation of this definition uses measure theory and Lebesgue integration, and the corresponding theory of absolutely continuous random variables is described in the next section. The density functions of many common distributions are piecewise continuous, and as such the theory is often developed in this restricted setting. For such functions, it is sufficient to consider only the standard Riemann integration. Sometimes continuous random variables are defined as those corresponding to this special class of densities, although the term is used differently by various authors.

Analogously to the countably-infinite case above, there are subtleties with this expression due to the infinite region of integration. Such subtleties can be seen concretely if the distribution of <math>X</math> is given by the Cauchy distribution <math>\operatorname{Cauchy}(0,\pi)</math>, so that <math>f(x) = (x^2 + \pi^2)^{-1}</math>. It is straightforward to compute in this case that

<math>\int_a^b xf(x)\,dx=\int_a^b \frac{x}{x^2+\pi^2}\,dx=\frac{1}{2}\ln\frac{b^2+\pi^2}{a^2+\pi^2}.</math>

The limit of this expression as <math>a \to -\infty</math> and <math>b \to \infty</math> does not exist: if the limits are taken so that <math>a = -b</math>, then the limit is zero, while if the constraint <math>2a = -b</math> is taken, then the limit is <math>\ln 2</math>.

To avoid such ambiguities, in mathematical textbooks it is common to require that the given integral converges absolutely, with <math>\operatorname{E}[X]</math> left undefined otherwise. However, measure-theoretic notions as given below can be used to give a systematic definition of <math>\operatorname{E}[X]</math> for more general random variables <math>X</math>.
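The order-of-limits ambiguity in the Cauchy example can be seen numerically (a sketch; `partial_integral` is our name for the closed-form antiderivative evaluated between finite limits):

```python
import math

# Closed form of integral_a^b x f(x) dx for the Cauchy(0, pi) density
# f(x) = 1/(x^2 + pi^2): the value is (1/2) ln((b^2+pi^2)/(a^2+pi^2)).
def partial_integral(a, b):
    return 0.5 * math.log((b ** 2 + math.pi ** 2) / (a ** 2 + math.pi ** 2))

# Symmetric limits a = -b give 0, but the constraint 2a = -b gives ln 2:
print(partial_integral(-1e9, 1e9))  # 0.0
print(partial_integral(-1e9, 2e9))  # ≈ 0.6931 = ln 2
```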

Arbitrary real-valued random variables

All definitions of the expected value may be expressed in the language of measure theory. In general, if <math>X</math> is a real-valued random variable defined on a probability space <math>(\Omega, \Sigma, \operatorname{P})</math>, then the expected value of <math>X</math>, denoted by <math>\operatorname{E}[X]</math>, is defined as the Lebesgue integral

<math>\operatorname{E} [X] = \int_\Omega X\,d\operatorname{P}.</math>

Despite the newly abstract situation, this definition is extremely similar in nature to the very simplest definition of expected values, given above, as certain weighted averages. This is because, in measure theory, the value of the Lebesgue integral of <math>X</math> is defined via weighted averages of approximations of <math>X</math> which take on finitely many values. Moreover, if given a random variable with finitely or countably many possible values, the Lebesgue theory of expectation is identical with the summation formulas given above. However, the Lebesgue theory clarifies the scope of the theory of probability density functions. A random variable <math>X</math> is said to be absolutely continuous if any of a number of equivalent conditions is satisfied; one formulation is that there exists a nonnegative measurable function <math>f</math> on the real line such that

<math>\text{P}(X\in A)=\int_A f(x)\,dx,</math>
for any Borel set <math>A</math>, in which the integral is taken in the sense of Lebesgue.

These conditions are all equivalent, although this is nontrivial to establish. In this definition, <math>f</math> is called the probability density function of <math>X</math> (relative to Lebesgue measure). According to the change-of-variables formula for Lebesgue integration, combined with the law of the unconscious statistician, it follows that

<math>\operatorname{E}[X]\equiv\int_\Omega X\,d\operatorname{P}=\int_{\mathbb{R}}xf(x)\,dx</math>

for any absolutely continuous random variable Шаблон:Mvar. The above discussion of continuous random variables is thus a special case of the general Lebesgue theory, due to the fact that every piecewise-continuous function is measurable.

File:Roland Uhl 2023 Charakterisierung des Erwartungswertes Bild1.svg

The expected value of any real-valued random variable <math>X</math> can also be defined via the graph of its cumulative distribution function <math>F</math> by an equality of areas. In fact, <math>\operatorname{E}[X] = \mu</math> with a real number <math>\mu</math> if and only if the two regions in the <math>x</math>-<math>y</math>-plane, described by

<math> x\le\mu,\;\, 0\le y\le F(x) \quad</math> or <math>\quad x\ge\mu,\;\, F(x)\le y\le 1, </math> respectively, have the same finite area, i.e. if

<math> \int_{-\infty}^\mu F(x)\,dx = \int_\mu^\infty \big(1 - F(x)\big)\,dx </math> and both improper Riemann integrals converge. Finally, this is equivalent to the representation

<math> \operatorname{E}[X] = \int_0^\infty \big(1 - F(x)\big)\,dx - \int_{-\infty}^0 F(x)\,dx, </math> also with convergent integrals.[11]

Infinite expected values

Expected values as defined above are automatically finite numbers. However, in many cases it is fundamental to be able to consider expected values of <math>\pm\infty</math>. This is intuitive, for example, in the case of the St. Petersburg paradox, in which one considers a random variable with possible outcomes <math>x_i = 2^i</math>, with associated probabilities <math>p_i = 2^{-i}</math>, for <math>i</math> ranging over all positive integers. According to the summation formula in the case of random variables with countably many outcomes, one has <math display="block"> \operatorname{E}[X]= \sum_{i=1}^\infty x_i\,p_i =2\cdot \frac{1}{2}+4\cdot\frac{1}{4} + 8\cdot\frac{1}{8}+ 16\cdot\frac{1}{16}+ \cdots = 1 + 1 + 1 + 1 + \cdots.</math> It is natural to say that the expected value equals <math>+\infty</math>.

There is a rigorous mathematical theory underlying such ideas, which is often taken as part of the definition of the Lebesgue integral. The first fundamental observation is that, whichever of the above definitions are followed, any nonnegative random variable whatsoever can be given an unambiguous expected value; whenever absolute convergence fails, then the expected value can be defined as <math>+\infty</math>. The second fundamental observation is that any random variable can be written as the difference of two nonnegative random variables. Given a random variable <math>X</math>, one defines the positive and negative parts by <math>X^+ = \max(X, 0)</math> and <math>X^- = -\min(X, 0)</math>. These are nonnegative random variables, and it can be directly checked that <math>X = X^+ - X^-</math>. Since <math>\operatorname{E}[X^+]</math> and <math>\operatorname{E}[X^-]</math> are both then defined as either nonnegative numbers or <math>+\infty</math>, it is then natural to define: <math display="block"> \operatorname{E}[X] = \begin{cases} \operatorname{E}[X^+] - \operatorname{E}[X^-] & \text{if } \operatorname{E}[X^+] < \infty \text{ and } \operatorname{E}[X^-] < \infty;\\ +\infty & \text{if } \operatorname{E}[X^+] = \infty \text{ and } \operatorname{E}[X^-] < \infty;\\ -\infty & \text{if } \operatorname{E}[X^+] < \infty \text{ and } \operatorname{E}[X^-] = \infty;\\ \text{undefined} & \text{if } \operatorname{E}[X^+] = \infty \text{ and } \operatorname{E}[X^-] = \infty. \end{cases} </math>

According to this definition, <math>\operatorname{E}[X]</math> exists and is finite if and only if <math>\operatorname{E}[X^+]</math> and <math>\operatorname{E}[X^-]</math> are both finite. Due to the formula <math>|X| = X^+ + X^-</math>, this is the case if and only if <math>\operatorname{E}|X|</math> is finite, and this is equivalent to the absolute convergence conditions in the definitions above. As such, the present considerations do not define finite expected values in any cases not previously considered; they are only useful for infinite expectations.
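The decomposition into positive and negative parts can be illustrated for a finite-outcome variable (a sketch; the outcomes and probabilities are made up for illustration):

```python
# Positive/negative parts: X+ = max(X, 0), X- = -min(X, 0), X = X+ - X-.
outcomes = [-3, -1, 0, 2, 5]
probs = [0.1, 0.2, 0.3, 0.3, 0.1]

e_pos = sum(max(x, 0) * p for x, p in zip(outcomes, probs))   # E[X+]
e_neg = sum(max(-x, 0) * p for x, p in zip(outcomes, probs))  # E[X-]
e_direct = sum(x * p for x, p in zip(outcomes, probs))

# Both parts are finite, so E[X] = E[X+] - E[X-] matches the direct sum.
print(e_pos - e_neg, e_direct)  # 0.6 in both cases (up to rounding)
```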

Expected values of common distributions

The following table gives the expected values of some commonly occurring probability distributions. The third column gives the expected values both in the form immediately given by the definition, as well as in the simplified form obtained by computation therefrom. The details of these computations, which are not always straightforward, can be found in the indicated references.

Distribution Notation Mean E(X)
Bernoulli <math>X \sim b(1,p)</math> <math>0\cdot(1-p)+1\cdot p=p</math>
Binomial <math>X \sim B(n,p)</math> <math>\sum_{i=0}^n i{n\choose i}p^i(1-p)^{n-i}=np</math>
Poisson <math>X \sim \mathrm{Po}(\lambda)</math> <math>\sum_{i=0}^\infty \frac{ie^{-\lambda}\lambda^i}{i!}=\lambda</math>
Geometric <math>X \sim \mathrm{Geometric}(p)</math> <math>\sum_{i=1}^\infty ip(1-p)^{i-1}=\frac{1}{p}</math>
Uniform <math>X\sim U(a,b)</math> <math>\int_a^b \frac{x}{b-a}\,dx=\frac{a+b}{2}</math>
Exponential <math>X\sim \exp(\lambda)</math> <math>\int_0^\infty \lambda xe^{-\lambda x}\,dx=\frac{1}{\lambda}</math>
Normal <math>X\sim N(\mu,\sigma^2)</math> <math>\frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^\infty xe^{-(x-\mu)^2/2\sigma^2}\,dx=\mu</math>
Standard Normal <math>X\sim N(0,1)</math> <math>\frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty xe^{-x^2/2}\,dx=0</math>
Pareto <math>X\sim \mathrm{Par}(\alpha, k)</math> <math>\int_k^\infty\alpha k^\alpha x^{-\alpha}\,dx=\begin{cases}\frac{\alpha k}{\alpha-1}&\alpha>1\\ \infty&0 \leq \alpha \leq 1.\end{cases}</math>
Cauchy <math>X\sim \mathrm{Cauchy}(x_0,\gamma)</math> <math>\frac{1}{\pi}\int_{-\infty}^\infty \frac{\gamma x}{(x - x_0)^2 + \gamma^2}\,dx</math> is undefined
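Two of the table's entries can be verified by direct numerical integration (a sketch; `riemann` is our own midpoint-rule helper, and the infinite exponential range is truncated where the integrand is negligible):

```python
import math

# Midpoint-rule integrator (our own helper, not a library function).
def riemann(f, a, b, n=200_000):
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

# Uniform U(a, b): E[X] = integral_a^b x/(b-a) dx = (a+b)/2
a, b = 1.0, 4.0
mean_uniform = riemann(lambda x: x / (b - a), a, b)
print(mean_uniform)  # ≈ 2.5

# Exponential(lam): E[X] = integral_0^inf lam*x*e^(-lam*x) dx = 1/lam
lam = 0.5
mean_exp = riemann(lambda x: lam * x * math.exp(-lam * x), 0.0, 80.0)
print(mean_exp)  # ≈ 2.0
```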

Properties

The basic properties below (and their names in bold) replicate or follow immediately from those of the Lebesgue integral. Note that the letters "a.s." stand for "almost surely", a central notion of the Lebesgue theory. Basically, one says that an inequality like <math>X \geq 0</math> is true almost surely when the probability measure attributes zero mass to the complementary event <math>\left\{ X < 0 \right\}.</math>

  • Non-negativity: If <math>X \geq 0</math> (a.s.), then <math>\operatorname{E}[ X] \geq 0.</math>


  • Linearity of expectation:[12] The expected value operator (or expectation operator) <math>\operatorname{E}[\cdot]</math> is linear in the sense that, for any random variables <math>X</math> and <math>Y,</math> and a constant <math>a,</math> <math display="block">\begin{align}
 \operatorname{E}[X + Y] &=   \operatorname{E}[X] + \operatorname{E}[Y], \\
 \operatorname{E}[aX]    &= a \operatorname{E}[X],
\end{align} </math>

whenever the right-hand side is well-defined. By induction, this means that the expected value of the sum of any finite number of random variables is the sum of the expected values of the individual random variables, and the expected value scales linearly with a multiplicative constant. Symbolically, for <math>N</math> random variables <math>X_{i}</math> and constants <math>a_{i} (1\leq i \leq N),</math> we have <math display="inline"> \operatorname{E}\left[\sum_{i=1}^{N}a_{i}X_{i}\right] = \sum_{i=1}^{N}a_{i}\operatorname{E}[X_{i}].</math> If we think of the set of random variables with finite expected value as forming a vector space, then the linearity of expectation implies that the expected value is a linear form on this vector space.
  • Monotonicity: If <math>X\leq Y</math> (a.s.), and both <math>\operatorname{E}[X]</math> and <math>\operatorname{E}[Y]</math> exist, then <math>\operatorname{E}[X]\leq\operatorname{E}[Y].</math> The proof follows from the linearity and the non-negativity property applied to <math>Z=Y-X,</math> since <math>Z\geq 0</math> (a.s.).
  • Non-degeneracy: If <math>\operatorname{E}[|X|]=0,</math> then <math>X=0</math> (a.s.).
  • If <math>X = Y</math> (a.s.), then <math>\operatorname{E}[ X] = \operatorname{E}[ Y].</math> In other words, if X and Y are random variables that take different values with probability zero, then the expectation of X will equal the expectation of Y.
  • If <math>X=c</math> (a.s.) for some real number <math>c</math>, then <math>\operatorname{E}[X] = c.</math> In particular, for a random variable <math>X</math> with well-defined expectation, <math>\operatorname{E}[\operatorname{E}[X]] = \operatorname{E}[X].</math> A well-defined expectation is a single constant, so the expectation of that constant is just the original expected value.
  • As a consequence of the formula <math>|X| = X^+ + X^-</math> as discussed above, together with the triangle inequality, it follows that for any random variable <math>X</math> with well-defined expectation, one has <math>|\operatorname{E}[X]| \leq \operatorname{E}|X|.</math>
  • Let <math>\mathbf{1}_A</math> denote the indicator function of an event <math>A</math>; then <math>\operatorname{E}[\mathbf{1}_A]</math> is given by the probability of <math>A</math>. This is nothing but a different way of stating the expectation of a Bernoulli random variable, as calculated in the table above.
  • Formulas in terms of CDF: If <math>F(x)</math> is the cumulative distribution function of a random variable <math>X</math>, then
<math display="block">\operatorname{E}[X] = \int_{-\infty}^\infty x\,dF(x),</math>
where the values on both sides are well defined or not well defined simultaneously, and the integral is taken in the sense of Lebesgue–Stieltjes. As a consequence of integration by parts as applied to this representation of <math>\operatorname{E}[X]</math>, it can be proved that <math display="block"> \operatorname{E}[X] = \int_0^\infty (1-F(x))\,dx - \int^0_{-\infty} F(x)\,dx,</math> with the integrals taken in the sense of Lebesgue. As a special case, for any random variable <math>X</math> valued in the nonnegative integers <math>\{0, 1, 2, 3, \ldots\}</math>, one has <math display="block"> \operatorname{E}[X]=\sum _{n=0}^\infty \operatorname{P}(X>n),</math>
where <math>\operatorname{P}</math> denotes the underlying probability measure.
  • Non-multiplicativity: In general, the expected value is not multiplicative, i.e. <math>\operatorname{E}[XY]</math> is not necessarily equal to <math>\operatorname{E}[X]\cdot \operatorname{E}[Y].</math> If <math>X</math> and <math>Y</math> are independent, then one can show that <math>\operatorname{E}[XY]=\operatorname{E}[X] \operatorname{E}[Y].</math> If the random variables are dependent, then generally <math>\operatorname{E}[XY] \neq \operatorname{E}[X] \operatorname{E}[Y],</math> although in special cases of dependency the equality may hold.
  • Law of the unconscious statistician: The expected value of a measurable function of <math>X,</math> <math>g(X),</math> given that <math>X</math> has a probability density function <math>f(x),</math> is given by the inner product of <math>f</math> and <math>g</math>:[12] <math display="block">\operatorname{E}[g(X)] = \int_{\R} g(x) f(x)\, dx .</math> This formula also holds in the multidimensional case, when <math>g</math> is a function of several random variables, and <math>f</math> is their joint density.[12]
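The tail-sum special case can be checked against the direct summation formula for a geometric distribution (an illustrative sketch; the truncation at 2000 terms leaves a tail far below double precision):

```python
# Geometric(p) on {1, 2, 3, ...}: P(X > n) = (1-p)^n and E[X] = 1/p.
p = 0.25
N = 2000  # truncation point; (1-p)^N is astronomically small

# Tail-sum formula: E[X] = sum_{n >= 0} P(X > n)
tail_sum = sum((1 - p) ** n for n in range(N))

# Direct definition: E[X] = sum_n n * P(X = n), P(X = n) = p (1-p)^(n-1)
direct = sum(n * p * (1 - p) ** (n - 1) for n in range(1, N))

print(tail_sum, direct)  # both ≈ 4.0 = 1/p
```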

Inequalities

Concentration inequalities control the likelihood of a random variable taking on large values. Markov's inequality is among the best-known and simplest to prove: for a nonnegative random variable <math>X</math> and any positive number <math>a</math>, it states that <math display="block"> \operatorname{P}(X\geq a)\leq\frac{\operatorname{E}[X]}{a}. </math>

If <math>X</math> is any random variable with finite expectation, then Markov's inequality may be applied to the random variable <math>|X-\operatorname{E}[X]|^2</math> to obtain Chebyshev's inequality <math display="block"> \operatorname{P}(|X-\operatorname{E}[X]|\geq a)\leq\frac{\operatorname{Var}[X]}{a^2}, </math> where <math>\operatorname{Var}</math> denotes the variance. These inequalities are significant for their nearly complete lack of distributional assumptions. For example, for any random variable with finite expectation, the Chebyshev inequality implies that there is at least a 75% probability of an outcome being within two standard deviations of the expected value. However, in special cases the Markov and Chebyshev inequalities often give much weaker information than is otherwise available. For example, in the case of an unweighted die, Chebyshev's inequality says that the probability of rolling between 1 and 6 is at least 53%; in reality, the probability is of course 100%. The Kolmogorov inequality extends the Chebyshev inequality to the context of sums of random variables.
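For the die example, both bounds can be computed explicitly (a sketch; the exact probabilities follow from counting outcomes):

```python
# Fair die: E[X] = 3.5 and Var[X] = 35/12, computed by direct enumeration.
mean = sum(range(1, 7)) / 6                          # 3.5
var = sum((x - mean) ** 2 for x in range(1, 7)) / 6  # 35/12 ≈ 2.917

# Markov: P(X >= 5) <= E[X]/5; the exact probability is 2/6.
markov_bound = mean / 5
print(markov_bound)  # 0.7, versus the exact 0.333...

# Chebyshev: P(|X - 3.5| >= 2.5) <= Var[X]/2.5^2, i.e. the chance of
# landing outside [1, 6] is at most ~46.7%, so P(1 <= X <= 6) >= ~53.3%;
# in reality that probability is 100%.
cheb_bound = var / 2.5 ** 2
print(cheb_bound)  # ≈ 0.4667
```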

The following three inequalities are of fundamental importance in the field of mathematical analysis and its applications to probability theory.

  • Jensen's inequality: Let <math>f: \mathbb{R} \to \mathbb{R}</math> be a convex function and <math>X</math> a random variable with finite expectation. Then <math display="block">
f(\operatorname{E}(X)) \leq \operatorname{E} (f(X)). </math>

Part of the assertion is that the negative part of <math>f(X)</math> has finite expectation, so that the right-hand side is well-defined (possibly infinite). Convexity of <math>f</math> can be phrased as saying that the output of the weighted average of two inputs under-estimates the same weighted average of the two outputs; Jensen's inequality extends this to the setting of completely general weighted averages, as represented by the expectation. In the special case that <math>f(x) = |x|^{t/s}</math> for positive numbers <math>s < t</math>, one obtains the Lyapunov inequality <math display="block">
\left(\operatorname{E}|X|^s\right)^{1/s}\leq\left(\operatorname{E}|X|^t\right)^{1/t}. </math>

This can also be proved by the Hölder inequality. In measure theory, this is particularly notable for proving the inclusion <math>L^t \subset L^s</math> of [[Lp space|<math>L^p</math> spaces]], in the special case of probability spaces.

  • Hölder's inequality: If <math>p</math> and <math>q</math> satisfy <math>1/p + 1/q = 1</math> with <math>p, q > 1</math>, then <math display="block">
\operatorname{E}|XY|\leq(\operatorname{E}|X|^p)^{1/p}(\operatorname{E}|Y|^q)^{1/q} </math>
for any random variables <math>X</math> and <math>Y</math>. The special case of <math>p = q = 2</math> is called the Cauchy–Schwarz inequality, and is particularly well-known.

  • Minkowski's inequality: For any number <math>p \geq 1</math> and random variables <math>X</math> and <math>Y</math>, <math display="block">
\Bigl(\operatorname{E}|X+Y|^p\Bigr)^{1/p}\leq\Bigl(\operatorname{E}|X|^p\Bigr)^{1/p}+\Bigl(\operatorname{E}|Y|^p\Bigr)^{1/p}. </math> The Hölder and Minkowski inequalities can be extended to general measure spaces, and are often given in that context. By contrast, the Jensen inequality is special to the case of probability spaces.
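Jensen's and the Cauchy–Schwarz inequalities can be checked empirically; for sample averages they hold exactly, since an empirical mean is itself an expectation with respect to the empirical measure (a sketch with simulated, deliberately dependent variables):

```python
import math
import random

random.seed(0)

# Simulated sample: Y depends on X, so independence is not assumed.
xs = [random.gauss(0, 1) for _ in range(100_000)]
ys = [x + random.gauss(0, 1) for x in xs]

def E(zs):  # empirical mean, an expectation w.r.t. the empirical measure
    return sum(zs) / len(zs)

# Jensen with the convex function f(x) = x^2: (E[X])^2 <= E[X^2]
jensen_lhs = E(xs) ** 2
jensen_rhs = E([x * x for x in xs])

# Cauchy-Schwarz (Hoelder with p = q = 2): E|XY| <= sqrt(E[X^2] E[Y^2])
cs_lhs = E([abs(x * y) for x, y in zip(xs, ys)])
cs_rhs = math.sqrt(E([x * x for x in xs]) * E([y * y for y in ys]))

print(jensen_lhs <= jensen_rhs, cs_lhs <= cs_rhs)  # True True
```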

Expectations under convergence of random variables

In general, it is not the case that <math>\operatorname{E}[X_n] \to \operatorname{E}[X]</math> even if <math>X_n\to X</math> pointwise. Thus, one cannot interchange limits and expectation without additional conditions on the random variables. To see this, let <math>U</math> be a random variable distributed uniformly on <math>[0,1].</math> For <math>n\geq 1,</math> define a sequence of random variables

<math>X_n = n \cdot \mathbf{1}\left\{ U \in \left(0,\tfrac{1}{n}\right)\right\},</math>

with <math>{\mathbf 1}\{A\}</math> being the indicator function of the event <math>A.</math> Then, it follows that <math>X_n \to 0</math> pointwise. But, <math>\operatorname{E}[X_n] = n \cdot \operatorname{P}\left(U \in \left( 0, \tfrac{1}{n}\right) \right) = n \cdot \tfrac{1}{n} = 1</math> for each <math>n.</math> Hence, <math>\lim_{n \to \infty} \operatorname{E}[X_n] = 1 \neq 0 = \operatorname{E}\left[ \lim_{n \to \infty} X_n \right].</math>
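A simulation makes this counterexample concrete (a sketch; the sample size and seed are arbitrary):

```python
import random

random.seed(1)

# X_n = n * 1{U in (0, 1/n)}: X_n(u) -> 0 for every fixed u in (0, 1],
# yet E[X_n] = n * P(U in (0, 1/n)) = 1 for every n.
def sample_Xn(n, u):
    return n if 0 < u < 1 / n else 0

n = 1000
samples = [sample_Xn(n, random.random()) for _ in range(1_000_000)]
empirical_mean = sum(samples) / len(samples)
print(empirical_mean)  # close to 1, not 0
```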

Analogously, for a general sequence of random variables <math>\{ Y_n : n \geq 0\},</math> the expected value operator is not <math>\sigma</math>-additive, i.e.

<math>\operatorname{E}\left[\sum^\infty_{n=0} Y_n\right] \neq \sum^\infty_{n=0}\operatorname{E}[Y_n].</math>

An example is easily obtained by setting <math>Y_0 = X_1</math> and <math>Y_n = X_{n+1} - X_n</math> for <math>n \geq 1,</math> where <math>X_n</math> is as in the previous example.

A number of convergence results specify exact conditions which allow one to interchange limits and expectations, as specified below.

  • Monotone convergence theorem: Let <math>\{X_n : n \geq 0\}</math> be a sequence of random variables, with <math>0 \leq X_n \leq X_{n+1}</math> (a.s.) for each <math>n \geq 0.</math> Furthermore, let <math>X_n \to X</math> pointwise. Then, the monotone convergence theorem states that <math>\lim_n\operatorname{E}[X_n]=\operatorname{E}[X].</math> Using the monotone convergence theorem, one can show that expectation indeed satisfies countable additivity for non-negative random variables. In particular, let <math>\{X_i\}^\infty_{i=0}</math> be non-negative random variables. It follows from the monotone convergence theorem that <math display="block">
\operatorname{E}\left[\sum^\infty_{i=0}X_i\right] = \sum^\infty_{i=0}\operatorname{E}[X_i]. </math>

  • Fatou's lemma: Let <math>\{ X_n \geq 0 : n \geq 0\}</math> be a sequence of non-negative random variables. Fatou's lemma states that <math display="block">\operatorname{E}[\liminf_n X_n] \leq \liminf_n \operatorname{E}[X_n].</math> Corollary: let <math>X_n \geq 0</math> with <math>\operatorname{E}[X_n] \leq C</math> for all <math>n \geq 0.</math> If <math>X_n \to X</math> (a.s.), then <math>\operatorname{E}[X] \leq C.</math> The proof follows by observing that <math display="inline"> X = \liminf_n X_n</math> (a.s.) and applying Fatou's lemma.
  • Dominated convergence theorem: Let <math>\{X_n : n \geq 0 \}</math> be a sequence of random variables such that <math>X_n\to X</math> pointwise (a.s.), <math>|X_n|\leq Y</math> (a.s.), and <math>\operatorname{E}[Y]<\infty.</math> Then, according to the dominated convergence theorem,
    • <math>\operatorname{E}|X| \leq \operatorname{E}[Y] <\infty</math>;
    • <math>\lim_n\operatorname{E}[X_n]=\operatorname{E}[X]</math>;
    • <math>\lim_n\operatorname{E}|X_n - X| = 0.</math>
  • Uniform integrability: In some cases, the equality <math>\lim_n\operatorname{E}[X_n]=\operatorname{E}[\lim_n X_n]</math> holds when the sequence <math>\{X_n\}</math> is uniformly integrable.
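As a concrete illustration (not part of the article; the distribution and sample size are arbitrary choices), the monotone convergence theorem can be checked numerically: truncating an exponential random variable produces a monotone sequence whose expectations converge to the full expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # draws of X ~ Exp(1), E[X] = 1

# The truncations X_n = min(X, n) satisfy 0 <= X_n <= X_{n+1} and X_n -> X,
# so the monotone convergence theorem gives E[X_n] -> E[X].
means = [np.minimum(x, n).mean() for n in (1, 2, 4, 8)]
print(means)     # increasing, approaching E[X]
print(x.mean())  # Monte Carlo estimate of E[X] = 1
```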

Relationship with characteristic function

The probability density function <math>f_X</math> of a scalar random variable <math>X</math> is related to its characteristic function <math>\varphi_X</math> by the inversion formula:

<math>f_X(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\varphi_X(t) \, \mathrm{d}t.</math>

For the expected value of <math>g(X)</math> (where <math>g:{\mathbb R}\to{\mathbb R}</math> is a Borel function), we can use this inversion formula to obtain

<math>\operatorname{E}[g(X)] = \frac{1}{2\pi} \int_{\mathbb R} g(x)\left[ \int_{\mathbb R} e^{-itx}\varphi_X(t) \, \mathrm{d}t \right]\,\mathrm{d}x.</math>

If <math>\operatorname{E}[g(X)]</math> is finite, then changing the order of integration, in accordance with the Fubini–Tonelli theorem, gives

<math>\operatorname{E}[g(X)] = \frac{1}{2\pi} \int_{\mathbb R} G(t) \varphi_X(t) \, \mathrm{d}t,</math>

where

<math>G(t) = \int_{\mathbb R} g(x) e^{-itx} \, \mathrm{d}x</math>

is the Fourier transform of <math>g(x).</math> The expression for <math>\operatorname{E}[g(X)]</math> also follows directly from the Plancherel theorem.
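A numerical sketch of this relationship (an illustration, not from the article; the integration grids are arbitrary choices): for a standard normal variable the characteristic function is <math>e^{-t^2/2}</math> in closed form, so the inversion formula can be evaluated by quadrature and used to recover <math>\operatorname{E}[g(X)]</math> for <math>g(x)=x^2.</math>

```python
import numpy as np

def trapz(y, x):
    """Composite trapezoidal rule for samples y on grid x."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

def phi(t):
    # Characteristic function of the standard normal (known closed form).
    return np.exp(-t**2 / 2)

t = np.linspace(-10, 10, 4001)
x = np.linspace(-8, 8, 1601)

# Inversion formula: f_X(x) = (1 / 2*pi) * integral of e^{-itx} phi(t) dt.
f = np.array([trapz(np.exp(-1j * t * xi) * phi(t), t).real
              for xi in x]) / (2 * np.pi)

# E[g(X)] = integral of g(x) f_X(x) dx with g(x) = x^2; equals 1 for N(0, 1).
second_moment = trapz(x**2 * f, x)
print(second_moment)
```

The density recovered by inversion integrates to one, and the second moment matches the known value for the standard normal.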

Uses and applications

The expectation of a random variable plays an important role in a variety of contexts.

In statistics, where one seeks estimates for unknown parameters based on available data gained from samples, the sample mean serves as an estimate for the expectation, and is itself a random variable. In such settings, the sample mean meets the desirable criterion for a "good" estimator of being unbiased; that is, the expected value of the estimate is equal to the true value of the underlying parameter. Шаблон:See also

For a different example, in decision theory, an agent making an optimal choice in the context of incomplete information is often assumed to maximize the expected value of their utility function.

It is possible to construct an expected value equal to the probability of an event by taking the expectation of an indicator function that is one if the event has occurred and zero otherwise. This relationship can be used to translate properties of expected values into properties of probabilities, e.g. using the law of large numbers to justify estimating probabilities by frequencies.

The expected values of the powers of X are called the moments of X; the moments about the mean of X are expected values of powers of Шаблон:Math. The moments of some random variables can be used to specify their distributions, via their moment generating functions.

To empirically estimate the expected value of a random variable, one repeatedly measures observations of the variable and computes the arithmetic mean of the results. If the expected value exists, this procedure estimates the true expected value in an unbiased manner and has the property of minimizing the sum of the squares of the residuals (the sum of the squared differences between the observations and the estimate). The law of large numbers demonstrates (under fairly mild conditions) that, as the size of the sample gets larger, the variance of this estimate gets smaller.

This property is often exploited in a wide variety of applications, including general problems of statistical estimation and machine learning, to estimate (probabilistic) quantities of interest via Monte Carlo methods, since most quantities of interest can be written in terms of expectation, e.g. <math>\operatorname{P}({X \in \mathcal{A}}) = \operatorname{E}[{\mathbf 1}_{\mathcal{A}}],</math> where <math>{\mathbf 1}_{\mathcal{A}}</math> is the indicator function of the set <math>\mathcal{A}.</math>
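A minimal Monte Carlo sketch of this identity (illustrative only; the distribution, event, and sample size are arbitrary choices): the probability that a standard normal variable falls in <math>\mathcal{A} = [-1, 1]</math> is estimated as the sample mean of the indicator <math>{\mathbf 1}_{\mathcal{A}}.</math>

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=1_000_000)  # draws of X ~ N(0, 1)

# P(X in A) = E[1_A(X)]: the sample mean of the indicator function
# estimates the probability of the event A = [-1, 1].
p_hat = (np.abs(x) <= 1).mean()
print(p_hat)  # near 0.6827, the probability that a standard normal lies in [-1, 1]
```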

Файл:Beta first moment.svg
The mass of probability distribution is balanced at the expected value, here a Beta(α,β) distribution with expected value α/(α+β).

In classical mechanics, the center of mass is an analogous concept to expectation. For example, suppose <math>X</math> is a discrete random variable with values <math>x_i</math> and corresponding probabilities <math>p_i.</math> Now consider a weightless rod on which are placed weights, at locations <math>x_i</math> along the rod and having masses <math>p_i</math> (whose sum is one). The point at which the rod balances is <math>\operatorname{E}[X].</math>

Expected values can also be used to compute the variance, by means of the computational formula for the variance

<math>\operatorname{Var}(X)= \operatorname{E}[X^2] - (\operatorname{E}[X])^2.</math>
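For instance (a sketch using a fair six-sided die, an example not taken from the article), the computational formula can be evaluated directly from the definition of expectation:

```python
import numpy as np

# A fair six-sided die: values x_i with probabilities p_i = 1/6.
values = np.arange(1, 7)
probs = np.full(6, 1 / 6)

ex = np.sum(values * probs)       # E[X] = 3.5
ex2 = np.sum(values**2 * probs)   # E[X^2] = 91/6
var = ex2 - ex**2                 # Var(X) = E[X^2] - (E[X])^2 = 35/12
print(ex, var)
```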

A very important application of the expectation value is in the field of quantum mechanics. The expectation value of a quantum mechanical operator <math>\hat{A}</math> operating on a quantum state vector <math>|\psi\rangle</math> is written as <math>\langle\hat{A}\rangle = \langle\psi|\hat{A}|\psi\rangle.</math> The uncertainty in <math>\hat{A}</math> can be calculated by the formula <math>(\Delta A)^2 = \langle\hat{A}^2\rangle - \langle \hat{A} \rangle^2</math>.
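As a small numerical illustration (the operator and state below are arbitrary choices, not from the article), the expectation value and uncertainty can be computed directly with linear algebra:

```python
import numpy as np

A = np.array([[1, 0], [0, -1]], dtype=complex)  # a Hermitian operator (Pauli-Z)
psi = np.array([np.sqrt(0.75), np.sqrt(0.25)], dtype=complex)  # normalized state

exp_A = np.vdot(psi, A @ psi).real         # <A> = <psi|A|psi>
exp_A2 = np.vdot(psi, A @ (A @ psi)).real  # <A^2>
delta_A = np.sqrt(exp_A2 - exp_A**2)       # uncertainty Delta A
print(exp_A, delta_A)
```

Here `np.vdot` conjugates its first argument, which implements the bra <math>\langle\psi|</math> for a complex state vector.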

See also

References

Шаблон:Reflist

Bibliography

Шаблон:Refbegin

Шаблон:Refend

Шаблон:Theory of probability distributions

Шаблон:Authority control

  1. Шаблон:Cite web
  2. Шаблон:Cite web
  3. Шаблон:Cite book
  4. Шаблон:Cite book
  5. Шаблон:Cite journal
  6. Шаблон:Cite journal
  7. Шаблон:Cite web
  8. Шаблон:Cite book
  9. Whitworth, W.A. (1901) Choice and Chance with One Thousand Exercises. Fifth edition. Deighton Bell, Cambridge. [Reprinted by Hafner Publishing Co., New York, 1959.]
  10. Шаблон:Cite web
  11. Шаблон:Cite book pp. 2–4.
  12. Шаблон:Cite web