Hammersley–Clifford theorem

The Hammersley–Clifford theorem is a result in probability theory, mathematical statistics and statistical mechanics that gives necessary and sufficient conditions under which a strictly positive probability distribution can be represented by a Markov network (also known as a Markov random field). It is the fundamental theorem of random fields.[1] It states that a probability distribution with a strictly positive mass or density satisfies one of the Markov properties with respect to an undirected graph <math>G</math> if and only if it is a Gibbs random field, that is, its density can be factorized over the cliques (or complete subgraphs) of the graph.
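
Concretely, for a collection of random variables <math>X = (X_v)_{v \in V}</math> indexed by the vertices of <math>G</math>, the Gibbs condition asserts a factorization of the form

<math>\Pr(X = x) = \frac{1}{Z} \prod_{C \in \operatorname{cl}(G)} \phi_C(x_C)</math>

where <math>\operatorname{cl}(G)</math> denotes the set of cliques of <math>G</math>, each <math>\phi_C</math> is a strictly positive function depending only on the coordinates <math>x_C</math>, and <math>Z</math> is a normalizing constant.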

The relationship between Markov and Gibbs random fields was initiated by Roland Dobrushin[2] and Frank Spitzer[3] in the context of statistical mechanics. The theorem is named after John Hammersley and Peter Clifford, who proved the equivalence in an unpublished paper in 1971.[4][5] Simpler proofs using the inclusion–exclusion principle were given independently by Geoffrey Grimmett,[6] Preston[7] and Sherman[8] in 1973, with a further proof by Julian Besag in 1974.[9]

Proof outline

File:A simple Markov network.png
A simple Markov network for demonstrating that any Gibbs random field satisfies every Markov property.

It is straightforward to show that a Gibbs random field satisfies every Markov property. As an example, consider the following:

In the image above, a Gibbs random field over the provided graph has the form <math>\Pr(A,B,C,D,E,F) \propto f_1(A,B,D)f_2(A,C,D)f_3(C,D,F)f_4(C,E,F)</math>. If variables <math>C</math> and <math>D</math> are fixed, then the global Markov property requires that <math>A, B \perp E, F | C, D</math> (see conditional independence), since <math>C, D</math> forms a barrier between <math>A, B</math> and <math>E, F</math>.

With <math>C</math> and <math>D</math> constant, <math>\Pr(A,B,E,F|C=c,D=d) \propto [f_1(A,B,d)f_2(A,c,d)] \cdot [f_3(c,d,F)f_4(c,E,F)] = g_1(A,B)g_2(E,F)</math> where <math>g_1(A,B) = f_1(A,B,d)f_2(A,c,d)</math> and <math>g_2(E,F) = f_3(c,d,F)f_4(c,E,F)</math>. This implies that <math>A, B \perp E, F | C, D</math>.
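
This cancellation is easy to check numerically. Below is a minimal sketch (illustrative, not part of the original argument; the binary state space and random potentials are assumptions made for the demonstration) that instantiates the four factors as random strictly positive tables, builds the joint distribution, and verifies the claimed conditional independence:

<syntaxhighlight lang="python">
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Random strictly positive potentials, indexed as
# f1[a, b, d], f2[a, c, d], f3[c, d, f], f4[c, e, f].
f1, f2, f3, f4 = (rng.uniform(0.1, 1.0, size=(2, 2, 2)) for _ in range(4))

# Joint mass Pr(A, B, C, D, E, F) over {0, 1}^6, normalized at the end.
p = np.zeros((2,) * 6)
for a, b, c, d, e, f in itertools.product(range(2), repeat=6):
    p[a, b, c, d, e, f] = f1[a, b, d] * f2[a, c, d] * f3[c, d, f] * f4[c, e, f]
p /= p.sum()

# Condition on C = c, D = d and test that (A, B) is independent of (E, F).
for c, d in itertools.product(range(2), repeat=2):
    cond = p[:, :, c, d, :, :]
    cond = cond / cond.sum()            # Pr(A, B, E, F | C = c, D = d)
    ab = cond.sum(axis=(2, 3))          # Pr(A, B | C = c, D = d)
    ef = cond.sum(axis=(0, 1))          # Pr(E, F | C = c, D = d)
    assert np.allclose(cond, ab[:, :, None, None] * ef[None, None, :, :])
print("A, B independent of E, F given C, D: verified.")
</syntaxhighlight>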

To establish that every positive probability distribution satisfying the local Markov property is also a Gibbs random field, the following lemma, which provides a means of combining different factorizations, must first be proved:

File:Merging two factorizations of a positive mass function.png
Lemma 1 provides a means for combining factorizations as shown in this diagram. Note that in this image, the overlap between sets is ignored.

Lemma 1

Let <math>U</math> denote the set of all random variables under consideration, and let <math>\Theta, \Phi_1, \Phi_2, \dots, \Phi_n \subseteq U</math> and <math>\Psi_1, \Psi_2, \dots, \Psi_m \subseteq U</math> denote arbitrary sets of variables. (Here, given an arbitrary set of variables <math>X</math>, <math>X</math> will also denote an arbitrary assignment to the variables from <math>X</math>.)

If

<math>\Pr(U) = f(\Theta)\prod_{i=1}^n g_i(\Phi_i) = \prod_{j=1}^m h_j(\Psi_j)</math>

for functions <math>f, g_1, g_2, \dots, g_n</math> and <math>h_1, h_2, \dots, h_m</math>, then there exist functions <math>h'_1, h'_2, \dots, h'_m</math> and <math>g'_1, g'_2, \dots, g'_n</math> such that

<math>\Pr(U) = \bigg(\prod_{j=1}^m h'_j(\Theta \cap \Psi_j)\bigg)\bigg(\prod_{i=1}^n g'_i(\Phi_i)\bigg)</math>

In other words, <math>\prod_{j=1}^m h_j(\Psi_j)</math> provides a template for further factorization of <math>f(\Theta)</math>.

Proof of Lemma 1

In order to use <math>\prod_{j=1}^m h_j(\Psi_j)</math> as a template to further factorize <math>f(\Theta)</math>, all variables outside of <math>\Theta</math> need to be fixed. To this end, let <math>\bar{\theta}</math> be an arbitrary fixed assignment to the variables from <math>U \setminus \Theta</math> (the variables not in <math>\Theta</math>). For an arbitrary set of variables <math>X</math>, let <math>\bar{\theta}[X]</math> denote the assignment <math>\bar{\theta}</math> restricted to the variables from <math>X \setminus \Theta</math> (the variables from <math>X</math>, excluding the variables from <math>\Theta</math>).

Moreover, to factorize only <math>f(\Theta)</math>, the other factors <math>g_1(\Phi_1), g_2(\Phi_2), \dots, g_n(\Phi_n)</math> need to be rendered moot for the variables from <math>\Theta</math>. To do this, the factorization

<math>\Pr(U) = f(\Theta)\prod_{i=1}^n g_i(\Phi_i)</math>

will be re-expressed as

<math>\Pr(U) = \bigg(f(\Theta)\prod_{i=1}^n g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])\bigg)\bigg(\prod_{i=1}^n \frac{g_i(\Phi_i)}{g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])}\bigg)</math>

For each <math>i = 1, 2, \dots, n</math>: <math>g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])</math> is <math>g_i(\Phi_i)</math> where all variables outside of <math>\Theta</math> have been fixed to the values prescribed by <math>\bar{\theta}</math>.

Let <math>f'(\Theta) = f(\Theta)\prod_{i=1}^n g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])</math> and <math>g'_i(\Phi_i) = \frac{g_i(\Phi_i)}{g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])}</math> for each <math>i = 1, 2, \dots, n</math> so

<math>\Pr(U) = f'(\Theta)\prod_{i=1}^n g'_i(\Phi_i) = \prod_{j=1}^m h_j(\Psi_j)</math>

What is most important is that <math>g'_i(\Phi_i) = \frac{g_i(\Phi_i)}{g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])} = 1</math> when the values assigned to <math>\Phi_i</math> do not conflict with the values prescribed by <math>\bar{\theta}</math>, making <math>g'_i(\Phi_i)</math> "disappear" when all variables not in <math>\Theta</math> are fixed to the values from <math>\bar{\theta}</math>.

Fixing all variables not in <math>\Theta</math> to the values from <math>\bar{\theta}</math> gives

<math>\Pr(\Theta, \bar{\theta}) = f'(\Theta) \prod_{i=1}^n g'_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) = \prod_{j=1}^m h_j(\Psi_j \cap \Theta, \bar{\theta}[\Psi_j])</math>

Since <math>g'_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) = 1</math>,

<math>f'(\Theta) = \prod_{j=1}^m h_j(\Psi_j \cap \Theta, \bar{\theta}[\Psi_j])</math>

Letting <math>h'_j(\Theta \cap \Psi_j) = h_j(\Psi_j \cap \Theta, \bar{\theta}[\Psi_j]) </math> gives:

<math>f'(\Theta) = \prod_{j=1}^m h'_j(\Theta \cap \Psi_j)</math> which finally gives:

<math>\Pr(U) = \bigg(\prod_{j=1}^m h'_j(\Theta \cap \Psi_j)\bigg)\bigg(\prod_{i=1}^n g'_i(\Phi_i)\bigg)</math>

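The construction in the proof can be exercised on a concrete example. The sketch below is illustrative (the chain-shaped distribution, binary variables, and the choice <math>\Theta = \{x_0, x_1\}</math>, <math>\Phi_1 = \{x_1, x_2\}</math>, <math>\Phi_2 = \{x_2, x_3\}</math>, <math>\Psi_1, \Psi_2, \Psi_3 = \{x_0, x_1\}, \{x_1, x_2\}, \{x_2, x_3\}</math> are assumptions made for the demonstration); it builds <math>h'_j</math> and <math>g'_i</math> exactly as in the proof and checks the final identity:

<syntaxhighlight lang="python">
import itertools
import numpy as np

rng = np.random.default_rng(1)

# A chain-shaped positive distribution over binary x0, ..., x3:
#   Pr(U) ∝ h1(x0, x1) * h2(x1, x2) * h3(x2, x3),
# which is also f(Θ) g1(Φ1) g2(Φ2) with f = h1, g1 = h2, g2 = h3.
h1, h2, h3 = (rng.uniform(0.1, 1.0, size=(2, 2)) for _ in range(3))

t2, t3 = 0, 0  # θ̄: an arbitrary fixed assignment to U \ Θ = {x2, x3}

for x0, x1, x2, x3 in itertools.product(range(2), repeat=4):
    pr = h1[x0, x1] * h2[x1, x2] * h3[x2, x3]  # the original product

    # h'_j fixes the non-Θ variables of Ψ_j to the values from θ̄ ...
    hp1 = h1[x0, x1]   # Ψ1 ⊆ Θ: nothing to fix
    hp2 = h2[x1, t2]   # Θ ∩ Ψ2 = {x1}
    hp3 = h3[t2, t3]   # Θ ∩ Ψ3 = ∅: a constant
    # ... and g'_i divides out the same restriction of g_i.
    gp1 = h2[x1, x2] / h2[x1, t2]
    gp2 = h3[x2, x3] / h3[t2, t3]

    assert np.isclose(pr, hp1 * hp2 * hp3 * gp1 * gp2)
print("Lemma 1 identity verified on a chain-shaped example.")
</syntaxhighlight>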

File:Neighborhood Intersections.png
The clique formed by vertices <math>x_1</math>, <math>x_2</math>, and <math>x_3</math>, is the intersection of <math>\{x_1\} \cup \partial x_1</math>, <math>\{x_2\} \cup \partial x_2</math>, and <math>\{x_3\} \cup \partial x_3</math>.

Lemma 1 provides a means of combining two different factorizations of <math>\Pr(U)</math>. The local Markov property implies that, for any random variable <math>x \in U</math>, there exist factors <math>f_x</math> and <math>f_{-x}</math> such that:

<math>\Pr(U) = f_x(x, \partial x)f_{-x}(U \setminus \{x\})</math>

where <math>\partial x</math> denotes the neighbors of node <math>x</math>. Applying Lemma 1 repeatedly, with <math>\Theta</math> ranging over the sets <math>\{x\} \cup \partial x</math>, eventually factors <math>\Pr(U)</math> so that the domain of each remaining factor is contained in <math>\{x\} \cup \partial x</math> for every variable <math>x</math> it contains; any two variables in such a domain are therefore neighbors, so each domain is a clique and <math>\Pr(U)</math> becomes a product of clique potentials (see the image above).
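
For the running example, the existence of such a factorization at <math>x = A</math> (where <math>\partial A = \{B, C, D\}</math>) can again be verified numerically. The sketch below (illustrative, with random potentials as before) checks that <math>\Pr(A \mid B, C, D, E, F)</math> depends only on <math>A</math> and <math>\partial A</math>, so that <math>f_A = \Pr(A \mid \partial A)</math> and <math>f_{-A} = \Pr(U \setminus \{A\})</math> give the required factors:

<syntaxhighlight lang="python">
import itertools
import numpy as np

rng = np.random.default_rng(2)

# The Gibbs field from the example; in the pictured graph, ∂A = {B, C, D}.
f1, f2, f3, f4 = (rng.uniform(0.1, 1.0, size=(2, 2, 2)) for _ in range(4))
p = np.zeros((2,) * 6)
for a, b, c, d, e, f in itertools.product(range(2), repeat=6):
    p[a, b, c, d, e, f] = f1[a, b, d] * f2[a, c, d] * f3[c, d, f] * f4[c, e, f]
p /= p.sum()

p_rest = p.sum(axis=0)              # Pr(B, C, D, E, F), i.e. f_{-A}
p_cond = p / p_rest[None, ...]      # Pr(A | B, C, D, E, F)

# The conditional is constant in (E, F), i.e. it is a function f_A(A, ∂A) ...
assert np.allclose(p_cond, p_cond[..., :1, :1])
# ... and the two factors recover the joint distribution.
assert np.allclose(p, p_cond * p_rest[None, ...])
print("Local Markov factorization Pr(U) = f_A(A, ∂A) f_{-A}(U \\ {A}): verified.")
</syntaxhighlight>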

End of Proof
