Coupling from the past


Among Markov chain Monte Carlo (MCMC) algorithms, coupling from the past is a method for sampling from the stationary distribution of a Markov chain. Unlike many MCMC algorithms, coupling from the past gives, in principle, a perfect sample from the stationary distribution. It was invented by James Propp and David Wilson in 1996.

The basic idea

Consider a finite-state, irreducible, aperiodic Markov chain <math>M</math> with state space <math>S</math> and (unique) stationary distribution <math>\pi</math> (<math>\pi</math> is a probability vector). Suppose that we come up with a probability distribution <math>\mu</math> on the set of maps <math>f:S\to S</math> with the property that for every fixed <math>s\in S</math>, its image <math>f(s)</math> is distributed according to the transition probability of <math>M</math> from state <math>s</math>. An example of such a probability distribution is the one where <math>f(s)</math> is independent of <math>f(s')</math> whenever <math>s\ne s'</math>, but it is often worthwhile to consider other distributions. Now let <math>f_j</math> for <math>j\in\mathbb Z</math> be independent samples from <math>\mu</math>.
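
As an illustration, the following Python sketch draws one such random map under the simplest choice of <math>\mu</math>, where the values <math>f(s)</math> are independent across states. The transition-matrix representation (a list of rows) and the name sample_map are conventions chosen for the example, not part of the construction.

<syntaxhighlight lang="python">
import random

def sample_map(P, rng=random):
    """Draw one random map f: S -> S for the independent-coordinates coupling.

    States are 0, ..., n-1 and P[s][t] is the transition probability from s to t;
    each value f(s) is sampled independently from row s of P.
    """
    n = len(P)
    return tuple(rng.choices(range(n), weights=P[s])[0] for s in range(n))
</syntaxhighlight>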

Suppose that <math>x</math> is chosen randomly according to <math>\pi</math> and is independent of the sequence <math>f_j</math>. (We do not worry for now about where this <math>x</math> comes from.) Then <math>f_{-1}(x)</math> is also distributed according to <math>\pi</math>, because <math>\pi</math> is <math>M</math>-stationary and, by our assumption on the law of <math>f_j</math>, applying <math>f_{-1}</math> to a sample from <math>\pi</math> amounts to taking one step of <math>M</math>. Define

<math>F_j:= f_{-1}\circ f_{-2}\circ\cdots\circ f_{-j}.</math>

Then it follows by induction that <math>F_j(x)</math> is also distributed according to <math>\pi</math> for every <math>j\in\mathbb{N}</math>. However, it may happen that for some <math>n\in\mathbb{N}</math> the image of the map <math>F_n</math> is a single element of <math>S</math>. In other words, <math>F_n(x)=F_n(y)</math> for each <math>y\in S</math>. Therefore, we do not need to have access to <math>x</math> in order to compute <math>F_n(x)</math>. The algorithm then involves finding some <math>n\in \mathbb N</math> such that <math>F_n(S)</math> is a singleton, and outputting the element of that singleton. The design of a good distribution <math>\mu</math> for which the task of finding such an <math>n</math> and computing <math>F_n</math> is not too costly is not always obvious, but has been accomplished successfully in several important instances.[1]
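
Putting these pieces together, a minimal Python sketch of the general procedure might look as follows. The helper sample_map, the transition-matrix representation, and the doubling schedule for <math>n</math> are assumptions made for this example rather than requirements of the method.

<syntaxhighlight lang="python">
import random

def sample_map(P, rng):
    """Draw one map f: S -> S; f(s) is sampled from row s of the transition matrix P."""
    n = len(P)
    return tuple(rng.choices(range(n), weights=P[s])[0] for s in range(n))

def cftp(P, seed=0):
    """Return a perfect sample from the stationary distribution of the chain with matrix P.

    Extends the composition F_n = f_{-1} o f_{-2} o ... o f_{-n} further into the past
    (doubling n) until F_n maps all of S to a single state, and outputs that state.
    """
    rng = random.Random(seed)
    maps = []                       # maps[j] holds f_{-(j+1)}; old maps are reused, never redrawn
    n = 1
    while True:
        while len(maps) < n:        # sample only the maps that are still missing
            maps.append(sample_map(P, rng))
        image = set(range(len(P)))  # start from the whole state space S
        for j in reversed(range(n)):    # apply f_{-n} first, ..., f_{-1} last
            image = {maps[j][s] for s in image}
        if len(image) == 1:         # F_n(S) is a singleton
            return image.pop()
        n *= 2
</syntaxhighlight>

Tracking the image of the entire state space, as this sketch does, is only feasible when <math>S</math> is small; the monotone case below describes a situation in which following two trajectories suffices.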

The monotone case

There is a special class of Markov chains in which there are particularly good choices for <math>\mu</math> and a tool for determining whether <math>|F_n(S)|=1</math>. (Here <math>|\cdot|</math> denotes cardinality.) Suppose that <math>S</math> is a partially ordered set with order <math>\le</math>, which has a unique minimal element <math>s_0</math> and a unique maximal element <math>s_1</math>; that is, every <math>s\in S</math> satisfies <math>s_0\le s\le s_1</math>. Also, suppose that <math>\mu</math> may be chosen to be supported on the set of monotone maps <math>f:S\to S</math>. Then it is easy to see that <math>|F_n(S)|=1</math> if and only if <math>F_n(s_0)=F_n(s_1)</math>: since <math>s_0\le s\le s_1</math> for every <math>s\in S</math> and <math>F_n</math> is monotone, we have <math>F_n(s_0)\le F_n(s)\le F_n(s_1)</math>. Thus, checking this becomes rather easy. The algorithm can proceed by choosing <math>n:=n_0</math> for some constant <math>n_0</math>, sampling the maps <math>f_{-1},\dots,f_{-n}</math>, and outputting <math>F_n(s_0)</math> if <math>F_n(s_0)=F_n(s_1)</math>. If <math>F_n(s_0)\ne F_n(s_1)</math>, the algorithm proceeds by doubling <math>n</math> and repeating as necessary until an output is obtained. (But the algorithm does not resample the maps <math>f_{-j}</math> which were already sampled; it reuses the previously sampled maps when needed.)
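
A Python sketch of this monotone version might read as follows; the interface is an assumption made for the example, with sample_map(rng) returning a monotone map represented as a callable and s_min, s_max denoting the extreme elements of <math>S</math>.

<syntaxhighlight lang="python">
import random

def monotone_cftp(sample_map, s_min, s_max, seed=0):
    """Monotone coupling from the past (a sketch; names and interfaces are illustrative).

    sample_map(rng) must return a monotone map f: S -> S (as a callable); s_min and
    s_max are the minimal and maximal elements of S.  Since s_min <= s <= s_max for
    every s, monotonicity implies the two extreme trajectories sandwich all others,
    so they coalesce exactly when F_n collapses the whole state space.
    """
    rng = random.Random(seed)
    maps = []            # maps[j] holds f_{-(j+1)}; previously drawn maps are kept and reused
    n = 1
    while True:
        while len(maps) < n:             # draw only the maps not yet sampled
            maps.append(sample_map(rng))
        lo, hi = s_min, s_max
        for j in reversed(range(n)):     # apply f_{-n} first, ..., f_{-1} last
            lo, hi = maps[j](lo), maps[j](hi)
        if lo == hi:                     # the extreme copies have coupled
            return lo                    # this value is F_n(s) for every s in S
        n *= 2
</syntaxhighlight>

For instance, for a nearest-neighbour walk on <math>\{0,1,\dots,k\}</math>, sample_map could return, with equal probability, either the map <math>s\mapsto\min(s+1,k)</math> or the map <math>s\mapsto\max(s-1,0)</math>; both maps are monotone, so the sketch above applies.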

References

1. Propp, James Gary; Wilson, David Bruce (1996). "Exact sampling with coupled Markov chains and applications to statistical mechanics". Random Structures & Algorithms. 9 (1–2): 223–252.