Английская Википедия:Alias method
In computing, the alias method is a family of efficient algorithms for sampling from a discrete probability distribution, published in 1974 by A. J. Walker.[1][2] That is, it returns integer values Шаблон:Math according to some arbitrary probability distribution Шаблон:Mvar. The algorithms typically use Шаблон:Math or Шаблон:Math preprocessing time, after which random values can be drawn from the distribution in Шаблон:Math time.[3]
Operation
Internally, the algorithm consults two tables, a probability table Шаблон:Mvar and an alias table Шаблон:Mvar (for Шаблон:Math). To generate a random outcome, a fair dice is rolled to determine an index into the two tables. Based on the probability stored at that index, a biased coin is then flipped, and the outcome of the flip is used to choose between a result of Шаблон:Mvar and Шаблон:Mvar.[4]
More concretely, the algorithm operates as follows:
- Generate a uniform random variate Шаблон:Math.
- Let Шаблон:Math and Шаблон:Math. (This makes Шаблон:Math uniformly distributed on Шаблон:Math and Шаблон:Math uniformly distributed on Шаблон:Math.)
- If Шаблон:Math, return Шаблон:Mvar. This is the biased coin flip.
- Otherwise, return Шаблон:Mvar.
An alternative formulation of the probability table, proposed by Marsaglia et al.[5] as the "square histogram" method, uses the condition Шаблон:Math in the third step (where Шаблон:Math) instead of computing Шаблон:Mvar.
Table generation
The distribution may be padded with additional probabilities Шаблон:Math to increase Шаблон:Mvar to a convenient value, such as a power of two.
To generate the table, first initialize Шаблон:Math. While doing this, divide the table entries into three categories:
- The "overfull" group, where Шаблон:Math,
- The "underfull" group, where Шаблон:Math and Шаблон:Mvar has not been initialized, and
- The "exactly full" group, where Шаблон:Math or Шаблон:Mvar has been initialized.
If Шаблон:Math, the corresponding value Шаблон:Mvar will never be consulted and is unimportant, but a value of Шаблон:Math is sensible.
As long as not all table entries are exactly full, repeat the following steps:
- Arbitrarily choose an overfull entry Шаблон:Math and an underfull entry Шаблон:Math. (If one of these exists, the other must, as well.)
- Allocate the unused space in entry Шаблон:Mvar to outcome Шаблон:Mvar, by setting Шаблон:Math.
- Remove the allocated space from entry Шаблон:Mvar by changing Шаблон:Math.
- Entry Шаблон:Mvar is now exactly full.
- Assign entry Шаблон:Mvar to the appropriate category based on the new value of Шаблон:Mvar.
Each iteration moves at least one entry to the "exactly full" category (and the last moves two), so the procedure is guaranteed to terminate after at most Шаблон:Math iterations. Each iteration can be done in Шаблон:Math time, so the table can be set up in Шаблон:Math time.
Vose[3]Шаблон:Rp points out that floating-point rounding errors may cause the guarantee referred to in step 1 to be violated. If one category empties before the other, the remaining entries may have Шаблон:Mvar set to 1 with negligible error. The solution accounting for floating point is sometimes called the Walker-Vose method or the Vose alias method.
The Alias structure is not unique.
As the lookup procedure is slightly faster if Шаблон:Math (because Шаблон:Mvar does not need to be consulted), one goal during table generation is to maximize the sum of the Шаблон:Mvar. Doing this optimally turns out to be NP hard,[5]Шаблон:Rp but a greedy algorithm comes reasonably close: rob from the richest and give to the poorest. That is, at each step choose the largest Шаблон:Mvar and the smallest Шаблон:Mvar. Because this requires sorting the Шаблон:Mvar, it requires Шаблон:Math time.
Efficiency
Although the alias method is very efficient if generating a uniform deviate is itself fast, there are cases where it is far from optimal in terms of random bit usage. This is because it uses a full-precision random variate Шаблон:Mvar each time, even when only a few random bits are needed.
One case arises when the probabilities are particularly well balanced, so many Шаблон:Math and Шаблон:Mvar is not needed. Generating Шаблон:Mvar is a waste of time. For example if Шаблон:Math, then a 32-bit random variate Шаблон:Mvar could be used to make 32 choices, but the alias method will only generate one.
Another case arises when the probabilities are strongly unbalanced, so many Шаблон:Math. For example if Шаблон:Math and Шаблон:Math, then the great majority of the time, only a few random bits are required to determine that case 1 applies. In such cases, the table method described by Marsaglia et al.[5]Шаблон:Rp is more efficient. If we make many choices with the same probability we can on average require much less than one unbiased random bit. Using arithmetic coding techniques arithmetic we can approach the limit given by the binary entropy function.
Literature
- Donald Knuth, The Art of Computer Programming, Vol 2: Seminumerical Algorithms, section 3.4.1.
Implementations
- http://www.keithschwarz.com/darts-dice-coins/ Keith Schwarz: Detailed explanation, numerically stable version of Vose's algorithm, and link to Java implementation
- https://jugit.fz-juelich.de/mlz/ransampl Joachim Wuttke: Implementation as a small C library.
- https://gist.github.com/0b5786e9bfc73e75eb8180b5400cd1f8 Liam Huang's Implementation in C++
- https://github.com/joseftw/jos.weightedresult/blob/develop/src/JOS.WeightedResult/AliasMethodVose.cs C# implementation of Vose's algorithm.
- https://github.com/cdanek/KaimiraWeightedList C# implementation of Vose's algorithm without floating point instability.
References