Golomb coding


Golomb coding is a lossless data compression method using a family of data compression codes invented by Solomon W. Golomb in the 1960s. Alphabets following a geometric distribution will have a Golomb code as an optimal prefix code,[1] making Golomb coding highly suitable for situations in which the occurrence of small values in the input stream is significantly more likely than large values.

Rice coding

Rice coding (invented by Robert F. Rice) denotes using a subset of the family of Golomb codes to produce a simpler (but possibly suboptimal) prefix code. Rice used this set of codes in an adaptive coding scheme; "Rice coding" can refer either to that adaptive scheme or to using that subset of Golomb codes. Whereas a Golomb code has a tunable parameter that can be any positive integer value, Rice codes are those in which the tunable parameter is a power of two. This makes Rice codes convenient for use on a computer, since multiplication and division by powers of 2 can be implemented efficiently as bit shifts in binary arithmetic.
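
For example, with M = 2^b the quotient and remainder reduce to a bit shift and a bit mask. The following short Python sketch (the function name is illustrative only) shows this:

  def rice_split(N, b):
      # For M = 2**b, division and modulo reduce to a shift and a mask.
      return N >> b, N & ((1 << b) - 1)   # (quotient, remainder)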

Rice was motivated to propose this simpler subset because the geometric distribution of the source often varies with time, is not precisely known, or both, so selecting the seemingly optimal code might not be very advantageous.

Rice coding is used as the entropy encoding stage in a number of lossless image compression and audio data compression methods.

Overview

File:Golomb code example.png
Golomb coding example for a source x with geometric distribution, with parameter <math>p = 0.2</math>, using a Golomb code with <math>M = 3</math>. The 2-bit code 00 is used 20% of the time; the 3-bit codes 010, 011, and 100 are used over 38% of the time; 4 bits or more are needed in a minority of cases. For this source, entropy = 3.610 bits; for this code with this source, rate = 3.639 bits; therefore redundancy = 0.030 bits, or efficiency = 0.992 bits per bit.

Construction of codes

Golomb coding uses a tunable parameter <math>M</math> to divide an input value <math>x</math> into two parts: <math>q</math>, the result of a division by <math>M</math>, and <math>r</math>, the remainder. The quotient is sent in unary coding, followed by the remainder in truncated binary encoding. When <math>M=1</math>, Golomb coding is equivalent to unary coding.

Golomb–Rice codes can be thought of as codes that indicate a number by the position of the bin (<math>q</math>) and the offset within the bin (<math>r</math>). The example figure shows the position <math>q</math> and offset <math>r</math> for the encoding of integer <math>x</math> using Golomb–Rice parameter <math>M = 3</math>, with source probabilities following a geometric distribution with <math>p = 0.2</math>.

Formally, the two parts are given by the following expression, where <math>x</math> is the nonnegative integer being encoded:

<math>q = \left \lfloor \frac{x}{M} \right \rfloor</math>

and

<math>r = x - qM</math>.
File:GolombCodeRedundancy.svg
This image shows the redundancy, in bits, of the Golomb code when <math>M</math> is chosen optimally, as a function of the success probability <math>p</math>.

Both <math>q</math> and <math>r</math> will be encoded using variable numbers of bits: <math>q</math> by a unary code, and <math>r</math> by <math>b</math> bits for the Rice code, or by a choice between <math>b</math> and <math>b+1</math> bits for the Golomb code (i.e. when <math>M</math> is not a power of 2), with <math>b = \lfloor\log_2(M)\rfloor</math>. If <math>r < 2^{b+1} - M</math>, then use <math>b</math> bits to encode <math>r</math>; otherwise, use <math>b+1</math> bits to encode <math>r</math>. Clearly, <math>b=\log_2(M)</math> if <math>M</math> is a power of 2, and we can encode all values of <math>r</math> with <math>b</math> bits.
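
As an illustration of this remainder rule, the following short Python sketch (the function name is hypothetical, not from any particular codec) emits the remainder as a string of bits:

  def encode_remainder(r, M):
      # Truncated binary encoding of remainder r, 0 <= r < M (illustrative sketch).
      b = M.bit_length() - 1          # b = floor(log2(M)) for M >= 1
      cutoff = (1 << (b + 1)) - M     # the first 2^(b+1) - M remainders use b bits
      if r < cutoff:
          return format(r, f'0{b}b') if b > 0 else ''
      return format(r + cutoff, f'0{b + 1}b')

For M = 10 this reproduces the remainder table in the example below: encode_remainder(2, 10) gives 010 and encode_remainder(7, 10) gives 1101.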

The integer <math>x</math> treated by Golomb was the run length of a Bernoulli process, which has a geometric distribution starting at 0. The best choice of the parameter <math>M</math> is a function of the corresponding Bernoulli process, which is parameterized by <math>p = P(x=0)</math>, the probability of success in a given Bernoulli trial. <math>M</math> is either the median of the distribution or the median ±1. It can be determined by these inequalities:

<math>(1-p)^M + (1-p)^{M+1} \leq 1 < (1-p)^{M-1} + (1-p)^M,</math>

which are solved by

<math>M = \left\lceil -\frac{\log(2 -p)}{\log(1-p)}\right\rceil</math>.

For the example with <math>p = 0.2</math>:

<math>M = \left\lceil -\frac{\log(1.8)}{\log(0.8)}\right\rceil = \left\lceil 2.634 \right\rceil = 3</math>.
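
The same calculation can be written as a small helper (an illustrative sketch; the logarithm base cancels in the ratio, so natural logarithms suffice):

  import math

  def optimal_golomb_parameter(p):
      # Smallest M satisfying (1-p)^M + (1-p)^(M+1) <= 1, per the inequalities above.
      return math.ceil(-math.log(2 - p) / math.log(1 - p))

  print(optimal_golomb_parameter(0.2))  # 3, as in the worked example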

The Golomb code for this distribution is equivalent to the Huffman code for the same probabilities, if it were possible to compute the Huffman code for the infinite set of source values.

Use with signed integers

Golomb's scheme was designed to encode sequences of non-negative numbers. However, it is easily extended to accept sequences containing negative numbers using an overlap and interleave scheme, in which all values are reassigned to some non-negative number in a unique and reversible way. The sequence begins: 0, −1, 1, −2, 2, −3, 3, −4, 4, ... The n-th negative value (i.e., <math>-n</math>) is mapped to the n-th odd number (<math>2n - 1</math>), and the m-th positive value is mapped to the m-th even number (<math>2m</math>). This may be expressed mathematically as follows: a positive value <math>x</math> is mapped to <math>x' = 2|x| = 2x,\ x \ge 0</math>, and a negative value <math>y</math> is mapped to <math>y' = 2|y| - 1 = -2y - 1,\ y < 0</math>. Such a code may be used for simplicity, even if suboptimal. Truly optimal codes for two-sided geometric distributions include multiple variants of the Golomb code, depending on the distribution parameters, and this mapping-based variant is among them.[2]
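
A minimal sketch of this overlap-and-interleave mapping and its inverse (the function names are illustrative only):

  def interleave(n):
      # Map 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
      return 2 * n if n >= 0 else -2 * n - 1

  def deinterleave(z):
      # Inverse mapping: even codes are non-negative values, odd codes are negative.
      return z // 2 if z % 2 == 0 else -(z + 1) // 2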

Simple algorithm

Below is the Rice–Golomb encoding, where the remainder code uses simple truncated binary encoding, also called "Rice coding" (other variable-length binary encodings, such as arithmetic or Huffman codes, are possible for the remainder codes if the statistical distribution of remainders is not flat, notably when not all possible remainders after the division are used). In this algorithm, if the M parameter is a power of 2, it becomes equivalent to the simpler Rice encoding (a short illustrative implementation is sketched after the decoding steps below):

  1. Fix the parameter M to an integer value.
  2. For N, the number to be encoded, find
    1. quotient = q = floor(N/M)
    2. remainder = r = N modulo M
  3. Generate codeword
    1. The code format: <Quotient code><Remainder code>, where
    2. Quotient code (in unary coding)
      1. Write a q-length string of 1 bits (alternatively, of 0 bits)
      2. Write a 0 bit (respectively, a 1 bit)
    3. Remainder code (in truncated binary encoding)
      1. Let <math>b = \lfloor\log_2(M)\rfloor</math>
        1. If <math>r < 2^{b+1}-M</math> code r in binary representation using b bits.
        2. If <math>r \ge 2^{b+1}-M</math> code the number <math>r+2^{b+1}-M</math> in binary representation using b + 1 bits.

Decoding:

  1. Decode the unary representation of q (count the number of 1s at the beginning of the code)
  2. Skip the 0 delimiter
  3. Let <math>b = \lfloor\log_2(M)\rfloor</math>
    1. Interpret the next b bits as a binary number r'. If <math>r' < 2^{b+1}-M</math> holds, then the remainder <math>r = r'</math>
    2. Otherwise interpret b + 1 bits as a binary number r'; the remainder is given by <math>r = r' - 2^{b+1} + M</math>
  4. Compute <math>N = qM + r</math>
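
The encoding and decoding steps above translate directly into code. The following is an illustrative Python sketch (names are chosen here, not taken from any particular library) that uses strings of '0'/'1' characters instead of a packed bit stream for clarity:

  def golomb_encode(N, M):
      # Encode non-negative integer N with Golomb parameter M; returns a '0'/'1' string.
      q, r = divmod(N, M)
      b = M.bit_length() - 1            # b = floor(log2(M))
      cutoff = (1 << (b + 1)) - M       # remainders below this use b bits
      out = '1' * q + '0'               # unary quotient, terminated by a 0
      if r < cutoff:
          if b > 0:
              out += format(r, f'0{b}b')
      else:
          out += format(r + cutoff, f'0{b + 1}b')
      return out

  def golomb_decode(bits, M):
      # Decode one codeword from the front of `bits`; returns (N, remaining bits).
      q = 0
      while bits[q] == '1':             # count leading 1s (unary quotient)
          q += 1
      pos = q + 1                       # skip the 0 delimiter
      b = M.bit_length() - 1
      cutoff = (1 << (b + 1)) - M
      r = int(bits[pos:pos + b], 2) if b > 0 else 0
      if r >= cutoff:                   # the value read signals a (b+1)-bit remainder
          r = int(bits[pos:pos + b + 1], 2) - cutoff
          pos += b + 1
      else:
          pos += b
      return q * M + r, bits[pos:]

A practical implementation would pack the bits into a buffer and stream many codewords back to back; the string form above only serves to make the structure of each codeword visible.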

Example

Set <math>M = 10</math>. Thus <math>b = \lfloor\log_2(10)\rfloor = 3</math>. The cutoff is <math>2^{b+1} - M = 16 - 10 = 6</math>.

Encoding of quotient part
q    output bits
0 0
1 10
2 110
3 1110
4 11110
5 111110
6 1111110
<math>\vdots</math> <math>\vdots</math>
N <math>\underbrace{111\cdots111}_N 0</math>
Encoding of remainder part
r    offset    binary    output bits
0 0 0000 000
1 1 0001 001
2 2 0010 010
3 3 0011 011
4 4 0100 100
5 5 0101 101
6 12 1100 1100
7 13 1101 1101
8 14 1110 1110
9 15 1111 1111

For example, with a Rice–Golomb encoding using parameter <math>M = 10</math>, the decimal number 42 would first be split into <math>q = 4</math> and <math>r = 2</math>, and would be encoded as qcode(<math>q</math>),rcode(<math>r</math>) = qcode(4),rcode(2) = 11110,010 (the separating comma does not need to be encoded in the output stream, because the 0 at the end of the <math>q</math> code is enough to say where <math>q</math> ends and <math>r</math> begins; both the qcode and rcode are self-delimiting).
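
With the illustrative sketch given after the decoding steps above, this example can be checked directly:

  code = golomb_encode(42, 10)          # '11110010'  (11110 for q = 4, 010 for r = 2)
  value, rest = golomb_decode(code, 10)
  print(code, value)                    # 11110010 42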

Use for run-length encoding

Note that <math>p</math> and <math>1 - p</math> are reversed in this section compared to their use in earlier sections.

Given an alphabet of two symbols, or a set of two events, P and Q, with probabilities <math>p</math> and <math>1 - p</math> respectively, where <math>p \ge 1/2</math>, Golomb coding can be used to encode runs of zero or more P's separated by single Q's. In this application, the best setting of the parameter <math>M</math> is the nearest integer to <math>- \frac{1}{\log_{2}p}</math>. When <math>p = 1/2</math>, <math>M = 1</math>, and the Golomb code corresponds to unary (<math>n</math> P's followed by a Q is encoded as n ones followed by a zero). If a simpler code is desired, one can assign the Golomb–Rice parameter <math>b</math> (i.e., Golomb parameter <math>M=2^b</math>) to the integer nearest to <math>- \log_2(-\log_2 p)</math>; although not always the best parameter, it is usually the best Rice parameter and its compression performance is quite close to the optimal Golomb code. (Rice himself proposed using various codes for the same data to figure out which was best. A later JPL researcher proposed various methods of optimizing or estimating the code parameter.[3])

Consider using a Rice code with a binary portion having <math>b</math> bits to run-length encode sequences where P has a probability <math>p</math>. If <math>\mathbb{P}[\text{bit is part of }k\text{-run}]</math> is the probability that a bit will be part of a <math>k</math>-bit run (<math>k-1</math> P's and one Q) and <math>(\text{compression ratio of }k\text{-run})</math> is the compression ratio of that run, then the expected compression ratio is

<math>\begin{align}
\mathbb{E}[\text{compression ratio}] &= \sum_{k=1}^\infty (\text{compression ratio of }k\text{-run}) \cdot \mathbb{P}[\text{bit is part of }k\text{-run}] \\
&= \sum_{k=1}^\infty \frac{b+1+\lfloor 2^{-b}(k-1) \rfloor}{k} \cdot kp^{k-1} (1-p)^2 \\
&= (1-p)^2 \sum_{j=0}^\infty (b+1+j) \cdot \sum_{i=j2^b+1}^{(j+1)2^b} p^{i-1} \\
&= (1-p) \sum_{j=0}^\infty (b+1+j) \cdot \left(p^{2^b j} - p^{2^{b} (j+1)}\right) \\
&= (1-p) \cdot \left(b + \sum_{m=0}^\infty p^{2^b m} \right) \\
&= (1-p) \cdot \left(b + {\left(1-p^{2^b} \right)}^{-1}\right)
\end{align}</math>

Compression is often expressed in terms of <math>1-\mathbb{E}[\text{compression ratio}]</math>, the proportion compressed. For <math>p \approx 1</math>, the run-length coding approach results in compression ratios close to entropy. For example, using Rice code <math>b=6</math> for <math>p=0.99</math> yields approximately 91.89% compression, while the entropy limit is approximately 91.92%.
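
The closed-form expression and the figures just quoted can be reproduced with a short sketch (names are illustrative; the Rice parameter is chosen by the nearest-integer rule from the start of this section):

  import math

  def expected_ratio(p, b):
      # (1 - p) * (b + 1 / (1 - p^(2^b))), the last line of the derivation above.
      return (1 - p) * (b + 1 / (1 - p ** (2 ** b)))

  def entropy_limit(p):
      # Binary entropy of the P/Q source, in output bits per input bit.
      return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

  p = 0.99
  b = round(-math.log2(-math.log2(p)))   # nearest-integer Rice parameter: 6
  print(1 - expected_ratio(p, b))        # ~0.9189 (91.89% compression)
  print(1 - entropy_limit(p))            # ~0.9192 (entropy limit)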

Adaptive run-length Golomb–Rice encoding

When a probability distribution for integers is not known, the optimal parameter for a Golomb–Rice encoder cannot be determined. Thus, in many applications, a two-pass approach is used: first, the block of data is scanned to estimate a probability density function (PDF) for the data. The Golomb–Rice parameter is then determined from that estimated PDF. A simpler variation of that approach is to assume that the PDF belongs to a parametrized family, estimate the PDF parameters from the data, and then compute the optimal Golomb–Rice parameter. That is the approach used in most of the applications discussed below.

An alternative approach to efficiently encode integer data whose PDF is not known, or is varying, is to use a backwards-adaptive encoder. The RLGR encoder achieves that using a very simple algorithm that adjusts the Golomb–Rice parameter up or down, depending on the last encoded symbol. A decoder can follow the same rule to track the variation of the encoding parameters, so no side information needs to be transmitted, just the encoded data. Assuming a generalized Gaussian PDF, which covers a wide range of statistics seen in data such as prediction errors or transform coefficients in multimedia codecs, the RLGR encoding algorithm can perform very well in such applications.
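
The exact adaptation rules of the RLGR encoder are defined in its specification; purely as an illustration of the backward-adaptive idea described here (this is not the RLGR algorithm itself), a coder could nudge a Rice parameter after each symbol using only information the decoder also sees:

  def adapt_rice_parameter(b, last_quotient, b_max=30):
      # Illustrative backward adaptation (not the actual RLGR rules):
      # a zero quotient suggests the code is too coarse, a large quotient
      # suggests it is too fine. Encoder and decoder apply the same rule,
      # so no side information needs to be transmitted.
      if last_quotient == 0 and b > 0:
          return b - 1
      if last_quotient > 1 and b < b_max:
          return b + 1
      return b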

Applications

File:Golomb coded Rice Algorithm experiment Compression Ratios.png
Golomb-coded Rice algorithm experiment compression ratios

Numerous signal codecs use a Rice code for prediction residues. In predictive algorithms, such residues tend to fall into a two-sided geometric distribution, with small residues being more frequent than large residues, and the Rice code closely approximates the Huffman code for such a distribution without the overhead of having to transmit the Huffman table. One signal that does not match a geometric distribution is a sine wave, because the differential residues are themselves sinusoidal and their values do not follow a geometric distribution (the largest positive and negative residue values occur with similar, relatively high frequency; only the mid-range positive and negative residues occur less often).

Several lossless audio codecs, such as Shorten,[4] FLAC,[5] Apple Lossless, and MPEG-4 ALS, use a Rice code after the linear prediction step (called "adaptive FIR filter" in Apple Lossless). Rice coding is also used in the FELICS lossless image codec.

The Golomb–Rice coder is used in the entropy coding stage of Rice-algorithm-based lossless image codecs. One such experiment yields the compression ratio graph shown.

The JPEG-LS scheme uses Rice–Golomb coding to encode the prediction residuals.

The adaptive version of Golomb–Rice coding mentioned above, the RLGR encoder, is used for encoding screen content in virtual machines in the RemoteFX component of the Microsoft Remote Desktop Protocol.

References

