English Wikipedia: Bayesian quadrature

Bayesian quadrature^[1]^[2]^[3]^[4]^[5] is a method for approximating intractable integrals. It falls within the class of probabilistic numerical methods. Bayesian quadrature views numerical integration as a Bayesian inference task, in which function evaluations are used to estimate the integral of that function. For this reason, it is sometimes also referred to as "Bayesian probabilistic numerical integration" or "Bayesian numerical integration". The name "Bayesian cubature" is also sometimes used when the integrand is multi-dimensional. A potential advantage of this approach is that it provides probabilistic uncertainty quantification for the value of the integral.

Bayesian quadrature

Numerical integration

Let <math>f:\mathcal{X} \rightarrow \mathbb{R}</math> be a function defined on a domain <math>\mathcal{X}</math> (where typically <math>\mathcal{X}\subseteq \mathbb{R}^d</math>). In numerical integration, function evaluations <math>f(x_1), \ldots, f(x_n)</math> at distinct locations <math>x_1, \ldots, x_n</math> in <math>\mathcal{X}</math> are used to estimate the integral of <math> f </math> against a measure <math> \nu </math>, i.e. <math> \textstyle \nu[f] := \int_{\mathcal{X}} f(x) \nu(\mathrm{d}x). </math> Given weights <math>w_1, \ldots, w_n \in \mathbb{R}</math>, a quadrature rule is an estimator of <math>\nu[f]</math> of the form <math display="block"> \hat{\nu}[f] := \sum_{i=1}^n w_i f(x_i). </math>
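As a minimal sketch of the definition above, a quadrature rule is simply a weighted sum of function values. The integrand, the node set, and the helper names `quadrature_estimate` and `trapezoid_weights` are illustrative assumptions, and the trapezoidal weights are just one classical (non-Bayesian) choice of <math>w_i</math>.

```python
import numpy as np

def quadrature_estimate(f, nodes, weights):
    """Return sum_i w_i f(x_i), the generic quadrature estimator."""
    nodes = np.asarray(nodes, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.dot(weights, f(nodes)))

def trapezoid_weights(nodes):
    """Trapezoidal weights for sorted 1-D nodes (one classical choice of w_i)."""
    h = np.diff(nodes)            # spacing between consecutive nodes
    w = np.zeros_like(nodes)
    w[:-1] += h / 2.0             # each interval gives half its length to its left node
    w[1:] += h / 2.0              # ... and half to its right node
    return w

def f(x):
    # Hypothetical integrand; its exact integral over [0, 1] is 1/3.
    return x**2

x = np.linspace(0.0, 1.0, 101)    # hypothetical node set
estimate = quadrature_estimate(f, x, trapezoid_weights(x))
print(estimate)
```

Bayesian quadrature keeps the same weighted-sum form but derives the weights from a prior on <math>f</math> rather than from a fixed interpolation scheme.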

Bayesian quadrature consists of specifying a prior distribution over <math>f</math>, conditioning this prior on <math>f(x_1), \ldots, f(x_n)</math> to obtain a posterior distribution on <math>f</math>, and then computing the implied posterior distribution on <math> \nu[f] </math>. The name "quadrature" comes from the fact that the posterior mean on <math> \nu[f] </math> sometimes takes the form of a quadrature rule whose weights are determined by the choice of prior.

Bayesian quadrature with Gaussian processes

The most common choice of prior distribution for <math> f </math> is a Gaussian process, as this permits conjugate inference to obtain a closed-form posterior distribution on <math> \nu[f] </math>. Suppose we have a Gaussian process with prior mean function <math> m: \mathcal{X} \rightarrow \mathbb{R} </math> and covariance function (or kernel function) <math> k: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R} </math>. Then, the posterior distribution on <math> f </math> is a Gaussian process with mean <math> m_n:\mathcal{X} \rightarrow \mathbb{R} </math> and kernel <math> k_n:\mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R} </math> given by <math display="block"> m_n(x) = m(x) + k(x,X)k(X,X)^{-1} (f(X) - m(X)) \qquad \text{and} \qquad k_n(x,y) = k(x,y)-k(x,X)k(X,X)^{-1}k(X,y), </math> where <math> (k(X,X))_{ij} = k(x_i,x_j)</math>, <math> (f(X))_{i} = f(x_i)</math>, <math> (m(X))_{i} = m(x_i)</math>, <math> (k(\cdot,X))_i = k(\cdot,x_i)</math> and <math> (k(X,\cdot))_i = k(x_i,\cdot)</math>.

Furthermore, the posterior distribution on <math> \nu[f] </math> is a univariate Gaussian distribution with mean <math> \mathbb{E}[\nu[f]] </math> and variance <math> \mathbb{V}[\nu[f]] </math> given by <math display="block"> \mathbb{E}[\nu[f]] = \nu[m]+ \nu[k(\cdot,X)]k(X,X)^{-1} (f(X) - m(X)) \qquad \text{and} \qquad \mathbb{V}[\nu[f]] = \nu\nu[k]-\nu[k(\cdot,X)]k(X,X)^{-1}\nu[k(X,\cdot)]. </math> The function <math> \textstyle \nu[k(\cdot, x)] = \int_\mathcal{X} k(y, x) \nu(\mathrm{d} y)</math> is the kernel mean embedding of <math>\nu</math> with respect to <math>k</math>, and <math> \textstyle \nu\nu[k] = \int_\mathcal{X} \int_\mathcal{X} k(x, y) \nu(\mathrm{d}x) \nu(\mathrm{d}y)</math> denotes the integral of <math>k</math> with respect to both inputs. In particular, for a zero prior mean (<math>m = 0</math>) the posterior mean is a quadrature rule with weights <math> \textstyle w_i = (\nu[k(\cdot,X)]k(X,X)^{-1})_i </math>, and the posterior variance provides a quantification of the user's uncertainty over the value of <math> \nu[f] </math>.
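As a sketch of where these expressions come from, consider the zero-mean case <math>m = 0</math> (an assumption made here only to keep the algebra short). Integration is a linear operation, so <math>\nu[f]</math> is Gaussian whenever <math>f</math> is a Gaussian process, and its mean and variance follow by integrating the posterior mean and the posterior kernel: <math display="block"> \mathbb{E}[\nu[f]] = \int_\mathcal{X} m_n(x) \, \nu(\mathrm{d}x) = \int_\mathcal{X} k(x,X) \, \nu(\mathrm{d}x) \; k(X,X)^{-1} f(X) = \nu[k(\cdot,X)] k(X,X)^{-1} f(X) = \sum_{i=1}^n w_i f(x_i), </math> <math display="block"> \mathbb{V}[\nu[f]] = \int_\mathcal{X} \int_\mathcal{X} k_n(x,y) \, \nu(\mathrm{d}x) \nu(\mathrm{d}y) = \nu\nu[k] - \nu[k(\cdot,X)] k(X,X)^{-1} \nu[k(X,\cdot)]. </math> Note that the variance depends on the locations <math>x_1, \ldots, x_n</math> and on the kernel, but not on the observed values <math>f(X)</math>.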

In more challenging integration problems, where the prior distribution cannot be relied upon as a meaningful representation of epistemic uncertainty, it is necessary to use the data <math>f(x_1), \ldots, f(x_n)</math> to set the kernel hyperparameters using, for example, maximum likelihood estimation. The estimation of kernel hyperparameters introduces adaptivity into Bayesian quadrature.^[6]^[7]
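The following is a minimal sketch of hyperparameter selection by maximum likelihood, under several assumptions that are not part of the text above: a zero-mean prior, a Matérn 3/2 kernel with unit amplitude, a small grid of candidate length scales instead of a continuous optimiser, and hypothetical data. The function names are illustrative.

```python
import numpy as np

def matern32(x, y, rho):
    """Matérn 3/2 kernel matrix between 1-D point sets x and y."""
    r = np.sqrt(3.0) * np.abs(x[:, None] - y[None, :]) / rho
    return (1.0 + r) * np.exp(-r)

def log_marginal_likelihood(x, fx, rho, jitter=1e-10):
    """Log-density of the data fx under a zero-mean GP prior with kernel matern32."""
    n = len(x)
    K = matern32(x, x, rho) + jitter * np.eye(n)   # jitter guards against round-off
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, fx))  # K^{-1} fx
    log_det = 2.0 * np.sum(np.log(np.diag(L)))     # log det K
    return -0.5 * fx @ alpha - 0.5 * log_det - 0.5 * n * np.log(2.0 * np.pi)

def fit_length_scale(x, fx, candidates):
    """Pick the candidate length scale with the largest log marginal likelihood."""
    scores = [log_marginal_likelihood(x, fx, rho) for rho in candidates]
    return candidates[int(np.argmax(scores))], scores

x = np.linspace(0.0, 1.0, 15)                       # hypothetical evaluation points
fx = np.sin(2.0 * np.pi * x)                        # hypothetical function values
candidates = np.array([0.05, 0.1, 0.2, 0.5, 1.0])   # hypothetical grid of length scales
rho_hat, scores = fit_length_scale(x, fx, candidates)
print(rho_hat)
```

Because the selected length scale depends on the observed values, the resulting quadrature weights and posterior variance also depend on the data, which is the adaptivity referred to above.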

Example

File:Bayesian quadrature animation.gif

Illustration of Bayesian quadrature for estimating <math> \textstyle \nu[f] = \int_0^1 f(x) \, \mathrm{d}x</math> where <math> \textstyle f(x) = (1 + x^2) \sin(5 \pi x) + 8/5 </math>. The posterior distribution (blue) concentrates on the true integral as more evaluations of the integrand <math>f</math> (the red points) are obtained.

Consider estimation of the integral <math display="block"> \nu[f] = \int_0^1 f(x) \, \mathrm{d}x \approx 1.79 \quad \text{ of the function } \quad f(x) = (1 + x^2) \sin(5 \pi x) + \frac{8}{5}</math> using a Bayesian quadrature rule based on a zero-mean Gaussian process prior with the Matérn covariance function of smoothness <math>3/2</math> and correlation length <math>\rho = 1/5</math>. This covariance function is <math> \textstyle k(x, y) = (1 + \sqrt{3} \, |x - y| / \rho ) \exp( \! - \sqrt{3} \, |x - y|/\rho ). </math> It is straightforward (though tedious) to compute that <math display="block"> \nu[k(\cdot, x)] = \int_0^1 k(y, x) \,\mathrm{d}y = \frac{4\rho}{\sqrt{3}} - \frac{1}{3} \exp\bigg(\frac{\sqrt{3}(x-1)}{\rho}\bigg) \big(3+2\sqrt{3}\,\rho-3x\big)-\frac{1}{3} \exp\bigg(-\frac{\sqrt{3} \, x}{\rho}\bigg)\big(3x+2\sqrt{3}\,\rho\big) </math> <math display="block"> \nu\nu[k] = \int_0^1 \int_0^1 k(x, y) \,\mathrm{d} x \,\mathrm{d} y = \frac{2\rho}{3} \Bigg[ 2\sqrt{3} - 3\rho + \exp\bigg(\!-\frac{\sqrt{3}}{\rho}\bigg) \big( \sqrt{3} + 3\rho \big) \Bigg].</math> Convergence of the Bayesian quadrature point estimate <math>\mathbb{E}[\nu[f]]</math> and concentration of the posterior mass, as quantified by <math>\mathbb{V}[\nu[f]]</math>, around the true integral <math>\nu[f]</math> as <math>f</math> is evaluated at more and more points is displayed in the accompanying animation.
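The example can be reproduced numerically with a few lines of NumPy. The sketch below uses the kernel, kernel mean and <math>\nu\nu[k]</math> expressions quoted above; the choice of 30 equally spaced evaluation points and all function names are assumptions made for illustration and are not taken from the animation.

```python
import numpy as np

RHO = 0.2                 # correlation length from the example
SQRT3 = np.sqrt(3.0)

def f(x):
    """Integrand of the example."""
    return (1.0 + x**2) * np.sin(5.0 * np.pi * x) + 8.0 / 5.0

def kernel(x, y, rho=RHO):
    """Matérn 3/2 kernel matrix between 1-D point sets x and y."""
    r = SQRT3 * np.abs(x[:, None] - y[None, :]) / rho
    return (1.0 + r) * np.exp(-r)

def kernel_mean(x, rho=RHO):
    """Closed form of int_0^1 k(y, x) dy quoted in the text."""
    a = np.exp(SQRT3 * (x - 1.0) / rho) * (3.0 + 2.0 * SQRT3 * rho - 3.0 * x)
    b = np.exp(-SQRT3 * x / rho) * (3.0 * x + 2.0 * SQRT3 * rho)
    return 4.0 * rho / SQRT3 - a / 3.0 - b / 3.0

def initial_variance(rho=RHO):
    """Closed form of the double integral of k over [0, 1]^2 quoted in the text."""
    return (2.0 * rho / 3.0) * (
        2.0 * SQRT3 - 3.0 * rho + np.exp(-SQRT3 / rho) * (SQRT3 + 3.0 * rho)
    )

def bayesian_quadrature(x):
    """Posterior mean and variance of the integral under a zero-mean GP prior."""
    K = kernel(x, x)
    z = kernel_mean(x)
    weights = np.linalg.solve(K, z)      # K is symmetric, so this equals z^T K^{-1}
    mean = float(weights @ f(x))
    variance = float(initial_variance() - weights @ z)
    return mean, variance, weights

x = np.linspace(0.0, 1.0, 30)            # hypothetical design: 30 equally spaced points
mean, variance, weights = bayesian_quadrature(x)
print(mean, variance)
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly. The posterior mean should be close to the value 1.79 quoted above, with a small posterior variance.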

Advantages and disadvantages

Since Bayesian quadrature is an example of probabilistic numerics, it inherits certain advantages compared with traditional numerical integration methods:

It allows uncertainty to be quantified and propagated through all subsequent computations to explicitly model the impact of numerical error.^[8]
It provides a principled way to incorporate prior knowledge by using a judicious choice of prior distributions for <math>f</math>, which may be more sophisticated compared to the standard Gaussian process just described.^[7]
It permits more efficient use of information, e.g. jointly inferring multiple related quantities of interest^[9] or using active learning to reduce the required number of points.^[10]

Despite these merits, Bayesian quadrature methods possess the following limitations:

Although the Bayesian paradigm allows a principled treatment of uncertainty quantification, posterior inference over <math>\nu[f]</math> is not always tractable, in which case a second level of estimation is required. For example, for Bayesian quadrature with Gaussian processes, the kernel mean embedding <math>\nu[k(\cdot, x)]</math> has no closed-form expression for a general kernel <math>k</math> and measure <math>\nu</math>.

The computational cost of Bayesian quadrature methods based on Gaussian processes is in general <math>\mathcal{O}(n^3)</math> due to the cost of inverting <math>n \times n</math> matrices, which may preclude their application to large-scale problems.
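To illustrate the first limitation, the sketch below approximates a kernel mean by plain Monte Carlo, which is one possible form of second-level estimation and is not claimed to be the approach of any cited work. The exponential kernel, the uniform measure on <math>[0,1]</math>, the sample size and the function names are all assumptions, chosen because a closed form is available for comparison in this particular case.

```python
import numpy as np

def mc_kernel_mean(kernel, x, sampler, n_samples=100_000, seed=0):
    """Monte Carlo estimate of int k(y, x) nu(dy) at each point of x."""
    rng = np.random.default_rng(seed)
    y = sampler(rng, n_samples)                        # samples y_j drawn from nu
    return kernel(y[:, None], x[None, :]).mean(axis=0)  # average over the samples

def exp_kernel(y, x):
    # Hypothetical kernel k(y, x) = exp(-|y - x|).
    return np.exp(-np.abs(y - x))

def uniform_sampler(rng, n):
    # Hypothetical measure nu: the uniform distribution on [0, 1].
    return rng.uniform(0.0, 1.0, size=n)

x = np.array([0.0, 0.5, 1.0])
approx = mc_kernel_mean(exp_kernel, x, uniform_sampler)

# For this kernel and measure the integral is available in closed form:
# int_0^1 exp(-|y - x|) dy = 2 - exp(-x) - exp(-(1 - x)).
exact = 2.0 - np.exp(-x) - np.exp(-(1.0 - x))
print(approx, exact)
```

The error of such an approximation enters the quadrature weights and the posterior variance, so the reported uncertainty no longer accounts for all sources of numerical error.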

Algorithmic design

Prior distributions

The most commonly used prior for <math>f</math> is a Gaussian process prior. This is mainly due to the advantage provided by Gaussian conjugacy and the fact that Gaussian processes can encode a wide range of prior knowledge, including smoothness, periodicity and sparsity, through a careful choice of prior covariance. However, a number of other prior distributions have also been proposed. These include multi-output Gaussian processes,^[9] which are particularly useful when tackling multiple related numerical integration tasks simultaneously or sequentially, and tree-based priors such as Bayesian additive regression trees,^[10] which are well suited to discontinuous <math> f </math>. Additionally, Dirichlet process priors have been proposed for the integration measure <math> \nu </math>.^[11]

Point selection

The points <math>x_1, \ldots, x_n </math> are either considered to be given, or can be selected so as to ensure the posterior on <math>\nu[f]</math> concentrates at a faster rate. One approach consists of using point sets from other quadrature rules. For example, taking independent and identically distributed realisations from <math>\nu </math> recovers a Bayesian approach to Monte Carlo,^[3] whereas using certain deterministic point sets such as low-discrepancy sequences or lattices recovers a Bayesian alternative to quasi-Monte Carlo.^[4]^[12] It is also possible to use point sets specifically designed for Bayesian quadrature; see for example ^[13], where symmetries in point sets are exploited to obtain scalable Bayesian quadrature estimators. Alternatively, points can be selected adaptively, following principles from active learning and Bayesian experimental design, so as to directly minimise posterior uncertainty,^[14]^[15] including for multi-output Gaussian processes.^[16]
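As a minimal sketch of uncertainty-driven point selection, the code below greedily adds, from a candidate grid, the point that most reduces the posterior variance of the integral. It is an illustration under stated assumptions (zero-mean Gaussian process prior, Matérn 3/2 kernel with <math>\rho = 1/5</math>, uniform measure on <math>[0,1]</math>, hypothetical candidate grid and function names) and is not the algorithm of any of the cited references.

```python
import numpy as np

RHO, SQRT3 = 0.2, np.sqrt(3.0)

def kernel(x, y):
    """Matérn 3/2 kernel matrix with correlation length RHO."""
    r = SQRT3 * np.abs(x[:, None] - y[None, :]) / RHO
    return (1.0 + r) * np.exp(-r)

def kernel_mean(x):
    """Closed form of int_0^1 k(y, x) dy for the kernel above."""
    a = np.exp(SQRT3 * (x - 1.0) / RHO) * (3.0 + 2.0 * SQRT3 * RHO - 3.0 * x)
    b = np.exp(-SQRT3 * x / RHO) * (3.0 * x + 2.0 * SQRT3 * RHO)
    return 4.0 * RHO / SQRT3 - a / 3.0 - b / 3.0

def initial_variance():
    """Prior variance of the integral (double integral of the kernel)."""
    return (2.0 * RHO / 3.0) * (
        2.0 * SQRT3 - 3.0 * RHO + np.exp(-SQRT3 / RHO) * (SQRT3 + 3.0 * RHO)
    )

def integral_variance(x):
    """Posterior variance of the integral given evaluations at the points x."""
    K = kernel(x, x) + 1e-12 * np.eye(len(x))   # small jitter for stability
    z = kernel_mean(x)
    return float(initial_variance() - z @ np.linalg.solve(K, z))

def greedy_design(candidates, n_points):
    """Add, one at a time, the candidate that minimises the posterior variance."""
    chosen, variances = [], []
    remaining = list(candidates)
    for _ in range(n_points):
        scores = [integral_variance(np.array(chosen + [c])) for c in remaining]
        best = int(np.argmin(scores))
        chosen.append(remaining.pop(best))      # move the best candidate into the design
        variances.append(scores[best])
    return np.array(chosen), variances

candidates = np.linspace(0.0, 1.0, 51)          # hypothetical candidate grid
design, variances = greedy_design(candidates, 8)
print(design, variances[-1])
```

With a fixed kernel the posterior variance does not depend on the function values, so this design could be computed before any evaluation of <math>f</math>. The selection becomes genuinely data-dependent only when, for example, kernel hyperparameters are re-estimated as evaluations arrive.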

Kernel mean and initial error

One of the challenges when implementing Bayesian quadrature is the need to evaluate the function <math> \nu[k(\cdot,x)] </math> and the constant <math> \nu\nu[k] </math>. The former is commonly called the kernel mean, and is a key quantity in the computation of kernel-based distances such as the maximum mean discrepancy. The latter is commonly called the initial error since it provides an upper bound on the integration error before any function values are observed. Unfortunately, the kernel mean and initial error can only be computed in closed form for a small number of <math> (k, \nu) </math> pairs; see for example Table 1 in ^[4].

Theory

A number of theoretical guarantees have been derived for Bayesian quadrature. These usually require Sobolev smoothness properties of the integrand,^[4]^[17]^[18] although recent work also extends to integrands in the reproducing kernel Hilbert space of the Gaussian kernel.^[19] Most of the results apply to the case of Monte Carlo or deterministic grid point sets, but some results also extend to adaptive designs.^[20]

Software

ProbNum: Probabilistic numerical methods in Python, including a Bayesian quadrature implementation.
Emukit: Emulation and decision making under uncertainty in Python.
QMCPy: Bayesian quadrature with QMC point sets in Python.

References

[1] Diaconis1988 (journal article)
[2] OHagan1991 (journal article)
[3] Rasmussen2002BMC (journal article)
[4] Briol2019 (journal article)
[5] HennigOsborneKersting2022 (book)
[6] (journal article)
[7] (journal article)
[8] (journal article)
[9] Xi2018 (journal article)
[10] BPNMtrees2020 (journal article)
[11] Oates2017 (journal article)
[12] Jagadeeswaran2019 (journal article)
[13] Karvonen2018symmetric (journal article)
[14] Gunter2014 (journal article)
[15] Briol2015FWBQ (journal article)
[16] Gessner2019 (journal article)
[17] Kanagawa2020convergence (journal article)
[18] Wynne2021 (journal article)
[19] Karvonen2020Integration (journal article)
[20] Kanagawa2019adaptive (journal article)
