Английская Википедия:Bayesian multivariate linear regression

Шаблон:Short description Шаблон:Regression bar In statistics, Bayesian multivariate linear regression is a Bayesian approach to multivariate linear regression, i.e. linear regression where the predicted outcome is a vector of correlated random variables rather than a single scalar random variable. A more general treatment of this approach can be found in the article MMSE estimator.

Details

Consider a regression problem where the dependent variable to be predicted is not a single real-valued scalar but an m-length vector of correlated real numbers. As in the standard regression setup, there are n observations, where each observation i consists of k−1 explanatory variables, grouped into a vector <math>\mathbf{x}_i</math> of length k (where a dummy variable with a value of 1 has been added to allow for an intercept coefficient). This can be viewed as a set of m related regression problems for each observation i: <math display="block">\begin{align} y_{i,1} &= \mathbf{x}_i^\mathsf{T}\boldsymbol\beta_{1} + \epsilon_{i,1} \\ &\;\;\vdots \\ y_{i,m} &= \mathbf{x}_i^\mathsf{T}\boldsymbol\beta_{m} + \epsilon_{i,m} \end{align}</math> where the set of errors <math>\{ \epsilon_{i,1}, \ldots, \epsilon_{i,m}\}</math> are all correlated. Equivalently, it can be viewed as a single regression problem where the outcome is a row vector <math>\mathbf{y}_i^\mathsf{T}</math> and the regression coefficient vectors are stacked next to each other, as follows: <math display="block">\mathbf{y}_i^\mathsf{T} = \mathbf{x}_i^\mathsf{T}\mathbf{B} + \boldsymbol\epsilon_{i}^\mathsf{T}.</math>

The coefficient matrix B is a <math>k \times m</math> matrix where the coefficient vectors <math>\boldsymbol\beta_1,\ldots,\boldsymbol\beta_m</math> for each regression problem are stacked horizontally: <math display="block">\mathbf{B} = \begin{bmatrix} \begin{pmatrix} \\ \boldsymbol\beta_1 \\ \\ \end{pmatrix} \cdots \begin{pmatrix} \\ \boldsymbol\beta_m \\ \\ \end{pmatrix} \end{bmatrix} = \begin{bmatrix} \begin{pmatrix} \beta_{1,1} \\ \vdots \\ \beta_{k,1} \end{pmatrix} \cdots \begin{pmatrix} \beta_{1,m} \\ \vdots \\ \beta_{k,m} \end{pmatrix} \end{bmatrix} .</math>

The noise vector <math>\boldsymbol\epsilon_{i}</math> for each observation i is jointly normal, so that the outcomes for a given observation are correlated: <math display="block">\boldsymbol\epsilon_i \sim N(0, \boldsymbol\Sigma_{\epsilon}).</math>

We can write the entire regression problem in matrix form as: <math display="block">\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E},</math> where Y and E are <math>n \times m</math> matrices. The design matrix X is an <math>n \times k</math> matrix with the observations stacked vertically, as in the standard linear regression setup: <math display="block">

\mathbf{X} = \begin{bmatrix} \mathbf{x}^\mathsf{T}_1 \\ \mathbf{x}^\mathsf{T}_2 \\ \vdots \\ \mathbf{x}^\mathsf{T}_n \end{bmatrix}
= \begin{bmatrix} x_{1,1} & \cdots & x_{1,k} \\
x_{2,1} & \cdots & x_{2,k} \\
\vdots & \ddots & \vdots \\
x_{n,1} & \cdots & x_{n,k}
\end{bmatrix}.

</math>

The classical, frequentists linear least squares solution is to simply estimate the matrix of regression coefficients <math>\hat{\mathbf{B}}</math> using the Moore-Penrose pseudoinverse: <math display="block"> \hat{\mathbf{B}} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{Y}.</math>

To obtain the Bayesian solution, we need to specify the conditional likelihood and then find the appropriate conjugate prior. As with the univariate case of linear Bayesian regression, we will find that we can specify a natural conditional conjugate prior (which is scale dependent).

Let us write our conditional likelihood as^[1] <math display="block">\rho(\mathbf{E}|\boldsymbol\Sigma_{\epsilon}) \propto |\boldsymbol\Sigma_{\epsilon}|^{-n/2} \exp\left(-\tfrac{1}{2} \operatorname{tr}\left(\mathbf{E}^\mathsf{T} \mathbf{E} \boldsymbol\Sigma_{\epsilon}^{-1}\right) \right) ,</math> writing the error <math>\mathbf{E}</math> in terms of <math>\mathbf{Y},\mathbf{X},</math> and <math>\mathbf{B}</math> yields <math display="block">\rho(\mathbf{Y}|\mathbf{X},\mathbf{B},\boldsymbol\Sigma_{\epsilon}) \propto |\boldsymbol\Sigma_{\epsilon}|^{-n/2} \exp(-\tfrac{1}{2} \operatorname{tr}((\mathbf{Y}-\mathbf{X} \mathbf{B})^\mathsf{T} (\mathbf{Y}-\mathbf{X} \mathbf{B}) \boldsymbol\Sigma_{\epsilon}^{-1} ) ) ,</math>

We seek a natural conjugate prior—a joint density <math>\rho(\mathbf{B},\Sigma_{\epsilon})</math> which is of the same functional form as the likelihood. Since the likelihood is quadratic in <math>\mathbf{B}</math>, we re-write the likelihood so it is normal in <math>(\mathbf{B}-\hat{\mathbf{B}})</math> (the deviation from classical sample estimate).

Using the same technique as with Bayesian linear regression, we decompose the exponential term using a matrix-form of the sum-of-squares technique. Here, however, we will also need to use the Matrix Differential Calculus (Kronecker product and vectorization transformations).

First, let us apply sum-of-squares to obtain new expression for the likelihood: <math display="block">\rho(\mathbf{Y}|\mathbf{X},\mathbf{B},\boldsymbol\Sigma_{\epsilon}) \propto |\boldsymbol\Sigma_{\epsilon}|^{-(n-k)/2} \exp(-\operatorname{tr}(\tfrac{1}{2}\mathbf{S}^\mathsf{T} \mathbf{S} \boldsymbol\Sigma_{\epsilon}^{-1})) |\boldsymbol\Sigma_{\epsilon}|^{-k/2} \exp(-\tfrac{1}{2} \operatorname{tr}((\mathbf{B}-\hat{\mathbf{B}})^\mathsf{T} \mathbf{X}^\mathsf{T} \mathbf{X}(\mathbf{B}-\hat{\mathbf{B}}) \boldsymbol\Sigma_{\epsilon}^{-1} ) ) ,</math> <math display="block">\mathbf{S} = \mathbf{Y} - \mathbf{X}\hat{\mathbf{B}}</math>

We would like to develop a conditional form for the priors: <math display="block">\rho(\mathbf{B},\boldsymbol\Sigma_{\epsilon}) = \rho(\boldsymbol\Sigma_{\epsilon})\rho(\mathbf{B}|\boldsymbol\Sigma_{\epsilon}),</math> where <math>\rho(\boldsymbol\Sigma_{\epsilon})</math> is an inverse-Wishart distribution and <math>\rho(\mathbf{B}|\boldsymbol\Sigma_{\epsilon})</math> is some form of normal distribution in the matrix <math>\mathbf{B}</math>. This is accomplished using the vectorization transformation, which converts the likelihood from a function of the matrices <math>\mathbf{B}, \hat{\mathbf{B}}</math> to a function of the vectors <math>\boldsymbol\beta = \operatorname{vec}(\mathbf{B}), \hat{\boldsymbol\beta} = \operatorname{vec}(\hat{\mathbf{B}})</math>.

Write <math display="block">\operatorname{tr}((\mathbf{B} - \hat{\mathbf{B}})^\mathsf{T}\mathbf{X}^\mathsf{T} \mathbf{X}(\mathbf{B} - \hat{\mathbf{B}}) \boldsymbol\Sigma_\epsilon^{-1}) = \operatorname{vec}(\mathbf{B} - \hat{\mathbf{B}})^\mathsf{T} \operatorname{vec}(\mathbf{X}^\mathsf{T} \mathbf{X}(\mathbf{B} - \hat{\mathbf{B}}) \boldsymbol\Sigma_{\epsilon}^{-1} )</math>

Let <math display="block"> \operatorname{vec}(\mathbf{X}^\mathsf{T} \mathbf{X}(\mathbf{B} - \hat{\mathbf{B}}) \boldsymbol\Sigma_{\epsilon}^{-1} ) = (\boldsymbol\Sigma_{\epsilon}^{-1} \otimes \mathbf{X}^\mathsf{T}\mathbf{X} )\operatorname{vec}(\mathbf{B} - \hat{\mathbf{B}}), </math> where <math>\mathbf{A} \otimes \mathbf{B}</math> denotes the Kronecker product of matrices A and B, a generalization of the outer product which multiplies an <math>m \times n</math> matrix by a <math>p \times q</math> matrix to generate an <math>mp \times nq</math> matrix, consisting of every combination of products of elements from the two matrices.

Then <math display="block">\begin{align} &\operatorname{vec}(\mathbf{B} - \hat{\mathbf{B}})^\mathsf{T} (\boldsymbol\Sigma_{\epsilon}^{-1} \otimes \mathbf{X}^\mathsf{T}\mathbf{X} )\operatorname{vec}(\mathbf{B} - \hat{\mathbf{B}}) \\ &= (\boldsymbol\beta - \hat{\boldsymbol\beta})^\mathsf{T}(\boldsymbol\Sigma_{\epsilon}^{-1} \otimes \mathbf{X}^\mathsf{T}\mathbf{X} )(\boldsymbol\beta-\hat{\boldsymbol\beta}) \end{align}</math> which will lead to a likelihood which is normal in <math>(\boldsymbol\beta - \hat{\boldsymbol\beta})</math>.

With the likelihood in a more tractable form, we can now find a natural (conditional) conjugate prior.

Conjugate prior distribution

The natural conjugate prior using the vectorized variable <math>\boldsymbol\beta</math> is of the form:^[1] <math display="block">\rho(\boldsymbol\beta, \boldsymbol\Sigma_{\epsilon}) = \rho(\boldsymbol\Sigma_{\epsilon})\rho(\boldsymbol\beta|\boldsymbol\Sigma_{\epsilon}),</math> where <math display="block"> \rho(\boldsymbol\Sigma_{\epsilon}) \sim \mathcal{W}^{-1}(\mathbf V_0,\boldsymbol\nu_0)</math> and <math display="block"> \rho(\boldsymbol\beta|\boldsymbol\Sigma_{\epsilon}) \sim N(\boldsymbol\beta_0, \boldsymbol\Sigma_{\epsilon} \otimes \boldsymbol\Lambda_0^{-1}).</math>

Posterior distribution

Using the above prior and likelihood, the posterior distribution can be expressed as:^[1] <math display="block">\begin{align} \rho(\boldsymbol\beta,\boldsymbol\Sigma_{\epsilon}|\mathbf{Y},\mathbf{X}) \propto{}& |\boldsymbol\Sigma_{\epsilon}|^{-(\boldsymbol\nu_0 + m + 1)/2}\exp{(-\tfrac{1}{2}\operatorname{tr}(\mathbf V_0 \boldsymbol\Sigma_{\epsilon}^{-1}))} \\ &\times|\boldsymbol\Sigma_{\epsilon}|^{-k/2}\exp{(-\tfrac{1}{2} \operatorname{tr}((\mathbf{B}-\mathbf B_0)^\mathsf{T}\boldsymbol\Lambda_0(\mathbf{B}-\mathbf B_0)\boldsymbol\Sigma_{\epsilon}^{-1}))} \\ &\times|\boldsymbol\Sigma_{\epsilon}|^{-n/2}\exp{(-\tfrac{1}{2}\operatorname{tr}((\mathbf{Y}-\mathbf{XB})^\mathsf{T}(\mathbf{Y}-\mathbf{XB})\boldsymbol\Sigma_{\epsilon}^{-1}))}, \end{align}</math> where <math>\operatorname{vec}(\mathbf B_0) = \boldsymbol\beta_0</math>. The terms involving <math>\mathbf{B}</math> can be grouped (with <math>\boldsymbol\Lambda_0 = \mathbf{U}^\mathsf{T}\mathbf{U}</math>) using: <math display="block">\begin{align} & \left(\mathbf{B} - \mathbf B_0\right)^\mathsf{T} \boldsymbol\Lambda_0 \left(\mathbf{B} - \mathbf B_0\right) + \left(\mathbf{Y} - \mathbf{XB}\right)^\mathsf{T} \left(\mathbf{Y} - \mathbf{XB}\right) \\ ={}& \left(\begin{bmatrix}\mathbf Y \\ \mathbf U \mathbf B_0\end{bmatrix} - \begin{bmatrix}\mathbf{X}\\ \mathbf{U}\end{bmatrix}\mathbf{B}\right)^\mathsf{T} \left(\begin{bmatrix}\mathbf{Y}\\ \mathbf U \mathbf B_0\end{bmatrix}-\begin{bmatrix}\mathbf{X}\\ \mathbf{U}\end{bmatrix}\mathbf{B}\right) \\ ={}& \left(\begin{bmatrix}\mathbf Y \\ \mathbf U \mathbf B_0\end{bmatrix} - \begin{bmatrix}\mathbf{X}\\ \mathbf{U}\end{bmatrix}\mathbf B_n\right)^\mathsf{T}\left(\begin{bmatrix}\mathbf{Y}\\ \mathbf U \mathbf B_0\end{bmatrix}-\begin{bmatrix}\mathbf{X}\\ \mathbf{U}\end{bmatrix}\mathbf B_n\right) + \left(\mathbf B - \mathbf B_n\right)^\mathsf{T} \left(\mathbf{X}^\mathsf{T} \mathbf{X} + \boldsymbol\Lambda_0\right) \left(\mathbf{B}-\mathbf B_n\right) \\ ={}& \left(\mathbf{Y} - \mathbf X \mathbf B_n \right)^\mathsf{T} \left(\mathbf{Y} - \mathbf X \mathbf B_n\right) + \left(\mathbf B_0 - \mathbf B_n\right)^\mathsf{T} \boldsymbol\Lambda_0 \left(\mathbf B_0 - \mathbf B_n\right) + \left(\mathbf{B} - \mathbf B_n\right)^\mathsf{T} \left(\mathbf{X}^\mathsf{T} \mathbf{X} + \boldsymbol\Lambda_0\right)\left(\mathbf B - \mathbf B_n\right), \end{align}</math> with <math display="block">\mathbf B_n = \left(\mathbf{X}^\mathsf{T}\mathbf{X} + \boldsymbol\Lambda_0\right)^{-1}\left(\mathbf{X}^\mathsf{T} \mathbf{X} \hat{\mathbf{B}} + \boldsymbol\Lambda_0\mathbf B_0\right) = \left(\mathbf{X}^\mathsf{T} \mathbf{X} + \boldsymbol\Lambda_0\right)^{-1}\left(\mathbf{X}^\mathsf{T} \mathbf{Y} + \boldsymbol\Lambda_0 \mathbf B_0\right).</math>

This now allows us to write the posterior in a more useful form: <math display="block">\begin{align} \rho(\boldsymbol\beta,\boldsymbol\Sigma_{\epsilon}|\mathbf{Y},\mathbf{X}) \propto{}&|\boldsymbol\Sigma_{\epsilon}|^{-(\boldsymbol\nu_0 + m + n + 1)/2}\exp{(-\tfrac{1}{2}\operatorname{tr}((\mathbf V_0 + (\mathbf{Y}-\mathbf{XB_n})^\mathsf{T} (\mathbf{Y}-\mathbf{XB_n}) + (\mathbf B_n-\mathbf B_0)^\mathsf{T}\boldsymbol\Lambda_0(\mathbf B_n-\mathbf B_0))\boldsymbol\Sigma_{\epsilon}^{-1}))} \\ &\times|\boldsymbol\Sigma_{\epsilon}|^{-k/2}\exp{(-\tfrac{1}{2}\operatorname{tr}((\mathbf{B}-\mathbf B_n)^\mathsf{T} (\mathbf{X}^T\mathbf{X} + \boldsymbol\Lambda_0) (\mathbf{B}-\mathbf B_n)\boldsymbol\Sigma_{\epsilon}^{-1}))}. \end{align}</math>

This takes the form of an inverse-Wishart distribution times a Matrix normal distribution: <math display="block">\rho(\boldsymbol\Sigma_{\epsilon}|\mathbf{Y},\mathbf{X}) \sim \mathcal{W}^{-1}(\mathbf V_n,\boldsymbol\nu_n)</math> and <math display="block"> \rho(\mathbf{B}|\mathbf{Y},\mathbf{X},\boldsymbol\Sigma_{\epsilon}) \sim \mathcal{MN}_{k,m}(\mathbf B_n, \boldsymbol\Lambda_n^{-1}, \boldsymbol\Sigma_{\epsilon}).</math>

The parameters of this posterior are given by: <math display="block">\mathbf V_n = \mathbf V_0 + (\mathbf{Y}-\mathbf{XB_n})^\mathsf{T}(\mathbf{Y}-\mathbf{XB_n}) + (\mathbf B_n - \mathbf B_0)^\mathsf{T}\boldsymbol\Lambda_0(\mathbf B_n-\mathbf B_0)</math> <math display="block">\boldsymbol\nu_n = \boldsymbol\nu_0 + n</math> <math display="block">\mathbf B_n = (\mathbf{X}^\mathsf{T}\mathbf{X} + \boldsymbol\Lambda_0)^{-1}(\mathbf{X}^\mathsf{T} \mathbf{Y} + \boldsymbol\Lambda_0\mathbf B_0)</math> <math display="block">\boldsymbol\Lambda_n = \mathbf{X}^\mathsf{T} \mathbf{X} + \boldsymbol\Lambda_0</math>

References

Шаблон:Reflist Шаблон:More footnotes

↑ ^1,0 ^1,1 ^1,2 Peter E. Rossi, Greg M. Allenby, Rob McCulloch. Bayesian Statistics and Marketing. John Wiley & Sons, 2012, p. 32.

[BSaM-1] 1,0 ^1,1 ^1,2 Peter E. Rossi, Greg M. Allenby, Rob McCulloch. Bayesian Statistics and Marketing. John Wiley & Sons, 2012, p. 32.

[1]

Партнерские ресурсы
Криптовалюты	Обмен криптовалют - www.bestchange.ru Криптовалютная биржа CoinEx Криптовалютная биржа Binance HIVE OS - операционная система для майнинга e4pool - Мультивалютный пул для майнинга.
Магазины	AliExpress — глобальная виртуальная (в Интернете) торговая площадка, предоставляющая возможность покупать товары производителей из КНР; computeruniverse.net - Интернет-магазин компьютеров(Промо код 5 Евро на первую покупку:FWWC3ZKQ);
Хостинг	DigitalOcean - американский провайдер облачных инфраструктур, с главным офисом в Нью-Йорке и с центрами обработки данных по всему миру;
Разное	Викиум - Онлайн-тренажер для мозга Like Центр - Центр поддержки и развития предпринимательства. Gamersbay - лучший магазин по бустингу для World of Warcraft. Ноотропы OmniMind N°1 - Усиливает мозговую активность. Повышает мотивацию. Улучшает память. Санкт-Петербургская школа телевидения - это федеральная сеть образовательных центров, которая имеет филиалы в 37 городах России. Lingualeo.com — интерактивный онлайн-сервис для изучения и практики английского языка в увлекательной игровой форме. Junyschool (Джунискул) – международная школа программирования и дизайна для детей и подростков от 5 до 17 лет, где ученики осваивают компьютерную грамотность, развивают алгоритмическое и креативное мышление, изучают основы программирования и компьютерной графики, создают собственные проекты: игры, сайты, программы, приложения, анимации, 3D-модели, монтируют видео. Умназия - Интерактивные онлайн-курсы и тренажеры для развития мышления детей 6-13 лет SkillBox - это один из лидеров российского рынка онлайн-образования. Среди партнеров Skillbox ведущий разработчик сервисного дизайна AIC, медиа-компания Yoola, первое и самое крупное русскоязычное аналитическое агентство Tagline, онлайн-школа дизайна и иллюстрации Bang! Bang! Education, оператор PR-рынка PACO, студия рисования Draw&Go, агентство performance-маркетинга Ingate, scrum-студия Sibirix, имидж-лаборатория Персона. «Нетология» — это университет по подготовке и дополнительному обучению специалистов в области интернет-маркетинга, управления проектами и продуктами, дизайна, Data Science и разработки. В рамках Нетологии студенты получают ценные теоретические знания от лучших экспертов Рунета, выполняют практические задания на отработку полученных навыков, общаются с экспертами и единомышленниками. Познакомиться со всеми продуктами подробнее можно на сайте https://netology.ru, линейка курсов и профессий постоянно обновляется. StudyBay Brazil – это онлайн биржа для португалоговорящих студентов и авторов! Студент получает уникальную работу любого уровня сложности и больше свободного времени, в то время как у автора появляется дополнительный заработок и бесценный опыт. Автор24 — самая большая в России площадка по написанию учебных работ: контрольные и курсовые работы, дипломы, рефераты, решение задач, отчеты по практике, а так же любой другой вид работы. Сервис сотрудничает с более 70 000 авторов. Более 1 000 000 работ уже выполнено. StudyBay – это онлайн биржа для англоязычных студентов и авторов! Студент получает уникальную работу любого уровня сложности и больше свободного времени, в то время как у автора появляется дополнительный заработок и бесценный опыт.

Английская Википедия:Bayesian multivariate linear regression

Содержание

Details

Conjugate prior distribution

Posterior distribution

See also

References

Навигация

Действия на странице

Действия на странице

Персональные инструменты

Навигация

Поиск

Инструменты