English Wikipedia: Heteroskedasticity-consistent standard errors

Heteroskedasticity-consistent (HC) standard errors arise in statistics and econometrics in the context of linear regression and time series analysis. They are also known as heteroskedasticity-robust standard errors (or simply robust standard errors) and as Eicker–Huber–White standard errors (also Huber–White standard errors or White standard errors),^[1] in recognition of the contributions of Friedhelm Eicker,^[2] Peter J. Huber,^[3] and Halbert White.^[4]

In regression and time-series modelling, basic forms of models assume that the errors or disturbances u_i have the same variance across all observation points. When this is not the case, the errors are said to be heteroskedastic, or to exhibit heteroskedasticity, and this behaviour will be reflected in the residuals <math display="inline">\widehat{u}_i </math> estimated from a fitted model. Heteroskedasticity-consistent standard errors allow valid inference from a fitted model whose residuals are heteroskedastic. The first such approach was proposed by Huber (1967), and improved procedures have since been developed for cross-sectional data, time-series data and GARCH estimation.

Heteroskedasticity-consistent standard errors that differ from classical standard errors may indicate model misspecification. Substituting heteroskedasticity-consistent standard errors does not resolve this misspecification, which may lead to bias in the coefficients. In most situations, the problem should be found and fixed.^[5] Other types of standard error adjustments, such as clustered standard errors or HAC standard errors, may be considered as extensions to HC standard errors.

History

Heteroskedasticity-consistent standard errors were introduced by Friedhelm Eicker,^[6]^[7] and popularized in econometrics by Halbert White.

Problem

Consider the linear regression model for the scalar <math>y</math>.

<math>

y = \mathbf{x}^{\top} \boldsymbol{\beta} + \varepsilon, \, </math>

where <math>\mathbf{x}</math> is a k × 1 column vector of explanatory variables (features), <math>\boldsymbol{\beta}</math> is a k × 1 column vector of parameters to be estimated, and <math>\varepsilon</math> is the error term.

The ordinary least squares (OLS) estimator is

<math>

\widehat \boldsymbol{\beta}_\mathrm{OLS} = (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}, \, </math>

where <math>\mathbf{y}</math> is a vector of observations <math>y_i</math>, and <math>\mathbf{X}</math> denotes the matrix of stacked <math>\mathbf{x}_i^{\top}</math> values observed in the data.

If the sample errors have equal variance <math>\sigma^2</math> and are uncorrelated, then the least-squares estimate of <math>\boldsymbol{\beta}</math> is BLUE (best linear unbiased estimator), and its variance is estimated with

<math>\hat{\mathbb{V}}\left[\widehat\boldsymbol\beta_\mathrm{OLS}\right] = s^2 (\mathbf{X}^{\top}\mathbf{X})^{-1}, \quad s^2 = \frac{\sum_i \widehat \varepsilon_i^2}{n-k} </math>

where <math>\widehat \varepsilon_i = y_i - \mathbf{x}_i^{\top} \widehat \boldsymbol{\beta}_\mathrm{OLS}</math> are the regression residuals.
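The two formulas above can be sketched in NumPy as follows. This is a minimal illustration on simulated data, not a reference implementation; the sample size, the coefficient values in `beta_true` and the random seed are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, illustrative data only
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # rows are x_i^T
beta_true = np.array([1.0, 2.0, -0.5])  # hypothetical coefficients
y = X @ beta_true + rng.normal(size=n)  # homoskedastic errors

# OLS estimate: solve (X'X) b = X'y instead of inverting X'X to get b
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)

# Residuals and the classical variance estimator s^2 (X'X)^{-1}
resid = y - X @ beta_hat
s2 = resid @ resid / (n - k)
V_classical = s2 * np.linalg.inv(XtX)
se_classical = np.sqrt(np.diag(V_classical))
```

The square roots of the diagonal of `V_classical` are the classical standard errors that the rest of the article contrasts with the heteroskedasticity-consistent ones.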

When the error terms do not have constant variance (i.e., the assumption of <math> \mathbb{E}[\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^{\top}] = \sigma^2 \mathbf{I}_n</math> is untrue), the OLS estimator loses some of its desirable properties. The formula for variance now cannot be simplified:

<math> \mathbb{V}\left[\widehat\boldsymbol\beta_\mathrm{OLS}\right] = \mathbb{V}\big[ (\mathbf{X}^{\top}\mathbf{X})^{-1} \mathbf{X}^{\top}\mathbf{y} \big] = (\mathbf{X}^{\top}\mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{\Sigma} \mathbf{X} (\mathbf{X}^{\top}\mathbf{X})^{-1}</math>

where <math> \mathbf{\Sigma} = \mathbb{V}[\boldsymbol{\varepsilon}]</math> is the covariance matrix of the error terms.

While the OLS point estimator remains unbiased, it is no longer "best" in the sense of having minimum variance among linear unbiased estimators, and the OLS variance estimator <math>\hat{\mathbb{V}} \left[ \widehat \boldsymbol{\beta}_\mathrm{OLS} \right]</math> does not provide a consistent estimate of the variance of the OLS estimates.
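To make the gap between the two variance formulas concrete, the sketch below evaluates the sandwich expression for a known diagonal error covariance matrix. The design (a regular grid for the regressor and an error variance proportional to its square) is an assumption chosen only for illustration, and `sandwich` is a hypothetical helper name.

```python
import numpy as np

n = 500
x = np.linspace(0.5, 3.0, n)             # deterministic regressor grid (assumed design)
X = np.column_stack([np.ones(n), x])

def sandwich(X, sigma2):
    """(X'X)^{-1} X' diag(sigma2) X (X'X)^{-1} for per-observation variances sigma2."""
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (sigma2[:, None] * X)   # X' diag(sigma2) X without the n x n matrix
    return bread @ meat @ bread

# Homoskedastic case: the sandwich collapses to sigma^2 (X'X)^{-1}
V_homo = sandwich(X, np.full(n, 2.0))
V_simple = 2.0 * np.linalg.inv(X.T @ X)

# Heteroskedastic case (assumed form: error variance equal to x_i^2)
sigma2_het = x ** 2
V_het = sandwich(X, sigma2_het)
# What the simple formula targets: average variance times (X'X)^{-1}
V_naive = sigma2_het.mean() * np.linalg.inv(X.T @ X)

print(np.sqrt(np.diag(V_het)), np.sqrt(np.diag(V_naive)))
```

Under this particular design the simple formula understates the variance of the slope, because the observations with the largest error variance are also among those farthest from the mean of the regressor.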

For any non-linear model (for instance logit and probit models), however, heteroskedasticity has more severe consequences: the maximum likelihood estimates of the parameters will be biased (in an unknown direction), as well as inconsistent (unless the likelihood function is modified to correctly take into account the precise form of heteroskedasticity).^[8]^[9] As pointed out by Greene, “simply computing a robust covariance matrix for an otherwise inconsistent estimator does not give it redemption.”^[10]

Solution

If the regression errors <math>\varepsilon_i</math> are independent, but have distinct variances <math>\sigma^2_i</math>, then <math>\mathbf{\Sigma} = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2)</math> which can be estimated with <math>\widehat\sigma_i^2 = \widehat \varepsilon_i^2</math>. This provides White's (1980) estimator, often referred to as HCE (heteroskedasticity-consistent estimator):

<math>

\begin{align} \hat{\mathbb{V}}_\text{HCE} \big[ \widehat \boldsymbol{\beta}_\text{OLS} \big] &= \frac{1}{n} \bigg(\frac{1}{n} \sum_i \mathbf{x}_i \mathbf{x}_i^{\top} \bigg)^{-1} \bigg(\frac{1}{n} \sum_i \mathbf{x}_i \mathbf{x}_i^\top \widehat{\varepsilon}_i^2 \bigg) \bigg(\frac{1}{n} \sum_i \mathbf{x}_i \mathbf{x}_i^{\top} \bigg)^{-1} \\ &= ( \mathbf{X}^{\top} \mathbf{X} )^{-1} ( \mathbf{X}^{\top} \operatorname{diag}(\widehat \varepsilon_1^2, \ldots, \widehat \varepsilon_n^2) \mathbf{X} ) ( \mathbf{X}^{\top} \mathbf{X})^{-1}, \end{align} </math>

where as above <math>\mathbf{X}</math> denotes the matrix of stacked <math>\mathbf{x}_i^{\top}</math> values from the data. The estimator can be derived in terms of the generalized method of moments (GMM).
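A minimal NumPy sketch of the estimator in its matrix form is given below. The function name `hc0_covariance` and the simulated data (error standard deviation growing with the regressor) are assumptions for the example, not part of the source.

```python
import numpy as np

def hc0_covariance(X, y):
    """White's (1980) estimator (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    meat = X.T @ (resid[:, None] ** 2 * X)   # sum_i e_i^2 x_i x_i'
    return beta_hat, XtX_inv @ meat @ XtX_inv

rng = np.random.default_rng(42)              # illustrative data only
n = 400
x = rng.uniform(0.5, 3.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * x   # error s.d. grows with x (assumed design)

beta_hat, V_hc0 = hc0_covariance(X, y)
se_hc0 = np.sqrt(np.diag(V_hc0))             # heteroskedasticity-consistent standard errors
```

Multiplying each row of `X` by its squared residual avoids building the n × n diagonal matrix explicitly, which matters for large samples.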

Also often discussed in the literature (including White's paper) is the covariance matrix <math>\widehat\mathbf{\Omega}_n</math> of the <math>\sqrt{n}</math>-consistent limiting distribution:

<math>

\sqrt{n}(\widehat \boldsymbol{\beta}_n - \boldsymbol{\beta}) \, \xrightarrow{d} \, \mathcal{N}(\mathbf{0}, \mathbf{\Omega}), </math>

where

<math>

\mathbf{\Omega} = \mathbb{E}[\mathbf{X} \mathbf{X}^{\top}]^{-1} \mathbb{V}[\mathbf{X} \boldsymbol{\varepsilon}] \mathbb{E}[\mathbf{X} \mathbf{X}^{\top}]^{-1}, </math>

and

<math>

\begin{align} \widehat\mathbf{\Omega}_n &= \bigg(\frac{1}{n} \sum_i \mathbf{x}_i \mathbf{x}_i^{\top} \bigg)^{-1} \bigg(\frac{1}{n} \sum_i \mathbf{x}_i \mathbf{x}_i^{\top} \widehat \varepsilon_i^2 \bigg) \bigg(\frac{1}{n} \sum_i \mathbf{x}_i \mathbf{x}_i^{\top} \bigg)^{-1} \\ &= n ( \mathbf{X}^{\top} \mathbf{X} )^{-1} ( \mathbf{X}^{\top} \operatorname{diag}(\widehat \varepsilon_1^2, \ldots, \widehat \varepsilon_n^2) \mathbf{X} ) ( \mathbf{X}^{\top} \mathbf{X})^{-1} \end{align} </math>

Thus,

<math>

\widehat \mathbf{\Omega}_n = n \cdot \hat{\mathbb{V}}_\text{HCE}[\widehat \boldsymbol{\beta}_\text{OLS}] </math>

and

<math>

\widehat \mathbb{V}[\mathbf{X} \boldsymbol{\varepsilon}] = \frac{1}{n} \sum_i \mathbf{x}_i \mathbf{x}_i^{\top} \widehat \varepsilon_i^2 = \frac{1}{n} \mathbf{X}^{\top} \operatorname{diag}(\widehat \varepsilon_1^2, \ldots, \widehat \varepsilon_n^2) \mathbf{X}. </math>

Precisely which covariance matrix is of concern is a matter of context.
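The relation between the two matrices can be checked numerically. The sketch below builds the estimate of the asymptotic covariance from averaged moment matrices and compares it with n times the finite-sample matrix; the data-generating process and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)                # illustrative data only
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.5]) + rng.normal(size=n) * (1.0 + np.abs(X[:, 1]))

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e2 = (y - X @ beta_hat) ** 2                  # squared OLS residuals

# Averaged moment matrices, as in the first line of each displayed formula
Sxx_inv = np.linalg.inv(X.T @ X / n)          # ((1/n) sum x_i x_i')^{-1}
V_xe = X.T @ (e2[:, None] * X) / n            # estimate of V[x * eps]
Omega_hat = Sxx_inv @ V_xe @ Sxx_inv          # covariance of sqrt(n) (b - beta)

# Finite-sample covariance estimate of beta_hat itself
XtX_inv = np.linalg.inv(X.T @ X)
V_hce = XtX_inv @ (X.T @ (e2[:, None] * X)) @ XtX_inv

# Standard errors follow from either matrix
se_from_omega = np.sqrt(np.diag(Omega_hat) / n)
se_from_hce = np.sqrt(np.diag(V_hce))
```

Both routes give the same standard errors; the only difference is whether the factor n is carried inside the matrix or applied afterwards.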

Alternative estimators have been proposed in MacKinnon & White (1985) that correct for unequal variances of regression residuals due to different leverage.^[11] Unlike White's asymptotic estimator, their estimators are unbiased when the data are homoskedastic.

Of the four widely available options, often denoted HC0 to HC3, the HC3 specification appears to work best: tests relying on the HC3 estimator have better power and sizes closer to the nominal level, especially in small samples. The larger the sample, the smaller the difference between the estimators.^[12]
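The text names HC0 to HC3 without giving their formulas. The sketch below uses the definitions commonly found in the literature, which are an assumption relative to this article: HC0 uses the squared residuals as they are, HC1 rescales them by n/(n − k), HC2 divides them by (1 − h_ii), and HC3 divides them by (1 − h_ii)^2, where h_ii is the leverage of observation i. The function name `hc_covariances` and the simulated data are hypothetical.

```python
import numpy as np

def hc_covariances(X, y):
    """Return a dict of HC0-HC3 covariance estimates for the OLS coefficients."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e2 = (y - X @ beta_hat) ** 2
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverages h_ii = x_i' (X'X)^{-1} x_i
    weights = {
        "HC0": e2,
        "HC1": e2 * n / (n - k),
        "HC2": e2 / (1.0 - h),
        "HC3": e2 / (1.0 - h) ** 2,
    }
    return {
        name: XtX_inv @ (X.T @ (w[:, None] * X)) @ XtX_inv
        for name, w in weights.items()
    }

rng = np.random.default_rng(5)                    # small illustrative sample
n = 30
x = rng.uniform(0.5, 3.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * x

cov = hc_covariances(X, y)
se = {name: np.sqrt(np.diag(V)) for name, V in cov.items()}
for name, values in se.items():
    print(name, values)
```

Because every leverage lies strictly between 0 and 1 here, the HC2 and HC3 weights inflate the squared residuals, so their standard errors are never smaller than those from HC0; the gap shrinks as the sample grows and the leverages approach zero.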

An alternative to explicitly modelling the heteroskedasticity is to use a resampling method such as the wild bootstrap. Because the studentized bootstrap, which standardizes the resampled statistic by its standard error, yields an asymptotic refinement,^[13] heteroskedasticity-robust standard errors nevertheless remain useful.
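A rough sketch of a wild bootstrap for the OLS coefficients is shown below. It is the plain (non-studentized) variant, and the use of Rademacher weights (+1 or −1 with equal probability) is one common choice assumed here rather than something stated in the text; `wild_bootstrap_se` and the simulated data are hypothetical.

```python
import numpy as np

def wild_bootstrap_se(X, y, n_boot=2000, seed=0):
    """Bootstrap standard errors from y* = fitted + resid * v, v Rademacher."""
    rng = np.random.default_rng(seed)
    proj = np.linalg.solve(X.T @ X, X.T)          # (X'X)^{-1} X'
    beta_hat = proj @ y
    fitted = X @ beta_hat
    resid = y - fitted
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        v = rng.choice([-1.0, 1.0], size=len(y))  # sign flips keep each |e_i| in place
        draws[b] = proj @ (fitted + resid * v)    # re-estimate on the resampled response
    return beta_hat, draws.std(axis=0, ddof=1)

rng = np.random.default_rng(11)                   # illustrative data only
n = 200
x = rng.uniform(0.5, 3.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * x

beta_hat, se_boot = wild_bootstrap_se(X, y)
```

Each residual stays attached to its own observation, so the resampled data keep the pattern of unequal variances. With these weights the bootstrap variance of the coefficients matches the HC0 formula up to simulation noise.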

Instead of accounting for the heteroskedastic errors, most linear models can be transformed to feature homoskedastic error terms (unless the error term is heteroskedastic by construction, e.g. in a linear probability model). One way to do this is using weighted least squares, which also features improved efficiency properties.
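The transformation can be sketched as follows, under the strong assumption that the error standard deviation of each observation is known (here it is set equal to the regressor, purely for illustration). Dividing each row by its standard deviation yields a model with homoskedastic errors, and OLS on the rescaled data is weighted least squares.

```python
import numpy as np

rng = np.random.default_rng(3)              # illustrative data only
n = 300
x = rng.uniform(0.5, 3.0, size=n)
X = np.column_stack([np.ones(n), x])
sigma = x                                   # assumed known error s.d. per observation
y = 1.0 + 2.0 * x + rng.normal(size=n) * sigma

# WLS as OLS on rows rescaled by 1/sigma_i, which makes the errors homoskedastic
Xw = X / sigma[:, None]
yw = y / sigma
beta_wls = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)

# Same estimator written with the weights w_i = 1/sigma_i^2
w = 1.0 / sigma ** 2
beta_wls_direct = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

# Exact covariances under the assumed variances: WLS vs. OLS (sandwich form)
V_wls = np.linalg.inv(Xw.T @ Xw)
XtX_inv = np.linalg.inv(X.T @ X)
V_ols = XtX_inv @ (X.T @ (sigma[:, None] ** 2 * X)) @ XtX_inv
```

Comparing the diagonals of `V_wls` and `V_ols` shows the efficiency gain: when the weights are correct, the WLS variances are no larger than the OLS ones. In practice the variances are rarely known and must themselves be modelled or estimated.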

Software

EViews: EViews version 8 offers three different methods for robust least squares: M-estimation (Huber, 1973), S-estimation (Rousseeuw and Yohai, 1984), and MM-estimation (Yohai 1987).^[14]
Julia: the CovarianceMatrices package offers several methods for heteroskedastic robust variance covariance matrices.^[15]
MATLAB: See the hac function in the Econometrics toolbox.^[16]
Python: The statsmodels package offers various robust standard error estimates; see statsmodels.regression.linear_model.RegressionResults for further descriptions.
R: the vcovHC() command from the sandwich package.^[17]^[18]
RATS: a robust-errors option is available in many of the regression and optimization commands.
Stata: robust option applicable in many pseudo-likelihood based procedures.^[19]
Gretl: the option --robust to several estimation commands (such as ols) in the context of a cross-sectional dataset produces robust standard errors.^[20]

