Английская Википедия:Biweight midcorrelation

Материал из Онлайн справочника
Перейти к навигацииПерейти к поиску

In statistics, biweight midcorrelation (also called bicor) is a measure of similarity between samples. It is median-based, rather than mean-based, thus is less sensitive to outliers, and can be a robust alternative to other similarity metrics, such as Pearson correlation or mutual information.[1]

Derivation

Here we find the biweight midcorrelation of two vectors <math>x</math> and <math>y</math>, with <math>i=1,2, \ldots,m</math> items, representing each item in the vector as <math>x_1, x_2, \ldots, x_m</math> and <math>y_1, y_2, \ldots, y_m</math>. First, we define <math>\operatorname{med}(x)</math> as the median of a vector <math>x</math> and <math>\operatorname{mad}(x)</math> as the median absolute deviation (MAD), then define <math>u_i</math> and <math>v_i</math> as,

<math>

\begin{align} u_i &= \frac{x_i - \operatorname{med}(x)}{9 \operatorname{mad}(x)},\\ v_i &= \frac{y_i - \operatorname{med}(y)}{9 \operatorname{mad}(y)}. \end{align} </math>

Now we define the weights <math>w_i^{(x)}</math> and <math>w_i^{(y)}</math> as,

<math>

\begin{align} w_i^{(x)} &= \left(1-u_i^2\right)^2 I\left(1-|u_i|\right)\\ w_i^{(y)} &= \left(1-v_i^2\right)^2 I\left(1-|v_i|\right) \end{align} </math>

where <math>I</math> is the identity function where,

<math>

I(x) = \begin{cases}1, & \text{if } x >0\\ 0, & \text{otherwise}\end{cases} </math>

Then we normalize so that the sum of the weights is 1:

<math>

\begin{align} \tilde{x}_i &= \frac{\left(x_i - \operatorname{med}(x)\right) w_i^{(x)}}{\sqrt{\sum_{j=1}^m \left[(x_j -\operatorname{med}(x)) w_j^{(x)}\right]^2}}\\ \tilde{y}_i &= \frac{\left(y_i - \operatorname{med}(y)\right) w_i^{(y)}}{\sqrt{\sum_{j=1}^m \left[(y_j -\operatorname{med}(y)) w_j^{(y)}\right]^2}}. \end{align} </math>

Finally, we define biweight midcorrelation as,

<math>

\mathrm{bicor}\left(x, y\right) = \sum_{i=1}^m \tilde{x}_i \tilde{y}_i </math>

Applications

Biweight midcorrelation has been shown to be more robust in evaluating similarity in gene expression networks,[2] and is often used for weighted correlation network analysis.

Implementations

Biweight midcorrelation has been implemented in the R statistical programming language as the function bicor as part of the WGCNA package[3]

Also implemented in the Raku programming language as the function bi_cor_coef as part of the Statistics module.[4]

References

Шаблон:Reflist