English Wikipedia: Cramér's V


In statistics, Cramér's V (sometimes referred to as Cramér's phi and denoted as φc) is a measure of association between two nominal variables, giving a value between 0 and +1 (inclusive). It is based on Pearson's chi-squared statistic and was published by Harald Cramér in 1946.[1]

Usage and interpretation

φc is the intercorrelation of two discrete variables[2] and may be used with variables having two or more levels. φc is a symmetrical measure: it does not matter which variable we place in the columns and which in the rows. Also, the order of rows/columns doesn't matter, so φc may be used with nominal data types or higher (notably, ordered or numerical).

Cramér's V varies from 0 (corresponding to no association between the variables) to 1 (complete association) and can reach 1 only when each variable is completely determined by the other. It may be viewed as the association between two variables as a percentage of their maximum possible variation.

φc² is the mean square canonical correlation between the variables.[citation needed]

In the case of a 2 × 2 contingency table, Cramér's V is equal to the absolute value of the phi coefficient.
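As an illustrative check, the following Python sketch compares the two quantities on a made-up 2 × 2 table. The function names and the counts are assumptions introduced for this example. The phi coefficient is computed from the usual 2 × 2 cell formula, and V from the definition given in the Calculation section.

```python
import math

def phi_2x2(table):
    """Phi coefficient of a 2 x 2 table [[a, b], [c, d]] (hypothetical helper)."""
    (a, b), (c, d) = table
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom

def cramers_v(table):
    """Cramér's V; assumes every row and column total is positive."""
    n = sum(map(sum, table))
    rows = [sum(row) for row in table]          # row totals
    cols = [sum(col) for col in zip(*table)]    # column totals
    chi2 = sum(
        (obs - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )
    return math.sqrt(chi2 / n / min(len(rows) - 1, len(cols) - 1))

table = [[10, 20], [30, 40]]  # made-up counts
print(phi_2x2(table))    # negative here, because a*d < b*c
print(cramers_v(table))  # same magnitude with the sign dropped
```

The sign of the phi coefficient depends on how the categories are ordered, which is why only its absolute value can agree with an order-independent measure such as V.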

Calculation

Let a sample of size n of the jointly distributed variables <math>A</math> and <math>B</math> for <math>i=1,\ldots,r; j=1,\ldots,k</math> be given by the frequencies

<math>n_{ij}=</math> number of times the values <math>(A_i,B_j)</math> were observed.

The chi-squared statistic then is:

<math>\chi^2=\sum_{i,j}\frac{(n_{ij}-\frac{n_{i.}n_{.j}}{n})^2}{\frac{n_{i.}n_{.j}}{n}}\;,</math>

where <math>n_{i.}=\sum_jn_{ij}</math> is the number of times the value <math>A_i</math> is observed and <math>n_{.j}=\sum_in_{ij}</math> is the number of times the value <math>B_j</math> is observed.
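The statistic can be sketched in plain Python as follows. The function name `chi_squared` and the example counts are assumptions made for illustration; the table is a list of rows, and every row and column total is assumed to be positive so that no expected count is zero.

```python
def chi_squared(table):
    """Pearson's chi-squared statistic for an r x k table of counts n_ij."""
    n = sum(sum(row) for row in table)                # grand total
    row_totals = [sum(row) for row in table]          # n_i.
    col_totals = [sum(col) for col in zip(*table)]    # n_.j
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: n_i. * n_.j / n
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

table = [[10, 20], [30, 40]]  # made-up counts
print(chi_squared(table))     # 50/63, roughly 0.794
```

For this table the expected counts are 12, 18, 28 and 42, each observed count differs from its expectation by 2, and the sum 4/12 + 4/18 + 4/28 + 4/42 equals 50/63.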

Cramér's V is computed by dividing the chi-squared statistic by the sample size and by the smaller of the number of columns minus 1 and the number of rows minus 1, then taking the square root:

<math>V = \sqrt{\frac{\varphi^2}{\min(k - 1,r-1)}} = \sqrt{ \frac{\chi^2/n}{\min(k - 1,r-1)}}\;,</math>

where:

  • <math>\varphi</math> is the phi coefficient,
  • <math>\chi^2</math> is the statistic from Pearson's chi-squared test,
  • <math>n</math> is the grand total of observations,
  • <math>k</math> is the number of columns, and
  • <math>r</math> is the number of rows.
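A minimal Python sketch of this formula for a general r × k table is given below. The name `cramers_v` and the example tables are assumptions made for illustration, and the code assumes at least two rows and two columns with positive totals.

```python
import math

def cramers_v(table):
    """Cramér's V for an r x k table of counts (list of rows)."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    phi2 = chi2 / n                       # phi squared
    return math.sqrt(phi2 / min(k - 1, r - 1))

independent = [[10, 20], [20, 40]]        # rows are proportional
perfect = [[5, 0], [0, 5]]                # each variable determines the other
rectangular = [[10, 0, 5], [0, 10, 5]]    # 2 x 3, so min(k - 1, r - 1) = 1

print(cramers_v(independent))   # 0.0
print(cramers_v(perfect))       # 1.0
print(cramers_v(rectangular))   # sqrt(2/3), roughly 0.816
```

Transposing `rectangular` into a 3 × 2 table gives the same value, in line with the symmetry of the measure.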

The p-value for the significance of V is the same one that is calculated using Pearson's chi-squared test.[citation needed]

The formula for the variance of V is known.[3]

In R, the function cramerV() from the package rcompanion[4] calculates V using the chisq.test function from the stats package. In contrast to the function cramersV() from the lsr[5] package, cramerV() also offers an option to correct for bias. It applies the correction described in the following section.

Bias correction

Cramér's V can be a heavily biased estimator of its population counterpart and will tend to overestimate the strength of association. A bias correction, using the above notation, is given by[6]

<math>\tilde V = \sqrt{\frac{\tilde\varphi^2}{\min(\tilde k - 1,\tilde r - 1)}} </math> 

where

<math> \tilde\varphi^2 = \max\left(0,\varphi^2 - \frac{(k-1)(r-1)}{n-1}\right) </math> 

and

<math> \tilde k = k - \frac{(k-1)^2}{n-1} </math> 
<math> \tilde r = r - \frac{(r-1)^2}{n-1} </math> 

Then <math>\tilde V</math> estimates the same population quantity as Cramér's V but with typically much smaller mean squared error. The rationale for the correction is that under independence, <math>E[\varphi^2]=\frac{(k-1)(r-1)}{n-1}</math>.[7]
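The corrected estimator can be sketched in Python as follows. The function name `cramers_v_corrected` and the example tables are assumptions made for illustration; this is not the rcompanion implementation. The code assumes positive row and column totals and a sample large enough that <math>\min(\tilde k - 1,\tilde r - 1)</math> is positive.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k table of counts (list of rows)."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Subtract the expected value of phi^2 under independence, floored at 0.
    phi2_tilde = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_tilde = k - (k - 1) ** 2 / (n - 1)
    r_tilde = r - (r - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_tilde / min(k_tilde - 1, r_tilde - 1))

weak = [[10, 20], [30, 40]]     # made-up counts, n = 100
perfect = [[5, 0], [0, 5]]      # made-up counts, n = 10

print(cramers_v_corrected(weak))     # 0.0
print(cramers_v_corrected(perfect))  # approximately 1
```

For the table `weak`, the uncorrected φ² is 50/6300 ≈ 0.0079, which is below the independence expectation 1/99 ≈ 0.0101, so the corrected value is truncated to 0 even though the uncorrected V is about 0.089.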


References

  1. Cramér, Harald (1946). Mathematical Methods of Statistics. Princeton: Princeton University Press. Page 282 (Chapter 21, The two-dimensional case).
  2. Sheskin, David J. (1997). Handbook of Parametric and Nonparametric Statistical Procedures. Boca Raton, FL: CRC Press.
  3. Liebetrau, Albert M. (1983). Measures of Association. Newbury Park, CA: Sage Publications. Quantitative Applications in the Social Sciences Series No. 32. Pages 15–16.
  4. Template:Cite web
  5. Template:Cite web
  6. Template:Cite journal
  7. Template:Cite journal
  8. Template:Cite journal