English Wikipedia: Barnardisation

Barnardisation is a method of statistical disclosure control for tables of counts. It involves adding +1, 0 or -1 to some or all of the internal non-zero cells in a table in a pseudo-random fashion. The probability of adjustment for each internal cell is calculated as p/2 (add 1), 1-p (leave as is), p/2 (subtract 1). The table totals are then calculated as the sum of the post-adjustment internal counts.[1][2]
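The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used by any census office: the function name `barnardise`, the two-dimensional list layout and the `rng` argument are all hypothetical, and the table is assumed to hold internal cells only, with totals recomputed afterwards.

```python
import random

def barnardise(table, p, rng=None):
    """Sketch: perturb the non-zero internal cells of a table of counts.

    Each non-zero cell gets +1 with probability p/2, -1 with probability p/2,
    and is left unchanged with probability 1 - p. Zero cells are not touched.
    Returns the adjusted cells plus row and column totals recomputed from them.
    """
    rng = rng or random.Random()
    adjusted = []
    for row in table:
        new_row = []
        for count in row:
            if count == 0:
                new_row.append(0)  # zero cells are left as they are
                continue
            u = rng.random()
            if u < p / 2:
                delta = 1          # add 1 with probability p/2
            elif u < p:
                delta = -1         # subtract 1 with probability p/2
            else:
                delta = 0          # leave as is with probability 1 - p
            new_row.append(count + delta)
        adjusted.append(new_row)
    # Totals are derived from the post-adjustment internal counts.
    row_totals = [sum(r) for r in adjusted]
    col_totals = [sum(c) for c in zip(*adjusted)]
    return adjusted, row_totals, col_totals
```

Because only non-zero cells are adjusted, and by at most 1, no cell can become negative in this sketch.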

Etymology

The technique of Barnardisation appears to have been named after Professor George Alfred Barnard (1915–2002), a Professor of Mathematics at the University of Essex. Barnard, at that time President of the Royal Statistical Society, was one of three Fellows appointed by the Council of the Royal Statistical Society to help provide a government-commissioned review of data security for the 1971 UK Census.[3] The resulting report questioned whether rounding small numbers to the nearest five was the best approach to preserving respondent confidentiality.[3] The formal government response to the report noted that an additional safeguard of small random adjustments had been introduced for the 1971 Census, the suggestion for which they explicitly attributed to Professor Barnard,[3] as did a New Scientist article dated July 1973.[4] Muddying the waters slightly, a 1973 paper in the Journal of the Royal Statistical Society discussing this new safeguard reported that "after much discussion, a variant of a procedure suggested in Canada was adopted."[5] Presumably Professor Barnard was involved in these discussions, and was the inventor of the variant. In any case, no evidence can be found of any such safeguard being applied in Canada, with Statistics Canada seeming to stick instead to the use of random rounding of all counts to the nearest 0 or 5.[6]

Despite originating from Prof Barnard, in documentation surrounding the 1971 Census the method of adjustment now known as Barnardisation was simply described as a 'procedure';[5] an 'adjustment of values';[7] a 'special procedure';[1] a 'process of random error injection';[8] or a 'modification' or 'adjustment'.[9][10]

The earliest use of the term 'Barnardisation' found in print so far dates to an Office of Population Censuses and Surveys working paper written by Hakim in 1979, where the term is mentioned without citation, and without ascribing it to Prof G A Barnard.[11] But, at the time, Hakim's coinage of this term appears to have been either widely overlooked or widely ignored, at least in print, as demonstrated by the wide range of later publications already cited above.

The term 'Barnardisation' does not appear to have reemerged in print until the 1995 publication of Stan Openshaw's Census Users' Handbook,[12] where it is used by two separate chapter authors and by the index compiler. However, by at least the late 1980s the term was already in widespread conversational usage during UK academic conferences and meetings.[13] More recently the term 'Barnardisation' has also become firmly ensconced in the lexicon of reports produced by official UK statistical agencies and others.[2][14]

Operational details

As originally conceived and implemented in the 1971 UK Census, Barnardisation had the added characteristic of pairing tables from separate areas, and applying equal and opposite adjustments to the two areas. For example, if a given table cell in Area A had its value increased by 1, then in paired Area B the equivalent table cell would have its value reduced by 1 (subject to not making the value negative). The purpose of this pairing was to cancel out, as much as possible, the amount of noise introduced via the Barnardisation process at a more aggregate level.[1]
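The pairing idea can be illustrated with a short Python sketch. The source does not say how the "not making the value negative" condition was handled, so the rule used here (leave both cells unchanged if either would go negative) is an assumption, as are the function name `barnardise_pair` and the flat-list representation of each area's table.

```python
import random

def barnardise_pair(area_a, area_b, p, rng=None):
    """Sketch: apply equal and opposite adjustments to two paired areas.

    area_a and area_b are equal-length lists of the same table's cells.
    ASSUMPTION: if the adjustment would make either cell negative, the
    pair is left unadjusted; the source does not specify this rule.
    """
    rng = rng or random.Random()
    out_a, out_b = [], []
    for a, b in zip(area_a, area_b):
        u = rng.random()
        delta = 1 if u < p / 2 else (-1 if u < p else 0)
        if a + delta < 0 or b - delta < 0:
            delta = 0              # assumed handling of the non-negativity rule
        out_a.append(a + delta)    # Area A gets +delta
        out_b.append(b - delta)    # paired Area B gets the opposite adjustment
    return out_a, out_b
```

Under this rule the sum of each pair of cells is unchanged, which is the sense in which the noise cancels when the two areas are aggregated.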

For the 1991 UK Census the pairing of areas prior to the application of Barnardisation was dropped; and for the more detailed Local Base Statistics, its scope was extended to include adjustments of -2, -1, 0, +1 or +2, achieved by applying the -1, 0 or +1 adjustment twice.[10]
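To see what applying the -1, 0 or +1 adjustment twice implies, the following Python sketch enumerates the combined adjustment. It assumes the two applications are independent and ignores any floor at zero; neither point is confirmed by the source, and the value p = 1/5 is purely illustrative.

```python
from fractions import Fraction
from itertools import product

def double_adjustment_distribution(p):
    """Sketch: distribution of the sum of two independent -1/0/+1 adjustments."""
    single = {+1: p / 2, 0: 1 - p, -1: p / 2}
    dist = {}
    for (d1, q1), (d2, q2) in product(single.items(), repeat=2):
        dist[d1 + d2] = dist.get(d1 + d2, 0) + q1 * q2
    return dict(sorted(dist.items()))

# Illustrative value only; the p actually used is not given in the source.
dist = double_adjustment_distribution(Fraction(1, 5))
for adjustment, probability in dist.items():
    print(adjustment, probability)
```

Under these assumptions an adjustment of ±2 has probability (p/2)² on each side, ±1 has probability p(1-p) on each side, and no net change has probability (1-p)² + p²/2.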

In the United Kingdom, barnardisation became increasingly employed by public agencies in order to enable them to provide information for statistical purposes without infringing the information privacy rights of the individuals to whom the information relates (e.g. [2][15]). In some cases this has involved further modifications to the Barnardisation procedure. For example, as implemented by the Common Service Agency, adjustments of -1, 0 or +1 were only applied to counts of 1 to 4, whilst counts of 0, instead of being left unchanged, were adjusted by the addition of 0 or +1.[15]
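The Common Service Agency variant described above might be sketched as follows. The probabilities are not given in the source, so the parameters `p` (for counts of 1 to 4) and `p_zero` (for counts of 0) are hypothetical, as is the function name; counts of 5 or more are assumed to pass through unchanged.

```python
import random

def csa_style_adjust(count, p, p_zero, rng):
    """Sketch of the modified rule: perturb only small counts.

    ASSUMPTIONS: p and p_zero are hypothetical parameters; the source gives
    the possible adjustments but not their probabilities.
    """
    if count == 0:
        # Zeros are adjusted by the addition of 0 or +1.
        return 1 if rng.random() < p_zero else 0
    if 1 <= count <= 4:
        u = rng.random()
        if u < p / 2:
            return count + 1
        if u < p:
            return count - 1
    # Counts of 5 or more, and unadjusted small counts, are returned as is.
    return count
```

Perturbing zeros as well means that a published 0 or 1 no longer reveals whether the true count was zero, at the cost of introducing a small upward bias in those cells.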

Pros and Cons

A review of Statistical Disclosure Control methods in the run-up to the 2011 UK Census[14] identified the following list of pros and cons of Barnardisation from the point of view of the data provider:

Advantages

  • Easy to understand
  • Easy to implement
  • Table totals are consistent with internal cell values
  • The adjustment is unbiased

Disadvantages

  • Leads to inconsistent values for the same cell counts and table totals if they are present in two or more separately barnardised tables
  • The adjustment can be unpicked via differencing if other tables are available that share the same counts or totals, or that provide an unadjusted total for a larger spatial area within which the barnardised tables nest
  • The probability of adjustment used is typically small, meaning that many cell values are left unadjusted
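One simple way to see the risk posed by counts shared across separately barnardised tables is to intersect the ranges of true values that each published value allows. The Python sketch below assumes the basic procedure (adjustments of at most ±1, zero cells left unadjusted); the function name `candidate_true_values` is hypothetical and this is not a description of any documented attack.

```python
def candidate_true_values(published_values):
    """Sketch: true counts consistent with every published version of a cell.

    ASSUMPTION: each published value is the true count plus -1, 0 or +1,
    and a true count of 0 is always published as 0.
    """
    candidates = None
    for v in published_values:
        # A true 0 is only possible if this table also published 0.
        window = {c for c in (v - 1, v, v + 1) if c >= 0 and (c > 0 or v == 0)}
        candidates = window if candidates is None else candidates & window
    return sorted(candidates or set())

print(candidate_true_values([3]))     # one table: three possible true values
print(candidate_true_values([3, 5]))  # two tables: only one value fits both
```

A single published value of 3 is consistent with true counts of 2, 3 or 4, but if another table publishes the same cell as 5, only a true count of 4 is consistent with both.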

From a user point of view, another advantage of Barnardisation is that it has been shown to have a smaller impact on typical user analyses than the following Statistical Disclosure Control measures: random rounding to base 5, as used by Statistics Canada; random rounding to base 3, as used by Statistics New Zealand; and Small Cell Adjustment, as used at various points in time by the Office for National Statistics and the Australian Bureau of Statistics.[16]

Efficacy reappraised

Since the late 1990s concerns over the efficacy of Barnardisation in protecting confidentiality have increased to the point where it is no longer recommended as a 'go to' tool, but rather as a technique to be used only in special circumstances. This change in attitudes appears to centre on the relatively high probability that Barnardisation will leave a small count (in particular a 1) unadjusted[2][15] and, secondarily, on the danger of the original value being reverse engineered if sufficient overlapping barnardised tables are released.[14] For these and other reasons UK Censuses from 2001 onwards have abandoned the use of Barnardisation. See Spicer for a good review of the 2001, 2011 and 2021 alternatives to Barnardisation that have been adopted, and the rationale for this.[17]

The question of whether barnardisation may fall short of the complete anonymisation of data, and the status of barnardised data under the complex provisions of the Data Protection Act 1998, were considered by the Scottish Information Commissioner. Some aspects of an initial decision by the Commissioner were overturned on appeal to the House of Lords, and the Commissioner was invited to revisit his original decision. The Commissioner's final decision ruled that barnardisation provided insufficient disclosure protection for rare events (in this case, Childhood Leukaemia), reversing in part his original decision: "the barnardised data, by themselves, can lead to identification, and [...] the effect of barnardisation on the actual figures, at least as deployed by the CSA, does not have the effect of concealing or disguising the data which he [the Commissioner] had originally considered that it would."[15] However, in his written decision the Commissioner offered no statistical justification for this assertion. Instead the Commissioner's decision centred mainly on points of law relating to the nature of the original and barnardised data, and how this related to legal definitions of (sensitive) personal data.

References

  1. Newman_1978 (citation error: no text provided for this reference)
  2. ONS_2006 (citation error: no text provided for this reference)
  3. Moore_1973 (citation error: no text provided for this reference)
  4. New_Scientist_1973 (citation error: no text provided for this reference)
  5. Jones_et_al_1973 (citation error: no text provided for this reference)
  6. Statistics_Canada_1974 (citation error: no text provided for this reference)
  7. Rhind_1975 (citation error: no text provided for this reference)
  8. Hakim_1978 (citation error: no text provided for this reference)
  9. Dewdney_1983 (citation error: no text provided for this reference)
  10. Marsh_1993 (citation error: no text provided for this reference)
  11. Hakim_1979 (citation error: no text provided for this reference)
  12. Openshaw_1995 (citation error: no text provided for this reference)
  13. Williamson_2022 (citation error: no text provided for this reference)
  14. SDC_UKCDMAC_Subgroup (citation error: no text provided for this reference)
  15. SIC_2010 (citation error: no text provided for this reference)
  16. Williamson_2007 (citation error: no text provided for this reference)
  17. Spicer (citation error: no text provided for this reference)