Academic studies about Wikipedia

Wikipedia has been studied extensively. Between 2001 and 2010, researchers published at least 1,746 peer-reviewed articles about the online encyclopedia.[1] Such studies are greatly facilitated by the fact that Wikipedia's database can be downloaded without help from the site owner.[2]

Research topics have included the reliability of the encyclopedia and various forms of systemic bias; social aspects of the Wikipedia community (including administration, policy, and demographics); the encyclopedia as a dataset for machine learning; and whether Wikipedia trends might predict or influence human behaviour.

Notable findings include factual accuracy similar to other encyclopedias, the presence of cultural and gender bias as well as gaps in coverage of the Global South; that a tiny minority of editors produce the majority of content; various models for understanding online conflict; and limited correlation between Wikipedia trends and various phenomena such as stock market movements or electoral results.

Content

Production

A minority of editors produce the majority of persistent content

Studies from 2005 to 2007 found that a small minority of editors produce most of the edits on Wikipedia, and that the distribution of edits follows a power law with about half of the total edits produced by 1% of the editors. Another 2007 study found that 'elite' editors with many edits produced 30% of the content changes, measured in number of words. These editors were also more likely to add, rather than delete, content.[3]

A 2007 study from the University of Minnesota used a reader-based measure that weighted content by the number of times it was viewed, termed persistent word views (PWV): each time an article is viewed, every word in it is credited to the editor who introduced that word. The study analyzed trillions of word views between September 2002 and October 2006 and concluded that 0.1% of the Wikipedia community (4,200 editors) accounted for 44% of the word views during this time. The authors concluded that:[3]

Шаблон:Blockquote

A 2009 study determined that the one percent of editors who average more than 1,000 edits per month make 55% of edits.[4]
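
The PWV bookkeeping is straightforward to reproduce. The following is a minimal sketch, not the study's code; the article names, word attributions, and view counts are hypothetical toy data. It credits each word's author once per page view of the article containing that word.

```python
from collections import defaultdict

def persistent_word_views(word_authors, view_events):
    """Credit each editor with one persistent word view (PWV) per word they
    authored, each time the article containing that word is viewed.

    word_authors: dict mapping article title -> list of editors, one entry per
                  word currently in the article (the editor who added it).
    view_events:  iterable of article titles, one per page view.
    """
    pwv = defaultdict(int)
    for article in view_events:
        for editor in word_authors.get(article, []):
            pwv[editor] += 1
    return pwv

# Hypothetical toy data: two articles, three editors, four page views.
word_authors = {
    "Example A": ["alice"] * 120 + ["bob"] * 30,   # alice wrote 120 words, bob 30
    "Example B": ["carol"] * 80,
}
views = ["Example A", "Example A", "Example B", "Example A"]

totals = persistent_word_views(word_authors, views)
shares = {editor: count / sum(totals.values()) for editor, count in totals.items()}
print(shares)  # fraction of all PWVs attributable to each editor
```

Aggregating such shares over every article and editor yields the kind of concentration figures reported above.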

Work distribution and social strata

Шаблон:Further

A peer-reviewed paper noted the "social stratification in the Wikipedia society" due to the "admins class". The paper suggested that such stratification could be beneficial in some respects but recognized a "clear subsequent shift in power among levels of stratification" due to the "status and power differentials" between administrators and other editors.[5]

Analyzing the entire edit history of English Wikipedia up to July 2006, the same study determined that the influence of administrator edits on content has steadily diminished since 2003, when administrators performed roughly 50% of total edits, to 2006, when only 10% of edits were performed by administrators. This happened despite the fact that the average number of edits per administrator had increased more than fivefold during the same period. The authors of the paper labeled this phenomenon the "rise of the crowd". An analysis that used the number of words edited as a metric, instead of the number of edit actions, showed a similar pattern. Because the admin class is somewhat arbitrary with respect to the number of edits, the study also considered a breakdown of users into categories based on the number of edits performed. The results for "elite users", i.e. users with more than 10,000 edits, were broadly in line with those obtained for administrators, except that "the number of words changed by elite users has kept up with the changes made by novice users, even though the number of edits made by novice users has grown proportionally faster". The study concludes:

Шаблон:Blockquote

Reliability

Шаблон:Main article

A 2010 paper presented at the Ontario Society for the Study of Argumentation conference assessed whether trust in Wikipedia is based on epistemic or pragmatic merits. While readers may not be able to assess the actual knowledge and expertise of the authors of a given article, they can assess the contributors' passion for the project and the communicative design through which that passion is made manifest, which may provide a reason for trust.[6]

More specifically, the author argued that Wikipedia cannot be trusted on the basis of individual expertise, collective knowledge, or past experience of reliability. Anonymity and pseudonymity prevent any assessment of contributors' knowledge, and an "anti-expert culture" makes it unlikely that this will change. Editing may be largely confined to an elite group of editors rather than aggregating the "wisdom of the crowd", which in some cases lowers the quality of an article anyway. Personal experience and empirical studies, reinforced by incidents such as the Seigenthaler biography controversy, point to the conclusion that Wikipedia is not generally reliable. Hence, the author argued, these epistemic factors do not justify consulting Wikipedia.

The author then proposed a rationale for trusting Wikipedia based on pragmatic values, which can roughly be summarized as two factors. First, the size of, and activity around, Wikipedia indicate that editors are deeply committed to providing the world with knowledge. Second, the transparent development of policies, practices, institutions, and technologies, together with this conspicuous and massive effort, addresses the concerns one might have in trusting Wikipedia. Those concerns include how the provided knowledge is defined, preventing distorted contributions from people who do not share the same commitment, repairing damaging edits, and controlling and improving article quality.

Health information

Шаблон:Main article

Health information on English Wikipedia is popularly accessed through search engines, whose result pages frequently deliver links to Wikipedia articles.[7] Independent assessments have been undertaken of the quality of the health information provided on Wikipedia and of who is accessing it. The number and demographics of people who seek health information on Wikipedia, the scope of the health information covered, and the quality of that information have all been studied.[8] There are drawbacks to using Wikipedia as a source of health information. Шаблон:Explain

Bias

Research has consistently shown that Wikipedia systematically over-represents the point of view (POV) of a particular demographic described as the "average Wikipedian": an educated, technically inclined, English-speaking white male, aged 15–49, from a developed Christian country in the northern hemisphere.[9] This POV is over-represented in relation to all other existing POVs.[10][11] This systemic bias in editor demographics results in cultural bias, gender bias, and a lack of information about the Global South.[12][13]

There are two broad types of bias: implicit (when a topic is omitted) and explicit (when a certain POV is supported in an article or by its references).[10]

Interdisciplinary scholarly assessments of Wikipedia articles have found that while articles are typically accurate and free of misinformation, they are also typically incomplete and fail to present all perspectives with a neutral point of view.[12]

Researchers from Washington University in St. Louis developed a statistical model to measure systematic bias in the behavior of Wikipedia's users regarding controversial topics. The authors focused on behavioral changes of the encyclopedia's administrators after assuming the post, writing that systematic bias occurred after the fact.[14][15]

Geographical bias

Шаблон:Main

Research conducted in 2009 by the Oxford Internet Institute showed that geotagged articles in all language editions of Wikipedia covered about half a million places on Earth. However, the geographic distribution of articles was highly uneven: most articles are written about North America, Europe, and East Asia, with very little coverage of large parts of the developing world, including most of Africa.[16]

Another 2009 study of 15 language editions determined that each edition was highly "self-focused", with emphasis on the geographic "home region" of that language.[17]

Gender bias

Шаблон:Main

The gender bias on Wikipedia has been widely discussed.[4] A 2010 survey found that only 13% of editors and 31% of readers were female.[4] A 2017 paper confirmed that only 15% of the editing community is female.[10]

A 2021 study by Francesca Tripodi found that of the roughly 1.5 million biographical articles on the English Wikipedia in 2021, only 19% were about women.[18][19] The study also found that the biographies of women that do exist are considerably more likely to be nominated for deletion than comparable articles about men.[18][19]

Addressing bias

Some studies have investigated the work of WikiProject Countering Systemic Bias (WP:CSB),[20] which is a collective effort of some Wikipedia editors to broaden the encyclopedia's POV. A 2010 study of 329 editors participating in WP:CSB found that these editors' work favoured topics belonging to the United States and England, and that "the areas of the globe of main concern to WP:CSB proved to be much less represented by the coalition itself."[9]

A 2021 paper recommended addressing a "sweet spot" within the encyclopedia's bias where existing scholarship includes reliable, peer-reviewed sources that offer a more complete POV than existing Wikipedia articles. The study suggested that incorporation of these sources would offer better representation for excluded or marginalized POVs, and that the possibilities for potential improvement are "massive."[11]

Natural language processing

The textual content and the structured hierarchy of Wikipedia have become an important knowledge source for researchers in natural language processing and artificial intelligence. In 2007, researchers at the Technion – Israel Institute of Technology developed a technique called Explicit Semantic Analysis,[21] which uses the world knowledge contained in English Wikipedia articles. Conceptual representations of words and texts are created automatically and used to compute the similarity between words and between texts.
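
In essence, ESA builds a term–concept matrix from the TF-IDF weights of terms in Wikipedia articles, maps any text into that concept space, and compares texts by cosine similarity. The following is a minimal sketch of the idea rather than the authors' implementation; the miniature "articles" below are hypothetical stand-ins for real Wikipedia concepts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical miniature "Wikipedia": each entry stands for one article (concept).
concepts = {
    "Computer science": "algorithms computation programming software data structures",
    "Biology": "cells organisms evolution genetics species proteins",
    "Machine learning": "algorithms models training data prediction neural networks",
}

vectorizer = TfidfVectorizer()
# Rows are concepts, columns are terms; column j is the ESA vector of term j.
concept_term = vectorizer.fit_transform(concepts.values())  # shape (n_concepts, n_terms)

def esa_vector(text):
    """Map a text into concept space by summing the concept vectors of its terms,
    weighted by the text's own TF-IDF scores."""
    tfidf = vectorizer.transform([text])   # shape (1, n_terms)
    return tfidf @ concept_term.T          # shape (1, n_concepts)

a = esa_vector("training algorithms on data")
b = esa_vector("software and programming")
print(cosine_similarity(a, b)[0, 0])  # semantic relatedness in concept space
```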

Researchers at the Ubiquitous Knowledge Processing Lab use the linguistic and world knowledge encoded in Wikipedia and Wiktionary to automatically create linguistic knowledge bases similar to expert-built resources like WordNet.[22] Strube and Ponzetto created an algorithm to identify relationships among words by traversing English Wikipedia via its categorization scheme, and concluded that Wikipedia had created "a taxonomy able to compete with WordNet on linguistic processing tasks".[23]
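
Strube and Ponzetto's taxonomy-based measures combine several signals; the path-length component can be sketched as a breadth-first search over the category graph, where shorter paths are read as greater relatedness. The category fragment below is a hypothetical illustration, not data from their study.

```python
from collections import defaultdict, deque

# Hypothetical fragment of Wikipedia's category graph (child -> parent links).
category_parents = {
    "Dogs": ["Domesticated animals", "Canids"],
    "Cats": ["Domesticated animals", "Felids"],
    "Canids": ["Carnivorans"],
    "Felids": ["Carnivorans"],
    "Domesticated animals": ["Animals"],
    "Carnivorans": ["Animals"],
}

# Build an undirected adjacency list so the search can move up and down the tree.
neighbours = defaultdict(set)
for child, parents in category_parents.items():
    for parent in parents:
        neighbours[child].add(parent)
        neighbours[parent].add(child)

def category_distance(start, goal):
    """Shortest path length between two categories via breadth-first search."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # not connected in this fragment

print(category_distance("Dogs", "Cats"))  # 2, via "Domesticated animals"
```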

Social aspects

Conflict

A 2011 study reported a new way to measure how disputed a Wikipedia article is, and verified the measure against six Indo-European language editions, including English.[24] Шаблон:Clarify

A 2013 article in Physical Review Letters reported a generic social dynamics model in a collaborative environment involving opinions, conflicts, and consensus, with a specific analogue to Wikipedia: "a peaceful article can suddenly become controversial when more people get involved in its editing."[25]Шаблон:Clarification needed

A 2014 book chapter titled "The Most Controversial Topics in Wikipedia: A Multilingual and Geographical Analysis" analysed the volume of editing of articles in various language versions of Wikipedia in order to establish the most controversial topics in different languages and groups of languages. For the English version, the top three most controversial articles were George W. Bush, Anarchism and Muhammad. The topics causing the most controversy in other languages were Croatia (German), Ségolène Royal (French), Chile (Spanish) and Homosexuality (Czech).[26]

Demographics

A 2007 study by Hitwise, reproduced in Time magazine,[27] found that visitors to Wikipedia are almost evenly split between men and women, but that 60% of edits are made by male editors. A 2010 survey found that only 13% of editors and 31% of readers were female.[4] A 2017 paper confirmed that only 15% of the editing community is female.[10]

A 2012 study covering 32 language editions analysed the circadian activity of editors and concluded that contributions to English Wikipedia from North America and from Europe, the Far East and Australia are almost equal in share, whereas the latter group accounts for about 75% of contributions to the Simple English Wikipedia. The research also provides demographic analysis of the editions in other languages.[28]

Policies and guidelines

A descriptive study[29] that analyzed the English-language Wikipedia's policies and guidelines up to September 2007 identified a number of key statistics:

  • 44 official policies
  • 248 guidelines

Even a short policy like "ignore all rules" was found to have generated a lot of discussion and clarifications:

Шаблон:Blockquote

The study sampled the expansion of some key policies since their inception:

The figure for "deletion" was considered inconclusive, however, because that policy was split into several sub-policies.

Power plays

Шаблон:Undue weight section

A 2007 peer-reviewed study[30] conducted jointly by researchers from the University of Washington and HP Labs examined how policies are employed and how contributors work towards consensus, by quantitatively analyzing a sample of active talk pages. Using a November 2006 English Wikipedia database dump, the study focused on 250 talk pages in the tail of the distribution: 0.3% of all talk pages, but containing 28.4% of all talk page revisions and, more significantly, 51.1% of all links to policies. From the sampled pages' histories, the study examined only the months with high activity, called critical sections: sets of consecutive months in which both article and talk page revisions were significant in number.

The study defined and calculated a measure of policy prevalence, the policy factor. A critical section was considered policy-laden if its policy factor was at least twice the average. Articles were tagged with three indicator variables:

  • controversial
  • featured
  • policy-laden

All possible levels of these three factors yielded 8 sampling categories. The study intended to analyze 9 critical sections from each sampling category, but only 69 critical sections could be selected, because only 6 article histories were simultaneously featured, controversial, and policy-laden.
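
The paper's exact formula for the policy factor is not reproduced here. A minimal sketch, under the assumption that the policy factor is the number of links to policy pages per talk-page revision in a critical section, could look like the following; the regular expression and toy data are purely illustrative.

```python
import re

# Assumed approximation: links to "Wikipedia:" policy pages per talk-page revision.
POLICY_LINK = re.compile(r"\[\[Wikipedia:[^\]|]+")  # e.g. [[Wikipedia:No original research]]

def policy_factor(talk_revisions):
    """talk_revisions: list of wikitext strings added during one critical section."""
    links = sum(len(POLICY_LINK.findall(rev)) for rev in talk_revisions)
    return links / max(len(talk_revisions), 1)

def is_policy_laden(section_revisions, all_sections):
    """Tag a critical section as policy-laden if its policy factor is at least
    twice the average policy factor over all critical sections."""
    average = sum(policy_factor(s) for s in all_sections) / len(all_sections)
    return policy_factor(section_revisions) >= 2 * average

# Toy data: two critical sections, the first citing policies heavily.
sections = [
    ["see [[Wikipedia:No original research]]", "[[Wikipedia:Verifiability]] applies"],
    ["let's just reword the lead", "agreed"],
]
print([is_policy_laden(s, sections) for s in sections])  # [True, False]
```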

The study found that policies were by no means consistently applied. Illustrative of its broader findings, the report presented the following two extracts from Wikipedia talk pages in obvious contrast:

  • a discussion where participants decided that calculating a mean from data provided by a government agency constituted original research:

Шаблон:Blockquote

  • a discussion where logical deduction was used as a counterargument to the original research policy:

Шаблон:Blockquote

Claiming that such ambiguities easily give rise to power plays, the study identified seven types of power plays, using the methods of grounded theory (Strauss):

  • article scope (what is off-topic in an article)
  • prior consensus (past decisions presented as absolute and uncontested)
  • power of interpretation (a sub-community claiming greater interpretive authority than another)
  • legitimacy of contributor (his/her expertise etc.)
  • threat of sanction (blocking etc.)
  • practice on other pages (other pages being considered models to follow)
  • legitimacy of source (the cited reference is disputed)

Due to lack of space, the study detailed only the first four types of power play, which were exercised merely by interpreting policy. A fifth category was also analyzed; it consisted of blatant violations of policy that were forgiven because the contributor was valued for their contributions despite their disregard for the rules.

Article scope

The study considers that Wikipedia's policies are ambiguous on scoping issues. The following vignette is used to illustrate the claim:

Шаблон:Blockquote

The study gives the following interpretation for the heated debate:

Шаблон:Blockquote

Prior consensus

The study remarks that on Wikipedia consensus is never final, and what constitutes consensus can change at any time. The study finds that this temporal ambiguity is fertile ground for power plays, and places the generational struggle over consensus in the larger picture of the struggle for article ownership:

Шаблон:Blockquote

The study uses the following discussion snippet to illustrate this continuous struggle:

Шаблон:Blockquote

Power of interpretation

A vignette illustrated how administrators overrode consensus and deleted personal accounts written by users/patients with an illness (anonymized as "Frupism" in the study). The administrators' intervention happened as the article was being nominated to become a featured article.

Legitimacy of contributor

This type of power play is illustrated by a contributor (U24) who draws on their past contributions to argue against another contributor accusing U24 of being unproductive and disruptive:

Шаблон:Blockquote

Explicit vying for ownership

The study finds that there are contributors who consistently and successfully violate policy without sanction:

Шаблон:Blockquote

Obtaining administratorship

Шаблон:See also

In 2008, researchers from Carnegie Mellon University devised a probit model of English Wikipedia editors who had successfully passed the peer review process to become admins.[31] Using only Wikipedia metadata, including the text of edit summaries, their model was 74.8% accurate in predicting successful candidates.

The paper observed that despite protestations to the contrary, "in many ways election to admin is a promotion, distinguishing an elite core group from the large mass of editors." Consequently, the paper used policy capture[32]—a method that compares nominally important attributes to those that actually lead to promotion in a work environment.
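
As a rough illustration of the modelling approach, the sketch below fits a probit regression of promotion outcomes on candidate metadata using the statsmodels package. The feature names echo some of the study's predictors, but the data is synthetic and the specification is illustrative, not the paper's.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic candidate data loosely modelled on the paper's predictors; real input
# would be extracted from Wikipedia metadata (edit counts, edit summaries, etc.).
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "prior_rfa_attempts":      rng.poisson(0.5, n),
    "months_since_first_edit": rng.integers(1, 60, n),
    "article_edits_k":         rng.gamma(2.0, 2.0, n),  # thousands of article edits
    "policy_edits_k":          rng.gamma(1.0, 0.3, n),  # thousands of policy-page edits
    "diversity_score":         rng.integers(1, 16, n),
})

# Synthetic outcome so the example runs end to end (not real RfA results).
latent = (-1.5 - 0.8 * X["prior_rfa_attempts"] + 1.5 * X["policy_edits_k"]
          + 0.1 * X["diversity_score"])
promoted = (latent + rng.normal(0.0, 1.0, n) > 0).astype(int)

model = sm.Probit(promoted, sm.add_constant(X)).fit(disp=False)
print(model.summary())

# Average marginal effects play the role of the per-unit probability changes
# reported in the table below.
print(model.get_margeff().summary())
```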

The overall success rate for promotion decreased from 75% in 2005 to 53% in 2006 and 42% in 2007. This increase in failure rate was attributed to a higher standard that more recent candidates had to meet, and was supported by anecdotal evidence from another study[33] quoting some early admins who expressed doubt that they would pass muster if their election (RfA) were held today. In light of these developments the study argued that:

Шаблон:Blockquote

Probability increase or decrease of successful RfA per unit of the factor being regressed
Шаблон:Small

Factor | 2006–2007 | pre–2006
each previous RfA attempt | −14.7% | −11.1%
each month since first edit | 0.4% | (0.2%)
every 1,000 article edits | 1.8% | (1.1%)
every 1,000 Wikipedia policy edits | 19.6% | (0.4%)
every 1,000 WikiProject edits | 17.1% | (7.2%)
every 1,000 article talk edits | 6.3% | 15.4%
each Arb/mediation/wikiquette edit | −0.1% | −0.2%
each point of diversity score (see text) | 2.8% | 3.7%
each percentage point of "minor edit" indications in edit summaries | 0.2% | 0.2%
each percentage point of human-written edit summaries | 0.5% | 0.4%
each "thank" in edit summaries | 0.3% | (0.0%)
each "POV" indication in edit summaries | 0.1% | (0.0%)
each edit to admin attention/noticeboards | −0.1% | (0.2%)

Contrary to expectations, "running" for administrator multiple times is detrimental to the candidate's chance of success. Each subsequent attempt has a 14.8% lower chance of success than the previous one. Length of participation in the project makes only a small contribution to the chance of a successful RfA.

Another significant finding of the paper is that one Wikipedia policy edit or WikiProject edit is worth ten article edits. A related observation is that candidates with experience in multiple areas of the site stood a better chance of election. This was measured by the diversity score, a simple count of the number of areas in which the editor has participated. The paper divided Wikipedia into 16 areas: article, article talk, articles/categories/templates for deletion (XfD), (un)deletion review, etc. (see the paper for the full list). For instance, a user who has edited articles, her own user page, and posted once at (un)deletion review would have a diversity score of 3. Making a single edit in any additional region of Wikipedia correlated with a 2.8% increased likelihood of success in gaining administratorship.
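
The diversity score itself is a simple count, as in the sketch below; the area names and edit log are hypothetical, and the paper defines 16 areas in total.

```python
# Assumed subset of the paper's 16 site areas; names here are illustrative.
AREAS = {
    "article", "article talk", "user page", "user talk",
    "XfD", "deletion review", "policy", "policy talk", "WikiProject",
}

def diversity_score(edit_areas):
    """edit_areas: iterable of area names, one per edit made by the editor."""
    return len({area for area in edit_areas if area in AREAS})

# The worked example above: article edits, own user page, one deletion review post.
edits = ["article", "article", "user page", "deletion review"]
print(diversity_score(edits))  # 3
```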

Making minor edits also helped, although the study authors consider that this may be so because minor edits correlate with experience. In contrast, each edit to an Arbitration or Mediation committee page, or a Wikiquette notice, all of which are venues for dispute resolution, decreases the likelihood of success by 0.1%. Posting messages to administrator noticeboards had a similarly deleterious effect. The study interpreted this as evidence that editors involved in escalating or protracting conflicts lower their chances of becoming administrators.

Saying "thanks" or variations thereof in edit summaries, and pointing out point of view ("POV") issues (also only in edit summaries because the study only analyzed metadata) were of minor benefit, contributing to 0.3% and 0.1% to candidate's chances in 2006–2007, but did not reach statistical significance before.

A few factors that were found to be irrelevant or marginal at best:

  • Editing user pages (including one's own) does not help. Somewhat surprisingly, user talk page edits also do not affect the likelihood of administratorship.
  • Welcoming newcomers or saying "please" in edit summaries had no effect.
  • Participating in consensus-building, such as RfA votes or the village pump, does not increase the likelihood of becoming admin. The study admits however that participation in consensus was measured quantitatively but not qualitatively.
  • Vandal-fighting, as measured by the number of edits to the vandalism noticeboard, had no effect. Every thousand edits containing variations of "revert" was positively correlated (7%) with adminship for 2006–2007, but this did not attain statistical significance unless one is willing to lower the threshold to p<.1. More confusingly, before 2006 the number of reverts was negatively correlated (−6.8%) with adminship success, again without attaining statistical significance even at p<.1. This may be because of the introduction of a policy known as "3RR" in 2006 to reduce reverts.[34]

The study suggests that some of the 25% unexplained variability in outcomes may be due to factors that were not measured, such as quality of edits or participation in off-site coordination, such as the (explicitly cited) secret mailing list reported in The Register.[35] The paper concludes:

Шаблон:Blockquote

Subsequent research by another group[36] probed the sensemaking activities of individuals during their contributions to RfA decisions. This work establishes that decisions about RfA candidates are based on a shared interpretation of evidence in the wiki and on histories of prior interactions.

Readership

Several studies have shown that Wikipedia is used by doctors, students, journalists and scientists.[37] One 2009 study found that 70% of junior physicians used Wikipedia in a given week to find medical information, consulting it in relation to 26% of their cases.[4]

At least one study found that British people trust Wikipedia more than the BBC.[37]

In education

Studies have found that Wikipedia is the most commonly used open educational resource in higher education, and is 2,000 times more cost-effective than printed textbooks.[37] It has been found that using Wikipedia in writing courses improves students' interest in learning, their investment in their work, and their learning and personal development, and creates opportunities for local and international collaboration.[38] Шаблон:Additional citation needed

Machine learning

Automated semantic knowledge extraction using machine learning algorithms is used to "extract machine-processable information at a relatively low complexity cost".[39] DBpedia uses structured content, extracted from the infoboxes of Wikipedia articles in different languages by machine learning algorithms, to create a resource of linked data on the Semantic Web.[40]
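
As an illustration of consuming the resulting linked data, the sketch below queries DBpedia's public SPARQL endpoint using the third-party SPARQLWrapper package; the endpoint URL, prefixes, and query are standard DBpedia examples rather than part of the studies cited here.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Query DBpedia's public endpoint for an infobox-derived fact (birth place).
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?birthPlace WHERE {
        dbr:Ada_Lovelace dbo:birthPlace ?birthPlace .
    }
""")

results = sparql.queryAndConvert()
for row in results["results"]["bindings"]:
    print(row["birthPlace"]["value"])  # URIs of linked place resources
```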

As predictor or influence on human behavior

In a study published in PLOS ONE,[41] Taha Yasseri of the Oxford Internet Institute and his colleagues from Central European University showed that page view statistics of articles about movies correlate well with those movies' box office revenue. They developed a mathematical model to predict box office takings by analysing page view counts as well as the number of edits and unique editors of the Wikipedia pages on the movies. Although this model was developed using the English Wikipedia and movies, the language-independent methods can be generalized to other languages and to other kinds of products.[42]
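
The general shape of such a model, though not the authors' exact specification, is a regression of revenue on Wikipedia activity measures; the sketch below uses synthetic data and scikit-learn purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic activity measures for a set of films (page views, edits, unique editors).
rng = np.random.default_rng(1)
n_films = 50
features = np.column_stack([
    rng.lognormal(10, 1, n_films),  # page views before release
    rng.poisson(40, n_films),       # number of edits
    rng.poisson(15, n_films),       # number of unique editors
])
revenue = 0.5 * features[:, 0] + 1e4 * features[:, 2] + rng.normal(0, 1e5, n_films)

model = LinearRegression().fit(features, revenue)
print(model.score(features, revenue))  # in-sample R^2
print(model.predict(features[:3]))     # predicted takings for three films
```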

In a work published in Scientific Reports in 2013,[43] Helen Susannah Moat, Tobias Preis and colleagues demonstrated a link between changes in the number of views of English Wikipedia articles relating to financial topics and subsequent large US stock market moves.[44][45]

In an article published in Public Opinion Quarterly,[46] Benjamin K. Smith and Abel Gustafon have shown that the data on Wikipedia pageviews can improve traditional election forecasting methods like polls.

Between 2019 and 2021, a team of American and Irish researchers conducted a randomised field experiment which found that creating a Wikipedia article about a legal precedent increased its likelihood of citation in subsequent court judgments by over 20%, and that the language of the court judgments echoed that of the Wikipedia articles.[47]

See also

References

Шаблон:Reflist

Further reading

Шаблон:Refbegin

Шаблон:Refend


Шаблон:Wikipedia

  1. Шаблон:Cite journal
  2. Шаблон:Cite book
  3. Шаблон:Cite book
  4. Шаблон:Cite journal
  5. Шаблон:Cite journal
  6. Goodwin, Jean (2010). The authority of Wikipedia Шаблон:Webarchive. In Juho Ritola (Ed.), Argument cultures: Proceedings of the Ontario Society for the Study of Argumentation Conference. Windsor, ON, Canada: Ontario Society for the Study of Argumentation. CD-ROM. 24 pp.
  7. Шаблон:Cite journal
  8. Шаблон:Cite journal
  9. Шаблон:Cite journal
  10. Шаблон:Cite book
  11. Шаблон:Cite journal
  12. Шаблон:Cite journal
  13. Шаблон:Cite web
  14. Шаблон:Cite conference
  15. Шаблон:Cite journal
  16. Шаблон:Cite web
  17. Шаблон:Cite journal
  18. Шаблон:Cite news
  19. Шаблон:Cite journal
  20. Шаблон:Cite web
  21. Шаблон:Cite conference
  22. Шаблон:Cite conference
  23. Шаблон:Cite journal
  24. Шаблон:Cite book
  25. Шаблон:Cite journal
  26. Шаблон:Cite book
  27. Шаблон:Cite magazine
  28. Шаблон:Cite journal
  29. Шаблон:Cite book
  30. Шаблон:Cite book
  31. Шаблон:Cite conference
  32. Шаблон:Cite journal
  33. Forte, A., and Bruckman, A. Scaling consensus: Increasing decentralization in Wikipedia governance Шаблон:Webarchive. Proc. HICSS 2008.
  34. WP:3RR and WP:EW, policies which prevent repetitive reverting.
  35. Шаблон:Cite web
  36. Шаблон:Cite book
  37. Шаблон:Cite journal
  38. Шаблон:Cite journal
  39. Шаблон:Cite book
  40. Шаблон:Cite book
  41. Шаблон:Cite journal
  42. Шаблон:Cite web
  43. Шаблон:Cite journal
  44. Шаблон:Cite web
  45. Шаблон:Cite magazine
  46. Шаблон:Cite journal
  47. Шаблон:Cite journal