English Wikipedia: Christopher D. Paice

Christopher D Paice was one of the pioneers of research into stemming. The Paice-Husk stemmer was published in 1990, and his method of evaluating stemmer performance by means of the Error Rate Relative to Truncation (ERRT) was the first direct method of comparing under-stemming and over-stemming errors. Apart from his work on stemming algorithms and evaluation methods, he made research contributions in information retrieval, anaphora resolution and automatic abstracting.[1] [2]

Teaching career

Christopher D Paice was a member of the School of Computing and Communications (SCC) at Lancaster University, United Kingdom, for around forty years. He joined the then Department of Computer Studies as a Research Associate in 1969-70 and later moved on to a Lectureship. He was acting Head of Department in 1977-78 and Head of Department in 1979-82, and he retired in 2009.[3]

The Paice-Husk Stemming Algorithm

The Paice-Husk Stemmer was developed by Chris D Paice, with the assistance of Gareth Husk, in the Computing Department at Lancaster University in the late 1980s. It features an externally stored set of stemming rules, and this flexibility compared with the Porter stemmer made it of interest to several researchers.[4]

The stemmer was originally implemented in the Pascal programming language; further implementations have been made in ANSI C and Java. A Perl version was implemented by Mary Taffet at the Center for Natural Language Processing at Syracuse University, USA.[5]

The stemmer consists of a stemming algorithm and a separate set of stemming rules. The standard set of rules provides a 'strong' stemmer. Stemmer strength is advantageous for index compression; however, it produces a larger number of overstemming errors relative to the number of understemming errors. Users who need a lighter stemmer can develop their own set of rules.

The stemmer is iterative (i.e. endings are removed piecemeal in an indefinite number of stages), and each rule may specify the removal or replacement of an ending. The replacement technique avoids the need for a separate stage in the process to recode or provide partial matching; this helps maintain the efficiency of the algorithm. The rules are indexed by the last letter of the ending to allow efficient searching.[6]
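The iterative, rule-driven behaviour described above can be illustrated with a minimal Python sketch. It is not the published implementation: the rule tuples in TOY_RULES, the names build_index and stem, and the min_stem_len acceptability check are all assumptions made for illustration, and the toy rules are not the standard Paice-Husk rule table.

```python
from collections import defaultdict

# Hypothetical toy rules: (ending, chars_to_remove, replacement, keep_going).
# These are illustrative only, NOT the standard Paice-Husk rule table.
TOY_RULES = [
    ("ies", 3, "y", False),  # replace the ending, then stop
    ("ing", 3, "", True),    # remove the ending, then try further rules
    ("ness", 4, "", True),
    ("ful", 3, "", True),
    ("s", 1, "", True),
]


def build_index(rules):
    """Group rules by the last letter of their ending for quick lookup."""
    index = defaultdict(list)
    for rule in rules:
        index[rule[0][-1]].append(rule)
    return index


def stem(word, index, min_stem_len=3):
    """Apply rules repeatedly until none matches or a rule says stop.

    min_stem_len is an assumed, simplified acceptability check that
    prevents a word from being stripped down to almost nothing.
    """
    keep_going = True
    while keep_going and word:
        keep_going = False
        # Only rules whose ending finishes with the word's last letter
        # need to be examined.
        for ending, remove, replacement, cont in index.get(word[-1], []):
            if not word.endswith(ending):
                continue
            candidate = word[: len(word) - remove] + replacement
            if len(candidate) < min_stem_len:
                continue
            word = candidate
            keep_going = cont
            break
    return word


INDEX = build_index(TOY_RULES)
print(stem("helpfulness", INDEX))
print(stem("ponies", INDEX))
```

Because the rules live in a plain data structure separate from the stem function, a lighter or heavier stemmer can be obtained by swapping the rule list without touching the algorithm.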

Stemmer Evaluation

Apart from the stemmer itself, Chris Paice developed a method for directly measuring the performance of stemmers. Grouped lists of words are applied to the stemmer, the overstemming and understemming errors are counted, and the results are compared with what would have been obtained by using a set of truncation stemmers. The final measure is the Error Rate Relative to Truncation (ERRT).[7] [8]
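The evaluation procedure can be sketched in Python as follows. The sketch assumes pairwise counting: an understemming index (UI) is the share of same-group word pairs that the stemmer fails to merge, and an overstemming index (OI) is the share of different-group word pairs that it wrongly merges. It also assumes that ERRT is the ratio of distances from the origin to the stemmer's (UI, OI) point and to the point where the same ray crosses a line joining the truncation stemmers' points. The function names error_indices and errt are hypothetical, and the cited papers should be consulted for the exact definitions.

```python
from collections import Counter, defaultdict


def error_indices(groups, stemmer):
    """Return (UI, OI) for a stemmer over a list of word groups.

    Each group holds words that should share one stem. The pairwise
    counting used here is an assumption of this sketch.
    """
    total = sum(len(g) for g in groups)
    desired_merge = desired_nonmerge = unachieved = 0.0
    by_stem = defaultdict(Counter)  # stem -> {group id: word count}
    for gid, group in enumerate(groups):
        n = len(group)
        desired_merge += 0.5 * n * (n - 1)
        desired_nonmerge += 0.5 * n * (total - n)
        stems = Counter(stemmer(w) for w in group)
        # Same-group pairs left with different stems: understemming.
        unachieved += 0.5 * sum(u * (n - u) for u in stems.values())
        for s, u in stems.items():
            by_stem[s][gid] += u
    wrongly_merged = 0.0
    for counts in by_stem.values():
        n_s = sum(counts.values())
        # Pairs sharing a stem but drawn from different groups: overstemming.
        wrongly_merged += 0.5 * sum(v * (n_s - v) for v in counts.values())
    ui = unachieved / desired_merge if desired_merge else 0.0
    oi = wrongly_merged / desired_nonmerge if desired_nonmerge else 0.0
    return ui, oi


def errt(point, truncation_points):
    """Return |OP| / |OT| for stemmer point P = (UI, OI).

    T is where the ray from the origin O through P crosses the polyline
    joining the (UI, OI) points of the truncation stemmers.
    """
    px, py = point
    pts = sorted(truncation_points)
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
        dx, dy = x2 - x1, y2 - y1
        denom = px * dy - py * dx
        if denom == 0:
            continue  # ray is parallel to this segment
        # Solve t*P = A + s*(B - A): t along the ray, s along the segment.
        t = (x1 * dy - y1 * dx) / denom
        s = (x1 * py - y1 * px) / denom
        if t > 0 and 0 <= s <= 1:
            return 1.0 / t
    return None  # the ray misses the truncation line


groups = [["walk", "walking", "walks"], ["wall", "walls"]]
truncation = [error_indices(groups, lambda w, k=k: w[:k]) for k in (3, 4, 5)]
print(truncation)
```

A value below 1 from errt would indicate that the stemmer makes fewer errors than the truncation baseline along the same direction, under the assumptions stated above.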

Personal life

Christopher D Paice was born in 1941. He married Kathleen F Moss in 1965 in the Manchester Registration district. In 2015 he was diagnosed with an aggressive brain tumour, shortly after he and his wife had moved from Cumbria to Stratford. He died on 21 April 2016.

Publications

References

  1. [1], University Trier, DBLP Computer Science Bibliography
  2. [2], ACM Author page, C D Paice
  3. [3], Lancaster University, In Memory of Chris Paice
  4. [4], Improvements to the Lancaster Stemming Algorithm (Paice-Husk Stemmer), Antonio Zamora
  5. [5], GitHub, Paice-Husk Stemmer in several languages
  6. Template:Cite web
  7. Paice, C.D. (1994) An evaluation method for stemming algorithms, in Croft, W.B. & van Rijsbergen, C.J. (eds.), Proceedings of the 17th ACM SIGIR conference held at Dublin, July 3–6, 1994, pp. 42-50.
  8. Paice, C.D. (1996) Method for Evaluation of Stemming Algorithms based on Error Counting, JASIS, 47(8), pp. 632-649.