Английская Википедия:AsoSoft text corpus

Материал из Онлайн справочника
Версия от 12:04, 3 февраля 2024; EducationBot (обсуждение | вклад) (Новая страница: «{{Английская Википедия/Панель перехода}} {{Short description|Kurdish text corpus}} {{notability|date=June 2019}} The '''AsoSoft text corpus''' is the first large-scale Kurdish text corpus, collected and processed by the AsoSoft research and development group. It contains 458,000 documents (188 million tokens) that are collected from sources such as websites, news agencies, books, and magazines. The...»)
(разн.) ← Предыдущая версия | Текущая версия (разн.) | Следующая версия → (разн.)
Перейти к навигацииПерейти к поиску

Шаблон:Short description Шаблон:Notability

The AsoSoft text corpus is the first large-scale Kurdish text corpus, collected and processed by the AsoSoft research and development group. It contains 458,000 documents (188 million tokens) that are collected from sources such as websites, news agencies, books, and magazines. The corpus is partially tagged by topic, so it can be used for topic identification tasks. Also, it is applicable for extracting language model and computational lexicon information. Part of the corpus (75 million tokens) is available online for non-commercial use. The corpus uses the TEI format.[1]

References

Шаблон:Reflist

External links


Шаблон:Comp-ling-stub