English Wikipedia: Incremental decision tree

An incremental decision tree algorithm is an online machine learning algorithm that outputs a decision tree. Many decision tree methods, such as C4.5, construct a tree from a complete dataset. Incremental decision tree methods allow an existing tree to be updated using only new individual data instances, without re-processing past instances. This is useful when the entire dataset is not available at update time (e.g. the data was not stored), when the original dataset is too large to process, or when the characteristics of the data change over time.
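One way to picture updating a tree without re-processing past instances is a leaf that keeps only running counts. The following is a minimal sketch, not any particular published algorithm: the `LeafStats` class, its method names, and the toy weather-style attributes are all hypothetical, and it assumes categorical attributes with information gain as the split measure.

```python
from collections import defaultdict
from math import log2


class LeafStats:
    """Hypothetical per-leaf statistics: class counts per (attribute, value)."""

    def __init__(self):
        self.n = 0
        self.class_counts = defaultdict(int)
        # counts[attribute][value][label] -> number of instances seen so far
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def update(self, instance, label):
        """Fold one labelled instance into the counts; the instance is not stored."""
        self.n += 1
        self.class_counts[label] += 1
        for attribute, value in instance.items():
            self.counts[attribute][value][label] += 1

    @staticmethod
    def _entropy(label_counts):
        total = sum(label_counts.values())
        return -sum((c / total) * log2(c / total) for c in label_counts.values() if c)

    def information_gain(self, attribute):
        """Information gain of splitting on `attribute`, computed from counts only."""
        remainder = sum(
            (sum(by_label.values()) / self.n) * self._entropy(by_label)
            for by_label in self.counts[attribute].values()
        )
        return self._entropy(self.class_counts) - remainder


# Toy stream (made-up data): each instance is seen once and then discarded.
stats = LeafStats()
stream = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]
for instance, label in stream:
    stats.update(instance, label)

gains = {a: stats.information_gain(a) for a in ("outlook", "windy")}
```

Because the split measure is computed from the counts alone, each update costs time proportional to the number of attributes, independent of how many instances have already been seen.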

Applications

Online learning
Data streams
Concept drift
Data that can be modeled well using a hierarchical model
Systems where a user-interpretable output is desired

Methods

Here is a short list of incremental decision tree methods, organized by their (usually non-incremental) parent algorithms.

CART family

CART^[1] (1984) is a non-incremental decision tree inducer for both classification and regression problems, developed in the mathematics and statistics communities. CART traces its roots to AID (1963).^[2]

Incremental CART (1989):^[3] Crawford modified CART to incorporate data incrementally.

ID3/C4.5 family

ID3 (1986)^[4] and C4.5 (1993)^[5] were developed by Quinlan and have roots in Hunt's Concept Learning System (CLS, 1966).^[6] The ID3 family of tree inducers was developed in the engineering and computer science communities.

ID3' (1986)^[7] was suggested by Schlimmer and Fisher. It was a brute-force method to make ID3 incremental; after each new data instance is acquired, an entirely new tree is induced using ID3.
ID4 (1986)^[7] could incorporate data incrementally. However, certain concepts were unlearnable, because ID4 discards subtrees when a new test is chosen for a node.
ID5 (1988)^[8] did not discard subtrees, but also did not guarantee that it would produce the same tree as ID3.
ID5R (1989)^[9] output the same tree as ID3 for a dataset regardless of the incremental training order. This was accomplished by recursively updating the tree's subnodes. It did not handle numeric variables, multiclass classification tasks, or missing values.
ID6MDL (2007)^[10] is an extended version of the ID3 or ID5R algorithms.
ITI (1997)^[11] is an efficient method for incrementally inducing decision trees. The same tree is produced for a dataset regardless of the data's presentation order, or whether the tree is induced incrementally or non-incrementally (batch mode). It can accommodate numeric variables, multiclass tasks, and missing values. Code is available on the web (see External links).

Note: ID6NB (2009)^[12] is not incremental.
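The brute-force approach attributed to ID3' above, re-inducing the whole tree after every new instance, can be sketched as follows. This is an illustrative reconstruction, not Schlimmer and Fisher's code: the compact `id3` function handles only categorical attributes, the tuple-based tree representation is an assumption, and the class name `BruteForceIncrementalID3` and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def id3(rows, labels, attributes):
    """Batch ID3 on categorical data; returns a label or (attribute, {value: subtree})."""
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def gain(attribute):
        remainder = 0.0
        for value in set(row[attribute] for row in rows):
            subset = [l for row, l in zip(rows, labels) if row[attribute] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in set(row[best] for row in rows):
        pairs = [(row, l) for row, l in zip(rows, labels) if row[best] == value]
        sub_rows, sub_labels = zip(*pairs)
        branches[value] = id3(list(sub_rows), list(sub_labels), remaining)
    return (best, branches)


class BruteForceIncrementalID3:
    """ID3'-style learner: keep every instance and rebuild the tree on each update."""

    def __init__(self, attributes):
        self.attributes = list(attributes)
        self.rows, self.labels = [], []
        self.tree = None

    def update(self, row, label):
        self.rows.append(row)
        self.labels.append(label)
        self.tree = id3(self.rows, self.labels, self.attributes)  # full rebuild
        return self.tree


learner = BruteForceIncrementalID3(["outlook", "windy"])
learner.update({"outlook": "sunny", "windy": "no"}, "play")
learner.update({"outlook": "rainy", "windy": "no"}, "stay")
tree = learner.update({"outlook": "rainy", "windy": "yes"}, "stay")
```

The sketch makes the cost visible: every update re-scans all stored instances, which is what the later members of the family (ID4, ID5, ID5R, ITI) avoid by revising the existing tree in place.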

Other Incremental Learning Systems

Several incremental concept learning systems did not build decision trees but predated and influenced the development of the earliest incremental decision tree learners, notably ID4.^[7] Notable among these was Schlimmer and Granger's STAGGER (1986),^[13] which learned disjunctive concepts incrementally. STAGGER was developed to examine concepts that changed over time (concept drift). Prior to STAGGER, Michalski and Larson (1978)^[14] investigated an incremental variant of AQ (Michalski, 1973),^[15] a supervised system for learning concepts in disjunctive normal form (DNF). Experience with these earlier systems and others, including incremental tree-structured unsupervised learning, contributed to a conceptual framework for evaluating incremental decision tree learners specifically, and incremental concept learning generally, along four dimensions that reflect the inherent tradeoffs between learning cost and quality:^[7] (1) the cost of a knowledge base update, (2) the number of observations required to converge on a knowledge base with given characteristics, (3) the total effort (as a function of the first two dimensions) that a system exerts, and (4) the quality (often consistency) of the final knowledge base. Fisher and Schlimmer (1988)^[16] give some of the historical context in which incremental decision tree learners emerged and expand on the four-factor framework that was used to evaluate and design incremental learning systems.

VFDT Algorithm

The Very Fast Decision Tree (VFDT) learner reduces training time for large incremental datasets by subsampling the incoming data stream.

VFDT (2000)^[17]
CVFDT (2001)^[18] can adapt to concept drift by using a sliding window on incoming data. Old data outside the window is forgotten.
VFDTc (2006)^[19] extends VFDT for continuous data, concept drift, and application of Naive Bayes classifiers in the leaves.
VFML (2003) is a toolkit available on the web (see External links). It was developed by the creators of VFDT and CVFDT.
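The text above does not say how VFDT decides that it has seen enough of the stream to commit to a split. In the VFDT literature this is done with the Hoeffding bound, which is why such trees are also called Hoeffding trees. The sketch below illustrates that test under stated assumptions: the function names, the default `delta`, and the assumed range of 1.0 for the split measure (information gain with two classes) are illustrative choices, not values taken from this article.

```python
from math import log, sqrt


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability at least 1 - delta, the true mean of a
    variable with the given range lies within epsilon of its mean over n samples."""
    return sqrt(value_range ** 2 * log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n, delta=1e-7, value_range=1.0):
    """VFDT-style test: split once the observed lead of the best attribute over
    the runner-up exceeds the Hoeffding bound for the n examples seen at the leaf."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Same observed gains (made-up numbers), different amounts of evidence.
early = should_split(best_gain=0.30, second_gain=0.20, n=200)
later = should_split(best_gain=0.30, second_gain=0.20, n=1000)
```

With these numbers the bound is about 0.20 after 200 examples and about 0.09 after 1000, so the same lead of 0.10 justifies a split only in the second case. The bound shrinks with the square root of `n`, which is what lets the learner stop reading examples for a node once the choice is statistically safe.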

EFDT Algorithm

The Extremely Fast Decision Tree learner^[20] is statistically more powerful than VFDT, allowing it to learn more detailed trees from less data. It differs from VFDT in the method for deciding when to insert a new branch into the tree. VFDT waits until it is confident that the best available branch is better than any alternative. In contrast, EFDT splits as soon as it is confident that the best available branch is better than the current alternative. Initially, the current alternative is no branch. This allows EFDT to insert branches much more rapidly than VFDT. During incremental learning this means that EFDT can deploy useful trees much sooner than VFDT.

However, the new branch selection method greatly increases the likelihood of selecting a suboptimal branch. In consequence, EFDT keeps monitoring the performance of all branches and will replace a branch as soon as it is confident there is a better alternative.
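The difference between the two split rules described above can be made concrete with a short comparison. This is a hedged sketch rather than the published EFDT implementation: it assumes both learners use a Hoeffding-bound test on a split measure with range 1.0, treats "no branch" as having gain 0, and uses hypothetical function names and made-up gain values.

```python
from math import log, sqrt


def hoeffding_bound(value_range, delta, n):
    return sqrt(value_range ** 2 * log(1.0 / delta) / (2.0 * n))


def vfdt_should_split(gains, n, delta=1e-7, value_range=1.0):
    """VFDT-style rule: the best candidate must beat the second-best candidate.
    Assumes at least two candidate attributes."""
    ranked = sorted(gains.values(), reverse=True)
    return ranked[0] - ranked[1] > hoeffding_bound(value_range, delta, n)


def efdt_should_split(gains, current_gain, n, delta=1e-7, value_range=1.0):
    """EFDT-style rule: the best candidate must beat the test currently in place.
    At a leaf the current alternative is no branch, taken here as gain 0."""
    return max(gains.values()) - current_gain > hoeffding_bound(value_range, delta, n)


# Two strong, nearly tied candidates observed after 500 examples (made-up numbers).
gains = {"outlook": 0.30, "humidity": 0.28, "windy": 0.05}
n = 500

vfdt_splits = vfdt_should_split(gains, n)                    # lead of 0.02 over runner-up
efdt_splits = efdt_should_split(gains, current_gain=0.0, n=n)  # lead of 0.30 over no split

# Re-evaluation of an existing branch: suppose "windy" (gain 0.05) was installed earlier.
efdt_replaces = efdt_should_split(gains, current_gain=gains["windy"], n=n)
```

In this example the bound is about 0.13, so the VFDT-style rule keeps waiting because the two best attributes are nearly tied, while the EFDT-style rule splits immediately and later replaces a weaker installed test once a better one is confidently ahead.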

OLIN and IFN

OLIN (2002)^[21]
IOLIN (2008),^[22] based on the Info-Fuzzy Network (IFN)^[23]

GAENARI

gaenari is a C++ incremental decision tree implementation (see External links).

References

[1] Template:Cite book
[2] Template:Cite journal
[3] Template:Cite journal
[4] Template:Cite journal
[5] Template:Cite book
[6] Template:Cite book
[7] Template:Cite book
[8] Template:Cite book Publishers.
[9] Template:Cite journal
[10] Kroon, M., Korzec, S., Adriani, P. (2007) ID6MDL: Post-Pruning Incremental Decision Trees.
[11] Template:Cite journal
[12] Template:Cite journal
[13] Template:Cite journal
[14] Template:Cite tech report
[15] Template:Cite book
[16] Template:Cite tech report
[17] Template:Cite book
[18] Template:Cite book
[19] Template:Cite journal
[20] Template:Cite book
[21] Template:Cite journal
[22] Template:Cite journal
[23] Template:Cite book

External links

ITI code. http://www-lrn.cs.umass.edu/iti/index.html
VFML code. http://www.cs.washington.edu/dm/vfml/
C++ incremental decision tree. https://github.com/greenfish77/gaenari

