English Wikipedia: Hunt–Szymanski algorithm

In computer science, the Hunt–Szymanski algorithm,^[1]^[2] also known as the Hunt–McIlroy algorithm, is a solution to the longest common subsequence problem. It was one of the first non-heuristic algorithms used in diff, which compares a pair of files, each represented as a sequence of lines. To this day, variations of this algorithm are found in incremental version control systems, wiki engines, and molecular phylogenetics research software.

The worst-case complexity of this algorithm is O(n² log n), but in practice O(n log n) is expected.^[3]^[4]

History

The algorithm was proposed by Harold S. Stone as a generalization of a special case solved by Thomas G. Szymanski.^[5]^[6]^[7] James W. Hunt refined the idea, implemented the first version of the candidate-listing algorithm used by diff and embedded it into an older framework of Douglas McIlroy.^[5]

The description of the algorithm appeared as a technical report by Hunt and McIlroy in 1976.^[5] The following year, a variant of the algorithm was finally published in a joint paper by Hunt and Szymanski.^[5]^[8]

Algorithm

The Hunt–Szymanski algorithm is a modification of a basic solution to the longest common subsequence problem, which has complexity O(n²). The modification lowers the time and space requirements when the algorithm works on typical inputs.

Basic longest common subsequence solution

Algorithm

Let A_i be the i-th element of the first sequence.

Let B_j be the j-th element of the second sequence.

Let P_ij be the length of the longest common subsequence of the first i elements of A and the first j elements of B.

<math>
P_{ij} = \begin{cases}
 0 & \text{ if }\ i = 0 \text{ or } j = 0 \\
 1 + P_{i-1, j-1} & \text{ if } A_i = B_j \\
 \max(P_{i-1, j}, P_{i, j-1}) & \text{ if } A_i \ne B_j
\end{cases}
</math>
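The recurrence can be turned into a table-filling routine almost line for line. The following is a minimal Python sketch, not code from the article; the function name `lcs_length_table` is chosen here for illustration, and the sequences are 0-indexed in Python while the table keeps the 1-based indices of the recurrence.

```python
def lcs_length_table(a, b):
    """Return the table P where P[i][j] is the LCS length of a[:i] and b[:j]."""
    m, n = len(a), len(b)
    # Row 0 and column 0 stay 0: the case i = 0 or j = 0.
    P = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # A_i = B_j: extend the common subsequence of the two prefixes.
                P[i][j] = 1 + P[i - 1][j - 1]
            else:
                # A_i != B_j: drop the last element of one prefix or the other.
                P[i][j] = max(P[i - 1][j], P[i][j - 1])
    return P
```

The length of the longest common subsequence of the full sequences is then the bottom-right entry, `P[len(a)][len(b)]`.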

Example

Figure (Longest Common Subsequence Recursion.png): A table showing the recursive steps that the basic longest common subsequence algorithm takes.

Consider the sequences A and B.

A contains three elements:

<math>\begin{align}
A_1 = a\\
A_2 = b\\
A_3 = c
\end{align}</math>

B contains three elements:

<math>\begin{align}
B_1 = a\\
B_2 = c\\
B_3 = b
\end{align}</math>

The steps that the above algorithm would perform to determine the length of the longest common subsequence for both sequences are shown in the diagram. The algorithm correctly reports that the longest common subsequence of the two sequences is two elements long.
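As a cross-check, the values of P_ij for this example can be worked out by hand from the recurrence. The table below is a reconstruction computed from the definitions above, not a copy of the figure; row and column labels show the element at each index.

<math>
\begin{array}{c|cccc}
P_{ij} & j=0 & j=1\ (a) & j=2\ (c) & j=3\ (b) \\
\hline
i=0 & 0 & 0 & 0 & 0 \\
i=1\ (a) & 0 & 1 & 1 & 1 \\
i=2\ (b) & 0 & 1 & 1 & 2 \\
i=3\ (c) & 0 & 1 & 2 & 2
\end{array}
</math>

The bottom-right entry, P_33 = 2, is the length reported for the two full sequences.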

Complexity

The above algorithm has worst-case time and space complexities of O(mn) (see big O notation), where m is the number of elements in sequence A and n is the number of elements in sequence B. The Hunt–Szymanski algorithm modifies this algorithm to have a worst-case time complexity of O(mn log m) and space complexity of O(mn), though it regularly beats the worst case with typical inputs.
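One way to see why typical inputs behave better than the worst case is to count the index pairs at which the two sequences actually match, since those are the only positions the Hunt–Szymanski approach has to process; its running time is commonly stated as O((r + n) log n), where r is that number of matching pairs. The sketch below is illustrative only, and the function name `count_matching_pairs` is an assumption made here.

```python
from collections import Counter


def count_matching_pairs(a, b):
    """Count the pairs (i, j) with a[i] == b[j] without visiting every cell."""
    count_a, count_b = Counter(a), Counter(b)
    # A symbol occurring x times in a and y times in b contributes x * y pairs.
    return sum(x * count_b[sym] for sym, x in count_a.items())
```

For two files whose lines are mostly distinct, this count is close to the number of lines rather than to the product m·n; it reaches m·n only when every element matches every other, as in two sequences made of one repeated symbol.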

Essential matches

k-candidates

The Hunt–Szymanski algorithm only considers what the authors call essential matches, or k-candidates. k-candidates are pairs of indices (i, j) such that:

<math>A_i = B_j</math>
<math>P_{ij} > \max(P_{i-1, j}, P_{i, j-1})</math>

The second point implies two properties of k-candidates:

There is a common subsequence of length k in the first i elements of sequence A and the first j elements of sequence B.
There are no common subsequences of length k for any fewer than i elements of sequence A or j elements of sequence B.
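To make the definition concrete, the k-candidates of a small input can be listed by brute force from the basic table. This sketch assumes a k-candidate is a pair (i, j) with A_i = B_j and P_ij > max(P_{i-1,j}, P_{i,j-1}), with k = P_ij; the function name `k_candidates` is chosen here for illustration. It still fills the whole table, so it shows what the candidates are, not how the algorithm finds them efficiently.

```python
def k_candidates(a, b):
    """List (i, j, k) for every k-candidate, using 1-based indices i and j."""
    m, n = len(a), len(b)
    P = [[0] * (n + 1) for _ in range(m + 1)]
    found = []
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = a[i - 1] == b[j - 1]
            if match:
                P[i][j] = 1 + P[i - 1][j - 1]
            else:
                P[i][j] = max(P[i - 1][j], P[i][j - 1])
            # Essential match: the elements are equal and the LCS length
            # strictly exceeds what either shorter prefix already achieves.
            if match and P[i][j] > max(P[i - 1][j], P[i][j - 1]):
                found.append((i, j, P[i][j]))
    return found
```

For the example sequences a, b, c and a, c, b this yields one 1-candidate, (1, 1), and two 2-candidates, (2, 3) and (3, 2).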

Connecting k-candidates

Figure (K Candidate Diagram.png): A diagram that shows how using k-candidates reduces the amount of time and space needed to find the longest common subsequence of two sequences.

To create the longest common subsequence from a collection of k-candidates, a grid is drawn with the contents of one sequence along each axis. The k-candidates are marked on the grid. A common subsequence can be created by joining marked coordinates of the grid such that any increase in i is accompanied by an increase in j.

This is illustrated in the adjacent diagram.

Black dots represent candidates that would have to be considered by the simple algorithm and the black lines are connections that create common subsequences of length 3.

Red dots represent k-candidates that are considered by the Hunt–Szymanski algorithm and the red line is the connection that creates a common subsequence of length 3.
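The idea of connecting candidates can be sketched in Python as follows. This is a simplified illustration in the spirit of the algorithm, not the published implementation: it keeps, for each length k, the smallest column j that ends a common subsequence of that length, finds the affected k by binary search, and links each candidate to its predecessor. The names `hunt_szymanski_lcs`, `thresh`, and `links` are assumptions made here, and indices are 0-based.

```python
from bisect import bisect_left
from collections import defaultdict


def hunt_szymanski_lcs(a, b):
    """Return one longest common subsequence of a and b as a list."""
    # For each symbol, the ascending list of positions where it occurs in b.
    positions = defaultdict(list)
    for j, sym in enumerate(b):
        positions[sym].append(j)

    thresh = []  # thresh[k]: smallest j ending a common subsequence of length k + 1
    links = []   # links[k]: (i, j, predecessor) for the candidate stored at thresh[k]

    for i, sym in enumerate(a):
        # Visit matches in descending j so that two matches from the
        # same row i can never be chained to each other.
        for j in reversed(positions.get(sym, ())):
            k = bisect_left(thresh, j)  # first length whose threshold is >= j
            prev = links[k - 1] if k > 0 else None
            if k == len(thresh):
                thresh.append(j)
                links.append((i, j, prev))
            else:
                thresh[k] = j
                links[k] = (i, j, prev)

    # Follow predecessor links back from the longest chain.
    result = []
    node = links[-1] if links else None
    while node is not None:
        i, _, node = node
        result.append(a[i])
    result.reverse()
    return result
```

Because only matching positions are visited and each visit costs one binary search, the work grows with the number of matches rather than with the size of the full grid. For the example sequences a, b, c and a, c, b the sketch returns a common subsequence of length 2.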

References

[1] Template:Cite web
[2] Template:Cite journal
[3] Template:Cite journal
[4] See Section 5.6 of Aho, A. V., Hopcroft, J. E., Ullman, J. D., Data Structures and Algorithms. Addison-Wesley, 1983.
[5] Template:Cite journal
[6] Template:Cite web
[7] Szymanski, T. G. (1975) A special case of the maximal common subsequence problem. Technical Report TR-170, Computer Science Lab., Princeton University.
[8] Template:Cite journal

