English Wikipedia: Chomsky normal form

In formal language theory, a context-free grammar, G, is said to be in Chomsky normal form (first described by Noam Chomsky)^[1] if all of its production rules are of the form:^[2]^[3]

A → BC, or

A → a, or

S → ε,

where A, B, and C are nonterminal symbols, the letter a is a terminal symbol (a symbol that represents a constant value), S is the start symbol, and ε denotes the empty string. Also, neither B nor C may be the start symbol, and the third production rule can only appear if ε is in L(G), the language produced by the context-free grammar G.^[4]
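The three rule shapes can be checked mechanically. The following is a minimal sketch, not a reference implementation; the grammar encoding (a dict from nonterminal to a set of right-hand-side tuples, with `()` standing for ε) and the name `is_cnf` are assumptions made for illustration.

```python
def is_cnf(rules, start):
    """Check the three rule shapes of Chomsky normal form.

    rules: dict mapping each nonterminal to a set of right-hand sides,
           each right-hand side being a tuple of symbols.
    Assumption: symbols that are keys of `rules` are nonterminals, all
    other symbols are terminals. The empty tuple () stands for ε.
    """
    nonterminals = set(rules)
    for lhs, alternatives in rules.items():
        for rhs in alternatives:
            if len(rhs) == 0:
                # S → ε is allowed only for the start symbol.
                if lhs != start:
                    return False
            elif len(rhs) == 1:
                # A → a: the single symbol must be a terminal.
                if rhs[0] in nonterminals:
                    return False
            elif len(rhs) == 2:
                # A → BC: both nonterminals, neither the start symbol.
                if any(s not in nonterminals or s == start for s in rhs):
                    return False
            else:
                return False
    return True


good = {"S": {("A", "B"), ()}, "A": {("a",)}, "B": {("b",)}}
bad = {"S": {("A", "S")}, "A": {("a",)}}  # start symbol on a right-hand side
print(is_cnf(good, "S"))  # True
print(is_cnf(bad, "S"))   # False
```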

Every grammar in Chomsky normal form is context-free, and conversely, every context-free grammar can be transformed into an equivalent one^{[note 1]} which is in Chomsky normal form and has a size no larger than the square of the original grammar's size.

Converting a grammar to Chomsky normal form

To convert a grammar to Chomsky normal form, a sequence of simple transformations is applied in a certain order; this is described in most textbooks on automata theory.^[4]^[5]^[6]^[7] The presentation here follows Hopcroft, Ullman (1979), but is adapted to use the transformation names from Lange, Leiß (2009).^[8]^{[note 2]} Each of the following transformations establishes one of the properties required for Chomsky normal form.

START: Eliminate the start symbol from right-hand sides

Introduce a new start symbol S₀, and a new rule

S₀ → S,

where S is the previous start symbol. This does not change the grammar's produced language, and S₀ will not occur on any rule's right-hand side.
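As a rough sketch of START, assuming a grammar encoded as a dict from nonterminal to a set of right-hand-side tuples (the encoding and the names `start_step` and `new_start` are illustrative assumptions):

```python
def start_step(rules, start, new_start="S0"):
    """Add a fresh start symbol with the single rule new_start → start."""
    if new_start in rules:
        # Assumption: the caller picks a name that is not yet in use.
        raise ValueError(f"{new_start!r} is already a nonterminal")
    new_rules = {lhs: set(alts) for lhs, alts in rules.items()}
    new_rules[new_start] = {(start,)}
    return new_rules, new_start


# S → a S b | ε has S on a right-hand side; S0 never will.
grammar = {"S": {("a", "S", "b"), ()}}
converted, s0 = start_step(grammar, "S")
print(converted[s0])  # {('S',)}
```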

TERM: Eliminate rules with nonsolitary terminals

To eliminate each rule

A → X₁ ... a ... X_n

in which a terminal symbol a is not the only symbol on the right-hand side, introduce, for every such terminal, a new nonterminal symbol N_a, and a new rule

N_a → a.

Change every rule

A → X₁ ... a ... X_n

to

A → X₁ ... N_a ... X_n.

If several terminal symbols occur on the right-hand side, simultaneously replace each of them by its associated nonterminal symbol. This does not change the grammar's produced language.^[4]
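The TERM step might be sketched as follows. The dict-of-tuples grammar encoding, the function name `term_step`, and the `N_a` naming scheme for generated symbols are assumptions for illustration only.

```python
def term_step(rules):
    """Replace every nonsolitary terminal a by a nonterminal N_a with N_a → a.

    Assumption: keys of `rules` are the nonterminals, and no existing
    symbol is already named like a generated "N_..." symbol.
    """
    nonterminals = set(rules)
    new_rules = {lhs: set() for lhs in rules}
    wrappers = {}  # terminal -> name of its wrapping nonterminal
    for lhs, alternatives in rules.items():
        for rhs in alternatives:
            if len(rhs) < 2:
                # Solitary terminals (and ε) are left untouched.
                new_rules[lhs].add(rhs)
                continue
            new_rhs = []
            for sym in rhs:
                if sym in nonterminals:
                    new_rhs.append(sym)
                else:
                    new_rhs.append(wrappers.setdefault(sym, f"N_{sym}"))
            new_rules[lhs].add(tuple(new_rhs))
    for terminal, name in wrappers.items():
        new_rules[name] = {(terminal,)}
    return new_rules


result = term_step({"S": {("a", "S", "b"), ("c",)}})
print(result["S"])  # contains ('N_a', 'S', 'N_b') and ('c',)
```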

BIN: Eliminate right-hand sides with more than 2 nonterminals

Replace each rule

A → X₁ X₂ ... X_n

with more than 2 nonterminals X₁,...,X_n by rules

A → X₁ A₁,

A₁ → X₂ A₂,

... ,

A_{n-2} → X_{n-1} X_n,

where A_i are new nonterminal symbols. Again, this does not change the grammar's produced language.^[4]
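A possible sketch of BIN, again assuming the dict-of-tuples grammar encoding; the function name `bin_step` and the way fresh symbols are named (`A_1`, `A_2`, ...) are hypothetical choices.

```python
def bin_step(rules):
    """Split every right-hand side longer than 2 into a chain of binary rules.

    Assumption: generated names of the form "<lhs>_<k>" do not clash
    with symbols already present in the grammar.
    """
    new_rules = {lhs: set() for lhs in rules}
    counter = 0
    for lhs, alternatives in rules.items():
        for rhs in sorted(alternatives):  # sorted only for reproducible names
            current = lhs
            while len(rhs) > 2:
                counter += 1
                fresh = f"{lhs}_{counter}"
                # current → X_i fresh, then continue with the remainder
                new_rules.setdefault(current, set()).add((rhs[0], fresh))
                current, rhs = fresh, rhs[1:]
            new_rules.setdefault(current, set()).add(rhs)
    return new_rules


result = bin_step({"A": {("X1", "X2", "X3", "X4")}})
for lhs, alts in result.items():
    print(lhs, "→", alts)
```

For n = 4 this yields A → X1 A_1, A_1 → X2 A_2 and A_2 → X3 X4, matching the chain described above.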

DEL: Eliminate ε-rules

An ε-rule is a rule of the form

A → ε,

where A is not S₀, the grammar's start symbol.

To eliminate all rules of this form, first determine the set of all nonterminals that derive ε. Hopcroft and Ullman (1979) call such nonterminals nullable, and compute them as follows:

If a rule A → ε exists, then A is nullable.
If a rule A → X₁ ... X_n exists, and every single X_i is nullable, then A is nullable, too.
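The two rules above amount to a fixed-point computation. A minimal sketch, assuming a grammar encoded as a dict from nonterminal to a set of right-hand-side tuples with `()` standing for ε (the encoding and the name `nullable_symbols` are illustrative assumptions):

```python
def nullable_symbols(rules):
    """Return the set of nonterminals that can derive ε."""
    nullable = set()
    changed = True
    while changed:  # repeat until no new nullable symbol is found
        changed = False
        for lhs, alternatives in rules.items():
            if lhs in nullable:
                continue
            # all() over an empty tuple is True, which covers A → ε.
            if any(all(s in nullable for s in rhs) for rhs in alternatives):
                nullable.add(lhs)
                changed = True
    return nullable


grammar = {
    "S0": {("A", "b", "B"), ("C",)},
    "B": {("A", "A"), ("A", "C")},
    "C": {("b",), ("c",)},
    "A": {("a",), ()},
}
print(sorted(nullable_symbols(grammar)))  # ['A', 'B']
```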

Obtain an intermediate grammar by replacing each rule

A → X₁ ... X_n

by all versions with some nullable X_i omitted. By deleting in this grammar each ε-rule, unless its left-hand side is the start symbol, the transformed grammar is obtained.^[4]

For example, in the following grammar, with start symbol S₀,

S₀ → AbB | C

B → AA | AC

C → b | c

A → a | ε

the nonterminal A, and hence also B, is nullable, while neither C nor S₀ is. Hence the following intermediate grammar is obtained:^{[note 3]}

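S₀ → AbB | bB | Ab | b | C

B → AA | A | A | ε | AC | C

(Each alternative is written with its omitted nullable symbols left out; B → ε results from omitting both occurrences of A in AA.)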

C → b | c

A → a | ε

In this grammar, all ε-rules have been "inlined at the call site".^{[note 4]} In the next step, they can hence be deleted, yielding the grammar:

S₀ → AbB | Ab | bB | b | C

B → AA | A | AC | C

C → b | c

A → a

This grammar produces the same language as the original example grammar, viz. {ab,aba,abaa,abab,abac,abb,abc,b,ba,baa,bab,bac,bb,bc,c}, but has no ε-rules.
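The DEL step on this example can be reproduced with a short sketch. The dict-of-tuples grammar encoding and the helper names `nullable_symbols`, `del_step` and `language` are assumptions for illustration; `language` only works for non-recursive grammars such as this one.

```python
from itertools import product


def nullable_symbols(rules):
    nullable, changed = set(), True
    while changed:
        changed = False
        for lhs, alts in rules.items():
            if lhs not in nullable and any(
                all(s in nullable for s in rhs) for rhs in alts
            ):
                nullable.add(lhs)
                changed = True
    return nullable


def del_step(rules, start):
    """Add all versions with nullable symbols omitted, then drop ε-rules."""
    nullable = nullable_symbols(rules)
    new_rules = {lhs: set() for lhs in rules}
    for lhs, alts in rules.items():
        for rhs in alts:
            # Nullable symbols may be kept or omitted; others are always kept.
            choices = [(True, False) if s in nullable else (True,) for s in rhs]
            for keep in product(*choices):
                version = tuple(s for s, k in zip(rhs, keep) if k)
                if version or lhs == start:  # ε-rule survives only for start
                    new_rules[lhs].add(version)
    return new_rules


def language(rules, symbol):
    """All strings derivable from `symbol` (non-recursive grammars only)."""
    if symbol not in rules:
        return {symbol}  # terminal
    result = set()
    for rhs in rules[symbol]:
        strings = {""}
        for s in rhs:
            strings = {x + y for x in strings for y in language(rules, s)}
        result |= strings
    return result


example = {
    "S0": {("A", "b", "B"), ("C",)},
    "B": {("A", "A"), ("A", "C")},
    "C": {("b",), ("c",)},
    "A": {("a",), ()},
}
result = del_step(example, "S0")
# Both grammars generate the same finite language.
print(language(example, "S0") == language(result, "S0"))  # True
```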

UNIT: Eliminate unit rules

A unit rule is a rule of the form

A → B,

where A, B are nonterminal symbols. To remove it, for each rule

B → X₁ ... X_n,

where X₁ ... X_n is a string of nonterminals and terminals, add rule

A → X₁ ... X_n

unless this is a unit rule which has already been (or is being) removed. The skipping of nonterminal symbol B in the resulting grammar is possible due to B being a member of the unit closure of nonterminal symbol A.^[9]
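One way to sketch UNIT is via the unit closure: for every nonterminal, collect all nonterminals reachable through unit rules alone and copy their non-unit rules. This closure-based formulation, the dict-of-tuples grammar encoding, and the name `unit_step` are assumptions for illustration, not the only way to carry out the step.

```python
def unit_step(rules):
    """Remove unit rules A → B by copying the non-unit rules of B to A."""
    nonterminals = set(rules)

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in nonterminals

    new_rules = {}
    for lhs in rules:
        # Unit closure: nonterminals reachable from lhs via unit rules only.
        closure, todo = {lhs}, [lhs]
        while todo:
            current = todo.pop()
            for rhs in rules[current]:
                if is_unit(rhs) and rhs[0] not in closure:
                    closure.add(rhs[0])
                    todo.append(rhs[0])
        new_rules[lhs] = {
            rhs for member in closure for rhs in rules[member] if not is_unit(rhs)
        }
    return new_rules


grammar = {
    "S0": {("S",)},
    "S": {("A", "B"), ("A",)},
    "A": {("a",)},
    "B": {("b",)},
}
result = unit_step(grammar)
print(result["S0"])  # contains ('A', 'B') and ('a',)
```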

Order of transformations

Mutual preservation of transformation results
Transformation X always preserves (yes) or may destroy (no) the result of Y:
X \ Y	START	TERM	BIN	DEL	UNIT
START	-	yes	yes	no	no
TERM	yes	-	no	yes	yes
BIN	yes	yes	-	yes	yes
DEL	yes	yes	yes	-	no
UNIT	yes	yes	yes	yes*	-
* UNIT preserves the result of DEL if START had been called before.

When choosing the order in which the above transformations are to be applied, it has to be considered that some transformations may destroy the result achieved by other ones. For example, START will re-introduce a unit rule if it is applied after UNIT. The table shows which orderings are admitted.

Moreover, the worst-case bloat in grammar size^{[note 5]} depends on the transformation order. Using |G| to denote the size of the original grammar G, the size blow-up in the worst case may range from |G|² to 2^{2 |G|}, depending on the transformation algorithm used.^[8] The blow-up in grammar size depends on the order between DEL and BIN. It may be exponential when DEL is done first, but is linear otherwise. UNIT can incur a quadratic blow-up in the size of the grammar.^[8] The orderings START,TERM,BIN,DEL,UNIT and START,BIN,DEL,UNIT,TERM lead to the least (i.e. quadratic) blow-up.
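The effect of the order between DEL and BIN can be illustrated by counting rules for a single right-hand side X1 ... X10 in which every symbol is nullable. The sketch below only counts the non-empty versions DEL would produce; the helper name `count_del_versions` and the symbol names are hypothetical, and rule counts are used here as a rough stand-in for grammar size.

```python
from itertools import product


def count_del_versions(rhs, nullable):
    """Number of distinct non-empty versions DEL produces for one rule."""
    choices = [(True, False) if s in nullable else (True,) for s in rhs]
    versions = {
        tuple(s for s, k in zip(rhs, keep) if k) for keep in product(*choices)
    }
    versions.discard(())
    return len(versions)


n = 10
symbols = [f"X{i}" for i in range(1, n + 1)]
nullable = set(symbols)

# DEL first: the single long rule A → X1 ... X10 is expanded directly.
long_rule = count_del_versions(tuple(symbols), nullable)

# BIN first: A → X1 A1, A1 → X2 A2, ..., A8 → X9 X10, then DEL.
helpers = [f"A{i}" for i in range(1, n - 1)]
chain = [(symbols[i], helpers[i]) for i in range(n - 2)]
chain.append((symbols[n - 2], symbols[n - 1]))
# The helper nonterminals are nullable too, since every X_i is.
binarized = sum(count_del_versions(rhs, nullable | set(helpers)) for rhs in chain)

print(long_rule, binarized)  # 1023 27
```

With DEL first the count grows as 2^n − 1, while with BIN first each of the n − 1 binary rules contributes at most three versions.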

Example

Figure: Abstract syntax tree of the arithmetic expression "a^2+4*b" with respect to the example grammar (top) and its Chomsky normal form (bottom)

The following grammar, with start symbol Expr, describes a simplified version of the set of all syntactically valid arithmetic expressions in programming languages like C or Algol60. Both number and variable are considered terminal symbols here for simplicity, since in a compiler front end their internal structure is usually not considered by the parser. The terminal symbol "^" denoted exponentiation in Algol60.

Expr → Term | Expr AddOp Term | AddOp Term
Term → Factor | Term MulOp Factor
Factor → Primary | Factor ^ Primary
Primary → number | variable | ( Expr )
AddOp → + | −
MulOp → * | /

In step "START" of the above conversion algorithm, just a rule S₀→Expr is added to the grammar. After step "TERM", the grammar looks like this:

S₀ → Expr
Expr → Term | Expr AddOp Term | AddOp Term
Term → Factor | Term MulOp Factor
Factor → Primary | Factor PowOp Primary
Primary → number | variable | Open Expr Close
AddOp → + | −
MulOp → * | /
PowOp → ^
Open → (
Close → )

After step "BIN", the following grammar is obtained:

S₀ → Expr
Expr → Term | Expr AddOp_Term | AddOp Term
Term → Factor | Term MulOp_Factor
Factor → Primary | Factor PowOp_Primary
Primary → number | variable | Open Expr_Close
AddOp → + | −
MulOp → * | /
PowOp → ^
Open → (
Close → )
AddOp_Term → AddOp Term
MulOp_Factor → MulOp Factor
PowOp_Primary → PowOp Primary
Expr_Close → Expr Close

Since there are no ε-rules, step "DEL" does not change the grammar. After step "UNIT", the following grammar is obtained, which is in Chomsky normal form:

S₀ → number | variable | Open Expr_Close | Factor PowOp_Primary | Term MulOp_Factor | Expr AddOp_Term | AddOp Term
Expr → number | variable | Open Expr_Close | Factor PowOp_Primary | Term MulOp_Factor | Expr AddOp_Term | AddOp Term
Term → number | variable | Open Expr_Close | Factor PowOp_Primary | Term MulOp_Factor
Factor → number | variable | Open Expr_Close | Factor PowOp_Primary
Primary → number | variable | Open Expr_Close
AddOp → + | −
MulOp → * | /
PowOp → ^
Open → (
Close → )
AddOp_Term → AddOp Term
MulOp_Factor → MulOp Factor
PowOp_Primary → PowOp Primary
Expr_Close → Expr Close

The N_a introduced in step "TERM" are PowOp, Open, and Close. The A_i introduced in step "BIN" are AddOp_Term, MulOp_Factor, PowOp_Primary, and Expr_Close.

Alternative definition

Chomsky reduced form

Another way^[4]^[10] to define the Chomsky normal form is:

A formal grammar is in Chomsky reduced form if all of its production rules are of the form:

A → BC, or

A → a,

where A, B and C are nonterminal symbols, and a is a terminal symbol. When using this definition, B or C may be the start symbol. Only those context-free grammars which do not generate the empty string can be transformed into Chomsky reduced form.

Floyd normal form

In a letter in which he proposed the term Backus–Naur form (BNF), Donald E. Knuth implied that a BNF "syntax in which all definitions have such a form may be said to be in 'Floyd Normal Form'",

⟨A⟩ ::= ⟨B⟩ | ⟨C⟩, or

⟨A⟩ ::= ⟨B⟩ ⟨C⟩, or

⟨A⟩ ::= a,

where ⟨A⟩, ⟨B⟩ and ⟨C⟩ are nonterminal symbols, and a is a terminal symbol, because Robert W. Floyd had found in 1961 that any BNF syntax can be converted to this form.^[11] Knuth nevertheless withdrew the term, "since doubtless many people have independently used this simple fact in their own work, and the point is only incidental to the main considerations of Floyd's note."^[12] While Floyd's note cites Chomsky's original 1959 article, Knuth's letter does not.

Application

Besides its theoretical significance, CNF conversion is used as a preprocessing step in some algorithms, e.g., the CYK algorithm, a bottom-up parsing algorithm for context-free grammars, and its probabilistic variant.^[13]
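To show why the binary rule shape is convenient, here is a minimal CYK recognizer sketch applied to the Chomsky normal form of the arithmetic grammar from the Example section. The function name `cyk`, the dict-of-tuples grammar encoding, the pre-tokenized input, and the use of an ASCII "-" for the minus sign are assumptions for illustration.

```python
def cyk(rules, start, tokens):
    """Return True if `tokens` is derivable from `start` (grammar in CNF)."""
    n = len(tokens)
    if n == 0:
        return () in rules.get(start, set())
    # table[i][l]: nonterminals deriving the l tokens starting at position i
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        for lhs, alts in rules.items():
            if (tok,) in alts:
                table[i][1].add(lhs)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for lhs, alts in rules.items():
                    for rhs in alts:
                        if len(rhs) == 2 and rhs[0] in left and rhs[1] in right:
                            table[i][length].add(lhs)
    return start in table[0][n]


PRIMARY = {("number",), ("variable",), ("Open", "Expr_Close")}
FACTOR = PRIMARY | {("Factor", "PowOp_Primary")}
TERM = FACTOR | {("Term", "MulOp_Factor")}
EXPR = TERM | {("Expr", "AddOp_Term"), ("AddOp", "Term")}
rules = {
    "S0": EXPR, "Expr": EXPR, "Term": TERM, "Factor": FACTOR,
    "Primary": PRIMARY,
    "AddOp": {("+",), ("-",)}, "MulOp": {("*",), ("/",)},
    "PowOp": {("^",)}, "Open": {("(",)}, "Close": {(")",)},
    "AddOp_Term": {("AddOp", "Term")},
    "MulOp_Factor": {("MulOp", "Factor")},
    "PowOp_Primary": {("PowOp", "Primary")},
    "Expr_Close": {("Expr", "Close")},
}

# a^2+4*b, with variables and numbers already reduced to token kinds
tokens = ["variable", "^", "number", "+", "number", "*", "variable"]
print(cyk(rules, "S0", tokens))             # True
print(cyk(rules, "S0", ["variable", "+"]))  # False
```

Because every right-hand side has at most two symbols, each table cell only needs to consider one split point at a time, which is what keeps the algorithm cubic in the input length.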

References

1. [Journal citation] Here: Sect. 6, p. 152ff.
2. [Web citation]
3. [Book citation]
4. [Book citation]
5. [Book citation] Section 7.1.5, p. 272
6. [Book citation]
7. [Book citation] Section 6.2 "Die Chomsky-Normalform für kontextfreie Grammatiken", p. 149–152
8. [Journal citation]
9. [Book citation]
10. Hopcroft et al. (2006) [page needed]
11. [Journal citation] Here: p. 354
12. [Journal citation]
13. [Book citation]
