Английская Википедия:E-graph

Материал из Онлайн справочника
Перейти к навигацииПерейти к поиску

Шаблон:Short description

In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.

Definition and operations

Let <math>\Sigma</math> be a set of uninterpreted functions, where <math>\Sigma_n</math> is the subset of <math>\Sigma</math> consisting of functions of arity <math>n</math>. Let <math>\mathbb{id}</math> be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of <math>f\in\Sigma_n</math> to e-class IDs <math>i_1, i_2, \ldots, i_n\in\mathbb{id}</math> is denoted <math>f(i_1, i_2, \ldots, i_n)</math> and called an e-node.

The e-graph then represents equivalence classes of e-nodes, using the following data structures:[1]

  • A union-find structure <math>U</math> representing equivalence classes of e-class IDs, with the usual operations <math>\mathrm{find}</math>, <math>\mathrm{add}</math> and <math>\mathrm{merge}</math>. An e-class ID <math>e</math> is canonical if <math>\mathrm{find}(U, e) = e</math>; an e-node <math>f(i_1,\ldots,i_n)</math> is canonical if each <math>i_j</math> is canonical (<math>j</math> in <math>1,\ldots,n</math>).
  • An association of e-class IDs with sets of e-nodes, called e-classes. This consists of
    • a hashcons <math>H</math> (i.e. a mapping) from canonical e-nodes to e-class IDs, and
    • an e-class map <math>M</math> that maps e-class IDs to e-classes, such that <math>M</math> maps equivalent IDs to the same set of e-nodes: <math>\forall i, j\in\mathbb{id},M[i]=M[j]\Leftrightarrow \mathrm{find}(U,i)=\mathrm{find}(U,j)</math>

Invariants

In addition to the above structure, a valid e-graph conforms to several data structure invariants.[2] Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes <math>f(i_1,\ldots,i_n),f(j_1,\ldots,j_n)</math> are congruent when <math>\mathrm{find}(U, i_k)=\mathrm{find}(U, j_k),k\in \{1,\ldots,n\}</math>. The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.

Operations

Шаблон:Expand section

E-graphs expose wrappers around the <math>\mathrm{add}</math>, <math>\mathrm{find}</math>, and <math>\mathrm{merge}</math> operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.

E-matching

Let <math>V</math> be a set of variables and let <math>\mathrm{Term}(\Sigma, V)</math> be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words, <math>\mathrm{Term}(\Sigma, V)</math> is the smallest set such that <math>V\subset\mathrm{Term}(V, \Sigma)</math>, <math>\Sigma_0\subset\mathrm{Term}(\Sigma, V)</math>, and when <math>x_1, \ldots, x_n\in \mathrm{Term}(\Sigma, V)</math> and <math>f\in\Sigma_n</math>, then <math>f(x_1,\ldots,x_n)\in\mathrm{Term}(\Sigma, V)</math>. A term containing variables is called a pattern, a term without variables is called ground.

An e-graph <math>E</math> represents a ground term <math>t\in\mathrm{Term}(\Sigma, \emptyset)</math> if one of its e-classes represents <math>t</math>. An e-class <math>C</math> represents <math>t</math> if some e-node <math>f(i_1,\ldots,i_n)\in C</math> does. An e-node <math>f(i_1,\ldots,i_n)\in C</math> represents a term <math>g(j_1,\ldots,j_n)</math> if <math>f=g</math> and each e-class <math>M[i_k]</math> represents the term <math>j_k</math> (<math>k</math> in <math>1,\ldots,n</math>).

e-matching is an operation that takes a pattern <math>p\in\mathrm{Term}(\Sigma, V)</math> and an e-graph <math>E</math>, and yields all pairs <math>(\sigma, C)</math> where <math>\sigma\subset V\times\mathbb{id}</math> is a substitution mapping the variables in <math>p</math> to e-class IDs and <math>C\in\mathbb{id}</math> is an e-class ID such that each term <math>\sigma(p)</math> is represented by <math>C</math>. There are several known algorithms for e-matching,[3][4] the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal.[5]

Complexity

  • An e-graph with n equalities can be constructed in O(n log n) time.[6]

Equality saturation

Шаблон:Expand section

Equality saturation is a technique for building optimizing compilers using e-graphs.[7] It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.

Applications

E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z3[8] and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers.[9] In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates.[10] E-graphs are also used in the Simplify theorem prover of ESC/Java.[11]

Equality saturation is used in specialized optimizing compilers,[12] e.g. for deep learning[13] and linear algebra.[14] Equality saturation has also been used for translation validation applied to the LLVM toolchain.[15]

E-graphs have been applied to several problems in program analysis, including fuzzing,[16] abstract interpretation,[17][18] and library learning.[19]

References

Шаблон:Reflist

External links

Шаблон:Program analysis