Английская Википедия:Color-coding
Шаблон:About In computer science and graph theory, the term color-coding refers to an algorithmic technique which is useful in the discovery of network motifs. For example, it can be used to detect a simple path of length Шаблон:Mvar in a given graph. The traditional color-coding algorithm is probabilistic, but it can be derandomized without much overhead in the running time.
Color-coding also applies to the detection of cycles of a given length, and more generally it applies to the subgraph isomorphism problem (an NP-complete problem), where it yields polynomial time algorithms when the subgraph pattern that it is trying to detect has bounded treewidth.
The color-coding method was proposed and analyzed in 1994 by Noga Alon, Raphael Yuster, and Uri Zwick.[1][2]
Results
The following results can be obtained through the method of color-coding:
- For every fixed constant Шаблон:Mvar, if a graph Шаблон:Math contains a simple cycle of size Шаблон:Mvar, then such a cycle can be found in:
- <math>O(|V|^\omega)</math> expected time, or
- <math>O(|V|^\omega \log |V|)</math> worst-case time, where Шаблон:Mvar is the exponent of matrix multiplication.[3]
- For every fixed constant Шаблон:Mvar, and every graph Шаблон:Math that is in any nontrivial minor-closed graph family (e.g., a planar graph), if Шаблон:Mvar contains a simple cycle of size Шаблон:Mvar, then such cycle can be found in:
- Шаблон:Math expected time, or
- Шаблон:Math worst-case time.
- If a graph Шаблон:Math contains a subgraph isomorphic to a bounded treewidth graph which has Шаблон:Math vertices, then such a subgraph can be found in polynomial time.
The method
To solve the problem of finding a subgraph <math>H = (V_H, E_H)</math> in a given graph Шаблон:Math, where Шаблон:Mvar can be a path, a cycle, or any bounded treewidth graph where <math>|V_H| = O(\log |V|)</math>, the method of color-coding begins by randomly coloring each vertex of Шаблон:Mvar with <math>k = |V_H|</math> colors, and then tries to find a colorful copy of Шаблон:Mvar in colored Шаблон:Mvar. Here, a graph is colorful if every vertex in it is colored with a distinct color. This method works by repeating (1) random coloring a graph and (2) finding colorful copy of the target subgraph, and eventually the target subgraph can be found if the process is repeated a sufficient number of times.
Suppose a copy of Шаблон:Mvar in Шаблон:Mvar becomes colorful with some non-zero probability Шаблон:Mvar. It immediately follows that if the random coloring is repeated Шаблон:Math times, then this copy is expected to become colorful once. Note that though Шаблон:Mvar is small, it is shown that if <math>|V_H| = O(\log |V|)</math>, Шаблон:Mvar is only polynomially small. Suppose again there exists an algorithm such that, given a graph Шаблон:Mvar and a coloring which maps each vertex of Шаблон:Mvar to one of the Шаблон:Mvar colors, it finds a copy of colorful Шаблон:Mvar, if one exists, within some runtime Шаблон:Math. Then the expected time to find a copy of Шаблон:Mvar in Шаблон:Mvar, if one exists, is <math>O(\tfrac{r}{p})</math>.
Sometimes it is also desirable to use a more restricted version of colorfulness. For example, in the context of finding cycles in planar graphs, it is possible to develop an algorithm that finds well-colored cycles. Here, a cycle is well-colored if its vertices are colored by consecutive colors.
Example
An example would be finding a simple cycle of length Шаблон:Mvar in graph Шаблон:Math.
By applying random coloring method, each simple cycle has a probability of <math>k!/k^k > e^{-k}</math> to become colorful, since there are <math>k^k</math> ways of coloring the Шаблон:Mvar vertices on the cycle, among which there are <math>k!</math> colorful occurrences. Then an algorithm (described next) can be used to find colorful cycles in the randomly colored graph Шаблон:Mvar in time <math>O(V^\omega)</math>, where <math>\omega</math> is the matrix multiplication constant. Therefore, it takes <math>e^k\cdot O(V^\omega)</math> overall time to find a simple cycle of length Шаблон:Mvar in Шаблон:Mvar.
The colorful cycle-finding algorithm works by first finding all pairs of vertices in Шаблон:Mvar that are connected by a simple path of length Шаблон:Math, and then checking whether the two vertices in each pair are connected. Given a coloring function Шаблон:Math to color graph Шаблон:Mvar, enumerate all partitions of the color set Шаблон:Math into two subsets Шаблон:Math of size <math>k/2</math> each. Note that Шаблон:Mvar can be divided into Шаблон:Math and Шаблон:Math accordingly, and let Шаблон:Math and Шаблон:Math denote the subgraphs induced by Шаблон:Math and Шаблон:Math respectively. Then, recursively find colorful paths of length <math>k/2 - 1</math> in each of Шаблон:Math and Шаблон:Math. Suppose the boolean matrix Шаблон:Math and Шаблон:Math represent the connectivity of each pair of vertices in Шаблон:Math and Шаблон:Math by a colorful path, respectively, and let Шаблон:Mvar be the matrix describing the adjacency relations between vertices of Шаблон:Math and those of Шаблон:Math, the boolean product <math>A_1BA_2</math> gives all pairs of vertices in Шаблон:Mvar that are connected by a colorful path of length Шаблон:Math. Thus, the recursive relation of matrix multiplications is <math>t(k) \le 2^k\cdot t(k/2)</math>, which yields a runtime of <math>2^{O(k)}\cdot V^\omega</math>. Although this algorithm finds only the end points of the colorful path, another algorithm by Alon and Naor[4] that finds colorful paths themselves can be incorporated into it.
Derandomization
The derandomization of color-coding involves enumerating possible colorings of a graph Шаблон:Mvar, such that the randomness of coloring Шаблон:Mvar is no longer required. For the target subgraph Шаблон:Mvar in Шаблон:Mvar to be discoverable, the enumeration has to include at least one instance where the Шаблон:Mvar is colorful. To achieve this, enumerating a Шаблон:Mvar-perfect family Шаблон:Mvar of hash functions from Шаблон:Math to Шаблон:Math is sufficient. By definition, Шаблон:Mvar is Шаблон:Mvar-perfect if for every subset Шаблон:Mvar of Шаблон:Math where <math>|S| = k</math>, there exists a hash function Шаблон:Mvar in Шаблон:Mvar such that Шаблон:Math is perfect. In other words, there must exist a hash function in Шаблон:Mvar that colors any given Шаблон:Mvar vertices with Шаблон:Mvar distinct colors.
There are several approaches to construct such a Шаблон:Mvar-perfect hash family:
- The best explicit construction is by Moni Naor, Leonard J. Schulman, and Aravind Srinivasan,[5] where a family of size <math>e^k k^{O(\log k)} \log |V|</math> can be obtained. This construction does not require the target subgraph to exist in the original subgraph finding problem.
- Another explicit construction by Jeanette P. Schmidt and Alan Siegel[6] yields a family of size <math>2^{O(k)}\log^2 |V|</math>.
- Another construction that appears in the original paper of Noga Alon et al.[2] can be obtained by first building a Шаблон:Mvar-perfect family that maps Шаблон:Math to Шаблон:Math followed by building another Шаблон:Mvar-perfect family that maps Шаблон:Math to Шаблон:Math In the first step, it is possible to construct such a family with Шаблон:Math random bits that are almost Шаблон:Math-wise independent,[7][8] and the sample space needed for generating those random bits can be as small as <math>k^{O(1)}\log |V|</math>. In the second step, it has been shown by Jeanette P. Schmidt and Alan Siegel[6] that the size of such Шаблон:Mvar-perfect family can be <math>2^{O(k)}</math>. Consequently, by composing the Шаблон:Mvar-perfect families from both steps, a Шаблон:Mvar-perfect family of size <math>2^{O(k)}\log |V|</math> that maps from Шаблон:Math to Шаблон:Math can be obtained.
In the case of derandomizing well-coloring, where each vertex on the subgraph is colored consecutively, a Шаблон:Mvar-perfect family of hash functions from Шаблон:Math to Шаблон:Math is needed. A sufficient Шаблон:Mvar-perfect family which maps from Шаблон:Math to Шаблон:Math can be constructed in a way similar to the approach 3 above (the first step). In particular, it is done by using Шаблон:Math random bits that are almost Шаблон:Math independent, and the size of the resulting Шаблон:Mvar-perfect family will be <math>k^{O(k)}\log |V|</math>.
The derandomization of color-coding method can be easily parallelized, yielding efficient NC algorithms.
Applications
Recently, color-coding has attracted much attention in the field of bioinformatics. One example is the detection of signaling pathways in protein-protein interaction (PPI) networks. Another example is to discover and to count the number of motifs in PPI networks. Studying both signaling pathways and motifs allows a deeper understanding of the similarities and differences of many biological functions, processes, and structures among organisms.
Due to the huge amount of gene data that can be collected, searching for pathways or motifs can be highly time consuming. However, by exploiting the color-coding method, the motifs or signaling pathways with <math>k=O(\log n)</math> vertices in a network Шаблон:Mvar with Шаблон:Mvar vertices can be found very efficiently in polynomial time. Thus, this enables us to explore more complex or larger structures in PPI networks.
Further reading
References
- ↑ Alon, N., Yuster, R., and Zwick, U. 1994. Color-coding: a new method for finding simple paths, cycles and other small subgraphs within large graphs. In Proceedings of the Twenty-Sixth Annual ACM Symposium on theory of Computing (Montreal, Quebec, Canada, May 23–25, 1994). STOC '94. ACM, New York, NY, 326–335. DOI= http://doi.acm.org/10.1145/195058.195179
- ↑ 2,0 2,1 Alon, N., Yuster, R., and Zwick, U. 1995. Color-coding. J. ACM 42, 4 (Jul. 1995), 844–856. DOI= http://doi.acm.org/10.1145/210332.210337
- ↑ Coppersmith–Winograd Algorithm
- ↑ Alon, N. and Naor, M. 1994 Derandomization, Witnesses for Boolean Matrix Multiplication and Construction of Perfect Hash Functions. Technical Report. UMI Order Number: CS94-11., Weizmann Science Press of Israel.
- ↑ Naor, M., Schulman, L. J., and Srinivasan, A. 1995. Splitters and near-optimal derandomization. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science (October 23–25, 1995). FOCS. IEEE Computer Society, Washington, DC, 182.
- ↑ 6,0 6,1 Шаблон:Cite journal
- ↑ Naor, J. and Naor, M. 1990. Small-bias probability spaces: efficient constructions and applications. In Proceedings of the Twenty-Second Annual ACM Symposium on theory of Computing (Baltimore, Maryland, United States, May 13–17, 1990). H. Ortiz, Ed. STOC '90. ACM, New York, NY, 213-223. DOI= http://doi.acm.org/10.1145/100216.100244
- ↑ Alon, N., Goldreich, O., Hastad, J., and Peralta, R. 1990. Simple construction of almost k-wise independent random variables. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science (October 22–24, 1990). SFCS. IEEE Computer Society, Washington, DC, 544-553 vol.2. Шаблон:Doi