Английская Википедия:Interleave lower bound
In the theory of optimal binary search trees, the interleave lower bound is a lower bound on the number of operations required by a Binary Search Tree (BST) to execute a given sequence of accesses.
Several variants of this lower bound have been proven.[1][2][3] This article is based on a variation of the first Wilber's bound.[4] This lower bound is used in the design and analysis of Tango tree.[4] Furthermore, this lower bound can be rephrased and proven geometrically, Geometry of binary search trees.[5]
Definition
The bound is based on a fixed perfect BST <math> P </math>, called the lower bound tree, over the keys <math> \{1, 2,..., n \}</math>. For example, for <math> n = 7 </math>, <math> P </math> can be represented by the following parenthesis structure:
- [([1] 2 [3]) 4 ([5] 6 [7])]
For each node <math> y </math> in <math> P </math>, define:
- <math> Left(y) </math> to be the set of nodes in the left sub-tree of <math> y </math>, including <math> y </math>.
- <math> Right(y) </math> to be the set of nodes in the right sub-tree of <math> y </math>.
Consider the following access sequence: <math> X = x_1, x_2, ..., x_m </math>. For a fixed node <math> y </math>, and for each access <math> x_i </math>, define the label of <math> x_i </math> with respect to <math> y </math> as:
- "L" - if <math> x_i </math> is in <math> Left(y) </math>.
- "R" - if <math> x_i </math> is in <math> Right(y) </math>;
- Null - otherwise.
The label of <math> y </math> is the concatenation of the labels from all the accesses. For example, if the sequence of accesses is: <math> 7,6,3 </math> then the label of the root <math> (4) </math> is: "RRL", the label of 6 is: "RL", and the label of 2 is: "L".
For every node <math> y </math>, define the amount of interleaving through y as the number of alternations between L and R in the label of <math> y </math>. In the above example, the interleaving through <math> 4 </math> and <math> 6 </math> is <math> 1 </math> and the interleaving through all other nodes is <math> 0 </math>.
The interleave bound, <math>\mathit{IB}(X)</math>, is the sum of the interleaving through all the nodes of the tree. The interleave bound of the above sequence is <math> 2 </math>.
The Lower Bound Statement and its Proof
The interleave bound is summarized by the following theorem.
The following proof is based on.[4]
Proof
Let <math>X = x_1,x_2,...,x_m</math> be an access sequence. Denote by <math>T_i</math> the state of an arbitrary BST at time <math>i</math> i.e. after executing the sequence <math>x_1,x_2,...,x_i </math>. We also fix a lower bound BST <math>P</math>.
For a node <math> y </math> in <math> P </math>, define the transition point for <math> y </math> at time <math> i </math> to be the minimum-depth node <math> z </math> in the BST <math>T_i </math> such that the path from the root of <math>T_i</math> to <math> z </math> includes both a node from Left(y) and a node from Right(y). Intuitively, any BST algorithm on <math>T_i</math> that accesses an element from Right(y) and then an element from Left(y) (or vice versa) must touch the transition point of <math>y</math> at least once. In the following Lemma, we will show that transition point is well-defined.
The second lemma that we need to prove states that the transition point is stable. It will not change until it is touched. Шаблон:Math theorem Шаблон:Math proof
The last Lemma toward the proof states that every node <math> y \in P </math> has its unique transition point.
Now, we are ready to prove the theorem. First of all, observe that the number of touched transition points by the offline BST algorithm is a lower bound on its cost, we are counting less nodes than the required for the total cost.
We know by Lemma 3 that at any time <math> i </math>, any node <math> y </math> in <math> T_i </math> can be only a transition for at most one node in <math> P </math>. Thus, It is enough to count the number of touches of a transition node of <math> y </math>, the sum over all <math> y </math>.
Therefore, for a fixed node <math> y \in P </math>, let <math> \ell </math> and <math> r </math> to be defined as in Lemma 1. The transition point of <math> y </math> is among these two nodes. In fact, it is the deeper one. Let <math> x_{i_{1}}, x_{i_{2}}, ..., x_{i_p} </math> be a maximal ordered access sequence to nodes that alternate between <math> Left(y) </math> and <math> Right(y) </math>. Then <math> p </math> is the amount of interleaving through the node <math> y </math>. Suppose that the even indexed accesses are in the <math> Left(y) </math>, and the odd ones are in <math> Right(y) </math> i.e. <math> x_{i_{2j}} \in Left(y) </math> and <math> x_{i_{2j - 1}} \in Right(y) </math>. We know by the properties of lowest common ancestor that an access to a node in <math> Left(y) </math>, it must touch <math> \ell </math>. Similarly, an access to a node in <math> Right(y) </math> must touch <math> r </math>. Consider every <math> j \in [1, \lfloor p/2 \rfloor] </math>. For two consecutive accesses <math> x_{i_{2j - 1}} </math> and <math> x_{i_{2j}} </math>, if they avoid touching the access point of <math> y </math>, then <math> \ell </math> and <math> r </math> must change in between. However, by Lemma 2, such change requires touching the transition point. Consequently, the BST access algorithm touches the transition point of <math> y </math> at least once in the interval of <math> [i_{2j - 1}, i_{2j}] </math>. Summing over all <math> j \in [1, \lfloor p/2 \rfloor ] </math>, the best algorithm touches the transition point of <math> y </math> at least <math> \lfloor p/2 \rfloor \geq p/2 - 1</math>. Summing over all <math> y </math>,
<math> \sum_{y \in P} p_y/2 - 1 \geq IB(X)/2 - n </math>
where <math> p_y </math> is the amount of interleave through <math> y </math>. By definition, the <math> p_y</math>'s add up to <math> IB(X) </math>. That concludes the proof.
See also
References