Gram–Schmidt process
In mathematics, particularly linear algebra and numerical analysis, the Gram–Schmidt process or Gram–Schmidt algorithm is a way of making two or more vectors perpendicular to each other.
Technically, it is a method of constructing an orthonormal basis from a set of vectors in an inner product space, most commonly the Euclidean space <math>\R^n</math> equipped with the standard inner product. The Gram–Schmidt process takes a finite, linearly independent set of vectors <math>S = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}</math> for <math>k \le n</math> and generates an orthogonal set <math>S' = \{\mathbf{u}_1, \ldots, \mathbf{u}_k\}</math> that spans the same <math>k</math>-dimensional subspace of <math>\R^n</math> as <math>S</math>.
The method is named after Jørgen Pedersen Gram and Erhard Schmidt, but Pierre-Simon Laplace had been familiar with it before Gram and Schmidt.[1] In the theory of Lie group decompositions, it is generalized by the Iwasawa decomposition.
Applying the Gram–Schmidt process to the column vectors of a matrix with full column rank yields its QR decomposition, in which the matrix is factored into an orthogonal and a triangular matrix.
The Gram–Schmidt process
The vector projection of a vector <math>\mathbf v</math> on a nonzero vector <math>\mathbf u</math> is defined as <math display="block">\operatorname{proj}_{\mathbf u} (\mathbf{v}) = \frac{\langle \mathbf{v}, \mathbf{u}\rangle}{\langle \mathbf{u}, \mathbf{u}\rangle} \,\mathbf{u} , </math> where <math>\langle \mathbf{v}, \mathbf{u}\rangle</math> denotes the inner product of the vectors <math>\mathbf u</math> and <math>\mathbf v</math>. This means that <math>\operatorname{proj}_{\mathbf u} (\mathbf{v})</math> is the orthogonal projection of <math>\mathbf v</math> onto the line spanned by <math>\mathbf u</math>. If <math>\mathbf u</math> is the zero vector, then <math>\operatorname{proj}_{\mathbf u} (\mathbf{v})</math> is defined as the zero vector.
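As a small numerical illustration of this definition, the following MATLAB sketch computes a projection for real vectors and checks that the remainder is orthogonal to <math>\mathbf u</math>. The example vectors and the variable names p and r are assumptions made for this sketch only.

```matlab
% Illustrative real vectors (assumed for this sketch), stored as columns.
u = [1; 2; 2];
v = [3; 0; 3];

% proj_u(v) = (<v,u> / <u,u>) * u, with the zero-vector convention.
if any(u)
    p = (dot(v, u) / dot(u, u)) * u;
else
    p = zeros(size(u));   % projection onto the zero vector is defined as zero
end

r = v - p;                % component of v orthogonal to u
disp(p.');                % the projection, here equal to u itself
disp(dot(r, u));          % inner product of the remainder with u
```

Here the inner products are 9 and 9, so the projection is <math>\mathbf u</math> itself and the remainder is exactly orthogonal to it.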
Given <math>k</math> vectors <math>\mathbf{v}_1, \ldots, \mathbf{v}_k</math> the Gram–Schmidt process defines the vectors <math>\mathbf{u}_1, \ldots, \mathbf{u}_k</math> as follows: <math display="block">\begin{align} \mathbf{u}_1 & = \mathbf{v}_1, & \!\mathbf{e}_1 & = \frac{\mathbf{u}_1}{\|\mathbf{u}_1\|} \\ \mathbf{u}_2 & = \mathbf{v}_2-\operatorname{proj}_{\mathbf{u}_1} (\mathbf{v}_2), & \!\mathbf{e}_2 & = \frac{\mathbf{u}_2}{\|\mathbf{u}_2\|} \\ \mathbf{u}_3 & = \mathbf{v}_3-\operatorname{proj}_{\mathbf{u}_1} (\mathbf{v}_3) - \operatorname{proj}_{\mathbf{u}_2} (\mathbf{v}_3), & \!\mathbf{e}_3 & = \frac{\mathbf{u}_3 }{\|\mathbf{u}_3\|} \\ \mathbf{u}_4 & = \mathbf{v}_4-\operatorname{proj}_{\mathbf{u}_1} (\mathbf{v}_4)-\operatorname{proj}_{\mathbf{u}_2} (\mathbf{v}_4)-\operatorname{proj}_{\mathbf{u}_3} (\mathbf{v}_4), & \!\mathbf{e}_4 & = {\mathbf{u}_4 \over \|\mathbf{u}_4\|} \\ & {}\ \ \vdots & & {}\ \ \vdots \\ \mathbf{u}_k & = \mathbf{v}_k - \sum_{j=1}^{k-1}\operatorname{proj}_{\mathbf{u}_j} (\mathbf{v}_k), & \!\mathbf{e}_k & = \frac{\mathbf{u}_k}{\|\mathbf{u}_k\|}. \end{align}</math>
The sequence <math>\mathbf{u}_1, \ldots, \mathbf{u}_k</math> is the required system of orthogonal vectors, and the normalized vectors <math>\mathbf{e}_1, \ldots, \mathbf{e}_k</math> form an orthonormal set. The calculation of the sequence <math>\mathbf{u}_1, \ldots, \mathbf{u}_k</math> is known as Gram–Schmidt orthogonalization, and the calculation of the sequence <math>\mathbf{e}_1, \ldots, \mathbf{e}_k</math> is known as Gram–Schmidt orthonormalization.
To check that these formulas yield an orthogonal sequence, first compute <math>\langle \mathbf{u}_1, \mathbf{u}_2 \rangle</math> by substituting the above formula for <math>\mathbf{u}_2</math>: we get zero. Then use this to compute <math>\langle \mathbf{u}_1, \mathbf{u}_3 \rangle</math>, again by substituting the formula for <math>\mathbf{u}_3</math>: we get zero. The general proof proceeds by mathematical induction.
Geometrically, this method proceeds as follows: to compute <math>\mathbf{u}_i</math>, it projects <math>\mathbf{v}_i</math> orthogonally onto the subspace <math>U</math> generated by <math>\mathbf{u}_1, \ldots, \mathbf{u}_{i-1}</math>, which is the same as the subspace generated by <math>\mathbf{v}_1, \ldots, \mathbf{v}_{i-1}</math>. The vector <math>\mathbf{u}_i</math> is then defined to be the difference between <math>\mathbf{v}_i</math> and this projection, which is guaranteed to be orthogonal to all of the vectors in the subspace <math>U</math>.
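The orthogonalization formulas can be sketched in MATLAB as follows. This is a minimal illustration, assuming real, linearly independent input vectors stored as the columns of a matrix V (the data and variable names are chosen for this sketch); it leaves the outputs unnormalized and checks orthogonality through the off-diagonal entries of their Gram matrix.

```matlab
% Columns of V are the input vectors v_1, ..., v_k (illustrative data).
V = [1 1 0;
     1 0 1;
     0 1 1];
[n, k] = size(V);
U = zeros(n, k);

for i = 1:k
    U(:, i) = V(:, i);
    for j = 1:i-1
        % Subtract proj_{u_j}(v_i) = (<v_i,u_j> / <u_j,u_j>) u_j.
        c = dot(V(:, i), U(:, j)) / dot(U(:, j), U(:, j));
        U(:, i) = U(:, i) - c * U(:, j);
    end
end

G = U' * U;                       % Gram matrix of the outputs
offdiag = G - diag(diag(G));      % should vanish up to rounding
disp(U);
disp(max(abs(offdiag(:))));
```

The division by dot(U(:, j), U(:, j)) is only safe under the stated assumption of linear independence; dependent inputs need the zero-vector handling discussed in the text.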
The Gram–Schmidt process also applies to a linearly independent countably infinite sequence <math>\{\mathbf{v}_i\}_i</math>. The result is an orthogonal (or orthonormal) sequence <math>\{\mathbf{u}_i\}_i</math> such that for every natural number <math>n</math>, the algebraic span of <math>\mathbf{v}_1, \ldots, \mathbf{v}_n</math> is the same as that of <math>\mathbf{u}_1, \ldots, \mathbf{u}_n</math>.
If the Gram–Schmidt process is applied to a linearly dependent sequence, it outputs the zero vector <math>\mathbf{0}</math> on the <math>i</math>th step, assuming that <math>\mathbf{v}_i</math> is a linear combination of <math>\mathbf{v}_1, \ldots, \mathbf{v}_{i-1}</math>. If an orthonormal basis is to be produced, then the algorithm should test for zero vectors in the output and discard them, because no multiple of a zero vector can have a length of 1. The number of vectors output by the algorithm will then be the dimension of the space spanned by the original inputs.
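The zero-vector test can be sketched in MATLAB as follows. In floating-point arithmetic an exactly zero remainder is rare, so this sketch compares the norm against a threshold tol; the threshold value, the example matrix, and the variable names are assumptions of this illustration rather than part of the algorithm's definition.

```matlab
% Third column is the sum of the first two, so the input is dependent.
V = [1 0 1 0;
     0 1 1 0;
     0 0 0 1];
tol = 1e-10;              % assumed threshold for treating a vector as zero
E = zeros(size(V, 1), 0); % orthonormal vectors collected so far

for i = 1:size(V, 2)
    u = V(:, i);
    for j = 1:size(E, 2)
        u = u - dot(u, E(:, j)) * E(:, j);   % remove component along e_j
    end
    if norm(u) > tol
        E(:, end + 1) = u / norm(u);         % keep and normalize
    end                                      % otherwise discard the vector
end

disp(size(E, 2));   % number of vectors kept
disp(rank(V));      % dimension of the span of the inputs
```

For badly scaled inputs a threshold relative to the norm of the original vector would likely be a better choice than the absolute one assumed here.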
A variant of the Gram–Schmidt process using transfinite recursion applied to a (possibly uncountably) infinite sequence of vectors <math>(v_\alpha)_{\alpha<\lambda}</math> yields a set of orthonormal vectors <math>(u_\alpha)_{\alpha<\kappa}</math> with <math>\kappa\leq\lambda</math> such that for any <math>\alpha\leq\lambda</math>, the completion of the span of <math>\{ u_\beta : \beta<\min(\alpha,\kappa) \}</math> is the same as that of <math>\{ v_\beta : \beta < \alpha\}</math>. In particular, when applied to an (algebraic) basis of a Hilbert space (or, more generally, a basis of any dense subspace), it yields a (functional-analytic) orthonormal basis. Note that in the general case the strict inequality <math>\kappa < \lambda</math> often holds, even if the starting set was linearly independent, and the span of <math>(u_\alpha)_{\alpha<\kappa}</math> need not be a subspace of the span of <math>(v_\alpha)_{\alpha<\lambda}</math> (rather, it is a subspace of its completion).
Example
Euclidean space
Consider the following set of vectors in <math>\R^2</math> (with the conventional inner product) <math display="block">S = \left\{\mathbf{v}_1=\begin{bmatrix} 3 \\ 1\end{bmatrix}, \mathbf{v}_2=\begin{bmatrix}2 \\2\end{bmatrix}\right\}.</math>
Now, perform Gram–Schmidt, to obtain an orthogonal set of vectors: <math display="block">\mathbf{u}_1=\mathbf{v}_1=\begin{bmatrix}3\\1\end{bmatrix}</math> <math display="block"> \mathbf{u}_2 = \mathbf{v}_2 - \operatorname{proj}_{\mathbf{u}_1} (\mathbf{v}_2) = \begin{bmatrix}2\\2\end{bmatrix} - \operatorname{proj}_{\left[\begin{smallmatrix}3 \\ 1\end{smallmatrix}\right]} {\begin{bmatrix}2\\2\end{bmatrix}} = \begin{bmatrix}2\\2\end{bmatrix} - \frac{8}{10} \begin{bmatrix} 3 \\1 \end{bmatrix} = \begin{bmatrix} -2/5 \\6/5 \end{bmatrix}. </math>
We check that the vectors <math>\mathbf{u}_1</math> and <math>\mathbf{u}_2</math> are indeed orthogonal: <math display="block">\langle\mathbf{u}_1,\mathbf{u}_2\rangle = \left\langle \begin{bmatrix}3\\1\end{bmatrix}, \begin{bmatrix} -2/5 \\ 6/5 \end{bmatrix} \right\rangle = -\frac{6}{5} + \frac{6}{5} = 0,</math> noting that if the dot product of two vectors is 0 then they are orthogonal.
Since both vectors are non-zero, we can then normalize them by dividing each by its norm, as shown above: <math display="block">\mathbf{e}_1 = \frac{1}{\sqrt {10}}\begin{bmatrix}3\\1\end{bmatrix}</math> <math display="block">\mathbf{e}_2 = \frac{1}{\sqrt{40 \over 25}} \begin{bmatrix}-2/5\\6/5\end{bmatrix} = \frac{1}{\sqrt{10}} \begin{bmatrix}-1\\3\end{bmatrix}. </math>
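The arithmetic of this example can be reproduced numerically. The following MATLAB sketch repeats the same steps in floating point; the variable names mirror the symbols of the example, and results agree with the exact fractions only up to rounding.

```matlab
v1 = [3; 1];
v2 = [2; 2];

u1 = v1;
u2 = v2 - (dot(v2, u1) / dot(u1, u1)) * u1;   % close to [-2/5; 6/5]

e1 = u1 / norm(u1);
e2 = u2 / norm(u2);

disp(u2.');
disp(dot(u1, u2));             % zero up to rounding
disp([e1, e2] * sqrt(10));     % columns close to [3; 1] and [-1; 3]
```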
Properties
Denote by <math> \operatorname{GS}(\mathbf{v}_1, \dots, \mathbf{v}_k) </math> the result of applying the Gram–Schmidt process to a collection of vectors <math> \mathbf{v}_1, \dots, \mathbf{v}_k </math>. This yields a map <math> \operatorname{GS} \colon (\R^n)^{k} \to (\R^n)^{k} </math>.
It has the following properties:
- It is continuous
- It is orientation preserving in the sense that <math> \operatorname{or}(\mathbf{v}_1,\dots,\mathbf{v}_k) = \operatorname{or}(\operatorname{GS}(\mathbf{v}_1,\dots,\mathbf{v}_k)) </math>.
- It commutes with orthogonal maps:
Let <math> g \colon \R^n \to \R^n </math> be orthogonal (with respect to the given inner product). Then we have <math display="block"> \operatorname{GS}(g(\mathbf{v}_1),\dots,g(\mathbf{v}_k)) = \left( g(\operatorname{GS}(\mathbf{v}_1,\dots,\mathbf{v}_k)_1),\dots,g(\operatorname{GS}(\mathbf{v}_1,\dots,\mathbf{v}_k)_k) \right) </math>
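The commutation property can be checked numerically. The MATLAB sketch below is an illustration under assumed data: g is a rotation of the plane by an arbitrary angle, the input vectors are taken from the example above, and Gram–Schmidt orthonormalization is applied both to the original and to the rotated vectors.

```matlab
theta = 0.7;                               % arbitrary rotation angle (assumed)
g = [cos(theta) -sin(theta);
     sin(theta)  cos(theta)];              % orthogonal matrix
V = [3 2; 1 2];                            % columns v_1, v_2

inputs = {V, g * V};
outputs = cell(1, 2);
for t = 1:2
    W = inputs{t};
    E = zeros(size(W));
    for i = 1:size(W, 2)
        u = W(:, i);
        for j = 1:i-1
            u = u - dot(W(:, i), E(:, j)) * E(:, j);   % subtract projection onto e_j
        end
        E(:, i) = u / norm(u);
    end
    outputs{t} = E;
end

% GS(g v_1, g v_2) should equal g applied to GS(v_1, v_2).
disp(norm(outputs{2} - g * outputs{1}));
```

The printed norm should be zero up to rounding error.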
Furthermore, a parametrized version of the Gram–Schmidt process yields a (strong) deformation retraction of the general linear group <math> \mathrm{GL}(\R^n)</math> onto the orthogonal group <math> O(\R^n)</math>.
Numerical stability
When this process is implemented on a computer, the vectors <math>\mathbf{u}_k</math> are often not quite orthogonal, due to rounding errors. For the Gram–Schmidt process as described above (sometimes referred to as "classical Gram–Schmidt") this loss of orthogonality is particularly bad; therefore, it is said that the (classical) Gram–Schmidt process is numerically unstable.
The Gram–Schmidt process can be stabilized by a small modification; this version is sometimes referred to as modified Gram–Schmidt or MGS. This approach gives the same result as the original formula in exact arithmetic and introduces smaller errors in finite-precision arithmetic. Instead of computing the vector <math>\mathbf{u}_k</math> as <math display="block"> \mathbf{u}_k = \mathbf{v}_k - \operatorname{proj}_{\mathbf{u}_1} (\mathbf{v}_k) - \operatorname{proj}_{\mathbf{u}_2} (\mathbf{v}_k) - \cdots - \operatorname{proj}_{\mathbf{u}_{k-1}} (\mathbf{v}_k), </math> it is computed as <math display="block"> \begin{align} \mathbf{u}_k^{(1)} &= \mathbf{v}_k - \operatorname{proj}_{\mathbf{u}_1} (\mathbf{v}_k), \\ \mathbf{u}_k^{(2)} &= \mathbf{u}_k^{(1)} - \operatorname{proj}_{\mathbf{u}_2} \left(\mathbf{u}_k^{(1)}\right), \\ & \;\; \vdots \\ \mathbf{u}_k^{(k-2)} &= \mathbf{u}_k^{(k-3)} - \operatorname{proj}_{\mathbf{u}_{k-2}} \left(\mathbf{u}_k^{(k-3)}\right), \\ \mathbf{u}_k^{(k-1)} &= \mathbf{u}_k^{(k-2)} - \operatorname{proj}_{\mathbf{u}_{k-1}} \left(\mathbf{u}_k^{(k-2)}\right), \\ \mathbf{e}_k &= \frac{\mathbf{u}_k^{(k-1)}}{\left\|\mathbf{u}_k^{(k-1)}\right\|} \end{align} </math>
This method is used in the previous animation, where an intermediate vector is used when orthogonalizing the blue vector.
Here is another description of the modified algorithm. Given the vectors <math>v_1, v_2, \dots, v_n</math>, in our first step we produce vectors <math>v_1, v_2^{(1)}, \dots, v_n^{(1)}</math> by removing components along the direction of <math>v_1</math>. In formulas, <math>v_k^{(1)} := v_k - \frac{\langle v_k, v_1 \rangle}{\langle v_1, v_1 \rangle}v_1</math>. After this step we already have two of our desired orthogonal vectors <math>u_1, \dots, u_n</math>, namely <math>u_1 = v_1, u_2 = v_2^{(1)}</math>, but we also made <math>v_3^{(1)}, \dots, v_n^{(1)}</math> already orthogonal to <math>u_1</math>. Next, we orthogonalize those remaining vectors against <math>u_2 = v_2^{(1)}</math>. This means we compute <math>v_3^{(2)}, v_4^{(2)}, \dots, v_n^{(2)}</math> by subtraction <math>v_k^{(2)} := v_k^{(1)} - \frac{\langle v_k^{(1)}, u_2 \rangle}{\langle u_2, u_2 \rangle} u_2</math>. Now we have stored the vectors <math>v_1, v_2^{(1)}, v_3^{(2)}, v_4^{(2)}, \dots, v_n^{(2)}</math> where the first three vectors are already <math>u_1, u_2, u_3</math> and the remaining vectors are already orthogonal to <math>u_1, u_2</math>. As should be clear now, the next step orthogonalizes <math>v_4^{(2)}, \dots, v_n^{(2)}</math> against <math>u_3 = v_3^{(2)}</math>. Proceeding in this manner we find the full set of orthogonal vectors <math>u_1, \dots, u_n</math>. If orthonormal vectors are desired, then we normalize as we go, so that the denominators in the subtraction formulas turn into ones.
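The difference between the classical and the modified variant can be observed numerically. The MATLAB sketch below is an illustration on an assumed, nearly rank-deficient test matrix (the value of ep and the matrix are choices made for this sketch); it runs both variants column by column and measures the departure of the result from orthonormality.

```matlab
ep = 1e-8;                         % small parameter (assumed test value)
V = [1 1 1; ep 0 0; 0 ep 0; 0 0 ep];
[n, k] = size(V);
Qc = zeros(n, k);                  % classical Gram-Schmidt result
Qm = zeros(n, k);                  % modified Gram-Schmidt result

for i = 1:k
    uc = V(:, i);
    um = V(:, i);
    for j = 1:i-1
        uc = uc - (Qc(:, j)' * V(:, i)) * Qc(:, j);   % classical: project the original v_i
        um = um - (Qm(:, j)' * um) * Qm(:, j);        % modified: project the running remainder
    end
    Qc(:, i) = uc / norm(uc);
    Qm(:, i) = um / norm(um);
end

errClassical = norm(Qc' * Qc - eye(k));
errModified  = norm(Qm' * Qm - eye(k));
fprintf('classical: %.2e, modified: %.2e\n', errClassical, errModified);
```

In IEEE double precision the classical variant is expected to lose orthogonality almost completely on this input, while the error of the modified variant should be many orders of magnitude smaller, although still well above machine precision.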
Algorithm
The following MATLAB algorithm implements classical Gram–Schmidt orthonormalization. The vectors <math>\mathbf{v}_1, \ldots, \mathbf{v}_k</math> (columns of matrix V, so that V(:,j) is the <math>j</math>th vector) are replaced by orthonormal vectors (columns of U) which span the same subspace.
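function U = gramschmidt(V)
    % Classical Gram-Schmidt: orthonormalize the columns of V.
    [n, k] = size(V);
    U = zeros(n,k);
    U(:,1) = V(:,1) / norm(V(:,1));
    for i = 2:k
        U(:,i) = V(:,i);
        for j = 1:i-1
            % Classical variant: each coefficient uses the original V(:,i).
            % Using the running U(:,i) here instead gives modified Gram-Schmidt.
            U(:,i) = U(:,i) - (U(:,j)'*V(:,i)) * U(:,j);
        end
        U(:,i) = U(:,i) / norm(U(:,i));
    end
end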
The cost of this algorithm is asymptotically <math>O(nk^2)</math> floating point operations, where <math>n</math> is the dimensionality of the vectors and <math>k</math> is the number of vectors.
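The connection to the QR decomposition can be made explicit by recording the projection coefficients while orthonormalizing. The following MATLAB sketch is an illustration using a classical Gram–Schmidt loop on assumed example data; the names Q and R are chosen here to match the usual notation for the two factors.

```matlab
V = [3 2; 1 2];            % full column rank (illustrative data)
[n, k] = size(V);
Q = zeros(n, k);
R = zeros(k, k);

for i = 1:k
    u = V(:, i);
    for j = 1:i-1
        R(j, i) = Q(:, j)' * V(:, i);   % coefficient of v_i along e_j
        u = u - R(j, i) * Q(:, j);
    end
    R(i, i) = norm(u);                  % length of the orthogonal remainder
    Q(:, i) = u / R(i, i);
end

disp(norm(Q * R - V));       % reconstruction error, near zero
disp(norm(tril(R, -1)));     % entries below the diagonal of R are zero
```

Each column of V is a combination of the first few columns of Q only, which is why R comes out upper triangular.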
Via Gaussian elimination
If the rows <math>\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}</math> are written as a matrix <math>A</math>, then applying Gaussian elimination to the augmented matrix <math>\left[A A^\mathsf{T} | A \right]</math> will produce the orthogonalized vectors in place of <math>A</math>. However, the matrix <math>A A^\mathsf{T}</math> must be brought to row echelon form, using only the row operation of adding a scalar multiple of one row to another.[2] For example, taking <math>\mathbf{v}_1 = \begin{bmatrix} 3 & 1\end{bmatrix}, \mathbf{v}_2=\begin{bmatrix}2 & 2\end{bmatrix}</math> as above, we have <math display="block">\left[A A^\mathsf{T} | A \right] = \left[\begin{array}{rr|rr} 10 & 8 & 3 & 1 \\ 8 & 8 & 2 & 2\end{array}\right]</math>
Reducing this to row echelon form, and additionally scaling each row so that its leading entry is 1 (which rescales the resulting vectors without changing their directions), produces <math display="block">\left[\begin{array}{rr|rr} 1 & .8 & .3 & .1 \\ 0 & 1 & -.25 & .75\end{array}\right]</math>
The normalized vectors are then <math display="block">\mathbf{e}_1 = \frac{1}{\sqrt {.3^2+.1^2}}\begin{bmatrix}.3 & .1\end{bmatrix} = \frac{1}{\sqrt{10}} \begin{bmatrix}3 & 1\end{bmatrix}</math> <math display="block">\mathbf{e}_2 = \frac{1}{\sqrt{.25^2+.75^2}} \begin{bmatrix}-.25 & .75\end{bmatrix} = \frac{1}{\sqrt{10}} \begin{bmatrix}-1 & 3\end{bmatrix}, </math> as in the example above.
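The elimination can be sketched in MATLAB as follows. This is an illustration that uses only row-addition steps and no row scaling, so the right-hand block ends up holding the unnormalized orthogonal vectors rather than rescaled ones. It assumes linearly independent rows, so that no pivot is zero; the variable names are chosen for this sketch.

```matlab
A = [3 1; 2 2];              % rows are v_1, v_2 (from the example)
k = size(A, 1);
M = [A * A', A];             % augmented matrix [A A^T | A]

for c = 1:k-1
    for r = c+1:k
        factor = M(r, c) / M(c, c);
        M(r, :) = M(r, :) - factor * M(c, :);   % only row-addition steps
    end
end

U = M(:, k+1:end);           % rows are the orthogonalized vectors
disp(U);                     % rows close to [3 1] and [-0.4 1.2]
disp(U(1, :) * U(2, :)');    % near zero
```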
Determinant formula
The result of the Gram–Schmidt process may be expressed in a non-recursive formula using determinants.
<math display="block"> \mathbf{e}_j = \frac{1}{\sqrt{D_{j-1} D_j}} \begin{vmatrix} \langle \mathbf{v}_1, \mathbf{v}_1 \rangle & \langle \mathbf{v}_2, \mathbf{v}_1 \rangle & \cdots & \langle \mathbf{v}_j, \mathbf{v}_1 \rangle \\ \langle \mathbf{v}_1, \mathbf{v}_2 \rangle & \langle \mathbf{v}_2, \mathbf{v}_2 \rangle & \cdots & \langle \mathbf{v}_j, \mathbf{v}_2 \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle \mathbf{v}_1, \mathbf{v}_{j-1} \rangle & \langle \mathbf{v}_2, \mathbf{v}_{j-1} \rangle & \cdots & \langle \mathbf{v}_j, \mathbf{v}_{j-1} \rangle \\ \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_j \end{vmatrix} </math>
<math display="block"> \mathbf{u}_j = \frac{1}{D_{j-1} } \begin{vmatrix} \langle \mathbf{v}_1, \mathbf{v}_1 \rangle & \langle \mathbf{v}_2, \mathbf{v}_1 \rangle & \cdots & \langle \mathbf{v}_j, \mathbf{v}_1 \rangle \\ \langle \mathbf{v}_1, \mathbf{v}_2 \rangle & \langle \mathbf{v}_2, \mathbf{v}_2 \rangle & \cdots & \langle \mathbf{v}_j, \mathbf{v}_2 \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle \mathbf{v}_1, \mathbf{v}_{j-1} \rangle & \langle \mathbf{v}_2, \mathbf{v}_{j-1} \rangle & \cdots & \langle \mathbf{v}_j, \mathbf{v}_{j-1} \rangle \\ \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_j \end{vmatrix} </math>
where <math>D_0 = 1</math> and, for <math>j \ge 1</math>, <math>D_j</math> is the Gram determinant
<math display="block"> D_j = \begin{vmatrix} \langle \mathbf{v}_1, \mathbf{v}_1 \rangle & \langle \mathbf{v}_2, \mathbf{v}_1 \rangle & \cdots & \langle \mathbf{v}_j, \mathbf{v}_1 \rangle \\ \langle \mathbf{v}_1, \mathbf{v}_2 \rangle & \langle \mathbf{v}_2, \mathbf{v}_2 \rangle & \cdots & \langle \mathbf{v}_j, \mathbf{v}_2 \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle \mathbf{v}_1, \mathbf{v}_j \rangle & \langle \mathbf{v}_2, \mathbf{v}_j \rangle & \cdots & \langle \mathbf{v}_j, \mathbf{v}_j \rangle \end{vmatrix}. </math>
Note that the expression for <math>\mathbf{u}_j</math> is a "formal" determinant, i.e. the matrix contains both scalars and vectors; the meaning of this expression is defined to be the result of a cofactor expansion along the row of vectors.
The determinant formula for the Gram–Schmidt process is computationally (exponentially) slower than the recursive algorithms described above; it is mainly of theoretical interest.
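The cofactor expansion along the row of vectors can be written out directly. The MATLAB sketch below is an illustration for real vectors, using assumed example data and the built-in det function for the scalar minors; it evaluates the unnormalized formula for <math>\mathbf{u}_j</math>.

```matlab
V = [1 1 0;
     1 0 1;
     0 1 1];                 % columns v_1, v_2, v_3 (illustrative data)
G = V' * V;                  % G(r, c) = <v_r, v_c> for real vectors
k = size(V, 2);
U = zeros(size(V));

U(:, 1) = V(:, 1);           % j = 1: the determinant is just v_1, and D_0 = 1
for j = 2:k
    Dprev = det(G(1:j-1, 1:j-1));          % Gram determinant D_{j-1}
    for c = 1:j
        cols = [1:c-1, c+1:j];             % delete column c
        minorDet = det(G(1:j-1, cols));    % minor of the entry v_c in the last row
        U(:, j) = U(:, j) + (-1)^(j + c) * minorDet * V(:, c);
    end
    U(:, j) = U(:, j) / Dprev;
end

disp(U);
```

For this input the columns should agree, up to rounding, with the vectors produced by the recursive formulas, namely (1, 1, 0), (1/2, −1/2, 1) and (−2/3, 2/3, 2/3).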
Expressed using geometric algebra
Expressed using notation used in geometric algebra, the unnormalized results of the Gram–Schmidt process can be expressed as <math display="block">\mathbf{u}_k = \mathbf{v}_k - \sum_{j=1}^{k-1} (\mathbf{v}_k \cdot \mathbf{u}_j)\mathbf{u}_j^{-1}\ ,</math> which is equivalent to the expression using the <math>\operatorname{proj}</math> operator defined above. The results can equivalently be expressed as[3] <math display="block">\mathbf{u}_k = \mathbf{v}_{k}\wedge\mathbf{v}_{k-1}\wedge\cdots\wedge\mathbf{v}_{1}(\mathbf{v}_{k-1}\wedge\cdots\wedge\mathbf{v}_{1})^{-1},</math> which is closely related to the expression using determinants above.
Alternatives
Other orthogonalization algorithms use Householder transformations or Givens rotations. The algorithms using Householder transformations are more stable than the stabilized Gram–Schmidt process. On the other hand, the Gram–Schmidt process produces the <math>j</math>th orthogonalized vector after the <math>j</math>th iteration, while orthogonalization using Householder reflections produces all the vectors only at the end. This makes only the Gram–Schmidt process applicable for iterative methods like the Arnoldi iteration.
Yet another alternative is motivated by the use of Cholesky decomposition for inverting the matrix of the normal equations in linear least squares. Let <math>V</math> be a full column rank matrix, whose columns need to be orthogonalized. The matrix <math>V^* V </math> is Hermitian and positive definite, so it can be written as <math> V^* V = L L^*, </math> using the Cholesky decomposition. The lower triangular matrix <math>L </math> with strictly positive diagonal entries is invertible. Then the columns of the matrix <math>U = V\left(L^{-1}\right)^*</math> are orthonormal and span the same subspace as the columns of the original matrix <math>V</math>. The explicit use of the product <math>V^* V </math> makes the algorithm unstable, especially if the product's condition number is large. Nevertheless, this algorithm is used in practice and implemented in some software packages because of its high efficiency and simplicity.
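The Cholesky-based approach is short enough to sketch in a few lines of MATLAB. The example below is an illustration on an assumed, well-conditioned real matrix; it uses the built-in chol function with the 'lower' option and a triangular solve instead of forming the inverse of <math>L</math> explicitly.

```matlab
V = [1 1 0;
     1 0 1;
     0 1 1;
     1 1 1];                      % full column rank (illustrative data)

L = chol(V' * V, 'lower');        % V'*V = L*L', with L lower triangular
U = V / L';                       % U = V * inv(L'), without forming the inverse

disp(norm(U' * U - eye(size(V, 2))));   % near zero: columns are orthonormal
disp(rank([V, U]) == rank(V));          % true: same column space
```

For ill-conditioned inputs chol may fail or the result may be far from orthonormal, which reflects the instability mentioned above.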
In quantum mechanics there are several orthogonalization schemes with characteristics better suited for certain applications than original Gram–Schmidt. Nevertheless, it remains a popular and effective algorithm for even the largest electronic structure calculations.[4]
Run-time complexity
Gram–Schmidt orthogonalization can be done in strongly polynomial time. The run-time analysis is similar to that of Gaussian elimination.[5]
External links
- Harvey Mudd College Math Tutorial on the Gram-Schmidt algorithm
- Earliest known uses of some of the words of mathematics: G The entry "Gram-Schmidt orthogonalization" has some information and references on the origins of the method.
- Demos: Gram Schmidt process in plane and Gram Schmidt process in space
- Gram-Schmidt orthogonalization applet
- NAG Gram–Schmidt orthogonalization of n vectors of order m routine
- Proof: Raymond Puzio, Keenan Kidwell. "proof of Gram-Schmidt orthogonalization algorithm" (version 8). PlanetMath.org.