Motivation

The problem:
A sequence of linear systems \[ A_i x_i = b_i\quad\text{for}\quad i=1,2,... \] with \(A_{i+1}\approx A_i\) and \(b_{i+1}\approx b_i\).
A solution:

Recycling via deflation.

Apply Krylov subspace method to \[ P_i A_i x_i = P_i b_i, \] where \(P_i\) is

  • a projection that removes "unwanted" portions of \(A_i\)
  • constructed with data from the solution process of \(A_{i-1}x_{i-1}=b_{i-1}\)
See, e.g., Kilmer and de Sturler (2006).

Scope of this talk

Open question:

automatic selection of deflation vectors

Required:

perturbation theory for

  • spectra of deflated matrices
  • Krylov subspace methods
Theory leads to:
  • new convergence bounds for deflated CG, MINRES and GMRES
  • guidance for the automatic selection of deflation vectors for recycling

Assumptions and definitions

  • \(Ax=b\) with nonsingular \(A\in\C^{N,N}\)
  • \(U\in\C^{N,m}\) such that \(U^*AU\) is nonsingular (equivalent: \(A\vsU\oplus\vsU^\perp=\C^N\) for \(\vsU:=\vsR(U)\))
  • Projection \(Q:=P_{\vsU^\perp,A\vsU}=I-AU(U^*AU)^{-1}U^*\)
  • Projection \(P:=P_{(A^*\vsU)^\perp,\vsU}=I-U(U^*AU)^{-1}U^*A\)

Deflation basics

For any initial guess, CG (\(A\) Hpd), MINRES (\(A\) Hermitian) or GMRES applied to \[QAx=Qb\] yields well-defined iterates \(\tilde{x}_n\) and terminates with a solution. The corrected iterates \[ x_n := P\tilde{x}_n + U(U^*AU)^{-1}U^*b \] satisfy \(b-Ax_n=Qb-QA\tilde{x}_n\).
  • Well-known for CG (e.g., Nicolaides, Chapman and Saad, Frank and Vuik, Nabben and Vuik, ...)
  • GMRES/MINRES: follows from a theorem by G., Gutknecht, Liesen and Nabben (2013). GMRES applied to \(By=c\) with possibly singular \(B\) is well-defined for all initial guesses if and only if \(\vsR(B)\cap\vsN(B)=\{0\}\).

Spectra of deflated matrices and a priori bounds

Spectra of deflated matrices

Let \(U_\perp\in\C^{N,N-m}\) such that \(\rank U_\perp=N-m\) and \(\vsU^\perp=\vsR(U_\perp)\). Then \[QA=U_\perp (U_\perp^*A^{-1}U_\perp)^{-1} U_\perp^*.\]
For \(m>0\) and \(U_\perp^*U_\perp=I\), the spectrum of \(QA\) is \[ \Lambda(QA)=\{0\}\cup\Lambda\left( (U_\perp^*A^{-1}U_\perp)^{-1} \right). \]

Note: no restrictions on symmetry or definiteness.

If \(U\) spans an \(A\)-invariant subspace, then things are easy. But not much is known if \(U\) is only an approximation.

Spectra of perturbed deflated matrices

Let \(A=A^*\), \(U^*U=I\), \(R=AU-U(U^*AU)\), \(\Lambda(U^*AU)=\{\mu_1,\dots,\mu_m\}\) and \(\Lambda(QA) =\{0\}\cup\{\hat{\lambda}_1\leq\dots\leq\hat{\lambda}_{N-m}\}\).

For an appropriate ordering (*) of the eigenvalues \(\lambda_1,\dots,\lambda_N\) of \(A\), we have \[ |\lambda_i - \hat{\lambda}_i|\leq\nrm{R}^2 \left(\frac{1}{\delta_i} + \frac{1}{\mu_{\min}} \right), \] where \(\delta_i:=\min_{j=1,\dots,m}|\lambda_i-\mu_j|>0\).

(*) Technical (but explicitly given) ordering is omitted here.

A priori bounds

Eigenvalue inclusion intervals + a priori bounds (e.g., \(\kappa\)-bound) \(\lra\) convergence bounds for deflated CG and deflated MINRES for Hermitian problems.

But:

  • a priori bounds like the \(\kappa\)-bound only take into account extremal eigenvalues and not the actual distribution
  • eigenvalues do not tell the whole story
  • linear asymptotic bounds cannot capture nonlinear convergence histories
  • what about non-Hermitian/non-normal matrices?

Another approach is required!

Approximate Krylov subspaces and deflated GMRES bounds

A new approach

Remember: we solve sequences of linear systems!

The data that is generated in the solution of one linear system reveals much more than just Ritz vectors!

We apply GMRES to \[ QAx=Qb, \qquad\text{where}\qquad Q=I-AU(U^*AU)^{-1}U^*.\] \(\lra\) Arnoldi relation \(QAV_n=V_{n+1}\underline{H}_n\).

We choose a basis \(W=[V_n,U]\widetilde{W}\in\C^{N,k}\) of the new deflation subspace.

How does GMRES perform for \[ Q_WAx=Q_Wb,~\text{where}~Q_W=I-AW(W^*AW)^{-1}W^*?\]

Approximate Krylov subspaces

An Arnoldi relation for \(K_i(Q_WA,Q_Wb)\) is usually not available.

But an Arnoldi relation for a perturbed matrix and initial vector is available!

An Arnoldi relation for \(K_i(Q_WA+F_i, Q_Wb+f)\) can be constructed from \(V_n\) and \(\underline{H}_n\) for \(1\leq i\lt m+n-k\), such that \(F_i\) and \(f\) are optimal with respect to \(K_i(QA,Qb)+\vsU\). Furthermore, \(\nrm{F_i}\) and \(\nrm{f}\) are available and \(\nrm{F_1}\leq\nrm{F_2}\leq\dots\).

The Hessenberg matrix \(\widehat{H}_n\) of the constructed Arnoldi relation reveals the GMRES residual norms for \((Q_WA+F_i)\widehat{x}=Q_Wb+f\).

Goal: bound GMRES residual norms for \(Q_WAx=Q_Wb\).

Krylov subspace methods with perturbations

\(A\in\C^{23\times23}\) and \(\nrm{F}=\nrm{f}=10^{-8}\)

All is not lost: recent results by Sifuentes, Embree and Morgan (2013) for a perturbed matrix; generalized for a perturbed right hand side (G. 2014).

Approximate Krylov subspace bound

The \(i\)-th residual \(\widehat{r}_i\) of GMRES applied to \(Q_WAx=Q_Wb\) satisfies \[ \nrm{\widehat{r}_i}\leq\nrm{\overline{r}_i} + \frac{|\partial\Lambda_\delta|}{2\pi\delta} \left(\frac{\epsilon_i}{\delta-\epsilon_i} (\gamma+\beta)+\beta \right) \max_{\lambda\in\partial\Lambda_\delta}|p_i(\lambda)|, \] for all \(\delta>\epsilon_i\) where
  • \(\overline{r}_i=p_i(\widehat{H}_n)\gamma e_1\) is the \(i\)-th residual of GMRES applied to \(\widehat{H}_ny=\gamma e_1\) with the residual polynomial \(p_i\)
  • \(\partial\Lambda_\delta :=\partial\Lambda_\delta\left( (Q_WA+F_i)|_{\vsW^\perp}\right)\) is the boundary of the \(\delta\)-pseudospectrum
  • \(\epsilon_i=\nrm{F_i}\), \(\beta=\nrm{f}\) and \(\gamma=\nrm{Q_Wb}\)

Numerical experiments with the approximate Krylov subspace bound and convection-diffusion

Convection-diffusion

$$ \begin{align*} -\nu\Delta u + w\cdot\nabla u &= 0 \quad\text{in}\quad\Omega=]-1,1[\times]-1,1[ \\ \text{and}\quad u&=q\quad\text{on}\quad\partial\Omega \end{align*} $$
Finite element solution

Bound for convection-diffusion

Same linear system \(Ax=b\): deflation of 0,...,15 Ritz vectors corresponding to Ritz values of smallest magnitude
actual convergence (solid light); bound (dashed); \(\nrm{\overline{r}_n}\) (dotted)

Bound for convection-diffusion

Modified right hand side \(c=b+g\), \(\nrm{g}\approx 5\cdot 10^{-3}\)
actual convergence (solid); bound (dashed)

Bound for convection-diffusion

Modified matrix \(B=A+G\), \(\nrm{G}\approx 5\cdot 10^{-6}\)
actual convergence (solid); bound (dashed)

Summary

  • Full characterization of spectra of deflated matrices
  • Derived 2 new bounds for deflation:
    • quadratic Ritz residual bound for \(\Lambda(QA)\)
    • approximate Krylov subspace bound for GMRES
  • Implemented in KryPy for the automatic selection of recycling vectors
  • Time savings of up to 40% in the solution of nonlinear Schrödinger equations with Newton's method (G. and Schlömer, 2013)

Selection of further results in the thesis:

  • Characterization of well-defined deflated CG, MINRES and GMRES
  • Perturbation bound on \(\|P-Q\|\) for arbitrary projections

Outlook

Subject to further work:

  • Exploit full characterization of \(\Lambda(QA)\)
  • Better perturbation theory for Krylov subspace methods
    • examples in the PhD thesis indicate: hard work
    • required for better selection strategies for recycling vectors

References

  • These slides can be watched online:
    andrenarchy.github.io/talk-2014-07-phdthesis-defense
  • G., Recycling Krylov subspace methods for sequences of linear systems. PhD thesis, submitted April 2014.
  • Software used for experiments (free open source software):
  • G., Gutknecht, Liesen and Nabben, A framework for deflated and augmented Krylov subspace methods. 2013. SIMAX. arxiv.org/abs/1206.1506
  • G. and Schlömer, Preconditioned recycling Krylov subspace methods for self-adjoint problems. 2014 (revision submitted). arxiv.org/abs/1208.0264
  • Kilmer and de Sturler, Recycling subspace information for diffuse optical tomography. 2006. SISC.
  • Liesen and Strakoš, Krylov subspace methods. Principles and analysis. 2013. Oxford University Press.
  • Nabben and Vuik, A comparison of deflation and coarse grid correction applied to porous media flow. 2004. SISC.
  • Sifuentes, Embree and Morgan, GMRES convergence for perturbed coefficient matrices, with Application to Approximate Deflation Preconditioning. 2013. SIMAX.

Thank you!

Questions?

Appendix: perturbation bound for projections

Orthogonal vs. oblique projections

Orthogonal projection onto \(\vsX\): \(P_{\vsX}=X(X^*X)^{-1}X^*\)
Oblique projection onto \(\vsX\) along \(\vsY=\vsR(Y)^\perp\): \(P_{\vsX,\vsY} = X (Y^*X)^{-1}Y^*\)

Perturbation bound for projections

Let \(\vsV,\vsW,\vsX,\vsY\subseteq\C^N\) be nonzero subspaces such that \(\C^N=\vsV\oplus\vsW=\vsX\oplus\vsY\). Then: \[ \nrm{P_{\vsV,\vsW}-P_{\vsX,\vsY}} \leq \frac{ \sin\theta_{\max}(\vsW,\vsY) + \sin\theta_{\max}(\vsV,\vsX)}{ \cos\theta_{\max}(\vsV,\vsW^\perp) \cos\theta_{\max}(\vsX,\vsY^\perp)}. \]

\(\theta_{\max}(\vsV,\vsW):=\arcsin\|P_\vsV-P_\vsW\|\) is the maximal canonical angle between \(\vsV\) and \(\vsW\).

Perturbation bound in action

\(\nrm{P_{\vsX,\vsY}z-P_{\vsV,\vsW}z}=\) 0.0,
bound: \(\frac{ \sin\theta_{\max}(\vsW,\vsY) + \sin\theta_{\max}(\vsV,\vsX)}{ \cos\theta_{\max}(\vsV,\vsW^\perp) \cos\theta_{\max}(\vsX,\vsY^\perp)} \) = 0.0.

Appendix: nonlinear Schrödinger equations

Nonlinear Schrödinger equations

$$ S: \begin{cases} \mathcal{X} \longrightarrow \mathcal{Y}\\ \psi \mapsto (K+V+g|\psi|^2)\psi \end{cases} $$ where

  • \(\mathcal{X},\mathcal{Y}\subseteq L^2(\Omega)\) and \(\Omega\subseteq\R^d\)
  • \(K:\mathcal{X}\longrightarrow\mathcal{Y}\) is self-adjoint and positive semidefinite with respect to \(\langle\cdot,\cdot\rangle_{L^2}\)
  • \(V:\Omega\longrightarrow\R\) is a scalar potential
  • \(g>0\)
Find nontrivial \(\psi^\star:\Omega\longrightarrow\C\) such that $$ S(\psi^\star)=0. $$ Note: \(S(e^{\iu\varphi}\psi)=e^{\iu\varphi}S(\psi)\).

Newton's method for nonlinear Schrödinger equations

  • Given: initial guess \(\psi_0\in\mathcal{X}\)
  • For \(i=0,1,...\) until convergence: solve $$ J_{\psi_i}\delta_i = - S(\psi_i) $$ where
    • \(\psi_{i+1}=\psi_i+\delta_i\)
    • \(J_{\psi}\phi = (K+V+2g|\psi|^2)\phi + g \psi^2 \overline{\phi}\) is the Jacobian operator
    • \(J_{\psi}\) is linear over \(\R\)
    • \(J_{\psi}\) is self-adjoint with respect to \(\langle\cdot,\cdot\rangle_\R :=\Re\langle\cdot,\cdot\rangle_{L^2}\)

Ginzburg—Landau equation

Magnet levitating above a superconductor
(CC BY-SA 2.5, by Mai-Linh Doan)

Nonlinear Schrödinger equation with:

  • \(K=(-\iu\nabla-A)^2\) where \(A\) is a given magnetic vector potential
  • \(V\equiv-1\)
  • \(g=1\)

$$ 0= \begin{cases} K\psi - \psi(1-|\psi|^2) &\text{in}~\Omega\\ \mathrm{n}\cdot(-\iu\nabla-A)\psi &\text{on}~\partial\Omega \end{cases} $$
  • Models superconductivity (extreme-type-II superconductors)
  • Solutions \(\psi^\star\) describe the density \(|\psi^\star|^2\) of electric charge carriers

3D Ginzburg—Landau

Absolute value of computed solution \(\psi\).
Argument of computed solution \(\psi\).

3D Ginzburg—Landau

3D Ginzburg—Landau

Newton residual norms.

3D Ginzburg—Landau

Without deflation. Light red: first Newton step; dark red: last Newton step.

3D Ginzburg—Landau

Deflation of 12 Ritz vectors corresponding to the Ritz values of smallest magnitude.

3D Ginzburg—Landau

Automatic recycling based on a MINRES a priori bound.

3D Ginzburg—Landau

Automatic recycling based on the approximate Krylov subspace bound.

3D Ginzburg—Landau

Timings: no deflation (red), 12 fixed vectors (blue), a priori (green), approx. Krylov (grey).