Recycling Krylov subspace methods for sequences of linear systems

Motivation

The problem:

A sequence of linear systems \[ A_i x_i = b_i\quad\text{for}\quad i=1,2,... \] with $A_{i+1}\approx A_i$ and $b_{i+1}\approx b_i$.

A solution:

Recycling via deflation.

Apply Krylov subspace method to \[ P_i A_i x_i = P_i b_i, \] where $P_i$ is

a projection that removes "unwanted" portions of $A_i$
constructed with data from the solution process of $A_{i-1}x_{i-1}=b_{i-1}$

See, e.g., Kilmer and de Sturler (2006).

Scope of this talk

Open question:

automatic selection of deflation vectors

Required:

perturbation theory for

spectra of deflated matrices
Krylov subspace methods

Theory leads to:

new convergence bounds for deflated CG, MINRES and GMRES
guidance for the automatic selection of deflation vectors for recycling

Assumptions and definitions

$Ax=b$ with nonsingular $A\in\C^{N,N}$
$U\in\C^{N,m}$ such that $U^*AU$ is nonsingular (equivalent: $A\vsU\oplus\vsU^\perp=\C^N$ for $\vsU:=\vsR(U)$)
Projection $Q:=P_{\vsU^\perp,A\vsU}=I-AU(U^*AU)^{-1}U^*$
Projection $P:=P_{(A^*\vsU)^\perp,\vsU}=I-U(U^*AU)^{-1}U^*A$

Deflation basics

For any initial guess, CG ($A$ Hpd), MINRES ($A$ Hermitian) or GMRES applied to \[QAx=Qb\] yields well-defined iterates $\tilde{x}_n$ and terminates with a solution. The corrected iterates \[ x_n := P\tilde{x}_n + U(U^*AU)^{-1}U^*b \] satisfy $b-Ax_n=Qb-QA\tilde{x}_n$.

Well-known for CG (e.g., Nicolaides, Chapman and Saad, Frank and Vuik, Nabben and Vuik, ...)
GMRES/MINRES: follows from a theorem by G., Gutknecht, Liesen and Nabben (2013). GMRES applied to $By=c$ with possibly singular $B$ is well-defined for all initial guesses if and only if $\vsR(B)\cap\vsN(B)=\{0\}$.

Spectra of deflated matrices and a priori bounds

Spectra of deflated matrices

Let $U_\perp\in\C^{N,N-m}$ such that $\rank U_\perp=N-m$ and $\vsU^\perp=\vsR(U_\perp)$. Then \[QA=U_\perp (U_\perp^*A^{-1}U_\perp)^{-1} U_\perp^*.\]

For $m>0$ and $U_\perp^*U_\perp=I$, the spectrum of $QA$ is \[ \Lambda(QA)=\{0\}\cup\Lambda\left( (U_\perp^*A^{-1}U_\perp)^{-1} \right). \]

Note: no restrictions on symmetry or definiteness.

If $U$ spans an $A$-invariant subspace, then things are easy. But not much is known if $U$ is only an approximation.

Spectra of perturbed deflated matrices

Let $A=A^*$, $U^*U=I$, $R=AU-U(U^*AU)$, $\Lambda(U^*AU)=\{\mu_1,\dots,\mu_m\}$ and $\Lambda(QA) =\{0\}\cup\{\hat{\lambda}_1\leq\dots\leq\hat{\lambda}_{N-m}\}$.

For an appropriate ordering (*) of the eigenvalues $\lambda_1,\dots,\lambda_N$ of $A$, we have \[ |\lambda_i - \hat{\lambda}_i|\leq\nrm{R}^2 \left(\frac{1}{\delta_i} + \frac{1}{\mu_{\min}} \right), \] where $\delta_i:=\min_{j=1,\dots,m}|\lambda_i-\mu_j|>0$.

(*) Technical (but explicitly given) ordering is omitted here.

A priori bounds

Eigenvalue inclusion intervals + a priori bounds (e.g., $\kappa$-bound) $\lra$ convergence bounds for deflated CG and deflated MINRES for Hermitian problems.

But:

a priori bounds like the $\kappa$-bound only take into account extremal eigenvalues and not the actual distribution
eigenvalues do not tell the whole story
linear asymptotic bounds cannot capture nonlinear convergence histories
what about non-Hermitian/non-normal matrices?

Another approach is required!

Approximate Krylov subspaces and deflated GMRES bounds

A new approach

Remember: we solve sequences of linear systems!

The data that is generated in the solution of one linear system reveals much more than just Ritz vectors!

We apply GMRES to \[ QAx=Qb, \qquad\text{where}\qquad Q=I-AU(U^*AU)^{-1}U^*.\] $\lra$ Arnoldi relation $QAV_n=V_{n+1}\underline{H}_n$.

We choose a basis $W=[V_n,U]\widetilde{W}\in\C^{N,k}$ of the new deflation subspace.

How does GMRES perform for \[ Q_WAx=Q_Wb,~\text{where}~Q_W=I-AW(W^*AW)^{-1}W^*?\]

Approximate Krylov subspaces

An Arnoldi relation for $K_i(Q_WA,Q_Wb)$ is usually not available.

But an Arnoldi relation for a perturbed matrix and initial vector is available!

An Arnoldi relation for $K_i(Q_WA+F_i, Q_Wb+f)$ can be constructed from $V_n$ and $\underline{H}_n$ for $1\leq i\lt m+n-k$, such that $F_i$ and $f$ are optimal with respect to $K_i(QA,Qb)+\vsU$. Furthermore, $\nrm{F_i}$ and $\nrm{f}$ are available and $\nrm{F_1}\leq\nrm{F_2}\leq\dots$.

The Hessenberg matrix $\widehat{H}_n$ of the constructed Arnoldi relation reveals the GMRES residual norms for $(Q_WA+F_i)\widehat{x}=Q_Wb+f$.

Goal: bound GMRES residual norms for $Q_WAx=Q_Wb$.

Krylov subspace methods with perturbations

$A\in\C^{23\times23}$ and $\nrm{F}=\nrm{f}=10^{-8}$

All is not lost: recent results by Sifuentes, Embree and Morgan (2013) for a perturbed matrix; generalized for a perturbed right hand side (G. 2014).

Approximate Krylov subspace bound

The $i$-th residual $\widehat{r}_i$ of GMRES applied to $Q_WAx=Q_Wb$ satisfies \[ \nrm{\widehat{r}_i}\leq\nrm{\overline{r}_i} + \frac{|\partial\Lambda_\delta|}{2\pi\delta} \left(\frac{\epsilon_i}{\delta-\epsilon_i} (\gamma+\beta)+\beta \right) \max_{\lambda\in\partial\Lambda_\delta}|p_i(\lambda)|, \] for all $\delta>\epsilon_i$ where

$\overline{r}_i=p_i(\widehat{H}_n)\gamma e_1$ is the $i$-th residual of GMRES applied to $\widehat{H}_ny=\gamma e_1$ with the residual polynomial $p_i$
$\partial\Lambda_\delta :=\partial\Lambda_\delta\left( (Q_WA+F_i)|_{\vsW^\perp}\right)$ is the boundary of the $\delta$-pseudospectrum
$\epsilon_i=\nrm{F_i}$, $\beta=\nrm{f}$ and $\gamma=\nrm{Q_Wb}$

Numerical experiments with the approximate Krylov subspace bound and convection-diffusion

Convection-diffusion

$$ \begin{align*} -\nu\Delta u + w\cdot\nabla u &= 0 \quad\text{in}\quad\Omega=]-1,1[\times]-1,1[ \\ \text{and}\quad u&=q\quad\text{on}\quad\partial\Omega \end{align*} $$

Bound for convection-diffusion

Same linear system $Ax=b$: deflation of 0,...,15 Ritz vectors corresponding to Ritz values of smallest magnitude

actual convergence (solid light); bound (dashed); $\nrm{\overline{r}_n}$ (dotted)

Bound for convection-diffusion

Modified right hand side $c=b+g$, $\nrm{g}\approx 5\cdot 10^{-3}$

actual convergence (solid); bound (dashed)

Bound for convection-diffusion

Modified matrix $B=A+G$, $\nrm{G}\approx 5\cdot 10^{-6}$

Summary

Full characterization of spectra of deflated matrices
Derived 2 new bounds for deflation:
- quadratic Ritz residual bound for $\Lambda(QA)$
- approximate Krylov subspace bound for GMRES
Implemented in KryPy for the automatic selection of recycling vectors
Time savings of up to 40% in the solution of nonlinear Schrödinger equations with Newton's method (G. and Schlömer, 2013)

Selection of further results in the thesis:

Characterization of well-defined deflated CG, MINRES and GMRES
Perturbation bound on $\|P-Q\|$ for arbitrary projections

Outlook

Subject to further work:

Exploit full characterization of $\Lambda(QA)$
Better perturbation theory for Krylov subspace methods
- examples in the PhD thesis indicate: hard work
- required for better selection strategies for recycling vectors

References

These slides can be watched online:
andrenarchy.github.io/talk-2014-07-phdthesis-defense
G., Recycling Krylov subspace methods for sequences of linear systems. PhD thesis, submitted April 2014.
Software used for experiments (free open source software):
- G., KryPy: Krylov subspace methods package for Python.
  github.com/andrenarchy/krypy
- G. and Schlömer, PyNosh: Python framework for nonlinear Schrödinger equations.
  github.com/nschloe/pynosh
- G., PseudoPy: Computation and visualization of pseudospectra in Python.
  github.com/andrenarchy/pseudopy
G., Gutknecht, Liesen and Nabben, A framework for deflated and augmented Krylov subspace methods. 2013. SIMAX. arxiv.org/abs/1206.1506
G. and Schlömer, Preconditioned recycling Krylov subspace methods for self-adjoint problems. 2014 (revision submitted). arxiv.org/abs/1208.0264
Kilmer and de Sturler, Recycling subspace information for diffuse optical tomography. 2006. SISC.
Liesen and Strakoš, Krylov subspace methods. Principles and analysis. 2013. Oxford University Press.
Nabben and Vuik, A comparison of deflation and coarse grid correction applied to porous media flow. 2004. SISC.
Sifuentes, Embree and Morgan, GMRES convergence for perturbed coefficient matrices, with Application to Approximate Deflation Preconditioning. 2013. SIMAX.

Thank you!

Questions?

Appendix: perturbation bound for projections

Orthogonal vs. oblique projections

Orthogonal projection onto $\vsX$: $P_{\vsX}=X(X^*X)^{-1}X^*$

Oblique projection onto $\vsX$ along $\vsY=\vsR(Y)^\perp$: $P_{\vsX,\vsY} = X (Y^*X)^{-1}Y^*$

Perturbation bound for projections

Let $\vsV,\vsW,\vsX,\vsY\subseteq\C^N$ be nonzero subspaces such that $\C^N=\vsV\oplus\vsW=\vsX\oplus\vsY$. Then: \[ \nrm{P_{\vsV,\vsW}-P_{\vsX,\vsY}} \leq \frac{ \sin\theta_{\max}(\vsW,\vsY) + \sin\theta_{\max}(\vsV,\vsX)}{ \cos\theta_{\max}(\vsV,\vsW^\perp) \cos\theta_{\max}(\vsX,\vsY^\perp)}. \]

$\theta_{\max}(\vsV,\vsW):=\arcsin\|P_\vsV-P_\vsW\|$ is the maximal canonical angle between $\vsV$ and $\vsW$.

Perturbation bound in action

$\nrm{P_{\vsX,\vsY}z-P_{\vsV,\vsW}z}=$ 0.0,
bound: $\frac{ \sin\theta_{\max}(\vsW,\vsY) + \sin\theta_{\max}(\vsV,\vsX)}{ \cos\theta_{\max}(\vsV,\vsW^\perp) \cos\theta_{\max}(\vsX,\vsY^\perp)} $ = 0.0.

Appendix: nonlinear Schrödinger equations

Nonlinear Schrödinger equations

$$ S: \begin{cases} \mathcal{X} \longrightarrow \mathcal{Y}\\ \psi \mapsto (K+V+g|\psi|^2)\psi \end{cases} $$ where

$\mathcal{X},\mathcal{Y}\subseteq L^2(\Omega)$ and $\Omega\subseteq\R^d$
$K:\mathcal{X}\longrightarrow\mathcal{Y}$ is self-adjoint and positive semidefinite with respect to $\langle\cdot,\cdot\rangle_{L^2}$
$V:\Omega\longrightarrow\R$ is a scalar potential
$g>0$

Find nontrivial $\psi^\star:\Omega\longrightarrow\C$ such that $$ S(\psi^\star)=0. $$ Note: $S(e^{\iu\varphi}\psi)=e^{\iu\varphi}S(\psi)$.

Newton's method for nonlinear Schrödinger equations

Given: initial guess $\psi_0\in\mathcal{X}$
For $i=0,1,...$ until convergence: solve $$ J_{\psi_i}\delta_i = - S(\psi_i) $$ where
- $\psi_{i+1}=\psi_i+\delta_i$
- $J_{\psi}\phi = (K+V+2g|\psi|^2)\phi + g \psi^2 \overline{\phi}$ is the Jacobian operator
- $J_{\psi}$ is linear over $\R$
- $J_{\psi}$ is self-adjoint with respect to $\langle\cdot,\cdot\rangle_\R :=\Re\langle\cdot,\cdot\rangle_{L^2}$