A set $\{u_1, \dots, u_n\}$ of vectors in $\R^n$ is an _orthonormal basis_ of $\R^n$ if $\scal{u_i, u_j}=\delta_{ij}$ for any $i,j \in \{1,\dots, n\}$.
For such a set, any vector $w$ in $\R^n$ can be expressed as a linear combination of the $u_i$:
$$ w \speq \sum_{i=1}^n \alpha_i u_i, $$
where the weights $\alpha_i$ can easily be obtained by leveraging the orthonormality:
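$$ \alpha_i \speq \scal{u_i, w}, $$

which follows from taking the inner product of both sides of the expansion with $u_i$ and using $\scal{u_i, u_j} = \delta_{ij}$.

As a quick sanity check, one can for instance build an orthonormal basis from the QR factorisation of a random matrix and verify numerically that these inner products recover $w$ (this little snippet is separate from the solver code discussed further below):

```julia
using LinearAlgebra

n = 5
U = Matrix(qr(randn(n, n)).Q)           # columns u_1, ..., u_n form an orthonormal basis
w = randn(n)
α = [dot(U[:, i], w) for i in 1:n]      # α_i = ⟨u_i, w⟩
w_rec = sum(α[i] * U[:, i] for i in 1:n)
@assert w_rec ≈ w
```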
### Generating directions
Ideally, we would like the space $\mathcal P_k$ spanned by the $\{p_1, \dots, p_k\}$ to be such that the projection of $x$ on $\mathcal P_k$ is close to $x$.
This, unfortunately, is not very easy to translate into a cheap procedure for generating a good sequence of $p_k$, not least because we don't have access to $x$.
After executing step $k-1$, the only new information we can compute is the new residual $r_{k-1}$, and so, naturally, we could try using that to form the new direction $p_k$.
In this point we will briefly discuss three approaches to generating the $p_k$:
* **Krylov**, where the $p_k$ are the residuals $r_{k-1}$,
* **gradient**, where the $p_k$ are $A^tr_{k-1}$.
The first one is not motivated by anything other than "it should work" and should not be expected to work very well; the second one is a common choice that links with Krylov subspace methods (as [discussed in the second part](\postCG)); and, finally, the third one is motivated by the gradient of \nobr{$F(x) = \|Ax - b\|_2^2$}, which is $A^t(Ax-b)$, so that
$$
\nabla F(x_{k-1}) \speq A^tr_{k-1}.
$$
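To make the last two of these choices concrete, here is a small sketch forming the direction $p_k$ from the residual $r_{k-1} = b - Ax_{k-1}$ (the helper `next_direction` is purely illustrative, not part of the solver code shown later):

```julia
using LinearAlgebra

# Illustrative sketch: form the direction p_k from the residual r_{k-1}.
function next_direction(A::AbstractMatrix, r::AbstractVector, choice::Symbol)
    if choice == :krylov
        return copy(r)      # "Krylov": p_k = r_{k-1}
    elseif choice == :grad
        return A' * r       # "gradient": p_k = A^t r_{k-1} = ∇F(x_{k-1})
    else
        error("unknown choice: $choice")
    end
end

A, b = randn(4, 4), randn(4)
x = zeros(4)                       # current iterate
r = b - A * x                      # residual r_{k-1}
p_krylov = next_direction(A, r, :krylov)
p_grad   = next_direction(A, r, :grad)
```

In both cases the cost is at most one extra matrix-vector product, which is what keeps each iteration cheap.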
We've now discussed the different moving parts and can build a simple implementation which, at step $k$, does the following:
1. gets a new direction $p_k$ and finds the corresponding orthonormal $q_k$,
2. computes $Aq_k$ and finds the corresponding orthonormal $\tilde{q}_k$ as well as the $r_{i,k}$,
4. solves $R^{(k)}\beta^{(k)}=\gamma_k$ (a triangular system of size $k$, see the short sketch below).
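Step (4) is cheap: $R^{(k)}$ is upper triangular, so the system can be solved by back-substitution in $\mathcal O(k^2)$ operations, for instance via Julia's `UpperTriangular` wrapper (the names below are just stand-ins for illustration):

```julia
using LinearAlgebra

k = 3
R = triu(rand(k, k) .+ 1.0)        # stand-in for R⁽ᵏ⁾: upper triangular, nonzero diagonal
γ = rand(k)                        # stand-in for γ_k
β = UpperTriangular(R) \ γ         # back-substitution, O(k²)
```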
For steps (1) and (2) we can use the modified Gram-Schmidt procedure.
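To give an idea of what one such step looks like (a simplified sketch, not the exact implementation used further below), the following orthonormalises a new vector `v` against the already orthonormal columns of `Q` and returns the projection coefficients; applied to $Aq_k$ against the $\tilde q_i$, those coefficients correspond to the $r_{i,k}$:

```julia
using LinearAlgebra

# One modified Gram-Schmidt step: orthonormalise `v` against the already
# orthonormal columns of `Q`; returns the new unit vector, the projection
# coefficients and the norm of the leftover (the "diagonal" coefficient).
function mgs_step(Q::AbstractMatrix, v::AbstractVector)
    u = float.(v)                       # work on a copy
    c = zeros(eltype(u), size(Q, 2))
    for i in 1:size(Q, 2)
        c[i] = dot(view(Q, :, i), u)    # project on q_i using the *updated* u
        u  .-= c[i] .* view(Q, :, i)
    end
    ρ = norm(u)
    return u ./ ρ, c, ρ
end
```

In the solver this would be applied twice per iteration: once to $p_k$ (against the previous $q_i$) to get $q_k$, and once to $Aq_k$ (against the previous $\tilde q_i$) to get $\tilde q_k$ and the $r_{i,k}$.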
In the points below, we show code that solves each of these steps and ultimately put them all together to form a working iterative solver.
The code is not optimised but should hopefully be easy to read and to analyze.
The step with dominant complexity is the matrix-vector multiplication (the computation of $Aq_k$, and of $A^tr_k$ in the `grad` case).
Applying an $n\times n$ matrix has complexity $\mathcal O(n^2)$, making the overall procedure $\mathcal O(Kn^2)$ with $K$ the number of steps.
@@alert,alert-secondary **Note**: in many cases, there is a specific procedure available to compute (exactly or approximately) $Ax$ for some $x$ with complexity better than $\mathcal O(n^2)$. This can for instance be the case when $A$ is very sparse, or in some physics problems where the Fast Multipole Method (FMM) can be used (see e.g. \citet{bg97}). @@
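To illustrate the sparse case with a toy example (not part of the post's benchmark), the product $Ax$ with a sparse matrix costs roughly $\mathcal O(\mathrm{nnz}(A))$ rather than $\mathcal O(n^2)$:

```julia
using SparseArrays, LinearAlgebra

n = 10_000
A = sprandn(n, n, 1e-3) + I        # ≈ 0.1% of the entries are non-zero
x = randn(n)
y = A * x                          # cost proportional to nnz(A), not n²
```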
### Comparison
On this first plot we can see a number of interesting things:
3. the "grad" directions (where $p_k = A^tr_k$) lead to much faster convergence than the other two,
4. all choices eventually lead to $\|r_n\|\approx 0$ as expected.
@@alert,alert-info **Note**: when writing this post and running experiments, I was a bit surprised by how much better the "grad" version behaves compared to the others here, especially since I had not seen that choice discussed in the literature.
It may well be an artefact of the benchmark though (see e.g. the [next post](/posts/2021/05-cg/)), or a well-known fact. Either way, if you have thoughts on this, I'll be glad to [hear from you](https://github.com/tlienart/tlienart.github.io/issues/new/choose). @@
We can repeat a similar experiment with a larger matrix and more iterations and look at the time taken since the start instead of the number of iterations. Before doing so, note that:
As could be expected, the `gmres!` function is significantly faster (our code is really not optimised) but all methods exhibit the same behaviour, scaling like $n^2$.
Note that only the trend should be considered; the peaks and troughs should be ignored as they are mostly due to how un-optimised our implementation is and how well Julia manages to optimise the steps at various sizes.