This new MIT course (Spring 2025) introduces numerical methods and numerical analysis to a broad audience (assuming 18.03, 18.06, or equivalents, and some programming experience). It is divided into two 6-unit halves:
- 18.S190/16.S090 (first half-term "hub"): basic numerical methods, including curve fitting, root finding, numerical differentiation and integration, numerical differential equations, and floating-point arithmetic. Emphasizes the complementary concerns of accuracy and computational cost. Prof. Steven G. Johnson and Prof. Qiqi Wang.
- Second half-term: three options for 6-unit "spokes"
  - 18.S191/16.S091 — numerical methods for partial differential equations: finite-difference and finite-volume methods, boundary conditions, accuracy, and stability. Prof. Qiqi Wang.
  - 18.S097/16.S097 — large-scale linear algebra: sparse matrices, iterative methods, randomized methods. Prof. Steven G. Johnson.
  - 18.S192/16.S098 — parallel numerical computing: multi-threading and distributed-memory, and trading off computation for parallelism — may be taken simultaneously with other spokes! Prof. Alan Edelman.
- Taking both the hub and any spoke will count as an 18.3xx class for math majors, similar to 18.330. Taking both the hub and the PDE spoke will substitute for 16.90. Weekly homework, no exams, but spokes will include a final project.
This repository is for the "hub" course (currently assigned the temporary numbers 18.S190/16.S090).
Instructors: Prof. Steven G. Johnson and Prof. Qiqi Wang.
Lectures: MWF10 in 2-142 (Feb 3 – Mar 31), slides and notes posted below. Lecture videos posted in Panopto Video on Canvas.
Homework and grading: 6 weekly psets, posted Fridays and due Friday midnight; psets are accepted up to 24 hours late with a 20% penalty; for any other accommodations, speak with S3 and have them contact the instructors. No exams.
- Homework assignments will require some programming — you can use either Julia or Python (your choice; instruction and examples will use a mix of languages).
- Submit your homework electronically via Gradescope on Canvas as a PDF containing code and results (e.g. from a Jupyter notebook) and a scan of any handwritten solutions.
- Collaboration policy: Talk to anyone you want to and read anything you want to, with two caveats: First, make a solid effort to solve a problem on your own before discussing it with classmates or googling. Second, no matter whom you talk to or what you read, write up the solution on your own, without having their answer in front of you (this includes ChatGPT and similar). (You can use psetpartners.mit.edu to find problem-set partners.)
Teaching Assistants: Mo Chen and Shania Mitra (shania at mit.edu)
Office Hours: Wednesday 4pm in 2-345 (Prof. Johnson) and Thursday 5pm via Zoom (Prof. Wang).
Resources: Piazza discussion forum, math learning center, TSR^2 study/resource room, pset partners.
Textbook: No required textbook, but suggestions for further reading will be posted after each lecture. The book Fundamentals of Numerical Computation (FNC) by Driscoll and Braun is freely available online, has examples in Julia, Python, and Matlab, and is a valuable resource. Fundamentals of Engineering Numerical Analysis (FENA) by Moin is another useful resource (readable online with MIT certificates).
This document is a brief summary of what was covered in each lecture, along with links and suggestions for further reading. It is not a good substitute for attending lecture, but may provide a useful study guide.
- Overview and syllabus: slides and this web page
- Finite-difference approximations: Julia notebook and demo
Brief overview of the huge field of numerical methods, and outline of the small portion that this course will cover. The key new concerns in numerical analysis are (i) performance (traditionally, counting arithmetic operations, but nowadays memory access often dominates) and (ii) accuracy (both floating-point roundoff errors and the convergence of intrinsic approximations in the algorithms). In contrast, the more pure, abstract mathematics of continuity is called "analysis", and is mainly concerned with (ii) but not (i): analysts are happy to prove that limits converge, but don't care too much how quickly they converge. Traditional discrete computer science, on the other hand, is concerned mainly with (i) but not (ii): it cares about performance and resource usage, but traditional algorithms like sorting are either right or wrong, never approximate.
As a starting example, considered the convergence of finite-difference approximations to derivatives df/dx of given functions f(x), which appear in many areas of numerical analysis (such as solving differential equations) and are also closely tied to polynomial approximation and interpolation. By examining the errors in the finite-difference approximation, we immediately see two competing sources of error: truncation error from the non-infinitesimal Δx, and roundoff error from the finite precision of the arithmetic. Understanding these two errors will be the gateway to many other subjects in numerical methods.
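For concreteness, here is a minimal Julia sketch (my own illustration, not the lecture notebook) of a forward-difference approximation to d/dx sin(x) at x = 1: the truncation error shrinks like Δx until roundoff error, which grows roughly like ε/Δx, takes over.

```julia
# Forward-difference error for d/dx sin(x) at x = 1, for a range of Δx values.
f(x) = sin(x)
fprime_exact = cos(1.0)

for dx in 10.0 .^ (-1:-2:-13)
    fd = (f(1.0 + dx) - f(1.0)) / dx     # forward-difference approximation
    println("Δx = ", dx, "   error = ", abs(fd - fprime_exact))
end
```

For this first-order difference the total error is smallest near Δx ≈ √ε ≈ 10⁻⁸; making Δx smaller only makes things worse.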
Further reading: FNC book: Finite differences, FENA book: chapter 2. There is a lot of information online on finite-difference approximations, e.g. these 18.303 notes or Section 5.7 of Numerical Recipes. The Julia FiniteDifferences.jl package provides lots of algorithms to compute finite-difference approximations; a particularly robust and powerful way to obtain high accuracy is to employ Richardson extrapolation to smaller and smaller δx. If you make δx too small, the finite precision (#digits) of floating-point arithmetic leads to catastrophic cancellation errors.
One of the most basic sources of computational error is that computer arithmetic is generally inexact, leading to roundoff errors. The reason for this is simple: computers can only work with numbers having a finite number of digits, so they cannot even store arbitrary real numbers. Only a finite subset of the real numbers can be represented using a particular number of "bits", and the question becomes which subset to store, how arithmetic on this subset is defined, and how to analyze the errors compared to theoretical exact arithmetic on real numbers.
In floating-point arithmetic, we store both an integer coefficient and an exponent in some base: essentially, scientific notation. This allows large dynamic range and fixed relative accuracy: if fl(x) is the closest floating-point number to any real x, then |fl(x)-x| < ε|x| where ε is the machine precision. This makes error analysis much easier and makes algorithms mostly insensitive to overall scaling or units, but has the disadvantage that it requires specialized floating-point hardware to be fast. Nowadays, all general-purpose computers, and even many little computers like your cell phones, have floating-point units.
Went through some simple definitions and examples in Julia (see notebook above), illustrating the basic ideas and a few interesting tidbits. In particular, we looked at error accumulation during long calculations (e.g. summation), as well as examples of catastrophic cancellation and how it can sometimes be avoided by rearranging a calculation.
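A few examples in the same spirit, in Julia (assuming the default IEEE double precision):

```julia
eps(Float64)       # machine epsilon ≈ 2.2e-16
0.1 + 0.2          # 0.30000000000000004: neither 0.1 nor 0.2 is exactly representable in binary

# Catastrophic cancellation: 1 - cos(x) for small x loses nearly all significant digits,
# but the algebraically equivalent 2*sin(x/2)^2 does not.
x = 1e-8
1 - cos(x)         # 0.0 — every digit cancels
2 * sin(x/2)^2     # ≈ 5.0e-17 — accurate
```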
Further reading: FNC book: Floating-point numbers. Trefethen & Bau's Numerical Linear Algebra, lecture 13. What Every Computer Scientist Should Know About Floating Point Arithmetic (David Goldberg, ACM 1991). William Kahan, How Java's floating-point hurts everyone everywhere (2004): contains a nice discussion of floating-point myths and misconceptions. A brief but useful summary can be found in this Julia-focused floating-point overview by Prof. John Gibson. Because many programmers never learn how floating-point arithmetic actually works, there are many common myths about its behavior. (An infamous example is `0.1 + 0.2` giving `0.30000000000000004`, which people are puzzled by so frequently it has led to a web site https://0.30000000000000004.com/!)
- Interpolation OneNote Notebook Code Complete Notes
- pset 1: due Feb 14
Discussed the important problem of interpolating a function through a given set of data points, starting with piecewise-linear interpolation.
Further reading: FNC chapter 5 and FENA chapter 1. Piecewise linear interpolation is implemented in Python by `numpy.interp`, and several other interpolation schemes by `scipy.interpolate`. Interpolation packages in Julia include Interpolations.jl, Dierckx.jl (splines), BasicInterpolators.jl, and FastChebInterp.jl (high-order polynomials).
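To make the piecewise-linear idea concrete, here is a minimal hand-rolled sketch (the packages above are what you would use in practice; `interp_linear` is just an illustrative name):

```julia
# Piecewise-linear interpolation: locate the interval containing x and blend the endpoints.
function interp_linear(xs, ys, x)
    # assumes xs is sorted and xs[1] ≤ x ≤ xs[end]
    i = searchsortedlast(xs, x)
    i == length(xs) && return ys[end]        # x exactly at the right endpoint
    t = (x - xs[i]) / (xs[i+1] - xs[i])      # fractional position within the interval
    return (1 - t) * ys[i] + t * ys[i+1]
end

xs = 0:0.5:2π
ys = sin.(xs)
interp_linear(xs, ys, 1.0)   # ≈ sin(1.0), up to O(Δx²) error
```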
A basic overview of the Julia programming environment for numerical computations. This tutorial will cover what Julia is and the basics of interaction, scalar/vector/matrix arithmetic, and plotting — just as a "fancy calculator" for now (without the "real programming" features).
- Tutorial materials (and links to other resources)
If possible, try to install Julia on your laptop beforehand using the instructions at the above link. Failing that, you can run Julia in the cloud (see instructions above).
This won't be recorded, but you can find a video of a similar tutorial by Prof. Johnson last year (MIT only), as well as many other tutorial videos at julialang.org/learning.
- Notes: OneNote Notebook
One approach to generalizing piecewise-linear interpolation is to interpolate the data with a single higher-degree polynomial: $n$ data points determine a unique interpolating polynomial of degree $n-1$ passing through all of them.
A general conceptual approach is to set up a system of linear equations for the polynomial coefficients $c$: requiring the polynomial to pass through every data point gives a square "Vandermonde" matrix equation $Ac = y$, whose columns correspond to the basis functions (e.g. the monomials $1, x, x^2, \ldots$) evaluated at the data points. However, this approach runs into two important problems:
- The matrix $A$ is nearly singular for large $n$ in the monomial basis, so floating-point roundoff errors are exacerbated. (We will say that it has a large condition number or is "ill-conditioned", and will define this more precisely next time.) Solution: It turns out that monomials are just a poorly behaved basis for high-degree polynomials, and it is much better to use orthogonal polynomials, most commonly Chebyshev polynomials, as a basis — with these, people regularly go to degrees of thousands or even millions.
- Polynomial interpolation from equally spaced points (in any basis!) can diverge exponentially from the underlying "true" smooth function in between the points (even in exact arithmetic, with no roundoff errors!). This is called a Runge phenomenon. Solution 1: Use carefully chosen non-equispaced points. A good choice that leads to exponentially good polynomial approximations (for smooth functions) is the Chebyshev nodes, which are clustered near the endpoints. Solution 2: use a lower-degree ($\ll n$) polynomial and perform an approximate fit to the given points, rather than requiring the polynomial to go through them exactly. (More on this soon.)
If you address both of these problems, high-degree polynomial approximation can be a fantastic tool for describing smooth functions. If you have noisy data, however, you should typically use much lower-degree polynomials to avoid overfitting (trying to match the noise rather than the underlying "true" function).
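To see the conditioning problem numerically, here is a small sketch (my own illustration, with an arbitrarily chosen degree $n = 20$) comparing the two bases:

```julia
# Condition numbers of the interpolation matrix: monomials at equispaced points
# vs. Chebyshev polynomials T_k(x) = cos(k acos(x)) at Chebyshev points.
using LinearAlgebra

n = 20
x_equi = range(-1, 1, length=n+1)
A_mono = [xj^k for xj in x_equi, k in 0:n]              # monomial "Vandermonde" matrix
x_cheb = [cos(π*j/n) for j in 0:n]                      # Chebyshev points
A_cheb = [cos(k*acos(xj)) for xj in x_cheb, k in 0:n]   # Chebyshev-polynomial basis matrix

cond(A_mono)   # huge: grows exponentially with the degree
cond(A_cheb)   # modest: Chebyshev polynomials at Chebyshev points are well behaved
```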
Further reading: FNC book, chapter 9. Beware that the FENA book starts with the "Lagrange formula" for the interpolating polynomial, but this formula is very badly behaved ("numerically unstable") for high degrees and should not be used; it is superseded by the "barycentric" Lagrange formula (see FNC book; reviewed in much more detail by Berrut and Trefethen, 2004). The subject of polynomial interpolation is the entry point into approximation theory; if you are interested, the book by Trefethen and accompanying video lectures are a great place to get more depth. The `numpy.polynomial` module contains a variety of functions for polynomial interpolation, including with Chebyshev polynomials. A package for multi-dimensional Chebyshev polynomial interpolation in Julia is FastChebInterp. A pioneering package that shows off the power of Chebyshev polynomial interpolation is `chebfun` in Matlab, along with related packages in other languages. (This approach is taken to supercomputing extremes by the Dedalus package to solve partial differential equations using exponentially converging polynomial approximations.) It turns out that if you are interpolating using Chebyshev polynomials, and you choose your points to be the Chebyshev points, then the coefficients can be computed especially quickly, in $O(n \log n)$ operations, using FFT-based algorithms.
The goal of this lecture is to precisely define the notion of a condition number, which quantifies the sensitivity of a function f(x) to small perturbations in the input. A function that has a "large" condition number is called "ill-conditioned", and any algorithm to compute it may suffer inaccurate results from any sort of error (whether it be roundoff errors, input noise, or other approximations) — it doesn't mean you can't use that function, but it usually means you need to be careful (about both implementation and about interpretation of the results).
For a given function $f(x)$ (whose input and output may be scalars or vectors), evaluated at a particular input $x$, we define two flavors of condition number:
- absolute condition number $\kappa_a(f, x) = \lim_{\epsilon \to 0^+} \sup_{\Vert \delta x \Vert = \epsilon} \frac{\Vert \overbrace{f(x + \delta x) - f(x)}^{\delta f} \Vert}{\Vert \delta x \Vert}$. It is the worst-case $\Vert \delta f \Vert / \Vert \delta x \Vert$ for any arbitrarily small input perturbation $\delta x$. Unfortunately, if the inputs and outputs of $f$ have different scales ("units"), then $\kappa_a$ may be hard to interpret (it is "dimensionful").
- relative condition number $\kappa_r(f, x) = \lim_{\epsilon \to 0^+} \sup_{\Vert \delta x \Vert = \epsilon} \frac{\Vert \delta f \Vert / \Vert f(x) \Vert}{\Vert \delta x \Vert / \Vert x \Vert} = \kappa_a(f, x) \frac{\Vert x \Vert}{\Vert f(x) \Vert}$. This is a dimensionless quantity: it is the maximum ratio of the relative change in $f$ to the relative change in $x$. Most of the time, $\kappa_r$ is the quantity of interest.
All of these quantities involve the central concept of a norm ‖⋯‖, which is a way of measuring the "length" of a vector. The most familiar norm, and usually the default or implicit choice for column vectors $x \in \mathbb{R}^n$, is the Euclidean ($\ell^2$) norm $\Vert x \Vert = \sqrt{\sum_k |x_k|^2}$, but there are many other possible choices, depending on which aspects of the error you care about.
For example, looked at the condition number of summation, $f(x) = \sum_k x_k$: when the terms nearly cancel (inputs of opposite signs with a small sum), the relative condition number becomes huge — this is the source of "catastrophic cancellation."
If the function is differentiable, then the condition number simplifies even further:
- If $x$ and $f(x)$ are scalars, then $\kappa_a = |f'(x)|$ is simply the magnitude of the derivative.
- If $x \in \mathbb{R}^n$ and $f(x) \in \mathbb{R}^m$ are vectors, then $\kappa_a = \Vert J \Vert$ is the "operator" or "induced" norm of the Jacobian matrix $J$ ($J_{i,j} = \partial f_i /\partial x_j$), where the induced norm $\Vert A \Vert = \sup_{x \ne 0} \frac{\Vert A x \Vert}{\Vert x \Vert}$ measures "how big" a matrix $A$ is by how much it can stretch a vector. (We won't worry too much yet about how to compute this induced norm, but if you know the SVD it is the largest singular value of $A$.)
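A small sketch (my own illustration, not from the lecture) of the relative condition number of subtraction, $f(x) = x_1 - x_2$, computed from its Jacobian as $\kappa_r = \Vert J \Vert\, \Vert x \Vert / |f(x)|$:

```julia
using LinearAlgebra

f(x) = x[1] - x[2]
J = [1.0 -1.0]                       # Jacobian of f (a 1×2 matrix), independent of x
κ_r(x) = opnorm(J) * norm(x) / abs(f(x))

κ_r([1.0, -1.0])       # = 1.0: well-conditioned when there is no cancellation
κ_r([1.0, 0.999999])   # ≈ 2e6: nearly equal inputs ⇒ catastrophic cancellation
```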
This leads us to our next important concept, the (relative) condition number of a matrix $A$, $\kappa(A) = \Vert A \Vert\, \Vert A^{-1} \Vert$, which bounds how much relative errors in the right-hand side (or in $A$ itself) can be amplified in the solution of a linear system $Ax = b$.
For example, we can now explain why the monomial basis was so bad: it is easy to see that the Vandermonde matrix becomes nearly singular for large degrees $n$ (its columns, the sampled high powers $x^k$, become nearly linearly dependent), so its condition number grows exponentially with $n$ and roundoff errors in solving $Ac = y$ are amplified enormously.
Further reading: FNC book: problems and conditioning and conditioning linear systems. 18.06 lecture notes on conditioning of linear systems. Advanced books like Numerical Linear Algebra by Trefethen and Bau (lecture 12) treat this subject in much more depth. See also Trefethen, lecture 3, for more in-depth coverage of norms. A fancy vocabulary word for a vector space with a norm (plus some technical criteria) is a Banach space.
Lecture 6 (Feb 14 💕)
- pset 1 solutions
- pset 2: due Feb 21
- Notes: OneNote Notebook
Introduced the topic of least-square fitting of data to curves. As long as the fitting function is linear in the unknown coefficients c (e.g. a polynomial, or some other linear combination of basis functions), showed that minimizing the sum of the squares of the errors corresponds to minimizing the norm of the residual, i.e. the "loss function" L(c) = ‖Ac - y‖², where the matrix $A$ has a row for each data point and a column for each basis function (just like the "Vandermonde" matrices above) and $y$ is the vector of data values.
It is a straightforward calculus exercise to show that ∇L = 2Aᵀ(Ac - y), which means that optimal coefficients c can be found by setting the gradient to zero, and ∇L=0 implies the "normal equations" AᵀAc = Aᵀy. In principle, these can be solved directly, but the normal equations square the condition number of A (κ(AᵀA)=κ(A)²), so they are normally solved in a different way, typically by QR factorization of A (or sometimes using the SVD of A); there are typically library functions that do this for you, e.g. `c = numpy.linalg.lstsq(A, y)` in Python or `c = A \ y` in Julia and Matlab.
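A short sketch of such a fit in Julia (my own illustration; the data and degree are arbitrary choices):

```julia
# Least-squares fit of noisy data to a degree-3 polynomial, via the QR-based `\`
# (recommended) and, for comparison, via the normal equations.
using LinearAlgebra

xs = range(0, 1, length=100)
y = exp.(xs) .+ 0.01 .* randn(length(xs))   # data = smooth function + noise
A = [xj^k for xj in xs, k in 0:3]           # 100×4 "Vandermonde" matrix (monomial basis)

c_qr = A \ y                 # least-squares solution via QR factorization
c_ne = (A' * A) \ (A' * y)   # normal equations: same answer up to roundoff, but κ(AᵀA) = κ(A)²
```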
More generally, minimizing L(c) is an example of an optimization problem. One simple way to attack such problems is by gradient descent: simply go downhill, in steps Δc = -s∇L where s is the "learning rate" or "step size" parameter (that has to be chosen carefully: too small and it will converge very slowly, but too large and it won't converge at all). It turns out that an ill-conditioned A also leads to gradient descent that converges slowly (because the local downhill direction -∇L doesn't point to the minimum, necessitating small steps). Nowadays, naive gradient descent (especially with a fixed learning rate) is a rather primitive technique that has been mostly superseded by better "accelerated" methods, but computing the gradient ∇L and using it to identify the downhill direction is still a key conceptual starting point for many algorithms. Of course, for least-square fitting where L(c) is linear in c, we can directly solve for c as shown above, but iterative optimization methods are crucial to solve more general problems where the unknowns enter in a nonlinear way (and/or there are constraints).
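A minimal gradient-descent sketch for the same loss $L(c) = \Vert Ac - y\Vert^2$ (assuming the `A` and `y` from the least-squares snippet above; the step size `s` is chosen by hand and is an illustrative value, not a recommendation):

```julia
function grad_descent(A, y; s=1e-3, iters=10_000)
    c = zeros(size(A, 2))
    for _ in 1:iters
        c -= s * (2 * A' * (A * c - y))   # Δc = -s ∇L, with ∇L = 2Aᵀ(Ac - y)
    end
    return c
end

c_gd = grad_descent(A, y)   # approaches A \ y, but only slowly, since A is ill-conditioned
```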
In both cases, compared the effect of the monomial basis vs. the Chebyshev-polynomial basis. Because the former leads to an ill-conditioned matrix $A$ at high degrees, it is much more sensitive to roundoff and noise, and gradient descent converges much more slowly; the Chebyshev basis behaves far better.
Further reading: FNC book chapter 3. Strang's Introduction to Linear Algebra section 4.3 and 18.06 video lecture 16. There are many, many books and other materials on linear least-squares fitting, from many different perspectives (e.g. numerical linear algebra, statistics, machine learning…) that you can find online. The FastChebInterp.jl package in Julia does least-square fitting ("regression") with Chebyshev polynomials for you in an optimized way, including multi-dimensional fitting.
First, discussed the computational cost of interpolation and fitting.
- Solving the least-squares problem $\min_x \Vert Ax - b\Vert$ for an $m \times n$ matrix $A$ ($m$ points and $n$ basis functions) has $O(mn^2)$ cost, whether you use the normal equations $A^T A x = A^T b$ (which squares the condition number) or a more accurate method like QR factorization.
- Interpolation can be thought of as the special case $m=n$: solving an $n \times n$ linear system $Ax = b$ is $O(n^3)$ (usually done by LU factorization, which is the matrix-factor form of Gaussian elimination). However, for specific cases there are more efficient algorithms: for polynomial interpolation from arbitrary points, the barycentric Lagrange formula (mentioned above, and sketched just below) has $O(n^2)$ cost, and using the Chebyshev-polynomial basis from Chebyshev points has $O(n \log n)$ cost (via FFTs).
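A sketch of the second ("barycentric") form of the Lagrange interpolation formula (my own illustration; `bary_weights` and `bary_eval` are just illustrative names): $O(n^2)$ work to compute the weights once, then $O(n)$ per evaluation point.

```julia
function bary_weights(xs)
    n = length(xs)
    [1 / prod(xs[j] - xs[k] for k in 1:n if k != j) for j in 1:n]
end

function bary_eval(xs, ys, w, x)
    num = den = 0.0
    for j in eachindex(xs)
        x == xs[j] && return ys[j]        # exactly at an interpolation node
        t = w[j] / (x - xs[j])
        num += t * ys[j]
        den += t
    end
    return num / den
end

xs = [cos(π*j/10) for j in 0:10]          # Chebyshev points (a good choice of nodes)
ys = exp.(xs)
w = bary_weights(xs)
bary_eval(xs, ys, w, 0.3)                 # ≈ exp(0.3)
```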
The computational scaling becomes even more important when you go to higher dimensions: multivariate interpolation and fitting. We discussed a few cases, using two dimensions (2d) for simplicity:
- If you have a general basis $b_k(x,y)$ of 2d basis functions, you can form a Vandermonde-like matrix $A$ as above (whose columns are the basis functions and whose rows are the grid points), and solve it as before. The matrices get much larger in higher dimensions, though!
- If you can choose a Cartesian "grid" of points, also known as a "tensor product grid" (a grid in x by a grid in y), then it is convenient to use separable basis functions $p_i(x) p_j(y)$, e.g. products of polynomials. While you can still form a Vandermonde-like system as above, it turns out that there are much more efficient algorithms in this case. (The ideal case is a tensor product of Chebyshev points along each dimension, in which case you can interpolate products of Chebyshev polynomials in $O(n \log n)$ cost.)
- If you have a tensor product grid, you can treat it as a collection of rectangles, and do bilinear interpolation on each rectangle — this is the obvious generalization of piecewise-linear interpolation in 1d (a minimal sketch follows this list). It is fast and second-order accurate (and there are higher-order versions like bicubic interpolation too).
- If you have an irregular set of points, there is still an analogue of piecewise-linear interpolation. One first connects the set of points into triangles in 2d (or tetrahedra in 3d, or more generally "simplices"); this is a tricky problem of computational geometry, but a standard and robust solution is called Delaunay triangulation. Once this is done, you can interpolate within each triangle (or simplex) using an affine function (linear + constant: $a^T x + \beta$).
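As promised above, a minimal bilinear-interpolation sketch for a single rectangle $[x_1,x_2]\times[y_1,y_2]$ (in the multi-rectangle case you would first locate the rectangle containing the query point):

```julia
# Bilinear interpolation from the four corner values f11 = f(x1,y1), f21 = f(x2,y1),
# f12 = f(x1,y2), f22 = f(x2,y2).
function bilinear(x, y, x1, x2, y1, y2, f11, f21, f12, f22)
    s = (x - x1) / (x2 - x1)     # fractional position in x
    t = (y - y1) / (y2 - y1)     # fractional position in y
    return (1-s)*(1-t)*f11 + s*(1-t)*f21 + (1-s)*t*f12 + s*t*f22
end

f(x, y) = sin(x) * cos(y)
bilinear(0.25, 0.6, 0.0, 0.5, 0.5, 1.0,
         f(0.0,0.5), f(0.5,0.5), f(0.0,1.0), f(0.5,1.0))   # ≈ f(0.25, 0.6), to second order
```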
Further reading: FNC book section 2.5 on efficiency of solving Ax=b; much more detail on both this and the least-squares case can be found in e.g. Trefethen & Bau's Numerical Linear Algebra and many other sources. Links to the barycentric formula can be found above, along with fast algorithms for Chebyshev interpolation. Tensor-product-grid interpolation and fitting products are also closely related to Kronecker products of matrices, and there are often more efficient algorithms than simply forming the giant "Vandermonde" matrix and solving it.
- Radial basis function (RBF) interpolation: slides
- Quadrature: Notes on error analysis of the trapezoidal rule and Clenshaw-Curtis quadrature in terms of Fourier cosine series, and a quick review of cosine series.
To begin with, briefly touched an another popular method for multidimensional interpolation: radial basis functions. This can be understood in the same computational framework as polynomial interpolation: we form a "Vandermonde" matrix equation Ac=b, where the rows are data points and the columns are basis functions, to solve for the coefficients c. In this case, the basis functions are radially symmetric "bumps" $\phi(\Vert x - x_k \Vert)$ centered on the data points $x_k$ (for example Gaussians), which works naturally for scattered points in any number of dimensions.
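A small sketch of the idea, assuming a Gaussian kernel with an arbitrarily chosen width σ (real RBF codes choose the kernel and its parameters much more carefully):

```julia
using LinearAlgebra

σ = 0.5
ϕ(r) = exp(-(r/σ)^2)                              # Gaussian radial basis function

pts = [rand(2) for _ in 1:50]                     # 50 scattered points in [0,1]²
f(p) = sin(2p[1]) * cos(3p[2])
y = [f(p) for p in pts]

A = [ϕ(norm(p1 - p2)) for p1 in pts, p2 in pts]   # "Vandermonde"-like RBF matrix
c = A \ y                                          # interpolation coefficients

rbf(p) = sum(c[k] * ϕ(norm(p - pts[k])) for k in eachindex(pts))
rbf([0.3, 0.4])                                    # roughly f([0.3, 0.4])
```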
Launched into a new topic: Numerical integration ("quadrature"). In general, a "quadrature rule" is a scheme for estimating a definite integral ∫f(x)dx by a sum ∑ₖf(xₖ)wₖ of f(x) evaluated at N (or N+1) quadrature points xₖ with quadrature weights wₖ. The goal is to choose these quadrature points and weights so that the estimate converges to the true integral as quickly as possible with N. (The typical assumption is that evaluating f(x) is the dominant computational cost, so we want to minimize function evaluations.) Quadrature is closely related to interpolation: most quadrature schemes (especially in 1d) proceed by picking points, interpolating the function between the points somehow (usually by polynomials), and then integrating the interpolant (which is easy with polynomials).
Began by analyzing the two simplest schemes with equally spaced points: piecewise-linear interpolation leads to a composite trapezoidal rule with $O(1/N^2)$ error for smooth integrands, while piecewise-constant interpolation leads to a rectangle rule (a Riemann sum) that is generally only $O(1/N)$ accurate.
These error estimates are upper bounds, but some functions do much better! For periodic functions, the trapezoidal and rectangle rules are equivalent, and for smooth ("analytic") periodic functions they converge exponentially fast with $N$.
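A quick numerical illustration of both convergence rates (my own demo, not the course notebook):

```julia
# Composite trapezoidal rule: error ∝ 1/N² for a smooth non-periodic integrand,
# but exponentially small for a smooth periodic integrand over a full period.
trapz(f, a, b, N) = ((b - a)/N) * (f(a)/2 + sum(f(a + k*(b-a)/N) for k in 1:N-1) + f(b)/2)

g(x) = exp(cos(x))                 # smooth and 2π-periodic
ref = trapz(g, 0, 2π, 10_000)      # essentially exact reference value (= 2π I₀(1))

for N in (10, 20, 40, 80)
    println("N = $N:  exp on [0,1] error = ", abs(trapz(exp, 0, 1, N) - (exp(1) - 1)),
            "   periodic error = ", abs(trapz(g, 0, 2π, N) - ref))
end
```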
Further reading (RBFs): A very readable overview of radial basis functions can be found in this blog post by Shih-Chin. Much deeper coverage can be found in the book Meshfree Approximation Methods With Matlab (Fasshauer, 2007) or in this review by Buhmann (2000).
Further reading (quadrature): FNC section 5.6. Lloyd N. Trefethen, "Is Gauss quadrature better than Clenshaw-Curtis?," SIAM Review 50 (1), 67-87 (2008). Trefethen's Six Myths of Polynomial Interpolation and Quadrature (2011) is a shorter and more colloquial description of some of these ideas (and more!). A related polynomial interpolation method (in some sense a generalization of quadrature by Chebyshev polynomials/points) is Gaussian quadrature (and its many variants), whose accuracy is analyzed in the Trefethen papers above, and the state of the art for computing which is Hale and Townsend (2012); a suboptimal but beautifully simple algorithm was described by Golub & Welsch (1969).
- pset 2 solutions
- pset 3: due Friday, Feb 28.
Continued analysis from Lecture 8 (see notes). We related the convergence rate of trapezoidal rule to the convergence rate of the Fourier cosine series, and showed (using integration by parts) that the convergence rate of the cosine series is determined by the behavior of the odd-order derivatives at the boundaries (assuming that the function is smooth in the interior). This reproduces the $O(1/N^2)$ convergence of the trapezoidal rule for generic smooth functions, whose odd derivatives generally do not match at the two endpoints, as well as the much faster convergence for smooth periodic functions, for which the endpoint terms cancel.
Moreover, showed how we can arrange for this fast convergence to occur all the time: for an arbitrary smooth $f(x)$ on $[-1,1]$, the change of variables $x = \cos\theta$ expresses the integral in terms of the Fourier cosine-series coefficients of the smooth periodic function $f(\cos\theta)$, whose rapid decay yields rapidly converging quadrature; this is the idea behind Clenshaw–Curtis quadrature.
Finally, if we change variables back to $x = \cos\theta$, the equally spaced $\theta$ samples used to compute those cosine-series coefficients become the Chebyshev points $x_k = \cos(k\pi/N)$, so Clenshaw–Curtis quadrature can equivalently be viewed as integrating a Chebyshev-polynomial interpolant of $f$.
Further reading: See further reading for lecture 8, and for lecture 4 on Chebyshev polynomials.
- quadrature overview slides: now that we've analyzed trapezoidal rule and Clenshaw–Curtis, let's zoom out to survey the bigger picture
Overview of the big picture of quadrature algorithms: Clenshaw–Curtis is not the end!
Discussed Richardson extrapolation, which is a powerful general technique to compute limits $y(0^+) = \lim_{h \to 0^+} y(h)$ accurately from evaluations $y(h_i)$ at a sequence of nonzero values $h_1 > h_2 > \cdots > 0$ (for example, finite-difference approximations as the stepsize $h$ goes to zero). The basic algorithm works as follows:
- It computes many extrapolations, formed by degree-$q$ polynomial interpolations of all consecutive subsequences of $q+1$ points $(h_i, y(h_i))$, for all possible degrees $q$.
- All of these extrapolations are computed efficiently at the same time (in $O(n^2)$ cost for $n$ points) by a linear recurrence relation where each degree-$q$ extrapolation is computed from two degree-$(q-1)$ extrapolations (forming a table called a "Neville tableau" or "Neville–Aitken tableau": it is a version of Neville's algorithm for polynomial interpolation).
- Each degree-$q > 0$ extrapolant comes with an error estimate, given by its difference from the degree-$(q-1)$ estimate ending at the same $h_i$ point. This allows Richardson's method to be robust and adaptive: by using the extrapolant with the smallest error estimate for the final $y(0^+)$ result, it automatically selects a good subsequence of $h_i$ values (neither so large that polynomial fitting doesn't work, nor so small that e.g. roundoff/cancellation errors dominate).
Famous applications of Richardson extrapolation include Romberg integration (extrapolating low-order quadrature formulas), the Bulirsch–Stoer ODE method (extrapolating ODE solutions to zero stepsize), and an algorithm by Ridders (1982) for extrapolating finite-difference derivatives (also reviewed in Numerical Recipes sec. 5.7).
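A minimal sketch of such a Neville-style tableau, applied to extrapolating a forward-difference derivative to $h \to 0$ (my own illustration, not the Richardson.jl implementation; the starting $h$ and the halving schedule are arbitrary choices):

```julia
y(h) = (sin(1 + h) - sin(1)) / h       # forward difference: error has terms h, h², …

n = 6
hs = [0.1 / 2^i for i in 0:n-1]        # h halved at each level
T = [y(h) for h in hs]                 # degree-0 column of the tableau
for q in 1:n-1                         # overwrite in place with degree-q extrapolants
    for i in n:-1:q+1
        T[i] += (T[i] - T[i-1]) * hs[i] / (hs[i-q] - hs[i])   # Neville recurrence at h = 0
    end
end
println("extrapolated ≈ ", T[n], "   exact = ", cos(1))
```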
Further reading: Unfortunately, almost all descriptions of Richardson extrapolation combine it with a particular application, e.g. many textbooks only describe it in the context of ODEs, or integration, or differentiation. These course notes by Flaherty are in the ODE context, but are written in a general enough way that you can see the applicability to other problems, and they discuss adaptive error estimation briefly at the end. The Richardson.jl package in Julia implements the algorithm in a very general setting, and its documentation includes a number of examples; it's used by the Romberg.jl package for Romberg integration and by the FiniteDifferences.jl package for extrapolating finite-difference approximations. In Python, I'm not currently aware of any general-purpose implementation (though there are implementations in the context of e.g. derivatives or integration).
- pset 3 solutions
- pset 4: due Friday, March 7
Root finding is the problem of solving $f(x) = 0$ for $x$, given a function $f(x)$; in general this must be done by iterative algorithms that produce a sequence of guesses converging to a root.
The most famous such algorithm is Newton's method, which you probably learned in first-year calculus. The key idea is to use the derivative $f'(x)$ to form a linear (first-order Taylor) approximation of $f$ around the current guess $x_k$, and to take the next guess to be the root of that linear approximation: $x_{k+1} = x_k - f(x_k)/f'(x_k)$.
In calculus, it was enough to set up this algorithm, try it out, and see that it works pretty well (if you have a reasonable starting point). In numerical analysis, we want to be more precise: how fast does it converge, once $x_k$ is close to a root? It turns out that Newton's method converges quadratically: roughly speaking, the number of correct digits doubles on every iteration (until you hit the limits of floating-point precision).
Gave a numerical demo in which we applied Newton's method to find square roots $\sqrt{a}$, i.e. roots of $f(x) = x^2 - a$ (for which the Newton step simplifies to $x_{k+1} = (x_k + a/x_k)/2$), and observed the quadratic convergence directly in the number of correct digits.
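A sketch of such a demo, assuming the $f(x) = x^2 - a$ formulation above:

```julia
# Newton's method for √2: x_{k+1} = x_k - f(x_k)/f'(x_k) with f(x) = x² - 2.
a = 2.0
x = 1.0                                  # starting guess
for k in 1:6
    global x = x - (x^2 - a) / (2x)      # Newton step (equivalently x = (x + a/x)/2)
    println("iteration $k: error = ", abs(x - sqrt(a)))
end
```

The printed error shrinks roughly quadratically (the exponent doubles each step) until it hits machine precision.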
Another important extension of Newton's method is to systems of $n$ nonlinear equations in $n$ unknowns, $f(x) = 0$ with $x \in \mathbb{R}^n$: the derivative is then the $n \times n$ Jacobian matrix $J$, and each Newton step solves the linear system $J(x_k)\,\delta = -f(x_k)$ for the update $x_{k+1} = x_k + \delta$.
Newton's method works well if you start with a guess that is "reasonably close" to the root. For starting points that are far from the root, however, it can be wildly unpredictable in fascinating ways, leading to Newton fractals.
Further reading: FNC book, chapter 4, especially section 4.3 on Newton's method. A key class of methods that we didn't cover involve what to do when you don't have access to the derivative (or in the case of large nonlinear systems the whole Jacobian matrix might be too expensive to compute or even store). In this case, there are many algorithms:
- Fixed-point iteration attempts to solve $g(x)=x$ (where $g(x) = f(x) + x$) by iterating $x_{k+1} = g(x_k)$. This may not converge at all unless $|g'(r)| < 1$, but it is the starting point for the much more robust and fast-converging Anderson-accelerated fixed-point iteration, which itself is closely related to "quasi-Newton" methods.
- In 1d, secant methods are Newton-like methods that perform linearization (or higher-order approximation) by interpolating two (or more) previous points. Generalization to multiple dimensions yields quasi-Newton methods such as Broyden's method and Anderson acceleration.
- Newton–Krylov methods only require you to compute Jacobian–vector products (JVPs) $f'(x_k)\delta$ for given $x_k, \delta$, which corresponds to a single directional derivative — this is much cheaper than computing the entire Jacobian in high dimensions, at the price of using an iterative linear solver to find each Newton step.

Another important class of methods are numerical continuation methods, which solve a family of root problems $f(x,\lambda) = 0$ for the roots $x(\lambda)$ at each value of the parameter $\lambda$. Given a root at one $\lambda$, the simplest algorithm is to use this as the starting point for Newton's method to find the root at a nearby $\lambda$, but there are much more sophisticated methods such as pseudo-arclength continuation. A modern implementation of numerical continuation can be found in, for example, the BifurcationKit.jl package in Julia. There are many books and reviews on this topic, e.g. Herbert B. Keller, Lectures on Numerical Methods in Bifurcation Problems (Springer, 1988).
- lecture notes/code: [One Note](https://mitprod-my.sharepoint.com/personal/qiqi_mit_edu/_layouts/15/Doc.aspx?sourcedoc={24321bd2-8c69-451a-b0cc-c2b42aed5743}&action=view&wd=target%28B.%20Initial%20Value%20Problems%2F20250303.one%7C6663d749-3747-45de-bb34-f8393732d800%2F%29&wdorigin=717)
Numerical methods for ordinary differential equations (ODEs). Introduced the concept of an initial value problem: given $\frac{du}{dt} = f(u, t)$ and an initial condition $u(0) = u_0$, approximate $u(t)$ at later times by marching forward in small steps $\Delta t$, e.g. with the forward-Euler scheme $u_{k+1} = u_k + \Delta t\, f(u_k, t_k)$, where $u_k \approx u(k\Delta t)$.
Looked at an example problem, $\frac{du}{dt} = -u$ (exact solution $u_0 e^{-t}$), and compared the numerical solutions and errors of the forward-Euler and "midpoint" discretizations as the stepsize $\Delta t$ is reduced.
Moreover, we used the "stencil" algorithm from pset 1 to derive a 3rd-order finite difference approximation, and found that the behaviour was even worse than the midpoint rule: even for a fixed time $t$, the numerical solution diverged as $\Delta t$ was decreased, a puzzle that will be resolved by the stability analysis of the next lecture.
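For reference, here is a minimal forward-Euler sketch on the $du/dt = -u$ example (my own illustration), showing the expected first-order $O(\Delta t)$ convergence of the error at a fixed time:

```julia
function euler(f, u0, Δt, T)
    u = u0
    for _ in 1:round(Int, T/Δt)
        u += Δt * f(u)          # one forward-Euler step
    end
    return u
end

for Δt in (0.1, 0.05, 0.025, 0.0125)
    err = abs(euler(u -> -u, 1.0, Δt, 1.0) - exp(-1))
    println("Δt = $Δt:  error at t = 1 is $err")   # error roughly halves as Δt halves
end
```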
Further reading: FNC book, chapter 6, sections 6.1–6.2. FENA book, chapter 4 and section 4.1. You can also find hundreds of other web pages and videos on these topics. 3Blue1Brown has an entertaining introduction to the idea of a differential equation. And here is a nice video about the history of numerical ODE solvers, which talks about the pioneering contributions of Katherine Johnson and her portrayal in the 2016 film Hidden Figures.
- notes and code: [One Note](https://mitprod-my.sharepoint.com/personal/qiqi_mit_edu/_layouts/15/Doc.aspx?sourcedoc={24321bd2-8c69-451a-b0cc-c2b42aed5743}&action=view&wd=target%28B.%20Initial%20Value%20Problems%2F20250305.one%7Cd1ccb4e9-d0d8-414c-98cd-f1392b3d5277%2F%29&wdorigin=717) [Code](https://colab.research.google.com/drive/1fmyyPSp5gImrcj42wxxn3kBqTDavq5qJ?usp=sharing)
Stability of numerical ODE methods. The term "stability" has special meanings in this context, different from other areas of numerical analysis where "stability" is mostly about the sensitivity of an algorithm to roundoff errors. For ODE methods, "stability" refers to whether the errors introduced at each time step (truncation errors) can grow without bound as the number of steps increases. We distinguish several related properties:
- Consistency: Do the local truncation errors of a single time step $u_{n+1} = \cdots$ go to zero as $O(\Delta t^{1+p})$ for $p > 0$? That is, are we correctly approximating $\frac{du}{dt} - f(u,t)$ to some positive order $O(\Delta t^p)$?
- Zero stability: With a right-hand side $f=0$, the method is zero-unstable if a nonzero initial condition can diverge ($\Vert u_k \Vert \to \infty$) as the step $k \to \infty$ (with $\Delta t$ cancelling from the analysis for $f=0$, so this can be viewed as a divergence as $\Delta t \to 0$ for a fixed $t$).
- Linear stability: Later, we will include the right-hand side in the stability analysis by linearizing it when $u$ is close to a root of $f$, especially for autonomous ODEs $f(u,t) = f(u)$. Through this analysis, we will show that some methods diverge for a fixed $\Delta t$ as you increase the total time $t$, perhaps unless $\Delta t$ is sufficiently small (conditional stability).
An important result is the Dahlquist equivalence theorem (closely related to the Lax (or Lax–Richtmyer) equivalence theorem):
- If a method is both consistent and zero-stable, then it is ("globally") convergent: the approximate solution $\tilde{u}(t)$ approaches the exact solution $u(t)$ at any time $t$, as $\Delta t \to 0$, with a convergence rate $\Vert\tilde{u}(t) - u(t)\Vert = O(\Delta t^p)$ matching the local truncation error.
That is, there are only two ways an ODE method could go badly wrong: either you have a bug in your finite-difference approximation (it's inconsistent with the equation you are discretizing, i.e. not approximating the right equation), or the solution diverges (it's unstable) as $\Delta t \to 0$.
All three of the schemes (Euler, midpoint, and third-order) from last lecture were consistent (with $p = 1$, $2$, and $3$ respectively), so the divergence of the third-order scheme must be a stability problem.
The trick to analyze zero stability is to set $f = 0$, in which case the time-stepping scheme becomes a linear recurrence in the previous $u$ values; writing one step of this recurrence as multiplication of a vector of the most recent values by a small "companion" matrix, the question of whether $u_k$ can blow up as $k \to \infty$ becomes a question about the eigenvalues of that matrix (any eigenvalue with magnitude greater than 1 produces blowup).
Showed by this analysis that Euler is zero-stable (its only eigenvalue is 1), as is the midpoint rule (eigenvalues $\pm 1$), but the third-order scheme from last lecture is zero-unstable: it has an eigenvalue with magnitude greater than 1, which explains the divergence we observed even as $\Delta t \to 0$.
Further reading: FNC book, section 6.8: zero stability (which used an alternative framing equivalent to eigenvalues, but without explicitly forming the matrix). FENA book, chapter 4.2 (which used the eigenvalue formulation). For the general problem of analyzing matrix powers and linear recurrences via eigenvalues, you may want to review some material from 18.06: Strang Intro. to Linear Algebra section 6.2, and 18.06 OCW lecture 22 (diagonalization and matrix powers).
- notes and code: One Note Code
- pset 4 solutions
- pset 5: due Friday March 14
Linear stability analysis of ODEs and discretization schemes.
- The exact ODE $\frac{du}{dt} = \lambda u$ has exponentially growing solutions for $\Re \lambda > 0$, and non-growing ("stable") solutions for $\Re \lambda \le 0$. This analysis extends to linear autonomous ODEs $\frac{du}{dt} = A u$ where $A$ is a (diagonalizable) matrix, since we can just check each eigenvalue $\lambda$ of $A$. (Later, we will also extend this analysis to nonlinear autonomous ODEs $\frac{du}{dt} = f(u)$ by approximately linearizing $f(u)$ around a root using the Jacobian of $f$.)
- When we discretize the ODE, we can plug $\lambda u$ (or $Au$) in for the right-hand side, for a fixed $\Delta t$, and again use eigenvalues to analyze whether $u_k \approx u(k\Delta t)$ is growing or decaying with $k$. (This reduces to "zero stability" analysis in the limit $\Delta t \to 0$, for which the right-hand side disappears from the formula for $u_k$.)
In this way, we find that certain discretization schemes are linearly stable (non-growing $u_k$) only for some values of $\lambda \Delta t$; the set of $\lambda \Delta t$ values for which a scheme is stable is called its stability region.
To begin with, we analyzed the forward Euler ("explicit") scheme $u_{k+1} = u_k + \Delta t\, f(u_k, t_k)$: plugging in $f = \lambda u$ gives $u_{k+1} = (1 + \lambda \Delta t) u_k$, so the numerical solution is non-growing if and only if $|1 + \lambda \Delta t| \le 1$.
- For $\frac{du}{dt} = -u$, $\lambda = -1$, so forward Euler is stable for $0 \le \Delta t \le 2$. (This is why it performed so well numerically.) (The fact that it is only stable for certain values of $\Delta t$ in this case is called conditional stability; note that this depends on the right-hand side of the ODE!)
- For $\frac{d^2 u}{dt^2} = -u$, we saw last time that this is equivalent to the $2 \times 2$ system $\frac{d}{dt} \begin{pmatrix} u \\ v \end{pmatrix} = \underbrace{\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}}_{A} \begin{pmatrix} u \\ v \end{pmatrix}$, where $A$ has eigenvalues $\lambda = \pm i$. In this case $|1 + \lambda \Delta t| > 1$ for all $\Delta t > 0$, and forward Euler is linearly unstable. It is still zero stable, so it still converges as $\Delta t \to 0$ for any fixed time $t$! But instead of oscillating solutions (for the exact ODE), we have solutions that are oscillating and slowly exponentially growing (more and more slowly as $\Delta t$ gets smaller, which allows it to converge). Even though the solutions converge, people are often unhappy with ODE methods that generate exponential growth (not present in the exact ODE) as you run for longer and longer times!
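A small sketch of the second case (my own illustration): forward Euler on the oscillator system multiplies the norm of the solution by exactly $\sqrt{1 + \Delta t^2}$ per step, i.e. by roughly $e^{t\Delta t/2}$ at time $t$, which shrinks toward 1 as $\Delta t \to 0$.

```julia
using LinearAlgebra

A = [0.0 1.0; -1.0 0.0]
Δt = 0.1
u = [1.0, 0.0]                    # u(0) = 1, u'(0) = 0; the exact solution is cos(t), norm 1
for _ in 1:round(Int, 10 / Δt)    # integrate to t = 10
    global u = u + Δt * (A * u)   # forward-Euler step
end
norm(u)                           # ≈ 1.64 = (1 + Δt²)^50, even though the exact norm stays 1
```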
Next, we analyzed the backward Euler ("implicit") scheme $u_{k+1} = u_k + \Delta t\, f(u_{k+1}, t_{k+1})$: for $f = \lambda u$ it gives $u_{k+1} = u_k / (1 - \lambda \Delta t)$, which is non-growing whenever $|1 - \lambda \Delta t| \ge 1$; in particular, it is stable for every $\Delta t > 0$ whenever $\Re \lambda \le 0$ (unconditional stability).
- This is why people use implicit ODE schemes: they are often more complicated to implement, because $f(u_{k+1})$ appears on the right-hand side, requiring you to solve for $u_{k+1}$ (which gets expensive as $f$ gets more complicated), but they tend to be more stable than explicit schemes.
Further reading: FENA book, section 4.3.
- notes and code: see links above
Analyzed the linear stability of the midpoint rule, and found that it was only stable for a small range of purely imaginary $\lambda \Delta t$ values (a segment of the imaginary axis).
To illustrate the utility of "implicit" schemes like backwards Euler, which are more difficult to implement and require a more expensive time-stepping procedure (solving a system of equations on each step, possibly even a nonlinear system), considered a simple example problem involving heat flow. In a system where heat can flow quickly between some components (e.g. metals in contact) but slowly between other components (e.g. between metals and air), one obtains a "stiff" ODE, characterized by a large ratio of timescales (eigenvalues).
With forward Euler (or other explicit methods) in a stiff problem, a small $\Delta t$ is forced on you by stability: the step size must resolve the fastest-decaying components (largest $|\lambda|$) even after they have decayed to insignificance, making the simulation needlessly expensive. An implicit method like backward Euler remains stable at much larger $\Delta t$, chosen for the accuracy of the slow components you actually care about, which often more than repays the extra cost of the implicit solve on each step.
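A toy stiff example (my own illustration, not the lecture's): $du/dt = -1000\,(u - \cos t)$, whose solution rapidly decays to the slowly varying quasi-steady state $u \approx \cos t$. Forward Euler with $\Delta t = 0.01$ violates the stability limit $\Delta t < 2/1000$ and blows up, while backward Euler (solved in closed form per step here, since the ODE is linear) stays accurate.

```julia
λ, Δt, T = 1000.0, 0.01, 2.0
n = round(Int, T / Δt)

u_fe, u_be = 0.0, 0.0
for k in 1:n
    t_new = k * Δt
    global u_fe = u_fe + Δt * (-λ * (u_fe - cos(t_new - Δt)))   # forward Euler: blows up
    global u_be = (u_be + Δt * λ * cos(t_new)) / (1 + Δt * λ)    # backward Euler: stable
end
println("forward Euler: ", u_fe, "   backward Euler: ", u_be, "   ≈ cos(2) = ", cos(2.0))
```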
Further reading: The FENA book, section 4.10, has some further discussion of stiff systems. There are many sources online about methods for stiff equations and implicit ODE methods. See e.g. these course notes from MIT 18.337/6.338, which focuses mainly on ways to construct the Jacobian (to linearize the right-hand-side) and/or efficiently perform the implicit solves on each step for the nonlinear case. The book by Griffiths and Higham (2010), along with many other similar books on numerical ODEs, contains a wealth of information, at a much more formal level than this course.
More on numerical ODEs: Runge–Kutta methods.