5  Differential Equations

How can computers solve differential equations?

Almost all differential equations which arise in mathematics and its applications do not have exact analytical solutions. This is a central motivation for numerical analysis, as well as for the historical development of increasingly powerful computers. Most scientific computing languages have pre-built packages for solving differential equations quickly and accurately using numerical approximations, and there are entire classes of software packages designed to do this for specialised industries, such as aerospace or finance. The goal of this chapter is to understand the fundamentals of numerical timestepping, so that you understand the key properties of these numerical methods and can choose an appropriate one for a given problem.

This topic is vast, with many textbooks detailing aspects of numerical differential equations from a variety of theoretical or applied perspectives. We will focus on building up the mathematical terminology of different classes of solvers, such as those available in MATLAB and described in detail here, as well as the basics of convergence theory and error analysis.

Symbolic tools can solve many of the analytically tractable classes of differential equations using a variety of algorithms and heuristics, but such equations form a sufficiently limited set that these tools are not as frequently used outside of theoretical areas. The MATLAB function dsolve can be used to solve a reasonably large class of ODEs symbolically.
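For instance, a minimal sketch of a symbolic solution (assuming the Symbolic Math Toolbox is available; the equation here is just an illustrative choice) might look like:

% Symbolic solution of du/dt = -2u with u(0) = 1 (requires the Symbolic Math Toolbox)
syms u(t)
ode  = diff(u,t) == -2*u;   % the ODE
cond = u(0) == 1;           % the initial condition
uSol = dsolve(ode, cond)    % returns exp(-2*t)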

5.1 Basic Concepts

We will focus on general systems of \(n\) first-order differential equations of the form \[ \dot{\mathbf{u}} = \frac{\mathrm{d}\mathbf{u}}{\mathrm{d}t} = \mathbf{f}(t,\mathbf{u}), \quad \mathbf{u}(t_0)=\mathbf{u}_0 \in \mathbb{R}^n. \] Many general classes of equations can be written in this form, and such systems also arise when discretising partial differential equations, as well as other kinds of models involving derivatives. When \(\mathbf{f}(t,\mathbf{u})=\mathbf{f}(\mathbf{u})\) (i.e. does not depend on time) we call the system autonomous.

It is reasonable to then ask: why do we only focus on first-order differential equations? This is because any ordinary differential equation of order \(n\) can be written as a system of \(n\) first-order ordinary differential equations by introducing new variables to represent each derivative up to order \(n-1\). This is a standard step in both numerical and analytical solution methods, as most modern solvers only handle systems of first-order ODEs.

Suppose we have an \(n\)th order ODE: \[ u^{(n)} = f\big(t, u, u', u'',\dotsc, u^{(n-1)}\big), \] where \(u^{(n)} := \mathrm{d}^n u/\mathrm{d}t^n\). Now introduce the auxiliary variables: \[ y_0 = u, \quad y_1 = u', \quad \ldots, \quad y_{n-1} = u^{(n-1)}. \] Then the new system is: \[ \begin{aligned} y_0' &= y_1 \\ y_1' &= y_2 \\ &\;\;\vdots \\ y_{n-2}' &= y_{n-1} \\ y_{n-1}' &= f\big(t, y_0, y_1, \dotsc, y_{n-1}\big) \end{aligned} \] This approach allows the use of vectorized notation and standard solution methods for systems of first-order ODEs.

Example 5.1
Convert the third-order ODE \[ u''' = t + 2u + 3u'' \] with initial conditions \(u(0) = 1\), \(u'(0) = 2\), \(u''(0) = 3\), into a system of three first-order ODEs.

Let \[ y_0 = u \qquad y_1 = u' \qquad y_2 = u'', \] and then deduce that \[ \begin{aligned} y_0' &= y_1 \\ y_1' &= y_2 \\ y_2' &= t + 2y_0 + 3y_2 \end{aligned} \] with initial conditions \[ y_0(0) = 1, \quad y_1(0) = 2, \quad y_2(0) = 3. \] We can also represent this system in matrix form as follows: \[ \frac{d}{dt} \begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 2 & 0 & 3 \end{bmatrix} \begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ t \end{bmatrix}. \]
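As a brief aside (not part of the example itself), this first-order system is exactly the form accepted by numerical solvers. A minimal sketch using MATLAB's ode45 might look like the following, where the time interval [0, 1] is an arbitrary illustrative choice:

% First-order system from Example 5.1: y0' = y1, y1' = y2, y2' = t + 2*y0 + 3*y2
f  = @(t,y) [y(2); y(3); t + 2*y(1) + 3*y(3)];
y0 = [1; 2; 3];                  % [u(0); u'(0); u''(0)]
[t, y] = ode45(f, [0 1], y0);    % y(:,1) approximates u(t)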

5.1.1 Preliminary ODE theory

Before discussing practical aspects of solving such equations, recall the basic existence and uniqueness theory. For ODEs this theory is not too complicated, provided the right-hand side is sufficiently well behaved.

Theorem 5.1
Picard–Lindelöf theorem.
Let \(\mathcal{D}\subset\mathbb{R}\times\mathbb{R}^n\) be an open rectangle with interior point \((t_0,\mathbf{u}_0)\). Suppose \(\mathbf{f}:\mathcal{D}\to\mathbb{R}^n\) is continuous in \(t\) and Lipschitz continuous in \(\mathbf{u}\) for all \((t,\mathbf{u})\in\mathcal{D}\). Then there exists \(\varepsilon>0\) such that the initial-value problem above has a unique solution \(\mathbf{u}(t)\) for \(t\in[t_0-\varepsilon,t_0+\varepsilon]\).

Essentially, this says that if \(\mathbf{f}\) is sufficiently nice then the ODE has a unique solution for each initial condition. The result follows from the contraction-mapping theorem or via an iterative scheme. If \(\mathbf{f}\) is globally Lipschitz (same Lipschitz constant for all \(\mathbf{u}\in\mathbb{R}^n\)) and continuous for all \(t\in\mathbb{R}\), then the solution exists for all \(t\). For autonomous systems, solution curves \(\mathbf{u}(t)\in\mathbb{R}^n\) cannot cross.

While this is theoretical, the theorem is important: there are ODEs without solutions or with non-unique solutions, and for PDEs existence and uniqueness are often much harder (and can fail).

Millennium Prize Problem: Navier–Stokes Equations

The Clay Mathematics Institute has offered a $1,000,000 prize for resolving one of the most important open problems in mathematics:

Do smooth and uniquely determined solutions always exist for the three-dimensional, incompressible Navier–Stokes equations, given reasonable initial data?

The Navier-Stokes equations are the fundamental equations of fluid mechanics. Resolving this longstanding problem would be a huge achievement in pure mathematics and would also revolutionise our understanding of turbulence in physics.

Example 5.2
The ODE given by \[ \frac{\mathrm{d}^2u}{\mathrm{d}t^2} = \sqrt{u}, \quad u(0) = \frac{\mathrm{d}u}{\mathrm{d}t}(0) = 0, \] which is equivalent to the first-order system \[ \frac{\mathrm{d}u}{\mathrm{d}t} = v, \quad \frac{\mathrm{d}v}{\mathrm{d}t} = \sqrt{u}, \quad u(0) = v(0) = 0, \] is a famous example of a system with non-unique solutions. Namely, for any \(T > 0\), there are infinitely many solutions to this equation given by \[ u(t) = \begin{cases} 0 & t < T,\\ \frac{1}{144}(t-T)^4 & t \geq T, \end{cases} \] in addition to the solution \(u(t) = 0\). This is also known as Norton’s Dome, which has an amusing interpretation as a model in Newtonian mechanics that breaks the notion of causality, as it represents a situation where a particle can spontaneously start moving after an arbitrary amount of time.

5.2 Finite Difference Methods

Moving to practical questions, we now consider how to approximate solutions to the general ODE system using a computer. We have already seen in the previous chapter how to approximate the derivative using what are known as finite differences. These approximations are the main ingredients we will use to develop our numerical approaches.

The most well-known numerical method is commonly referred to as the forward Euler method, which is given by taking the forward difference approximation of the derivative on the left-hand side of the ODE system and rearranging the equation to obtain: \[ \mathbf{u}(t + \Delta t) = \mathbf{u}(t) + \Delta t\, \mathbf{f}(\mathbf{u}(t)), \] where we are now using \(\Delta t\) to denote a small time step, rather than \(h\), and where we write the autonomous case \(\mathbf{f}(\mathbf{u})\) to keep the notation light (for non-autonomous problems one simply evaluates \(\mathbf{f}(t,\mathbf{u}(t))\)). This is essentially the same approximation for the gradient of a function illustrated in Chapter 4, except now applied to the solution of an ODE. We can then iterate this formula starting at the initial condition to find an approximate solution at an arbitrary time \(t\).

We can use a more natural notation for this approximation by taking \(t \approx n\Delta t\) and letting \(\mathbf{u}_n \approx \mathbf{u}(n\Delta t)\). The forward Euler method can then be written as the iteration: \[ \mathbf{u}_{n+1} = \mathbf{u}_n + \Delta t \mathbf{f}(\mathbf{u}_n). \]

How accurate is this method for approximating the true solution? How can we know that this method is convergent; that is, if we take \(\Delta t \to 0\), can we ensure that \(\mathbf{u}_n \to \mathbf{u}(t)\)? These are more subtle issues compared to numerical differentiation, as here we are using previous approximations for each subsequent timestep, and the dynamics of these schemes can play important roles in determining convergence. Let’s consider a simple example to illustrate these ideas.

5.2.1 The van der Pol Oscillator

The van der Pol oscillator is given by \[ \frac{\mathrm{d}^2 u}{\mathrm{d} t^2} - C\left(1-u^2 \right )\frac{\mathrm{d} u}{\mathrm{d} t} + u = 0, \] which can be converted to the first-order system: \[ \frac{\mathrm{d} u}{\mathrm{d} t} = v, \quad \frac{\mathrm{d}v}{\mathrm{d} t} = C\left(1-u^2\right)v - u. \]

For \(C=0\), this is the simple harmonic oscillator. For \(C>0\), the extra term means that for \(|u|>1\), the oscillations are damped, but for \(|u|<1\), there is an extra force due to “negative damping” driving the system away from \(u=0\).

We can first solve the simple case of \(C=0\) using the forward Euler scheme. The result can be compared against MATLAB's built-in adaptive higher-order solver, ode45.

% Forward Euler for the van der Pol equation with C = 0 (the simple harmonic oscillator),
% compared against a reference solution from MATLAB's ode45 command

C = 0; % Parameter C in the model.
% The ODE rhs function as an "anonymous function".
f = @(t,u)  [u(2); -C*(u(1)^2 - 1).*u(2) - u(1)];

U0 = [2; -0.65];  % Initial condition
tspan = linspace(0,20,1e3); % Time span

% Set tight tolerances so that ode45 gives an accurate reference solution
options = odeset('RelTol',1e-11,'AbsTol',1e-11);
[~,u_ode45] = ode45(f, tspan, U0, options);

% Forward Euler setup (timestep, vector of solution points etc)
dt = 0.15; n = round(tspan(end)/dt); u_Euler = NaN(2,n);

u_Euler(:,1) = U0; % Initial condition for Euler method
for i = 1:n-1
    t = (i-1) * dt; % Current time (unused here, since the RHS does not depend on t)
    u_Euler(:,i+1) = u_Euler(:,i) + dt * f(t, u_Euler(:,i));
end

Running this code and plotting the outputs, we obtain the following graph:

The inset shows that the first few steps of the method match the solution reasonably well. However, over time it appears that the error builds up, and the amplitudes grow (despite the correct solution having the same amplitude for all time). Reducing the timestep will improve this, but eventually the amplitude will always begin to grow, at a rate which will become approximately exponential.

We can see how this scheme behaves with the timestep more clearly by considering the nonlinear van der Pol oscillator with \(C=3\), shown below for four different choices of timestep \(\Delta t\):

We observe convergence towards a similar solution profile as \(\Delta t\) decreases. However, there is still a buildup of error as \(t\) increases, meaning that we will need to consider local errors over one timestep, as well as global errors over iterative schemes for many timesteps.

We can formalize these ideas in terms of the local truncation error (LTE), which is essentially how much the approximation given in the forward Euler method fails to exactly satisfy the ODE. We can compute this by substituting in the true solution, \(\mathbf{u}(t)\), and using Taylor series expansions to determine the error.

Example 5.3
LTE of forward Euler

Expanding \(\mathbf{u}(t)\) in a Taylor series, we find: \[ \mathbf{u}(t+\Delta t) = \mathbf{u}(t) + \Delta t \frac{\mathrm{d}\mathbf{u}}{\mathrm{d}t} + O(\Delta t^2) = \mathbf{u}(t) + \Delta t\, \mathbf{f}(\mathbf{u}(t)) + O(\Delta t^2), \] so the true solution satisfies the forward Euler update \(\mathbf{u}_{n+1} = \mathbf{u}_n + \Delta t\, \mathbf{f}(\mathbf{u}_n)\) up to an error of \(O(\Delta t^2)\); that is, this method has an LTE of \(O(\Delta t^2)\). Note, however, that we can do precisely the same calculation before rearranging (via the approximation of the derivative directly) as: \[ \frac{\mathbf{u}(t+\Delta t) - \mathbf{u}(t)}{\Delta t} = \frac{\mathrm{d}\mathbf{u}}{\mathrm{d}t} + O(\Delta t) = \mathbf{f}(\mathbf{u}(t)) + O(\Delta t), \] which implies an LTE of \(O(\Delta t)\). These two definitions are both used, despite being somewhat inconsistent. We will adopt the former definition, which is sometimes called the single-step error.

An integration scheme whose single-step error is \(O(\Delta t^2)\) or smaller (equivalently, whose error in the derivative form above is \(O(\Delta t)\) or smaller) is called consistent. Essentially, a consistent scheme is one where the approximations are equivalent to a collection of Taylor series approximations, and hence, subject to various smoothness assumptions, we expect to be able to make the error tend to \(0\) for small enough time steps. However, consistency alone is not enough to ensure that a numerical scheme converges to the analytical solution of the original ODE.

5.3 Stability of Finite Difference Schemes

A consistent scheme may fail to converge to the true solution if the method is not stable. To illustrate stability, we can repeat exactly the experiment shown above for the van der Pol oscillator, but with \(C=4\) instead. The code gives us this output:

The solutions for \(\Delta t \leq 0.01\) appear very much as they did before, but the solution for \(\Delta t = 0.1\) now rapidly increases. MATLAB says that it has reached \(u_{70} \approx 10^{172}\), meaning that after 70 timesteps (i.e., \(t \approx 7\)), it has blown up in such a way that numerical calculations on these values are no longer meaningful. This illustrates a general property of numerical methods for differential equations known as stability, which is essentially the idea that we want numerical methods not to make successive iterations worse by compounding small errors. To be precise, let’s focus on a particular equation which serves as a model for this phenomenon.

5.3.1 The Dahlquist Problem

We consider the Dahlquist test problem, which is the following simple linear equation: \[ \frac{\mathrm{d}u}{\mathrm{d}t} = \lambda u, \] where \(\lambda \in \mathbb{C}\) is a given complex parameter. While this equation is simple, it gives insight into any solution of our original problem if we “linearize” the equations around a single point in time. Importantly, this problem also allows us to make analytical progress in determining the stability of a general numerical scheme. Analytically, we expect the ODE to have decaying solutions for \(\Re(\lambda)<0\), but the iterations themselves can sometimes cause the opposite behavior of solutions growing without bound. It is easiest to see this by working with an example.

Example 5.4
Stability of forward Euler

Let’s apply the forward Euler method to the Dahlquist problem. We have the iterations: \[ u_{n+1} = u_n + \Delta t \lambda u_n = (1+\Delta t \lambda)u_n, \quad \text{so that} \quad u_n = (1+\Delta t \lambda)^n u_0, \] where we have explicitly “solved” the difference equation by iterating it back to the initial condition. Importantly, this will blow up if \(|1 + \Delta t \lambda| > 1\), meaning that for real \(\lambda\), stability requires \[ -2 \leq \Delta t \lambda \leq 0. \]
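As a small illustrative experiment (the numbers here are arbitrary choices), we can watch this threshold directly: with \(\lambda = -10\), the timestep \(\Delta t = 0.19\) gives \(\Delta t\lambda = -1.9\) and decaying iterates, while \(\Delta t = 0.21\) gives \(\Delta t\lambda = -2.1\) and growing iterates.

% Forward Euler on du/dt = lambda*u for timesteps either side of the stability limit
lambda = -10; u0 = 1; nsteps = 200;
for dt = [0.19, 0.21]              % dt*lambda = -1.9 (stable) and -2.1 (unstable)
    u = u0;
    for n = 1:nsteps
        u = u + dt*lambda*u;       % u_{n+1} = (1 + dt*lambda)*u_n
    end
    fprintf('dt = %.2f: |u_%d| = %.3e\n', dt, nsteps, abs(u));
end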

Example 5.5
A consistent but unstable method

Consider the following numerical method for the test problem: \[ u_{n+1} = -4u_n + 5u_{n-1} + \Delta t(4\lambda u_n + 2\lambda u_{n-1}), \] which can be rewritten as: \[ u_{n+1} = 4(\Delta t \lambda - 1)u_n + (5 + 2\Delta t \lambda)u_{n-1}. \] We can solve this difference equation by rewriting it as a system, or by noting that it must have two solutions of the form \(u_n = C\mu^n\) for some \(\mu\) (analogous to how linear ODEs have solutions of the form \(e^{\mu t}\)). Substituting this in, and dividing through by \(C\mu^{n-1}\), we find that \(\mu\) satisfies: \[ \mu^2 = 4(\Delta t \lambda - 1)\mu + (5 + 2\Delta t \lambda), \] and solving this quadratic gives: \[ \mu = 2(\Delta t \lambda - 1) \pm \sqrt{4(\Delta t \lambda)^2 - 6 \Delta t \lambda + 9}. \] For \(\Delta t \ll 1\), we see that one of these solutions approaches \(\mu \approx -5\), so that \(u_n \sim (-5)^n\), implying that we expect this scheme to be unstable for all \(\lambda\) as long as \(\Delta t\) is sufficiently small. With more work, we can in fact show that at least one of the solutions \(\mu\) will always have \(|\mu| \geq \sqrt{19} > 1\) for all \(\lambda \in \mathbb{C}\) and \(\Delta t \in \mathbb{R}\), so this scheme is always unstable.
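We can also observe this instability numerically. The following minimal sketch (with the arbitrary choices \(\lambda = -1\) and \(\Delta t = 0.01\)) iterates the scheme starting from two values taken from the exact solution; the spurious root near \(-5\) quickly dominates even though the true solution decays.

% Iterate u_{n+1} = 4*(dt*lambda - 1)*u_n + (5 + 2*dt*lambda)*u_{n-1}
lambda = -1; dt = 0.01; nsteps = 30;
u = zeros(1, nsteps);
u(1) = 1; u(2) = exp(lambda*dt);   % start from two values of the exact solution
for n = 2:nsteps-1
    u(n+1) = 4*(dt*lambda - 1)*u(n) + (5 + 2*dt*lambda)*u(n-1);
end
disp(u(end))   % huge in magnitude, despite exp(lambda*t) decaying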

5.3.2 Stability Regions and A-Stability

For a given numerical method, we can use the Dahlquist problem to define stability regions in the complex \(\Delta t\lambda\)-plane. These essentially tell us for which values of \(\lambda\) and \(\Delta t\) we expect the numerical method to behave well. Importantly, the same analysis describes how truncation errors are amplified from one step to the next, which is why these stability regions are relevant to the behaviour of the method more generally.

For the forward Euler method, Example 5.4 shows that the stability region is the disc \(|1+\Delta t\lambda| \leq 1\). We see that, as long as \(\Re(\lambda)<0\), the scheme is stable for sufficiently small \(\Delta t\). In practice, this criterion can force us to take extremely small time steps when \(|\lambda|\) is large. A numerical method that is stable for every \(\Delta t>0\) whenever \(\Re(\lambda)<0\) is called A-stable, which is a very desirable property. For such methods, we can (usually) mostly ignore aspects of stability and decide on suitable timesteps based solely on accuracy considerations.

The simplest example of an A-Stable alternative is to consider a slight modification of the Euler scheme, where instead of taking a forward derivative, we take a backward derivative. This results in the backward Euler scheme: \[ \mathbf{u}_{n+1} = \mathbf{u}_n + \Delta t \mathbf{f}(\mathbf{u}_{n+1}). \] While this method looks very similar, in practice it is harder to implement as it is an implicit scheme, where the unknown \(\mathbf{u}_{n+1}\) appears inside of the function on the right-hand side. This means that to actually take a step, we have to solve the problem: \[ \mathbf{G}(\mathbf{u}_{n+1}) := \mathbf{u}_{n+1} - \Delta t \mathbf{f}(\mathbf{u}_{n+1}) = \mathbf{u}_n \implies \mathbf{u}_{n+1} = \mathbf{G}^{-1}(\mathbf{u}_n), \] where \(\mathbf{G}^{-1}\) is the inverse of the function \(\mathbf{G}\). In practice, this becomes a (possibly high-dimensional) rootfinding problem, though for some special functions \(\mathbf{f}\), it may be possible to evaluate it directly (e.g., when \(\mathbf{f}\) is a linear function).
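To make this concrete, here is a minimal sketch of a backward Euler loop for a scalar logistic equation (an illustrative choice, not taken from the text), where the rootfinding at each step is handed to MATLAB's built-in fzero:

% Backward Euler for the scalar ODE du/dt = f(u), with f(u) = u*(1 - u) as an example
f  = @(u) u.*(1 - u);
dt = 0.1; nsteps = 100;
u  = zeros(1, nsteps+1); u(1) = 0.1;   % initial condition
for n = 1:nsteps
    % Solve G(w) = w - dt*f(w) - u_n = 0 for the new value w = u_{n+1}
    G = @(w) w - dt*f(w) - u(n);
    u(n+1) = fzero(G, u(n));           % previous value as the initial guess
end

For a system of ODEs one would instead use a multidimensional rootfinder (for example a Newton iteration), since fzero only handles scalar problems.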

You can show (try this yourself) that this scheme has the same LTE as the forward Euler method. But its real advantage is in its stability.

Example 5.6
Stability of backward Euler

As before, we have the iterations: \[ u_{n+1} - \Delta t \lambda u_{n+1} = u_n \implies u_{n+1} = \frac{1}{1-\Delta t \lambda}u_n = \left( \frac{1}{1-\Delta t \lambda} \right)^{n+1} u_0. \] Now, for any complex number \(z\) with \(\Re(z)<0\), we have \(|1-z|^2 = 1 - 2\Re(z) + |z|^2 > 1\), so that \(\left|\tfrac{1}{1-z}\right| < 1\). Hence, as \(\Delta t > 0\), if \(\Re(\lambda)<0\) then the iterates decay for every choice of \(\Delta t\), and so this scheme is A-stable.

5.3.3 Convergence

In general, it is difficult to prove that a numerical scheme converges in a suitable sense to the actual solution of an ODE system, as this requires some heavy machinery from analysis concerning how solutions of difference equations embed into appropriate function spaces. One way to short-circuit these deep issues, however, is to use the following theorem.

Theorem 5.2
Lax Equivalence Theorem
Assume that the system satisfies the conditions of the Picard–Lindelöf theorem. A finite difference method converges (in a suitable sense) to the unique solution of the system \[ \dot{\mathbf{u}} = \frac{\mathrm{d}\mathbf{u}}{\mathrm{d}t} = \mathbf{f}(t, \mathbf{u}), \quad \mathbf{u}(t_0) = \mathbf{u}_0 \in \mathbb{R}^n, \] as \(\Delta t \to 0\) if the method is both consistent and stable.

One important caveat here is that this theorem only holds in the theoretical case of infinite-precision arithmetic; rounding error can build up if we need to simulate the equation for a long time interval, or to use very small time steps (as we have to for methods that are not A-stable). There is a related concept known as global truncation error, which is one way to keep track of how the LTE can build up over many timesteps, but we will not discuss this further except to say that, for a stable method, keeping the LTE sufficiently small is usually the practical measure of accuracy.
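The practical upshot of consistency plus stability can be checked numerically. A minimal sketch (using the Dahlquist problem with \(\lambda = -1\) as an arbitrary test case) measures the global error of forward Euler at a fixed final time for a sequence of timesteps; halving \(\Delta t\) should roughly halve the error, consistent with first-order convergence.

% Global error of forward Euler for du/dt = -u, u(0) = 1, measured at t = 1
lambda = -1; T = 1; u0 = 1;
for dt = [0.1, 0.05, 0.025, 0.0125]
    n = round(T/dt); u = u0;
    for k = 1:n
        u = u + dt*lambda*u;           % forward Euler step
    end
    fprintf('dt = %.4f, error = %.3e\n', dt, abs(u - u0*exp(lambda*T)));
end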


5.4 Higher-Order Methods

So far we have discussed just the Euler stepping methods, which both have a single-step error of \(O(\Delta t^2)\); that is, they are first-order accurate. There are a variety of higher-order methods commonly used, which broadly fall into the classes of explicit or implicit as described before, as well as single-step and multi-step. Here we will focus on explicit methods and give an example from each class. For simplicity of notation, let’s consider a single ODE (i.e., a one-dimensional system), and let’s only consider autonomous problems where \(\dot{u} = f(u)\). While the basic ideas will be essentially identical for larger systems, adding in explicit time-dependence makes the formulae messier (but conceptually the same).

5.4.1 Single-step (Runge-Kutta) methods

One can take a single integration step broken up over different points within an interval. Let \(t_n = n\Delta t\) be the current time corresponding to the approximate solution \(u_n\). Then these methods can be written as: \[ u_{n+1} = u_n + \Delta t \sum_{i=1}^s b_i k_i, \quad k_1 = f(u_n), \quad k_i = f\left(u_n + \Delta t \sum_{j=1}^{i-1} a_{ij} k_j\right), \quad i = 2,\dotsc,s. \] These schemes are typically referred to as explicit Runge-Kutta methods with \(s\) stages (note that the number of stages need not equal the order of the method). They involve taking a single timestep using multiple function evaluations, essentially in order to get a better approximation of the derivative. They are derived in a similar way to how higher-order differentiation schemes can be derived; namely, by taking different points within the interval of one timestep, and considering function evaluations and Taylor series at those points. It is easiest to see this with an example.

Example 5.7
Derivation of RK2 Methods

Let’s consider three time points \(t_n, t_n+\Delta t/2,\) and \(t_{n+1} = t_n + \Delta t\). In this case, we can expand the Taylor series for \(u(t_n+\Delta t)\) as: \[ \begin{aligned} u(t_n+\Delta t) &= u(t_n) + \Delta t \dot{u}(t_n) + \frac{\Delta t^2}{2} \ddot{u}(t_n) + O(\Delta t^3) \\ &= u(t_n) + \Delta t f(u(t_n)) + \frac{\Delta t^2}{2} f_u(u(t_n)) \dot{u}(t_n) + O(\Delta t^3), \end{aligned} \] where we have used the chain rule to evaluate \(\ddot{u}(t_n)\). The goal will be to derive a formula which includes the \(O(\Delta t^2)\) term explicitly, so that the method is exact up to this order. We perform a Taylor expansion of \(f\) evaluated at the midpoint as: \[ \begin{aligned} &f(u(t_n) + (\Delta t/2) f(u(t_n))) \\ = &f(u(t_n)) + \frac{\Delta t}{2} f_u(u(t_n)) \dot{u}(t_n) + O(\Delta t^2). \end{aligned} \] Now, if we multiply this expansion by \(\Delta t\) and add it to \(u(t_n)\), we see that the result matches the expansion of \(u(t_n+\Delta t)\) up to and including the \(O(\Delta t^2)\) term. So if we consider the numerical scheme: \[ u_{n+1} = u_n + \Delta t f(u_n + (\Delta t/2) f(u_n)), \] we see that this scheme has a single-step error of \(O(\Delta t^3)\) (i.e., it is second-order accurate), which is an improvement over the Euler methods at the cost of one additional function evaluation.

One can work out that this method also very slightly improves the region of numerical stability compared to forward Euler, though it is not an A-stable method and hence will require sufficiently small timesteps to be stable.
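A minimal sketch of this RK2 (explicit midpoint) step for an autonomous system, applied here to the simple harmonic oscillator as an arbitrary test case, is:

% One step of the explicit midpoint (RK2) method for an autonomous system u' = f(u)
rk2_step = @(f, u, dt) u + dt * f(u + (dt/2) * f(u));

% Example usage: the simple harmonic oscillator u'' = -u, i.e. u' = v, v' = -u
f  = @(u) [u(2); -u(1)];
dt = 0.1; u = [1; 0];
for n = 1:200
    u = rk2_step(f, u, dt);
end

This step can be dropped into a timestepping loop in place of the forward Euler update used earlier (after removing, or adding back in, the unused time argument).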


5.4.2 Linear multistep methods

One disadvantage of the Runge-Kutta methods is that they require multiple function evaluations to arrive at a single timestep with a desired accuracy. A different approach is to use information from multiple timesteps to predict the next one in such a way that higher-order accuracy is maintained. Such methods are known as linear multistep methods, which have the general form: \[ u_{n+s} + \sum_{j=0}^{s-1} \alpha_j u_{n+j} = \Delta t \sum_{j=0}^s \beta_j f(u_{n+j}), \] where the \(\alpha_j\) and \(\beta_j\) are constants chosen to ensure the scheme is consistent, has a good LTE, etc. If \(\beta_s = 0\), then the method is explicit, as the right-hand side then involves only function values at earlier timesteps. An example of such a method is the two-step Adams-Bashforth scheme given by: \[ u_{n+2} = u_{n+1} + \frac{\Delta t}{2} \left( 3 f(u_{n+1}) - f(u_n) \right). \] One can work out that this method is second-order accurate, just like the RK2 method above, but only requires one new function evaluation at each timestep as opposed to two.

One disadvantage of these methods is that the initial-value problem for the ODE system does not provide enough data to start them, as it only specifies the solution at a single time point, \(u_0\). A typical approach is therefore to use a sufficiently high-order Runge-Kutta scheme to generate the starting values for a multistep method, as sketched below.
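A minimal sketch of this idea (using the two-step Adams-Bashforth scheme above, with a single midpoint RK2 step to generate \(u_1\), and the simple harmonic oscillator as an arbitrary test problem) is:

% Two-step Adams-Bashforth for an autonomous system u' = f(u),
% started with one explicit midpoint (RK2) step.
f  = @(u) [u(2); -u(1)];                 % simple harmonic oscillator
dt = 0.05; nsteps = 400;
u  = zeros(2, nsteps+1);
u(:,1) = [1; 0];                                        % initial condition u_0
u(:,2) = u(:,1) + dt*f(u(:,1) + (dt/2)*f(u(:,1)));      % u_1 from one RK2 step
for n = 2:nsteps
    % u_{n+1} = u_n + (dt/2)*(3*f(u_n) - f(u_{n-1}))
    u(:,n+1) = u(:,n) + (dt/2)*(3*f(u(:,n)) - f(u(:,n-1)));
end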


5.5 Beyond the Basics

What we have covered here is a very brief sketch of the core ideas, which were all mostly well-understood before 1960 or so. Since then, enormous progress has been made on a number of topics to develop increasingly sophisticated algorithms for simulating differential equations. A key idea from a “user” perspective is to be aware that the stability of a method is important, but often difficult to ensure. There are no explicit methods of the forms described here which are A-stable, and all implicit methods require some form of (often difficult) rootfinding.

A stiff differential equation is one which is difficult to integrate from a stability perspective; often these correspond to equations which have large (in magnitude) values of \(\lambda\) when linearized, though making the idea precise to cover all known cases is not so easy.
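As a closing illustration (the parameter value and time interval here are arbitrary choices), the van der Pol oscillator with a large value of \(C\) is a classic stiff test problem: MATLAB's stiff solver ode15s handles it comfortably, whereas the explicit ode45 is forced into extremely small steps.

% Stiff van der Pol oscillator: compare a stiff solver (ode15s) with ode45
C = 1000;
f = @(t,u) [u(2); C*(1 - u(1)^2)*u(2) - u(1)];
tic; [t1, u1] = ode15s(f, [0 3000], [2; 0]); toc     % completes quickly
% tic; [t2, u2] = ode45(f, [0 3000], [2; 0]); toc    % try it: far slower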