6  Further Topics

6.1 Partial Differential Equations

How can we solve Partial Differential Equations numerically?

We have seen how to approximate solutions to systems of Ordinary Differential Equations (ODEs) using various timestepping schemes. Many Partial Differential Equations (PDEs) can also be reduced to large systems of ODEs using the method of lines. In this section, we will illustrate this idea for one of the simplest PDEs: the one-dimensional heat equation.

As in the last Chapter, this topic is also vast. Entire fields of science, such as numerical General Relativity, Computational Fluid Mechanics, and Finite Element Analysis, are essentially subfields within the broad topic of Numerical PDEs. Each of those topics has dozens or hundreds of textbooks written on particular aspects of it. We will therefore be extremely cursory, giving a flavour of some ideas that underlie VisualPDE.com and many standard numerical PDE tools, but leaving a “deep” exploration for future study.

6.1.1 The heat equation and boundary conditions

The one-dimensional heat equation describes the diffusion of temperature \(u(x,t)\) in a homogeneous medium: \[ \frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}, \quad x \in [0,L], \quad t > 0, \] where \(\alpha > 0\) is the diffusivity constant. To fully specify the problem, we must also prescribe initial and boundary conditions.

We will use an initial condition: \[ u(x,0) = u_0(x), \] and homogeneous Neumann boundary conditions: \[ \frac{\partial}{\partial x}u(0,t) = \frac{\partial}{\partial x}u(L,t) = 0, \] which correspond physically to an insulated domain where no heat flows in or out through the boundaries.


6.1.2 Spatial discretisation: finite differences

To apply the method of lines, we first discretise the spatial variable \(x\) using a uniform grid of \(N\) nodes. Let: \[ x_j = (j-1) \Delta x, \quad j = 1, 2, \dots, N, \quad \Delta x = L / (N-1), \] so that \(x_1 = 0\) and \(x_N = L\), and define: \[ u_j(t) \approx u(x_j, t). \] The spatial derivatives in the heat equation can be approximated using finite differences. For the second derivative, a centred difference gives: \[ \frac{\partial^2}{\partial x^2}u(x_j,t) \approx \frac{u_{j-1}(t) - 2u_j(t) + u_{j+1}(t)}{(\Delta x)^2}. \]

For the Neumann boundary conditions, we use the concept of ghost nodes, introducing extra nodes at \(u_0\) and \(u_{N+1}\). To determine these, we approximate the first derivative at the endpoints using forward and backward differences: \[ \frac{u_1 - u_0}{\Delta x} = 0 \quad \Rightarrow \quad u_1 = u_0, \] \[ \frac{u_{N+1} - u_{N}}{\Delta x} = 0 \quad \Rightarrow \quad u_{N+1} = u_{N}. \]


6.1.3 The method of lines formulation

Plugging the finite difference approximation into the heat equation leads to the large system of ODEs for the approximate solution at each gridpoint, \(u_j(t)\): \[ \frac{d}{dt} u_j(t) = \alpha\frac{u_{j-1}(t) - 2u_j(t) + u_{j+1}(t)}{(\Delta x)^2}, \quad j = 1,2,\dots N, \] where we recall that \(u_0=u_1\) and \(u_{N+1} = u_N\), so that two of these equations have a slightly different form.

We can also write this system for the vector: \[ \mathbf{u}(t) = (u_1(t), u_2(t), \dots, u_N(t))^\top. \] We obtain: \[ \frac{d}{dt}\mathbf{u}(t) = \alpha \mathbf{A} \mathbf{u}(t), \] where \(\mathbf{A}\) is the \(N\times N\) matrix defined by: \[ \mathbf{A} = \frac{1}{(\Delta x)^2} \begin{pmatrix} -1 & 1 & 0 & 0 & \cdots & 0\\ 1 & -2 & 1 & 0 & \cdots & 0\\ 0 & 1 & -2 & 1 & \cdots & 0\\ \vdots & & \ddots & \ddots & \ddots & \vdots\\ 0 & \cdots & 0 & 1 & -2 & 1\\ 0 & \cdots & 0 & 0 & 1 & -1 \end{pmatrix}. \] This matrix represents the discrete Laplacian operator with Neumann boundary conditions; the first and last rows encode the ghost-node relations \(u_0 = u_1\) and \(u_{N+1} = u_N\).
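To make this concrete, here is a minimal MATLAB sketch assembling \(\mathbf{A}\) as a sparse matrix. The variable names and the values of N, L and alpha are illustrative choices, not prescribed by the notes:

```matlab
% Minimal sketch: assemble the discrete Neumann Laplacian A (N x N).
N     = 50;                 % number of grid nodes (illustrative)
L     = 1;                  % domain length (illustrative)
alpha = 0.1;                % diffusivity (illustrative)
dx    = L/(N-1);            % grid spacing

e = ones(N,1);
A = spdiags([e, -2*e, e], -1:1, N, N);   % tridiagonal (1,-2,1) stencil
A(1,1) = -1;                             % Neumann BC: u_0 = u_1
A(N,N) = -1;                             % Neumann BC: u_{N+1} = u_N
A = A/dx^2;

% Right-hand side of the method-of-lines system du/dt = alpha*A*u
% (this handle could be passed to ode45/ode15s, or stepped manually).
rhs = @(t,u) alpha*(A*u);
```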


6.1.4 Time discretisation and stability

Any of the timestepping schemes from Chapter 5 can now be applied to this system. For example, applying the forward Euler method with timestep \(\Delta t\), writing \(\mathbf{u}_n \approx \mathbf{u}(t_n)\), gives: \[ \mathbf{u}_{n+1} = \mathbf{u}_n + \Delta t \, \alpha \mathbf{A} \mathbf{u}_n. \] This scheme is simple to implement, but its stability depends strongly on the size of \(\Delta t\). For the one-dimensional heat equation, it can be shown that stability requires: \[ \Delta t \leq \frac{(\Delta x)^2}{2\alpha}. \] Violating this condition leads to rapidly growing (unphysical) numerical oscillations.
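Continuing the MATLAB sketch above, a forward Euler loop respecting this stability restriction might look as follows (the initial condition and final time are illustrative choices):

```matlab
% Forward Euler for du/dt = alpha*A*u, with dt inside the stability limit
% dt <= dx^2/(2*alpha).
x  = linspace(0, L, N).';          % grid points
u  = cos(pi*x/L);                  % an illustrative initial condition u_0(x)
dt = 0.9*dx^2/(2*alpha);           % safely inside the stability limit
T  = 0.5;                          % final time (illustrative)

for n = 1:round(T/dt)
    u = u + dt*alpha*(A*u);        % one explicit step
end
```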

Alternatively, we can use the backward Euler scheme, which gives: \[ \mathbf{u}_{n+1} = \mathbf{u}_n + \Delta t \, \alpha \mathbf{A} \mathbf{u}_{n+1}. \] This is an implicit system, requiring the solution of a linear system at each timestep: \[ (\mathbf{I} - \Delta t \alpha \mathbf{A})\mathbf{u}_{n+1} = \mathbf{u}_n. \] However, the backward Euler method is A-stable, and therefore unconditionally stable for the heat equation. This means that we can take arbitrarily large timesteps without introducing instability, though accuracy may still require small \(\Delta t\).
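A corresponding backward Euler sketch, reusing the same setup, solves a sparse linear system at each step and tolerates a much larger timestep (the timestep below is again an arbitrary illustrative choice):

```matlab
% Backward Euler: solve (I - dt*alpha*A) u_{n+1} = u_n at each step.
dt = 0.05;                          % illustrative (large) timestep
M  = speye(N) - dt*alpha*A;         % constant matrix; could be factorised once
u  = cos(pi*x/L);                   % reset to the initial condition

for n = 1:round(T/dt)
    u = M\u;                        % one implicit step (sparse linear solve)
end
```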


6.1.5 Summary

The method of lines converts a PDE into a large system of ODEs by discretising only in space. The resulting ODE system can then be integrated using any of the timestepping methods from Chapter 5. The stability and accuracy of the full PDE scheme are determined by both the spatial discretisation and the ODE integrator used in time.

On the Blackboard page are several code demos for the Heat equation, using both the matrix approach and a for-loop approach to discretising this equation. A demo is also provided for the Schnakenberg system of reaction-diffusion equations.

6.2 Random Number Generation

How do we generate random numbers on a computer?

The ability to simulate random numbers on a computer is essential for modelling and understanding systems where chance and variability play a central role. Robust and repeatable random number generation is thus central to statistics, finance, queuing systems, cryptography, and a host of algorithms and other applications.

But what exactly do we mean by randomness or random numbers? For our purposes, we will consider “random numbers” to mean sequences of numbers that mimic the statistical properties of realisations of random variables drawn with a given distribution.

6.2.1 Uniform Random Numbers

Recall that a continuous random variable is the outcome of a “random experiment” that takes values in \(\mathbb{R}^n\) (or some other uncountably infinite set). A discrete random variable is one that takes values in a countably infinite or finite set, like a subset of \(\mathbb{Z}\), for example. A continuous random variable is typically described in terms of its probability density function (PDF), while a discrete random variable is most often described by its probability mass function (PMF).

Example 6.1
Let \(X\) be a uniformly distributed random variable on \([a,b]\). Then \(X\) has probability density function given by \[ f(x) = \begin{cases} \frac{1}{b-a} & \text{if } a \leq x \leq b \\ 0 & \text{otherwise}. \end{cases} \] If \(Y\) is a random variable that is uniformly distributed on the integers \(\{0,\dots,n-1\}\), then its probability mass function is given by \[ P(k) = \mathbb{P}[Y = k] = \begin{cases} \frac{1}{n} & \text{if } k \in \{0, 1, \ldots, n-1\} \\ 0 & \text{otherwise}. \end{cases} \]

We will mainly be concerned with continuous random variables, and each continuous real-valued random variable \(X\) has an associated cumulative distribution function (CDF) (distribution function for short) that is defined by: \[ F(x) = \mathbb{P}[X \leq x]. \] Moreover, for reasonably behaved real-valued random variables, the PDF and CDF can be related by the Fundamental Theorem of Calculus as follows: \[ f(x) = \frac{d}{dx}F(x), \quad F(x) = \int_{-\infty}^x f(y)\,dy. \]

If we know how to create a sample from a uniform distribution, then we can obtain a sample with a given distribution using the following result:

Theorem 6.1: The Inverse Transform Method
Let \(U\) be a uniformly distributed random variable on \([0,1]\), and let \(F(x)\) be a cumulative distribution function. If \(F\) is invertible, then \(X=F^{-1}(U)\) is distributed according to \(F\).

Example 6.2: Inverse Transform Sampling for the Pareto Distribution
The PDF of the Pareto distribution with scale parameter \(x_m > 0\) and shape parameter \(\alpha > 0\) is given by: \[ f(x) = \frac{\alpha x_m^\alpha}{x^{\alpha+1}}, \quad x \geq x_m \]

The cumulative distribution function (CDF) is thus \[ F(x) = 1 - \left( \frac{x_m}{x} \right)^\alpha, \quad x \geq x_m \]

To use the inverse transform method, we need to compute \(F^{-1}(x)\). We know that \[ F(F^{-1}(x)) = x, \] and hence we can solve for \(F^{-1}(x)\) as follows: \[ 1 - \left(\frac{x_m}{F^{-1}(x)}\right)^{\alpha} = x \implies F^{-1}(x) = \frac{x_m}{(1-x)^{1/\alpha}}. \] Thus if \(U\sim \text{Uniform}([0,1])\), then \[ \frac{x_m}{(1-U)^{1/\alpha}} \sim \text{Pareto}(x_m,\alpha). \]
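In code, the sampling step is essentially a one-liner. A minimal MATLAB sketch (the parameter values and sample size are arbitrary illustrative choices):

```matlab
% Inverse transform sampling for the Pareto(x_m, alpha) distribution.
x_m  = 1;      % scale parameter (illustrative)
alph = 2;      % shape parameter (illustrative)
n    = 1e5;    % sample size (illustrative)

U = rand(n,1);                  % uniform samples on [0,1]
X = x_m ./ (1 - U).^(1/alph);   % F^{-1}(U) gives Pareto samples

% Sanity check: compare the sample mean with the theoretical mean
% alpha*x_m/(alpha-1), which is finite for alpha > 1.
[mean(X), alph*x_m/(alph-1)]
```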

It is therefore often enough to be able to sample from the uniform distribution, and hence this distribution plays a central role in the theory of random number generation.

In some sense, there are no truly random number generators. Computers can only execute algorithms, which are deterministic instructions, and thus they can only yield samples which appear random. We call these numbers pseudorandom numbers, and the algorithms which produce these numbers are called pseudorandom number generators.

The theoretical wish

A generator of genuine random numbers is an algorithm that produces a sequence of random variables \(U_1\), \(U_2\), \(\ldots\) which satisfies:

  1. Each \(U_i\) is uniformly distributed between \(0\) and \(1\).
  2. The \(U_i\) are mutually independent.

Property 2 is the more important one, since the normalisation in property 1 is convenient but not crucial. Property 2 implies that all pairs of values are uncorrelated and, more generally, that the value of \(U_i\) should not be predictable from \(U_1,\ldots, U_{i-1}\). Of course, the properties listed above are those of authentically random numbers; the goal is to come as close as possible to these properties with our artificially generated pseudorandom numbers.

6.2.2 Linear Congruential Generators

An important and simple class of generators are the linear congruential generators, abbreviated as LCGs.
We need the modulo operation in order to define this class of generators.

Definition 6.1
For a nonnegative integer \(x\) and a positive integer \(m\), we call \(y\) the integer remainder from the division \(x/m\), and we write this as \[ y = x \mod m. \]

Definition 6.2
A linear congruential generator (LCG) is an iteration of the form \[ \begin{aligned} x_{i+1} &= (a x_i + c) \mod m \\ u_{i+1} &= \frac{x_{i+1}}{m} \in [0,1), \end{aligned} \] where the multiplier \(a\), the increment \(c\), and the modulus \(m\) are integers, and the iteration is started from an integer seed \(x_0\).

Example 6.3
Choose \(a=6\), \(c=0\), and \(m=11\). Starting from \(x_0=1\), which is called the seed, gives \[ 1, 6, 3, 7, 9, 10, 5, 8, 4, 2, 1, 6, \ldots \] Choosing \(a=3\) yields the sequence \[ 1, 3, 9, 5, 4, 1, \ldots \] whereas the seed \(x_0 = 2\) results in \[ 2, 6, 7, 10, 8, 2, \ldots \]
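A minimal MATLAB sketch of an LCG, reproducing the first sequence above:

```matlab
% Minimal LCG sketch: x_{i+1} = (a*x_i + c) mod m, u_{i+1} = x_{i+1}/m.
a = 6;  c = 0;  m = 11;  x = 1;    % parameters and seed from Example 6.3

nsteps = 12;
seq = zeros(1, nsteps);
for i = 1:nsteps
    seq(i) = x;
    x = mod(a*x + c, m);           % LCG update
end
disp(seq)                          % 1 6 3 7 9 10 5 8 4 2 1 6
u = seq/m;                         % pseudorandom numbers in [0,1)
```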

Conditions for a Full Period I

Theorem 6.2
Suppose \(c\neq 0\). The generator has full period (that is, the number of distinct values generated from any seed \(x_0\) is \(m\)) if and only if the following conditions hold:

  1. \(c\) and \(m\) are relatively prime (their only common divisor is \(1\)).

  2. Every prime number that divides \(m\) divides \(a-1\) as well.

  3. \(a-1\) is divisible by \(4\) if \(m\) is.

As a corollary, if \(m\) is a power of \(2\), the generator has full period if \(c\) is odd and \(a=4n+1\) for some integer \(n\).

Example 6.4
The Borland C++ LCG has parameters \[ m=2^{32},\quad a=22695477=1+4\times 5673869,\quad c=1. \] Hence, by the corollary above, the LCG has full period.

Conditions for a Full Period II

If \(c=0\) and \(m\) is a prime, then full period (now of length \(m-1\)) is achieved from any \(x_0 \not = 0\) when

  1. \(a^{m-1} - 1\) is a multiple of \(m\),
  2. \(a^j - 1\) is not a multiple of \(m\) for \(j=1, \ldots , m-2\).

If \(a\) satisfies these two properties it is called a primitive root of \(m\). In this situation, the sequence \(\{ x_i \}_{i \geq 1}\) is of the form \[ x_0 , a x_0 , a^2 x_0 , a^3 x_0 , \ldots \quad \mod m, \] given that \(c=0\). The sequence returns to \(x_0\) for the first time for the smallest \(k\) which satisfies \(a^k x_0 \mod m = x_0\). This is the smallest \(k\) for which \(a^k \mod m = 1\), that is \(a^k - 1\) is a multiple of \(m\).

Hence, the definition of a primitive root coincides with the requirement that the sequence does not return to \(x_0\) until \(a^{m-1} x_0\).

Example 6.5: Examples of LCG Parameters
Modulus \(m\) | Multiplier \(a\) | Reference
\(2^{31} - 1\ (= 2147483647)\) | \(16807\) | Lewis, Goodman, and Miller; Park and Miller
\(2^{31} - 1\) | \(39373\) | L’Ecuyer
\(2^{31} - 1\) | \(742938285\) | Fishman and Moore
\(2^{31} - 1\) | \(950706376\) | Fishman and Moore
\(2^{31} - 1\) | \(1226874159\) | Fishman and Moore
\(2147483399\) | \(40692\) | L’Ecuyer
\(2147483563\) | \(40014\) | L’Ecuyer

Exercise 6.1
Define an LCG with parameters \[ c=0,\quad m=2^3-1=7,\quad a=3. \] Does this LCG have full period?

Earlier versions of Microsoft Excel (prior to 2003) relied on a linear congruential generator for the RAND() function, which produced predictable and statistically non-random outputs that undermined the reliability of numerical simulations, Monte Carlo methods, and statistical sampling. This flaw was widely recognised in the simulation community during the late 1990s and early 2000s, leading Microsoft to update Excel’s random number algorithm in the 2003 release.

In applications, the following considerations are typically the most important when considering the appropriateness of a random number generating scheme:

  • Reproducibility
  • Speed
  • Portability
  • Period Length (if any)
  • Randomness (i.e. robust statistical properties)

Linear congruential generators are no longer (and have not been for some time) used practically, although they are a useful illustration of pseudorandom number generation. The Mersenne Twister family of algorithms is among the most popular in practice, and modern implementations are appropriate for most applications. It is notable for its extremely long period (\(2^{19937} - 1\)), strong statistical properties, and fast performance (see Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator by Matsumoto and Nishimura, 1998).
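In MATLAB, for example, the default generator behind rand is a Mersenne Twister, and seeding it explicitly gives reproducible output. A small sketch (the seed value 42 is an arbitrary choice):

```matlab
% Reproducible pseudorandom numbers: MATLAB's default generator for rand
% is the Mersenne Twister ('twister').
rng(42, 'twister');   % set the seed
u1 = rand(1, 5);

rng(42, 'twister');   % resetting the seed reproduces the same numbers
u2 = rand(1, 5);

isequal(u1, u2)       % returns true (logical 1)
```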

6.2.3 Testing uniformity

Previously, we have only checked for a full period, to see whether a pseudorandom number generator is reasonable or not. In this section, we demonstrate more elaborate means to test the quality of a pseudorandom number generator.

Given a sample from a supposedly uniform distribution, one can use statistical tests to reject the hypothesis of uniformity. The samples provided by a computer are fake since they are totally deterministic, and therefore not random (as such they cannot be uniformly distributed). However, they are so “well chosen”, that they might appear random. Hence we require statistical tests of randomness in order to judge the quality of a candidate PRNG.

6.2.3.1 Chi-Squared Test

Definition 6.3
Let \(k\geq 1\), and let \(X_1,\dots,X_k\) be a sequence of i.i.d. standard normally distributed random variables. The distribution of the sum of squares \[ S=X_1^2+\dots+X_k^2 \] is called chi-square with \(k\) degrees of freedom.

The probability density function of the \(\chi^2(k)\) distribution is given by \[ f(x,k)=\frac{1}{2^{k/2}\Gamma(k/2)}x^{k/2-1}e^{-x/2}, \] where \(\Gamma\) denotes the gamma function, which is defined by \[ \Gamma(\xi)=\int_0^\infty x^{\xi-1}e^{-x}\,\mathrm{d}x. \] For integers \(n \in \mathbb{N}\), we can write the Gamma function in terms of the factorial function as follows: \[ \Gamma(n)=(n-1)!=(n-1)(n-2)\dots 2\cdot 1. \]

Let \[ X_1,X_2,\dots, X_n \] be a sample.

The chi-squared test for uniformity is constituted by the following:

  • The null hypothesis \(H_0\): the sample is from a uniform distribution, against the alternative hypothesis \(H_a\) that it is not.
  • The test statistic \[ T=\frac{k}{n}\sum_{j=1}^k \left ( n_j-\frac{n}{k}\right)^2, \] where \(k\) is the number of equidistant partitions (“bins”, to be chosen) of the unit interval, given by \[ [0,1/k),\quad [1/k, 2/k),\dots, [(k-1)/k,1], \] and \(n_j\) is the number of observations in the \(j\)th bin.
  • The significance level \(\alpha\) (to be chosen).

The following theorem is stated without proof.

Theorem 6.3
Under the null hypothesis \(H_0\), as \(n\rightarrow\infty\), \(T\) converges, in distribution, to the chi-square distribution \(\chi^2_{k-1}\) with \(k-1\) degrees of freedom.

Example 6.6: Using the Chi-Square Test for Uniformity
Suppose you have a sample of pseudorandom numbers generated on the interval \([0,1]\) and you wish to test if these are consistent with being independent and uniformly distributed.

Let’s set up the test with the following parameters:

  • The number of bins: \(k = 10\)
  • The sample size: \(n = 1000\)
  • The observed counts in each bin: \(n_j\) for \(j = 1, \ldots, 10\)

Step 1: Compute the Test Statistic

Partition the interval \([0,1]\) into 10 bins: \([0, 0.1), [0.1, 0.2), \ldots, [0.9, 1]\).

Count how many values fall in each bin, giving counts \(n_1, n_2, \ldots, n_{10}\).

The theoretical expected count in each bin is \(n/k = 1000/10 = 100\).

The chi-square test statistic is \[ T = \frac{k}{n} \sum_{j=1}^k \left( n_j - \frac{n}{k} \right)^2 \]

Suppose the observed bin counts based on the pseudorandom sample are:

Bin | Observed count \(n_j\)
1 | 98
2 | 104
3 | 112
4 | 107
5 | 94
6 | 95
7 | 108
8 | 97
9 | 96
10 | 89

Compute the sum: \[ T = \frac{10}{1000} \sum_{j=1}^{10} (n_j - 100)^2 \] \[ T = 0.01 \left[ (98-100)^2 + (104-100)^2 + (112-100)^2 + \cdots + (89-100)^2 \right] \] \[ T = 0.01 [4 + 16 + 144 + 49 + 36 + 25 + 64 + 9 + 16 + 121] \] \[ T = 0.01 \times 484 = 4.84 \]

Step 2: Find the Critical Value

The null hypothesis is rejected if \(T\) exceeds the critical value from the chi-square distribution with \(k-1 = 9\) degrees of freedom at significance level \(\alpha\) (e.g., \(\alpha = 5\%\)).

For \(\chi^2_{0.95,9}\), the critical value is approximately 16.919 (which can be computed using the Matlab function chi2inv).

Step 3: Compare \(T\) to the critical value:

  • If \(T < 16.919\), do not reject the null hypothesis (the sample is consistent with being uniform).
  • If \(T > 16.919\), reject the null hypothesis (the sample is not uniform).

In this example: \[ 4.84 < 16.919 \] So we do not reject the null hypothesis; the sample passes the uniformity test.
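The whole calculation is easy to automate. A minimal MATLAB sketch of the worked example above (chi2inv is from MATLAB’s Statistics and Machine Learning Toolbox, as referenced earlier):

```matlab
% Chi-squared test for uniformity, using the bin counts from Example 6.6.
k   = 10;  n = 1000;
n_j = [98 104 112 107 94 95 108 97 96 89];   % observed counts
T   = (k/n) * sum((n_j - n/k).^2);           % test statistic: 4.84

alphaLevel = 0.05;
Tcrit = chi2inv(1 - alphaLevel, k - 1);      % critical value: approx 16.919

reject = T > Tcrit;                          % false: do not reject H_0

% For a raw sample u in [0,1], the counts could be obtained with, e.g.,
% n_j = histcounts(u, linspace(0, 1, k+1));
```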

6.2.3.2 The Kolmogorov–Smirnov Test

Another simple test uses the empirical distribution function of the sample, which we define below:

Definition 6.4
If \(x_1,\dots,x_n\) is a sample, then the empirical distribution function of the sample is given by \[ F_n(x) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}_{(-\infty,x]}(x_i), \quad x \in \mathbb{R}, \] i.e. \(F_n(x)\) is the proportion of sample points that are at most \(x\).

Example 6.7

Figure: Comparison of the empirical and true CDFs for a sample of normally distributed pseudorandom numbers. We observe increasingly good agreement for larger sample sizes, as expected if the PRNG has good statistical properties.

The Kolmogorov–Smirnov test is based on the following intuition: If the sample is uniformly distributed, the deviation of the empirical distribution function from the theoretical distribution function should be small, as shown in the figure above.

The Kolmogorov–Smirnov test for uniformity is constituted by the following:

  • The null hypothesis \(H_0\): the sample is from a given distribution \(F\), against the alternative hypothesis \(H_a\) that it is not.
  • The test statistic \[ D_n=\sup_{x\in\mathbb R}\vert F_n(x)-F(x)\vert, \] where \(n\) is the sample size.
  • The significance level \(\alpha\) (to be chosen).

Under the null hypothesis, as \(n\rightarrow\infty\), \(\sqrt{n} D_n\) converges to \[ \sup_{t\in\mathbb R}\vert B(F(t))\vert, \] in distribution, where \(B\) is a Brownian bridge (i.e., the quantity \(\sup_{t\in\mathbb R}\vert B(F(t))\vert\) is a random variable).

For \(F(t)=t\), the uniform distribution, the critical values of the Kolmogorov statistic \(D_n\) are known. For large \(n\), the scaled statistic \(\sqrt{n}D_n\) converges in distribution to the so-called Kolmogorov distribution, namely the distribution of \[ K=\sup_{t\in [0,1]}\vert B(t)\vert. \] In fact, it can be shown that \[ \mathbb P[K\leq x]=1-2\sum_{k=1}^\infty (-1)^{k-1}e^{-2k^2x^2}, \quad x \in \mathbb{R}^+. \]
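For the uniform case \(F(t)=t\), the statistic \(D_n\) can be computed directly from the sorted sample. A minimal MATLAB sketch (the sample size is illustrative, and the quoted asymptotic 5% critical value of approximately 1.358 should be checked against tables for small \(n\)):

```matlab
% Kolmogorov-Smirnov statistic D_n for testing uniformity on [0,1].
n = 1000;
x = sort(rand(n,1));                 % sorted sample (here from MATLAB's own PRNG)
i = (1:n).';

% Largest deviation between the empirical CDF and F(t) = t, checked just
% before and just after each jump of the empirical CDF.
Dn = max(max(i/n - x, x - (i-1)/n));

% Compare sqrt(n)*Dn with an (asymptotic) critical value of the Kolmogorov
% distribution, approximately 1.358 at the 5% significance level.
reject = sqrt(n)*Dn > 1.358;
```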

Along with the statistical tests outlined above, popular test suites such as Diehard and Dieharder provide batteries of statistical tests for evaluating the quality of pseudorandom number generators (PRNGs). These suites assess a PRNG’s output for statistical randomness and are widely used in research and industry to ensure reliability and unpredictability.

6.3 Monte Carlo Methods

6.3.1 A motivating example

The basic idea of the “Monte Carlo Method” is that we can use random sampling to estimate the value of integrals. The method is named for the famous Monte Carlo casinos. The following example illustrates the basic idea, which can be generalised and applied to a wide variety of situations.

Example 6.8
Consider the unit circle in \(\mathbb{R}^2\), i.e. the set of points \[ \{(x,y) : x^2 + y^2 \leq 1 \}. \]

We know that the area of the unit circle is \(\pi\) and we could hence consider the following way to estimate \(\pi\) by random sampling. If we take two independent random numbers, \(X_i\) and \(Y_i\), each uniformly distributed on \([-1,1]\), then \((X_i,\,Y_i)\) is a uniformly distributed random point in the square \([-1,1] \times [-1,1]\). Since the distribution is uniform, the probability that a random pair \((X_i,\,Y_i)\) falls into the unit circle is the ratio of the areas, \(\pi/4\).

Looking at the two numerical experiments below, we see that in the left panel, we have \(8\) of \(10\) points falling in the unit circle. This gives us an estimate of \(\pi\) of about \(3.2\). For \(N = 1000\), we can’t count so easily by eye, but the estimate in fact improves to about \(3.1523\). Put another way, we are estimating the integral \[ \iint_{x^2 + y^2 \leq 1} dx\, dy = \int_{-1}^1 \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} dy\, dx. \] This connection between probabilities, areas and integration is the basis of the Monte Carlo method, and we will spend the rest of this section formalising these connections.
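A MATLAB sketch of this experiment (the number of points is an illustrative choice):

```matlab
% Monte Carlo estimate of pi by sampling points uniformly in [-1,1]^2.
N = 1e5;                             % number of random points (illustrative)
X = 2*rand(N,1) - 1;                 % uniform on [-1,1]
Y = 2*rand(N,1) - 1;

inside = (X.^2 + Y.^2 <= 1);         % points landing in the unit circle
pi_est = 4*sum(inside)/N;            % 4 * (fraction inside) approximates pi
```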

6.3.2 The Monte Carlo Estimator

For a continuous real-valued random variable \(X\) with PDF \(f\), its mean or expectation can be computed via the formula:

\[ \mathbb{E}[X] = \int_\mathbb{R} x f(x)\,dx \]

Moreover, if we want to calculate the expectation of a random variable \(Y = A(X)\), for some smooth function \(x \mapsto A(x)\), this is given by the related formula:

\[ \mu := \mathbb{E}[A(X)] = \int_{\mathbb{R}} A(x) f(x)\, dx. \]

However, often the function \(A\) is sufficiently complicated that the integral cannot be derived in closed form. For example, in financial applications, \(A\) might be the payoff function of an exotic option or be related to the claims policy on an insurance contract. The Monte-Carlo method (MCM) uses simulations to derive an approximation of \(\mu\) as follows:

Suppose we have a sample from the same distribution as \(X\), namely

\[ X_1,\ldots,X_n. \]

Then

\[ A(X_1), \ldots, A(X_n) \]

is a sample with the distribution of \(A(X)\). The strong law of large numbers says that the sample average converges almost surely to the expectation \(\mu\), i.e.,

\[ \lim_{n \to \infty} \frac{A(X_1) + \cdots + A(X_n)}{n} = \mathbb{E}[A(X)] = \mu, \quad \text{almost surely}. \]

The strong law applies under mild conditions (finite means and variances of the random variables), and allows us to use the sample average as an estimator for the true value of \(\mu\) for large values of \(n\).

Almost sure convergence is a mode of convergence appropriate to sequences of random variables. It is a very strong form of convergence (there are weaker forms) and roughly means that if you repeatedly observe values from a sequence of random variables, eventually the sequence will settle down and stay close to the limit for all future observations.

In practice, one runs a simulation and obtains a sample, say

\[ x_1, \ldots, x_n, \]

and thus a sample from the same distribution as \(A(X)\),

\[ A(x_1), \ldots, A(x_n). \]

For large \(n\), the SLLN therefore allows us to use the sample average, also called the Monte Carlo estimator in this context, as a proxy for \(\mu\), i.e. \[ \hat{\mu}_n := \frac{A(x_1) + \cdots + A(x_n)}{n} \approx \mu. \]

However, each time we run a new simulation the sample average delivers a new value. This begs the question:

6.3.3 How good is the approximation?

The Monte Carlo estimator has several desirable properties as a statistical estimator. In particular, the Monte Carlo estimator is:

  • Unbiased, i.e. \(\mathbb{E}[\hat{\mu}_n] = \mu\), or “the sample mean is equal to the true mean on average”.
  • Strongly consistent, i.e. \(\lim_{n\rightarrow\infty} \hat{\mu}_n = \mu\) almost surely.

Furthermore, we can quantify the error of the Monte Carlo estimator. If \(\sigma^2 = \text{Var}[A(X)] = \mathbb{E}[(A(X) - \mu)^2]\), the variance of \(A(X)\), were known, then we could use the Central Limit Theorem to quantify the quality of approximation of the Monte Carlo estimator. In fact, \[ \frac{\sqrt{n}}{\sigma}(\hat{\mu}_n-\mu) \approx \mathcal{N}(0,1), \quad \text{as }n \to\infty. \] In practice, we can estimate the variance of \(A(X)\) from random samples as part of the Monte Carlo Method.

Therefore, the \((1-\alpha)\) confidence interval of \(\hat{\mu}_n\) is approximately equal to \[ \mu \pm z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}, \] where \(z_{\beta}\) is the \(\beta\)-quantile of the \(\mathcal{N}(0,1)\) distribution, i.e. \[ \mathbb{P}[Z \leq z_{\beta}] = \beta, \quad Z \sim \mathcal{N}(0,1). \]

6.3.4 The practical approach

Recall that for a sample \(y_1, \ldots, y_n\), the sample mean and variance are given by \[ \bar{y} = \frac{y_1 + \cdots + y_n}{n}, \qquad s^2 = \frac{\sum_{k=1}^{n}(y_k - \bar{y})^2}{n-1}. \]

Often, one does not know the precise values of \(\mu\) or \(\sigma^2\) (especially \(\mu\), otherwise we don’t need to run Monte-Carlo simulations at all!). One thus replaces \(\mu\) and \(\sigma^2\) by their empirical estimates as follows: \[ \hat{\mu}_n \pm z_{1-\alpha/2} \frac{\hat{\sigma}_n}{\sqrt{n}}, \] where \(\hat{\mu}_n, \hat{\sigma}_n^2\) are the empirical mean and variance of the given sample \[ A(x_1), \ldots, A(x_n). \]

The quantity \(\hat{\sigma}_n/\sqrt{n}\) is called the standard error. Summarising, we have \[ S.E. = \sqrt{\frac{\sum_{i=1}^n (A(x_i) - \hat{\mu}_n)^2}{n(n-1)}}. \]

The standard error expression tells us that the error scales like \(n^{-1/2}\). For example, if we wish to make our estimate 10 times more accurate we need to increase \(n\) by a factor of 100! This is a key feature of the Monte Carlo method and persists even in higher-dimensional settings. In one dimension, the simple trapezoidal rule has an error which scales like \(n^{-2}\) and is hence far superior to the MCM. However, in \(d\) dimensions the trapezoidal rule error scales like \(n^{-2/d}\), while the MCM retains its \(n^{-1/2}\) error scaling. Therefore MCMs are attractive methods for quickly and accurately evaluating higher-dimensional integrals whose closed-form expressions are not available.

Example 6.9
Let \(X\) be a uniformly distributed r.v. on the unit interval and consider \[ \mu = \mathbb{E}[X^2]. \]

For a sample of size \(n\), let us draw from the uniform distribution, which gives us \[ x_1, x_2, \ldots, x_n, \] independent copies of \(X\). The Monte Carlo estimator for \(\mu\) is just the sample average \[ \mu \approx \frac{x_1^2 + \cdots + x_n^2}{n} =: \hat{\mu}_n. \]

The theoretical value of \(\mu\) can actually be calculated explicitly, and it is given by \[ \mu = \int_0^1 x^2 f(x)\,dx = \int_0^1 x^2 dx = \left.\frac{x^3}{3}\right|_0^1 = \frac{1}{3}, \] where we recall that the uniform distribution has density \(f \equiv 1\) on \([0,1]\).

The variance of the Monte Carlo estimator is \(\sigma^2/n\), where \(\sigma^2\) is the variance of \(X^2\): \[ \sigma^2 = \mathbb{E}\left[(X^2 - \mu)^2\right] = \mathbb{E}[X^4] - \left(\mathbb{E}[X^2]\right)^2 = 1/5 - (1/3)^2 = \frac{9-5}{45} = \frac{4}{45}. \]

We conclude that the \((1-\alpha)\) confidence interval of \(\hat{\mu}_n\) is approximately \[ \mu \pm z_{1-\alpha/2} \frac{2}{\sqrt{45n}}. \]
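Combining this example with the practical approach of Section 6.3.4, a minimal MATLAB sketch might read as follows (the sample size is illustrative; 1.96 is the familiar 97.5% normal quantile, corresponding to \(\alpha = 0.05\)):

```matlab
% Monte Carlo estimate of mu = E[X^2] for X ~ Uniform(0,1), with an
% approximate 95% confidence interval built from the standard error.
n  = 1e5;                                % sample size (illustrative)
x  = rand(n,1);
A  = x.^2;                               % samples of A(X) = X^2

mu_hat = mean(A);                        % Monte Carlo estimator, approx 1/3
se     = std(A)/sqrt(n);                 % standard error
z      = 1.96;                           % approx 97.5% quantile of N(0,1)

CI = [mu_hat - z*se, mu_hat + z*se];     % approximate 95% confidence interval
```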

6.3.5 Application: Stochastic Processes

Random variables are our standard mathematical model or formalisation of a random experiment. However, in many real-world scenarios we are observing a collection of observations from the same random experiment as time evolves. Hence, we need the concept of a stochastic process, which is simply a collection of random variables indexed by time. The simplest example is the random walk:

Example 6.10: Random walk

We take time to be discrete, i.e. the set of times is \(\{0,1,2,\dots\}\). A simple symmetric random walk can be written as \[ S_0 = 0, \quad S_n = Z_1 + Z_2 + \cdots + Z_n, \] where each \(Z_i\) is an independent random variable taking the value \(+1\) or \(-1\) with equal probability. Given that we know how to simulate random numbers on a computer, we can immediately simulate a random walk directly from the formula above.

Figure: Three simulated paths of a symmetric random walk.
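A direct MATLAB simulation of such paths might look as follows (the number of steps and paths are illustrative choices):

```matlab
% Simulate paths of a simple symmetric random walk S_n = Z_1 + ... + Z_n.
nsteps = 100;                              % number of time steps (illustrative)
npaths = 3;                                % number of independent paths

Z = 2*(rand(nsteps, npaths) > 0.5) - 1;    % increments +1 or -1 with prob 1/2
S = [zeros(1, npaths); cumsum(Z, 1)];      % S_0 = 0, then partial sums

plot(0:nsteps, S)                          % one line per simulated path
xlabel('n'), ylabel('S_n')
```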

The Monte Carlo method for estimating integrals using repeated random sampling is especially useful for estimating complicated integrals arising in applications. In many cases, closed form expressions are difficult or impossible to obtain, but we can easily approximate their values with simple direct simulations.

Example 6.11: Expected meal time for a hungry forager
Suppose an animal starts foraging for food in a meadow. We suppose they move within a patchy landscape of possible food locations, i.e. each point \((i,j)\) for \(i,j \in \mathbb{Z}_+\) is a patch, with their starting coordinate denoted by \((0, 0)\). At each step, the animal moves to one of four neighboring patches (up, down, left, right) with equal probability. There is food located in the square \(\mathcal{D}\) of patches with vertices \((4,4)\), \((4,8)\), \((8,4)\), \((8,8)\). To ensure that the forager stays in \(\mathbb{Z}^2_+\), we must impose a reflecting boundary condition on the random walk, so that if the forager tries to take a step that would result in them leaving \(\mathbb{Z}^2_+\), they are pushed back inside.

How long do we expect it to take the animal to successfully secure some food?

If \(S_n\) denotes the position of the forager at time \(n \in \mathbb{N}\), we are looking to estimate the expected value (mean) of the hitting time \(\tau\), where \[ \tau := \inf\{ n \geq 0 : S_n \in \mathcal{D} \}. \] It is challenging (but possible) to write down an analytic expression for the value of \(\mathbb{E}[\tau]\). However, this becomes increasingly hard for more realistic models of animal movement! Instead we will directly simulate the random walk the forager is taking and record the number of steps it takes to reach \(\mathcal{D}\).

If \(\hat{\tau}_i\) is the number of steps to reach \(\mathcal{D}\) in simulation \(i\), then our Monte Carlo approximation is \[ \mathbb{E}[\tau] \approx\frac{\hat{\tau}_1 + \hat{\tau}_2 + \dots + \hat{\tau}_n}{n}. \]

Figure: Three simulated paths of the hungry forager searching for food.
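A minimal MATLAB sketch of this Monte Carlo estimate (the number of simulations is an illustrative choice, and clipping negative coordinates back to zero is one simple way to implement the reflecting boundary described above):

```matlab
% Monte Carlo estimate of the expected hitting time E[tau] for the forager.
nsim  = 1e4;                                   % number of simulated foragers
tau   = zeros(nsim, 1);
moves = [1 0; -1 0; 0 1; 0 -1];                % right, left, up, down

for i = 1:nsim
    pos   = [0 0];                             % start at the origin
    steps = 0;
    % keep walking until the forager enters the food patch D = [4,8] x [4,8]
    while ~(pos(1) >= 4 && pos(1) <= 8 && pos(2) >= 4 && pos(2) <= 8)
        pos   = pos + moves(randi(4), :);      % random step in one of 4 directions
        pos   = max(pos, 0);                   % reflecting boundary: stay in Z_+^2
        steps = steps + 1;
    end
    tau(i) = steps;
end

Etau_hat = mean(tau);                          % Monte Carlo estimate of E[tau]
```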

Simulating the process many times gives us \(\mathbb{E}[\tau] \approx 113\), and we can also use the values of \(\hat{\tau}_i\) to make the following plot of the empirical distribution of \(\tau\):