Global sensitivity analysis

Global sensitivity analysis is a Monte Carlo based method to rank the importance of parameters in a given modelling problem. As opposed to local senstivity analysis, it does not require the construction of the Jacobian, making it a flexible tool to evaluate complex problems.

Global sensitivty analysis is available in mainly uncertainty quantificaiton packages, as well as some flow and transport programs (e.g. iTOUGH2). GSA is also very popular in catchment modelling and civil engineering/risk analysis problems.

Some GSA work in hydrogeophysics (mainly by Berkeley Lab):

  • coupled hydrological-thermal-geophysical inversion (Tran et al 2017) > Nicely show how to simplify (i.e. reduce the number of parameters) for a very complex, highly coupled problem

  • making sense of global senstivity analysis (Wainwright et al 2014) > Very good review article

  • Sensitivity analysis of environmental models (Pianosi et al 2014) > A systematic review, includes GLUE and RSA

  • hydrogeology of a nuclear site in the Paris Basin (Deman et al 2016) > A different GSA method was used instead here to look at the low probability breakthrough events

  • Global Sensitivity and Data-Worth Analyses in iTOUGH2 User’s Guide (Wainwright et al 2016) > An useful manual if you want to learn about the details of setting up a probllem.

In this tutorial, we will see how to link the RESiPy API and SALib for senstivity analysis. Two key elements of SA are (i) forward modelling (Monte Carlo runs) and (ii) specifying the parameter ranges. This notebook will showcase of the use of the Method of Morris, which is known for its relatively small computational cost. This tutorial is modified from the one posted on https://github.com/SALib/SATut to demonstrate its coupling with RESiPy

Morris sensitivity method

The Morris one-at-a-time (OAT) method (Morris, 1991) can be considered as an extension of the local sensitivity method. Each parameter range is scaled to the unit interval [0, 1] and partitioned into \((p−1)\) equally-sized intervals. The reference value of each parameter is selected randomly from the set \({0, 1/(p−1), 2/(p−1), …, 1−Δ}\). The fixed increment \(Δ=p/{2(p−1)}\) is added to each parameter in random order to compute the elementary effect (\(EE\)) of \(x_i\)

\[ \begin{align}\begin{aligned}EE_i=\frac{1}{\tau_y}\frac{f(x_1*,...,x_i*+\Delta,...,x_k*)-f(x_1*,...,x_k*)}{\Delta}\\where :math:`{x_i⁎}` is the randomly selected parameter set, and τy is the output-scaling factor. To compute EEi for k parameters, we need (k+1) simulations (called one “path”) in the same way as that of the local sensitivity method. By having multiple paths, we have an ensemble of EEs for each parameter. The total number of simulations is r(k+1), where r is the number of paths.\end{aligned}\end{align} \]

We compute three statistics: the mean \(EE\), standard deviation (STD) of \(EE\), and mean of absolute \(EE\). * mean EE (:math:`mu`) represents the average effect of each parameter over the parameter space, the mean EE can be regarded as a global sensitivity measure. * mean |EE| (:math:`mu*`) is used to identify the non-influential factors, * STD of EE (:math:`sigma`) is used to identify nonlinear and/or interaction effects. (The standard error of mean (SEM) of EE, defined as \(SEM=STD/r^{0.5}\), is used to calculate the confidence interval of mean EE (Morris, 1991))

This cell is copied from (Wainwright et al 2014)

Importing libraries

    import warnings
warnings.filterwarnings('ignore')
import os
import sys
sys.path.append((os.path.relpath('../src'))) # add here the relative path of the API folder

import numpy as np # numpy for electrode generation
import pandas as pd
from IPython.utils import io  # suppress R2 outputs during MC runs
from resipy import Project
API path =  /media/jkl/data/phd/tmp/pyr2/src/resipy
ResIPy version =  2.1.0

The SALib package

SALib is a free open-source Python library

If you use Python, you can install it by running the command

pip install SALib

Documentation is available online and you can also view the code on Github.

The library includes: * Sobol Sensitivity Analysis (Sobol 2001, Saltelli 2002, Saltelli et al. 2010) * Method of Morris, including groups and optimal trajectories (Morris 1991, Campolongo et al. 2007) * Fourier Amplitude Sensitivity Test (FAST) (Cukier et al. 1973, Saltelli et al. 1999) * Delta Moment-Independent Measure (Borgonovo 2007, Plischke et al. 2013) * Derivative-based Global Sensitivity Measure (DGSM) (Sobol and Kucherenko 2009) * Fractional Factorial Sensitivity Analysis (Saltelli et al. 2008)

SALib Tutorial

# import the packages
from SALib.sample import morris as ms
from SALib.analyze import morris as ma
from SALib.plotting import morris as mp

Create ERT forward problem with ResIPy

In the code below, created a Project forward problem to be analyzed

k = Project(typ='R2')

elec = np.zeros((24,3))
elec[:,0] = np.arange(0, 24*0.5, 0.5) # with 0.5 m spacing and 24 electrodes
k.setElec(elec)
#print(k.elec)

# defining electrode array
x = np.zeros((24, 3))
x[:,0] = np.arange(0, 24*0.5, 0.5)
k.setElec(elec)

# creating mesh
k.createMesh(res0=20)

# add region
k.addRegion(np.array([[2,-0.3],[2,-2],[3,-2],[3,-0.3],[2,-0.3]]), 50)
k.addRegion(np.array([[5,-2],[5,-3.5],[8,-3.5],[8,-2],[5,-2]]), 500)

# define sequence
k.createSequence([('dpdp1', 1, 10)])

# forward modelling
k.forward(noise=0.025)

# read results
fwd_dir = os.path.relpath('../src/resipy/invdir/fwd')

obs_data = np.loadtxt(os.path.join(fwd_dir, 'R2_forward.dat'),skiprows =1)
obs_data = obs_data[:,6]

# plot
k.showMesh()
Working directory is: /media/jkl/data/phd/tmp/pyr2/src/resipy/invdir
clearing the dirname
Creating triangular mesh...Reading mesh.msh
Gmsh version == 3.x
reading node coordinates...
Determining element type...Triangle
Reading connection matrix...
ignoring 0 elements in the mesh file, as they are not required for R2/R3t
Finished reading .msh file
done
165 quadrupoles generated.
Writing .in file and mesh.dat... done!
Writing protocol.dat... done!
Running forward model...

 >> R  2    R e s i s t i v i t y   I n v e r s i o n   v4.0 <<

 >> D a t e : 04 - 05 - 2020
 >> My beautiful survey
 >> F o r w a r d   S o l u t i o n   S e l e c t e d <<
 >> Determining storage needed for finite element conductance matrix
 >> Generating index array for finite element conductance matrix
 >> Reading start resistivity from resistivity.dat

 Measurements read:   165     Measurements rejected:     0

 >> Total Memory required is:          0.066 Gb
165/165 reciprocal measurements NOT found.
0 measurements error > 20 %
Forward modelling done.
../_images/nb_Morris_0.png

Option to view resistivity fields with pyvista

import pyvista as pv
mesh = pv.read(os.path.join(fwd_dir, 'forward_model.vtk'))

mesh.cell_arrays
mesh.plot(scalars='Resistivity(log10)',notebook=True)
mesh
../_images/nb_Morris_1.png
HeaderData Arrays
UnstructuredGridInformation
N Cells1989
N Points1063
X Bounds-2.460e-01, 1.172e+01
Y Bounds-7.670e+00, 0.000e+00
Z Bounds0.000e+00, 0.000e+00
N Arrays2
NameFieldTypeN CompMinMax
Resistivity(ohm.m)Cellsfloat6412.000e+015.000e+02
Resistivity(log10)Cellsfloat6411.301e+002.699e+00
import pyvista as pv

plotter = pv.Plotter()
mesh = pv.read(os.path.join(fwd_dir, 'forward_model.vtk'))

plotter.add_mesh(mesh,cmap="viridis" ,stitle="Resistivity(log10)") #scalars='Resistivity(log10)'
#plotter.show_axes()

_ = plotter.show_bounds(grid='front', location='outer', all_edges=True, xlabel="X [m]",ylabel="Y [m]")
plotter.add_text('pyvista example')
#plotter.update_scalar_bar_range([-2000,2000], name="Resistivity(log10)")

#plotter.add_mesh(mesh, cmap="bone", opacity="linear", stitle="Linear Opacity")
plotter.view_xy()
plotter.show()
#plotter.show(screenshot='test.png')
../_images/nb_Morris_2.png

Define a problem file

In the code below, a problem file is used to define the parameters and their ranges we wish to explore, which corresponds to the following table:

Parameter

Range

Description

rho0 [ohm m]

10 1

background

rho1 [ohm m]

10 2

inclusion A

rho2 [ohm m]

10 3

inclusion B

1

0.5,3.5

2

0.5,3.5

3

0.5,3.5

morris_problem = {
    'num_vars': 3,
    # These are their names
    'names': ['rho1', 'rho2', 'rho3'], # can add z1 z2 etc.
    # Plausible ranges over which we'll move the variables
    'bounds': [[0.5,3.5], # log10 of rho (ohm m)
               [0.5,3.5],
               [0.5,3.5]#,
   #            [-3,-1],
   #            [-7,-4],
              ],
    # I don't want to group any of these variables together
    'groups': None
    }

Generate a Sample

We then generate a sample using the morris.sample() procedure from the SALib package.

number_of_trajectories = 20
sample = ms.sample(morris_problem, number_of_trajectories, num_levels=10)
len(sample)
print(sample[79,:])
[1.83333333 3.5        1.83333333]

Run the sample through the monte carlo procedure in R2

Great! You have defined your problem and have created a series of input files for forward runs. Now you need to run R2 for each of them to obtain their ERT responses.

For this example, each sample takes a few seconds to run on a PC.

#%%capture
simu_ensemble = np.zeros((len(obs_data),len(sample)))
for ii in range(0, len(sample)):
    with io.capture_output() as captured:          # suppress inline output from ResIPy
        # creating mesh
        k.createMesh(res0=10**sample[ii,0])   # need to use more effective method, no need to create mesh every time

        # add region
        k.addRegion(np.array([[2,-0.3],[2,-2],[3,-2],[3,-0.3],[2,-0.3]]), 10**sample[ii,1])
        k.addRegion(np.array([[5,-2],[5,-3.5],[8,-3.5],[8,-2],[5,-2]]), 10**sample[ii,2])

        # forward modelling
        k.forward(noise=0.025, iplot = False)
        out_data = np.loadtxt(os.path.join(fwd_dir, 'R2_forward.dat'),skiprows =1)
        simu_ensemble[:,ii] = out_data[:,6]
    print("Running sample",ii+1)
Running sample 1
Running sample 2
Running sample 3
Running sample 4
Running sample 5
Running sample 6
Running sample 7
Running sample 8
Running sample 9
Running sample 10
Running sample 11
Running sample 12
Running sample 13
Running sample 14
Running sample 15
Running sample 16
Running sample 17
Running sample 18
Running sample 19
Running sample 20
Running sample 21
Running sample 22
Running sample 23
Running sample 24
Running sample 25
Running sample 26
Running sample 27
Running sample 28
Running sample 29
Running sample 30
Running sample 31
Running sample 32
Running sample 33
Running sample 34
Running sample 35
Running sample 36
Running sample 37
Running sample 38
Running sample 39
Running sample 40
Running sample 41
Running sample 42
Running sample 43
Running sample 44
Running sample 45
Running sample 46
Running sample 47
Running sample 48
Running sample 49
Running sample 50
Running sample 51
Running sample 52
Running sample 53
Running sample 54
Running sample 55
Running sample 56
Running sample 57
Running sample 58
Running sample 59
Running sample 60
Running sample 61
Running sample 62
Running sample 63
Running sample 64
Running sample 65
Running sample 66
Running sample 67
Running sample 68
Running sample 69
Running sample 70
Running sample 71
Running sample 72
Running sample 73
Running sample 74
Running sample 75
Running sample 76
Running sample 77
Running sample 78
Running sample 79
Running sample 80

Factor Prioritisation

We’ll run a sensitivity analysis of the power module to see which is the most influential parameter.

The results parameters are called mu, sigma and mu_star.

  • Mu is the mean effect caused by the input parameter being moved over its range.

  • Sigma is the standard deviation of the mean effect.

  • Mu_star is the mean absolute effect.

The higher the mean absolute effect for a parameter, the more sensitive/important it is*

# Define an objective function: here I use the error weighted rmse
def obj_fun(sim,obs,noise):
    y = np.divide(sim-obs,noise)   # weighted data misfit
    y = np.sqrt(np.inner(y,y))
    return y

output = np.zeros((1,len(sample)))
for ii in range(0, len(sample)):
    output[0,ii] = obj_fun(simu_ensemble[:,ii],obs_data,0.025*obs_data)    # assume 2.5% noise in the data

# Store the results for plotting of the analysis
Si = ma.analyze(morris_problem, sample, output, print_to_console=False)
print("{:20s} {:>7s} {:>7s} {:>7s}".format("Name", "mean(EE)", "mean(|EE|)", "std(EE)"))
for name, s1, st, mean in zip(morris_problem['names'],
                              Si['mu'],
                              Si['mu_star'],
                              Si['sigma']):
    print("{:20s} {:=7.3f} {:=7.3f} {:=7.3f}".format(name, s1, st, mean))
Name                 mean(EE) mean(|EE|) std(EE)
rho1                 59023.790 59023.790 46473.532
rho2                 2972.826 2979.171 4804.446
rho3                 1930.959 1933.686 3186.307
# make a plot
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
ax.scatter(Si['mu_star'],Si['sigma'])
#ax.plot(Si['mu_star'],2*Si['sigma']/np.sqrt(number_of_trajectories),'--',alpha=0.5)
#ax.plot(np.array([0,Si['mu_star'][0]]),2*np.array([0,Si['sigma'][0]/np.sqrt(number_of_trajectories)]),'--',alpha=0.5)

plt.title('Distribution of Elementary effects')
plt.xlabel('mean(|EE|)')
plt.ylabel('std($EE$)')
for i, txt in enumerate(Si['names']):
    ax.annotate(txt, (Si['mu_star'][i], Si['sigma'][i]))

# higher mean |EE|, more important factor
# line within the dashed envelope means nonlinear or interaction effects dominant
../_images/nb_Morris_3.png

Summary

Sensitivity analysis helps you:

  • Think through your assumptions

  • Quantify uncertainty

  • Focus on the most influential uncertainties first

Learn Python

Similar packages to `SALib <>`__ for other languages/programmes:

# run this so that a navigation sidebar will bee generated when exporting this notebook as HTML
#%%javascript
#$('<div id="toc"></div>').css({position: 'fixed', top: '120px', left: 0}).appendTo(document.body);
#$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js');
help(k.addRegion)
Help on method addRegion in module resipy.R2:

addRegion(xz, res0=100, phase0=1, blocky=False, fixed=False, ax=None, iplot=False) method of resipy.R2.R2 instance
    Add region according to a polyline defined by xz and assign it
    the starting resistivity res0.

    Parameters
    ----------
    xz : array
        Array with two columns for the x and y coordinates.
    res0 : float, optional
        Resistivity values of the defined area.
    phase0 : float, optional
        Read only if you choose the cR2 option. Phase value of the defined
        area in mrad
    blocky : bool, optional
        If True the boundary of the region will be blocky if inversion
        is block inversion.
    fixed : bool, optional
        If True, the inversion will keep the starting resistivity of this
        region.
    ax : matplotlib.axes.Axes
        If not None, the region will be plotted against this axes.
    iplot : bool, optional
        If True , the updated mesh with the region will be plotted.

Download python script: nb_Morris.py

Download Jupyter notebook: nb_Morris.ipynb

View the notebook in the Jupyter nbviewer

Run this example interactively: binder