5 Bayesian Spatial Modelling for Areal Data in Stan

5.1 Introduction

5.1.1 Lecture video (Length: 1:19:54)

[Watch on YouTube]

Today, you will learn how to apply Bayesian spatial models for risk assessment of areal-level discrete outcomes. These are powerful tools used in many applications, such as spatial epidemiology, disaster risk reduction and environmental criminology, among many others.

This exercise will focus on casualty data from road accidents involving car users in the UK. Road traffic accidents and injuries are a serious problem worldwide. Here, we will estimate the area-specific relative risks (RR) of casualties due to road accidents in local authority areas across England, and we will quantify the levels of uncertainty using a device called exceedance probabilities.

5.1.2 Learning outcomes

You will learn how to:

  • Implement the spatial intrinsic conditional autoregressive (ICAR) model for areal data in RStan;
  • Use the ICAR model to predict the area-specific relative risks (RR) for areal units, and determine whether the levels of such risks are statistically significant through the 95% credible intervals (95% CrI);
  • Determine the exceedance probability, i.e., the probability that an area has an excess risk exceeding a given risk threshold (e.g., RR > 1, the null value);

You can follow the live walk-through demonstration of the practical and work through the instructions in this recording at your own pace.

5.1.3 Live walk-through demonstration video (Length: 2:32:59)

[Watch on YouTube]

5.1.4 Loading and installing packages

We need to install the INLA package to access functions such as inla.qinv(), which we will use when manipulating the spatial adjacency matrix to compute the scale factor:

install.packages("INLA",repos=c(getOption("repos"),INLA="https://inla.r-inla-download.org/R/stable"), dep=TRUE)

We will need to load the following packages for this computer practical:

# Load the packages with library()
library("sf")
library("tmap")
library("spdep")
library("rstan")
library("geostan")
library("SpatialEpi")
library("tidybayes")
library("tidyverse")
library("INLA")
inla.upgrade()

Upon loading the rstan package, remember to configure it in RStudio using the following code:

options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

5.1.5 Datasets & setting up the work directory

Go to your folder CPD-course and create a subfolder called “Day 4”. Here, we will store all our R & Stan scripts and data files. Set your work directory to the Day 4 folder.

For Windows, the code for setting the work directory will be:

setwd("C:/Users/AccountName/Desktop/CPD-course/Day 4")

For Mac, the code for setting the work directory will be:

setwd("/Users/AccountName/Desktop/CPD-course/Day 4")

The datasets for this practical are:

  • Road Casualties Data 2015-2020.csv
  • England Local Authority Shapefile.shp
  • England Regions Shapefile.shp

The dataset we are going to start off with is the Road Casualties Data 2015-2020.csv. This data file contains the following information:

  • It contains the 307 local authority areas that operate in England. The names and codes are defined under the columns LAD21NM and LAD21CD, respectively;
  • It contains the following variables: Population, Casualties and IMDScore. Casualties is the dependent variable, with information on road accidents that occurred over a 5-year period, and IMDScore is the independent variable. We will need the Population column to derive the expected number of road casualties, to be used as an offset in the Bayesian model.

The shapefile England Local Authority Shapefile.shp contains the boundaries for all 307 local authorities in England. The England Regions Shapefile.shp contains the boundaries for all 10 regions that make up England.

Let us load these datasets into memory:

# load the shape files
england_LA_shp <- read_sf("England Local Authority Shapefile.shp")
england_Region_shp <- read_sf("England Regions Shapefile.shp")

# load in the cross-sectional road accident dataset
road_accidents <- read.csv("Road Casualties Data 2015-2020.csv")

If you want to learn more about managing spatial data in RStudio, you are welcome to use the following learning resource in your own time: Principles of Spatial Analysis

5.2 Data preparation in RStudio

5.2.1 Calculation for expected numbers

In order to estimate the risk of casualties due to road accidents at the LA level in England, we first need to obtain a column containing the expected number of road casualties. This is derived from the Population column, which acts as the denominator (the reference population size); it is multiplied by the overall incidence rate of road accidents to give the expected number of casualties for each LA area.

You can use the expected() function (from SpatialEpi) to compute this column and add it to the road_accidents data frame:

# calculate the expected number of cases
road_accidents$ExpectedNum <- round(expected(population = road_accidents$Population, cases = road_accidents$Casualties, n.strata = 1), 0)

This column, ExpectedNum, is important: it will be used as an offset in our spatial model.
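
Under the hood, with n.strata = 1 this is simply indirect standardisation with a single stratum. Here is a minimal sketch of the equivalent manual calculation (an illustration only, assuming the columns shown above):

# overall incidence rate of road casualties across England
overall_rate <- sum(road_accidents$Casualties) / sum(road_accidents$Population)
# expected count for each LA = its population multiplied by the overall rate
manual_expected <- round(road_accidents$Population * overall_rate, 0)
# this should agree with the ExpectedNum column computed by expected()
all.equal(manual_expected, road_accidents$ExpectedNum)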

5.2.2 Converting the spatial adjacency matrix to nodes & edges

We will need to transform the spatial adjacency matrix into a list of nodes and edges, as Stan can only identify an adjacency structure through a set of paired nodes and the edges that connect them. For instance, node1 holds the index region and node2 holds the neighbouring region connected to the index region in node1.

We can perform this by first merging the road accident data into the LA-level shapefile. Once this is done, we will need to coerce the object from a simple features (i.e., sf) object to a spatial (i.e., sp) object.

Here is the code:

# merge the attribute table to the shapefile
spatial.data <- merge(england_LA_shp, road_accidents, by.x = c("LAD21CD", "LAD21NM"), by.y = c("LAD21CD", "LAD21NM"), all.x = TRUE)
# reordering the columns
spatial.data <- spatial.data[, c(3,1,2,4,5,7,6)]
# need to be coerced into a spatial object
sp.object <- as(spatial.data, "Spatial")

Now, we need to obtain the nodes and edges from the sp.object. The shape2mat() function changes it into a matrix object; from this matrix object, we can prepare the data for the spatial ICAR model using the prep_icar_data() function. Here is the code:

# needs to be coerced into a matrix object
adjacencyMatrix <- shape2mat(sp.object)
# we extract the components for the ICAR model
extractComponents <- prep_icar_data(adjacencyMatrix)

From the extractComponents object, we will need to extract the following contents:

  • $group_size is the number of areal units under observation listed in the shapefile (it should match the number in the road accidents dataset);
  • $node1 contains the index regions of interest;
  • $node2 contains the neighbouring regions that are connected to the index regions listed in node1;
  • $n_edges is the number of edges, i.e., the number of pairwise connections showing which area is connected to which neighbour. It still encodes the queen contiguity adjacency matrix, but represented as a network.

Here is the code for performing the extraction:

n <- as.numeric(extractComponents$group_size)
nod1 <- extractComponents$node1
nod2 <- extractComponents$node2
n_edges <- as.numeric(extractComponents$n_edges)

Note that the information needed is stored in n, nod1, nod2 and n_edges.
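
As a quick sanity check, you can inspect the first few node pairs to see the edge-list representation; each row says that area nod1 is adjacent to area nod2 (an optional step):

# peek at the first few edges of the adjacency network
head(cbind(node1 = nod1, node2 = nod2))
# number of areas and number of adjacent pairs
c(n = n, n_edges = n_edges)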

We now need to build the adjacency matrix from the nodes derived from the England shapefile, using the sparseMatrix() function:

adj.matrix=sparseMatrix(i=nod1,j=nod2,x=1,symmetric=TRUE)

The next four lines of code perform some rather fiddly matrix manipulation to compute the scaling factor, which is the geometric mean of the variances captured along the diagonal of the (generalised) inverse of the precision matrix built from the adjacency matrix.

Q=Diagonal(n, rowSums(adj.matrix)) - adj.matrix                      # ICAR precision matrix Q
Q_pert=Q+Diagonal(n) * max(diag(Q)) * sqrt(.Machine$double.eps)      # add a tiny jitter so Q can be inverted
Q_inv=inla.qinv(Q_pert, constr=list(A = matrix(1,1,n),e=0))          # invert Q subject to a sum-to-zero constraint
scaling_factor=exp(mean(log(diag(Q_inv))))                           # geometric mean of the marginal variances

The scaling_factor value will be used to scale the structured spatial random effects, in combination with the rho proportion (drawn from a beta(0.5, 0.5) distribution) that controls how much of the variation comes from the spatial component.

5.2.3 Create the dataset to be compiled in Stan

For the last step in the data preparation, we need to define the variables to be compiled in Stan. The outcome Casualties, the independent variable IMDScore and the offset variable ExpectedNum need to be extracted into separate vectors; we also pass the scaling_factor computed earlier. The data must be aligned with the areas in the shapefile, as the results will be returned in that order. So make sure the data are already linked to the geometries!

Here is the code:

y <- spatial.data$Casualties
x <- spatial.data$IMDScore
e <- spatial.data$ExpectedNum

Now, we create our dataset for Stan:

# put all components into a list object
stan.spatial.dataset <- list(N=n, N_edges=n_edges, node1=nod1, node2=nod2, Y=y, X=x, Offset=e, factor=scaling_factor)
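
Before moving on, it is worth sanity-checking the names and dimensions in this list, since they must match the declarations in the Stan data block we write next (an optional step):

# quick check of the contents passed to Stan
str(stan.spatial.dataset)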

The above information is going to be passed to Stan in its data block, using names that match the declarations there (N, N_edges, node1, node2, Y, X, Offset and factor). We are now in a position to develop our spatial intrinsic conditional autoregressive (ICAR) model. Open your Stan script and we will begin.

5.3 Creating the script for the Spatial ICAR model

5.3.1 Data block

In the data block, we specify the following:

  • The total number of areal unit observations N as an integer (i.e., 307);
  • The total number of edges N_edges as an integer (i.e., 823);
  • node1, an array of size N_edges (i.e., 823), which connects with node2;
  • node2, an array of size N_edges (i.e., 823), which connects with node1;
  • We define our Y outcome (i.e., road accidents) as an array of size N (i.e., 307) which is an integer;
  • We define our single independent variable as X. Note that this is not a matrix;
  • We must define our offset for the expected numbers Offset as a vector of size N (i.e., 307);
  • We also define our scaling factor as factor (a real value).

Here is what our data block will look like:

data {
  int<lower=0> N;                                                        // number of spatial units or neighbourhoods
  int<lower=0> N_edges;                                                  // number of edges connecting adjacent areas using Queen contiguity
  array[N_edges] int<lower=1, upper=N> node1;                            // list of index areas showing which spatial units are neighbours
  array[N_edges] int<lower=1, upper=N> node2;                            // list of neighbouring areas showing the connection to index spatial unit
  array[N] int<lower=0> Y;                                               // dependent variable
  vector<lower=0>[N] X;                                                  // Single independent variable
  vector<lower=0>[N] Offset;                                             // offset variable
  real<lower=0> factor;                                                  // scaling factor for the variance of the spatial effects
}

5.3.2 Transformed data block

We are going to include a transformed data block. Here, we simply take the log() of the expected numbers, creating another vector called log_Offset. This will be added inside the poisson_log() sampling statement in the likelihood function of the spatial model to account for the reference population in England.

Here, we specify it as follows:

data {
  int<lower=0> N;                                                        // number of spatial units or neighbourhoods
  int<lower=0> N_edges;                                                  // number of edges connecting adjacent areas using Queen contiguity
  array[N_edges] int<lower=1, upper=N> node1;                            // list of index areas showing which spatial units are neighbours
  array[N_edges] int<lower=1, upper=N> node2;                            // list of neighbouring areas showing the connection to index spatial unit
  array[N] int<lower=0> Y;                                               // dependent variable
  vector<lower=0>[N] X;                                                  // Single independent variable
  vector<lower=0>[N] Offset;                                             // offset variable
  real<lower=0> factor;                                                  // scaling factor for the variance of the spatial effects
}

transformed data {
    vector[N] log_Offset = log(Offset);
}

5.3.3 Parameters block

For the parameters block, we will need to specify the following:

  • The global intercept i.e., alpha for the entire study area (i.e., average risk of road accidents on a population-level);
  • The coefficient beta for our independent variable X which is the IMDScore;
  • We also specify sigma, a real value: an overall standard deviation that is multiplied by the combined structured and unstructured random effects;
  • We define the structured spatial random effects phi as a vector of size N;
  • We define the unstructured spatial random effects theta as a vector of size N;
  • We define rho as the proportion of the variation coming from the structured spatial random effects;

We add the block as follows:

data {
  int<lower=0> N;                                                        // number of spatial units or neighbourhoods
  int<lower=0> N_edges;                                                  // number of edges connecting adjacent areas using Queen contiguity
  array[N_edges] int<lower=1, upper=N> node1;                            // list of index areas showing which spatial units are neighbours
  array[N_edges] int<lower=1, upper=N> node2;                            // list of neighbouring areas showing the connection to index spatial unit
  array[N] int<lower=0> Y;                                               // dependent variable
  vector<lower=0>[N] X;                                                  // Single independent variable
  vector<lower=0>[N] Offset;                                             // offset variable
  real<lower=0> factor;                                                  // scaling factor for the variance of the spatial effects
}

transformed data {
    vector[N] log_Offset = log(Offset);
}

parameters {
  real alpha;                                                            // intercept
  real beta;                                                             // covariates
  real<lower=0> sigma;                                                   // overall standard deviation
  real<lower=0, upper=1> rho;                                            // proportion unstructured vs. spatially structured variance
  vector[N] theta;                                                       // unstructured random effects (heterogeneous)
  vector[N] phi;                                                         // spatial random effects
}

5.3.4 Transformed parameters block

Here, we calculate the combined random effects from the structured and unstructured components of our model:

data {
  int<lower=0> N;                                                        // number of spatial units or neighbourhoods
  int<lower=0> N_edges;                                                  // number of edges connecting adjacent areas using Queen contiguity
  array[N_edges] int<lower=1, upper=N> node1;                            // list of index areas showing which spatial units are neighbours
  array[N_edges] int<lower=1, upper=N> node2;                            // list of neighbouring areas showing the connection to index spatial unit
  array[N] int<lower=0> Y;                                               // dependent variable
  vector<lower=0>[N] X;                                                  // Single independent variable
  vector<lower=0>[N] Offset;                                             // offset variable
  real<lower=0> factor;                                                  // scaling factor for the variance of the spatial effects
}

transformed data {
    vector[N] log_Offset = log(Offset);
}

parameters {
  real alpha;                                                            // intercept
  real beta;                                                             // covariates
  real<lower=0> sigma;                                                   // overall standard deviation
  real<lower=0, upper=1> rho;                                            // proportion unstructured vs. spatially structured variance
  vector[N] theta;                                                       // unstructured random effects (heterogeneous)
  vector[N] phi;                                                         // spatial random effects
}

transformed parameters {
  vector[N] combined;                                                    // combined random effect, i.e., unstructured and structured
  combined = sqrt(1 - rho) * theta + sqrt(rho/factor) * phi;             // formulation for the combined random effect, i.e., unstructured and structured
}
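
For reference, the combined term follows the BYM2 parameterisation of the convolution prior (Riebler et al., 2016), with the scaling factor s (our factor) making the variance of the structured component comparable to that of the unstructured one:

$$\text{combined}_i = \sqrt{1-\rho}\,\theta_i + \sqrt{\rho/s}\,\phi_i$$

This keeps the variance of the combined effect roughly constant for any value of rho; its overall magnitude is then controlled by sigma in the model block.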

5.3.5 Model block

We build our likelihood function and specify the priors for each parameter in the model block. We are using a typical Poisson model with a log link function, assuming a log-linear relationship between the counts and the IMD score, while also taking into account the combined random effects:

data {
  int<lower=0> N;                                                        // number of spatial units or neighbourhoods
  int<lower=0> N_edges;                                                  // number of edges connecting adjacent areas using Queen contiguity
  array[N_edges] int<lower=1, upper=N> node1;                            // list of index areas showing which spatial units are neighbours
  array[N_edges] int<lower=1, upper=N> node2;                            // list of neighbouring areas showing the connection to index spatial unit
  array[N] int<lower=0> Y;                                               // dependent variable
  vector<lower=0>[N] X;                                                  // Single independent variable
  vector<lower=0>[N] Offset;                                             // offset variable
  real<lower=0> factor;                                                  // scaling factor for the variance of the spatial effects
}

transformed data {
    vector[N] log_Offset = log(Offset);
}

parameters {
  real alpha;                                                            // intercept
  real beta;                                                             // covariates
  real<lower=0> sigma;                                                   // overall standard deviation
  real<lower=0, upper=1> rho;                                            // proportion unstructured vs. spatially structured variance
  vector[N] theta;                                                       // unstructured random effects (heterogeneous)
  vector[N] phi;                                                         // spatial random effects
}

transformed parameters {
  vector[N] combined;                                                    // combined random effect, i.e., unstructured and structured
  combined = sqrt(1 - rho) * theta + sqrt(rho/factor) * phi;             // formulation for the combined random effect, i.e., unstructured and structured
}

model {
  Y ~ poisson_log(log_Offset + alpha + X * beta + combined * sigma);    // likelihood function: multivariable Poisson ICAR regression model
                                                                        // setting priors
  alpha ~ normal(0.0, 1.0);                                             // prior for alpha: weakly informative
  beta ~ normal(0.0, 1.0);                                              // prior for betas: weakly informative
  theta ~ normal(0.0, 1.0);                                             // prior for theta: weakly informative
  sigma ~ normal(0.0, 1.0);                                             // prior for sigma: weakly informative
  rho ~ beta(0.5, 0.5);                                                 // prior for rho
  target += -0.5 * dot_self(phi[node1] - phi[node2]);                   // ICAR prior: penalises differences between neighbouring areas
  sum(phi) ~ normal(0, 0.001 * N);                                      // soft sum-to-zero constraint on phi
}
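
The target += statement adds the log density of the ICAR prior (up to an additive constant) in its pairwise-difference form, summing over all pairs of adjacent areas:

$$\log p(\phi) \propto -\frac{1}{2}\sum_{i \sim j}\left(\phi_i - \phi_j\right)^2$$

Because this density is invariant to adding a constant to every element of phi, the soft sum-to-zero constraint sum(phi) ~ normal(0, 0.001 * N) is needed to keep phi identifiable alongside the intercept.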

5.3.6 Generated quantities block

Lastly, we instruct Stan on the parameters we want reported. We want them as relative risks (RR). We can use the generated quantities block to obtain these estimates by exponentiating the linear predictor of the ICAR regression model:

data {
  int<lower=0> N;                                                        // number of spatial units or neighbourhoods
  int<lower=0> N_edges;                                                  // number of edges connecting adjacent areas using Queen contiguity
  array[N_edges] int<lower=1, upper=N> node1;                            // list of index areas showing which spatial units are neighbours
  array[N_edges] int<lower=1, upper=N> node2;                            // list of neighbouring areas showing the connection to index spatial unit
  array[N] int<lower=0> Y;                                               // dependent variable
  vector<lower=0>[N] X;                                                  // Single independent variable
  vector<lower=0>[N] Offset;                                             // offset variable
  real<lower=0> factor;                                                  // scaling factor for the variance of the spatial effects
}

transformed data {
    vector[N] log_Offset = log(Offset);
}

parameters {
  real alpha;                                                            // intercept
  real beta;                                                             // covariates
  real<lower=0> sigma;                                                   // overall standard deviation
  real<lower=0, upper=1> rho;                                            // proportion unstructured vs. spatially structured variance
  vector[N] theta;                                                       // unstructured random effects (heterogeneous)
  vector[N] phi;                                                         // spatial random effects
}

transformed parameters {
  vector[N] combined;                                                    // combined random effect, i.e., unstructured and structured
  combined = sqrt(1 - rho) * theta + sqrt(rho/factor) * phi;             // formulation for the combined random effect, i.e., unstructured and structured
}


model {
  Y ~ poisson_log(log_Offset + alpha + X * beta + combined * sigma);    // likelihood function: multivariable Poisson ICAR regression model
                                                                        // setting priors
  alpha ~ normal(0.0, 1.0);                                             // prior for alpha: weakly informative
  beta ~ normal(0.0, 1.0);                                              // prior for betas: weakly informative
  theta ~ normal(0.0, 1.0);                                             // prior for theta: weakly informative
  sigma ~ normal(0.0, 1.0);                                             // prior for sigma: weakly informative
  rho ~ beta(0.5, 0.5);                                                 // prior for rho
  target += -0.5 * dot_self(phi[node1] - phi[node2]);                   // ICAR prior: penalises differences between neighbouring areas
  sum(phi) ~ normal(0, 0.001 * N);                                      // soft sum-to-zero constraint on phi
}

generated quantities {
  vector[N] eta = alpha + X * beta + combined * sigma;                  // compute eta and exponentiate into mu                   
  vector[N] rr_mu = exp(eta);                                           // output the neighbourhood-specific relative risks in mu
  real rr_beta = exp(beta);                                             // output the risk ratios for each coefficient
  real rr_alpha = exp(alpha);                                           // output the risk ratios for the intercept
}

Well done! You have coded your first spatial risk model. Let us save the script as icar_poisson_model.stan. We can now compile and run it through RStudio to get our posterior estimates as relative risks (RR) for each area, and we can also get the exceedance probabilities. The next steps are easy from this point onwards.

5.4 Compiling Stan code for Spatial ICAR modelling

5.4.1 Printing of the global results

Now, let us turn our attention to RStudio. Use the stan() function to compile the saved script and obtain the posterior estimates of the parameters from our model:

icar_poisson_fit = stan("icar_poisson_model.stan", data=stan.spatial.dataset, iter=20000, chains=6, verbose = FALSE)
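
With 20,000 iterations across 6 chains, sampling can take several minutes. As an optional convenience (not part of the original workflow), you can save the fitted object so you do not have to re-run the model in a later session:

# save the fitted model object to the working directory
saveRDS(icar_poisson_fit, "icar_poisson_fit.rds")
# reload it later with: icar_poisson_fit <- readRDS("icar_poisson_fit.rds")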

We can see our estimated results for alpha, beta and sigma:

# remove that annoying scientific notation
options(scipen = 999)
summary(icar_poisson_fit, pars=c("alpha", "beta", "sigma"), probs=c(0.025, 0.975))$summary

Output from the summary()$summary function:

              mean       se_mean          sd         2.5%        97.5%     n_eff     Rhat
alpha  0.163397860 0.00077532133 0.084400797 -0.002859062  0.328252917 11850.308 1.000519
beta  -0.009785261 0.00003665654 0.004084191 -0.017807560 -0.001755543 12413.926 1.000480
sigma  0.533539788 0.00050085117 0.030058077  0.480434789  0.598541039  3601.679 1.002182

Alternative output from the summary()$summary function, with the relative risks:

# remove that annoying scientific notation
options(scipen = 999)
summary(icar_poisson_fit, pars=c("rr_alpha", "rr_beta", "sigma"), probs=c(0.025, 0.975))$summary

Output:

              mean       se_mean          sd      2.5%    97.5%     n_eff     Rhat
rr_alpha 1.1817041 0.00091591071 0.099828745 0.9971450 1.388540 11879.686 1.000485
rr_beta  0.9902707 0.00003629897 0.004044607 0.9823501 0.998246 12415.496 1.000481
sigma    0.5335398 0.00050085117 0.030058077 0.4804348 0.598541  3601.679 1.002182

Here is the interpretation:

  • alpha is the global mean (or average) for the population under study. On average, the road accident occurrence in England for the period 2015 to 2020 is 0.1639 (95% CrI: -0.0028 to 0.3283) on the log scale. If we take the exponent of this value, i.e., exp(0.163397860), we get the relative risk of road accidents, which is 1.18 (95% CrI: 0.997 to 1.389). The result is not significant, as the null value (i.e., 1) lies between these limits.
  • beta is the coefficient for IMDScore. It indicates that, on average, road accidents decrease in more deprived areas throughout England: -0.0097 (95% CrI: -0.0178 to -0.0018). While the decrease is significant, the effect is negligible, with a relative risk (0.990) very close to the null value of 1. It is therefore safe to err on the side of caution and treat the relationship as having no practical significance.
  • sigma is the overall standard deviation or global error.

Note that we can view the spatial random effects, i.e., phi, for each area using this code:

# show first 6 rows only instead of the full 307
head(summary(icar_poisson_fit, pars=c("phi"), probs=c(0.025, 0.975))$summary)

Alternatively, you can use the print() function to get a detailed output:

print(icar_poisson_fit, pars=c("alpha", "beta", "sigma", "phi"), probs=c(0.025, 0.975))

5.4.2 Rapid diagnostics of the rHATs

Before mapping the relative risks, we must check whether any of the estimates, i.e., alpha, beta, sigma and all phi, exceed the rHAT value of 1.1. An rHAT above 1.1 for a parameter is an indication that the iterations did not perform well. We can do a rapid check of which parameters are valid by creating a binary variable of 1’s (valid) and 0’s (not valid), and tabulating it to see the numbers:

# diagnostic check on the rHats - put everything into a data frame
diagnostic.checks <- as.data.frame(summary(icar_poisson_fit, pars=c("alpha", "beta", "sigma", "phi", "lp__"), probs=c(0.025, 0.5, 0.975))$summary)
# create binary variable
diagnostic.checks$valid <- ifelse(diagnostic.checks$Rhat < 1.1, 1, 0)
# tabulate it
table(diagnostic.checks$valid)

Everything is okay - all outputted parameters have an rHAT < 1.1. We are free to generate maps.

NOTES: To avoid such complications, it is always best to run about 10,000, 15,000 or more iterations. Shorter runs usually yield low effective sample sizes once the thinning/warm-up samples are discarded, which in turn may cause the rHAT to rise above 1.1.
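
Beyond the tabulated rHAT check, you can visually inspect the mixing of the chains for the key parameters with rstan's traceplot() function (an optional extra check):

# well-mixed chains should overlap and look like 'hairy caterpillars'
traceplot(icar_poisson_fit, pars = c("alpha", "beta", "sigma"))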

5.4.3 Extraction of the area-specific relative risks

If you run the following code:

# show first 6 rows only instead of the full 307
head(summary(icar_poisson_fit, pars=c("rr_mu"), probs=c(0.025, 0.975))$summary)

We see the relative risk (RR) estimates for the first six areas under the mean column, with their corresponding credibility limits under the 2.5% and 97.5% columns. We are going to extract this information into a data frame, cleaning and renaming the columns accordingly:

# extract key posterior results for the generated quantities
relativeRisk.results <- as.data.frame(summary(icar_poisson_fit, pars=c("rr_mu"), probs=c(0.025, 0.975))$summary)
# now cleaning this table up
# first, insert clean row numbers to new data frame
row.names(relativeRisk.results) <- 1:nrow(relativeRisk.results)
# second, rearrange the columns into order
relativeRisk.results <- relativeRisk.results[, c(1,4,5,7)]
# third, rename the columns appropriately
colnames(relativeRisk.results)[1] <- "rr"
colnames(relativeRisk.results)[2] <- "rrlower"
colnames(relativeRisk.results)[3] <- "rrupper"
colnames(relativeRisk.results)[4] <- "rHAT"
# fourth, create a binary variable flagging rHATs below 1.1 as valid (as in section 5.4.2)
relativeRisk.results$valid <- ifelse(relativeRisk.results$rHAT < 1.1, 1, 0)

# view clean table 
head(relativeRisk.results)

See clean table:

         rr   rrlower   rrupper      rHAT valid
1 0.6295692 0.5759348 0.6859993 0.9999354     1
2 0.8043656 0.7550300 0.8555160 0.9999809     1
3 0.6518326 0.6068501 0.6991145 0.9999186     1
4 0.6292392 0.5920648 0.6674334 0.9999475     1
5 0.8885551 0.8284516 0.9505547 1.0000141     1
6 0.7971718 0.7451160 0.8513495 0.9999757     1

Insert these columns into the spatial.data object as follows:

# now, we proceed to generate our risk maps
# align the results to the areas in shapefile
spatial.data$rr <- relativeRisk.results[, "rr"]
spatial.data$rrlower <- relativeRisk.results[, "rrlower"]
spatial.data$rrupper <- relativeRisk.results[, "rrupper"]

These relative risks will allow us to map the risk of road accidents across local authorities in England. We also want a supporting map indicating whether the risks are significant or not. Here, we create an extra column in spatial.data called Significance.

# create categories to define if an area has a significant increase or decrease in risk, or nothing at all
spatial.data$Significance <- NA
spatial.data$Significance[spatial.data$rrlower<1 & spatial.data$rrupper>1] <- 0    # NOT SIGNIFICANT
spatial.data$Significance[spatial.data$rrlower==1 | spatial.data$rrupper==1] <- 0  # NOT SIGNIFICANT
spatial.data$Significance[spatial.data$rrlower>1 & spatial.data$rrupper>1] <- 1    # SIGNIFICANT INCREASE
spatial.data$Significance[spatial.data$rrlower<1 & spatial.data$rrupper<1] <- -1   # SIGNIFICANT DECREASE
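
As a quick check, you can tabulate the new column to see how many areas fall into each of the three categories (-1, 0 and 1):

# count the areas by significance category
table(spatial.data$Significance)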

5.4.4 Mapping of RR and significance

The next set of code is all cosmetic, for creating our risk map of road accidents. Here is the code:

# For map design for the relative risk -- you want to get a handle on what the distribution of risks looks like
# this will inform how you create the labelling for the legends when making a map in tmap
summary(spatial.data$rr)
hist(spatial.data$rr)

# creating the labels
RiskCategorylist <- c(">0.0 to 0.25", "0.26 to 0.50", "0.51 to 0.75", "0.76 to 0.99", "1.00 & <1.01",
    "1.01 to 1.10", "1.11 to 1.25", "1.26 to 1.50", "1.51 to 1.75", "1.76 to 2.00", "2.01 to 3.00")

# next, we are creating the discrete colour changes for my legends and want to use a divergent colour scheme
# scheme ranges from extreme dark blues to light blues to white to light reds to extreme dark reds
# you can pick your own colour choices by checking out this link [https://colorbrewer2.org]

RRPalette <- c("#65bafe","#98cffe","#cbe6fe","#dfeffe","white","#fed5d5","#fcbba1","#fc9272","#fb6a4a","#de2d26","#a50f15")

# categorising the risk values to match the labelling in RiskCategorylist object
spatial.data$RelativeRiskCat <- NA
spatial.data$RelativeRiskCat[spatial.data$rr>= 0 & spatial.data$rr <= 0.25] <- -4
spatial.data$RelativeRiskCat[spatial.data$rr> 0.25 & spatial.data$rr <= 0.50] <- -3
spatial.data$RelativeRiskCat[spatial.data$rr> 0.50 & spatial.data$rr <= 0.75] <- -2
spatial.data$RelativeRiskCat[spatial.data$rr> 0.75 & spatial.data$rr < 1] <- -1
spatial.data$RelativeRiskCat[spatial.data$rr>= 1.00 & spatial.data$rr < 1.01] <- 0
spatial.data$RelativeRiskCat[spatial.data$rr>= 1.01 & spatial.data$rr <= 1.10] <- 1
spatial.data$RelativeRiskCat[spatial.data$rr> 1.10 & spatial.data$rr <= 1.25] <- 2
spatial.data$RelativeRiskCat[spatial.data$rr> 1.25 & spatial.data$rr <= 1.50] <- 3
spatial.data$RelativeRiskCat[spatial.data$rr> 1.50 & spatial.data$rr <= 1.75] <- 4
spatial.data$RelativeRiskCat[spatial.data$rr> 1.75 & spatial.data$rr <= 2.00] <- 5
spatial.data$RelativeRiskCat[spatial.data$rr> 2.00 & spatial.data$rr <= 10] <- 6

# check to see if legend scheme is balanced - if a number is missing that categorisation is wrong!
table(spatial.data$RelativeRiskCat)

Generating the maps as a paneled output:

# map of relative risk
rr_map <- tm_shape(spatial.data) + 
    tm_fill("RelativeRiskCat", style = "cat", title = "Relavtive Risk", palette = RRPalette, labels = RiskCategorylist) +
    tm_shape(england_Region_shp) + tm_polygons(alpha = 0.05) + tm_text("name", size = "AREA") +
    tm_layout(frame = FALSE, legend.outside = TRUE, legend.title.size = 0.8, legend.text.size = 0.7) +
    tm_compass(position = c("right", "top")) + tm_scale_bar(position = c("right", "bottom"))

# map of significance regions
sg_map <- tm_shape(spatial.data) + 
    tm_fill("Significance", style = "cat", title = "Significance Categories", 
        palette = c("#33a6fe", "white", "#fe0000"), labels = c("Significantly low", "Not Significant", "Significantly high")) +
    tm_shape(england_Region_shp) + tm_polygons(alpha = 0.10) + tm_text("name", size = "AREA") +
    tm_layout(frame = FALSE, legend.outside = TRUE, legend.title.size = 0.8, legend.text.size = 0.7) +
    tm_compass(position = c("right", "top")) + tm_scale_bar(position = c("right", "bottom"))

# create side-by-side plot
tmap_arrange(rr_map, sg_map, ncol = 2, nrow = 1)

Output:

5.4.5 Extracting and mapping of the exceedance probabilities

Exceedance probabilities allow us to quantify the levels of uncertainty surrounding the risks we have estimated. Using a threshold, for instance RR > 1, we can ask: what is the probability that an area has an excess risk of road accidents? We can then visualise this as well.
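
Formally, for each area i the exceedance probability is estimated from the M posterior draws of its relative risk:

$$\Pr(RR_i > 1 \mid \text{data}) \approx \frac{1}{M}\sum_{m=1}^{M}\mathbb{1}\left(RR_i^{(m)} > 1\right)$$

This is exactly what the threshold() function below computes over the posterior draws.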

Just like the RRs, we are going to extract this information into a vector and include it in our spatial.data object. For this extraction, we will need functions from the tidybayes and tidyverse packages, i.e., spread_draws(), group_by(), summarise() and pull():

# extract the exceedance probabilities from the icar_poisson_fit object
# compute the probability that an area has a relative risk > 1.0
threshold <- function(x){mean(x > 1.00)}
excProbrr <- icar_poisson_fit %>% spread_draws(rr_mu[i]) %>% 
    group_by(i) %>% summarise(rr_mu=threshold(rr_mu)) %>%
    pull(rr_mu)

# insert the exceedance values into the spatial data frame
spatial.data$excProb <- excProbrr
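
As with the relative risks, it can help to inspect the distribution of the exceedance probabilities before categorising them (mirroring the summary/hist step used earlier):

summary(spatial.data$excProb)
hist(spatial.data$excProb)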

The next set of code is all cosmetic, for creating our exceedance probability map of road accidents. Here is the code:

# create the labels for the probabilities
ProbCategorylist <- c("<0.01", "0.01-0.09", "0.10-0.19", "0.20-0.29", "0.30-0.39", "0.40-0.49","0.50-0.59", "0.60-0.69", "0.70-0.79", "0.80-0.89", "0.90-0.99", "1.00")

# categorising the probabilities in bands of 10s
spatial.data$ProbCat <- NA
spatial.data$ProbCat[spatial.data$excProb>=0 & spatial.data$excProb< 0.01] <- 1
spatial.data$ProbCat[spatial.data$excProb>=0.01 & spatial.data$excProb< 0.10] <- 2
spatial.data$ProbCat[spatial.data$excProb>=0.10 & spatial.data$excProb< 0.20] <- 3
spatial.data$ProbCat[spatial.data$excProb>=0.20 & spatial.data$excProb< 0.30] <- 4
spatial.data$ProbCat[spatial.data$excProb>=0.30 & spatial.data$excProb< 0.40] <- 5
spatial.data$ProbCat[spatial.data$excProb>=0.40 & spatial.data$excProb< 0.50] <- 6
spatial.data$ProbCat[spatial.data$excProb>=0.50 & spatial.data$excProb< 0.60] <- 7
spatial.data$ProbCat[spatial.data$excProb>=0.60 & spatial.data$excProb< 0.70] <- 8
spatial.data$ProbCat[spatial.data$excProb>=0.70 & spatial.data$excProb< 0.80] <- 9
spatial.data$ProbCat[spatial.data$excProb>=0.80 & spatial.data$excProb< 0.90] <- 10
spatial.data$ProbCat[spatial.data$excProb>=0.90 & spatial.data$excProb< 1.00] <- 11
spatial.data$ProbCat[spatial.data$excProb == 1.00] <- 12

# check to see if legend scheme is balanced
table(spatial.data$ProbCat)

Generating the probability map output:

# map of exceedance probabilities
tm_shape(spatial.data) + 
    tm_fill("ProbCat", style = "cat", title = "Probability", palette = "GnBu", labels = ProbCategorylist) +
    tm_shape(england_Region_shp) + tm_polygons(alpha = 0.05, border.col = "black") + tm_text("name", size = "AREA") +
    tm_layout(frame = FALSE, legend.outside = TRUE, legend.title.size = 0.8, legend.text.size = 0.7) +
    tm_compass(position = c("right", "top")) + tm_scale_bar(position = c("right", "bottom"))

Output:

Example Interpretation: We can see that the risk patterns for road accidents across England are quite heterogeneous. While elevated risk appears in all 10 regions of England, the burden is most pronounced in the South West region, where a large number of local authorities have a statistically significant increase in risk. While there are significant limitations to the models used here, perhaps the Department for Transport should investigate these patterns, starting with the South West area.

5.5 Bayesian Updating

The content below contains extra data and code to perform Bayesian updating for charting the spatiotemporal trajectories of risk for an outcome. Here, we will use an excerpt dataset containing aggregated measures of mosquito infestation in Brazilian neighbourhoods, alongside climate-related factors that increase the risk of infestation.

CLICK HERE FOR THE DOWNLOADABLE DATA

5.6 Task

Try your hand at this problem in Stan: build a spatial ICAR model using data on counts of low birth weights in Georgia, US, to create the following maps:

  1. Map showing the relative risk of low birth weight across the 163 counties in Georgia
  2. Map showing the statistical significance of the relative risk
  3. Map showing the Exceedance Probabilities using the threshold of RR > 1

Use the following dataset:

  • Low_birth_weights_data.csv: Contains NAME, Lowbirths (Counts) and ExpectedNumber
  • Georgia_Shapefile.shp: Contains NAME

HINT:

  • Try using merge() on the two files
  • Remember to generate the nodes from the shapefile
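
A possible starting point (a sketch only, assuming the files sit in your working directory and share the NAME column):

# load the Georgia shapefile and attribute data, then link them with merge()
georgia_shp <- read_sf("Georgia_Shapefile.shp")
lbw_data <- read.csv("Low_birth_weights_data.csv")
georgia.data <- merge(georgia_shp, lbw_data, by = "NAME", all.x = TRUE)
# then follow section 5.2.2: coerce to sp, shape2mat(), prep_icar_data()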