4 Bayesian Hierarchical Regression Models
4.1 Introduction
4.1.2 Live walk-through demonstration video
We will learn how to perform hierarchical regression modelling within a Bayesian framework. These models are useful, and robust, when the dataset has a hierarchical (i.e., clustered) structure; this structure must be taken into account to ensure statistically sound inference. We will focus on implementing 2-level hierarchical regressions, with examples of random-intercept-only and random-slope models, as these are quite simple to implement in Stan.
Let us begin.
4.1.3 Datasets & setting up the work directory
Go to your folder CPD-course and create a sub folder called “Day 3”. Here, we will store all our R & Stan scripts and data files. Set your work directory to the Day 3 folder.
For Windows, the code for setting the work directory will be:
For Mac, the code for setting the work directory will be:
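The exact path depends on where you created the folder; as a sketch, assuming CPD-course sits inside your Documents folder:

```r
# Windows (note the forward slashes) - adjust the path to your machine
setwd("C:/Users/YourName/Documents/CPD-course/Day 3")

# Mac - adjust the path to your machine
setwd("~/Documents/CPD-course/Day 3")
```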
The datasets for this practical are:
Maths attainment and activity study.csv
Pregnancies and hospital data.csv
We will start off with the Maths attainment and activity study.csv. Let us load this dataset into memory:
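Assuming the file sits in the working directory set above, a minimal load (the object name Maths_dataset matches the code later in this section):

```r
# read the CSV into a data frame called Maths_dataset
Maths_dataset <- read.csv("Maths attainment and activity study.csv")
head(Maths_dataset)  # quick look at the first few rows and column names
```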
4.2 Data preparation and set-up for Bayesian analysis in Stan
We are going to fit a 2-level hierarchical model on a continuous outcome, MathScore, and see how the independent variables ActiveTime and Supportive impact it. The Maths_dataset contains a total of 200 students evenly clustered into 4 classrooms (i.e., 50 students in each classroom). Each student has a unique student number in the StudentID column and a classroom number in the GroupID column.
We want to fit a random-intercept-only model to explore the differences in the classrooms' performance in maths, as well as how ActiveTime and Supportive impact it. To perform the 2-level hierarchical model, we need to extract the right information from the Maths_dataset data frame for the Stan script to compile.
Here’s the information we’ll create in the list() object:

- N = total number of students (level 1);
- CL = maximum number of groups (level 2);
- ClassroomID = the list of all the group numbers that align with the students;
- k = the total number of independent variables to be included in the model (in this case, it’s two, both continuous: ActiveTime and Supportive);
- MathScore = the dependent variable;
- ActiveTime = the first independent variable;
- Supportive = the second independent variable.

Note: this code shows an instance where we explicitly list the variables out in the model.
We can condense this information into a list() object in a few lines of code as our dataset for Stan to compile:
stan.dataset <- list(N=nrow(Maths_dataset),
CL=max(unique(Maths_dataset$GroupID)),
ClassroomID=as.integer(Maths_dataset$GroupID),
k=2,
MathScore=Maths_dataset$Math,
ActiveTime=Maths_dataset$ActiveTime,
Supportive=Maths_dataset$Support
)
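Before moving on, it can help to confirm the list holds what Stan expects (a quick check, not part of the original script):

```r
# N and k should be single integers, ClassroomID an integer vector of length N,
# and MathScore, ActiveTime and Supportive numeric vectors of length N
str(stan.dataset)
```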
At this point, the dataset is fully prepared and ready to go. Let’s create our Stan script for running a 2-level Multilevel linear regression within a Bayesian framework.
4.2.1 Creating a script to run a 2-level Multilevel linear regression in Stan
We will need the Data, Parameters and Model blocks only for this analysis. Recall that the Data, Parameters and Model blocks must be specified for any regression analysis to work.
FIRST STEP: We specify the total number of observations int N, as well as the number of independent variables int k, as integers in the data block. We define our MathScore outcome as an array of size N, and we do the same for the two independent variables ActiveTime and Supportive, specifying all these variables to be real and constrained - this is a sanity check (not compulsory).
We also need to specify the number of classrooms in which the students are clustered, through CL and ClassroomID. Note that CL is the maximum number of classrooms (i.e., 4). ClassroomID is declared in the Stan code as an integer that takes values from 1 to 4, with the maximum constrained by CL. The ClassroomID[N] declaration aligns the classroom ID number with the student's ID number - e.g., classroom 1 with students with IDs 1, 2, 3, 4 and so on; Stan will treat the records as 1[1], 1[2], 1[3], 1[4] and so on.
data {
int<lower = 0> N; // Number of students (i.e., observations)
int<lower = 0> CL; // Total number of classrooms
int<lower = 0, upper = CL> ClassroomID[N]; // Aligning the classroom ID number to the student ID
int<lower = 0> k; // Specify total number of independent variables to be used in model
real<lower = 0> ActiveTime[N]; // First independent variable
real<lower = 0> Supportive[N]; // Second independent variable
real<lower = 0> MathScore[N]; // Outcome or dependent variable
}
SECOND STEP: In the parameters block, we specify the fixed-effect part of the regression, i.e., the intercept gamma00 and the level-1 coefficients beta for our independent variables. We also need to specify the random-effect part of the model, i.e., u0,1, u0,2, u0,3 and u0,4, by linking this to the four classrooms. Finally, we need to specify sigma_error and group_error as the residual errors on the individual and group level, as we must estimate these!
Here is the code:
data {
int<lower = 0> N; // Number of students (i.e., observations)
int<lower = 0> CL; // Total number of classrooms
int<lower = 0, upper = CL> ClassroomID[N]; // Aligning the classroom ID number to the student ID
int<lower = 0> k; // Specify total number of independent variables to be used in model
real<lower = 0> ActiveTime[N]; // First independent variable
real<lower = 0> Supportive[N]; // Second independent variable
real<lower = 0> MathScore[N]; // Outcome or dependent variable
}
parameters {
real gamma00; // Declare the fixed part (intercept) (Global intercept)
vector[k] beta; // Declare the fixed part (beta1 and beta2 coefficients for ActiveTime and Supportive)
real u[CL]; // Declare the random effects u0,1 u0,2, u0,3 and u0,4
real<lower = 0> sigma_error; // Declare the random part (level-1)
real<lower = 0> group_error; // Declare the random part (level-2)
}
THIRD STEP: We need to include the varying intercept in the statistical model. We can write that by adding a transformed parameters block to the code.
data {
int<lower = 0> N; // Number of students (i.e., observations)
int<lower = 0> CL; // Total number of classrooms
int<lower = 0, upper = CL> ClassroomID[N]; // Aligning the classroom ID number to the student ID
int<lower = 0> k; // Specify total number of independent variables to be used in model
real<lower = 0> ActiveTime[N]; // First independent variable
real<lower = 0> Supportive[N]; // Second independent variable
real<lower = 0> MathScore[N]; // Outcome or dependent variable
}
parameters {
real gamma00; // Declare the fixed part (intercept) (Global intercept)
vector[k] beta; // Declare the fixed part (beta1 and beta2 coefficients for ActiveTime and Supportive)
real u[CL]; // Declare the random effects u0,1 u0,2, u0,3 and u0,4
real<lower = 0> sigma_error; // Declare the random part (level-1)
real<lower = 0> group_error; // Declare the random part (level-2)
}
transformed parameters {
real beta0[CL];
for (j in 1:CL){
beta0[j] = gamma00 + u[j]; // Varying intercept by classroom
}
}
FOURTH AND LAST STEP: We build our likelihood function and specify the priors for each parameter in the model block. We use a typical linear model, as we are assuming there is a linear relationship:
data {
int<lower = 0> N; // Number of students (i.e., observations)
int<lower = 0> CL; // Total number of classrooms
int<lower = 0, upper = CL> ClassroomID[N]; // Aligning the classroom ID number to the student ID
int<lower = 0> k; // Specify total number of independent variables to be used in model
real<lower = 0> ActiveTime[N]; // First independent variable
real<lower = 0> Supportive[N]; // Second independent variable
real<lower = 0> MathScore[N]; // Outcome or dependent variable
}
parameters {
real gamma00; // Declare the fixed part (intercept) (Global intercept)
vector[k] beta; // Declare the fixed part (beta1 and beta2 coefficients for ActiveTime and Supportive)
real u[CL]; // Declare the random effects u0,1 u0,2, u0,3 and u0,4
real<lower = 0> sigma_error; // Declare the random part (level-1)
real<lower = 0> group_error; // Declare the random part (level-2)
}
transformed parameters {
real beta0[CL];
for (j in 1:CL){
beta0[j] = gamma00 + u[j]; // Varying intercept by classroom
}
}
model {
real mu; // define mu to be the mean MathScore we want to predict
u ~ normal(0, group_error); // prior for each random effects u[1], u[2], u[3] and u[4]
gamma00 ~ normal(0, 20); // prior for the overall intercept gamma00
beta ~ normal(0, 20); // prior for beta[1] and beta[2] coefficients
  // note that group_error and sigma_error have no priors - they are implicitly given uniform priors
// likelihood function
for (i in 1:N) {
mu = beta0[ClassroomID[i]] + beta[1]*ActiveTime[i] + beta[2]*Supportive[i];
MathScore[i] ~ normal(mu, sigma_error);
}
}
Let’s save the script as Maths_Activity_study.stan. Now, we can compile and run it through RStudio to get our estimated coefficients and random-intercept results.
4.2.2 Compiling our Stan code for Hierarchical Modelling
Now, let us turn our attention to RStudio. Use the stan() function to compile the saved script and obtain the posterior estimates of the parameters in our hierarchical model:
# Start the clock
ptm <- proc.time()
# compile linear regression model for now
bayesian.hierarchical.model = stan("Maths_Activity_study.stan", data=stan.dataset, iter=20000, chains=6, verbose = FALSE)
# Stop the clock
proc.time() - ptm
Output summary table
bayesian.hierarchical.model
Inference for Stan model: anon_model.
6 chains, each with iter=20000; warmup=10000; thin=1;
post-warmup draws per chain=10000, total post-warmup draws=60000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
gamma00 33.57 0.26 24.12 -17.25 16.54 35.96 54.19 68.46 8298 1
beta[1] 11.43 0.00 0.80 9.87 10.89 11.44 11.96 12.99 36973 1
beta[2] 3.20 0.00 0.83 1.59 2.64 3.20 3.76 4.83 35288 1
u[1] 44.59 0.27 24.15 9.73 23.94 42.15 61.73 95.49 8302 1
u[2] 36.90 0.26 24.14 2.05 16.22 34.51 54.01 87.67 8301 1
u[3] 21.27 0.26 24.13 -13.58 0.62 18.85 38.39 72.19 8301 1
u[4] 27.75 0.26 24.14 -7.12 7.12 25.30 44.86 78.65 8299 1
sigma_error 3.34 0.00 0.17 3.03 3.22 3.33 3.45 3.69 34741 1
group_error 56.27 0.52 59.72 8.21 20.55 40.72 71.63 198.40 13438 1
beta0[1] 78.15 0.00 0.79 76.61 77.62 78.15 78.69 79.70 41641 1
beta0[2] 70.47 0.00 0.73 69.05 69.98 70.47 70.96 71.89 43111 1
beta0[3] 54.83 0.00 0.76 53.35 54.32 54.83 55.35 56.33 42592 1
beta0[4] 61.32 0.00 0.77 59.82 60.80 61.32 61.84 62.83 41752 1
lp__ -353.89 0.02 2.15 -358.97 -355.09 -353.54 -352.31 -350.72 17880 1
Samples were drawn using NUTS(diag_e) at Wed Jul 12 03:49:00 2023.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Here is the interpretation:

- gamma00 is the global mean (or average) in the population under study. This means that, after accounting for clustering, on average the students' marks in maths are 33.57 (95% CrI: -17.25 to 68.46). Note that u[1], u[2], u[3] and u[4] are the random effects for each classroom. Adding these random effects to the global intercept (i.e., gamma00 = 33.57) allows the intercepts to vary across the groups. This result is reflected in beta0[1], beta0[2], beta0[3] and beta0[4].
- beta[1] is the level-1 coefficient for ActiveTime. This means that, after accounting for clustering, each unit increase in the time spent on active learning on average yields an increase in the maths scores of 11.43 (95% CrI: 9.87 to 12.99). This increase is significant since the credibility interval excludes the null value of zero.
- beta[2]: in the same vein, the same can be said for the amount of one-to-one support a student receives from a teacher.
- The level-1 and level-2 error estimates can be used to assess the model's performance: 56.27/(56.27 + 3.34) = 0.944 is the proportion of explained variation in the 2-level hierarchical model when accounting for group clustering. Here, the value is quite high, so it is a good model. Anything close to 1 (or 100%) is good; anything close to 0 is bad.
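The variance-partition calculation can be reproduced directly from the posterior means in the summary table:

```r
# posterior means taken from the summary table above
group_error <- 56.27  # level-2 (between-classroom) error
sigma_error <- 3.34   # level-1 (within-classroom) error

# proportion of variation attributable to group clustering
group_error / (group_error + sigma_error)  # approximately 0.944
```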
Perhaps you might want to know the probability that a classroom, on average, would attain a maths score of at least 70 (an A grade). We can use exceedance probabilities to find out:
# load the packages needed for spread_draws() and the pipe
library(tidybayes)
library(dplyr)
# create a function to summarise the posterior distribution for each coefficient exceeding 70.00 as threshold
threshold.varying_intercepts <- function(x){mean(x > 70)}
# use spread_draws() and pull() on the varying intercept estimates i.e., beta0 to compute their exceedance probabilities via threshold.varying_intercepts() inside summarise()
varying_intercepts.exc.probs <- bayesian.hierarchical.model %>% spread_draws(beta0[i]) %>%
	group_by(i) %>% summarise(beta0=threshold.varying_intercepts(beta0)) %>%
	pull(beta0)
# Run
varying_intercepts.exc.probs
> varying_intercepts.exc.probs
[1] 1.0000000 0.7407833 0.0000000 0.0000000
The probability of an average maths score of at least 70 is very high for classroom 1 (100.0%) and classroom 2 (74.1%). Classrooms 3 and 4 do not stand a chance according to our model.
4.3 Task - Maternal access to adequate delivery services
Try your hand at this problem in Stan: build a random-intercept-and-slope logit model using data on 1060 births to 501 mothers. The outcome of interest is whether the birth was delivered in a hospital (1) or elsewhere (0). The predictors include the log of income loginc, the distance to the nearest hospital distance, and two indicators of the mother's education: dropout for less than high school and college for college graduate, so high school or some college is the reference cell.
Build a model containing loginc, distance and dropout as covariates.
HINTS

- You will need to create a varying slope for dropout since it is measured at the mother level. This is similar to creating a varying intercept in the transformed parameters block. Therefore, create gamma00 and gamma01 in the parameters block, which should be used to create beta00 and beta01 in the transformed parameters block.
- The statistical model should use bernoulli() with an inv_logit() function in the model block. The model should be as follows:
for (i in 1:N){
births[i] ~ bernoulli(inv_logit(beta00[MotherID[i]] + beta[1]*logincome[i] + beta[2]*distance[i] + beta01[MotherID[i]]*dropout[i]));
}
- You can compute the desired coefficients and estimates with the following:
print(logit.hierarchical.model, pars = c("gamma00", "gamma01", "beta", "sigma_error", "group_error"))
- Try to compute the odds ratios for gamma01, beta[1] and beta[2].
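As a sketch of that last hint - assuming the fitted object is called logit.hierarchical.model, as in the print() call above, and the parameter names follow the earlier hints - the odds ratios can be obtained by exponentiating the posterior draws:

```r
# extract all posterior draws from the fitted stanfit object
draws <- rstan::extract(logit.hierarchical.model)

exp(mean(draws$gamma01))    # odds ratio for dropout (mean of the varying slope)
exp(mean(draws$beta[, 1]))  # odds ratio for loginc
exp(mean(draws$beta[, 2]))  # odds ratio for distance
```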