library(here)
library(tidyverse)
library(marginaleffects)
library(AIPW)
library(tmle)
library(ranger)
library(bartMachine)
library(boot)
library(clarify)
library(cobalt)
library(estimatr)
library(WeightIt)
nhefs <- read_csv("nhefs.csv")
# remove participants missing outcome data
nhefs <-
  nhefs |>
  filter(!is.na(wt82_71))
# set type for factor variables
nhefs <-
  nhefs |>
  mutate(across(c(education, exercise, active), as.factor))
# fit outcome model
outcome_mod <-
  lm(wt82_71 ~ qsmk + sex + race + poly(age,2) + education +
       poly(smokeintensity,2) + poly(smokeyrs,2) +
       exercise + active + poly(wt71,2) + qsmk:smokeintensity,
     data = nhefs)
This is the first post in a series looking at g-methods and their doubly robust extensions for estimating causal effects in epidemiological research. This one focuses on time-fixed treatments (or exposures); the second will cover the case of time-varying treatments.
Thank you to Noah Greifer and Cameron Patrick for some very helpful notes on a previous version.
Potential outcomes
The goal of causally motivated epidemiology research is often to estimate the average treatment effect (ATE). For a binary treatment or exposure A, the ATE is defined as

ATE = E[Y(1)] - E[Y(0)]

Where Y(1) and Y(0) are the potential outcomes: the values of the outcome an individual would experience if they received the treatment and if they did not, respectively.
The ATE quantifies the average difference between the two potential outcomes. Put another way, it quantifies the average difference in the level of the outcome we would observe if everyone in the population were treated versus if no one were treated.
Given that the ATE is defined in terms of potential, rather than observed, outcomes, it raises the question: how can we estimate the ATE with real data? To do so, a few (mostly untestable) assumptions must be made.
Identifying conditions
The conditions that allow us to make the leap from observed data to contrasts of potential outcomes are referred to as the ‘identifying conditions’. A causal effect is said to be identified if it can be computed as a function of the observed data.
The three core conditions are as follows:
Consistency
Consistency is the assumption that the potential outcome for an individual under treatment a is equal to the outcome we actually observe for that individual when they receive treatment a. Formally, if A = a then Y(a) = Y.
This might not seem like a high bar to clear, but the implications of the consistency assumption are subtle and far-reaching. This paper by Miguel Hernán draws out some of the issues.
(Conditional) exchangeability
Conditional exchangeability implies that, within levels of the measured confounding variables L, the treatment actually received is independent of the potential outcomes. Formally, Y(a) ⊥ A | L for all levels a of treatment.
Positivity
Positivity implies that for all individuals, there is some (positive) chance of receiving each of the treatment levels. Formally, Pr(A = a | L = l) > 0 for every treatment level a and every combination of confounder values l that occurs in the population.
When these three essential conditions are met (or approximately met) for a study, we may then estimate causal effects with the data we have at hand. One such set of methods are the g-methods (the g stands for ‘generalised’), which can estimate treatment effects for both time-fixed and time-varying treatments.
The g-methods consist of two main strategies for estimating treatment effects: the g-formula and inverse probability of treatment weighting (IPTW). (There is a third method, g-estimation, which is not particularly well supported by popular software, so it won’t be discussed here. I apologise for the terminology; I’m not responsible!). Each method will now be introduced and applied with an example of a time-fixed treatment.
The g-formula
Assuming exchangeability, positivity and consistency hold, the g-formula for a time-fixed treatment is:

E[Y(a)] = Σ_l E[Y | A = a, L = l] Pr(L = l)

Here, E[Y | A = a, L = l] is the mean outcome among individuals who received treatment level a and have confounder values l, and Pr(L = l) is the probability of those confounder values in the population.
The above equation only works as written when the variables within L are discrete and every stratum of L contains enough individuals to estimate the conditional mean. With many confounders, some of them continuous, estimating E[Y | A = a, L = l] nonparametrically within every stratum quickly becomes infeasible.
Thankfully, we do not need to obtain E[Y | A = a, L = l] separately for each stratum. Instead, we can fit a regression model for the outcome given treatment and confounders and average its predictions over the distribution of the confounders observed in the sample.
The process of averaging over the distribution of confounders in the sample is known as standardisation.
The computational means for applying the g-formula are as follows:
- Fit a model (e.g., linear or logistic regression) for the conditional mean of the outcome given treatment and confounders
- Create two copies of the dataset. In the first, set treatment to 0 (untreated) and in the second, set treatment to 1 (treated)
- Compute predicted values from the fitted model for each of the two artificial datasets
- Average each set of predicted values, giving estimates of E[Y(0)] and E[Y(1)]
- Contrast the predicted values. E.g., estimate the ATE as the difference E[Y(1)] - E[Y(0)]
When a parametric model is used to estimate the conditional mean of the outcome given treatment and the confounders, this method is referred to as the parametric g-formula.
The g-formula ‘by hand’
The NHEFS dataset available here, used by Hernán and Robins in their book Causal Inference, will be used as an example. The goal is to estimate the effect of smoking cessation between baseline (1971) and follow-up (1982) on weight gain during the same interval. First, we’ll load the data and fit a linear regression model (step 1). We fit the same model as Hernán and Robins, including the confounding variables sex, race, age, education, smoking intensity, smoking years, exercise, activeness, and baseline weight. Departures from linearity are allowed for the continuous variables by including quadratic terms with the poly() function.
For steps 2 and 3, we compute predictions for two artificial copies of the dataset, one in which treatment (qsmk) is set to 0 and one in which treatment is set to 1, respectively. Then, we take the average of each set of predictions and compute the ATE (steps 4 and 5).
fitted.0 <- predict(outcome_mod,
                    newdata = mutate(nhefs, qsmk = 0))

fitted.1 <- predict(outcome_mod,
                    newdata = mutate(nhefs, qsmk = 1))
E_1 <- mean(fitted.1)
E_0 <- mean(fitted.0)

treatment_effect <- E_1 - E_0

print(paste("E[Y(1)] - E[Y(0)] = ",round(treatment_effect,2)))
[1] "E[Y(1)] - E[Y(0)] = 3.52"
The estimated effect of smoking cessation, relative to no cessation, is a weight gain of 3.52 kg.
Confidence intervals can be obtained by bootstrapping. Note that less computationally costly options are also available, as demonstrated below.
parametric_g <- function(data, indices){

  df <- data[indices, ] # resampled data passed in by boot()

  outcome_mod <-
    lm(wt82_71 ~ qsmk + sex + race + poly(age,2) + education +
         poly(smokeintensity,2) + poly(smokeyrs,2) +
         exercise + active + poly(wt71,2) + qsmk:smokeintensity,
       data = df)

  fitted.0 <- predict(outcome_mod, newdata = df |> mutate(qsmk = 0))

  fitted.1 <- predict(outcome_mod, newdata = df |> mutate(qsmk = 1))

  E_1 <- mean(fitted.1)
  E_0 <- mean(fitted.0)

  treatment_effect <- E_1 - E_0

  return(treatment_effect)
}
boot_out <- boot(data = nhefs,
                 statistic = parametric_g,
                 R = 500)

boot.ci(boot_out, conf = 0.95, type = "perc")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 500 bootstrap replicates
CALL :
boot.ci(boot.out = boot_out, conf = 0.95, type = "perc")
Intervals :
Level Percentile
95% ( 2.486, 4.387 )
Calculations and Intervals on Original Scale
R packages for the parametric g-formula
marginaleffects
The R package marginaleffects computes the ATE by standardisation, with standard errors obtained by the delta method:
avg_comparisons(outcome_mod, variables = "qsmk")
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3.52 0.44 7.99 <0.001 49.4 2.65 4.38
Term: qsmk
Type: response
Comparison: mean(1) - mean(0)
Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted
clarify
The package clarify computes the ATE via standardisation and obtains confidence intervals by simulation. A useful quality of clarify’s simulation based inference is that it does not assume normality of the sampling distribution of the estimated quantity (unlike the delta method). This is useful for parameters with natural bounds like probabilities.
sim_coefs <- sim(outcome_mod)

sim_est <- sim_ame(sim_coefs, var = "qsmk", contrast = "diff", verbose = FALSE)

summary(sim_est)
Estimate 2.5 % 97.5 %
E[Y(0)] 1.76 1.34 2.19
E[Y(1)] 5.27 4.60 6.03
Diff 3.52 2.71 4.38
The confidence intervals produced by all three methods are similar.
Beyond the core identification conditions described above, there is another essential assumption for the parametric g-formula to be unbiased: that the conditional mean outcome model is correctly specified. Even if you have measured all important confounders, there may still be bias if you have misspecified the model (e.g., missed important interactions or non-linearities). The method described in the next section relies on a different modelling assumption, namely that the model for treatment is correctly specified.
IPTW
The aim of inverse probability of treatment weighting (IPTW) is to create a pseudo-population in which treatment A is independent of the measured confounders L.
For a binary treatment, the weights can be defined in terms of the propensity score (PS), which is the conditional probability of treatment given confounders, PS = Pr(A = 1 | L). Treated individuals receive a weight of 1/PS and untreated individuals receive a weight of 1/(1 - PS).
The steps for computing the ATE by IPTW are as follows:
- Fit a model to estimate the PS (often a logistic regression model for a binary treatment)
- Compute the weights as 1/PS for those who received treatment and 1/(1 - PS) for those who were untreated
- Estimate the ATE by fitting a weighted regression model for the outcome
Provided the identifying conditions are satisfied, the model fitted to the pseudo-population created in step 3 (a marginal structural model, hence the object name msm in the code below) has the form:

E[Y(a)] = β0 + β1*a

where the coefficient β1 corresponds to the ATE.
IPT weighting ‘by hand’
The code below performs steps 1 and 2:
# fit model for treatment given confounders and compute weights
treatment_mod <-
  glm(qsmk ~ sex + race + poly(age,2) + education +
        poly(smokeintensity,2) + poly(smokeyrs,2) +
        exercise + active + poly(wt71,2),
      family = binomial(), data = nhefs)

ps <- predict(treatment_mod, type = "response") # propensity score

ip_weights <- ifelse(nhefs$qsmk == 1,
                     1 / ps,
                     1 / (1 - ps))
Now we can fit a weighted regression model to estimate the ATE. Because the IP weighting must be taken into account for standard errors to be correct, we use a method which provides robust ‘sandwich’ type standard errors (bootstrapping is another option).
# fit weighted model (step 3)
msm <- lm_robust(wt82_71 ~ qsmk,
                 data = nhefs,
                 weights = ip_weights)

summary(msm)
Call:
lm_robust(formula = wt82_71 ~ qsmk, data = nhefs, weights = ip_weights)
Weighted, Standard error type: HC2
Coefficients:
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept) 1.780 0.2248 7.917 4.582e-15 1.339 2.221 1564
qsmk 3.441 0.5265 6.535 8.574e-11 2.408 4.473 1564
Multiple R-squared: 0.0435 , Adjusted R-squared: 0.04289
F-statistic: 42.71 on 1 and 1564 DF, p-value: 8.574e-11
The IPTW estimate for smoking cessation is a 3.44 kg weight gain, with a 95% CI ranging from 2.4 to 4.5. This is very similar to the estimate obtained by the parametric g-formula, which is a reassuring sign given that the two methods rely on different modelling assumptions.
It is worth noting that the IPTW ATE can also be computed with an estimator which is very similar (though not quite identical) to the weighted regression above. This is the Horvitz-Thompson estimator:

E[Y(1)] - E[Y(0)] = (1/n) Σ_i A_i*Y_i / PS_i - (1/n) Σ_i (1 - A_i)*Y_i / (1 - PS_i)

(The weighted regression instead normalises by the sum of the weights within each treatment group rather than by n, which is why the two are not quite identical.)
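For completeness, here is a minimal sketch of computing this estimator directly from the propensity scores estimated above (it reuses the ps object from the ‘by hand’ weights; no new quantities are assumed):

E1_ht <- mean((nhefs$qsmk * nhefs$wt82_71) / ps)
E0_ht <- mean(((1 - nhefs$qsmk) * nhefs$wt82_71) / (1 - ps))
E1_ht - E0_ht # Horvitz-Thompson IPTW estimate of the ATE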
R packages for IPTW
A particularly useful package for estimating the ATE via IPTW is the WeightIt package.
Though several methods are available for estimating weights, including machine learning algorithms and several other balancing methods which I do not presently understand, we’ll use logistic regression for consistency with the “by hand” approach:
ip_weights <-
  weightit(qsmk ~ sex + race + poly(age,2) + education +
             poly(smokeintensity,2) + poly(smokeyrs,2) +
             exercise + active + poly(wt71,2),
           method = "glm", data = nhefs,
           estimand = "ATE")
As discussed by Austin & Stuart (2015), it is best practice to check that the estimated weights are working as they should and that measured confounders are balanced across levels of treatment in the pseudo-population produced by the weights. This can be easily checked using the bal.tab function of the cobalt package.
We’ll use cobalt to assess balance in terms of standardised mean differences for continuous variables and proportion differences for binary variables. We’ll also compute the Kolmogorov–Smirnov statistic, as Austin and Stuart recommend.
bal.tab(ip_weights,
stats = c("m", "ks"))
Balance Measures
Type Diff.Adj KS.Adj
prop.score Distance 0.0098 0.0425
sex Binary -0.0014 0.0014
race Binary 0.0023 0.0023
poly(age, 2)1 Contin. 0.0058 0.0293
poly(age, 2)2 Contin. 0.0034 0.0340
education_1 Binary -0.0102 0.0102
education_2 Binary 0.0010 0.0010
education_3 Binary 0.0020 0.0020
education_4 Binary 0.0070 0.0070
education_5 Binary 0.0003 0.0003
poly(smokeintensity, 2)1 Contin. -0.0241 0.0310
poly(smokeintensity, 2)2 Contin. -0.0127 0.0307
poly(smokeyrs, 2)1 Contin. -0.0035 0.0428
poly(smokeyrs, 2)2 Contin. 0.0109 0.0468
exercise_0 Binary -0.0046 0.0046
exercise_1 Binary 0.0182 0.0182
exercise_2 Binary -0.0136 0.0136
active_0 Binary -0.0089 0.0089
active_1 Binary 0.0141 0.0141
active_2 Binary -0.0051 0.0051
poly(wt71, 2)1 Contin. -0.0090 0.0358
poly(wt71, 2)2 Contin. 0.0039 0.0515
Effective sample sizes
Control Treated
Unadjusted 1163. 403.
Adjusted 1128.61 325.97
All the confounders are well balanced.
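Balance can also be inspected visually. As a minimal sketch (assuming the ip_weights object returned by weightit above), cobalt’s love.plot function displays the same balance statistics graphically:

love.plot(ip_weights, stats = "m", abs = TRUE, thresholds = c(m = .1)) # standardised mean differences before/after weighting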
Now that balance has been checked, a weighted regression model can be fitted in the same manner as before.
msm <- lm_robust(wt82_71 ~ qsmk,
                 data = nhefs,
                 weights = ip_weights$weights)

summary(msm)
Call:
lm_robust(formula = wt82_71 ~ qsmk, data = nhefs, weights = ip_weights$weights)
Weighted, Standard error type: HC2
Coefficients:
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept) 1.780 0.2248 7.917 4.582e-15 1.339 2.221 1564
qsmk 3.441 0.5265 6.535 8.574e-11 2.408 4.473 1564
Multiple R-squared: 0.0435 , Adjusted R-squared: 0.04289
F-statistic: 42.71 on 1 and 1564 DF, p-value: 8.574e-11
Doubly robust estimation
Key assumptions of the parametric g-formula and IPTW estimators are that the models for the outcome and treatment, respectively, are correctly specified. Doubly robust (DR) estimators, on the other hand, combine models for both treatment and the outcome and, by doing so, give the analyst two chances to get the model right. That is, a DR estimator will be consistent (in the statistical sense of converging to the true parameter as the sample size grows) provided at least one of the two models, though not necessarily both, is correctly specified.
While several DR estimators have been proposed, we will focus on augmented inverse probability weighting (AIPW) and targeted maximum likelihood estimation (TMLE), prominent examples which are well supported by current software.
AIPW
The AIPW estimator is a doubly robust estimator of the ATE which, as the name suggests, “augments” the IPTW estimator with additional terms which improve efficiency and provide double robustness. Before covering AIPW in more detail, it is helpful to return to the previously mentioned IPTW (Horvitz-Thompson) estimator of the ATE:

E[Y(1)] - E[Y(0)] = (1/n) Σ_i A_i*Y_i / PS_i - (1/n) Σ_i (1 - A_i)*Y_i / (1 - PS_i)

Where PS_i is the estimated propensity score for individual i. The AIPW estimator augments each of these two terms with predictions from an outcome model, Y_1i = E[Y | A = 1, L_i] and Y_0i = E[Y | A = 0, L_i]:

E[Y(1)] - E[Y(0)] = (1/n) Σ_i [ A_i*Y_i / PS_i - ((A_i - PS_i) / PS_i) * Y_1i ] - (1/n) Σ_i [ (1 - A_i)*Y_i / (1 - PS_i) + ((A_i - PS_i) / (1 - PS_i)) * Y_0i ]
The parts on the left side of each set of big brackets are identical to the IPTW estimator above. The parts on the right are the augmentation components. These components provide the double robustness. That is, so long as either the model for treatment or the outcome are correctly specified, the estimator will be consistent (a proof of this property is given in Funk et al., (2011)).
AIPW ‘by hand’
All of the required components for applying AIPW, the propensity score and the conditional mean of the outcome under treatment and no treatment, were estimated in previous steps. Nonetheless, we’ll estimate them again for the sake of completeness.
First, fit the treatment and outcome models and compute the propensity score and the predicted outcomes under each treatment level:
outcome_mod <-
  lm(wt82_71 ~ qsmk + sex + race + poly(age,2) + education +
       poly(smokeintensity,2) + poly(smokeyrs,2) +
       exercise + active + poly(wt71,2) + qsmk:smokeintensity,
     data = nhefs)

fitted.0 <- predict(outcome_mod,
                    newdata = nhefs |> mutate(qsmk = 0))

fitted.1 <- predict(outcome_mod,
                    newdata = nhefs |> mutate(qsmk = 1))

treatment_mod <-
  glm(qsmk ~ sex + race + poly(age,2) + education +
        poly(smokeintensity,2) + poly(smokeyrs,2) +
        exercise + active + poly(wt71,2),
      family = binomial(), data = nhefs)

ps <- predict(treatment_mod, type = "response") # propensity score
Secondly, compute the AIPW ATE:
aipw <- function(A, Y, PS, Y_1, Y_0){

  # AIPW estimator of the ATE: IPTW terms for E[Y(1)] and E[Y(0)],
  # each augmented with the outcome-model predictions (Funk et al., 2011)
  out <- mean((((A*Y)/PS) - ((A - PS)/PS) * Y_1) -
                ((((1-A)*Y)/(1-PS)) + ((A - PS)/(1-PS)) * Y_0))

  out
}
aipw_out <- aipw(nhefs$qsmk, nhefs$wt82_71,
                 ps, fitted.1, fitted.0)

paste("AIPW ATE =", round(aipw_out,2))
[1] "AIPW ATE = 3.43"
Confidence intervals can be obtained by bootstrapping or based on the efficient influence curve.
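As a minimal sketch of the influence-curve approach (reusing ps, fitted.1, fitted.0 and aipw_out from above), a standard error and 95% CI for the ‘by hand’ AIPW estimate could be computed like so:

A <- nhefs$qsmk
Y <- nhefs$wt82_71
# estimated influence curve for each individual
ic <- ((A * Y / ps) - ((A - ps) / ps) * fitted.1) -
  (((1 - A) * Y / (1 - ps)) + ((A - ps) / (1 - ps)) * fitted.0) -
  aipw_out
se_aipw <- sqrt(var(ic) / length(ic))
aipw_out + c(-1.96, 1.96) * se_aipw # 95% CI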
AIPW using SuperLearner
Previously, we estimated causal effects using standard parametric models (linear and logistic regression). This was by no means a requirement; the g-formula and IPTW are compatible with other choices, but these were chosen for ease of demonstration. A problem with these simple parametric models is that they tend to impose restrictions on the manner in which covariates are related to the treatment or outcome. For instance, they often assume there are no or few interactions and that continuous variables are linearly related to the outcome/treatment. Consequently, some amount of model misspecification is very likely in real settings. By contrast, highly flexible machine learning models impose few such restrictions and may be less prone to misspecification-related bias.
To venture into the world of machine learning, we will apply the AIPW estimator using the R package AIPW. AIPW makes use of the SuperLearner algorithm to estimate the propensity score and the conditional mean of the outcome. SuperLearner is an ensemble method which uses cross-validation to form a weighted combination of predictions from a user-specified library of candidate models (van der Laan et al., 2007).
To estimate the effect of smoking cessation, we will use AIPW with generalised linear models, penalised maximum likelihood (glmnet), random forests, and Bayesian adaptive regression trees (BART) included in the SuperLearner algorithm. Standard errors are computed based on the efficient influence curve.
#| cache: true
confounds <-
  select(nhefs, sex, race, age, education,
         smokeintensity, smokeyrs, exercise, active, wt71)

SL.libs <- c("SL.glm", "SL.glmnet",
             "SL.ranger", "SL.bartMachine")

aipw_sl <- AIPW$new(Y = nhefs$wt82_71,
                    A = nhefs$qsmk,
                    W.Q = confounds,
                    W.g = confounds,
                    Q.SL.library = SL.libs,
                    g.SL.library = SL.libs,
                    k_split = 5,
                    verbose = FALSE)

aipw_sl$fit()

aipw_sl$summary()

aipw_sl$result
Estimate SE 95% LCL 95% UCL N
Risk of exposure 4.907530 0.6708074 3.592747 6.222312 403
Risk of control 1.751597 0.2178300 1.324650 2.178543 1163
Risk Difference 3.155933 0.6989296 1.786031 4.525835 1566
The results are very similar to those obtained by the previous methods.
TMLE
TMLE is a doubly robust procedure for the estimation of causal effects which is similar to AIPW. Indeed, the two are asymptotically equivalent. In practical settings, however, TMLE can sometimes outperform AIPW as it is less sensitive to the extreme weights which can occur under random nonpositivity.
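As an aside, one way to gauge whether extreme weights are a concern in a given analysis is simply to inspect the distribution of the estimated IP weights. A minimal sketch, recomputing the raw weights from the propensity scores estimated earlier:

raw_w <- ifelse(nhefs$qsmk == 1, 1 / ps, 1 / (1 - ps))
summary(raw_w) # very large maximum weights would suggest near-violations of positivity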
Applying the TMLE estimator proceeds like so:
- Fit a model for the outcome, including treatment and confounders, to obtain the predicted values of the outcome under treatment given and treatment not given, respectively (fitted.1 and fitted.0 in the code below)
- Fit a model for treatment and estimate the propensity score, PS
- Compute the ‘clever covariates’ H1 and H0, which are equal to A/PS and (1 - A)/(1 - PS), respectively
- Fit a new model for the residuals of the outcome from the outcome model, including the clever covariates H0 and H1
- ‘Update’ the initial predictions using an expression which incorporates the estimated coefficients for the clever covariates H1 and H0. That is, fitted.1.new = fitted.1 + coef(H1)/PS and fitted.0.new = fitted.0 + coef(H0)/(1 - PS)
- Compute the ATE based on the updated estimates: ATE = mean(fitted.1.new - fitted.0.new)
The reason that H1 and H0 are referred to as ‘clever’ covariates is that updating the initial outcome predictions with them ensures the resulting estimator solves the efficient influence curve estimating equation, which is what gives TMLE its double robustness and efficiency properties.
TMLE ‘by hand’
First, we fit a model for the outcome and compute the desired predictions:
outcome_mod <-
  lm(wt82_71 ~ qsmk + sex + race + poly(age,2) + education +
       poly(smokeintensity,2) + poly(smokeyrs,2) +
       exercise + active + poly(wt71,2) + qsmk:smokeintensity,
     data = nhefs)

fitted.0 <- predict(outcome_mod,
                    newdata = mutate(nhefs, qsmk = 0))

fitted.1 <- predict(outcome_mod,
                    newdata = mutate(nhefs, qsmk = 1))
Then we fit a model for treatment, obtain the propensity score and compute the clever covariates.
treatment_mod <-
  glm(qsmk ~ sex + race + poly(age,2) + education +
        poly(smokeintensity,2) + poly(smokeyrs,2) +
        exercise + active + poly(wt71,2),
      family = binomial(), data = nhefs)

ps <- predict(treatment_mod, type = "response") # propensity score

H0 <- (1 - nhefs$qsmk) / (1 - ps)
H1 <- nhefs$qsmk / ps
Then, fit the model to the residuals of Y including clever covariates:
resid <- residuals(outcome_mod)

resid_mod <- lm(resid ~ H0 + H1)
Finally, update the initial predictions using the clever covariate coefficients and estimate the ATE:
fitted.1.new <- fitted.1 + (coef(resid_mod)["H1"] / ps)
fitted.0.new <- fitted.0 + (coef(resid_mod)["H0"] / (1 - ps))

ATE_out <- mean(fitted.1.new - fitted.0.new)

paste("TMLE ATE =", round(ATE_out, 2))
[1] "TMLE ATE = 3.46"
Like AIPW, confidence intervals could be obtained based on the efficient influence curve or by bootstrapping. See Luque-Fernandez et al. (2018) for a demonstration of the former.
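As a rough sketch of the efficient-influence-curve approach (reusing ps, fitted.1.new, fitted.0.new and ATE_out from the ‘by hand’ steps above):

A <- nhefs$qsmk
Y <- nhefs$wt82_71
# estimated efficient influence curve for the TMLE ATE
ic <- (A / ps) * (Y - fitted.1.new) -
  ((1 - A) / (1 - ps)) * (Y - fitted.0.new) +
  (fitted.1.new - fitted.0.new) - ATE_out
se_tmle <- sqrt(var(ic) / length(ic))
ATE_out + c(-1.96, 1.96) * se_tmle # 95% CI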
TMLE using SuperLearner
For a final demonstration, we will estimate the TMLE ATE using the R package tmle. Again, we will use SuperLearner as the prediction algorithm, with the same set of libraries as for AIPW.
#| cache: true
<- c("SL.glm","SL.glmnet",
SL.libs "SL.ranger","SL.bartMachine")
<- tmle(Y = nhefs$wt82_71,
tmle.out A = nhefs$qsmk,
W = confounds,
family = "gaussian",
V.Q = 5,
V.g = 5,
Q.SL.library = SL.libs,
g.SL.library = SL.libs)
tmle.out
Marginal mean under treatment (EY1)
Parameter Estimate: 5.1703
Estimated Variance: 0.17413
p-value: <2e-16
95% Conf Interval: (4.3524, 5.9882)
Marginal mean under comparator (EY0)
Parameter Estimate: 1.7488
Estimated Variance: 0.046878
p-value: 6.6289e-16
95% Conf Interval: (1.3244, 2.1732)
Additive Effect
Parameter Estimate: 3.4215
Estimated Variance: 0.21337
p-value: 1.2912e-13
95% Conf Interval: (2.5161, 4.3268)
Additive Effect among the Treated
Parameter Estimate: 3.4633
Estimated Variance: 0.21727
p-value: 1.0855e-13
95% Conf Interval: (2.5497, 4.3769)
Additive Effect among the Controls
Parameter Estimate: 3.3705
Estimated Variance: 0.21767
p-value: 5.0397e-13
95% Conf Interval: (2.4561, 4.2849)
Again, results are very similar.
Conclusion
In summary, here are a few key takeaways:
- Even when conditions for causal identification are met, unbiased estimation of causal effects requires correct model specification
- The parametric g-formula and IPTW can be straightforwardly applied using tools most researchers are already familiar with
- Doubly robust estimators are straightforward extensions of the g-formula and IPTW and, as they readily allow for the incorporation of machine learning libraries, should generally be preferred over singly robust alternatives
In part 2, I will cover the case of time varying treatments and demonstrate how g-methods and their doubly robust extensions become essential in this setting.
References
Hernán, M. A., & Taubman, S. L. (2008). Does obesity shorten life? The importance of well-defined interventions to answer causal questions. International journal of obesity, 32(3), S8-S14.
Hernán, M. A., & Robins, J. M. (2010). Causal inference.
Arel-Bundock V (2023). marginaleffects: Predictions, Comparisons, Slopes, Marginal Means, and Hypothesis Tests. R package version 0.9.0, https://vincentarelbundock.github.io/marginaleffects/.
Greifer N, Worthington S, Iacus S, King G (2023). clarify: Simulation-Based Inference for Regression Models. https://github.com/iqss/clarify, https://iqss.github.io/clarify/.
Austin, P. C., & Stuart, E. A. (2015). Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in medicine, 34(28), 3661-3679.
Greifer N (2023). WeightIt: Weighting for Covariate Balance in Observational Studies. https://ngreifer.github.io/WeightIt/, https://github.com/ngreifer/WeightIt.
Greifer N (2023). cobalt: Covariate Balance Tables and Plots. https://ngreifer.github.io/cobalt/, https://github.com/ngreifer/cobalt.
Funk, M. J., et al. (2011). Doubly robust estimation of causal effects. American Journal of Epidemiology, 173(7), 761-767. https://doi.org/10.1093/aje/kwq439
Zhong, Y., et al. (2021). AIPW: An R package for augmented inverse probability-weighted estimation of average causal effects. American Journal of Epidemiology, 190(12), 2690-2699. https://doi.org/10.1093/aje/kwab207
Van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super learner. Statistical applications in genetics and molecular biology, 6(1).
Luque‐Fernandez, M. A., Schomaker, M., Rachet, B., & Schnitzer, M. E. (2018). Targeted maximum likelihood estimation for a binary treatment: A tutorial. Statistics in medicine, 37(16), 2530-2546.