Benz, L., Mukherjee, R., Wang, R., Arterburn, D., Fischer, H., Lee, C., Shortreed, S.M., Haneuse, S., and Levis, A.W. "Robust Causal Inference for EHR-based Studies of Point Exposures with Missingness in Eligibility Criteria" Under Review (Pre-Print)
- helpers.R: File of helper functions
Folder of scripts used to clean and process EHR data for use in data application
- rygb_vsg_data_prep.R: This script is used to prep analysis dataset(s) for data application presented in the paper. It calls several raw EHR files which are not directly sharable due to data use agreements with Kaiser Permanente. Nevertheless, this script is commented with specific details on how the underlying cohort was created including application of the eligibility criteria to the entire cohort across all 40 operationalizations.
- surgical_px_cleaning.R: Cleans chart review data on surgical procedure types and corrects a few cases that were incorrectly tagged in the original EHR files.
Folder of scripts used for data application analysis. For each of the two outcomes examined, there is one script that fits each of the four estimators explored in this work. Where used, specific functions in each script are commented with descriptions of input and output. Given that the underlying EHR data cannot be shared, this example cannot be reproduced locally. For a detailed reproducible example, please refer to the OMOP Worked Example.
Weight Change Analysis
- fit_CC_outcome_regression_estimator.R: Naive ATT analysis ($\hat\theta_\text{CC}$) for weight change outcome
- fit_iwor_estimator.R: IWOR ATT analysis (with $\hat\theta_\text{IWOR}$) for weight change outcome
- fit_IF_estimator.R: IF ATT analysis (with $\hat\theta_\text{IF}$) for weight change outcome
- fit_EIF_estimator.R: EIF ATT analysis (with $\hat\theta_\text{EIF}$) for weight change outcome
T2DM Remission Analysis
- fit_CC_outcome_remission: Naive ATT analysis ($\hat\theta_\text{CC}$) for diabetes remission outcome
- fit_iwor_estimato_remission.R: IWOR ATT analysis (with $\hat\theta_\text{IWOR}$) for diabetes remission outcome
- fit_IF_estimator_remission.R: IF ATT analysis (with $\hat\theta_\text{IF}$) for diabetes remission outcome
- fit_EIF_estimator_remission.R: EIF ATT analysis (with $\hat\theta_\text{EIF}$) for diabetes remission outcome
Figures
- diabetes_figure.R: Plot of diabetes figure showing frequency of certain measurements for select surgical patients (Generates Figure 1)
- elig_figures.R: Plot eligibility distributions (Generates Figure 2 and S2)
- measure_times_figure.R: Plots distribution of time between date of surgery and most recent measure of BMI/A1c (Generates Figure 3)
- plot_nuisances.R: Plotting code for distributions of nuisance functions (Generates Figures 4, S3, and S4)
- plot_results.R: Plot point estimates and 95% confidence intervals (Generates Figure 4)
Given that the underlying EHR data cannot be shared, the example presented in the main text cannot be reproduced locally. For a detailed reproducible example, we have created a worked example based on OMOP-CDM formatted data. In particular, we use the omock package to create a synthetic dataset based on OMOP standards which can be shared. We then demonstrate how to turn this data into an analytical dataset to be analyzed by our EIF-based estimator, $\widehat\theta_\text{EIF}$.
- build_omop_example.R: This script creates a synthetic EHR dataset using the omock package. It then illustrates how to clean this example dataset and prepare the dataset for analysis by our EIF-based estimator, $\widehat\theta_\text{EIF}$. The final output of this script is the dataset scripts/worked_omop_example/analysis_dataset.csv.
- EIF_omop_example.R: This script contains a documented function eif_estimator which implements $\widehat\theta_\text{EIF}$. The script loads in the prepared dataset scripts/worked_omop_example/analysis_dataset.csv and applies eif_estimator to that worked data example.
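The two scripts are meant to be run in order, since the second reads the CSV written by the first. A minimal sketch of that sequence is given here. It assumes that both scripts sit in scripts/worked_omop_example/ (inferred from the dataset path, not stated above) and that Rscript is on the PATH; the function name run_omop_example is hypothetical. Passing echo as the runner prints the script paths instead of executing them.

```shell
# Run the worked-example scripts in order: build the dataset, then fit the estimator.
# ASSUMPTION: both scripts live in scripts/worked_omop_example/ (inferred from
# the location of analysis_dataset.csv); adjust the directory if they do not.
run_omop_example() (
  runner="${1:-Rscript}"   # pass "echo" for a dry run
  dir="scripts/worked_omop_example"
  # The second script only runs if the first one succeeds
  "$runner" "$dir/build_omop_example.R" && "$runner" "$dir/EIF_omop_example.R"
)

# Dry run: print the two script paths in execution order
run_omop_example echo
```

Calling run_omop_example with no argument would invoke Rscript on each file from the repository root.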
Folder of scripts for a setting where time zero (
- estimators.R: Script contains functions which implement each of the four estimators explored in this work, in the context of the simulation study. Specific parameters used in simulations and estimator instructions are downloadable in simulations/aligned_t0/inputs.
- generate_data.R: Script contains function generate_data to generate simulated datasets given a list of instructions.
- inform_sims.R: Exploratory analysis to guide the range of models for consideration in simulated datasets. This script is how the coefficient values used in generating simulated datasets (those in Table S2) were chosen.
- run_simulation.R: This script calls all functions to generate and analyze simulated datasets. In other words, this is the main simulation wrapper which controls the simulations.
- pnp.R: Script to examine how well $\mu_0$ is calibrated in a single simulated dataset, comparing a correctly specified parametric model with 6 non-parametric models based on SuperLearner (Generates Figure S1)
- specify_inputs.R: Script which specifies simulation parameters and estimators for consideration. This script is specifically used to generate params used by the generate_data function in generate_data.R.
- latex_tables.R: Generates all tables for describing simulation results (Table S1) and parameters (Table S2).
- Simulation inputs + results
- Data application results
Figures saved out from various analyses
Tables saved out from various analyses
.sh files for batch jobs on the cluster
- aligned_t0_sims_loop.sh: SBATCH job file for running simulations for each combination of estimator/simulation parameters
- run_aligned_t0_loop.sh: Wrapper for the full 2-D job array for aligned_t0_sims_loop.sh.
- data_application.sh: Wrapper for submitting all the jobs for the data application, in the application/ sub-directory.
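To illustrate what a 2-D submission over simulation parameters and estimators can look like, here is a minimal sketch. It is not the contents of run_aligned_t0_loop.sh: it uses a plain nested loop rather than a scheduler job array, and the function name submit_grid, the grid sizes, and the idea of passing the two indices as positional arguments to aligned_t0_sims_loop.sh are all assumptions made for illustration. Passing echo as the submit command prints each submission instead of sending it to the scheduler.

```shell
# Hypothetical sketch: one submission per (parameter set, estimator) pair.
# ASSUMPTION: the job file accepts the two indices as positional arguments.
submit_grid() (
  n_params="$1"               # number of simulation parameter sets (assumed)
  n_estimators="$2"           # number of estimator configurations (assumed)
  submit_cmd="${3:-sbatch}"   # pass "echo" for a dry run
  p=1
  while [ "$p" -le "$n_params" ]; do
    e=1
    while [ "$e" -le "$n_estimators" ]; do
      $submit_cmd aligned_t0_sims_loop.sh "$p" "$e"
      e=$((e + 1))
    done
    p=$((p + 1))
  done
)

# Dry run over a 2 x 2 grid: prints four lines, one per index pair
submit_grid 2 2 echo
```

With the default submit command, each pair would be handed to sbatch, which forwards arguments after the script name to the job script.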